### Uncertainty Propagation in Deep Neural Network Using Active Subspace

Posted on 21/10/2019, in Paper.

**Overview**: Monte Carlo estimates of the uncertainty of a prediction require efficiently sampling the inputs and evaluating the model to obtain the first and second moments of the output. This paper adopts the active subspace method proposed by Constantine, Dow and Wang (2014) and builds an entire workflow for propagating input uncertainty through a DNN.

**Active Subspace**: Denote the DNN as $f(x)$ with $x \in \mathbb{R}^d$. We seek an orthonormal matrix $S \in \mathbb{R}^{d \times r}$, $r < d$, such that: \begin{equation} f(x) \approx g(S^T x) \end{equation} The active subspace is then defined as $\mathrm{span}(S)$. We can find it via an eigenvalue decomposition of the expected outer product of the gradient: \begin{equation} W\Lambda W^T = C = \int \nabla f(x) \nabla f(x)^T \pi_x(x)\, dx \end{equation} where $\pi_x(x)$ is the (empirical) distribution of $x$. If the first $r$ eigenvectors of $C$ capture most of the variance, we can take $S$ to be those eigenvectors and work in the subspace they span.

**Response Surface**: Substituting the projection, we define the response surface as: \begin{equation} RS(x_r) = g(x_r) \approx f(x), \qquad x_r = S^T x \end{equation}

**Estimate Output Distribution**: We can still sample $x$ from $\pi_x$, but projecting it onto $\mathrm{span}(S)$ and evaluating the cheap response surface effectively reduces the computational cost.

**Result**: Their experiments on MNIST show that the first 1-2 eigenvectors capture the pixel input space well, and the output uncertainty tends to have a linear relation with the input uncertainty across various noise levels. They also show that the first two moments estimated by MC sampling match the true ones well.
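The eigendecomposition step can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: `grad_f` and `sample_x` are hypothetical callables standing in for the DNN gradient and the input distribution $\pi_x$, and the toy function $f(x) = (a^T x)^2$ is chosen because its active subspace is exactly $\mathrm{span}(a)$.

```python
import numpy as np

def estimate_active_subspace(grad_f, sample_x, n_samples=1000, r=2):
    """Estimate the active subspace of f from a Monte Carlo estimate of
    C = E[grad f(x) grad f(x)^T], following Constantine et al.'s recipe.

    grad_f:   callable returning the gradient of f at x, shape (d,)
    sample_x: callable returning one draw from the input distribution pi_x
    r:        number of active directions to keep
    """
    grads = np.stack([grad_f(sample_x()) for _ in range(n_samples)])  # (n, d)
    C = grads.T @ grads / n_samples                                   # (d, d)
    eigvals, eigvecs = np.linalg.eigh(C)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    S = eigvecs[:, :r]                              # active directions, (d, r)
    return S, eigvals

# toy check: f(x) = (a^T x)^2 has a one-dimensional active subspace along a,
# since grad f(x) = 2 (a^T x) a, so C is rank one with eigenvector a/||a||
rng = np.random.default_rng(0)
d = 10
a = rng.standard_normal(d)
grad_f = lambda x: 2.0 * (a @ x) * a
S, eigvals = estimate_active_subspace(grad_f, lambda: rng.standard_normal(d), r=1)
print(abs(S[:, 0] @ a) / np.linalg.norm(a))  # alignment with a, close to 1
```

In a real DNN setting `grad_f` would be a backprop call (e.g. one backward pass per sample), which is the dominant cost of building $C$.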

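The moment-estimation step can be sketched the same way. Again a hedged illustration rather than the paper's pipeline: it assumes $r = 1$ for simplicity, uses a polynomial least-squares fit as the response surface $g$ (the paper's choice of surrogate may differ), and verifies the first two moments on a toy $f$ whose projected output $z^2$, $z \sim \mathcal{N}(0,1)$, has known mean 1 and variance 2.

```python
import numpy as np

def propagate_uncertainty(f, S, sample_x, n_fit=500, n_mc=5000, deg=2):
    """Estimate the output mean and variance of f through a response surface
    g(S^T x) ~ f(x). Assumes a one-dimensional active subspace (r = 1) and a
    polynomial surrogate; both are simplifying assumptions for this sketch."""
    # fit g on projected training points
    xs = np.stack([sample_x() for _ in range(n_fit)])
    y = np.array([f(x) for x in xs])
    z = xs @ S[:, 0]                   # projected coordinate x_r, shape (n_fit,)
    g = np.poly1d(np.polyfit(z, y, deg))  # least-squares polynomial surrogate

    # cheap MC on the surrogate for the first two moments
    zs = np.stack([sample_x() for _ in range(n_mc)]) @ S[:, 0]
    out = g(zs)
    return out.mean(), out.var()

# toy check: f(x) = (a^T x)^2 with ||a|| = 1, true active subspace span(a);
# the projected output is z^2 with z ~ N(0,1), so mean = 1, variance = 2
rng = np.random.default_rng(1)
d = 10
a = rng.standard_normal(d)
a /= np.linalg.norm(a)
f = lambda x: (a @ x) ** 2
S = a[:, None]
mean, var = propagate_uncertainty(f, S, lambda: rng.standard_normal(d))
print(mean, var)
```

The point of the construction is the cost split: `n_fit` expensive evaluations of $f$ up front, then `n_mc` nearly free evaluations of the surrogate for the Monte Carlo moments.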
**Review Questions**

- Q1. What is the matrix $C$ capturing, and why does its eigendecomposition recover the active subspace?

**Comments**

- The eigendecomposition seems to capture only first-order effects at an aggregate level. I am not sure how this would extend to the nonlinear case. A student came to me proposing to replace the decomposition with an autoencoder.
- The same framework should work with any model compression technique, as long as the fixed cost of finding the active subspace (or distilling the small model) is outweighed by the savings over the many forward evaluations.