Deep Image Prior

Posted on 25/04/2019, in Paper.
  • Overview: This paper shows empirically that a deep neural network can enhance a given image with no prior training data other than the image itself. Although not stated explicitly in the paper, it also connects Bayesian hand-crafted recovery methods with deep learning methods [2] and can be viewed as zero-shot learning. The striking results indicate that the ConvNet architecture itself captures a lot of image priors.
  • Image prior [2]: Traditional reconstruction methods model the posterior p(x|x’) through Bayes’ rule, p(x|x’) = C * p(x’|x) * p(x), where x’ is the degraded observation and p(x) is the prior over natural images. The best reconstruction is the MAP estimate argmax_x p(x|x’), equivalently argmin_x E(x; x’) + R(x). This can be reformulated as argmin_\theta E(f_\theta(z); x’) + R(f_\theta(z)) if f_\theta(z) is a reasonable re-parameterization of x. Here z is a fixed 3D tensor with 32 feature maps, of the same spatial size as x, filled with uniform noise; only the weights \theta are optimized.
  • Learned prior vs. hand-crafted prior: With deep learning methods, we usually assume we can learn p(x) or R(f_\theta(z)) from data rather than hand-crafting what “natural” means. This makes the objective much simpler.
  • Prior of DNN: Here comes the interesting part: in an experiment the authors conducted, a deep neural network is much more reluctant to fit noise than a natural image. This goes against the common narrative that the network learns the prior from data; rather, the architecture itself provides a good prior.
  • Reconstruction: There are three main reconstruction tasks: super-resolution, denoising, and inpainting. The authors trained the network on the single degraded image only and restored the image with the best \theta (stopping before the network also fits the corruption). The results look fairly good.
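The optimization described above can be sketched with a toy stand-in: a fixed noise input z, a small network f_\theta (here a two-layer fully-connected net with manual gradients, rather than the paper's convolutional encoder-decoder), and gradient descent on \theta alone to fit a single degraded signal. All sizes, learning rate, and the network shape are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pix = 64                                 # flattened toy "image" size
x_degraded = rng.normal(size=n_pix)        # stand-in for the observed x'

z = rng.uniform(size=32)                   # fixed noise input z (never updated)
W1 = rng.normal(scale=0.1, size=(128, 32)) # theta: first layer weights
W2 = rng.normal(scale=0.1, size=(n_pix, 128))  # theta: second layer weights

def forward(W1, W2, z):
    """f_theta(z): two-layer net with a ReLU hidden layer."""
    h = np.maximum(W1 @ z, 0.0)
    return W2 @ h, h

lr = 0.01
losses = []
for step in range(500):
    x_hat, h = forward(W1, W2, z)
    err = x_hat - x_degraded               # d/dx_hat of 0.5*||x_hat - x'||^2
    losses.append(0.5 * float(err @ err))
    # manual backprop: only theta = (W1, W2) receives gradients, z stays fixed
    gW2 = np.outer(err, h)
    gh = W2.T @ err
    gh[h <= 0] = 0.0                       # ReLU gradient mask
    gW1 = np.outer(gh, z)
    W2 -= lr * gW2
    W1 -= lr * gW1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss drops as \theta fits the single observation, which is the whole training signal in this setting; in the paper the restored image is read off as f_\theta(z) at a well-chosen stopping point, before the network starts reproducing the corruption as well.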

The results seem counter-intuitive and almost too good to be true, but I think a lot remains to be explored: 1) after so many years of architecture search, how much prior is captured by the architecture rather than the dataset (can we quantify it?); 2) it could be that the individual image prior (how patches in this image should look) is much more useful than the global prior (how a natural image should look). Is there a way to use both in a consistent manner? Right now the former comes from the single image and the latter is memorized in the weights, which seems not the best way to fuse them.

  1. “[R] Deep Image Prior: deep super-resolution, inpainting, denoising without learning on a dataset and pretrained networks”, r/MachineLearning
  2. “Demystifying Deep Image Prior”, Towards Data Science