Affine Subspace Projection for CNNs in Super-resolution
When we solve super-resolution by empirical loss minimisation in a supervised setting (regressing HR images from corresponding LR images), we hope to approximate the statistically optimal behaviour.
The optimal behaviour is the minimiser of the Bayes risk, which is determined by the following components:
- the distribution of HR images $p(I^{HR})$, which I will call the image prior
- the likelihood, which describes how the LR image is related to the HR image. This is a deterministic linear operation: a 2D convolution followed by subsampling of pixels. Thus $I^{LR} = A I^{HR}$, where $A$ is the corresponding linear operator (a matrix acting on the flattened HR image); see the sketch after this list.
- the loss function $\ell(\hat{I}^{HR},I^{HR})$, which describes how poorly any estimate $\hat{I}^{HR}$ approximates $I^{HR}$.
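To make the observation model concrete, here is a minimal NumPy sketch of one possible choice of $A$: a box blur over $s \times s$ blocks followed by subsampling by a factor $s$. The kernel and factor are illustrative assumptions; in practice $A$ is whatever the degradation model specifies.

```python
import numpy as np

def downsample(hr, factor=2):
    """One possible A: box-blur over factor x factor blocks, then subsample.
    This is a linear map, so it could equivalently be written as a matrix
    acting on the flattened HR image."""
    h, w = hr.shape
    assert h % factor == 0 and w % factor == 0
    # Averaging non-overlapping blocks == box blur + keeping one pixel per block.
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

hr = np.random.rand(8, 8)   # toy HR image
lr = downsample(hr)         # I_LR = A I_HR, shape (4, 4)
```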
We know that the theoretically optimal super-resolution function is given by the following minimisation problem (which can be carried out pointwise, for each $I^{LR}$):
$$
f^{*} = \operatorname{argmin}_{f} \frac {\int \ell(f(I^{LR}),I^{HR}) p(I^{LR}\vert I^{HR}) p(I^{HR}) d I^{HR}} {\int p(I^{LR}\vert I^{HR}) p(I^{HR}) d I^{HR}}
$$
We can simplify this using the fact that the likelihood is a point mass, $p(I^{LR}\vert I^{HR}) = \delta(I^{LR} - A I^{HR})$ (writing $f_{\ell}^{*}$ to emphasise the dependence on the loss $\ell$):
$$
f_{\ell}^{*} = \operatorname{argmin}_{f} \frac {\int \ell(f(I^{LR}),I^{HR})\, \delta(I^{LR} - A I^{HR})\, p(I^{HR})\, d I^{HR}} {p(I^{LR})}
$$
Importantly, the posterior $p(I^{HR}\vert I^{LR})$ is degenerate and concentrates on the affine subspace defined by the linear system $A I^{HR} = I^{LR}$. The affine subspace is parametrised by the low-resolution image $I^{LR}$.
As affine subspaces are convex sets, for any convex loss function $\ell$ the Bayes-optimal solution always lies in this affine subspace.
This statement is just saying that if $\ell$ is convex then
$$
A f_{\ell}^{*} (I^{LR}) = I^{LR}
$$
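For the squared loss in particular (writing $f_{\ell_2}^{*}$ for the Bayes-optimal function under squared loss), the optimal output is the posterior mean, and linearity of $A$ together with the degenerate posterior gives the constraint directly:
$$
A\, f_{\ell_2}^{*}(I^{LR}) = A\, \mathbb{E}\!\left[ I^{HR} \mid I^{LR} \right] = \mathbb{E}\!\left[ A I^{HR} \mid I^{LR} \right] = I^{LR},
$$
since the posterior puts all of its mass on HR images satisfying $A I^{HR} = I^{LR}$.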
Subspace projection in CNNs
If we know the Bayes-optimal solution always has this property, it makes sense to restrict our hypothesis space to functions that guarantee it. Any function $f$ can be made to satisfy the property by applying an affine projection $\Pi_{A,I^{LR}}$ to its output: $\Pi_{A,I^{LR}}$ projects any vector onto the affine subspace defined by $A$ and $I^{LR}$.
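One generic way to realise $\Pi_{A,I^{LR}}$ is the orthogonal projection $\Pi_{A,I^{LR}}(x) = x + A^{+}(I^{LR} - A x)$ with $A^{+} = A^{\top}(A A^{\top})^{-1}$. A minimal NumPy sketch, assuming $A$ is available as a full-row-rank matrix and images are flattened into vectors:

```python
import numpy as np

def affine_project(x, A, y):
    """Project x onto the affine subspace {z : A z = y}.
    Assumes A has full row rank so that A A^T is invertible."""
    correction = np.linalg.solve(A @ A.T, y - A @ x)  # (A A^T)^{-1} (y - A x)
    return x + A.T @ correction

# After projection, the constraint holds exactly (up to float precision):
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 16))           # illustrative stand-in for the SR operator
x, y = rng.standard_normal(16), rng.standard_normal(4)
z = affine_project(x, A, y)
print(np.allclose(A @ z, y))               # True
```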
I think $\Pi_{A,I^{LR}}$ can be trivially implemented using two components:
- a final linear convolutional layer with periodically varying convolution kernels (I'll have to work this out; this layer would not be unlike the other layers in the CNN anyway)
- an additive component to the final layer that is computed from $I^{LR}$ directly: some trivially upsampled version of $I^{LR}$ that actually lies in the correct affine subspace. Again, I think computing this would be a straightforward linear convolution and reshuffling.
- In essence, the convnet would now model the residual from the trivial upsampling, while the last fixed linear layer cancels out any departure from the correct affine subspace (see the sketch after this list).
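A hedged PyTorch-style sketch of this arrangement, under the assumption that $A$ is block averaging over $s \times s$ blocks (for which $A^{+}$ is exactly nearest-neighbour upsampling). The network body, the class name ProjectedSRNet, and the layer sizes are illustrative, not a worked-out design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectedSRNet(nn.Module):
    """CNN whose output is projected onto {x : A x = I_LR}, assuming
    A = 'average over s x s blocks' (box blur + subsample by factor s)."""
    def __init__(self, scale=2, channels=1, hidden=64):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # sub-pixel conv: periodically varying kernels
        )

    def forward(self, lr):
        s = self.scale
        # Trivial upsampling that already lies on the affine subspace:
        # A^+ I_LR = nearest-neighbour upsampling when A is block averaging.
        base = F.interpolate(lr, scale_factor=s, mode="nearest")
        # The CNN models a residual on top of the trivial upsampling.
        x = base + self.body(lr)
        # Fixed final affine map: Pi(x) = x + A^+ (I_LR - A x),
        # with A = avg_pool and A^+ = nearest-neighbour upsample.
        violation = lr - F.avg_pool2d(x, s)
        return x + F.interpolate(violation, scale_factor=s, mode="nearest")
```

Because the final projection is affine in its inputs and has no parameters, it behaves like any other fixed layer during backpropagation; the learnable part of the network effectively only contributes components in the null space of $A$.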
MAP estimation with this thing
The squared loss (or PSNR) is not an ideal loss function for super-resolution, because it is a poor proxy for perceptual loss. In practice, if the posterior is multimodal, we can end up with a prediction that lies between those modes and is in fact very unlikely under the prior. In super-resolution this usually results in blurring wherever multiple plausible HR explanations exist for the LR data.
A better objective function would be a 0-1 loss, so that the optimal solution is the mode (not the mean) of the posterior (this is only strictly true for discrete distributions). Although this also doesn't solve the perceptual loss question, at least we can be sure that our outcome always has high probability under the image prior, in other words that we are outputting plausible images. The problem with 0-1 loss is that it's hard to directly optimise. We can try to solve the following MAP optimisation problem (which is not a Bayes risk optimisation problem anymore):
$$
\operatorname{argmax}_{f} \mathbb{E}_{I^{LR}} \log p(f(I^{LR}) \vert I^{LR})
$$
Since the posterior is supported on the affine subspace $A I^{HR} = I^{LR}$, this is equivalent to the constrained problem
$$
\operatorname{argmax}_{f} \mathbb{E}_{I^{LR}} \log p(f(I^{LR})) \quad \text{subject to } \forall I^{LR}: A f(I^{LR}) = I^{LR}
$$
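Finally, a minimal training-loop sketch of the constrained formulation, assuming (hypothetically) that we have a differentiable surrogate `log_prior` for $\log p(I^{HR})$ and a constraint-satisfying network such as the projected model above; neither is specified in this note.

```python
import torch

# Hypothetical pieces: `model` is a constraint-satisfying network such as
# ProjectedSRNet above, and `log_prior(x)` is an assumed differentiable
# surrogate for log p(I_HR) (e.g. a pretrained density model).
def map_training_step(model, log_prior, lr_batch, optimiser):
    optimiser.zero_grad()
    sr = model(lr_batch)              # A @ sr == lr_batch holds by construction
    loss = -log_prior(sr).mean()      # maximise E_{I_LR} log p(f(I_LR))
    loss.backward()
    optimiser.step()
    return loss.item()
```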