Affine Subspace Projection for CNNs in Super-resolution

When we solve superresolution using empirical loss minimisation in a supervised setting (regressing HR images from corresponding LR images), we hope to approximate the statistically optimal behaviour.

The optimal behaviour is that of the Bayes-optimal predictor, which minimises the expected loss under the posterior $p(I^{HR}\vert I^{LR})$. Written out, the theoretically optimal superresolution function is given by the following minimisation problem:

$$ f_{\ell}^{*} = \operatorname{argmin}_{f} \frac {\int \ell(f(I^{LR}),I^{HR}) p(I^{LR}\vert I^{HR}) p(I^{HR}) d I^{HR}} {\int p(I^{LR}\vert I^{HR}) p(I^{HR}) d I^{HR}} $$

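If the LR image is a deterministic linear function of the HR image, $I^{LR} = A I^{HR}$, the likelihood $p(I^{LR}\vert I^{HR})$ reduces to an indicator and the denominator is simply $p(I^{LR})$. We can then simplify this to the following form: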

$$ f_{\ell}^{*} = \operatorname{argmin}_{f} \frac {\int \ell(f(I^{LR}),I^{HR}) \mathbb{1}(I^{LR} = A I^{HR}) p(I^{HR}) d I^{HR}} {p(I^{LR})} $$

Importantly, the posterior $p(I^{HR}\vert I^{LR})$ is degenerate and concentrates on the affine subspace defined by the linear system $A I^{HR} = I^{LR}$. The affine subspace is parametrised by the low-resolution image $I^{LR}$.

As affine subspaces are convex sets, we can say that

for any convex loss function $\ell$, the Bayes-optimal solution always lies in this affine subspace.

In other words, if $\ell$ is convex then

$$ A f_{\ell}^{*} (I^{LR}) = I^{LR} $$
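As a quick sanity check of this property, here is a minimal sketch for the squared loss, where the Bayes-optimal prediction is the posterior mean. Everything in it is an assumption made for illustration: a 1-D signal stands in for an image, $A$ is taken to be non-overlapping block averaging, and the discrete prior is made up.

```python
# Toy check: under squared loss the Bayes-optimal prediction is the posterior
# mean, and the posterior mean satisfies A f(I_LR) = I_LR.

def downsample(hr, k=2):
    """Apply A (assumed here): average non-overlapping blocks of k samples."""
    return [sum(hr[i:i + k]) / k for i in range(0, len(hr), k)]

def posterior_mean(prior, lr, k=2, tol=1e-9):
    """Mean of p(I_HR | I_LR) for a discrete prior given as {hr_tuple: prob}."""
    # The likelihood is the indicator 1(A I_HR = I_LR): keep consistent candidates.
    support = {
        hr: p for hr, p in prior.items()
        if all(abs(a - b) < tol for a, b in zip(downsample(hr, k), lr))
    }
    evidence = sum(support.values())  # p(I_LR), the denominator
    n = len(next(iter(support)))
    return [sum(p * hr[i] for hr, p in support.items()) / evidence
            for i in range(n)]

# Hypothetical prior over 4-sample HR signals.
prior = {
    (0.0, 1.0, 1.0, 0.0): 0.3,
    (1.0, 0.0, 0.0, 1.0): 0.2,
    (0.5, 0.5, 0.5, 0.5): 0.1,
    (1.0, 1.0, 0.0, 0.0): 0.4,  # inconsistent with lr below: zero posterior mass
}
lr = [0.5, 0.5]

mean = posterior_mean(prior, lr)
print("posterior mean:", mean)
print("A applied to it:", downsample(mean))  # matches lr up to rounding
```

The posterior mean is a convex combination of points that all satisfy $A I^{HR} = I^{LR}$, so it satisfies the same linear system.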

Subspace projection in CNNs

If we know the Bayes-optimal solution always has this property, it might make sense to restrict our hypothesis space to functions that guarantee this property. For any function $f$, we can achieve this by applying an affine projection $\Pi_{A,I^{LR}}$ to its output. $\Pi_{A,I^{LR}}$ projects any vector onto the affine subspace defined by $A$ and $I^{LR}$.
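One way such a projection could look is the orthogonal projection $\Pi_{A,I^{LR}}(x) = x - A^{+}(Ax - I^{LR})$, where $A^{+}$ is the Moore-Penrose pseudo-inverse. The sketch below is only an illustration under stated assumptions: signals are 1-D lists, $A$ is non-overlapping block averaging with factor `k`, and the signal length is a multiple of `k`. For that particular $A$, $A A^{T} = \frac{1}{k} I$, so $A^{+} = k A^{T}$, which simply repeats each LR sample `k` times. The function names are hypothetical.

```python
def downsample(hr, k=2):
    """Apply A (assumed here): average non-overlapping blocks of k samples."""
    return [sum(hr[i:i + k]) / k for i in range(0, len(hr), k)]

def upsample(lr, k=2):
    """Apply the pseudo-inverse A^+ of block averaging: repeat each sample k times."""
    return [v for v in lr for _ in range(k)]

def project(x, lr, k=2):
    """Orthogonal projection onto {x : A x = lr}, i.e. x - A^+ (A x - lr)."""
    residual = [a - b for a, b in zip(downsample(x, k), lr)]
    correction = upsample(residual, k)
    return [xi - ci for xi, ci in zip(x, correction)]

x = [0.9, 0.1, 0.2, 0.6]   # hypothetical raw network output
lr = [0.4, 0.5]            # the LR input the output must be consistent with

x_proj = project(x, lr)
print("projected output:", x_proj)
print("A applied to it:", downsample(x_proj))  # matches lr up to rounding
```

Rearranged, the same map reads $(I - A^{+}A)x + A^{+} I^{LR}$: a fixed linear projection of the network output plus an upsampled copy of the LR input. In a CNN with a different downsampling kernel, $A$ and $A^{+}$ would have to be replaced accordingly.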

I think $\Pi_{A,I^{LR}}$ can be trivially implemented using two components:

MAP estimation with this thing

The squared loss (or PSNR) is not an ideal loss function for superresolution, because it is a pretty poor proxy for perceptual loss. In practice, if the posterior is multimodal, we can end up with a prediction that lies between those modes and is in fact very unlikely to occur under the prior. In superresolution this usually results in blurring where multiple possible HR explanations exist for the LR data.

A better objective function would be a 0-1 loss, so that the optimal solution is the mode (not the mean) of the posterior (this is only strictly true for discrete distributions). Although this also doesn't solve the perceptual loss question, at least we can be sure that our outcome always has high probability under the image prior, in other words that we are outputting plausible images. The problem with 0-1 loss is that it's hard to directly optimise. We can try to solve the following MAP optimisation problem (which is not a Bayes risk optimisation problem anymore):
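The difference between the mean and the mode is easy to see on a toy bimodal posterior. The numbers below are invented for illustration: two 2-sample HR candidates, a sharp edge on either side, both of which average to the same LR value of 0.5.

```python
# Hypothetical posterior p(I_HR | I_LR) with two sharp-edge explanations.
posterior = {
    (0.0, 1.0): 0.55,  # edge rising to the right
    (1.0, 0.0): 0.45,  # edge rising to the left
}

# Squared loss: the optimum is the posterior mean.
mean = tuple(sum(p * hr[i] for hr, p in posterior.items()) for i in range(2))

# 0-1 loss on a discrete distribution: the optimum is the posterior mode.
mode = max(posterior, key=posterior.get)

print("mean:", mean, "posterior mass:", posterior.get(mean, 0.0))
print("mode:", mode, "posterior mass:", posterior[mode])
```

Both outputs are consistent with the LR value, but the mean is a washed-out blend that the posterior assigns no mass to, while the mode is one of the plausible sharp edges.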

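$$ \operatorname{argmax}_{f} \mathbb{E}_{I^{LR}} \log p(f(I^{LR}) \vert I^{LR}) $$

Because the posterior is proportional to the prior on the affine subspace and zero everywhere else, this is equivalent to maximising the prior under a hard constraint:

$$ \operatorname{argmax}_{f} \mathbb{E}_{I^{LR}} \log p(f(I^{LR})) \quad \text{subject to the constraint } \forall I^{LR}: A f (I^{LR}) = I^{LR} $$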
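One way to read this constrained problem, assuming the network output is passed through the affine projection, is as an unconstrained problem over an arbitrary function $g$ (a symbol introduced here only for illustration). Writing $f(I^{LR}) = \Pi_{A,I^{LR}}(g(I^{LR}))$ makes the constraint hold by construction, and since the projection leaves points already in the subspace unchanged, every feasible $f$ can be written this way:

$$ \operatorname{argmax}_{g} \mathbb{E}_{I^{LR}} \log p\left(\Pi_{A,I^{LR}}(g(I^{LR}))\right) $$

Under this reading, only the image prior $p$ remains to be modelled; consistency with the LR input is handled entirely by the projection.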