# Adversarial Training for Autoencoders

Autoencoders are a class of methods for learning compressed, distributed representations of data in an unsupervised way. They do this by composing two neural networks, the encoder network $e$ and the decoder network $d$. The input to the encoder is the raw data $x$, and its output is the distributed representation $y = e(x)$, whose dimensionality is typically lower than that of $x$. The decoder maps $y$ back to data space, and the two networks are trained so that the output $\hat{x}(x) = d(e(x))$ is as close to the original input as possible. The discrepancy between the original and reconstructed data is typically measured as the mean squared error $E_{x\sim P} \|x - \hat{x}(x)\|^2$.
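To make the notation concrete, here is a minimal NumPy sketch of the encoder, the decoder and the mean squared reconstruction error. The single-layer architecture, the `tanh` nonlinearity, the dimensions and the random, untrained weights are all assumptions made for illustration; none of the names below come from a particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16-dimensional data, 4-dimensional code.
input_dim, code_dim = 16, 4

# Untrained weights, for illustration only.
W_enc = rng.normal(scale=0.1, size=(code_dim, input_dim))
W_dec = rng.normal(scale=0.1, size=(input_dim, code_dim))

def encode(x):
    """y = e(x): map data to a lower-dimensional representation."""
    return np.tanh(x @ W_enc.T)

def decode(y):
    """d(y): map the representation back to data space."""
    return y @ W_dec.T

def reconstruction_mse(x):
    """Monte Carlo estimate of E_x ||x - d(e(x))||^2 over a batch."""
    x_hat = decode(encode(x))
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

x = rng.normal(size=(32, input_dim))  # a batch of 32 synthetic data points
y = encode(x)
loss = reconstruction_mse(x)
print(y.shape, loss)
```

Training would adjust `W_enc` and `W_dec` to reduce `loss`; that step is left out here.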

The problem with this approach is that the mean squared error is a very basic loss function, particularly when applied to complex domains such as images. It is completely naive with respect to the statistics of natural images, and it does not distinguish between natural-looking and completely unnatural artefacts in an image. Consider for example the compression artefacts in JPEG images: while the squared error between the image and its reconstruction may be low, the artefacts are highly unnatural checkerboard patterns and easily recognisable to a human observer. If we had a loss function with better knowledge of the statistical properties of natural images, we might be able to design decoders that introduce more natural-looking, and therefore less noticeable, artefacts.
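A small numerical sketch of this blindness: the two distortions below are scaled to have exactly the same mean squared error, although one is unstructured pixel noise and the other consists of piecewise-constant 8×8 blocks. The synthetic gradient image, the block size and the target error are assumptions chosen for illustration, and the blocky pattern is only a crude stand-in for real compression artefacts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 64x64 "image": a smooth horizontal gradient.
image = np.ones((64, 1)) * np.linspace(0.0, 1.0, 64)[None, :]

def scale_to_mse(distortion, target_mse):
    """Rescale a distortion so that its mean squared value equals target_mse."""
    return distortion * np.sqrt(target_mse / np.mean(distortion ** 2))

def mse(a, b):
    return np.mean((a - b) ** 2)

target_mse = 0.01

# Distortion 1: independent noise at every pixel.
noisy = image + scale_to_mse(rng.normal(size=image.shape), target_mse)

# Distortion 2: one random offset per 8x8 block, constant within the block.
block_offsets = rng.normal(size=(8, 8))
blocks = np.kron(block_offsets, np.ones((8, 8)))
blocky = image + scale_to_mse(blocks, target_mse)

print(mse(image, noisy), mse(image, blocky))
```

The squared error scores both reconstructions identically, even though their spatial structure is very different.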

So here is a new loss function, similar to the adversarial loss function used in generative adversarial networks:

$$\ell(P, \theta) = \sup_{\psi} E_{x\sim P} \left[ \log f(x,\hat{x};\psi) + \log\left(1-f(\hat{x},x;\psi)\right) \right]$$
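As a rough sketch of how the quantity inside the supremum could be estimated on a batch, the snippet below evaluates it for one fixed $\psi$. It assumes that $f$ outputs a value in $(0, 1)$ for an ordered pair of inputs; the logistic-regression form of `discriminator` and the parameter layout `psi = (w, bias)` are hypothetical choices for illustration. The supremum itself would require maximising over $\psi$, which is not done here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator(a, b, psi):
    """Hypothetical f(a, b; psi): logistic model on the concatenated ordered pair."""
    w, bias = psi
    return sigmoid(np.concatenate([a, b], axis=1) @ w + bias)

def adversarial_objective(x, x_hat, psi, eps=1e-12):
    """Batch estimate of E[log f(x, x_hat) + log(1 - f(x_hat, x))] for fixed psi."""
    original_first = discriminator(x, x_hat, psi)
    reconstruction_first = discriminator(x_hat, x, psi)
    return np.mean(np.log(original_first + eps)
                   + np.log(1.0 - reconstruction_first + eps))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))                    # synthetic data batch
x_hat = x + 0.1 * rng.normal(size=x.shape)       # stand-in for d(e(x))

# An uninformative discriminator (all-zero parameters) outputs 0.5 everywhere.
psi_zero = (np.zeros(32), 0.0)
value_uninformative = adversarial_objective(x, x_hat, psi_zero)

# With a perfect reconstruction both ordered pairs coincide, so no psi can
# push the objective above 2 * log(0.5).
psi_random = (rng.normal(size=32), 0.3)
value_perfect = adversarial_objective(x, x, psi_random)
print(value_uninformative, value_perfect)
```

The last check reflects a property of the formula itself: when $\hat{x} = x$ the two arguments of $f$ are interchangeable, so each term has the form $\log p + \log(1 - p)$, which is maximised at $p = 1/2$.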