What are autoencoders?
Autoencoders are neural networks used mostly for unsupervised learning, particularly for tasks such as data denoising, feature learning, and dimensionality reduction.
The composition of an autoencoder
Three primary components make up an autoencoder:
Encoder:
- Compresses the incoming data into a lower-dimensional representation, known as the “bottleneck” or “latent space.”
- Discovers the input’s key characteristics.
Bottleneck or Latent Space:
- The input’s compressed knowledge representation.
Decoder:
- Reconstructs the original data from its compressed representation.
- Tries to make the output as close as possible to the input.
Components and Architecture
An autoencoder is made up of an encoder and a decoder.
Encoder: The encoder section of the network maps an input vector x to a hidden representation y. The mapping is usually a deterministic function, y = fθ(x), with parameters (weights and biases) denoted θ. In probabilistic autoencoders such as Variational Autoencoders (VAEs), the encoder qφ(z|x) defines a Gaussian distribution over probable latent codes z given the input x. Typically, this probabilistic encoder is an MLP whose outputs define the mean and standard deviation of the approximate posterior distribution.
Hidden Layer: Between the encoder and decoder, the autoencoder learns a compressed intermediate representation y (or z in VAEs), also called the latent representation. A “good” representation captures the main factors of variation in the input data: its stable structures, relationships, and invariances. In some systems this takes a concrete form, such as a 200-dimensional “graphics code” used to control image generation.
Decoder: The decoder maps the learned hidden representation y (or z) back to a “reconstructed” vector z’ in the original input space. This mapping, written z’ = gθ’(y), has its own parameters θ’. In VAEs, the probabilistic decoder pθ(x|z) defines a distribution over probable values of x given a latent code z. It can be implemented as a Multi-Layer Perceptron (MLP), such as a Bernoulli MLP for binary data or a Gaussian MLP for real-valued data, using a single hidden layer to compute the distribution’s parameters from the latent variables.
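To make these mappings concrete, here is a minimal sketch of a basic autoencoder, assuming PyTorch; the 784-dimensional input (e.g., a flattened 28x28 image), the 32-unit bottleneck, and the sigmoid activations are illustrative choices, not values from the text.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        # Encoder: deterministic mapping y = f_theta(x) into the bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Sigmoid(),
        )
        # Decoder: maps y back to a reconstruction z' = g_theta'(y).
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.encoder(x)        # latent representation (bottleneck)
        x_recon = self.decoder(y)  # reconstruction in the input space
        return x_recon, y
```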
Training Methods
The main goal of autoencoder training is to minimize the reconstruction error between the output and the input.
Objective Function: Various loss functions are used. For real-valued data, the traditional squared error L(x, z) = ||x − z||^2 is common; for binary data or probabilities, the reconstruction cross-entropy L_H(x, z) = −Σ_k [x_k log z_k + (1 − x_k) log(1 − z_k)], which can be read as a negative log-likelihood, is typically employed.
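As a concrete illustration, both losses can be computed directly; the sketch below assumes NumPy arrays of equal shape, with values in (0, 1) for the cross-entropy case.

```python
import numpy as np

def squared_error(x, z):
    # L(x, z) = ||x - z||^2
    return np.sum((x - z) ** 2)

def reconstruction_cross_entropy(x, z, eps=1e-12):
    # L_H(x, z) = -sum_k [ x_k log z_k + (1 - x_k) log(1 - z_k) ]
    # A negative log-likelihood when x is binary and z holds Bernoulli parameters.
    z = np.clip(z, eps, 1 - eps)  # guard against log(0)
    return -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))
```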
Backpropagation: Autoencoders are trained with gradient-based optimization methods such as stochastic gradient descent (SGD), which adjust the parameters (weights and biases) of both the encoder and the decoder. The gradients of the loss function are computed using backpropagation.
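A minimal training loop along these lines might look as follows, reusing the Autoencoder sketch from above; the toy data, the optimizer settings, and the squared-error criterion are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy batches standing in for a real dataset (an assumption for this sketch).
data_loader = [torch.rand(64, 784) for _ in range(100)]

model = Autoencoder(input_dim=784, hidden_dim=32)  # class sketched earlier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

for epoch in range(10):
    for x in data_loader:
        x_recon, _ = model(x)
        loss = criterion(x_recon, x)  # reconstruction error
        optimizer.zero_grad()
        loss.backward()               # gradients via backpropagation
        optimizer.step()              # SGD parameter update
```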
Layer-wise Pre-training: Deep autoencoders (networks with many hidden layers) have historically been difficult to train from random initialization, because gradient-based optimization often yields poor solutions. Greedy layer-wise unsupervised pre-training addresses this: each layer of the network is trained in turn as an autoencoder, and its output is used as the input for the next layer. This initializes the weights near a reasonable local minimum, improving internal representations and generalization.
Fine-tuning: After unsupervised layer-wise pre-training, the multi-layer autoencoder can be optimized as a whole. For supervised tasks, a final output layer (e.g., logistic regression) is added on top, and gradient descent is used to fine-tune the entire network on a supervised training criterion (e.g., cross-entropy for classification).
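A compact sketch of greedy layer-wise pre-training followed by supervised fine-tuning, under the same PyTorch assumptions as the earlier sketches; the layer sizes, epoch count, and 10-class output layer are illustrative.

```python
import torch
import torch.nn as nn

def pretrain_layer(inputs, in_dim, hidden_dim, epochs=5, lr=0.1):
    """Train one layer as an autoencoder on `inputs`; return its encoder
    and the encoded batches, which become the inputs for the next layer."""
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
    decoder = nn.Sequential(nn.Linear(hidden_dim, in_dim), nn.Sigmoid())
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for x in inputs:
            recon = decoder(encoder(x))
            loss = nn.functional.mse_loss(recon, x)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    with torch.no_grad():
        encoded = [encoder(x) for x in inputs]
    return encoder, encoded

# Greedy layer-wise pre-training: each layer is trained as an autoencoder
# on the previous layer's output, and the trained encoders are stacked.
data = [torch.rand(64, 784) for _ in range(100)]  # toy batches (assumption)
sizes = [784, 256, 64]                            # illustrative layer sizes
encoders, inputs = [], data
for in_dim, hidden_dim in zip(sizes[:-1], sizes[1:]):
    enc, inputs = pretrain_layer(inputs, in_dim, hidden_dim)
    encoders.append(enc)

# Fine-tuning: stack the pre-trained encoders, add a supervised output layer,
# and train the whole network on a supervised criterion (e.g., cross-entropy).
classifier = nn.Sequential(*encoders, nn.Linear(sizes[-1], 10))
```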
Important Variants and Ideas
The basic autoencoder design has been modified to achieve several goals:
Variational Autoencoders (VAEs):
VAEs, introduced with the Auto-Encoding Variational Bayes (AEVB) algorithm, provide efficient inference and learning in directed probabilistic models with continuous latent variables.
Unlike typical autoencoders, VAEs map each input to a Gaussian distribution in latent space. The encoder qφ(z|x) approximates the intractable posterior distribution pθ(z|x), while the decoder pθ(x|z) generates samples x from latent variables z.
The reparameterization trick (z = gφ(ε, x), with ε an auxiliary noise variable) is essential for VAE training: it lets gradients flow through the stochastic sampling step, so the model can be optimized with standard gradient methods.
The objective function for training VAEs combines an expected negative reconstruction error with a Kullback-Leibler (KL) divergence term that regularizes the encoder’s latent distribution toward a prior (e.g., a standard Gaussian). Unlike other autoencoder regularization methods, this regularization introduces no nuisance hyperparameters.
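The reparameterization trick and the two-term objective can be sketched as follows, assuming PyTorch, a Gaussian encoder, and a Bernoulli MLP decoder for inputs scaled to [0, 1]; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20, hidden_dim=400):
        super().__init__()
        # Probabilistic encoder q_phi(z|x): an MLP outputs the mean and
        # log-variance of a Gaussian over latent codes z.
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Probabilistic decoder p_theta(x|z): a Bernoulli MLP.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so gradients can flow through the sampling step.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # Expected negative reconstruction error (Bernoulli negative log-likelihood).
    recon = nn.functional.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL divergence between q_phi(z|x) and the standard Gaussian prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```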
An example application is the Deep Convolutional Inverse Graphics Network (DC-IGN). This VAE-based model uses convolutional layers and max-pooling in the encoder, and unpooling and convolution in the decoder. DC-IGN learns an interpretable “graphics code” for images that disentangles factors such as pose and illumination. A novel training procedure enforces this disentanglement by varying one latent variable at a time within mini-batches, allowing the model to act as a learned 3D rendering engine that can re-render an image with different poses or lighting from a single input.
Denoising Autoencoders (DAEs):
DAEs are trained to reconstruct the clean input x from a corrupted version x̃.
The corruption process, such as setting a preset proportion of input components to zero or applying “salt” noise to images, is applied only during training.
This encourages the autoencoder to learn robust feature detectors that capture the structure and relationships in the data, rather than merely learning an identity mapping, and makes the learned representation more noise-resistant.
DAEs learn invariances in the data, yielding representations that are useful for applications such as supervised classification. Stacking DAEs for initialization has been found to improve classification performance over stacking autoencoders trained without noise.
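A single denoising training step might be sketched as follows, reusing the hypothetical Autoencoder class from earlier; masking noise (zeroing a fixed proportion of inputs) stands in for the corruption process, and the corruption level is an illustrative choice.

```python
import torch
import torch.nn as nn

def corrupt(x, corruption_level=0.3):
    # Masking noise: set a fixed proportion of input components to zero.
    mask = (torch.rand_like(x) > corruption_level).float()
    return x * mask

model = Autoencoder(input_dim=784, hidden_dim=128)  # class sketched earlier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.rand(64, 784)        # toy batch of clean inputs (assumption)
x_tilde = corrupt(x)           # corruption is applied only during training
x_recon, _ = model(x_tilde)
loss = nn.functional.mse_loss(x_recon, x)  # the target is the *clean* x
optimizer.zero_grad()
loss.backward()
optimizer.step()
```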
Benefits and Purpose of Autoencoders
Autoencoders offer many machine learning benefits:

Representation Learning: Autoencoders excel at representation learning, building hierarchical internal representations of the input data.
Dimensionality Reduction: Autoencoders reduce dimensionality by compressing high-dimensional data into a lower-dimensional latent space, enabling visualization and simplifying downstream processes.
Feature Extraction: The hidden representations can be used as features for classification or object recognition, often matching or outperforming alternative methods. Denoising autoencoders in particular learn features that resemble distinct, meaningful feature detectors.
Generative Models: VAEs are generative models: new data samples can be produced by sampling from the latent space and passing the samples through the decoder (see the sketch after this list). MLP-based generators are also used in Generative Adversarial Networks (GANs), which generate samples without Markov chains.
Improved Deep Network Optimization: Autoencoder layer-wise unsupervised pre-training simplifies optimization and improves generalization compared to random initialization.
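As noted above, once a VAE is trained, new samples can be drawn with a few lines; this reuses the hypothetical VAE class from the earlier sketch, and the latent dimension and batch size are illustrative.

```python
import torch

vae = VAE(input_dim=784, latent_dim=20)  # a trained VAE is assumed here
with torch.no_grad():
    z = torch.randn(16, 20)              # sample z from the standard Gaussian prior
    samples = vae.dec(z)                 # decode into 16 new data samples
```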
Limitations of Autoencoders
Autoencoders, especially in their simple forms or in larger, deeper architectures, have some drawbacks:
Sufficiency of the Reconstruction Criterion: Minimizing reconstruction error alone, without regularization, may not be enough for effective representation learning. Regularization, of the kind built into VAEs and Denoising Autoencoders, is essential.
Deep Architecture Optimization Challenges: Even with layer-wise pre-training, optimizing deep autoencoders remains difficult. Without pre-training or regularization, deep autoencoders may generalize poorly or perform worse than shallower networks.
GAN Mode Collapse and Implicit Density: Using MLPs as generators can lead to limited sample diversity and mode collapse. Additionally, MLP-based generators do not explicitly represent the probability density pg(x).
Risk of Learning the Identity Mapping: If not adequately constrained or regularized (e.g., by adding noise as in DAEs), autoencoders may learn the identity function, resulting in poor representation learning.
Computational Cost for Exact Inference: While parallelizable, autoencoder-like models constructed for “exact inference” (without an explicit encoder for the latent space) can be computationally costly during inference.
Autoencoders, especially VAEs and DAEs, are powerful unsupervised learning techniques that extract meaningful data representations for many modern machine learning applications.