What Are Autoencoders? Components of Autoencoders

What are autoencoders?

Autoencoders are neural networks used mostly for unsupervised learning, particularly for tasks such as data denoising, feature learning, and dimensionality reduction. An autoencoder is a special type of neural network designed to compress its input into a compact representation and then reconstruct it so that the output closely matches the original. Although they require no manually labelled training data, they are regarded as self-supervised learning models because they use the original input as their own “ground truth” against which to compare their output. This makes them extremely useful for unsupervised machine learning tasks such as data compression and dimensionality reduction.

Autoencoder Architecture

An autoencoder is made up of an encoder and a decoder.

An autoencoder’s architecture usually consists of three primary parts that cooperate to compress and then reconstruct data:

Encoder

This component compresses the incoming data into a smaller, easier-to-manage format by lowering its dimensionality while preserving crucial information.

Input Layer: where the network receives the original data, such as images, text features, or other structured data.

Hidden Layers: These layers gradually reduce the dimensionality and complexity of the input data by applying a series of transformations (weights and activation functions). In a standard autoencoder, the encoder’s hidden layers contain progressively fewer nodes than the input layer, which is what compresses the data.

Output (Latent Space): The encoder produces a compressed vector called the latent representation, or encoding. By condensing the key characteristics of the incoming data, this vector helps eliminate noise and redundancy. The original image cannot be recovered exactly from this compressed version; some detail is lost.

Bottleneck (Latent Space / Code)

This is the network’s smallest layer and represents the most condensed form of the input data. Acting as an information bottleneck, it forces the network to prioritize the most important features. Because the model must capture the input’s fundamental structure and important patterns in this condensed representation, it encodes data more effectively and generalizes better. The bottleneck serves as both the encoder network’s output layer and the decoder network’s input layer, and the latent code it holds captures the key characteristics of the input data.

Decoder

This part reconstructs the original data form from the compressed representation in the latent space.

Hidden Layers: These layers gradually expand the latent vector back into a higher-dimensional space. Through a series of transformations, the decoder tries to reconstruct the shape and features of the original data; the number of nodes in the decoder’s hidden layers usually increases from layer to layer.

Output Layer: The last layer produces the reconstructed output, which attempts to closely match the original input. The quality of the reconstruction depends on how well the encoder-decoder pair learns to reduce the input-output difference during training. The decoded image is a lossy reconstruction of the original.
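To make the layout above concrete, here is a minimal sketch of an encoder, bottleneck, and decoder in Keras. The layer sizes (784 to 128 to 64 to 128 to 784) are illustrative assumptions, not values prescribed by this article.

```python
# Minimal sketch of the encoder-bottleneck-decoder layout (assumed sizes).
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 64                                            # bottleneck (latent space) size

inputs = layers.Input(shape=(784,))                        # input layer: a flattened 28x28 image
h = layers.Dense(128, activation="relu")(inputs)           # encoder hidden layer: fewer nodes than the input
code = layers.Dense(latent_dim, activation="relu")(h)      # bottleneck: the compressed latent code

h_dec = layers.Dense(128, activation="relu")(code)         # decoder hidden layer: expands the code back out
outputs = layers.Dense(784, activation="sigmoid")(h_dec)   # output layer: lossy reconstruction of the input

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")          # trained to minimize reconstruction error
autoencoder.summary()
```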

How Do Autoencoders Work? (Training)

During training, an autoencoder’s main objective is to reduce the reconstruction loss, which quantifies how much the reconstructed output differs from the original input. To minimize this loss, the network updates its weights using backpropagation. In doing so, it learns to identify and preserve the key characteristics of the input data, which are stored in the latent space.

The type of data being processed determines which loss function is used:

Mean Squared Error (MSE): Frequently applied to continuous data, it calculates the average squared difference between the input and the reconstructed data.

Binary Cross-Entropy: Applied to binary data (0 or 1 values), it measures the difference between the original and the reconstructed output treated as probabilities. A sigmoid activation at the output layer keeps predictions in the 0-1 range, for example for images whose binary pixel values of 0 or 255 have been normalized to 0 or 1.
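As a quick illustration of the two choices above, the following sketch computes both losses on a made-up batch using standard Keras loss classes; the values are purely illustrative.

```python
# Compare MSE and binary cross-entropy on a toy reconstruction (made-up values).
import tensorflow as tf

original = tf.constant([[0.0, 0.5, 1.0]])        # "input" pixel values scaled to [0, 1]
reconstructed = tf.constant([[0.1, 0.4, 0.9]])   # an imperfect reconstruction

mse = tf.keras.losses.MeanSquaredError()         # suited to continuous data
bce = tf.keras.losses.BinaryCrossentropy()       # suited to binary / probability-like data

print("MSE:", mse(original, reconstructed).numpy())
print("BCE:", bce(original, reconstructed).numpy())
```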

Autoencoders learn to discover latent variables: hidden or random factors that fundamentally shape how the data is distributed but are not directly observable. The latent space is the set of these latent variables for a given input.

Efficient Representations and Constraints

Several strategies are used to encourage autoencoders to learn compact and meaningful features, resulting in more effective representations:

Keep Small Hidden Layers: Restricting the size of each hidden layer forces the network to focus on the most important characteristics, which removes redundancy and makes efficient encoding possible.

Regularization: Penalty terms are added to the loss function using methods such as L1 or L2 regularization. By discouraging excessively large weights, this helps avoid overfitting and ensures that the model learns useful, general representations.

Denoising: In denoising autoencoders, random noise is deliberately added to the input during training. The model then learns to remove this noise during reconstruction, which improves robustness and forces it to focus on core, noise-free characteristics.

Tuning Activation Functions: Adjusting activation functions can promote sparsity, that is, activating only a small number of neurons at a time. This forces the network to capture only the most relevant characteristics and keeps the model simpler. Non-linear activation functions such as the sigmoid are commonly employed because they allow autoencoders to capture intricate non-linear relationships.
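The following sketch combines two of the constraints above: an L1 activity penalty on the bottleneck (sparsity) and Gaussian noise added to the inputs (denoising). The penalty weight, noise level, and placeholder data are assumptions chosen only for illustration.

```python
# Sparsity via an L1 activity penalty plus a denoising training setup (illustrative values).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers, Model

inputs = layers.Input(shape=(784,))
code = layers.Dense(
    64,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-5),   # sparsity penalty added to the loss
)(inputs)
outputs = layers.Dense(784, activation="sigmoid")(code)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

x_clean = np.random.rand(1024, 784).astype("float32")                 # placeholder "clean" data
noise = 0.2 * np.random.normal(size=x_clean.shape).astype("float32")  # injected Gaussian noise
x_noisy = np.clip(x_clean + noise, 0.0, 1.0)

# Denoising: the model sees noisy inputs but is trained to reproduce the clean targets.
autoencoder.fit(x_noisy, x_clean, epochs=1, batch_size=256)
```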

Types of Autoencoders

Autoencoders are adaptable neural networks that can be tailored to a range of tasks, and each variant has its own strengths and weaknesses:

Vanilla / Undercomplete Autoencoders: These basic autoencoders are used primarily for dimensionality reduction. The bottleneck has a fixed capacity, and the hidden layers have fewer nodes than the input and output layers. By forcing severe data compression, the bottleneck keeps the network from overfitting and ensures that it learns only the properties most crucial for reconstruction. However, if the encoder and decoder have too much capacity relative to the complexity of the inputs, they can still learn the identity function.

Denoising Autoencoders (DAEs): DAEs are trained on noisy or corrupted inputs and learn to remove the noise and reconstruct clean data. In contrast to most autoencoders, their reconstruction target is the original, uncorrupted data rather than the noisy input, which encourages the network to learn essential features instead of simply memorising the input. DAEs are useful for cleaning noisy image and audio data and have served as the basis for cutting-edge image generation models such as Stable Diffusion.

Sparse Autoencoders (SAEs): These autoencoders apply a sparsity constraint: only a small number of neurons are active at once, even though the network may contain more hidden units than input features. The sparsity can be controlled by adding a sparsity penalty to the loss function, changing activation functions, or zeroing out certain hidden units. This reduces the risk of overfitting while permitting increased capacity in both the encoder and the decoder. Sparsity is typically enforced with KL divergence or L1 regularization.

Contractive Autoencoders: These are designed to be insensitive to small changes (noise) in the input data, which reduces overfitting and helps them capture the important information. This is accomplished by adding a regularization term during training, based on the Jacobian matrix and its Frobenius norm, that penalizes the network for changing its output in response to insignificantly small changes in the input.

Variational Autoencoders (VAEs): These generative models learn compressed representations of data as probability distributions and create variations of the learnt representations to generate fresh sample data. Unlike other autoencoders, which learn a deterministic latent code, VAEs learn continuous latent variable models that express latent features as a mean (μ) and standard deviation (σ) vector. Their stochastic encoding permits interpolation and random sampling, so these generative AI models are used for tasks such as producing realistic text or graphics. VAEs sample random latent vectors for generation using the reparameterization trick (see the sketch after this list).

Convolutional Autoencoders (CAEs): These are built from convolutional neural networks (CNNs) and are intended for image processing. The encoder uses convolutional layers to extract features, while the decoder uses deconvolution (upsampling) layers to reconstruct the picture. Because they excel at capturing spatial relationships between pixels, they are well suited to image compression, denoising, and style transfer.
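As referenced in the VAE entry above, here is a minimal sketch of the reparameterization step: the encoder outputs a mean and log-variance, and a latent vector is sampled as z = μ + σ · ε. The layer sizes and latent dimension are assumptions chosen for illustration.

```python
# Sketch of a VAE encoder with the reparameterization trick (assumed sizes).
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2

class Sampling(layers.Layer):
    """Samples z = mu + sigma * epsilon, with epsilon drawn from N(0, I)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

encoder_inputs = layers.Input(shape=(784,))
h = layers.Dense(256, activation="relu")(encoder_inputs)
z_mean = layers.Dense(latent_dim)(h)       # mu vector
z_log_var = layers.Dense(latent_dim)(h)    # log(sigma^2) vector
z = Sampling()([z_mean, z_log_var])        # stochastic latent code used by the decoder

encoder = Model(encoder_inputs, [z_mean, z_log_var, z])
```

A full VAE would add a decoder and a KL-divergence term to the loss; only the sampling step is shown here.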

Applications of Autoencoders

Autoencoders have several real-world uses across a variety of domains and data types:

Data Compression / File Compression: Their main function is reducing the dimensionality of incoming data. By compressing images, video, or audio while preserving the important information, they make data quicker to view and share.

Dimensionality Reduction: The encodings that autoencoders learn can be used as input for larger neural networks, increasing computational speed and efficiency, and the extracted features can be reused for other tasks.

Anomaly Detection: Autoencoders detect anomalies, fraud, or faults by measuring the reconstruction loss of new data against what the model learned from “normal” or “genuine” examples; a high reconstruction error may indicate an anomaly (see the sketch after this list).

Image/Audio Denoising: Denoising autoencoders excel at removing unwanted artefacts or corruption from noisy image and audio data without requiring human intervention.

Image Reconstruction / Inpainting: Autoencoders can fill in gaps in a picture by learning to rebuild missing pixels from neighbouring ones. They can also be used to colourize images.

Image Transformation: Autoencoders can upsample and downsample data and transform visuals, such as turning black-and-white photos into colour.

Generative Tasks: VAEs and adversarial autoencoders (AAEs) are very useful for generative tasks such as producing fresh, realistic visuals (like OpenAI’s initial DALL-E model) or even designing chemical structures for drug discovery.

Information Retrieval: Autoencoders can be used in content-based picture retrieval systems.
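To illustrate the anomaly-detection recipe mentioned above, the sketch below scores new samples by their reconstruction error and flags those above a threshold. The model is assumed to be an autoencoder already trained on “normal” data, and the threshold value is a hypothetical choice that would normally be calibrated on validation data.

```python
# Flag anomalies by thresholding per-sample reconstruction error (threshold is assumed).
import numpy as np

def detect_anomalies(autoencoder, x_new, threshold=0.02):
    """Return a boolean mask marking samples whose reconstruction error exceeds the threshold."""
    reconstructions = autoencoder.predict(x_new)
    # Mean squared error per sample, averaged over all feature axes.
    errors = np.mean(np.square(x_new - reconstructions), axis=tuple(range(1, x_new.ndim)))
    return errors > threshold
```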

Disadvantages of Autoencoders

Autoencoders are helpful; however, they have certain drawbacks:

Memorizing Instead of Learning Patterns: When autoencoders memorize the training data instead of discovering meaningful underlying patterns, their ability to generalize to new data is diminished.

Reconstructed Data Might Not Be Perfect: The output may be blurry or distorted, particularly with noisy inputs or when the model architecture is not sophisticated enough to capture every detail.

Requires Large Datasets and Good Parameter Tuning: To function successfully, they usually require a lot of data and meticulous adjustment of hyperparameters (such as latent dimension size, learning rate, number of layers, and nodes per layer). Weak feature representations may result from inadequate data or improper tuning.

Example

Implementing a Basic Autoencoder for Image Compression

The following example illustrates how to build a basic autoencoder that compresses and reconstructs MNIST greyscale pictures using TensorFlow and Keras.

Objective: Encode pictures into a 64-dimensional latent vector, then use this compressed form to recover the original image.

Steps Involved:

  • Import Necessary Libraries: Key libraries include TensorFlow/Keras for building the neural network, NumPy for numerical calculations, and Matplotlib for visualization, together with the MNIST dataset loader.
  • Load and Prepare the Data: The handwritten digits from the MNIST dataset are loaded, pixel values are normalized to the [0, 1] range, and the data is reshaped to match the model’s input specification (28×28 greyscale pictures).
  • The resulting shapes are (60000, 28, 28) for the training data and (10000, 28, 28) for the testing data.
  • Define a Basic Autoencoder Class: A SimpleAutoencoder class is built from Keras Sequential models for the encoder and decoder.
    • The encoder flattens the 28x28x1 input (greyscale picture) and compresses it to latent_dimensions (e.g., 64) using a Dense layer with ReLU activation.
    • The decoder expands this latent_dimensions vector back to 28*28 pixels using a Dense layer with sigmoid activation and reshapes it to 28x28x1. The sigmoid output layer suits pixel values between 0 and 1.
  • Compile and Fit the Autoencoder: The model is compiled with the Adam optimizer and the Mean Squared Error (MSE) loss function, then trained for a predetermined number of epochs (e.g., 10) with a predetermined batch size (e.g., 256).
  • Visualize Original and Reconstructed Data: After training, the encoder produces the compressed representations of the test pictures and the decoder reconstructs them. The original and rebuilt pictures are then plotted side by side for visual comparison.
  • Even though the rebuilt pictures are slightly blurred, the visualization usually demonstrates that the autoencoder successfully captures the important features. A runnable sketch of these steps follows.
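Below is a runnable sketch of these steps using TensorFlow/Keras. It follows the values mentioned above (64-dimensional latent vector, 10 epochs, batch size 256) but, for simplicity, keeps the images as 28×28 arrays rather than 28x28x1 tensors; treat it as an illustrative implementation rather than the exact original code.

```python
# Basic MNIST autoencoder: compress to a 64-dimensional code, then reconstruct.
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.datasets import mnist

latent_dimensions = 64

# Load MNIST and normalize pixel values to the [0, 1] range.
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

class SimpleAutoencoder(Model):
    def __init__(self, latent_dimensions):
        super().__init__()
        # Encoder: flatten the 28x28 image and compress it to the latent vector.
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(latent_dimensions, activation="relu"),
        ])
        # Decoder: expand back to 28*28 pixels and reshape to an image.
        self.decoder = tf.keras.Sequential([
            layers.Dense(28 * 28, activation="sigmoid"),
            layers.Reshape((28, 28)),
        ])

    def call(self, x):
        return self.decoder(self.encoder(x))

autoencoder = SimpleAutoencoder(latent_dimensions)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

# Encode and decode the test images, then plot originals above reconstructions.
encoded = autoencoder.encoder(x_test).numpy()
decoded = autoencoder.decoder(encoded).numpy()

n = 6
plt.figure(figsize=(12, 4))
for i in range(n):
    plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i], cmap="gray")
    plt.axis("off")
    plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded[i], cmap="gray")
    plt.axis("off")
plt.show()
```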

Another example briefly shows how to build a linear autoencoder for the MNIST dataset in PyTorch, using torch.nn to define the linear layers, torch.optim.Adam as the optimizer, and nn.MSELoss as the loss function.
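A minimal PyTorch counterpart might look like the sketch below; the layer sizes, learning rate, and the fake batch are assumptions made for illustration.

```python
# Linear autoencoder skeleton in PyTorch with Adam and MSE loss (illustrative sizes).
import torch
from torch import nn

class LinearAutoencoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = LinearAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# One illustrative training step on a fake batch of flattened images.
batch = torch.rand(256, 28 * 28)
optimizer.zero_grad()
loss = criterion(model(batch), batch)   # reconstruction error against the input itself
loss.backward()
optimizer.step()
```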

Training Methods of Autoencoders

The main goal of autoencoder training is to minimize the error between the input and the reconstructed output.

Objective Function: Various loss functions are utilized. The traditional squared error L(x, z) = ||x - z||^2 is commonly used, or, for binary data or probabilities, the reconstruction cross-entropy L_H(x, z) (interpreted as a negative log-likelihood).

Backpropagation: To train autoencoders, gradient-based optimization methods like stochastic gradient descent (SGD) are used to change the parameters (weights and biases) of both the encoder and decoder. The loss function gradients are computed using backpropagation.

Layer-wise Pre-training: Deep autoencoders (networks with many hidden layers) have historically been difficult to train from random initialization because gradient-based optimization often yields poor solutions. Greedy layer-wise unsupervised pre-training addresses this: each layer of the network is trained in turn as an autoencoder, and its output is used as the input for the next layer. This initializes the weights near a reasonable local minimum, improving internal representations and generalization (a sketch follows below).
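The sketch below illustrates the greedy layer-wise idea in Keras under assumed layer sizes and placeholder data: each layer is trained as a small autoencoder on the codes produced by the previously trained encoder.

```python
# Greedy layer-wise pre-training: train one shallow autoencoder per layer (illustrative).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def pretrain_layer(x, units):
    """Train a one-hidden-layer autoencoder on x; return its encoder and the codes it produces."""
    inputs = layers.Input(shape=(x.shape[1],))
    code = layers.Dense(units, activation="relu")(inputs)
    recon = layers.Dense(x.shape[1])(code)          # linear reconstruction layer
    ae = Model(inputs, recon)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(x, x, epochs=1, batch_size=256, verbose=0)
    encoder = Model(inputs, code)                   # keep only the trained encoder half
    return encoder, encoder.predict(x, verbose=0)

x = np.random.rand(1024, 784).astype("float32")     # placeholder data
enc1, h1 = pretrain_layer(x, 256)                   # first layer trained on the raw inputs
enc2, h2 = pretrain_layer(h1, 64)                   # second layer trained on the first layer's codes
# The pre-trained encoders (enc1, enc2) can then be stacked and fine-tuned with a
# supervised output layer, as described in the next paragraph.
```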

Fine-tuning: After unsupervised layer-wise pre-training, the multi-layered autoencoder can be further optimized. For supervised tasks, a final output layer (e.g., logistic regression) is added on top, and gradient descent is used to fine-tune the whole network on a supervised training criterion (e.g., cross-entropy for classification).
