
Variational Autoencoders: Probabilistic Autoencoders

In machine learning, Variational Autoencoders (VAEs) are a powerful class of generative models, designed to produce new data that resembles the data they were trained on.

What is a Variational Autoencoder?


A Variational Autoencoder (VAE) is an artificial neural network architecture that blends probabilistic techniques with the capabilities of conventional autoencoders. In contrast to typical autoencoders, which simply compress and reconstruct data, VAEs learn a continuous probabilistic representation of the underlying properties of the input data. Thanks to this approach, VAEs can create new, realistic data samples that closely mirror the original input. They function as deep learning models that generate new data by sampling variations of the input data.

The idea of latent space, the collective latent variables of a particular dataset, is crucial to understanding VAEs. Latent variables are unobserved underlying variables that influence the distribution of the data. Autoencoders, including VAEs, model this latent space by using dimensionality reduction to compress data into a lower-dimensional space while preserving important information. Thanks to this compression, background noise is reduced and only the pertinent dimensions are retained.

Variational Autoencoders (VAEs) enable approximate inference and learning in directed probabilistic models with continuous latent variables whose posterior distributions are intractable, even on large datasets. They solve the problem of efficient inference and learning in such models.

History of VAEs

Diederik P. Kingma and Max Welling originally presented variational autoencoders in their 2013 paper “Auto-Encoding Variational Bayes.” They were introduced around the same time as diffusion models and other generative AI methods such as Generative Adversarial Networks (GANs). The paper also popularised the “reparameterization trick,” a vital machine learning technique that allows randomness to be used as a model input without sacrificing differentiability.

How VAEs Work


An encoder, a latent space, and a decoder are the three primary parts of a VAE.

Encoder (Input Understanding)

  • The encoder learns the main characteristics of the incoming data, such as text or images.
  • Rather than a single fixed point in the latent space, the encoder generates two vectors for each input: the mean (μ), which represents a central value, and the standard deviation (σ), in practice often parameterized as the log variance (log σ²), which quantifies the spread of values. Instead of pinning down a single number, these values define a range of possibilities (see the encoder sketch after this list).
  • The encoder network efficiently maps every point in a complex dataset to a distribution in the latent space. The encoder can use neural network architectures such as convolutional or fully connected networks.
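
As a minimal illustration (not code from the article), here is an encoder sketch in PyTorch; the layer names and sizes such as hidden_dim and latent_dim are hypothetical choices:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input vector to the mean and log-variance of a Gaussian in latent space."""

    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)       # mean vector (mu)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)   # log-variance (log sigma^2)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.fc_mu(h), self.fc_logvar(h)
```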

Latent Space (Introducing Randomness)

  • The latent space in a VAE is a continuous probabilistic representation.
  • Instead of encoding the input as a single fixed point, the VAE samples a random point within the range specified by the mean and standard deviation supplied by the encoder. Thanks to this randomisation, the model can produce slightly varied versions of the data, which is essential for generating fresh, realistic samples (see the sampling sketch after this list).
  • For generation, the latent space must be complete (any sampled point decodes to meaningful content) and continuous (nearby points decode to similar content). A popular way to achieve this is to encourage the latent space to follow a standard normal (Gaussian) distribution.
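
Sampling from that range is usually done with the reparameterization trick, so that gradients can still flow through the random step. A minimal sketch, assuming mu and logvar come from an encoder like the one above:

```python
import torch

def reparameterize(mu, logvar):
    """Draw z ~ N(mu, sigma^2) as z = mu + sigma * eps, with eps ~ N(0, I).

    Isolating the randomness in eps keeps mu and logvar differentiable.
    """
    std = torch.exp(0.5 * logvar)   # sigma = exp(0.5 * log sigma^2)
    eps = torch.randn_like(std)     # noise drawn from a standard normal
    return mu + std * eps
```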

Decoder (Building New Data or Rebuilding Existing Data)

  • After a random sample is drawn from the latent space, the decoder tries to recreate the original input.
  • Because the encoder gives it a range (a distribution) rather than a single point, the decoder can generate fresh data that is similar to, but distinct from, what it saw during training.
  • The decoder likewise maps from the latent space back to the input space according to a distribution, although in practice explicit noise is less frequently added at this stage.
  • For VAEs, the training goal is to minimize a loss function made up of two primary terms:
    • Reconstruction Loss: This gauges how well the decoder uses the latent representation to recreate the original input data. Mean-squared error (MSE) and binary cross-entropy are common reconstruction loss functions. Minimising this term optimises the model to correctly reconstruct input x given latent z.
    • KL Divergence Loss: The Kullback-Leibler (KL) divergence is a regularization term that measures the difference between the learnt approximate posterior distribution q(z|x) and a predetermined prior distribution, usually a standard normal distribution p(z).
  • Minimizing the KL divergence encourages a continuous and complete latent space, which avoids overfitting and allows smooth interpolation when generating fresh data.
  • The main goal is to maximize the Evidence Lower Bound (ELBO), written out below. Maximising the ELBO simultaneously maximises the log-likelihood of the observed data and minimises the KL divergence between the approximate and true posterior distributions.
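
Written out for a single data point x, the ELBO takes its standard form (the notation below follows common VAE write-ups rather than the original article):

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$

The first term corresponds to the (negative) reconstruction loss and the second to the KL regulariser. When $q_\phi(z \mid x)$ is a diagonal Gaussian with parameters $\mu$ and $\sigma$ and the prior $p(z)$ is a standard normal, the KL term has the closed form

$$D_{\mathrm{KL}} = -\tfrac{1}{2} \sum_{j} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right),$$

which is the expression used in most implementations (and in the loss sketch later in this article).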

Features of VAEs

  • Generative Model: A VAE is a generative model that can produce fresh data.
  • Probabilistic Latent Space: The latent space is characterised by probability distributions (mean and variance) rather than discrete points, enabling continuous data representation.
  • Encoder-Decoder Architecture: Like other autoencoders, VAEs are composed of an encoder network that maps input to the latent space and a decoder network that reconstructs data from the latent space.
  • Loss Function Components: Reconstruction error and KL divergence are minimised during training.
  • ELBO Maximisation: The Evidence Lower Bound (ELBO) is maximised during the training phase.
  • Reparameterization Trick: By moving the randomness outside the learned parameters, this technique makes gradient-based optimization possible (a loss sketch combining these pieces follows this list).
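
To make the loss components concrete, here is a minimal sketch of a decoder and the combined training loss in PyTorch, assuming the hypothetical Encoder and reparameterize pieces sketched earlier and inputs scaled to [0, 1]; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Maps a latent sample z back to the input space."""

    def __init__(self, latent_dim=20, hidden_dim=400, output_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, output_dim), nn.Sigmoid(),  # outputs in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

def vae_loss(x, x_recon, mu, logvar):
    """ELBO-based loss: reconstruction error plus KL divergence to a standard normal prior."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

A training step would then encode x to (mu, logvar), sample z with reparameterize, decode to x_recon, and backpropagate vae_loss.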

Advantages of VAEs

  • Generative Capability: By producing fresh data samples that resemble the training set, VAEs enable varied outputs and seamless interpolation.
  • Structured Latent Space: They learn a continuous, structured latent representation that captures the underlying data manifold.
  • Regularisation: The KL divergence term acts as a regulariser, preventing overfitting and ensuring a smooth distribution of the latent code.
  • Training Ease: Because of their robust optimisation process, VAEs are typically simpler to train than Generative Adversarial Networks (GANs).
  • Versatile Learning: Although VAEs were first created for unsupervised learning, they have also shown success in semi-supervised and supervised learning tasks.
  • Non-linear Relationships: Unlike conventional linear dimensionality reduction techniques such as PCA, VAEs and other autoencoders can model non-linear relationships between variables.

Disadvantages of VAEs

  • Blurry Outputs: Compared to models like GANs, classic VAEs have a major disadvantage: they often produce blurrier, less realistic outputs. This problem stems from the way their loss functions are computed and from how they reconstruct data distributions.
  • Initial Drawbacks: Early VAE models had drawbacks such as weak disentanglement, poor generation quality, and unsuitability for discrete data, although a number of enhancements have since been proposed to address these issues.


Challenges for VAEs

VAEs still encounter difficulties in spite of their capabilities:

  • Blurry Output Quality: As previously noted, producing blurry images is a recurring problem, especially when contrasted with GANs.
  • Suitability for Various Data Types: Although conventional VAEs work well with continuous data, specialized variations like Binded-VAE are needed to adapt them for discrete data (such as particular chemical structures).
  • Optimisation for Particular Data: VAE architecture must be tailored for particular material data in applications such as materials design, and objective functions must precisely represent the intended material qualities.
  • Relationships and High Dimensionality: VAEs must be able to process and produce high-dimensional data, including crystal structures, and model relationships between various attributes.
  • Validation: In materials design, generated materials must be validated to ensure they have the desired qualities and are stable under practical conditions.

Types/Variations of VAEs


To overcome these drawbacks and broaden their uses, researchers have created a number of VAE variations:

Conditional VAEs (CVAEs)

By adding label or conditional information alongside the latent representation, these let the user control which outputs are generated. For example, they can be used to generate only “4” or “7” images from MNIST.
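
As a hedged illustration of the conditioning idea, one common approach is simply to concatenate a one-hot class label with the encoder input and with the latent code before decoding; the helper below is hypothetical, not taken from the article:

```python
import torch
import torch.nn.functional as F

def condition_on_label(x, z, labels, num_classes=10):
    """Concatenate a one-hot class label to both the encoder input and the latent code."""
    y = F.one_hot(labels, num_classes).float()
    x_cond = torch.cat([x, y], dim=1)  # conditioned input fed to the encoder
    z_cond = torch.cat([z, y], dim=1)  # conditioned latent code fed to the decoder
    return x_cond, z_cond
```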

β-VAE

This variant encourages disentanglement of the latent manifold by weighting the KL divergence term with a factor β, promoting the discovery of interpretable, factorised latent representations.
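
In loss terms, the only change relative to a plain VAE objective is a scalar weight on the KL term; a sketch reusing the hypothetical loss components from earlier:

```python
def beta_vae_loss(recon_loss, kl_loss, beta=4.0):
    """beta-VAE objective: reconstruction plus a weighted KL term (beta > 1 encourages disentanglement)."""
    return recon_loss + beta * kl_loss
```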

VAE-GAN

VAE-GAN is a hybrid generative model that blends the benefits of VAEs and GANs. It keeps the VAE’s latent-space encoding and decoding but uses a GAN’s discriminator network in place of the VAE’s reconstruction loss, which lessens blurriness in generated images. Compared with employing a VAE or a GAN alone, this combination enables the creation of more reliable, higher-fidelity data.

Variational Recurrent Neural Network (VRNN)

In applications such as speech separation, the Variational Recurrent Neural Network (VRNN) extends VAEs to handle sequential data and capture temporal and stochastic information in time-series observations.

Reweighted Autoencoder Variational Bayes (RAVE)

A time-lagged autoencoder called Reweighted Autoencoder Variational Bayes (RAVE) is used to describe nucleation and investigate slow dynamics in condensed matter systems.

Binded-VAE

Binded-VAE addresses a shortcoming of conventional VAEs, which are geared toward continuous data: it is specifically designed for generating extremely sparse discrete datasets.

Supramolecular VAE (SmVAE)

Used in material science to encode molecular representations into a continuous latent space for automated design processes, including the creation of Metal-Organic Frameworks (MOFs).

Further variants include NVAE, VQ-VAE, VQ-VAE-2, JointVAE, FactorVAE, β-TCVAE, Sliced Wasserstein VAE, Radon Sobolev VAE, MMD-VAE, WAEs, and Kernelized VAE.


Applications

VAEs are adaptable generative models that have many uses:

Image Synthesis and Generation

One of the most popular uses is image synthesis and generation, in which VAEs produce new images that resemble the learnt samples.
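
In a trained model, generation amounts to sampling latent vectors from the prior and decoding them; a sketch reusing the hypothetical Decoder from earlier (batch size and latent dimension are arbitrary):

```python
import torch

def sample_new_images(decoder, num_samples=16, latent_dim=20):
    """Generate new samples by decoding latent vectors drawn from the standard normal prior."""
    with torch.no_grad():
        z = torch.randn(num_samples, latent_dim)   # samples from the prior p(z) = N(0, I)
        return decoder(z)                          # decoded into new image-like samples
```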

Anomaly Detection

By learning a system’s typical behaviour, VAEs are able to recognise anomalous data points, which makes them appropriate for spotting fraud or system errors.
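
A common (though not the only) way to use a trained VAE for anomaly detection is to flag inputs whose reconstruction error is unusually high; a minimal sketch, with the threshold left as a user choice and the encoder/decoder being the hypothetical modules sketched earlier:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def is_anomalous(x, encoder, decoder, threshold):
    """Flag inputs whose reconstruction error under the trained VAE exceeds a threshold."""
    mu, logvar = encoder(x)                  # encode to the latent distribution parameters
    x_recon = decoder(mu)                    # decode the mean code for a deterministic score
    error = F.mse_loss(x_recon, x, reduction="none").sum(dim=1)  # per-sample reconstruction error
    return error > threshold                 # boolean mask of anomalous samples
```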

Data Compression

VAEs may compress high-dimensional data into a lower-dimensional latent space, just like conventional autoencoders.
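
For example, the trained encoder alone can act as a compressor, mapping each input to its low-dimensional latent mean; a sketch using the hypothetical Encoder from earlier:

```python
import torch

def compress(encoder, x):
    """Compress a batch of inputs to their latent mean vectors (e.g. 784 dims down to 20)."""
    with torch.no_grad():
        mu, _ = encoder(x)   # the mean vector serves as the compact code
    return mu
```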

Personalized Medicine

Using a patient’s genetic composition and medical history, VAEs could be utilised to create novel medications or customised medical interventions.

Design of New Materials

They are promising tools for designing new materials with special qualities, such as more robust aviation materials, efficient solar cells, or even drug-like molecules.

Creative AI

Depending on user preferences, VAEs can produce original works of art, entertainment, realistic images, videos, or music.

Scientific Research

They can provide models of physical systems or new scientific data for study, such as images of proteins or galaxies.

Creation of Synthetic Data

When real-world data is scarce or privacy is an issue, developers utilize VAEs to create synthetic datasets for software development and testing.

Text and Video Creation

Although they are constrained by the size of the training data, VAEs are able to create new text in a preferred style and new video sequences.

Language processing

They help chatbots and digital assistants speak naturally by identifying and understanding intricate relationships between pieces of data.

Dimensionality Reduction and Feature Learning

VAEs are useful for finding relevant mathematical relationships within datasets and for more effectively describing data with fewer variables.

Analysis of Time Series Data

VAEs excel at processing and deciphering time series or sequential data, including financial data, biological signals, and Internet of Things data feeds.

Improving performance and data quality will require integrating VAEs with other generative AI algorithms and reinforcement learning frameworks, as well as further research into better latent-space representations, expressiveness, and interpretability.
