
Generative Adversarial Networks and Their Disadvantages

What Are Generative Adversarial Networks?

Generative Adversarial Networks (GANs) provide a new framework for estimating generative models via an adversarial process. This approach sidesteps many of the intractable probabilistic computations that arise in conventional maximum likelihood estimation for deep generative models.

(Image credit: ASIMOV Institute)

Core Framework and Purpose of Generative Adversarial Networks

The Generative Adversarial Networks framework trains two separate models simultaneously: a generative model ($G$) and a discriminative model ($D$).

The generative model ($G$) learns to capture the data distribution, so that it can generate samples closely resembling the actual training data. It can be thought of as a "team of counterfeiters" attempting to produce authentic-looking counterfeit money.

The discriminative model ($D$) estimates the probability that a given sample came from the training data rather than from the generative model $G$. It acts as an "adversary" to $G$, much like "police" attempting to identify counterfeit currency.

$G$ is trained to maximise the probability that $D$ makes a mistake, which makes $G$ and $D$ play a minimax two-player game.

Mechanism and Training of Generative Adversarial Networks

In the typical case, both the generative ($G$) and discriminative ($D$) models are defined by multilayer perceptrons.

The complete Generative Adversarial Networks system can be trained with backpropagation. This is a major benefit: it aligns GANs with standard deep learning methods and removes the need for Markov chains or unrolled approximate inference networks during training and sample generation.

Training Process of Generative Adversarial Networks

Random noise $z \sim p_z(z)$ is passed through a differentiable function $G(z; \theta_g)$ that maps it into the data space, producing samples.
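As a minimal, hypothetical sketch of this mapping, a one-dimensional "generator" can push standard normal noise through an affine function into the data space. The names and the parameters `mu` and `sigma` (standing in for $\theta_g$) are purely illustrative, not part of any real GAN library:

```python
import random

def generator(z, mu=4.0, sigma=0.5):
    """Toy differentiable map G(z; theta_g) from noise space to data space.
    mu and sigma play the role of the generator parameters theta_g."""
    return mu + sigma * z

# Draw noise z ~ p_z(z) = N(0, 1) and map it into the data space.
noise = [random.gauss(0.0, 1.0) for _ in range(5)]
samples = [generator(z) for z in noise]
print(samples)  # five scalars clustered around mu = 4.0
```

In a real GAN, $G$ would be a neural network and $\theta_g$ its weights; the point here is only that sampling reduces to a deterministic, differentiable transformation of noise.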

Given an input $x$, $D$ returns a single scalar: the probability that $x$ came from the real data distribution rather than from the generator's distribution $p_g$.
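A minimal sketch of such a scalar-output discriminator, assuming a simple logistic form (the weights `w` and `b` are illustrative placeholders for $D$'s parameters):

```python
import math

def discriminator(x, w=1.0, b=-4.0):
    """Toy D(x): maps an input to one scalar in (0, 1), interpreted as
    the probability that x came from the real data rather than p_g."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

print(discriminator(8.0))  # near 1: "probably real data"
print(discriminator(0.0))  # near 0: "probably generated"
```

In practice $D$ is a neural network ending in a sigmoid, but the contract is the same: one input in, one probability out.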

$D$ is trained to maximise the likelihood that it will properly categorise samples produced by $G$ as well as actual training instances.

Concurrently, $G$ is trained to minimise $\log(1-D(G(z)))$, thereby attempting to deceive $D$ into considering generated samples to be authentic.

This creates the value function $V(D,G) = E_{x \sim p_{data}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1-D(G(z)))]$, which $D$ maximises and $G$ minimises.

In practice, training is a numerical, iterative process. Optimising $D$ to completion in the inner loop is computationally prohibitive and can result in overfitting on small datasets, so training instead alternates between $k$ steps of optimising $D$ and one step of optimising $G$. As long as $G$ changes slowly enough, this keeps $D$ near its optimal solution. The approach is comparable to the way SML/PCD training maintains Markov chain samples between learning steps.
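The alternating schedule can be sketched as a pure-Python skeleton. This is only the control flow; the comments mark where a real implementation would take gradient steps on $V(D,G)$ via backpropagation:

```python
def train(num_iterations, k):
    """Skeleton of the alternating GAN training schedule:
    k discriminator updates per single generator update."""
    d_steps = g_steps = 0
    for _ in range(num_iterations):
        for _ in range(k):
            # Here: sample real data and noise, then ascend
            # V(D, G) with respect to D's parameters.
            d_steps += 1
        # Here: sample noise, then descend V(D, G)
        # with respect to G's parameters.
        g_steps += 1
    return d_steps, g_steps

print(train(num_iterations=100, k=5))  # (500, 100)
```

With $k=1$ (a common choice) the two players are updated in strict alternation; larger $k$ keeps $D$ closer to its optimum between generator updates.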

Global Optimality: Theoretically, in the space of arbitrary functions, the game has a unique solution in which $G$ recovers the training data distribution and $D$ equals 1/2 everywhere, i.e., $D$ is unable to discriminate between generated and real data.
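At this optimum, with $D(x) = 1/2$ for every input, each expectation in the value function is simply $\log \frac{1}{2}$, so $V = \log\frac{1}{2} + \log\frac{1}{2} = -\log 4$. A quick numerical check:

```python
import math

# At the global optimum, D outputs 1/2 regardless of its input.
def optimal_D(x):
    return 0.5

# V(D, G) = E[log D(x_real)] + E[log(1 - D(G(z)))]; with D = 1/2
# everywhere, both expectations equal log(1/2) for any samples.
value = math.log(optimal_D(0.0)) + math.log(1.0 - optimal_D(0.0))
print(value)           # -1.3862943611198906
print(-math.log(4.0))  # the same value: -log 4
```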

Benefits of Generative Adversarial Networks

No Markov Chains: Unlike some other generative models (such as Generative Stochastic Networks or RBMs), Generative Adversarial Networks require no Markov chains for sampling or training. This eliminates the complications associated with MCMC mixing.

Backpropagation Only: Gradients for training both $G$ and $D$ are obtained using backpropagation alone.

No Inference Needed: No approximate inference is required during learning.

Leveraging Piecewise Linear Units: Because Generative Adversarial Networks do not require feedback loops during generation, they are better able to use piecewise linear units (such as rectifiers and maxout), avoiding the problems that unbounded activations cause in feedback loops.

Representing Crisp Distributions: While Markov chain-based approaches often require distributions to be somewhat "blurry" so that the chains can mix, adversarial models can represent very sharp, even degenerate distributions.

Statistical Advantage: The generator network is updated only through gradients flowing from the discriminator, never directly with data examples, so components of the input are not copied directly into the generator's parameters.


Disadvantages and Challenges of Generative Adversarial Networks

Implicit Representation: There is no explicit representation of the probability density $p_g(x)$; $p_g$ is defined only implicitly as the distribution of the samples the generator produces.

Synchronisation Challenge: Training requires careful synchronisation of $D$ and $G$. If $G$ is trained too much without updating $D$, it can suffer "mode collapse" (also known as "the Helvetica scenario"), mapping many different noise values to the same data point and losing sample diversity.
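A hypothetical toy illustration of the collapse: if the generator degenerates into a constant function, every noise value maps to one output and diversity vanishes. All function names here are illustrative:

```python
import random

def healthy_G(z):
    return 4.0 + 0.5 * z   # different z give different samples

def collapsed_G(z):
    return 4.0             # every z maps to a single data point

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

diverse   = {round(healthy_G(z), 6) for z in noise}
collapsed = {round(collapsed_G(z), 6) for z in noise}

print(len(diverse))    # roughly 1000 distinct samples
print(len(collapsed))  # 1: all diversity is lost
```

Real mode collapse is subtler (the generator covers only a few modes of the data rather than literally one point), but the symptom is the same: many $z$ values, few distinct outputs.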

Connection to Additional Generative Models

Generative Stochastic Networks (GSNs): GSNs extend generalised denoising auto-encoders and build a parameterised Markov chain. One significant distinction is that the adversarial networks approach requires no Markov chain for sampling.

Boltzmann-family and deep belief models, such as restricted Boltzmann machines (RBMs) and deep belief networks (DBNs): Maximum likelihood estimation in these models often requires intractable probabilistic computations that Generative Adversarial Networks avoid. Gradients in undirected graphical models such as RBMs and Deep Boltzmann Machines (DBMs) are estimated with MCMC techniques, which can suffer from mixing problems; these models also have intractable partition functions.

Auto-Encoding Variational Bayes (AEVB): AEVB is another recent technique for back-propagating into a generative machine, and is comparable to Generative Adversarial Networks in that sense. AEVB uses stochastic gradient methods for efficient inference and learning in directed probabilistic models with continuous latent variables and intractable posterior distributions.

Denoising Autoencoders and Noise-Contrastive Estimation (NCE): Some techniques, such as denoising and contractive autoencoders, have learning rules similar to score matching applied to RBMs. NCE likewise uses a discriminative training criterion, but it employs the generative model itself to discriminate against a fixed noise distribution, whereas Generative Adversarial Networks use a separate discriminative model.
