
Deep Networks and Their Challenges

What are Deep Networks?

Artificial neural networks with several layers between the input and output are referred to as deep networks, or deep neural networks (DNNs). These networks are the cornerstone of deep learning because of their capacity to extract hierarchical representations and intricate patterns from data.

Core Concepts and Advantages of Deep Networks


Hierarchical Representation Learning: Deep networks naturally integrate low-, mid-, and high-level features across their layers: each layer applies further operations to the previous layer's output, producing a progressively more abstract (hidden) representation of the observed patterns. The ability to learn interpretable, disentangled representations is a central goal of deep learning.
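
A minimal sketch of such a layer stack, assuming PyTorch and illustrative layer sizes: each block re-represents the previous block's output at a higher level of abstraction.

```python
import torch
import torch.nn as nn

feature_hierarchy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # low-level features (edges, colours)
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # mid-level features (textures, motifs)
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # high-level features (object parts)
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),                                       # task head on the most abstract representation
)

x = torch.randn(1, 3, 32, 32)          # dummy image batch
print(feature_hierarchy(x).shape)      # torch.Size([1, 10])
```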

Computational Efficiency: According to complexity-theory results, some functions can be represented exponentially more compactly by deep architectures than by shallow ones; this saving applies both to the number of computational elements and to the number of training examples required.

Generalisation: Deep networks aim for improved generalisation performance, responding sensibly to inputs not explicitly encountered during training. Initialising the hidden layers with more meaningful representations of the input (for example, through unsupervised pre-training) is a common way to achieve this.


Deep Networks Challenges

Historically, training deep networks was difficult:

Optimisation Issues: Gradient-based optimisation from random initialisation generally finds unsatisfactory local minima.

Exploding/Vanishing Gradients: Error signals propagated backward through many layers (or time steps) tend to either shrink rapidly or blow up, hindering convergence, especially when long time lags are involved.

Degradation Problem: As network depth grows, accuracy can saturate and then degrade, with training error increasing as well, so the effect is not caused by overfitting. This indicates that existing solvers struggle to optimise deep “plain” (non-residual) networks.
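
A minimal sketch of the vanishing-gradient effect, assuming PyTorch and an illustrative stack of saturating (sigmoid) layers; the exact numbers depend on initialisation.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(depth: int) -> float:
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(64, 64), nn.Sigmoid()]   # saturating nonlinearity
    net = nn.Sequential(*layers)
    x = torch.randn(8, 64)
    net(x).sum().backward()
    return net[0].weight.grad.norm().item()           # gradient reaching the first layer

for depth in (2, 8, 32):
    print(depth, first_layer_grad_norm(depth))        # the norm typically collapses as depth grows
```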

Solutions and Architectures for Deep Networks

Innovative deep network architectures and training methodologies have been developed to tackle these challenges:

Greedy Layer-Wise Unsupervised Pre-training: introduced for Deep Belief Networks (DBNs), this technique pre-trains the network one layer at a time in an unsupervised fashion and then fine-tunes the entire network on the supervised task.

It initialises the weights in the vicinity of a good local minimum, which improves both optimisation and generalisation.

The layers can be built from Restricted Boltzmann Machines (RBMs) or auto-encoders; a sketch of the auto-encoder variant follows.
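
A minimal sketch of the idea, assuming PyTorch, auto-encoder building blocks, and illustrative layer widths (784, 256, 64); a real pipeline would iterate over mini-batches of unlabelled data.

```python
import torch
import torch.nn as nn

sizes = [784, 256, 64]          # layer widths (assumed for illustration)
encoders = []
x = torch.randn(128, sizes[0])  # stand-in for unlabelled training data

for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=0.1)
    for _ in range(100):                          # unsupervised pre-training of this layer only
        h = torch.sigmoid(enc(x))
        loss = ((dec(h) - x) ** 2).mean()         # reconstruction error
        opt.zero_grad(); loss.backward(); opt.step()
    encoders.append(enc)
    x = torch.sigmoid(enc(x)).detach()            # feed the new representation to the next layer

# Fine-tuning: stack the pre-trained encoders and add a supervised output layer.
network = nn.Sequential(*sum([[e, nn.Sigmoid()] for e in encoders], []), nn.Linear(sizes[-1], 10))
```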

Denoising Autoencoders learn robust representations by reconstructing clean inputs from corrupted versions of them. This approach has improved classification performance.
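
A minimal sketch of a denoising auto-encoder training step, assuming PyTorch, additive Gaussian corruption, and illustrative sizes.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

clean = torch.rand(64, 784)                               # stand-in for a batch of images
noisy = clean + 0.3 * torch.randn_like(clean)             # corrupted input
loss = ((decoder(encoder(noisy)) - clean) ** 2).mean()    # reconstruct the *clean* input
opt.zero_grad(); loss.backward(); opt.step()
```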

ResNets introduce “shortcut connections” that perform identity mapping, allowing a stack of layers to learn a residual mapping F(x) := H(x) - x instead of directly fitting the desired underlying mapping H(x).

This approach simplifies optimisation for deep networks, enhancing accuracy and addressing degradation issues.

Identity shortcuts add neither extra parameters nor computational complexity. ResNets with depths of up to 152 layers have won classification, detection, and localisation competitions.
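
A minimal sketch of a residual block in this spirit, assuming PyTorch; real ResNet blocks also handle changes in channel count and stride.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)    # identity shortcut: H(x) = F(x) + x

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 16, 16)).shape)   # shape preserved: (1, 64, 16, 16)
```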

Gated Recurrent Neural Networks (RNNs):

RNNs are dynamic models that can handle variable-length sequences. Long Short-Term Memory (LSTM) units and Gated Recurrent Units (GRUs) were created to overcome the vanishing/exploding gradient problem in RNNs.

LSTMs enforce constant error flow through “constant error carrousels” and use multiplicative gate units to protect the memory cell from irrelevant inputs and outputs. With these units, RNNs can learn long-range dependencies in sequences.
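
A minimal usage sketch, assuming PyTorch's built-in nn.LSTM and illustrative dimensions.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=1, batch_first=True)
x = torch.randn(4, 50, 16)          # batch of 4 sequences, 50 steps, 16 features per step
outputs, (h_n, c_n) = lstm(x)       # c_n is the cell state maintained by the gates
print(outputs.shape, h_n.shape)     # (4, 50, 32) and (1, 4, 32)
```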

Neural Turing Machines (NTMs):

Enhance RNN capabilities by integrating external memory resources.

NTMs, like Turing Machines, use “heads” to manage memory read and write operations.

With this architecture, NTMs can infer simple algorithms (e.g., copying, sorting, associative recall) from example inputs and outputs and generalise beyond the training data.
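
A minimal sketch of NTM-style content-based addressing for a single read head, assuming PyTorch and illustrative memory dimensions; the location-based shift and sharpening steps are omitted.

```python
import torch
import torch.nn.functional as F

memory = torch.randn(128, 20)                 # N memory slots of width M (assumed sizes)
key = torch.randn(20)                         # read key produced by the controller
beta = torch.tensor(5.0)                      # key strength

similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)  # one score per slot
weights = F.softmax(beta * similarity, dim=0)                      # content-based addressing
read_vector = weights @ memory                                     # differentiable read
print(read_vector.shape)                                           # torch.Size([20])
```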

Variational Autoencoders (VAEs):

Introduce a stochastic variational inference and learning algorithm (Auto-Encoding Variational Bayes, AEVB) for efficient approximate inference in directed probabilistic models with continuous latent variables.

An encoder network approximates the intractable posterior over the latent variables, while a decoder network stochastically reconstructs the observations from latent samples.
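
A minimal sketch of a VAE forward pass with the reparameterisation trick, assuming PyTorch, fully connected encoder/decoder layers, and illustrative sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 2 * 20)       # outputs mean and log-variance of a 20-d latent
dec = nn.Linear(20, 784)

x = torch.rand(32, 784)
mu, log_var = enc(x).chunk(2, dim=1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)        # reparameterisation trick
x_hat = torch.sigmoid(dec(z))

recon = F.binary_cross_entropy(x_hat, x, reduction="sum")       # reconstruction term
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL(q(z|x) || N(0, I))
loss = recon + kl                                               # negative ELBO
```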

The Deep Convolutional Inverse Graphics Network (DC-IGN) is a VAE that learns an interpretable image representation in which factors such as out-of-plane rotation and lighting variation are disentangled, effectively learning a 3D rendering engine.

Generative Adversarial Networks (GANs):

GANs train a generative model (the generator G) by pitting it against a discriminative model (the discriminator D) in a two-player minimax game. The generator learns to produce samples from the target data distribution, while the discriminator learns to distinguish real data from generated data.

A major benefit is that both models can be trained with backpropagation, without requiring Markov chains for sampling.
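
A minimal sketch of one training step of the minimax game, assuming PyTorch, fully connected toy networks, and random stand-in data.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(32, 784)     # stand-in for a batch of real training data
z = torch.randn(32, 16)         # noise fed to the generator
fake = G(z)

# Discriminator step: label real data 1 and generated data 0.
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator into labelling fakes as real.
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```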

Spatial Transformer Networks (STNs):

Introduce a learnable module, the Spatial Transformer, that can be inserted into conventional neural network architectures such as CNNs.

This module enables spatial manipulation of data within the network, learning invariance to transformations like translation, scaling, rotation, and warping.

The spatial transformer consists of a differentiable localisation network, grid generator, and sampler, so the whole module can be trained end-to-end with the rest of the network.
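
A minimal sketch of the grid generator and sampler, assuming PyTorch (torch.nn.functional.affine_grid / grid_sample) and a toy localisation network initialised to the identity transform.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 28, 28)                                  # input feature map

loc_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 6))   # toy localisation network
loc_net[1].weight.data.zero_()
loc_net[1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))  # start at the identity transform

theta = loc_net(x).view(-1, 2, 3)                              # predicted affine parameters
grid = F.affine_grid(theta, x.size(), align_corners=False)     # grid generator
warped = F.grid_sample(x, grid, align_corners=False)           # differentiable sampler
print(warped.shape)                                            # torch.Size([1, 1, 28, 28])
```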

Related Ideas and How Deep Networks Differ

Comparison to Shallow Networks: Deep networks offer a more compact representation than shallow architectures such as Support Vector Machines or neural networks with a single hidden layer, which can be inefficient at representing complex functions and may require an exponential number of computational elements. In experiments, deep networks trained without pre-training or residual connections perform worse than those that use them.

Radial Basis Function (RBF) Networks: Unlike multi-layer perceptrons, these layered networks come with a guaranteed learning rule: once the basis centres are fixed, the nonlinear map is completed by solving for the output weights with linear algebra. They can also form disjoint decision regions with a single hidden layer, whereas multi-layer perceptrons need two hidden layers to do so. A sketch of the linear-algebra step follows.
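
A minimal sketch of that linear-algebra step, assuming NumPy, Gaussian basis functions, and an illustrative target function.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))            # training inputs
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2           # target function (assumed for illustration)

centres = X[rng.choice(len(X), 20, replace=False)]   # fixed RBF centres
width = 0.5

# Hidden-layer activations: Gaussian of the distance to each centre.
dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
Phi = np.exp(-(dists / width) ** 2)

# Output weights via least squares -- the guaranteed, gradient-free step.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.mean((Phi @ w - y) ** 2))               # training error of the fitted RBF network
```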

Hopfield Networks: Although not “deep networks” in the multi-layered feedforward sense, Hopfield networks are a forerunner of, and an alternative paradigm to, them (for denoising, the work above used autoencoders rather than Hopfield models). They provide content-addressable memory and emergent collective computation, with stored memories corresponding to stable local minima in the network's phase-space flow. They are characterised by strong back-coupling and asynchronous “on or off” neurones, and they are robust to missing input details and to the failure of individual neurones.
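
A minimal sketch of Hopfield-style content-addressable memory, assuming NumPy, a single stored pattern, and the Hebbian outer-product storage rule.

```python
import numpy as np

rng = np.random.default_rng(0)
pattern = rng.choice([-1, 1], size=100)                 # stored memory
W = np.outer(pattern, pattern).astype(float)            # Hebbian weights
np.fill_diagonal(W, 0)                                  # no self-connections

probe = pattern.copy()
probe[:20] *= -1                                        # corrupt 20% of the bits

state = probe.copy()
for _ in range(5):                                      # asynchronous updates
    for i in rng.permutation(len(state)):
        state[i] = 1 if W[i] @ state >= 0 else -1       # threshold "on or off" neurone

print(np.array_equal(state, pattern))                   # True: the stored memory is recovered
```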
