
Feedforward Networks in Neural Networks and Their Types

What are Feedforward Networks?

Feedforward neural networks have one-way connections that run from the input layer, through one or more hidden layers, to the output layer. A typical feedforward network has no feedback loops. Layers have no internal connections but are fully connected to adjacent layers. The input layer accepts the components of an input vector. Hidden units or layers process this information, either capturing higher-order regularities in the data or building high-level abstractions. The network’s response comes from the output layer. The units in these networks use nonlinear activation functions.
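
The following is a minimal sketch of a single forward pass through such a network; the 3-4-2 layer sizes and the tanh/softmax activations are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights and biases for one hidden layer and one output layer (toy sizes).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

def forward(x):
    """Information flows one way: input -> hidden -> output, with no feedback loops."""
    h = np.tanh(W1 @ x + b1)                     # nonlinear activation in the hidden layer
    logits = W2 @ h + b2
    return np.exp(logits) / np.exp(logits).sum() # softmax output

print(forward(np.array([0.5, -1.0, 2.0])))
```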

Feedforward networks

Types of Feedforward Networks

Perceptrons: Processing units resembling threshold-logic units, used in early models that contain only “forward” connections.
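
A minimal perceptron (threshold-logic unit) sketch: a weighted sum followed by a hard threshold. The weights and bias here are hand-picked to implement a logical AND, purely for illustration.

```python
import numpy as np

w = np.array([1.0, 1.0])   # hand-picked weights (illustrative)
b = -1.5                   # threshold

def perceptron(x):
    return 1 if w @ x + b > 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x)))
```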

Multilayer Perceptrons (MLPs): Multilayer nonlinear networks with “hidden” units. One early MLP training method added a linear layer running directly from the network input to the output. Historically, training deep feedforward neural networks (including deep MLPs) has been difficult.

Deep convolutional neural networks have made considerable advances in image categorization. Because they compose many nonlinearities, deep multi-layer neural networks can compactly express highly nonlinear and variable functions.

Convolutional Neural Networks (CNNs): A bottom-up approach that stacks numerous layers of convolutions, non-linearities, and sub-sampling. The layers form an encoder-style feature hierarchy.
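
A minimal sketch of the convolution → non-linearity → sub-sampling pattern, written with PyTorch; the channel counts, 32x32 input size, and the 10-class output head are illustrative assumptions.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution
    nn.ReLU(),                                   # non-linearity
    nn.MaxPool2d(2),                             # sub-sampling
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # classifier head for 32x32 inputs
)

print(cnn(torch.randn(1, 3, 32, 32)).shape)      # torch.Size([1, 10])
```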

Autoencoders: Models built from an encoder and a decoder. The encoder converts the input into a hidden representation, and the decoder maps it back. They serve as building blocks for deep networks. Denoising and contractive autoencoders use learning rules related to score matching.
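
A minimal denoising-autoencoder sketch in PyTorch: the encoder maps a corrupted input to a hidden code, and the decoder reconstructs the clean input from it. The layer sizes and noise level are assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(16, 784)                  # a batch of inputs in [0, 1]
noisy = x + 0.3 * torch.randn_like(x)    # partial corruption of the input

recon = decoder(encoder(noisy))
loss = nn.functional.mse_loss(recon, x)  # reconstruct the *clean* input
loss.backward()
```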

Radial Basis Function (RBF) Networks: Layered adaptive networks with input, hidden, and output layers. Learning in these networks amounts to solving a set of linear equations, which provides a guaranteed learning rule. Hidden-layer nodes, whose output is usually a nonlinear function of the input, are connected to the output layer through weighted sums. Traditional multi-layer perceptrons with scalar-product fan-in may need two hidden adaptive layers for problems without simply connected decision regions, whereas this architecture can resolve disjoint regions of the decision space with a single hidden layer.
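
A minimal RBF-network sketch: Gaussian hidden units with fixed centers, and output weights obtained by solving a linear least-squares problem (the “learning as solving linear equations” view). The centers, width, and target function are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))        # training inputs
y = np.sin(X[:, 0])                          # toy target function

centers = np.linspace(-3, 3, 10)             # fixed hidden-unit centers
width = 0.5

def hidden(X):
    # Radial basis activations: exp(-||x - c||^2 / (2 * width^2))
    d = X - centers[None, :]
    return np.exp(-(d ** 2) / (2 * width ** 2))

H = hidden(X)
W, *_ = np.linalg.lstsq(H, y, rcond=None)    # output weights via a linear solve

print(np.abs(hidden(np.array([[1.0]])) @ W - np.sin(1.0)))
```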

Deep Belief Networks (DBNs): These use a Restricted Boltzmann Machine (RBM) as the top layer (an undirected graphical model), while the lower layers are trained greedily, layer by layer, as feature extractors in a feedforward structure.

In feedforward networks, training consists of modifying the connection strengths (weights) using methods such as backpropagation. This approach usually minimizes an error function, such as the difference between the network’s output and a target output for a given input (supervised learning). Other objective functions include reconstructing the data or modeling the input distribution (unsupervised learning).
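
A minimal supervised training-loop sketch: backpropagation adjusts the weights to minimize an error between the network output and a target. The model size, SGD learning rate, and toy target are assumptions made for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

X = torch.randn(64, 3)
target = X.sum(dim=1, keepdim=True)          # toy supervised target

for step in range(100):
    optimizer.zero_grad()
    error = loss_fn(model(X), target)        # error between output and target
    error.backward()                         # backpropagation of the error
    optimizer.step()                         # weight (connection-strength) update
```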

Advantages and Capabilities

Deep architectures, a form of feedforward network, are more economical (possibly exponentially so) than shallow ones in the number of computational elements needed to represent certain functions. By composing many nonlinearities, they can compactly express highly nonlinear and variable functions.

Feature Extraction: Deep networks seamlessly combine low-, mid-, and high-level features across multiple layers, and can learn rich feature sets and higher-order image structures.

Applications: Feedforward networks excel at tasks such as image classification, localization, semantic segmentation, and object recognition.

Generalization: A network’s ability to produce suitable outputs for inputs outside the training set, similar to interpolation between known data points.

Robustness: Denoising autoencoders learn representations that are resilient to partial input destruction.

Spatial Invariance/Attention: Spatial Transformers actively spatially transform feature maps within convolutional networks, enabling the network to learn invariance to translation, scaling, and rotation. Using multiple spatial transformers in parallel yields a form of differentiable attention.
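
A minimal sketch of an active spatial transformation of a feature map, using PyTorch’s affine_grid / grid_sample. The fixed rotation angle stands in for the output of a learned localization network, which is omitted here.

```python
import math
import torch
import torch.nn.functional as F

feat = torch.randn(1, 16, 32, 32)             # a batch of feature maps

angle = math.pi / 6                           # assume the localization net predicted 30 degrees
theta = torch.tensor([[[math.cos(angle), -math.sin(angle), 0.0],
                       [math.sin(angle),  math.cos(angle), 0.0]]])

grid = F.affine_grid(theta, feat.size(), align_corners=False)  # sampling grid
warped = F.grid_sample(feat, grid, align_corners=False)        # differentiable resampling
print(warped.shape)                           # torch.Size([1, 16, 32, 32])
```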

Challenges in Deep Feedforward Networks

Though theoretically advantageous, deep multi-layer neural networks have been challenging to train.

Optimization Problem: With random initialization, standard gradient-based optimization approaches such as Stochastic Gradient Descent (SGD) can get stuck in unsatisfactory local minima.

Degradation Problem: Adding layers to “plain” networks without adjustments can increase both training and test error, highlighting optimization difficulties that hinder the performance of deeper models. This issue has been observed on ImageNet and CIFAR-10.

Even deep plain nets trained with Batch Normalization can exhibit degradation. Related gradient-flow issues also make learning long-term dependencies difficult, both in recurrent networks and in very deep feedforward networks.

Choosing an architecture (number of layers, units, configuration) and training settings for a given task remains largely empirical.

Solutions for Issues

To simplify deep feedforward network training, several methods have been developed:

Greedy Layer-Wise Pre-training: Train one layer at a time in an unsupervised fashion, using Restricted Boltzmann Machines or autoencoders, and use the learned parameters to initialize the deep network; this leads to better local minima and better generalization. Initializing the weights near a good local minimum helps optimization.
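
A minimal sketch of greedy layer-wise pre-training: each layer is trained to reconstruct its own input, then its weights initialize the deep network. The layer sizes, the number of steps, and the use of plain autoencoders (rather than RBMs) are assumptions.

```python
import torch
import torch.nn as nn

sizes = [784, 256, 64]
data = torch.rand(128, sizes[0])
pretrained = []

inputs = data
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(50):                                   # unsupervised training of this layer only
        opt.zero_grad()
        recon = dec(torch.relu(enc(inputs)))
        loss = nn.functional.mse_loss(recon, inputs)
        loss.backward()
        opt.step()
    pretrained.append(enc)
    inputs = torch.relu(enc(inputs)).detach()             # feed the learned code to the next layer

# Stack the pre-trained encoders to initialize the deep feedforward network.
deep_net = nn.Sequential(pretrained[0], nn.ReLU(), pretrained[1], nn.ReLU(), nn.Linear(64, 10))
```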

Residual Learning Framework (ResNets): Reformulating layers as learning residual functions, with shortcut connections that perform identity mapping, makes it possible to train networks that are much deeper than before. The identity shortcuts add no extra parameters or computational complexity.
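
A minimal residual-block sketch in PyTorch: the block learns a residual function and the identity shortcut adds the input back, introducing no extra parameters. The channel count and 3x3 convolutions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut: no extra parameters

x = torch.randn(1, 64, 16, 16)
print(ResidualBlock()(x).shape)     # torch.Size([1, 64, 16, 16])
```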

Batch Normalization (BN): Accelerates deep network training by reducing internal covariate shift. Networks trained with BN show healthy backward-propagated gradient norms, indicating that vanishing/exploding gradients are not the main cause of the degradation seen in deep plain nets.
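
A minimal sketch of Batch Normalization placed between layers: it normalizes each feature over the mini-batch, stabilizing the distribution of layer inputs during training. The layer sizes are assumptions.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(20, 50),
    nn.BatchNorm1d(50),   # normalize pre-activations over the batch
    nn.ReLU(),
    nn.Linear(50, 10),
)

x = torch.randn(32, 20)
out = net(x)              # in training mode, each of the 50 features is normalized per batch
print(out.shape)          # torch.Size([32, 10])
```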

Partially Supervised Layer-Wise Training: Combines an unsupervised and a supervised objective for each layer, improving prediction when the structure of the input distribution is not informative enough for purely unsupervised pre-training.
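
A minimal sketch of a combined per-layer objective: an unsupervised reconstruction term plus a supervised prediction term, mixed by a weight. The layer sizes, heads, and the mixing coefficient alpha are assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(20, 8)
decoder = nn.Linear(8, 20)      # unsupervised head: reconstruct the input
classifier = nn.Linear(8, 3)    # supervised head: predict the label

x = torch.randn(64, 20)
y = torch.randint(0, 3, (64,))

h = torch.relu(encoder(x))
unsup_loss = nn.functional.mse_loss(decoder(h), x)
sup_loss = nn.functional.cross_entropy(classifier(h), y)

alpha = 0.5                     # mixing coefficient (illustrative)
(alpha * unsup_loss + (1 - alpha) * sup_loss).backward()
```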

Unlike feedforward networks, recurrent neural networks (RNNs) can learn and perform complex transformations of data over time by maintaining a dynamic state that depends on both the input and the current state. Feedback connections give RNNs memory.
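
A minimal sketch of the recurrent state update that distinguishes RNNs from feedforward nets: the hidden state depends on both the current input and the previous state. The input and hidden sizes are assumptions.

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=10, hidden_size=20)

h = torch.zeros(1, 20)           # initial state
for t in range(5):               # process a sequence step by step
    x_t = torch.randn(1, 10)
    h = rnn_cell(x_t, h)         # feedback: the new state depends on the old state
print(h.shape)                   # torch.Size([1, 20])
```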
