What is a Restricted Boltzmann machine?
A restricted Boltzmann machine (RBM) is an undirected graphical model containing latent variables. RBMs act as an alternative to directed graphical models with latent variables, and they also serve as the building blocks of Deep Belief Networks (DBNs), generative models with several layers of hidden causal variables.

Restricted Boltzmann machine Structure and Architecture
A visible layer (v) and a hidden layer (h) make up an RBM.
Restricted Boltzmann machines, in contrast to general Boltzmann machines, have a restricted connectivity pattern: no connections exist within the visible layer or within the hidden layer, only between visible and hidden units. This architectural restriction is essential because it makes inference simpler and more tractable.
The joint probability distribution of an RBM over its two layers (for example, g^(l-1) and g^l in a DBN) is defined by P(v,h) = (1/Z) * exp(h'Wv + b'v + c'h). Here Z is a normalisation constant (the partition function), b and c are the bias vectors of the visible and hidden units respectively, and W is the weight matrix connecting the two layers. The corresponding energy function is E(v,h) = -(h'Wv + b'v + c'h).
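To make the notation concrete, here is a minimal NumPy sketch of the energy function and the unnormalised joint probability; the sizes, random initialisation, and variable names are illustrative choices, not prescribed by the formula above. Computing Z itself would require summing over every joint configuration, which is exactly what makes it intractable.

```python
import numpy as np

# Illustrative sizes: 6 visible units, 4 hidden units (hypothetical choices).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 6))  # weight matrix connecting hidden and visible units
b = np.zeros(6)                         # visible biases
c = np.zeros(4)                         # hidden biases

def energy(v, h):
    """Energy function E(v, h) = -(h'Wv + b'v + c'h) for binary v and h."""
    return -(h @ W @ v + b @ v + c @ h)

def unnormalised_prob(v, h):
    """exp(-E(v, h)); dividing by the partition function Z would give P(v, h),
    but Z is a sum over all 2^(6+4) configurations here, hence intractable in general."""
    return np.exp(-energy(v, h))

v = rng.integers(0, 2, size=6)  # one binary visible configuration
h = rng.integers(0, 2, size=4)  # one binary hidden configuration
print(energy(v, h), unnormalised_prob(v, h))
```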
Learning and Training of Restricted Boltzmann machines
- The partition function (Z) and its gradient are intractable, which is the main obstacle to training restricted Boltzmann machines (and general Boltzmann machines): it makes the log-likelihood gradient impossible to compute directly.
- The Contrastive Divergence (CD) algorithm is commonly used to train RBMs in order to get around this.
- CD uses a modest number of Gibbs sampling steps (often k=1) to approximate the gradient of the log-likelihood.
- For an RBM, Gibbs sampling alternates between sampling the visible units given the hidden units and sampling the hidden units given the visible units. These sampling steps are simple because, with no intra-layer connections, the conditional distributions P(v|h) and Q(h|v) factorise; each unit follows a sigmoidal activation, e.g. P(v_k=1|h) = sigm(b_k + sum_j W_jk h_j) and Q(h_j=1|v) = sigm(c_j + sum_k W_jk v_k).
- Training seeks to minimise the contrastive divergence, which acts as a stand-in for maximising the log-likelihood; progress is frequently tracked with the reconstruction error (a minimal sketch of one CD-1 update appears after this list).
- As part of DBNs, RBMs are frequently trained in an unsupervised, greedy layer-wise fashion: each layer is pre-trained individually, learning useful representations of its input, and the entire network is then fine-tuned with a task-specific criterion. Empirically, this technique aids deep network optimisation by giving the top layers better initialisations.
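As a concrete illustration of the bullets above, here is a minimal NumPy sketch of a single CD-1 update for a binary RBM; the function name, learning rate, and batch conventions are illustrative choices rather than part of any standard API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 (k=1) update for a binary RBM.

    v0 : (batch, n_visible) binary data batch
    W  : (n_hidden, n_visible) weights; b : visible biases; c : hidden biases.
    Returns the updated parameters and the batch reconstruction error."""
    rng = rng or np.random.default_rng()
    batch = v0.shape[0]

    # Positive phase: Q(h=1|v0) = sigm(c + W v0), then sample h0.
    ph0 = sigmoid(v0 @ W.T + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step h0 -> v1 -> h1.
    pv1 = sigmoid(h0 @ W + b)            # P(v=1|h) = sigm(b + W'h)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W.T + c)

    # CD-1 approximation to the log-likelihood gradient.
    dW = (ph0.T @ v0 - ph1.T @ v1) / batch
    db = (v0 - v1).mean(axis=0)
    dc = (ph0 - ph1).mean(axis=0)

    W, b, c = W + lr * dW, b + lr * db, c + lr * dc
    recon_error = np.mean((v0 - pv1) ** 2)  # commonly monitored proxy for progress
    return W, b, c, recon_error
```

Using the hidden probabilities rather than binary samples in the gradient terms is a common variance-reduction choice; sampling them instead is equally valid in this sketch.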
Capabilities and Characteristics of Restricted Boltzmann machines
RBMs can learn abstract visual representations.
Their energy function can be adapted to handle continuous-valued inputs, for example by using Gaussian units in the input (visible) layer.
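One standard way to do this is the Gaussian-Bernoulli RBM: assuming unit-variance visible units, P(v|h) becomes a Gaussian centred at b + W'h while the hidden units stay binary. The sketch below shows one Gibbs step under that assumed parameterisation (in practice the inputs are usually standardised first); the function name and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step_gaussian_visible(v, W, b, c, rng=None):
    """One Gibbs step for a Gaussian-Bernoulli RBM with unit-variance visible units.

    The hidden conditional Q(h=1|v) keeps its sigmoidal form; the visible
    conditional becomes v ~ N(b + W'h, I) instead of a Bernoulli."""
    rng = rng or np.random.default_rng()
    ph = sigmoid(v @ W.T + c)                      # Q(h_j=1|v) = sigm(c_j + sum_k W_jk v_k)
    h = (rng.random(ph.shape) < ph).astype(float)  # sample binary hidden units
    v_new = b + h @ W + rng.normal(size=b.shape)   # sample continuous visible units
    return v_new, h
```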
The model’s ability to represent non-Gaussian data may be constrained if hidden layers are constructed with exclusively Gaussian units, as this can result in entirely linear mean-field propagation. It can be advantageous to combine Gaussian with other unit types.
RBM-based models are trained with a different criterion from methods like Auto-Encoding Variational Bayes (AEVB), which uses back-propagation with an objective function based on data reconstruction and a variational bound.
Despite RBMs’ strength, a major issue with their learning is that the Markov chain Monte Carlo (MCMC) techniques involved can mix poorly.
Comparison to Other Models
RBMs serve as the core modules for Deep Belief Networks (DBNs).
When used as building blocks for greedy layer-wise training of deep networks, auto-encoders, and in particular denoising auto-encoders, can produce results comparable to RBMs and follow similar learning rules.
Including an unsupervised component in each layer (such as RBM training) yields better performance than a fully supervised greedy layer-wise procedure.
General Boltzmann machines are a broader class of networks of symmetrically connected binary units that can use a stochastic decision rule to escape local minima. RBMs are a restricted version of these that makes inference easier.
Conclusion
Despite the computational difficulties caused by their intractable partition functions, RBMs are crucial components in the construction of deep learning architectures, especially for unsupervised pre-training, because they can learn meaningful hierarchical representations.