
What is Hierarchical Composition? And Principles of Hierarchical Composition

What is Hierarchical Composition?

Hierarchical composition is a concept in machine learning and cognitive science in which complex information or ideas are broken down into groups of smaller, lower-level elements arranged in several layers or tiers. This structure builds abstract patterns on top of more fundamental properties: in image recognition, edges combine into shapes, which in turn combine into objects; in language, letters combine into words, which in turn form sentences. By capturing relationships and dependencies at several levels of abstraction, hierarchical composition supports efficient learning, generalisation, and interpretability. Deep learning architectures such as recursive and convolutional neural networks (CNNs) are built on this principle.

Goals and ideas of hierarchical composition

Learning hierarchical features: In a hierarchical composition, each layer combines information from the layers below it to build more complex features at a broader scale. In image processing, a lower layer may detect simple edges, a middle layer may join edges into corners or curves, and a higher layer may recognise object parts or whole objects.
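As a concrete, hand-crafted illustration of this idea (not taken from any particular model), the sketch below composes two simple edge detectors into a crude corner detector; the toy image and Sobel-style filters are assumptions chosen for clarity.

# A minimal sketch of feature composition: hand-crafted edge filters are
# applied to a toy image, and their responses are combined into a crude
# corner detector one level up the hierarchy.
import numpy as np
from scipy.signal import convolve2d

# Toy image: a bright square on a dark background.
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0

# Level 1: simple edge detectors (horizontal and vertical gradients).
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T
edges_x = np.abs(convolve2d(image, sobel_x, mode="same"))
edges_y = np.abs(convolve2d(image, sobel_y, mode="same"))

# Level 2: a "corner" response fires where horizontal and vertical
# edge evidence co-occur, i.e. a composition of level-1 features.
corners = edges_x * edges_y

# The four corners of the square give the strongest level-2 responses.
print("strongest corner responses at:",
      np.unravel_index(np.argsort(corners.ravel())[-4:], corners.shape))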

Addressing optimisation challenges: When initialised randomly, gradient-based optimisation often gets stuck in unsatisfactory solutions, which historically made deep multi-layer neural networks hard to train. Hierarchical composition, especially greedy layer-wise unsupervised pre-training, initialises the network weights near a good local minimum, simplifying optimisation and improving generalisation.
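A minimal sketch of this workflow, assuming PyTorch and small auto-encoders as the per-layer modules (the layer widths and random data are placeholders): each layer is pre-trained unsupervised on the output of the layer below, then the stack is fine-tuned with a supervised head.

import torch
import torch.nn as nn

sizes = [784, 256, 64]             # hypothetical layer widths
encoders = []
data = torch.rand(128, sizes[0])   # stand-in for unlabelled training data

x = data
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(100):                     # pre-train this layer only
        opt.zero_grad()
        recon = dec(enc(x))
        loss = nn.functional.mse_loss(recon, x)
        loss.backward()
        opt.step()
    encoders.append(enc)
    x = enc(x).detach()                      # feed codes to the next layer

# Stack the pre-trained encoders and fine-tune with a supervised output layer.
model = nn.Sequential(*encoders, nn.Linear(sizes[-1], 10))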

Efficiency and generalisation: Deep architectures can represent complex functions more compactly than shallow ones, requiring fewer computational elements and fewer training examples. This structured learning also improves generalisation to unseen data.

Disentanglement: A fundamental goal for learnt representations is “disentanglement”, where each real-world factor of variation (such as pose or lighting) affects only a small, dedicated part of the representation. Hierarchical composition helps separate these factors of variation, making representations more interpretable and easier to manipulate.

Implementations in Various Models

Different approaches enable hierarchical composition in deep learning architectures:

Sparse Auto-Encoders (SAEs) and Sparse Decomposition: SAEs enable unsupervised learning of hierarchical image representations. The encoder maps the input to a latent (hidden) feature space, while the decoder reconstructs the input from these features.

A sparsity constraint is applied to the latent feature maps, promoting a parsimonious representation at each level of the hierarchy. This is essential for preventing trivial solutions and for letting features naturally compose into more complex structures. The models are trained in an unsupervised, greedy, layer-wise fashion, with each layer learning from the previous layer’s output.
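A minimal sketch, assuming PyTorch, of how such a sparsity constraint can be expressed: an L1 penalty on the latent code is added to the reconstruction loss, encouraging most latent units to stay near zero. The layer sizes and penalty weight are illustrative assumptions.

import torch
import torch.nn as nn

class SparseAutoEncoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, sparsity_weight=1e-3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.decoder = nn.Linear(d_hidden, d_in)
        self.sparsity_weight = sparsity_weight

    def loss(self, x):
        z = self.encoder(x)
        recon = self.decoder(z)
        # Reconstruction term plus a sparsity penalty on the latent code.
        return nn.functional.mse_loss(recon, x) + self.sparsity_weight * z.abs().mean()

sae = SparseAutoEncoder()
x = torch.rand(32, 784)            # stand-in for a batch of image patches
print(sae.loss(x).item())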

The Deconvolutional Networks (DNs) framework implements unsupervised sparse decomposition for hierarchical image representations. Unlike sparse auto-encoders, DNs have no encoder: the feature map activations are computed exactly by direct optimisation. DNs learn hierarchical filters for mid-level visual concepts such as edge junctions, parallel lines, and curves. Performing the sparse decomposition over the entire image, rather than on isolated patches, is essential for learning rich features.
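A minimal sketch, assuming PyTorch, of this encoder-free inference step: the latent feature maps are treated as free variables and optimised directly so that fixed filters reconstruct the image under an L1 penalty. The random filters and toy image stand in for a trained model.

import torch
import torch.nn.functional as F

image = torch.rand(1, 1, 32, 32)                   # toy single-channel image
filters = torch.randn(8, 1, 7, 7) * 0.1            # 8 fixed 7x7 filters (assumed learnt)
z = torch.zeros(1, 8, 32, 32, requires_grad=True)  # feature maps to infer

opt = torch.optim.Adam([z], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    recon = F.conv_transpose2d(z, filters, padding=3)   # "deconvolution" to image space
    loss = F.mse_loss(recon, image) + 1e-2 * z.abs().mean()
    loss.backward()
    opt.step()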

Deep Belief Networks (DBNs): Generative models with layers of hidden causal factors. Restricted Boltzmann Machines (RBMs) are used to greedily train each layer in an unsupervised way.

Lower layers in a DBN extract “low-level features” while higher layers represent “abstract” concepts that explain the input: learning starts with elementary notions and progresses to abstract ones. After learning these representations, the network can be fine-tuned against a supervised training criterion, which improves generalisation.
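A minimal sketch, assuming PyTorch, of one contrastive-divergence (CD-1) update for a binary RBM, the building block used to pre-train DBN layers greedily; the sizes, learning rate, and random data are illustrative assumptions.

import torch

n_visible, n_hidden, lr = 784, 128, 0.01
W = torch.randn(n_visible, n_hidden) * 0.01
b_v = torch.zeros(n_visible)
b_h = torch.zeros(n_hidden)

v0 = torch.bernoulli(torch.rand(64, n_visible))     # stand-in binary data batch
p_h0 = torch.sigmoid(v0 @ W + b_h)                  # positive phase
h0 = torch.bernoulli(p_h0)
p_v1 = torch.sigmoid(h0 @ W.t() + b_v)              # one Gibbs step back to visibles
v1 = torch.bernoulli(p_v1)
p_h1 = torch.sigmoid(v1 @ W + b_h)                  # negative phase

# CD-1 gradient estimate and parameter update.
W += lr * (v0.t() @ p_h0 - v1.t() @ p_h1) / v0.size(0)
b_v += lr * (v0 - v1).mean(0)
b_h += lr * (p_h0 - p_h1).mean(0)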

Convolutional Neural Networks (CNNs): Multiple layers of convolutions, non-linearities, and sub-sampling (pooling) enable hierarchical composition.
Pooling layers, such as max-pooling, provide spatial invariance to the positions of features; however, this invariance is typically built up gradually over a deep hierarchy, and intermediate feature maps may not be invariant to large transformations.
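A minimal sketch, assuming PyTorch, of the convolution, non-linearity, and pooling pattern: each block composes lower-level features and halves the spatial resolution, so later layers respond to larger parts of the image. The channel counts and input size are illustrative.

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # edges
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # corners/curves
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # object parts
)

x = torch.rand(1, 3, 64, 64)
for i, layer in enumerate(cnn):
    x = layer(x)
    if isinstance(layer, nn.MaxPool2d):
        print(f"after block {i // 3 + 1}: feature map shape {tuple(x.shape)}")
# Spatial resolution shrinks 64 -> 32 -> 16 -> 8 while channel depth grows.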

Very deep networks, such as ResNets, push network depth further by reformulating layers to learn “residual functions” with respect to their inputs, which simplifies training and enables more complex feature learning.
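A minimal sketch, assuming PyTorch, of a residual block: the stacked layers learn F(x) and the block outputs F(x) + x, so they only have to model the residual relative to the identity mapping. The exact layer arrangement here is a simplified illustration.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)   # skip connection adds the input back

block = ResidualBlock(32)
print(block(torch.rand(1, 32, 16, 16)).shape)  # shape is preserved: (1, 32, 16, 16)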

Deep Convolutional Inverse Graphics Network (DC-IGN): The DC-IGN model learns an interpretable image representation that is disentangled with respect to transformations such as rotations and lighting variations. The Stochastic Gradient Variational Bayes (SGVB) technique is used to train its many layers of convolution and de-convolution operators.

The encoder extracts the scene’s latent variables (the “graphics code”), while the decoder reconstructs the image. By changing specific groups of neurons in the graphics-code layer, the model can re-render an image under a different pose or lighting, demonstrating that the learnt representation is disentangled and interpretable.
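A minimal sketch, assuming PyTorch and an untrained stand-in decoder, of how such a disentangled code is used: one designated latent unit (here labelled “azimuth”, a hypothetical layout) is swept while the rest of the code is held fixed, and each modified code is decoded into an image.

import torch
import torch.nn as nn

latent_dim, azimuth_index = 16, 0            # hypothetical code layout
decoder = nn.Sequential(                     # stand-in for a trained decoder
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 32 * 32), nn.Sigmoid(),
)

code = torch.zeros(1, latent_dim)            # graphics code from the encoder
renders = []
for value in torch.linspace(-2.0, 2.0, steps=5):
    modified = code.clone()
    modified[0, azimuth_index] = value        # sweep only the pose unit
    renders.append(decoder(modified).view(32, 32))

print(len(renders), renders[0].shape)         # 5 re-rendered 32x32 images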

Recurrent Neural Networks (RNNs): RNNs process variable-length sequences over a series of time steps, implicitly representing time and compositional structure.

By carrying information forward in their hidden state, they build context-dependent internal representations that reflect the demands of the task and generalise across classes of items.

In language processing, RNNs can discover hierarchical category structure among words, where proximity in the representation space indicates similarity of properties and higher-level categories correspond to larger regions.
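A minimal sketch, assuming PyTorch, of an RNN consuming variable-length token sequences: the same cell is applied at every step, and the hidden state accumulates context from all earlier steps. The vocabulary size and example sequences are placeholders.

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 30, 16, 32
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)

sequences = [torch.tensor([3, 7, 1]),             # sequences of different lengths
             torch.tensor([5, 2, 9, 4, 8])]
for seq in sequences:
    outputs, h_n = rnn(embed(seq).unsqueeze(0))   # one recurrent step per token
    print(f"length {len(seq)} -> final hidden state shape {tuple(h_n.shape)}")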

Neural Turing Machines (NTMs): NTMs use a neural network controller and external memory bank to run simple algorithms.
They learn to use content-based and location-based addressing mechanisms to read from and write to memory, enabling learned algorithms such as copying and sorting to generalise to longer sequences than those seen in training. The process implicitly composes subroutines out of simpler operations.
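A minimal sketch, assuming PyTorch, of NTM-style content-based addressing: a key emitted by the controller is compared with every memory row by cosine similarity, a softmax turns the similarities into read weights, and the read vector is the weighted sum of memory rows. The memory size, key, and key strength are illustrative.

import torch
import torch.nn.functional as F

memory = torch.rand(128, 20)          # 128 memory slots of width 20
key = torch.rand(20)                  # key produced by the controller
beta = 5.0                            # key strength (sharpens the focus)

similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)   # (128,)
weights = torch.softmax(beta * similarity, dim=0)                   # read weights
read_vector = weights @ memory                                      # (20,)
print(read_vector.shape, weights.argmax().item())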

Radial Basis Function (RBF) Networks: Layered feed-forward models for multi-variable functional interpolation. A single hidden layer of radial basis function units, each centred on a point in input space, lets the network learn non-linear mappings. Although only a single hidden layer is used here, fitting complicated surfaces to data with these functions still amounts to a layered mapping from input to output.
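A minimal sketch, assuming NumPy, of an RBF network for one-dimensional interpolation: Gaussian units centred on the training inputs form the hidden layer, and the output weights are obtained by solving a linear system. The sine target and kernel width are illustrative choices.

import numpy as np

def gaussian_design(x, centres, width=0.5):
    # Hidden-layer activations: one Gaussian bump per centre.
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))

x_train = np.linspace(0, 2 * np.pi, 10)
y_train = np.sin(x_train)

H = gaussian_design(x_train, x_train)        # centres placed at the data points
weights = np.linalg.solve(H, y_train)        # output-layer weights

x_test = np.linspace(0, 2 * np.pi, 50)
y_pred = gaussian_design(x_test, x_train) @ weights
print("max interpolation error:", np.max(np.abs(y_pred - np.sin(x_test))))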

The Harmony Theory framework for information processing in dynamical systems views knowledge as a hierarchy of “knowledge atoms” and “representational features”.

The system tries to bring together lower-level perceptual processing and higher-level cognitive processes by integrating information across abstraction levels.

The proposed network structure connects lower-level representation nodes to higher-level knowledge atoms, creating a multi-layered hierarchy for complicated cognitive processes.

Key Challenges and Principles of hierarchical composition

Robustness to perturbation: Good representations can withstand partial destruction of the input or small, irrelevant modifications to it. Denoising autoencoders learn this explicitly by reconstructing the clean input from a corrupted version, encouraging the model to capture more meaningful and stable structure.
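A minimal sketch, assuming PyTorch, of one denoising auto-encoder training step: the input is corrupted by randomly zeroing entries (masking noise), but the reconstruction loss is measured against the original, clean input. The sizes and corruption rate are illustrative.

import torch
import torch.nn as nn

d_in, d_hidden, corruption = 784, 128, 0.3
model = nn.Sequential(
    nn.Linear(d_in, d_hidden), nn.ReLU(),     # encoder
    nn.Linear(d_hidden, d_in), nn.Sigmoid(),  # decoder
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(64, d_in)                             # stand-in data batch
mask = (torch.rand_like(clean) > corruption).float()
corrupted = clean * mask                                 # masking noise

opt.zero_grad()
loss = nn.functional.mse_loss(model(corrupted), clean)   # target is the clean input
loss.backward()
opt.step()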

Information flow: The architecture of models such as LSTMs manages the flow of error signals over long time intervals, preventing vanishing or exploding gradients. Multiplicative gate units learn to open and close access to “constant error carousels” (CECs) inside memory cells, storing and protecting information over long periods and thereby helping the model learn long-range dependencies in sequential data.
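A minimal sketch, assuming PyTorch, of one LSTM cell step written out so the gates are visible: the forget and input gates decide what the cell state (the protected memory path) keeps, and the output gate decides what is exposed to the rest of the network. The dimensions are placeholders.

import torch
import torch.nn as nn

input_dim, hidden_dim = 8, 16
W = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)    # all four gates in one matrix

def lstm_step(x_t, h_prev, c_prev):
    z = W(torch.cat([x_t, h_prev], dim=1))
    i, f, g, o = z.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)   # gates in [0, 1]
    g = torch.tanh(g)                                                # candidate update
    c_t = f * c_prev + i * g        # gated cell state: the protected memory path
    h_t = o * torch.tanh(c_t)       # gated output
    return h_t, c_t

h = c = torch.zeros(1, hidden_dim)
for t in range(5):                                   # unroll over a short sequence
    h, c = lstm_step(torch.rand(1, input_dim), h, c)
print(h.shape, c.shape)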

Generalisation beyond training: Hierarchical composition is most useful when it generalises beyond the training data, for example by learning concise internal programs or disentangled representations that capture the generative factors of the data.

Computational complexity: Deep networks are representationally efficient, but poorly conditioned cost functions and the large number of training iterations make them computationally difficult to train. Techniques such as Stochastic Gradient Variational Bayes (SGVB) and residual learning frameworks address these challenges and make end-to-end training practical.
