Kohonen Networks
Kohonen networks, introduced by Teuvo Kohonen as a novel self-organizing mechanism, are also referred to as Self-Organizing Maps (SOM) or Self-Organizing Feature Maps (SOFM). They are a class of neural networks employed mostly for unsupervised clustering and dimensionality reduction. By shifting the locations of cluster centers according to how closely they resemble training patterns, they allow data points to self-organize into clusters. Mapping high-dimensional data onto a lower-dimensional grid, usually two-dimensional, simplifies the interpretation and visualization of complicated datasets.

What are Kohonen Networks?
An unsupervised artificial neural network that groups data points into clusters is called a Kohonen Network. Its primary objective is to group observations into a “lattice” of “boxes” (clusters) so that observations in different clusters are as dissimilar as possible. A numerical measure of mapping error cannot be employed directly since, in contrast to traditional neural network models, the correct output is not determined a priori. Instead, the network settings for a particular application are determined by the learning process. Because observations in the same cluster are more similar to one another than to those in different clusters, Kohonen networks frequently produce results comparable to those of k-means clustering.

History of Kohonen Networks
Teuvo Kohonen introduced the idea of Kohonen Networks in 1982 and developed this kind of unsupervised learning network throughout the 1980s. Early neural network models, especially those pertaining to associative memory and adaptive learning, served as the foundation for the algorithm’s development. A major driving force behind its development was explaining the spatial organization of brain functions, particularly those seen in the cerebral cortex. Kohonen’s crucial innovation was a system model made up of two interdependent subsystems: a competitive neural network that carries out the “winner-take-all” function, and a plasticity-control subsystem that alters the local synaptic plasticity of neurons during learning. Speech recognition was the initial use case for SOMs; they have since gained popularity in data exploration and analysis.
How Do Kohonen Networks Work?
Kohonen networks use a competitive learning methodology. The network usually consists of an input layer and an output layer (the Kohonen layer) arranged as a 2D grid of neurons. A weight vector with the same dimensionality as the input data is associated with each neuron in the output layer.

These are the essential steps of the basic algorithm:
- Initialization: Usually, the neurons’ (or nodes’) weight vectors are initialized at random or by selecting samples from the distribution of input data.
- Training Iteration: The following steps are repeated for a predetermined number of iterations, or until convergence is achieved. For every input vector:
- Competition (Best Matching Unit (BMU) Search): The neuron whose weight vector is closest to the input vector is identified, typically using Euclidean distance as the distance measure. This neuron is known as the Best Matching Unit (BMU).
- Cooperation: Once the BMU is located, its neighboring neurons are identified. The neighborhood is specified by a neighborhood function (such as a Gaussian), which quantifies how strongly a neuron is considered a neighbor of the winning neuron. The value of this function diminishes with increasing distance from the BMU and is usually symmetrical around the winning neuron. The neighborhood radius usually starts large and shrinks over time.
- Adaptation (Weight Update): The weights of the BMU and its neighboring neurons are adjusted to move closer to the input vector. The degree of adjustment decreases over time (controlled by a learning rate) and with the grid distance from the BMU. A common weight update formula is W_v(s+1) = W_v(s) + θ(u, v, s) · α(s) · (D(t) − W_v(s)), where s is the iteration index, D(t) is the current input vector, α(s) is the learning rate, and θ(u, v, s) is the neighborhood function between the BMU u and neuron v.
- Convergence: When the weight vectors of the neurons stabilize, the result is a topologically ordered map that reflects the structure of the input data. This happens when the map no longer changes appreciably or when new inputs map to the same neurons as in the previous iteration.
After training is finished, new input data can be mapped by assigning each observation to the node whose weight vector is closest to it.
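As a concrete illustration of this loop, here is a minimal NumPy sketch, assuming a rectangular grid, a Gaussian neighborhood function, and simple linear decay of the learning rate and radius; the function names, default parameters, and decay schedules are illustrative assumptions rather than part of any standard implementation.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iters=1000,
              lr0=0.5, radius0=5.0, seed=0):
    """Train a rectangular SOM on `data` (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Initialization: one random weight vector per node of the grid.
    weights = rng.random((grid_h, grid_w, n_features))
    # Grid coordinates of every node, used for neighborhood distances.
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)
    for s in range(n_iters):
        frac = s / n_iters
        alpha = lr0 * (1.0 - frac)               # decaying learning rate
        radius = radius0 * (1.0 - frac) + 1e-3   # shrinking neighborhood
        x = data[rng.integers(len(data))]        # pick a training vector
        # Competition: the BMU is the node with the closest weight vector.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Cooperation: Gaussian neighborhood around the BMU on the grid.
        grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        theta = np.exp(-grid_dist2 / (2.0 * radius ** 2))
        # Adaptation: W_v(s+1) = W_v(s) + theta * alpha * (x - W_v(s)).
        weights += alpha * theta[..., None] * (x - weights)
    return weights

def map_sample(weights, x):
    """Mapping step: grid position of the node closest to observation x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

For example, `weights = train_som(X)` followed by `map_sample(weights, X[0])` returns the grid cell the first observation falls into; samples that land in the same or nearby cells can be treated as belonging to the same cluster.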
Kohonen Networks Architecture

A SOM is usually a single-layer neural network made up of two primary layers: the input layer and the output layer, sometimes referred to as the map layer or feature map.
- Input Layer: Represents the features of the data. The number of input nodes is determined by the dimensions of the input vector.
- Output Layer (Map Space): Consists of units called nodes or neurons, which are arranged in a typically two-dimensional grid. Common arrangements include hexagonal or rectangular grids. Each node in the map space is associated with a weight vector that has the same dimension as the input vectors, reflecting the node’s position in the input space. While the nodes within the map space remain fixed, their weight vectors move towards the input data during training. There are generally no interconnections among the computational nodes (output neurons), but intra-layer connections define a topology, and lateral feedback connections are used for competition, often described by a Mexican hat function.
Node numbers and configurations are predetermined according to the objectives of data analysis. If N is the number of data points, then sqrt(N) neurons can be used as a rough estimate for lattice size.
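As a small illustration of the sqrt(N) heuristic and of initializing node weights from samples of the input data (the alternative to random initialization mentioned earlier), here is a NumPy sketch; the rounding choices and function name are assumptions.

```python
import numpy as np

def init_map_from_data(data, seed=0):
    """Choose a roughly square lattice with about sqrt(N) nodes and
    initialize each node's weight vector with a random data sample."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    n_nodes = max(4, int(round(np.sqrt(n_samples))))   # ~sqrt(N) nodes
    grid_h = int(np.floor(np.sqrt(n_nodes)))
    grid_w = int(np.ceil(n_nodes / grid_h))
    idx = rng.integers(0, n_samples, size=(grid_h, grid_w))
    return data[idx].copy()   # shape: (grid_h, grid_w, n_features)
```

With 900 observations, for instance, this yields about 30 nodes arranged as a 5 x 6 grid.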
Characteristics and Features
Key characteristics of Kohonen Networks include:
Unsupervised Learning: Without clear output labels or predetermined right responses, they discover patterns and structures in data.
Self-Organization: Data points “self-organize” into clusters, adjusting their positions iteratively.
Competitive Learning: Neurons compete to be the “winner” (BMU) for each input pattern.
Cooperation: Cooperative adjustments result from the winning neuron’s influence on a topological neighborhood of other neurons.
Synaptic Adaptation: Excited neurons adjust their synaptic weights to become more responsive to similar input patterns.
Topology Preservation: The network maintains the distance and proximity relations (topology) of the samples by defining an ordered mapping from a high-dimensional input space to a low-dimensional destination space, which is typically 2D. This means that samples with similar input vectors are mapped to adjacent locations on the Kohonen layer.
Non-linear Relationships: They are able to take into account relationships that are not linear in the data.
No Prior Cluster Information: They don’t need to know how many clusters there are in the data beforehand.
Robust Clustering: The learning method performs well in clustering.
Kohonen Networks Types
Several extensions and modifications have been proposed, especially to enable supervised or semi-supervised learning, even though the terms “Kohonen Network” and “Self-Organizing Map (SOM)” are frequently used interchangeably to refer to the fundamental unsupervised model:
Self-Organizing Map (SOM): This is the foundational architecture, typically a 2D array of neurons fully connected to the input layer, designed for unsupervised nonlinear mapping that preserves topology.
Supervised Kohonen Networks (SKNs): These networks apply the SOM concept to problems, such as regression, that involve a dependent vector. During training, the input map (Xmap) and the output map (Ymap) are merged into a combined XYmap. Concatenating the input and desired output vectors creates the training data, which enables the learning process to use information from both input and output variables in a fully supervised way (see the sketch after this list).
Counterpropagation Neural Networks (CP-NNs): Considered one of the first Kohonen SOM-based architectures used for supervised learning, CP-NNs combine features from both supervised and unsupervised learning. They consist of an input layer (identical to an unsupervised Kohonen SOM) and an output (Grossberg) layer. The input map is built in an unsupervised stage, and then a supervised stage uses the sample coordinates in this low-dimensional space to predict an output. This makes them semi-supervised.
X-Y Fused Networks (XYFs): XYFs employ a “fused” similarity measure to choose the winning neuron, resolving scaling problems that SKNs have. The fused similarity is the weighted sum of the similarity between the input vector and the Xmap neurons and the similarity between the output vector and the Ymap neurons. X-space similarity is given priority early on, and the X and Y terms are then balanced as the weighting parameter changes over time (see the sketch after this list).
Bidirectional Kohonen Networks (BDKs): Like XYFs, BDKs use a similarity measure that incorporates both input (Xmap) and output (Ymap) data. However, during training the input and output layers are updated alternately: in each epoch, Xmap neurons are adapted based on a similarity initially dominated by the dependent vector in one pass, and Ymap weights are updated based on a similarity initially dominated by the input vector in a second, reverse pass.
Additional expansions that have been suggested include adaptive array structures (often known as “growing maps”), information-based SOM models, generative topographic maps, and adjustments for examining dynamic patterns or subspace relations.
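To make the SKN and XYF mechanics above concrete, here is a brief sketch of the two core computations: concatenating input and output blocks for fully supervised training, and selecting a winner from a weighted, fused similarity. The function names, the Euclidean distance choice, and the weighting parameter `beta` are illustrative assumptions, not a fixed specification.

```python
import numpy as np

def skn_training_matrix(X, Y):
    """SKN-style training data: concatenate the input block (Xmap side)
    and the output block (Ymap side) into one combined XY matrix."""
    return np.hstack([X, Y])

def xyf_fused_bmu(x, y, x_weights, y_weights, beta):
    """XYF-style winner selection: a weighted sum of X-space and Y-space
    distances; beta typically starts near 1 (X dominated) and decreases."""
    dx = np.linalg.norm(x_weights - x, axis=-1)   # distance in X space
    dy = np.linalg.norm(y_weights - y, axis=-1)   # distance in Y space
    fused = beta * dx + (1.0 - beta) * dy
    return np.unravel_index(np.argmin(fused), fused.shape)
```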
Advantages
Kohonen Networks provide a number of advantages:
Effective for Clustering and Dimensionality Reduction: They are widely used for these tasks, simplifying complex datasets.
Visualization: They provide an efficient representation of the data, making it possible to see and investigate hidden correlations and patterns in high-dimensional data.
Interpretability: Compared to some other neural network architectures, Kohonen networks are easier to interpret thanks to their weight maps (which display variable distributions) and top-map (the 2D outcome graph), which help explain clustering results and variable contributions.
Non-linear Modelling: They can accommodate non-linear relationships in the data.
Robust Performance: Their learning approach helps to provide stable clustering results.
No Predefined Cluster Count: The basic SOM does not require an a priori knowledge of the number of clusters.
Rapid Convergence: Architectures based on supervised Kohonen are renowned for their quick convergence.
Handling High Dimensions: Unlike some other models, such as Multi-Layer Feedforward Networks, they are able to use input and output vectors with greater dimensions.
Generalization: In CP-NNs, even when an unknown sample maps to a neuron that was not directly excited during training, the network can still provide an output because a window function is used during learning.
Statistical Property Maintenance: To improve the representational accuracy of the data points, variants such as KNIES can explicitly incorporate and preserve global statistical features (such as the mean).
Improved Organization: Compared to more straightforward “Winner Takes All” approaches, they produce a more unified network with neurons that more accurately reflect the distribution of input data and a greater convergence rate.
Disadvantages and Challenges
Despite their advantages, Kohonen Networks have certain limitations:
Computational Cost: Due to the substantial computational work needed to construct the models, they are typically more computationally “expensive” than more straightforward techniques like k-nearest neighbor algorithms. This might render them impracticable for very big datasets.
Scalability Issues: Compared to more scalable techniques like k-nearest neighbor or EM algorithm clustering, the additional processing effort required to fit Kohonen networks is frequently not justified by superior insights for very big datasets (such as hundreds of thousands of observations).
Representational Ability (Supervised Variants): Because the number of neurons in the 2D layer directly limits the number of dependent vectors that can be modelled, supervised Kohonen-based architectures have limited representational capacity; very large networks are therefore needed to solve complex problems.
Variable Scaling: To maintain the topology of the concatenated map, it is essential for SKNs to make sure that the input and output variables are scaled appropriately. The dependent vector’s contribution to the overall topology may be insignificant if variables are not properly “block-scaled,” particularly when there are a large number of input variables and several output variables.
Mathematical Complexity: The mathematical theory underlying SOMs is intricate; only the one-dimensional case has been thoroughly analyzed, and SOMs are regarded as mathematically “ill-posed” problems.
Initialization Sensitivity: Variations in initial weight values, training vector sequences, and learning parameters can lead to distinct learning processes. Careful initialization and parameter adjustment are necessary to produce an “optimal” map; this sometimes entails experimenting with several random initializations and choosing the one with the lowest quantization error.
“Empty Units”: The output map of semi-supervised CP-NNs may contain “empty units” if a neuron is never activated during the training phase.
Risk of Mis-linking: Neurons may link with specific values before groups are appropriately identified, which could necessitate repeating the learning process with alternative starting weights.
Practical Considerations for Application
Form of the Array: For visual inspection, a hexagonal grid is usually used. The width and height of the array should closely match the principal components or other major dimensions of the input data distribution.
Scaling of Vector Components: This is a subtle yet critical issue. There is no straightforward guideline for the best rescaling before training because data elements frequently represent variables of different kinds and scales. One helpful tactic is to normalize all of the input variables, e.g., scaling them to a common range or ensuring that their variances are identical (see the sketch after this list).
Training with Limited Samples: Samples can be employed repeatedly (e.g., cyclically or randomly permuted) if there are few accessible training samples but many training steps are required for statistical accuracy.
Quality of Learning: Several runs with different random weight initializations are recommended in order to obtain the “least ambiguous” or “optimal” map. The map with the lowest quantization error (the mean of ||x − m_c|| over the training data, where m_c is the weight vector of the BMU for x) can then be chosen as a performance index.
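A short sketch of the two recommendations above: autoscaling the input variables and scoring a trained map by its mean quantization error. The z-score choice and the function names are assumptions used for illustration.

```python
import numpy as np

def autoscale(X):
    """Scale each input variable to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant columns
    return (X - mu) / sigma

def quantization_error(weights, X):
    """Mean distance ||x - m_c|| between each sample and its BMU weight,
    where `weights` has shape (grid_h, grid_w, n_features)."""
    flat = weights.reshape(-1, weights.shape[-1])
    # Distance from every sample to every node, then keep the closest node.
    d = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

Among several maps trained from different random initializations, the one with the smallest `quantization_error` value would then be retained.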
Applications
Applications for Kohonen Networks are numerous and include:
Data Mining and Analysis: For complicated datasets’ dimensionality reduction and unsupervised clustering.
Speech and Language Processing: Used in speech recognition analysis (mapping speech spectra) and to address language-theory problems such as identifying word boundaries, learning phonemes, and understanding category-specific naming deficits.
Biomedical and Health Sciences: Used in object identification, EEG data interpretation for emotion identification, and balance and gait analysis to comprehend how the vestibular, proprioceptive, and visual sensory systems affect sway.
Chemometrics: Used to solve regression issues using Supervised Kohonen Networks, such as QSARs and QSPRs.
Cybersecurity: Used to identify network virus intrusions; optimized versions, such as SOS-Kohonen, have lower false alarm rates and higher detection rates than other algorithms.
Robotics and Path Planning: Applied to the development of algorithms for road travel distance prediction and collision-free robot manipulator movement.
Financial Sector: Used in banking research to forecast ATM cash demand and predict bank distress.
Optimization Problems: Adapted to solve combinatorial issues with great accuracy, such as the Euclidean Travelling Salesman Problem (TSP).