Understanding the Maps and Hash Tables in DSA With Example

Maps and Hash Tables

In computer science, maps and hash tables are essential data structures that are essential for effectively arranging and retrieving information.

Introduction to Maps and Dictionaries

A map is an abstract data type (ADT) with “keys” and “values” that can be an associative array or dictionary. Python’s built-in dict class implements map abstraction. A map’s main benefit is its capacity to swiftly access data using a descriptive key as opposed to a numerical index. In a scenario where several nations may have the same currency value, a map might, for instance, record country names as distinct keys mapped to their respective currencies.

The following are the fundamental actions of a map M:

Key k’s related value is returned by M[k]. If the key is missing, a KeyError occurs.
Set key k to value v with M[k]. Alternatively, k is updated or a new key-value pair is inserted.
Other commonly used operations include M.keys(), M.values(), and M.items().Lack of k raises a KeyError. Len(M) to determine the number of items, M.popitem() to eliminate an arbitrary pair, M.clear() to empty the map, and v.
Calculating word frequencies in a document where words serve as keys and their counts as values is a useful use of maps.

Hash Tables: An Efficient Implementation for Maps

Hash tables are helpful data structures for maps like Python’s dict class. A hash function transforms an arbitrary key into a numerical index and points it to a “bucket” in an array where the key-value pair may be stored. In order to guarantee quick access and frequently achieve average-case constant-time performance, the objective is to distribute keys throughout the array as evenly as feasible.

Hash Functions and Hash Codes

An “integer hash code” is first calculated for a given key k by a hash function. This hash code does not have to be inside the array’s index range; it can be any integer, positive or negative. Reducing “collisions,” which happen when distinct keys generate the same hash code, is the main goal. The following “compression function” that converts the hash code to an array index will also cause a collision if hash codes collide. The cyclic-shift hash code is an illustration of a hash code computation for strings.

Collision Handling Schemes

Hash tables use a variety of “collision-handling schemes” because it is difficult to completely prevent collisions. Two or more different keys mapping to the same “bucket” or memory region within the table is known as a “collision” in hash tables. Different “collision-handling schemes” are used to resolve these conflicts and guarantee that all objects can be stored and retrieved because straight insertion into an occupied bucket is not feasible.

Separate Chaining

In this technique, every “bucket” in the internal array of the hash table contains a reference to an auxiliary data structure that contains all key-value pairs that hash to that particular index. This is usually a linked list or another unsorted map. The new item is simply added to the list of the relevant bucket when a collision occurs. Although simple, this method necessitates additional memory for these auxiliary structures. Higher load factors can degrade operations to linear time within the lists and increase collision frequency, hence it is best to keep the “load factor” (ratio of elements n to table size N, λ = n/N) below 0.9 for maximum performance.

Open Addressing

This method stores every item directly in the primary array slots of the hash table, without the need for auxiliary structures. Although this saves space, collision resolution becomes more difficult. The load factor must be no more than 1 in order to use open addressing. In the event of a collision, a “probing” technique finds the following open slot in the table:

Linear Probing: Slots A[(h(k) + i) mod N] are successively checked using linear probing, where i increases by 1 until an empty slot is discovered. “Primary clustering,” in which items form contiguous blocks, is a major disadvantage that can significantly slow down searches.

Quadratic Probing: This technique use A[(h(k) + i^2) mod N] for probing in order to mitigate primary clustering. It can result in “secondary clustering,” where keys with the same initial hash follow identical probe sequences, even though it avoids primary clustering. If N is a prime number and the table is less than half full, it ensures that an empty spot will be found.

Double Hashing: This method creates a more diverse sequence of probes to avoid clustering by using a second hash function to determine the step size for probing. A similar tactic is used by Python’s dict, which bases its probing sequence on a pseudo-random number generator.
The majority of basic map operations (insertion, deletion, and search) achieve average-case O(1) (constant) running time, demonstrating the practical great efficiency of hash tables.

Python Code Structure for Maps and Hash Tables

It show how to build upon a Map Base class to develop map and hash table classes in Python. Python’s collections are extended by this Map Base class. The hierarchical Item class with mutable mapping incorporates key-value pairs. Item class comparison methods like eq and lt use the item’s key.

These Item instances can be stored in a Python list to create a simple Unsorted Table Map. However, because they require scanning the full list, operations like get item, set item, and del item are O(n) in the worst situation.

A Hash Map Base class, which inherits from Map Base to offer similar functionality for both separate chaining and open addressing variations, is introduced for hash table implementations.

An abstract representation of the MapBase framework:

Example:

# Creating a dictionary
map_example = {
    "name": "Alice",
    "age": 25,
    "city": "New York"
}

# Inserting a key-value pair
map_example["profession"] = "Engineer"

# Accessing a value
print(map_example["name"])  

# Deleting a key-value pair
del map_example["age"]

# Iterating over key-value pairs
for key, value in map_example.items():
    print(f"{key}: {value}")

Output:

Alice
name: Alice
city: New York
profession: Engineer

(Descriptions in are the basis for this condensed Python code structure. “Data Structures & Algorithms in Python.pdf” contain complete implementations for Hash Map Base, Chain Hash Map, and Probe Hash Map.

Related Concepts

Sorted Maps: Sorted maps preserve keys in a certain order, in contrast to hash tables that disperse keys. Unlike typical hash maps, this ordering makes it possible to perform effective “inexact searches” and “range searches” (such as locating all flights within a given time range). Although updates can be expensive, sorted maps can be done using a sorted array (a “sorted search table”), utilising binary search for O(log n) search times.

Sets, Multisets, and Multimaps: Related abstractions that make use of map-like concepts are sets, multisets, and multimaps.

Set: An unarranged grouping of distinct components. Without corresponding values, elements operate as keys. Hash tables are used to implement Python’s frozen set and built-in set.
Similar to a set, a multiset (bag) allows duplicate components.
A multimap is a type of map in which a single key can represent several values (for example, a book index that links a phrase to several page references).

Page Content

Tutorials