Hash Tables

O(1) average insert/lookup. Key building block for most interview problems.

Hash Tables

A hash table maps keys to values using a hash function. Average O(1) insert, delete, and lookup.

How It Works

Hash function converts key → integer index
Index into an underlying array (bucket)
Collision resolution: chaining (linked list at each bucket) or open addressing

Polynomial Rolling Hash

The classic hash function used for strings:

hash = hash * 31 + c

Multiplying by 31 (a prime) before adding each character ensures order matters — "abc" and "bac" get different hashes. Without the multiply, any anagram would hash to the same value.

Collision Resolution

Chaining (Linked Lists per bucket)

Worst case O(n) if all keys hash to same bucket
Can upgrade buckets to BSTs → worst case O(log n), used by Java 8+

Open Addressing (Linear Probing)

All keys stored directly in the array — no chaining
On collision, check next slots sequentially: i, i+1, i+2... (mod capacity)
Problem: clustering — consecutive filled slots grow, increasing probe time
Improvements: quadratic probing or double hashing reduce clustering

Resizing & Load Factor

Hash tables must resize as they fill up:

| Language | Growth factor | Resize threshold (load factor) | |----------|--------------|-------------------------------| | Java | 2× | 0.75 | | Python | ~1.3–2× | 0.7–0.8 |

Resizing wastes some space but keeps average operations O(1).

BST as Alternative

You can implement a hash map using a binary search tree instead of an array:

Lookup: O(log n) vs O(1) average for array-based
Benefit: keys are ordered → enables range queries and finding neighbours
Tradeoff: worse lookup, but useful when you need ordered iteration

Complexity

| Operation | Average | Worst (all collisions) | |-----------|---------|------------------------| | Insert | O(1) | O(n) | | Delete | O(1) | O(n) | | Lookup | O(1) | O(n) | | Space | O(n) | O(n) |

Python Built-ins

# dict — the Python hash table
d = {}
d["key"] = "value"     # insert O(1)
val = d.get("key")     # lookup O(1), None if missing
"key" in d             # membership O(1)
del d["key"]           # delete O(1)

# Counter — frequency map
from collections import Counter
freq = Counter("abracadabra")  # {'a': 5, 'b': 2, ...}

# defaultdict — no KeyError on missing keys
from collections import defaultdict
graph = defaultdict(list)
graph["a"].append("b")

Classic Interview Patterns

Frequency counting

def has_duplicate(arr):
    seen = set()
    for x in arr:
        if x in seen:
            return True
        seen.add(x)
    return False

Two Sum

def two_sum(nums, target):
    seen = {}                    # value -> index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Group anagrams

from collections import defaultdict

def group_anagrams(words):
    groups = defaultdict(list)
    for w in words:
        key = tuple(sorted(w))   # canonical form
        groups[key].append(w)
    return list(groups.values())

Key Interview Tips

Reach for a dict or set first when you need fast lookup
Always ask: "can I trade O(n) space for O(1) lookup?"
frozenset or tuple can be used as dict keys (immutable/hashable)
Python set is a hash table without values — ideal for membership tests
O(1) average is not guaranteed — pathological inputs can degrade to O(n)