#

Hash Tables

O(1) average insert/lookup. Key building block for most interview problems.

Hash Tables

A hash table maps keys to values using a hash function. Average O(1) insert, delete, and lookup.

How It Works

  1. Hash function converts key → integer index
  2. Index into an underlying array (bucket)
  3. Collision resolution: chaining (linked list at each bucket) or open addressing

Polynomial Rolling Hash

The classic hash function used for strings:

hash = hash * 31 + c

Multiplying by 31 (a prime) before adding each character ensures order matters"abc" and "bac" get different hashes. Without the multiply, any anagram would hash to the same value.

Collision Resolution

Chaining (Linked Lists per bucket)

  • Worst case O(n) if all keys hash to same bucket
  • Can upgrade buckets to BSTs → worst case O(log n), used by Java 8+

Open Addressing (Linear Probing)

  • All keys stored directly in the array — no chaining
  • On collision, check next slots sequentially: i, i+1, i+2... (mod capacity)
  • Problem: clustering — consecutive filled slots grow, increasing probe time
  • Improvements: quadratic probing or double hashing reduce clustering

Resizing & Load Factor

Hash tables must resize as they fill up:

| Language | Growth factor | Resize threshold (load factor) | |----------|--------------|-------------------------------| | Java | 2× | 0.75 | | Python | ~1.3–2× | 0.7–0.8 |

Resizing wastes some space but keeps average operations O(1).

BST as Alternative

You can implement a hash map using a binary search tree instead of an array:

  • Lookup: O(log n) vs O(1) average for array-based
  • Benefit: keys are ordered → enables range queries and finding neighbours
  • Tradeoff: worse lookup, but useful when you need ordered iteration

Complexity

| Operation | Average | Worst (all collisions) | |-----------|---------|------------------------| | Insert | O(1) | O(n) | | Delete | O(1) | O(n) | | Lookup | O(1) | O(n) | | Space | O(n) | O(n) |

Python Built-ins

# dict — the Python hash table
d = {}
d["key"] = "value"     # insert O(1)
val = d.get("key")     # lookup O(1), None if missing
"key" in d             # membership O(1)
del d["key"]           # delete O(1)

# Counter — frequency map
from collections import Counter
freq = Counter("abracadabra")  # {'a': 5, 'b': 2, ...}

# defaultdict — no KeyError on missing keys
from collections import defaultdict
graph = defaultdict(list)
graph["a"].append("b")

Classic Interview Patterns

Frequency counting

def has_duplicate(arr):
    seen = set()
    for x in arr:
        if x in seen:
            return True
        seen.add(x)
    return False

Two Sum

def two_sum(nums, target):
    seen = {}                    # value -> index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Group anagrams

from collections import defaultdict

def group_anagrams(words):
    groups = defaultdict(list)
    for w in words:
        key = tuple(sorted(w))   # canonical form
        groups[key].append(w)
    return list(groups.values())

Key Interview Tips

  • Reach for a dict or set first when you need fast lookup
  • Always ask: "can I trade O(n) space for O(1) lookup?"
  • frozenset or tuple can be used as dict keys (immutable/hashable)
  • Python set is a hash table without values — ideal for membership tests
  • O(1) average is not guaranteed — pathological inputs can degrade to O(n)