Hash Tables
O(1) average insert/lookup. Key building block for most interview problems.
Hash Tables
A hash table maps keys to values using a hash function. Average O(1) insert, delete, and lookup.
How It Works
- Hash function converts key → integer index
- Index into an underlying array (bucket)
- Collision resolution: chaining (linked list at each bucket) or open addressing
Polynomial Rolling Hash
The classic hash function used for strings:
hash = hash * 31 + c
Multiplying by 31 (a prime) before adding each character ensures order matters — "abc" and "bac" get different hashes. Without the multiply, any anagram would hash to the same value.
Collision Resolution
Chaining (Linked Lists per bucket)
- Worst case O(n) if all keys hash to same bucket
- Can upgrade buckets to BSTs → worst case O(log n), used by Java 8+
Open Addressing (Linear Probing)
- All keys stored directly in the array — no chaining
- On collision, check next slots sequentially: i, i+1, i+2... (mod capacity)
- Problem: clustering — consecutive filled slots grow, increasing probe time
- Improvements: quadratic probing or double hashing reduce clustering
Resizing & Load Factor
Hash tables must resize as they fill up:
| Language | Growth factor | Resize threshold (load factor) | |----------|--------------|-------------------------------| | Java | 2× | 0.75 | | Python | ~1.3–2× | 0.7–0.8 |
Resizing wastes some space but keeps average operations O(1).
BST as Alternative
You can implement a hash map using a binary search tree instead of an array:
- Lookup: O(log n) vs O(1) average for array-based
- Benefit: keys are ordered → enables range queries and finding neighbours
- Tradeoff: worse lookup, but useful when you need ordered iteration
Complexity
| Operation | Average | Worst (all collisions) | |-----------|---------|------------------------| | Insert | O(1) | O(n) | | Delete | O(1) | O(n) | | Lookup | O(1) | O(n) | | Space | O(n) | O(n) |
Python Built-ins
# dict — the Python hash table
d = {}
d["key"] = "value" # insert O(1)
val = d.get("key") # lookup O(1), None if missing
"key" in d # membership O(1)
del d["key"] # delete O(1)
# Counter — frequency map
from collections import Counter
freq = Counter("abracadabra") # {'a': 5, 'b': 2, ...}
# defaultdict — no KeyError on missing keys
from collections import defaultdict
graph = defaultdict(list)
graph["a"].append("b")
Classic Interview Patterns
Frequency counting
def has_duplicate(arr):
seen = set()
for x in arr:
if x in seen:
return True
seen.add(x)
return False
Two Sum
def two_sum(nums, target):
seen = {} # value -> index
for i, n in enumerate(nums):
complement = target - n
if complement in seen:
return [seen[complement], i]
seen[n] = i
return []
Group anagrams
from collections import defaultdict
def group_anagrams(words):
groups = defaultdict(list)
for w in words:
key = tuple(sorted(w)) # canonical form
groups[key].append(w)
return list(groups.values())
Key Interview Tips
- Reach for a dict or set first when you need fast lookup
- Always ask: "can I trade O(n) space for O(1) lookup?"
frozensetortuplecan be used as dict keys (immutable/hashable)- Python
setis a hash table without values — ideal for membership tests - O(1) average is not guaranteed — pathological inputs can degrade to O(n)