Hash Tables

1. Introduction

Definition

A hash table (hash map) is a data structure that maps keys to values using a hash function to compute an index into an array of buckets. It provides average-case O(1) time for insert, delete, and search.

The Core Idea

Given a key k, compute \mathrm{index} = h(k) \bmod m, where h is the hash function and m is the table size. Store the key-value pair at this index.


2. Hash Functions

Properties of a Good Hash Function

  1. Deterministic: The same key always produces the same hash
  2. Efficient: Computed in O(1) time
  3. Uniform distribution: Maps keys evenly across all indices
  4. Minimal collisions: Different keys rarely hash to the same index

Common Hash Functions

Division Method

h(k) = k \bmod m

Choosing m. Avoid powers of 2 (patterns in keys align with the binary structure). Choose m to be a prime not close to a power of 2.

warning

If the keys share a common pattern with m (in the extreme case, all keys are multiples of m), the hash function will map many keys to the same bucket.
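To see this concretely, here is a small sketch (illustrative keys and sizes of our choosing, not from the text): keys that are all multiples of 4 collapse onto two buckets when m is a power of 2, but spread out when m is prime.

```python
def division_hash(k, m):
    return k % m

# Keys sharing a pattern: all multiples of 4.
keys = [4, 8, 12, 16, 20, 24, 28, 32]

buckets_pow2 = {division_hash(k, 8) for k in keys}    # m = 2^3
buckets_prime = {division_hash(k, 11) for k in keys}  # m prime

print(sorted(buckets_pow2))   # [0, 4]: only two buckets are ever used
print(sorted(buckets_prime))  # eight distinct buckets
```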

Multiplication Method

h(k) = \lfloor m \cdot (k A \bmod 1) \rfloor

where A is a constant with 0 < A < 1 (Knuth suggests A = \frac{\sqrt{5} - 1}{2} \approx 0.618).

Advantage: Works well with any value of m.
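A minimal sketch of the multiplication method (mult_hash is an illustrative name; the constant follows Knuth's suggestion):

```python
import math

A = (math.sqrt(5) - 1) / 2   # Knuth's suggested constant, ~0.618

def mult_hash(k, m):
    frac = (k * A) % 1.0     # fractional part of k * A
    return int(m * frac)     # scale to a bucket index in [0, m)

# Unlike the division method, m can safely be a power of 2:
for k in [14, 21, 28, 35, 42]:
    print(k, mult_hash(k, 16))
```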

Polynomial Rolling Hash (for strings)

h(s) = \left(\sum_{i=0}^{n-1} s[i] \cdot p^i\right) \bmod m

where p is a small prime (typically 31 or 37) and m is a large prime or 2^{64}.

```python
def polynomial_hash(s, p=31, m=10**9 + 7):
    # Horner's method. Note: this weights s[0] by the highest power of p,
    # the reverse of the summation convention above; both are standard.
    h = 0
    for ch in s:
        h = (h * p + ord(ch)) % m
    return h
```

3. Collisions

Definition

A collision occurs when two distinct keys k_1 \neq k_2 hash to the same index: h(k_1) = h(k_2).

Theorem (Pigeonhole Principle). If a hash table has m buckets and stores more than m keys, at least one collision must occur.

Proof. By the pigeonhole principle: n keys mapped to m buckets with n > m means at least one bucket contains \lceil n/m \rceil \geq 2 keys. \square


4. Collision Resolution: Chaining

Method

Each bucket contains a linked list of all key-value pairs that hash to that index.

```python
class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # one list (chain) per bucket

    def _hash(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)  # update existing key
                return
        self.table[index].append((key, value))

    def search(self, key):
        index = self._hash(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        return None  # key absent

    def delete(self, key):
        index = self._hash(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                del self.table[index][i]
                return
```

Complexity Analysis

Theorem. Under simple uniform hashing (each key is equally likely to hash to any of the m buckets, independently), the expected length of a chain for n keys and m buckets is \alpha = n/m (the load factor).

Proof. By linearity of expectation, the expected number of keys in any particular bucket is n \cdot (1/m) = n/m. \square
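An empirical sketch of this theorem (a uniform random "hash" simulated with the standard library; the average is exactly n/m because the chain lengths always sum to n):

```python
import random

m, n = 100, 250                     # load factor alpha = 2.5
random.seed(42)                     # deterministic run
table = [[] for _ in range(m)]
for _ in range(n):
    table[random.randrange(m)].append(None)   # uniform bucket choice

avg_chain = sum(len(b) for b in table) / m
print(avg_chain)  # 2.5, i.e. exactly alpha = n/m
```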

| Operation | Average case | Worst case |
|---|---|---|
| Search | O(1 + α) | O(n) |
| Insert | O(1) | O(1)* |
| Delete | O(1 + α) | O(n) |

*Insert is worst-case O(1) only when new pairs are appended without a duplicate check; the implementation above, which updates an existing key in place, is O(1 + α) on average.

Theorem. Chaining requires O(n + m) total memory.

Proof. m bucket headers plus n nodes, each storing a key, value, and pointer. \square


5. Collision Resolution: Open Addressing

Method

All key-value pairs are stored in the table itself (no linked lists). When a collision occurs, the algorithm probes for the next available slot.

Linear Probing

h(k, i) = (h'(k) + i) \bmod m

where i = 0, 1, 2, \ldots indexes the probe sequence.

```python
class LinearProbeHashTable:
    def __init__(self, size):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size
        self.deleted = [False] * size   # tombstones for lazy deletion

    def insert(self, key, value):
        i = 0
        while i < self.size:
            index = (hash(key) + i) % self.size
            if self.keys[index] is None or self.deleted[index]:
                # Empty or tombstoned slot: claim it. (Simplification: we do
                # not check whether the key exists further along the probes.)
                self.keys[index] = key
                self.values[index] = value
                self.deleted[index] = False
                return
            if self.keys[index] == key:
                self.values[index] = value   # update existing key
                return
            i += 1
        raise Exception("Table full")

    def search(self, key):
        i = 0
        while i < self.size:
            index = (hash(key) + i) % self.size
            if self.keys[index] is None:
                return None   # truly empty slot: key cannot be in the table
            if self.keys[index] == key and not self.deleted[index]:
                return self.values[index]
            i += 1
        return None

    def delete(self, key):
        i = 0
        while i < self.size:
            index = (hash(key) + i) % self.size
            if self.keys[index] is None:
                return   # key absent
            if self.keys[index] == key and not self.deleted[index]:
                self.deleted[index] = True   # lazy deletion (tombstone)
                return
            i += 1
```

Clustering in Linear Probing

Theorem. Linear probing suffers from primary clustering: runs of consecutive occupied slots form clusters, and new insertions are more likely to land in larger clusters, making them grow faster still.

Proof sketch. Consider a cluster of length L. The probability that the next insertion extends this cluster is at least (L + 1)/m: the new key may hash to any of the L occupied slots (and probe forward to the empty slot just past the cluster) or hash directly to that empty slot. Each such insertion grows the cluster by at least 1, so larger clusters are more likely to grow. This positive feedback means large clusters grow disproportionately, and the expected number of probes for an unsuccessful search grows rapidly as the load factor approaches 1, far faster than under uniform hashing.

More formally, after inserting n keys into a table of size m with linear probing, the expected number of probes for an unsuccessful search is approximately:

\frac{1}{2}\left(1 + \frac{1}{(1 - \alpha)^2}\right)

where \alpha = n/m is the load factor. As \alpha \to 1, this grows without bound. \square
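Evaluating this estimate at a few load factors shows how sharply performance degrades (a small illustrative script):

```python
# Unsuccessful-search estimate for linear probing:
# (1/2) * (1 + 1/(1 - alpha)^2)
def expected_probes_unsuccessful(alpha):
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

for alpha in [0.5, 0.75, 0.9, 0.99]:
    print(alpha, expected_probes_unsuccessful(alpha))
# alpha = 0.5 gives 2.5 probes; alpha = 0.9 already gives 50.5
```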

Quadratic Probing

h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m

Eliminates primary clustering but may cause secondary clustering (keys with the same initial hash follow the same probe sequence).

Theorem. If the table size m is prime and the load factor \alpha < 0.5, quadratic probing will always find an empty slot.

Double Hashing

h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m

Uses a second hash function h_2 to determine the probe step. Eliminates both primary and secondary clustering.

Requirement: h_2(k) \neq 0 and h_2(k) must be coprime with m. Common choice: h_2(k) = 1 + (k \bmod (m - 1)), which for prime m always yields a value in 1 \ldots m-1 coprime with m.
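A sketch of the resulting probe sequence, using h_1(k) = k mod m and the common h_2 above (probe_sequence is an illustrative name):

```python
def probe_sequence(k, m):
    h1 = k % m
    h2 = 1 + (k % (m - 1))   # the common choice from the text
    return [(h1 + i * h2) % m for i in range(m)]

# With m = 11 (prime), every slot is visited exactly once:
seq = probe_sequence(47, 11)
print(seq[0])                          # 3, i.e. 47 mod 11
print(sorted(seq) == list(range(11)))  # True
```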


6. Load Factor and Resizing

Definition

\alpha = \frac{n}{m}

where n is the number of stored keys and m is the table size.

Resizing (Rehashing)

When \alpha exceeds a threshold (typically 0.7 for chaining, 0.5 for open addressing), allocate a new table of size \approx 2m and rehash all keys.

Theorem. Resizing maintains O(1) amortised time per operation.

Proof. Similar to the dynamic array analysis. The total cost of n insertions with resizing at powers of 2 is \sum_{k=0}^{\log n} 2^k \cdot O(1) = O(n), dominated by the last resize. Amortised per insertion: O(1). \square
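A sketch of grow-on-threshold rehashing for a chained table (ResizingTable, the 0.7 threshold, and plain doubling are illustrative choices; a production table might grow to a prime instead):

```python
class ResizingTable:
    def __init__(self):
        self.m = 8
        self.n = 0
        self.buckets = [[] for _ in range(self.m)]

    def insert(self, key, value):
        # Grow before this insertion would push alpha over 0.7.
        if (self.n + 1) / self.m > 0.7:
            self._resize(2 * self.m)
        bucket = self.buckets[hash(key) % self.m]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update in place
                return
        bucket.append((key, value))
        self.n += 1

    def _resize(self, new_m):
        old_buckets = self.buckets
        self.m = new_m
        self.buckets = [[] for _ in range(new_m)]
        for bucket in old_buckets:          # rehash every key
            for k, v in bucket:
                self.buckets[hash(k) % new_m].append((k, v))

t = ResizingTable()
for k in range(20):
    t.insert(k, k * k)
print(t.m, t.n)   # 32 20: the table doubled twice (8 -> 16 -> 32)
```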


7. Comparison of Collision Resolution Methods

| Property | Chaining | Linear Probing | Double Hashing |
|---|---|---|---|
| Memory | O(n + m) | O(m) | O(m) |
| Cache performance | Poor (pointers) | Good (contiguous) | Good |
| Clustering | None | Primary | None |
| Delete | O(1) | Lazy deletion | Lazy deletion |
| Load factor limit | No hard limit | α < 1 | α < 1 |

Board-specific

  • AQA requires understanding of hash functions, collision resolution (linear probing, quadratic probing, rehashing), and calculating hash table load factor
  • CIE (9618) covers hashing and collision handling; may use different terminology
  • OCR (A) requires hash tables with collision resolution using linear probing and rehashing
  • Edexcel covers hash tables and collision resolution methods

Problem Set

Problem 1. Using the division method with table size m = 7, compute the hash values for keys: 14, 21, 28, 35, 42. What do you observe?

Answer

h(k) = k \bmod 7:

  • 14 mod 7 = 0
  • 21 mod 7 = 0
  • 28 mod 7 = 0
  • 35 mod 7 = 0
  • 42 mod 7 = 0

All keys hash to index 0, the maximum possible number of collisions. This demonstrates why m should not divide common key patterns. If m = 7 and all keys are multiples of 7, every key collides. Choose m to be a prime not dividing common key values.

Problem 2. Insert keys 10, 22, 31, 4, 15, 28, 17 into a hash table of size 11 using linear probing. Show the table after each insertion.

Answer

h(k) = k \bmod 11

| Key | Hash | Probe sequence | Final index |
|---|---|---|---|
| 10 | 10 | 10 | 10 |
| 22 | 0 | 0 | 0 |
| 31 | 9 | 9 | 9 |
| 4 | 4 | 4 | 4 |
| 15 | 4 | 4, 5 | 5 |
| 28 | 6 | 6 | 6 |
| 17 | 6 | 6, 7 | 7 |

Final table:

| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Key | 22 | | | | 4 | 15 | 28 | 17 | | 31 | 10 |

Problem 3. Using the same keys as Problem 2, insert into a hash table using chaining with m = 5. Show each bucket.

Answer

h(k) = k \bmod 5

| Bucket | Keys (chain) |
|---|---|
| 0 | 10 → 15 |
| 1 | 31 |
| 2 | 22 → 17 |
| 3 | 28 |
| 4 | 4 |

(Note: 28 \bmod 5 = 3, so 28 belongs in bucket 3, not bucket 0.)

Load factor: \alpha = 7/5 = 1.4.

Problem 4. Compute the polynomial rolling hash of the string "abc" using base p = 31 and modulus m = 101.

Answer

h = (a \cdot 31^2 + b \cdot 31 + c) \bmod 101

= (97 \cdot 961 + 98 \cdot 31 + 99 \cdot 1) \bmod 101

= (93217 + 3038 + 99) \bmod 101

= 96354 \bmod 101

Since 954 \times 101 = 96354, the remainder is 0, so 96354 \bmod 101 = 0.

Alternatively, computing iteratively:

  • h = 0
  • h = (0 \times 31 + 97) \bmod 101 = 97
  • h = (97 \times 31 + 98) \bmod 101 = 3105 \bmod 101 = 75
  • h = (75 \times 31 + 99) \bmod 101 = 2424 \bmod 101 = 0

h(\mathrm{"abc"}) = 0.
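The iterative computation can be checked against the polynomial_hash function from earlier (re-declared here with m = 101 so the snippet is self-contained):

```python
def polynomial_hash(s, p=31, m=101):
    h = 0
    for ch in s:
        h = (h * p + ord(ch)) % m   # Horner's method, as in the answer above
    return h

print(polynomial_hash("abc"))  # 0
```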

Problem 5. Explain why the load factor must be less than 1 for open addressing but can exceed 1 for chaining.

Answer

In open addressing, all keys are stored in the table itself. If \alpha = n/m \geq 1, there are more keys than slots, so the table cannot hold them all: it is full. With linear probing, the search for an empty slot would then never terminate.

In chaining, each bucket can hold an arbitrary number of keys (via a linked list). The table never "fills up"; chains simply grow longer. Performance degrades as \alpha increases, however, since search time is proportional to chain length.

Problem 6. Prove that with double hashing, the probe sequence visits all m slots before repeating if h_2(k) and m are coprime.

Answer

Proof. The probe sequence is h_1(k),\ h_1(k) + h_2(k),\ h_1(k) + 2h_2(k),\ \ldots \pmod m.

This is an arithmetic progression modulo m with common difference d = h_2(k). If \gcd(d, m) = g > 1, the sequence cycles through only m/g distinct values. If \gcd(d, m) = 1, then d is invertible modulo m, so i \cdot d \equiv j \cdot d \pmod m implies i \equiv j \pmod m; hence the first m terms are all distinct and the sequence covers all m residues before repeating. \square

This is why h_2(k) must be chosen so that \gcd(h_2(k), m) = 1.
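A brute-force check of the theorem for a small composite m (illustrative values):

```python
from math import gcd

def slots_visited(start, d, m):
    # Distinct slots reached by the arithmetic probe sequence start + i*d mod m.
    return len({(start + i * d) % m for i in range(m)})

m = 12
for d in range(1, m):
    covers_all = slots_visited(0, d, m) == m
    assert covers_all == (gcd(d, m) == 1)
print("verified for m =", m)   # only d in {1, 5, 7, 11} covers all 12 slots
```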

Problem 7. A hash table with chaining has 1000 slots and contains 500 elements. What is the expected number of probes for a successful search? For an unsuccessful search?

Answer

Load factor: \alpha = 500/1000 = 0.5.

Unsuccessful search: Expected probes = \alpha = 0.5 (we traverse an entire chain, of expected length \alpha, without finding the key).

Successful search: Expected probes = 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} \approx 1 + \frac{\alpha}{2} = 1.25.

More precisely: for a successful search, we examine half the chain on average (the target is equally likely to be at any position in the chain). Expected chain length = \alpha = 0.5, so expected probes = 1 + 0.5/2 = 1.25.

Problem 8. Design a hash table that maps student IDs (7-digit integers) to names. Specify the hash function, table size, and collision resolution method. Justify your choices.

Answer

Table size: m = 10007 (a prime number, allowing for up to ~7000 students with load factor < 0.7).

Hash function: Division method: h(k) = k \bmod 10007.

Collision resolution: Chaining with linked lists.

Justification:

  • The division method is simple and efficient (O(1))
  • A prime table size reduces collisions from patterns in student IDs (e.g., sequential IDs)
  • Chaining handles variable load gracefully and simplifies deletion
  • Load factor < 0.7 ensures average O(1) operations

For revision on complexity, see Complexity Analysis.


Problems

Problem 1. A hash table uses the division method with table size m = 13. Calculate the hash value for each of the following keys: 26, 42, 93, 17, 77, 31, 55, 20.

Hint

Apply h(k) = k \bmod 13 to each key. Remember that the modulo operation gives the remainder after division.

Answer

h(k) = k \bmod 13:

| Key | Calculation | Hash value |
|---|---|---|
| 26 | 26 ÷ 13 = 2 r 0 | 0 |
| 42 | 42 ÷ 13 = 3 r 3 | 3 |
| 93 | 93 ÷ 13 = 7 r 2 | 2 |
| 17 | 17 ÷ 13 = 1 r 4 | 4 |
| 77 | 77 ÷ 13 = 5 r 12 | 12 |
| 31 | 31 ÷ 13 = 2 r 5 | 5 |
| 55 | 55 ÷ 13 = 4 r 3 | 3 |
| 20 | 20 ÷ 13 = 1 r 7 | 7 |

Note: 42 and 55 both hash to index 3 — a collision occurs.

Problem 2. A hash function for strings uses the polynomial rolling hash: h(s) = (s[0] + s[1] \cdot 31 + s[2] \cdot 31^2) \bmod 100, where s[i] is the ASCII code of the i-th character. Calculate the hash value for the string "Cat".

Hint

Look up the ASCII values: C = 67, a = 97, t = 116. Apply the formula: h = (67 + 97 \cdot 31 + 116 \cdot 31^2) \bmod 100.

Answer

ASCII values: C = 67, a = 97, t = 116

h = (s[0] + s[1] \cdot 31 + s[2] \cdot 31^2) \bmod 100

= (67 + 97 \times 31 + 116 \times 961) \bmod 100

Calculate each term:

  • 97 \times 31 = 3007
  • 116 \times 961 = 111476

h = (67 + 3007 + 111476) \bmod 100 = 114550 \bmod 100 = 50

The hash value of "Cat" is 50.

Problem 3. Insert the keys 44, 17, 31, 88, 61, 5, 22 into a hash table of size 7 using linear probing with h(k) = k \bmod 7. Show the state of the table after each insertion.

Hint

For linear probing, if index h(k) is occupied, try h(k)+1, then h(k)+2, etc., wrapping around using modulo 7.

Answer

h(k) = k \bmod 7

| Key | h(k) | Probe sequence | Final index |
|---|---|---|---|
| 44 | 2 | 2 | 2 |
| 17 | 3 | 3 | 3 |
| 31 | 3 | 3 (occupied), 4 | 4 |
| 88 | 4 | 4 (occupied), 5 | 5 |
| 61 | 5 | 5 (occupied), 6 | 6 |
| 5 | 5 | 5, 6, 0 | 0 |
| 22 | 1 | 1 | 1 |

Step-by-step table states:

After inserting 44: [—, —, 44, —, —, —, —]
After inserting 17: [—, —, 44, 17, —, —, —]
After inserting 31: [—, —, 44, 17, 31, —, —]
After inserting 88: [—, —, 44, 17, 31, 88, —]
After inserting 61: [—, —, 44, 17, 31, 88, 61]
After inserting 5: [5, —, 44, 17, 31, 88, 61]
After inserting 22: [5, 22, 44, 17, 31, 88, 61]

Final table:

| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Key | 5 | 22 | 44 | 17 | 31 | 88 | 61 |

Problem 4. Using the final hash table from Problem 3, trace the search for key 61 using linear probing. Show each probe.

Hint

To search, compute h(k) and probe sequentially. Stop when you find the key or reach an empty slot.

Answer

Search for 61 with h(k) = k \bmod 7:

h(61) = 61 \bmod 7 = 5

Probe sequence:

  1. Index 5: value is 88. 88 ≠ 61, continue probing.
  2. Index 6: value is 61. 61 = 61. Found!

The key 61 was found at index 6 after 2 probes.

Note: This illustrates a drawback of linear probing: even though h(61) = 5, the key is stored at index 6 due to earlier collisions. We must continue probing past occupied slots until we find the key or an empty slot.

Problem 5. Insert the keys 19, 36, 50, 5, 69, 14, 75 into a hash table of size 11 using quadratic probing with h(k, i) = (h'(k) + i^2) \bmod 11 where h'(k) = k \bmod 11. Show the table after all insertions.

Hint

Quadratic probing tries offsets of 0, 1, 4, 9, 16, \ldots (i.e., i^2 for i = 0, 1, 2, 3, \ldots). If the target index is occupied, try (h'(k) + i^2) \bmod m.

Answer

h'(k) = k \bmod 11, \quad h(k, i) = (h'(k) + i^2) \bmod 11

| Key | h'(k) | Probe sequence (i = 0, 1, 2, …) | Final index |
|---|---|---|---|
| 19 | 8 | 8 | 8 |
| 36 | 3 | 3 | 3 |
| 50 | 6 | 6 | 6 |
| 5 | 5 | 5 | 5 |
| 69 | 3 | 3 occupied, 3+1 = 4 | 4 |
| 14 | 3 | 3 occupied, 4 occupied, 3+4 = 7 | 7 |
| 75 | 9 | 9 | 9 |

Final table:

| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Key | | | | 36 | 69 | 5 | 50 | 14 | 19 | 75 | |

Load factor: \alpha = 7/11 \approx 0.636

Problem 6. A hash table of size 7 contains the keys {14, 21, 7, 28, 35} using the division method h(k) = k \bmod 7. The load factor threshold is 0.7. A new key 42 needs to be inserted. Describe the rehashing process: choose a new table size, rehash all existing keys plus the new key, and show the final table using linear probing.

Hint

First check if rehashing is needed. If so, double the table size (to a prime), then reinsert every key into the new table using the new modulo.

Answer

Current state: 5 keys in a table of size 7. Load factor \alpha = 5/7 \approx 0.714 > 0.7.

After inserting 42: 6 keys, \alpha = 6/7 \approx 0.857 > 0.7. Rehashing is needed.

Step 1: Choose the new table size. Double and find the nearest prime: 2 \times 7 = 14, which is not prime; the nearest prime \geq 14 is 17. New m = 17.

Step 2: Rehash all keys using h(k) = k \bmod 17.

| Key | k mod 17 | Index |
|---|---|---|
| 14 | 14 | 14 |
| 21 | 4 | 4 |
| 7 | 7 | 7 |
| 28 | 11 | 11 |
| 35 | 1 | 1 |
| 42 | 8 | 8 |

No collisions occur with the new table size.

Final table (size 17):

| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Key | | 35 | | | 21 | | | 7 | 42 | | | 28 | | | 14 | | |

New load factor: \alpha = 6/17 \approx 0.353 < 0.7

Problem 7. A hash table using chaining has 500 buckets and currently stores 350 key-value pairs. Calculate the load factor. If the table is resized to 1001 buckets, what is the new load factor? What is the expected number of comparisons for an unsuccessful search before and after resizing? (Note: 1001 = 7 × 11 × 13 is not prime, but chaining does not require a prime table size.)

Hint

Load factor \alpha = n/m. For chaining, the expected number of comparisons for an unsuccessful search is \alpha.

Answer

Before resizing:

  • Load factor: \alpha = 350/500 = 0.7
  • Expected comparisons for unsuccessful search: \alpha = 0.7

After resizing to 1001 buckets:

  • Load factor: \alpha = 350/1001 \approx 0.350
  • Expected comparisons for unsuccessful search: \alpha \approx 0.350

For a successful search, the expected number of comparisons is 1 + \alpha/2 = 1 + 0.35 = 1.35 before resizing and 1 + 0.175 = 1.175 after.

Resizing roughly halves the expected search time, demonstrating why maintaining a low load factor is important for performance.

Problem 8. Compare linear probing and chaining as collision resolution methods. For each method, discuss: (a) the best-case and worst-case time complexity for search, (b) how deletion is handled, and (c) the effect of increasing load factor on performance.

Hint

Consider how each method stores data (in the table itself vs. in linked lists) and how this affects probe sequences, memory usage, and deletion.

Answer
Answer

| Aspect | Linear Probing | Chaining |
|---|---|---|
| Best-case search | O(1): key at its hash index | O(1): key is alone in its bucket |
| Worst-case search | O(n): all keys cluster together | O(n): all keys hash to the same bucket |
| Deletion | Requires lazy deletion (mark slot as deleted, not empty); if simply emptied, searches would break by stopping early at the gap | O(1): simply remove the node from the linked list |
| Load factor effect | Degrades rapidly as α → 1; above α ≈ 0.7, clustering causes significant slowdown; must keep α < 1 | Degrades gradually; chains grow linearly with α; no hard upper limit on α (but keep α < 1 for efficiency) |
| Memory | O(m): fixed array size | O(n + m): array plus linked list nodes |
| Cache performance | Good: contiguous memory access | Poor: following pointers to scattered nodes |

Key trade-off: Linear probing has better cache performance but suffers from clustering and requires careful load factor management. Chaining is simpler to implement and handles deletion cleanly but uses extra memory for pointers.

Problem 9. A library system needs a hash table to store book records, where each book has a 10-digit ISBN (e.g., 9780132350884). Design a suitable hash function. Explain your choice and show worked examples for at least three ISBNs.

Hint

Consider using the division method or a digit-selection technique. Think about what properties the hash function should have (uniform distribution, efficiency, minimal collisions).

Answer

Design choice: Use the division method with a prime table size. Since ISBNs are large integers, we can select specific digit groups to reduce the number while preserving distribution.

Hash function: Extract the last 5 digits of the ISBN and use the division method with m = 10007 (a prime):

h(\mathrm{isbn}) = (\mathrm{isbn} \bmod 100000) \bmod 10007

Rationale:

  • Taking the last 5 digits (mod 100000) gives values from 0–99999, which is manageable
  • Using a prime m = 10007 ensures good distribution (avoids patterns in ISBNs aligning with divisors of m)
  • The function is O(1) and deterministic

Worked examples:

| ISBN | Last 5 digits | Calculation (mod 10007) | Hash |
|---|---|---|---|
| 9780132350884 | 50884 | 50884 − 5 × 10007 = 849 | 849 |
| 9780596007126 | 07126 | 7126 (already < 10007) | 7126 |
| 9780201896831 | 96831 | 96831 − 9 × 10007 = 6768 | 6768 |

This gives good distribution across the table. With m = 10007 and a load factor target of 0.7, the table can hold approximately 7000 books before needing resizing.
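The worked examples can be checked directly (isbn_hash is an illustrative name for the function defined above):

```python
def isbn_hash(isbn, m=10007):
    return (isbn % 100000) % m   # last 5 digits, then division method

print(isbn_hash(9780132350884))  # 849
print(isbn_hash(9780596007126))  # 7126
print(isbn_hash(9780201896831))  # 6768
```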

Problem 10. (Exam-style multi-step question) A hash table of size 11 uses double hashing with h_1(k) = k \bmod 11 and h_2(k) = 1 + (k \bmod 9). The following keys are to be inserted in order: 47, 25, 63, 14, 80, 36, 52.

(a) Show the probe sequence for each key and the final state of the hash table. (b) What is the load factor of the final table? (c) Trace the search for key 80 in the final table. How many probes are needed? (d) If key 25 is deleted, explain why lazy deletion (tombstone marking) is necessary in open addressing, and show what happens if a search for key 47 is performed after deletion without tombstones.

Hint

For double hashing: h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod 11. Calculate h_2(k) for each key first, then trace the probe sequence.

Answer

(a) Insertion with double hashing:

h_1(k) = k \bmod 11, \quad h_2(k) = 1 + (k \bmod 9)

| Key | h_1(k) | h_2(k) | Probe sequence ((h_1 + i · h_2) mod 11) | Final index |
|---|---|---|---|---|
| 47 | 3 | 1 + 2 = 3 | 3 | 3 |
| 25 | 3 | 1 + 7 = 8 | 3 occupied, (3+8) mod 11 = 0 | 0 |
| 63 | 8 | 1 + 0 = 1 | 8 | 8 |
| 14 | 3 | 1 + 5 = 6 | 3 occupied, (3+6) mod 11 = 9 | 9 |
| 80 | 3 | 1 + 8 = 9 | 3 occupied, (3+9) mod 11 = 1 | 1 |
| 36 | 3 | 1 + 0 = 1 | 3 occupied, (3+1) mod 11 = 4 | 4 |
| 52 | 8 | 1 + 7 = 8 | 8 occupied, (8+8) mod 11 = 5 | 5 |

Final table:

| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Key | 25 | 80 | | 47 | 36 | 52 | | | 63 | 14 | |

(b) Load factor:

\alpha = n/m = 7/11 \approx 0.636

(c) Search for key 80:

h_1(80) = 80 \bmod 11 = 3, \quad h_2(80) = 1 + (80 \bmod 9) = 9

Probe sequence:

  1. (3 + 0 \times 9) \bmod 11 = 3 → value is 47 ≠ 80, continue
  2. (3 + 1 \times 9) \bmod 11 = 1 → value is 80 = 80. Found!

Key 80 found at index 1 after 2 probes.

(d) Deletion and lazy deletion:

In open addressing, an empty slot acts as a stop signal: a search halts as soon as it reaches one, concluding the key is absent (otherwise the key would have been inserted there). Key 25 was placed at index 0 only after probing from index 3, so its final position lies partway along a probe path.

Suppose index 0 is simply set back to empty. Any key inserted after 25 whose probe sequence passes through index 0 would become unreachable: a search for it would stop prematurely at the empty slot and incorrectly report the key as absent. In this particular table, a search for 47 still succeeds (h_1(47) = 3 and 47 is found immediately at index 3), because no later key's probe path happens to run through index 0. But the table cannot know that in general: consider a hypothetical key that hashed to index 0 and was placed further along due to probing; after emptying index 0, the search for it would stop at the gap.

Solution: Use lazy deletion (a tombstone). Mark index 0 as DELETED, not EMPTY. During search, treat DELETED slots as occupied and continue probing past them. During insertion, treat DELETED slots as available for reuse.
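A minimal sketch of tombstone deletion for linear probing (standalone helper functions rather than a class; DELETED is a sentinel of our choosing):

```python
DELETED = object()   # tombstone sentinel, distinct from None (empty)

def insert(keys, key):
    m = len(keys)
    for i in range(m):
        j = (hash(key) + i) % m
        if keys[j] is None or keys[j] is DELETED:   # reuse tombstones
            keys[j] = key
            return j
    raise RuntimeError("table full")

def search(keys, key):
    m = len(keys)
    for i in range(m):
        j = (hash(key) + i) % m
        if keys[j] is None:        # truly empty: key cannot be further along
            return None
        if keys[j] == key:
            return j               # probes straight past DELETED slots
    return None

def delete(keys, key):
    j = search(keys, key)
    if j is not None:
        keys[j] = DELETED          # tombstone, not None

keys = [None] * 7
insert(keys, 14)          # 14 mod 7 = 0
insert(keys, 21)          # collides at 0, probes to index 1
delete(keys, 14)          # index 0 becomes a tombstone
print(search(keys, 21))   # 1: still found, because search probes past DELETED
```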
