Searching Algorithms
1. Linear Search
Algorithm
Problem: Given an array and a target value , determine whether exists in and return its index (or if not found).
def linear_search(A, x):
for i in range(len(A)):
if A[i] == x:
return i
return -1
Correctness
Theorem. linear_search(A, x) returns the index of the first occurrence of in , or
if is not present.
Proof. The algorithm examines elements in order. If , it immediately returns , which is the first occurrence since all earlier elements were checked and found not equal to . If the loop completes without finding , then , and is returned.
Complexity Analysis
Theorem. Linear search has worst-case time complexity and best-case time complexity .
Proof of worst case. In the worst case, is at index or absent. The algorithm performs comparisons. Since each comparison takes time, the total is .
Proof of lower bound. Linear search requires comparisons in the worst case.
Proof. Consider an adversary argument. An adversary can answer "not equal" to the first comparisons. Only after checking all elements can the algorithm conclude that is absent. Any algorithm that does not check all positions can be fooled: the unchecked position could contain . Therefore, at least comparisons are necessary in the worst case.
| Case | Comparisons | Time |
|---|---|---|
| Best | 1 | |
| Average | ||
| Worst |
2. Binary Search
Algorithm
Problem: Given a sorted array and a target value , find the index of (or determine that is not present).
def binary_search(A, x):
low = 0
high = len(A) - 1
while low <= high:
mid = (low + high) // 2
if A[mid] == x:
return mid
elif A[mid] < x:
low = mid + 1
else:
high = mid - 1
return -1
Correctness Proof
Theorem. binary_search(A, x) returns the index of in the sorted array , or if
is not present.
Proof. We prove by invariant.
Invariant: At the start of each loop iteration, if exists in , then .
Base case. Initially, low = 0 and high = n-1, so
. If , the invariant holds.
Maintenance. Three cases:
- : Return mid. Correct. ✓
- : Since is sorted,
, so . Setting
low = mid + 1restricts the search to . If was in the old range, it is in the new range. - : Since is sorted,
, so . Setting
high = mid - 1restricts the search to . If was in the old range, it is in the new range.
Termination. The loop terminates when low > high, meaning is
empty. By the invariant, . Return . ✓
Complexity Analysis
Theorem. Binary search performs comparisons.
Proof. At each iteration, the search range is halved. Starting with a range of size , after iterations the range size is at most . The algorithm terminates when the range is empty, which happens when , i.e., . Therefore, the maximum number of iterations is .
Formal derivation. Let be the number of comparisons for an array of size .
By the Master Theorem (case 2): .
Theorem (Binary search lower bound). Any comparison-based search algorithm on a sorted array requires comparisons in the worst case.
Proof. A decision tree for searching a sorted array of elements has at least leaves ( possible positions for , plus "not found"). A binary tree of height has at most leaves, so:
Pitfall Binary search only works on sorted arrays. Applying it to an unsorted array
gives incorrect results. Also, beware of integer overflow when computing mid = (low + high) // 2 —
use mid = low + (high - low) // 2 for safety.
Example: Trace binary search for x = 7 in [1, 3, 5, 7, 9, 11, 13]
| Iteration | low | high | mid | A[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 6 | 3 | 7 | Found! Return 3 |
Result: index 3. ✓
Example: Trace binary search for x = 6 in [1, 3, 5, 7, 9, 11, 13]
| Iteration | low | high | mid | A[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 6 | 3 | 7 | 7 > 6, high = 2 |
| 2 | 0 | 2 | 1 | 3 | 3 < 6, low = 2 |
| 3 | 2 | 2 | 2 | 5 | 5 < 6, low = 3 |
| 4 | 3 | 2 | — | — | low > high, return -1 |
Result: -1 (not found). ✓
Recursive Binary Search
def binary_search_recursive(A, x, low, high):
if low > high:
return -1
mid = low + (high - low) // 2
if A[mid] == x:
return mid
elif A[mid] < x:
return binary_search_recursive(A, x, mid + 1, high)
else:
return binary_search_recursive(A, x, low, mid - 1)
Board-specific AQA requires linear search and binary search; binary search must be on sorted data and may require trace tables. CIE (9618) requires linear search and binary search with pseudocode. OCR (A) requires linear and binary search; may also cover hash-based searching. Edexcel covers linear and binary search algorithms.
3. Comparison of Search Algorithms
| Property | Linear Search | Binary Search |
|---|---|---|
| Precondition | None | Array must be sorted |
| Best case | ||
| Average case | ||
| Worst case | ||
| Data structure | Array, list | Array (random access) |
| Works on linked list? | Yes | No (no random access) |
4. Variants
Binary Search for Insertion Point
Find the position where should be inserted to maintain sorted order:
def binary_search_insert_position(A, x):
low, high = 0, len(A)
while low < high:
mid = (low + high) // 2
if A[mid] < x:
low = mid + 1
else:
high = mid
return low
Binary Search on a Answer Space
Binary search can be used to find a threshold in a continuous or discrete answer space (e.g., "minimum maximum", "maximum minimum" problems).
Exam tip For exam questions, always state the precondition (sorted array) for binary search and trace through the algorithm step by step. Show the low, high, mid values at each iteration.
Problem Set
Problem 1. Trace linear search for the value 8 in the array [3, 1, 4, 1, 5, 9, 2, 6]. How many
comparisons are made?
Answer
The value 8 is not in the array. All 8 elements are checked:
| Step | Index | A[index] | Comparison | Count |
|---|---|---|---|---|
| 1 | 0 | 3 | 3 ≠ 8 | 1 |
| 2 | 1 | 1 | 1 ≠ 8 | 2 |
| 3 | 2 | 4 | 4 ≠ 8 | 3 |
| 4 | 3 | 1 | 1 ≠ 8 | 4 |
| 5 | 4 | 5 | 5 ≠ 8 | 5 |
| 6 | 5 | 9 | 9 ≠ 8 | 6 |
| 7 | 6 | 2 | 2 ≠ 8 | 7 |
| 8 | 7 | 6 | 6 ≠ 8 | 8 |
Total comparisons: 8. Return -1.
Problem 2. Trace binary search for the value 25 in the sorted array
[2, 5, 8, 12, 16, 23, 38, 56, 72, 91]. Show all iterations.
Answer
| Iteration | low | high | mid | A[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 9 | 4 | 16 | 16 < 25, low = 5 |
| 2 | 5 | 9 | 7 | 56 | 56 > 25, high = 6 |
| 3 | 5 | 6 | 5 | 23 | 23 < 25, low = 6 |
| 4 | 6 | 6 | 6 | 38 | 38 > 25, high = 5 |
| 5 | 6 | 5 | — | — | low > high → -1 |
4 comparisons. Result: -1.
Problem 3. An array of 1024 elements is searched using binary search. What is the maximum number of comparisons required?
Answer
comparisons.
More precisely, binary search on elements requires at most comparisons.
Problem 4. Prove that binary search cannot be directly applied to a singly linked list, and explain what alternative approach could achieve search on a linked list.
Answer
Binary search requires access to the middle element (A[mid]). In a singly linked list, accessing the -th element requires traversing nodes from the head, which is . Finding the middle of a list of elements takes time, eliminating the benefit of halving.
Alternative: Jump list / Skip list — a data structure with multiple levels of linked lists that allows search by "skipping" ahead at higher levels, analogous to binary search.
Problem 5. Explain why the worst case for linear search is using an adversary argument.
Answer
An adversary constructs the worst case dynamically. The adversary maintains that the target is not at any position already examined by the algorithm. After comparisons, all positions except one have been checked. The adversary places at the remaining unchecked position (or declares it absent). Therefore, any correct algorithm must check all positions in the worst case, requiring comparisons.
Problem 6. Write a function to count the number of occurrences of a value in a sorted array using binary search. Your function should run in time.
Answer
Find the leftmost and rightmost occurrence using binary search, then compute the difference.
def count_occurrences(A, x):
left = binary_search_insert_position(A, x)
right = binary_search_insert_position(A, x + 1) - 1
if left <= right and left < len(A) and A[left] == x:
return right - left + 1
return 0
Two binary searches: .
Problem 7. Given an array that is sorted but rotated (e.g., [4, 5, 6, 7, 0, 1, 2]), write a
modified binary search that runs in time.
Answer
def search_rotated(A, x):
low, high = 0, len(A) - 1
while low <= high:
mid = (low + high) // 2
if A[mid] == x:
return mid
if A[low] <= A[mid]:
if A[low] <= x < A[mid]:
high = mid - 1
else:
low = mid + 1
else:
if A[mid] < x <= A[high]:
low = mid + 1
else:
high = mid - 1
return -1
The key insight: one half of the array (left or right of mid) is always sorted. Determine which half is sorted and whether the target lies within it.
Problem 8. A binary search implementation has the following bug: mid = (low + high) / 2 (using
floating-point division instead of integer division). What goes wrong?
Answer
In Python, / produces a float, and using a float as an array index raises a TypeError. In
languages like C/Java, int mid = (low + high) / 2 truncates toward zero, which works correctly for
positive values but is technically implementation-dependent.
The more serious bug is integer overflow: if low + high > INT_MAX, the sum overflows. The
correct form is mid = low + (high - low) / 2, which cannot overflow since high - low is always
non-negative and less than INT_MAX.
For revision on sorting, see Sorting Algorithms.
Problems
Problem 1. Trace linear search for the value 14 in the array [7, 3, 14, 2, 9, 6, 1, 8]. How
many comparisons are made until the item is found?
Hint
Step through each element from index 0, comparing each with the target 14. Count each comparison until a match is found.
Answer
| Step | Index | A[index] | Comparison | Count |
|---|---|---|---|---|
| 1 | 0 | 7 | 7 ≠ 14 | 1 |
| 2 | 1 | 3 | 3 ≠ 14 | 2 |
| 3 | 2 | 14 | 14 = 14 ✓ | 3 |
3 comparisons are made. The value 14 is found at index 2. The algorithm returns 2.
Problem 2. Trace linear search for the value 5 in the array
[10, 20, 30, 40, 50, 60, 70, 80, 90]. How many comparisons are made?
Hint
The value 5 is not in the array, so the algorithm must check every single element before returning -1.
Answer
| Step | Index | A[index] | Comparison | Count |
|---|---|---|---|---|
| 1 | 0 | 10 | 10 ≠ 5 | 1 |
| 2 | 1 | 20 | 20 ≠ 5 | 2 |
| 3 | 2 | 30 | 30 ≠ 5 | 3 |
| 4 | 3 | 40 | 40 ≠ 5 | 4 |
| 5 | 4 | 50 | 50 ≠ 5 | 5 |
| 6 | 5 | 60 | 60 ≠ 5 | 6 |
| 7 | 6 | 70 | 70 ≠ 5 | 7 |
| 8 | 7 | 80 | 80 ≠ 5 | 8 |
| 9 | 8 | 90 | 90 ≠ 5 | 9 |
9 comparisons are made. The value 5 is not found, so the algorithm returns -1. This is the worst case for an array of 9 elements — every element must be checked.
Problem 3. Trace binary search for the value 42 in the sorted array
[3, 11, 19, 27, 35, 42, 50, 58, 66, 74]. Show all iterations with low, high, mid, and the action
taken.
Hint
Start with low = 0, high = 9. Calculate mid = (0 + 9) // 2 = 4. Compare A[4] with 42 and adjust the range accordingly.
Answer
| Iteration | low | high | mid | A[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 9 | 4 | 35 | 35 < 42, low = 5 |
| 2 | 5 | 9 | 7 | 58 | 58 > 42, high = 6 |
| 3 | 5 | 6 | 5 | 42 | Found! Return 5 |
3 comparisons are made. The value 42 is found at index 5.
Problem 4. Trace binary search for the value 15 in the sorted array
[2, 6, 10, 14, 18, 22, 26, 30]. Show all iterations.
Hint
The value 15 lies between 14 (index 3) and 18 (index 4). The algorithm will narrow down to this gap and then terminate with low > high.
Answer
| Iteration | low | high | mid | A[mid] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 7 | 3 | 14 | 14 < 15, low = 4 |
| 2 | 4 | 7 | 5 | 22 | 22 > 15, high = 4 |
| 3 | 4 | 4 | 4 | 18 | 18 > 15, high = 3 |
| 4 | 4 | 3 | — | — | low > high, return |
4 comparisons are made. The value 15 is not in the array, so the algorithm returns -1.
Problem 5. An unsorted array of 10,000 elements must be searched repeatedly. Compare the total cost of using linear search directly for 1,000 queries versus sorting the array once then using binary search for 1,000 queries.
Hint
Calculate the cost of each approach: (a) 1,000 linear searches, and (b) one sort plus 1,000 binary searches. Use O(n log n) for sorting and O(log n) for each binary search.
Answer
Linear search approach: 1,000 × O(10,000) = O(10,000,000) total comparisons.
Sort + binary search approach:
- One-time sort: O(10,000 log₂ 10,000) ≈ O(10,000 × 13.3) ≈ O(133,000) comparisons
- 1,000 binary searches: 1,000 × O(log₂ 10,000) ≈ 1,000 × 14 = O(14,000) comparisons
- Total: O(133,000) + O(14,000) = O(147,000) comparisons
Sort + binary search is approximately 68 times more efficient in total. The one-time cost of sorting is quickly amortised over multiple queries. The more queries needed, the greater the advantage of sorting first.
Problem 6. A database contains 500,000 records sorted by a unique key field. Explain which search algorithm is more efficient and calculate the maximum number of comparisons for each algorithm.
Hint
Since the data is already sorted, binary search can be applied directly. Calculate ⌊log₂(n)⌋ + 1 for the binary search worst case.
Answer
Linear search: Worst case = 500,000 comparisons. Time complexity: .
Binary search: Worst case = comparisons. Time complexity: .
Binary search is dramatically more efficient — at most 19 comparisons versus 500,000 for linear search, an improvement factor of approximately 26,000×. Since the data is already sorted, there is no additional preprocessing cost.
Problem 7. Calculate the maximum number of comparisons required for binary search on arrays of sizes 15, 100, 500, and 1,000,000. Show your working using the formula .
Hint
Apply the formula to each array size. Remember that means the greatest integer less than or equal to .
Answer
Using :
| Max comparisons | |||
|---|---|---|---|
| 15 | 3.91 | 3 | 3 + 1 = 4 |
| 100 | 6.64 | 6 | 6 + 1 = 7 |
| 500 | 8.97 | 8 | 8 + 1 = 9 |
| 1,000,000 | 19.93 | 19 | 19 + 1 = 20 |
This demonstrates the power of logarithmic growth: searching through a million elements requires only 20 comparisons maximum.
Problem 8. Write pseudocode for (a) a linear search that returns the index of the first occurrence of a target value in an array, and (b) a binary search on a sorted array that returns the index of the target or -1 if not found.
Hint
Linear search uses a simple FOR loop checking each element. Binary search uses a WHILE loop with low and high pointers, calculating mid each iteration.
Answer
(a) Linear search:
FUNCTION LinearSearch(A, x)
FOR i ← 0 TO LEN(A) - 1
IF A[i] = x THEN
RETURN i
ENDIF
ENDFOR
RETURN -1
ENDFUNCTION
(b) Binary search:
FUNCTION BinarySearch(A, x)
low ← 0
high ← LEN(A) - 1
WHILE low ≤ high
mid ← (low + high) DIV 2
IF A[mid] = x THEN
RETURN mid
ELSE IF A[mid] < x THEN
low ← mid + 1
ELSE
high ← mid - 1
ENDIF
ENDWHILE
RETURN -1
ENDFUNCTION
Note: In the binary search, DIV 2 performs integer division (floor division), which is equivalent
to // in Python.
Problem 9. Trace binary search for the value 17 in the sorted array
[4, 8, 12, 15, 17, 20, 24, 28, 32, 36, 40]. Show low, high, mid, and the comparison at each step.
Hint
The array has 11 elements (indices 0–10). Start with low = 0, high = 10. The first mid will be (0 + 10) // 2 = 5.
Answer
| Iteration | low | high | mid | A[mid] | Comparison | Action |
|---|---|---|---|---|---|---|
| 1 | 0 | 10 | 5 | 20 | 20 > 17 | high = 4 |
| 2 | 0 | 4 | 2 | 12 | 12 < 17 | low = 3 |
| 3 | 3 | 4 | 3 | 15 | 15 < 17 | low = 4 |
| 4 | 4 | 4 | 4 | 17 | 17 = 17 ✓ | Found! Return 4 |
4 comparisons are made. The value 17 is found at index 4.
Problem 10. (Exam-style) A school library system stores 20,000 book records. The librarian needs to: (a) search for a book by its ISBN (the catalogue is sorted by ISBN), (b) check whether a specific book ID exists in an unsorted list of 50 recently returned books, (c) find the price of a book given its ISBN in a sorted price catalogue. For each scenario, justify which search algorithm is most appropriate, stating your assumptions about the data structure and ordering.
Hint
Consider three factors for each scenario: (1) Is the data sorted? (2) How large is the dataset? (3) How many searches will be performed? The cost of sorting must be weighed against the benefit of binary search.
Answer
(a) Binary search. The ISBN catalogue is sorted and stored in an array with random access. Binary search requires at most comparisons, compared to 20,000 for linear search. This is efficient and appropriate since no preprocessing is needed.
(b) Linear search. The list of 50 recently returned books is unsorted and small. Linear search takes at most 50 comparisons — negligible cost. Sorting first would cost operations, which exceeds the 50 comparisons needed for a single search. For a single check, linear search is optimal. If many repeated searches were needed, sorting first and using binary search (7 comparisons max) would become worthwhile after approximately 6 searches ().
(c) Binary search. The price catalogue is sorted by ISBN with random access. Binary search finds the ISBN in comparisons, then retrieves the price at that index in . Linear search would require comparisons — unnecessary when the data is already sorted.
Summary:
| Scenario | Data size | Sorted? | Best algorithm | Max comparisons |
|---|---|---|---|---|
| (a) ISBN lookup | 20,000 | Yes | Binary search | 15 |
| (b) Recently returned | 50 | No | Linear search | 50 |
| (c) Price lookup | 20,000 | Yes | Binary search | 15 |
:::
:::
:::