Graphs
1. Graph Fundamentals
Definition
A graph consists of a set of vertices (nodes) and a set of edges .
- Undirected graph: Edges have no direction;
- Directed graph (digraph): Edges are ordered pairs;
- Weighted graph: Each edge has an associated weight
Terminology
| Term | Definition |
|---|---|
| Adjacent | Two vertices connected by an edge |
| Degree | Number of edges incident to a vertex (undirected) |
| In-degree | Number of incoming edges (directed) |
| Out-degree | Number of outgoing edges (directed) |
| Path | Sequence of vertices where consecutive vertices are adjacent |
| Cycle | Path that starts and ends at the same vertex |
| Connected | There is a path between every pair of vertices |
| DAG | Directed Acyclic Graph — a directed graph with no cycles |
Theorem (Handshaking Lemma). The sum of all vertex degrees equals .
Proof. Each edge contributes 1 to the degree of each of its two endpoints. Summing degrees counts each edge exactly twice.
2. Graph Representations
Adjacency Matrix
An matrix where if edge exists (0 otherwise). For weighted graphs, .
| Property | Value |
|---|---|
| Space | |
| Add edge | |
| Remove edge | |
| Check adjacency | |
| List neighbours | |
| Iterate all edges |
Best for: Dense graphs ().
Adjacency List
An array of lists, where adj[i] contains all vertices adjacent to vertex .
class Graph:
def __init__(self, n, directed=False):
self.n = n
self.adj = [[] for _ in range(n)]
self.directed = directed
def add_edge(self, u, v, weight=1):
self.adj[u].append((v, weight))
if not self.directed:
self.adj[v].append((u, weight))
| Property | Value |
|---|---|
| Space | |
| Add edge | |
| Remove edge | |
| Check adjacency | |
| List neighbours | |
| Iterate all edges |
Best for: Sparse graphs ().
Comparison
| Property | Adjacency Matrix | Adjacency List |
|---|---|---|
| Space | ||
| Edge lookup | ||
| Add edge | ||
| Sparse graphs | Wasteful | Efficient |
| Dense graphs | Efficient | Slightly slower |
Board-specific
- AQA requires adjacency matrix and adjacency list representations; Dijkstra's algorithm for shortest path
- CIE (9618) requires graph representations and traversal; may include minimum spanning tree algorithms (Kruskal's, Prim's)
- OCR (A) requires adjacency matrix/list; Dijkstra's, Kruskal's, and Prim's algorithms
- Edexcel covers basic graph representations and traversals
3. Graph Traversals
3.1 Breadth-First Search (BFS)
BFS explores vertices level by level, using a queue.
from collections import deque
def bfs(graph, start):
visited = [False] * graph.n
distance = [-1] * graph.n
queue = deque([start])
visited[start] = True
distance[start] = 0
while queue:
u = queue.popleft()
for v, w in graph.adj[u]:
if not visited[v]:
visited[v] = True
distance[v] = distance[u] + 1
queue.append(v)
return distance
Theorem (BFS finds shortest paths). In an unweighted graph, BFS from source computes the shortest-path distance for every vertex .
Proof. We prove by induction on the distance that all vertices at distance from are discovered (enqueued) at distance , and no vertex is discovered at a distance greater than its true shortest distance.
Base case (). is at distance 0 and is discovered at distance 0.
Inductive step. Assume all vertices at distance are discovered at distance . When a vertex at distance is dequeued, all unvisited neighbours are at distance at most (by edge relaxation). If were at distance , it would have been discovered earlier (by the inductive hypothesis or a previous BFS level). So is at distance exactly and is discovered at that distance. No vertex can be discovered at distance through , since each edge adds exactly 1 to the path length.
Complexity: — each vertex is visited once, each edge is examined at most twice (once from each endpoint in undirected graphs).
3.2 Depth-First Search (DFS)
DFS explores as far as possible along each branch before backtracking, using a stack (or recursion).
def dfs(graph, start):
visited = [False] * graph.n
def dfs_visit(u):
visited[u] = True
for v, w in graph.adj[u]:
if not visited[v]:
dfs_visit(v)
dfs_visit(start)
Complexity: — same analysis as BFS.
BFS vs DFS
| Property | BFS | DFS |
|---|---|---|
| Data structure | Queue | Stack (recursion) |
| Exploration | Level by level | Branch by branch |
| Shortest path | Yes (unweighted) | No |
| Memory | (queue) | (recursion stack) |
| Use cases | Shortest path, level order | Cycle detection, topological sort, connected components |
4. Dijkstra's Algorithm
Problem
Find the shortest path from a source vertex to all other vertices in a weighted graph with non-negative weights.
Algorithm
import heapq
def dijkstra(graph, source):
dist = [float('inf')] * graph.n
dist[source] = 0
pq = [(0, source)]
visited = [False] * graph.n
while pq:
d, u = heapq.heappop(pq)
if visited[u]:
continue
visited[u] = True
for v, w in graph.adj[u]:
if not visited[v] and dist[u] + w < dist[v]:
dist[v] = dist[u] + w
heapq.heappush(pq, (dist[v], v))
return dist
Correctness Proof
Theorem. Dijkstra's algorithm computes the shortest-path distance from to every vertex in a graph with non-negative edge weights.
Proof. We prove by induction that when a vertex is extracted from the priority queue (marked visited), equals the true shortest distance .
Let be the set of visited vertices. We maintain the invariant: for every , .
Base case. is extracted first with . ✓
Inductive step. Let be the next vertex extracted. Assume for contradiction that . Then there exists a shortest path from to . Let be the first vertex on not in , and let be the predecessor of on (). Then:
Since , would have been extracted from the priority queue before — contradiction. Therefore .
Complexity: With a binary heap: . Each vertex is extracted once ( each), and each edge causes at most one decrease-key ( each).
warning Bellman-Ford algorithm instead for graphs that may contain negative weights.
5. Minimum Spanning Tree (MST)
Definition
A spanning tree of a connected graph is a subgraph that is a tree and includes all vertices. A minimum spanning tree is the spanning tree with the minimum total edge weight.
Cut Property
Lemma (Cut Property). For any cut of the graph, the minimum-weight edge crossing the cut belongs to some MST.
Proof. Let be the minimum-weight edge crossing cut . Suppose is not in MST . Adding to creates a cycle. This cycle must cross the cut at least twice (once via ), so there exists another edge in the cycle crossing the cut. Since is the minimum-weight crossing edge, . Replacing with in yields a spanning tree with weight . Since is minimum, , and the new tree is also an MST containing .
Kruskal's Algorithm
def kruskal(graph):
edges = []
for u in range(graph.n):
for v, w in graph.adj[u]:
if u < v:
edges.append((w, u, v))
edges.sort()
parent = list(range(graph.n))
rank = [0] * graph.n
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return False
if rank[rx] < rank[ry]:
rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]:
rank[rx] += 1
return True
mst = []
for w, u, v in edges:
if union(u, v):
mst.append((u, v, w))
if len(mst) == graph.n - 1:
break
return mst
Correctness. Kruskal's algorithm is correct by the cut property. When an edge is added, the vertices it connects are in different components — this defines a cut where is the minimum crossing edge (since edges are processed in sorted order). By the cut property, belongs to some MST.
Complexity: Sorting: . Union-Find operations: , where is the inverse Ackermann function (effectively ). Total: .
Prim's Algorithm
def prim(graph, start):
dist = [float('inf')] * graph.n
in_mst = [False] * graph.n
dist[start] = 0
pq = [(0, start)]
mst_weight = 0
while pq:
d, u = heapq.heappop(pq)
if in_mst[u]:
continue
in_mst[u] = True
mst_weight += d
for v, w in graph.adj[u]:
if not in_mst[v] and w < dist[v]:
dist[v] = w
heapq.heappush(pq, (w, v))
return mst_weight
Correctness. Prim's is correct by the cut property. At each step, the cut separates MST vertices from non-MST vertices, and the algorithm selects the minimum-weight crossing edge.
Complexity: with a binary heap.
6. Topological Sort
A topological ordering of a DAG is a linear ordering of vertices such that for every directed edge , comes before .
Algorithm: DFS-based. When a vertex finishes, prepend it to the result.
def topological_sort(graph):
visited = [False] * graph.n
result = []
def visit(u):
visited[u] = True
for v, _ in graph.adj[u]:
if not visited[v]:
visit(v)
result.append(u)
for u in range(graph.n):
if not visited[u]:
visit(u)
return result[::-1]
Complexity: .
Problem Set
Problem 1. Represent the following graph using both an adjacency matrix and an adjacency list.
Vertices: {A, B, C, D}. Edges: {A-B, A-C, B-C, C-D, D-A}
Answer
Adjacency matrix (index: A=0, B=1, C=2, D=3):
Adjacency list:
- A: [B, C, D]
- B: [A, C]
- C: [A, B, D]
- D: [C, A]
Problem 2. Trace BFS starting from vertex A on the graph from Problem 1. List the order in which vertices are visited and their distances.
Answer
| Step | Dequeue | Visit | Neighbours | Queue (front→rear) | Distances |
|---|---|---|---|---|---|
| 0 | — | — | — | [A] | A:0 |
| 1 | A | B, C, D | B, C, D | [B, C, D] | B:1, C:1, D:1 |
| 2 | B | A, C | A (visited) | [C, D] | — |
| 3 | C | A, B, D | A, B (visited), D (visited) | [D] | — |
| 4 | D | C, A | C, A (visited) | [] | — |
Visit order: A, B, C, D. Distances: A:0, B:1, C:1, D:1.
Problem 3. Apply Dijkstra's algorithm to find the shortest paths from vertex A in the following weighted graph. Edges: A→B (4), A→C (2), B→C (1), B→D (5), C→B (1), C→D (8), C→E (10), D→E (2).
Answer
| Step | Extract | dist[A] | dist[B] | dist[C] | dist[D] | dist[E] |
|---|---|---|---|---|---|---|
| 0 | — | 0 | ∞ | ∞ | ∞ | ∞ |
| 1 | A (0) | 0 | 4 | 2 | ∞ | ∞ |
| 2 | C (2) | 0 | 3 | 2 | 10 | 12 |
| 3 | B (3) | 0 | 3 | 2 | 8 | 12 |
| 4 | D (8) | 0 | 3 | 2 | 8 | 10 |
| 5 | E (10) | 0 | 3 | 2 | 8 | 10 |
Shortest paths: A→A: 0, A→B: 3 (A→C→B), A→C: 2, A→D: 8 (A→C→B→D), A→E: 10 (A→C→B→D→E).
Problem 4. Find the MST of the following graph using Kruskal's algorithm. Edges with weights: A-B (4), A-C (2), B-C (1), B-D (5), C-D (8), D-E (3).
Answer
Sorted edges: B-C (1), A-C (2), D-E (3), A-B (4), B-D (5), C-D (8)
| Edge | Weight | Union? | MST edges |
|---|---|---|---|
| B-C | 1 | Yes | {B-C} |
| A-C | 2 | Yes | {B-C, A-C} |
| D-E | 3 | Yes | {B-C, A-C, D-E} |
| A-B | 4 | No (cycle A-B-C-A) | — |
| B-D | 5 | Yes | {B-C, A-C, D-E, B-D} |
MST weight: . 4 edges for 5 vertices. ✓
Problem 5. Prove that BFS uses space in the worst case.
Answer
In the worst case, all vertices at the same distance from the source are in the queue simultaneously. In a graph where the source is connected to all other vertices, at distance 1 there are vertices in the queue. In a star graph, the maximum queue size is . In a complete graph, BFS visits one level at a time, and the maximum queue size is bounded by the number of vertices at the maximum depth, which is at most . Hence the space is .
Problem 6. Given a DAG, explain why topological sort is possible but BFS-based shortest path (for unweighted graphs) might not produce correct results if cycles exist.
Answer
Topological sort is only defined for DAGs (graphs without directed cycles). If a directed cycle exists, there is no valid topological ordering because for any edge in the cycle, must come before , and following the cycle leads to a contradiction.
For shortest paths: in an unweighted graph with cycles, BFS still works correctly because BFS visits each vertex at most once (it marks vertices as visited). The shortest path distance is still well-defined even with cycles, since a cycle would only increase the path length. However, for weighted graphs with negative cycles, shortest paths are undefined (you can keep going around the cycle to decrease the distance).
Problem 7. A graph has 6 vertices and 9 edges. What is the sum of all vertex degrees? Is this graph necessarily connected?
Answer
By the Handshaking Lemma: sum of degrees = .
The graph is not necessarily connected. For example, it could consist of a (complete graph on 4 vertices, 6 edges) plus a path of 3 vertices (2 edges) plus an isolated vertex, totalling edges — but we need 9 edges. A valid disconnected example: (6 edges) + a triangle (3 edges) = 9 edges, 7 vertices... that's too many. Actually, 6 vertices: (6 edges, 4 vertices) + an edge between the remaining 2 vertices (1 edge) + 2 more edges within the remaining 2 vertices is impossible. Let me reconsider: 6 vertices, 9 edges. Minimum edges for connected = 5 (tree). 9 > 5, so it could be connected but isn't necessarily connected. Example: a on vertices 1-4 (6 edges) and a on vertices 4-6... no, they share vertex 4, making it connected. Two separate components: component 1 has 4 vertices with 6 edges (), component 2 has 2 vertices with 1 edge, but that's only 7 edges. To get 9: (6 edges, 4 vertices) + minus 1 edge = 2 edges, 3 vertices. But that requires 7 vertices. With 6 vertices: 5 in one component, 1 isolated. has 10 edges, too many. So with 6 vertices and 9 edges, the graph must be connected (minimum edges to disconnect would leave one isolated vertex, requiring edges among the other 5, but 9 < 10, so it's possible: 9 edges among 5 vertices and 1 isolated). Yes, it's possible to be disconnected: 5 vertices with 9 edges + 1 isolated vertex. 9 edges among 5 vertices means the graph on those 5 vertices has 9 edges, which is possible ( has 10). So the answer is: no, not necessarily connected.
Problem 8. Explain why Dijkstra's algorithm fails with negative edge weights. Give a counterexample.
Answer
Dijkstra's algorithm relies on the greedy choice property: once a vertex is extracted from the priority queue, is assumed to be final. This is valid only when all edge weights are non-negative, because any alternative path to must pass through an unvisited vertex whose distance is at least .
With negative edges, a shorter path to an already-visited vertex may be discovered later through a not-yet-visited vertex, invalidating the greedy choice.
Counterexample (CLRS, Exercise 24.3-5). Consider vertices with edges:
Dijkstra execution:
| Step | Extract | dist[S] | dist[A] | dist[B] | dist[C] |
|---|---|---|---|---|---|
| 0 | — | 0 | |||
| 1 | S (0) | 0 | 1 | 4 | |
| 2 | A (1) | 0 | 1 | 3 | 4 |
| 3 | B (3) | 0 | 1 | 3 | 4 |
| 4 | C (4) | 0 | 1 | 3 | 4 |
Dijkstra outputs . But the true shortest path is . The algorithm fails because when is extracted at distance 4, it finds a shorter path to (), but has already been extracted and marked as visited.
Recovery: Use the Bellman-Ford algorithm, which correctly handles negative edge weights by relaxing all edges times. Bellman-Ford also detects negative-weight cycles.
Problem 9. Perform a topological sort on the following DAG: edges A→B, A→C, B→D, C→D, D→E.
Answer
Using DFS-based topological sort:
DFS from A: visit A → visit B → visit D → visit E → E finished, D finished, B finished → visit C → C finished, A finished.
Finish order (reverse): A, C, B, D, E → reversed: E, D, B, C, A... wait.
DFS finishes: E, D, B, C, A. Reverse: A, C, B, D, E.
Check: A before B ✓, A before C ✓, B before D ✓, C before D ✓, D before E ✓.
Valid topological order: A, B, C, D, E (or A, C, B, D, E).
Problem 10. Compare Kruskal's and Prim's MST algorithms in terms of time complexity for dense and sparse graphs.
Answer
| Graph type | Kruskal | Prim (binary heap) |
|---|---|---|
| Sparse () | ||
| Dense () |
Kruskal: . Prim (binary heap): .
For dense graphs: Prim with an adjacency matrix (no heap) runs in , which is better than Kruskal's .
Problem 11. Prove that a tree with vertices has exactly edges using induction.
Answer
Proof by induction on .
Base case: . A single vertex has 0 edges. . ✓
Inductive step: Assume all trees with vertices have edges. Consider a tree with vertices. Since has at least 2 vertices (for ), it has at least one leaf (a tree with vertices always has a leaf — otherwise every vertex has degree , and with no cycles, we'd need edges, contradicting ). Remove leaf and its single incident edge. The resulting graph is still a tree (removing a leaf cannot create a cycle, and is still connected since was only connected to one vertex). has vertices, so by the inductive hypothesis, has edges. Adding back and its edge gives edges. ✓
Problem 12. Given a graph represented as an adjacency list, write a function to detect whether it contains a cycle using DFS.
Answer
def has_cycle(graph):
visited = [False] * graph.n
rec_stack = [False] * graph.n
def dfs(u):
visited[u] = True
rec_stack[u] = True
for v, _ in graph.adj[u]:
if not visited[v]:
if dfs(v):
return True
elif rec_stack[v]:
return True
rec_stack[u] = False
return False
for u in range(graph.n):
if not visited[u]:
if dfs(u):
return True
return False
The recursion stack (rec_stack) tracks the current DFS path. If we encounter a vertex that is
already on the current path, we've found a back edge → cycle.
For undirected graphs, we additionally check that the back edge doesn't go to the parent vertex.
For revision on algorithms, see Graph Algorithms.
:::
:::