Skip to main content

Graphs

1. Graph Fundamentals

Definition

A graph G=(V,E)G = (V, E) consists of a set of vertices (nodes) VV and a set of edges EV×VE \subseteq V \times V.

  • Undirected graph: Edges have no direction; (u,v)=(v,u)(u, v) = (v, u)
  • Directed graph (digraph): Edges are ordered pairs; (u,v)(v,u)(u, v) \neq (v, u)
  • Weighted graph: Each edge has an associated weight w:ERw: E \to \mathbb{R}

Terminology

TermDefinition
AdjacentTwo vertices connected by an edge
DegreeNumber of edges incident to a vertex (undirected)
In-degreeNumber of incoming edges (directed)
Out-degreeNumber of outgoing edges (directed)
PathSequence of vertices where consecutive vertices are adjacent
CyclePath that starts and ends at the same vertex
ConnectedThere is a path between every pair of vertices
DAGDirected Acyclic Graph — a directed graph with no cycles

Theorem (Handshaking Lemma). The sum of all vertex degrees equals 2E2|E|.

Proof. Each edge contributes 1 to the degree of each of its two endpoints. Summing degrees counts each edge exactly twice. \square


2. Graph Representations

Adjacency Matrix

An n×nn \times n matrix AA where A[i][j]=1A[i][j] = 1 if edge (i,j)(i, j) exists (0 otherwise). For weighted graphs, A[i][j]=w(i,j)A[i][j] = w(i,j).

PropertyValue
SpaceO(n2)O(n^2)
Add edgeO(1)O(1)
Remove edgeO(1)O(1)
Check adjacencyO(1)O(1)
List neighboursO(n)O(n)
Iterate all edgesO(n2)O(n^2)

Best for: Dense graphs (En2|E| \approx n^2).

Adjacency List

An array of nn lists, where adj[i] contains all vertices adjacent to vertex ii.

class Graph:
def __init__(self, n, directed=False):
self.n = n
self.adj = [[] for _ in range(n)]
self.directed = directed

def add_edge(self, u, v, weight=1):
self.adj[u].append((v, weight))
if not self.directed:
self.adj[v].append((u, weight))
PropertyValue
SpaceO(n+E)O(n + \lvert E \rvert)
Add edgeO(1)O(1)
Remove edgeO(degree)O(\mathrm{degree})
Check adjacencyO(degree)O(\mathrm{degree})
List neighboursO(degree)O(\mathrm{degree})
Iterate all edgesO(n+E)O(n + \lvert E \rvert)

Best for: Sparse graphs (En2|E| \ll n^2).

Comparison

PropertyAdjacency MatrixAdjacency List
SpaceO(V2)O(V^2)O(V+E)O(V + E)
Edge lookupO(1)O(1)O(deg)O(\mathrm{deg})
Add edgeO(1)O(1)O(1)O(1)
Sparse graphsWastefulEfficient
Dense graphsEfficientSlightly slower
info

Board-specific

  • AQA requires adjacency matrix and adjacency list representations; Dijkstra's algorithm for shortest path
  • CIE (9618) requires graph representations and traversal; may include minimum spanning tree algorithms (Kruskal's, Prim's)
  • OCR (A) requires adjacency matrix/list; Dijkstra's, Kruskal's, and Prim's algorithms
  • Edexcel covers basic graph representations and traversals

3. Graph Traversals

3.1 Breadth-First Search (BFS)

BFS explores vertices level by level, using a queue.

from collections import deque

def bfs(graph, start):
visited = [False] * graph.n
distance = [-1] * graph.n
queue = deque([start])
visited[start] = True
distance[start] = 0

while queue:
u = queue.popleft()
for v, w in graph.adj[u]:
if not visited[v]:
visited[v] = True
distance[v] = distance[u] + 1
queue.append(v)

return distance

Theorem (BFS finds shortest paths). In an unweighted graph, BFS from source ss computes the shortest-path distance d(s,v)d(s, v) for every vertex vv.

Proof. We prove by induction on the distance dd that all vertices at distance dd from ss are discovered (enqueued) at distance dd, and no vertex is discovered at a distance greater than its true shortest distance.

Base case (d=0d = 0). ss is at distance 0 and is discovered at distance 0.

Inductive step. Assume all vertices at distance dd are discovered at distance dd. When a vertex uu at distance dd is dequeued, all unvisited neighbours vv are at distance at most d+1d + 1 (by edge relaxation). If vv were at distance <d+1\lt{} d + 1, it would have been discovered earlier (by the inductive hypothesis or a previous BFS level). So vv is at distance exactly d+1d + 1 and is discovered at that distance. No vertex can be discovered at distance >d+1\gt{} d + 1 through uu, since each edge adds exactly 1 to the path length. \square

Complexity: O(V+E)O(V + E) — each vertex is visited once, each edge is examined at most twice (once from each endpoint in undirected graphs).

3.2 Depth-First Search (DFS)

DFS explores as far as possible along each branch before backtracking, using a stack (or recursion).

def dfs(graph, start):
visited = [False] * graph.n

def dfs_visit(u):
visited[u] = True
for v, w in graph.adj[u]:
if not visited[v]:
dfs_visit(v)

dfs_visit(start)

Complexity: O(V+E)O(V + E) — same analysis as BFS.

BFS vs DFS

PropertyBFSDFS
Data structureQueueStack (recursion)
ExplorationLevel by levelBranch by branch
Shortest pathYes (unweighted)No
MemoryO(V)O(V) (queue)O(V)O(V) (recursion stack)
Use casesShortest path, level orderCycle detection, topological sort, connected components

4. Dijkstra's Algorithm

Problem

Find the shortest path from a source vertex ss to all other vertices in a weighted graph with non-negative weights.

Algorithm

import heapq

def dijkstra(graph, source):
dist = [float('inf')] * graph.n
dist[source] = 0
pq = [(0, source)]
visited = [False] * graph.n

while pq:
d, u = heapq.heappop(pq)
if visited[u]:
continue
visited[u] = True
for v, w in graph.adj[u]:
if not visited[v] and dist[u] + w < dist[v]:
dist[v] = dist[u] + w
heapq.heappush(pq, (dist[v], v))

return dist

Correctness Proof

Theorem. Dijkstra's algorithm computes the shortest-path distance from ss to every vertex vv in a graph with non-negative edge weights.

Proof. We prove by induction that when a vertex uu is extracted from the priority queue (marked visited), dist[u]\mathrm{dist}[u] equals the true shortest distance d(s,u)d(s, u).

Let SS be the set of visited vertices. We maintain the invariant: for every uSu \in S, dist[u]=d(s,u)\mathrm{dist}[u] = d(s, u).

Base case. ss is extracted first with dist[s]=0=d(s,s)\mathrm{dist}[s] = 0 = d(s, s). ✓

Inductive step. Let uu be the next vertex extracted. Assume for contradiction that dist[u]>d(s,u)\mathrm{dist}[u] \gt{} d(s, u). Then there exists a shortest path PP from ss to uu. Let xx be the first vertex on PP not in SS, and let yy be the predecessor of xx on PP (ySy \in S). Then:

dist[x]dist[y]+w(y,x)=d(s,y)+w(y,x)=d(s,x)d(s,u)<dist[u]\mathrm{dist}[x] \leq \mathrm{dist}[y] + w(y, x) = d(s, y) + w(y, x) = d(s, x) \leq d(s, u) < \mathrm{dist}[u]

Since dist[x]<dist[u]\mathrm{dist}[x] \lt{} \mathrm{dist}[u], xx would have been extracted from the priority queue before uu — contradiction. Therefore dist[u]=d(s,u)\mathrm{dist}[u] = d(s, u). \square

Complexity: With a binary heap: O((V+E)logV)O((V + E) \log V). Each vertex is extracted once (O(logV)O(\log V) each), and each edge causes at most one decrease-key (O(logV)O(\log V) each).

warning

warning Bellman-Ford algorithm instead for graphs that may contain negative weights.


5. Minimum Spanning Tree (MST)

Definition

A spanning tree of a connected graph GG is a subgraph that is a tree and includes all vertices. A minimum spanning tree is the spanning tree with the minimum total edge weight.

Cut Property

Lemma (Cut Property). For any cut of the graph, the minimum-weight edge crossing the cut belongs to some MST.

Proof. Let ee be the minimum-weight edge crossing cut (S,VS)(S, V \setminus S). Suppose ee is not in MST TT. Adding ee to TT creates a cycle. This cycle must cross the cut at least twice (once via ee), so there exists another edge ee' in the cycle crossing the cut. Since ee is the minimum-weight crossing edge, w(e)w(e)w(e) \leq w(e'). Replacing ee' with ee in TT yields a spanning tree with weight w(T)\leq w(T). Since TT is minimum, w(e)=w(e)w(e) = w(e'), and the new tree is also an MST containing ee. \square

Kruskal's Algorithm

def kruskal(graph):
edges = []
for u in range(graph.n):
for v, w in graph.adj[u]:
if u < v:
edges.append((w, u, v))
edges.sort()

parent = list(range(graph.n))
rank = [0] * graph.n

def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x

def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return False
if rank[rx] < rank[ry]:
rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]:
rank[rx] += 1
return True

mst = []
for w, u, v in edges:
if union(u, v):
mst.append((u, v, w))
if len(mst) == graph.n - 1:
break

return mst

Correctness. Kruskal's algorithm is correct by the cut property. When an edge ee is added, the vertices it connects are in different components — this defines a cut where ee is the minimum crossing edge (since edges are processed in sorted order). By the cut property, ee belongs to some MST.

Complexity: Sorting: O(ElogE)O(E \log E). Union-Find operations: O(Eα(V))O(E \cdot \alpha(V)), where α\alpha is the inverse Ackermann function (effectively O(1)O(1)). Total: O(ElogE)=O(ElogV)O(E \log E) = O(E \log V).

Prim's Algorithm

def prim(graph, start):
dist = [float('inf')] * graph.n
in_mst = [False] * graph.n
dist[start] = 0
pq = [(0, start)]
mst_weight = 0

while pq:
d, u = heapq.heappop(pq)
if in_mst[u]:
continue
in_mst[u] = True
mst_weight += d
for v, w in graph.adj[u]:
if not in_mst[v] and w < dist[v]:
dist[v] = w
heapq.heappush(pq, (w, v))

return mst_weight

Correctness. Prim's is correct by the cut property. At each step, the cut separates MST vertices from non-MST vertices, and the algorithm selects the minimum-weight crossing edge.

Complexity: O((V+E)logV)O((V + E) \log V) with a binary heap.


6. Topological Sort

A topological ordering of a DAG is a linear ordering of vertices such that for every directed edge (u,v)(u, v), uu comes before vv.

Algorithm: DFS-based. When a vertex finishes, prepend it to the result.

def topological_sort(graph):
visited = [False] * graph.n
result = []

def visit(u):
visited[u] = True
for v, _ in graph.adj[u]:
if not visited[v]:
visit(v)
result.append(u)

for u in range(graph.n):
if not visited[u]:
visit(u)

return result[::-1]

Complexity: O(V+E)O(V + E).


Problem Set

Problem 1. Represent the following graph using both an adjacency matrix and an adjacency list.

Vertices: {A, B, C, D}. Edges: {A-B, A-C, B-C, C-D, D-A}

Answer

Adjacency matrix (index: A=0, B=1, C=2, D=3):

(0111101011011010)\begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}

Adjacency list:

  • A: [B, C, D]
  • B: [A, C]
  • C: [A, B, D]
  • D: [C, A]

Problem 2. Trace BFS starting from vertex A on the graph from Problem 1. List the order in which vertices are visited and their distances.

Answer
StepDequeueVisitNeighboursQueue (front→rear)Distances
0[A]A:0
1AB, C, DB, C, D[B, C, D]B:1, C:1, D:1
2BA, CA (visited)[C, D]
3CA, B, DA, B (visited), D (visited)[D]
4DC, AC, A (visited)[]

Visit order: A, B, C, D. Distances: A:0, B:1, C:1, D:1.

Problem 3. Apply Dijkstra's algorithm to find the shortest paths from vertex A in the following weighted graph. Edges: A→B (4), A→C (2), B→C (1), B→D (5), C→B (1), C→D (8), C→E (10), D→E (2).

Answer
StepExtractdist[A]dist[B]dist[C]dist[D]dist[E]
00
1A (0)042
2C (2)0321012
3B (3)032812
4D (8)032810
5E (10)032810

Shortest paths: A→A: 0, A→B: 3 (A→C→B), A→C: 2, A→D: 8 (A→C→B→D), A→E: 10 (A→C→B→D→E).

Problem 4. Find the MST of the following graph using Kruskal's algorithm. Edges with weights: A-B (4), A-C (2), B-C (1), B-D (5), C-D (8), D-E (3).

Answer

Sorted edges: B-C (1), A-C (2), D-E (3), A-B (4), B-D (5), C-D (8)

EdgeWeightUnion?MST edges
B-C1Yes{B-C}
A-C2Yes{B-C, A-C}
D-E3Yes{B-C, A-C, D-E}
A-B4No (cycle A-B-C-A)
B-D5Yes{B-C, A-C, D-E, B-D}

MST weight: 1+2+3+5=111 + 2 + 3 + 5 = 11. 4 edges for 5 vertices. ✓

Problem 5. Prove that BFS uses O(V)O(V) space in the worst case.

Answer

In the worst case, all vertices at the same distance from the source are in the queue simultaneously. In a graph where the source is connected to all other vertices, at distance 1 there are V1V - 1 vertices in the queue. In a star graph, the maximum queue size is V1V - 1. In a complete graph, BFS visits one level at a time, and the maximum queue size is bounded by the number of vertices at the maximum depth, which is at most V1V - 1. Hence the space is O(V)O(V). \square

Problem 6. Given a DAG, explain why topological sort is possible but BFS-based shortest path (for unweighted graphs) might not produce correct results if cycles exist.

Answer

Topological sort is only defined for DAGs (graphs without directed cycles). If a directed cycle exists, there is no valid topological ordering because for any edge (u,v)(u, v) in the cycle, uu must come before vv, and following the cycle leads to a contradiction.

For shortest paths: in an unweighted graph with cycles, BFS still works correctly because BFS visits each vertex at most once (it marks vertices as visited). The shortest path distance is still well-defined even with cycles, since a cycle would only increase the path length. However, for weighted graphs with negative cycles, shortest paths are undefined (you can keep going around the cycle to decrease the distance).

Problem 7. A graph has 6 vertices and 9 edges. What is the sum of all vertex degrees? Is this graph necessarily connected?

Answer

By the Handshaking Lemma: sum of degrees = 2E=2×9=182|E| = 2 \times 9 = 18.

The graph is not necessarily connected. For example, it could consist of a K4K_4 (complete graph on 4 vertices, 6 edges) plus a path of 3 vertices (2 edges) plus an isolated vertex, totalling 6+2=86 + 2 = 8 edges — but we need 9 edges. A valid disconnected example: K4K_4 (6 edges) + a triangle (3 edges) = 9 edges, 7 vertices... that's too many. Actually, 6 vertices: K4K_4 (6 edges, 4 vertices) + an edge between the remaining 2 vertices (1 edge) + 2 more edges within the remaining 2 vertices is impossible. Let me reconsider: 6 vertices, 9 edges. Minimum edges for connected = 5 (tree). 9 > 5, so it could be connected but isn't necessarily connected. Example: a K4K_4 on vertices 1-4 (6 edges) and a K3K_3 on vertices 4-6... no, they share vertex 4, making it connected. Two separate components: component 1 has 4 vertices with 6 edges (K4K_4), component 2 has 2 vertices with 1 edge, but that's only 7 edges. To get 9: K4K_4 (6 edges, 4 vertices) + K3K_3 minus 1 edge = 2 edges, 3 vertices. But that requires 7 vertices. With 6 vertices: 5 in one component, 1 isolated. K5K_5 has 10 edges, too many. So with 6 vertices and 9 edges, the graph must be connected (minimum edges to disconnect would leave one isolated vertex, requiring (52)=10\leq \binom{5}{2} = 10 edges among the other 5, but 9 < 10, so it's possible: 9 edges among 5 vertices and 1 isolated). Yes, it's possible to be disconnected: 5 vertices with 9 edges + 1 isolated vertex. 9 edges among 5 vertices means the graph on those 5 vertices has 9 edges, which is possible (K5K_5 has 10). So the answer is: no, not necessarily connected.

Problem 8. Explain why Dijkstra's algorithm fails with negative edge weights. Give a counterexample.

Answer

Dijkstra's algorithm relies on the greedy choice property: once a vertex uu is extracted from the priority queue, dist[u]\mathrm{dist}[u] is assumed to be final. This is valid only when all edge weights are non-negative, because any alternative path to uu must pass through an unvisited vertex whose distance is at least dist[u]\mathrm{dist}[u].

With negative edges, a shorter path to an already-visited vertex may be discovered later through a not-yet-visited vertex, invalidating the greedy choice.

Counterexample (CLRS, Exercise 24.3-5). Consider vertices S,A,B,CS, A, B, C with edges:

S1A2B1C,S4C3BS \xrightarrow{1} A \xrightarrow{2} B \xrightarrow{1} C, \quad S \xrightarrow{4} C \xrightarrow{-3} B

Dijkstra execution:

StepExtractdist[S]dist[A]dist[B]dist[C]
00\infty\infty\infty
1S (0)01\infty4
2A (1)0134
3B (3)0134
4C (4)0134

Dijkstra outputs dist[B]=3\mathrm{dist}[B] = 3. But the true shortest path is SCB=4+(3)=1S \to C \to B = 4 + (-3) = 1. The algorithm fails because when CC is extracted at distance 4, it finds a shorter path to BB (43=14 - 3 = 1), but BB has already been extracted and marked as visited.

Recovery: Use the Bellman-Ford algorithm, which correctly handles negative edge weights by relaxing all edges V1|V| - 1 times. Bellman-Ford also detects negative-weight cycles.

Problem 9. Perform a topological sort on the following DAG: edges A→B, A→C, B→D, C→D, D→E.

Answer

Using DFS-based topological sort:

DFS from A: visit A → visit B → visit D → visit E → E finished, D finished, B finished → visit C → C finished, A finished.

Finish order (reverse): A, C, B, D, E → reversed: E, D, B, C, A... wait.

DFS finishes: E, D, B, C, A. Reverse: A, C, B, D, E.

Check: A before B ✓, A before C ✓, B before D ✓, C before D ✓, D before E ✓.

Valid topological order: A, B, C, D, E (or A, C, B, D, E).

Problem 10. Compare Kruskal's and Prim's MST algorithms in terms of time complexity for dense and sparse graphs.

Answer
Graph typeKruskalPrim (binary heap)
Sparse (EVE \approx V)O(VlogV)O(V \log V)O(VlogV)O(V \log V)
Dense (EV2E \approx V^2)O(V2logV)O(V^2 \log V)O(V2logV)O(V^2 \log V)

Kruskal: O(ElogE)=O(ElogV)O(E \log E) = O(E \log V). Prim (binary heap): O((V+E)logV)O((V+E) \log V).

For dense graphs: Prim with an adjacency matrix (no heap) runs in O(V2)O(V^2), which is better than Kruskal's O(V2logV)O(V^2 \log V).

Problem 11. Prove that a tree with nn vertices has exactly n1n - 1 edges using induction.

Answer

Proof by induction on nn.

Base case: n=1n = 1. A single vertex has 0 edges. 0=110 = 1 - 1. ✓

Inductive step: Assume all trees with kk vertices have k1k - 1 edges. Consider a tree TT with k+1k + 1 vertices. Since TT has at least 2 vertices (for k1k \geq 1), it has at least one leaf vv (a tree with 2\geq 2 vertices always has a leaf — otherwise every vertex has degree 1\geq 1, and with no cycles, we'd need n\geq n edges, contradicting E=n1|E| = n - 1). Remove leaf vv and its single incident edge. The resulting graph TT' is still a tree (removing a leaf cannot create a cycle, and TT' is still connected since vv was only connected to one vertex). TT' has kk vertices, so by the inductive hypothesis, TT' has k1k - 1 edges. Adding back vv and its edge gives (k1)+1=k(k - 1) + 1 = k edges. ✓ \square

Problem 12. Given a graph represented as an adjacency list, write a function to detect whether it contains a cycle using DFS.

Answer
def has_cycle(graph):
visited = [False] * graph.n
rec_stack = [False] * graph.n

def dfs(u):
visited[u] = True
rec_stack[u] = True
for v, _ in graph.adj[u]:
if not visited[v]:
if dfs(v):
return True
elif rec_stack[v]:
return True
rec_stack[u] = False
return False

for u in range(graph.n):
if not visited[u]:
if dfs(u):
return True
return False

The recursion stack (rec_stack) tracks the current DFS path. If we encounter a vertex that is already on the current path, we've found a back edge → cycle.

For undirected graphs, we additionally check that the back edge doesn't go to the parent vertex.

For revision on algorithms, see Graph Algorithms.

:::

:::