Testing

1. Introduction to Testing

Definition

Software testing is the process of evaluating a program to determine whether it meets specified requirements and to identify defects.

Why Test?

Correctness: Verify the program produces expected outputs
Reliability: Ensure consistent behaviour under various conditions
Security: Identify vulnerabilities
Performance: Verify the program meets efficiency requirements
Compliance: Meet regulatory and safety standards

Verification vs Validation

Aspect	Verification	Validation
Question	"Are we building the product right?"	"Are we building the right product?"
Focus	Conformance to specification	Meets user needs and expectations
Activity	Reviews, inspections, walkthroughs	Testing with real-world scenarios

2. Levels of Testing

2.1 Unit Testing

Definition: Testing individual components (functions, methods, classes) in isolation.

def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0
    assert add(100, 200) == 300

Characteristics:

Written by developers
Fast to execute
Isolate dependencies using mocks and stubs
High coverage of individual code paths

2.2 Integration Testing

Definition: Testing interactions between integrated components or modules.

Approaches:

Approach	Description
Top-down	Test from the top module down, using stubs for lower modules
Bottom-up	Test from the bottom up, using drivers for higher modules
Big Bang	Integrate all modules at once and test

2.3 System Testing

Definition: Testing the complete, integrated system against its requirements.

Types:

Functional testing: Does the system do what it should?
Non-functional testing: Performance, usability, security, reliability
Regression testing: Re-run tests after changes to ensure nothing broke

2.4 Acceptance Testing

Definition: Testing by the customer or end-user to determine if the system meets their requirements.

Type	Description
Alpha testing	Testing by the development team at the developer's site
Beta testing	Testing by selected users at their own sites
User acceptance test	Formal testing to determine if requirements are met

info

Board-specific

AQA requires unit testing, integration testing, system testing, acceptance testing; requires understanding of test data (normal, boundary, erroneous, extreme)
CIE (9618) covers testing strategies; requires test plans and test data design
OCR (A) requires unit, integration, system, and acceptance testing; may require traceability between requirements and tests
Edexcel covers testing types and test data design

3. Black-Box Testing

Definition

Black-box testing tests the functionality of a system without knowledge of its internal implementation. Tests are based on requirements and specifications.

Equivalence Partitioning

Divide input data into equivalence classes — groups of inputs that the system should treat the same way. Test one representative from each class.

Example: A function accepts ages 0-120.

Equivalence class	Range	Test value
Valid	$[0, 120]$	25
Invalid (too low)	$\lt{} 0$	-1
Invalid (too high)	$\gt{} 120$	150

Boundary Value Analysis

Test values at the boundaries of equivalence classes, where errors are most likely to occur.

Rules: Test the boundary value, and the values immediately above and below.

Example: Age range $[0, 120]$ :

Boundary	Test values
Lower	-1, 0, 1
Upper	119, 120, 121

Why boundaries? Off-by-one errors are among the most common programming mistakes. If a developer writes age < 120 instead of age <= 120, boundary testing catches it immediately.

Decision Table Testing

Create a table listing all combinations of conditions and the expected actions.

Condition 1	Condition 2	Action
True	True	A
True	False	B
False	True	C
False	False	D

State Transition Testing

Test the transitions between states of a system that can be in different states.

Example: A login system has states: Logged Out → Authenticating → Logged In → Locked.

4. White-Box Testing

Definition

White-box (structural) testing uses knowledge of the internal code structure to design tests. Tests are based on code paths, branches, and conditions.

Statement Coverage

Definition: The percentage of executable statements that have been executed by the test suite.

$\mathrm{Statement coverage} = \frac◆LB◆\mathrm{Statements executed}◆RB◆◆LB◆\mathrm{Total statements}◆RB◆ \times 100\%$

Branch (Decision) Coverage

Definition: The percentage of decision outcomes (true/false branches) that have been taken.

$\mathrm{Branch coverage} = \frac◆LB◆\mathrm{Branches taken}◆RB◆◆LB◆\mathrm{Total branches}◆RB◆ \times 100\%$

Theorem. 100% statement coverage does not imply 100% branch coverage.

Proof. Consider:

if condition:
    x = 1
x = 2

A single test with condition = True achieves 100% statement coverage (all 2 statements executed) but only 50% branch coverage (the false branch of the if-statement is never taken). $\square$

Path Coverage

Definition: The percentage of distinct execution paths through the code.

Theorem. 100% path coverage is infeasible for programs with loops (exponentially many paths).

5. Test-Driven Development (TDD)

Process

Red: Write a failing test for the desired functionality
Green: Write the minimum code to make the test pass
Refactor: Improve the code while keeping tests green

Benefits

Forces consideration of the interface before implementation
Comprehensive test suite as a by-product
Confidence in refactoring
Self-documenting code through tests

6. Traceability

Definition

Traceability links requirements to test cases, ensuring every requirement is tested and every test case maps to a requirement.

Requirement → Design → Code → Test Case → Test Result

A traceability matrix maps each requirement to the test cases that verify it.

Problem Set

Problem 1. A function calculate_discount(price, age) applies a discount based on age:

Children (0-12): 50% discount
Seniors (65+): 30% discount
Teenagers (13-17): 10% discount
Adults (18-64): no discount

Using equivalence partitioning and boundary value analysis, identify all test cases.

Answer

Equivalence classes:

Class	Range	Test value
Child	$[0, 12]$	6
Teen	$[13, 17]$	15
Adult	$[18, 64]$	40
Senior	$[65, \infty)$	70
Invalid (negative)	$\lt{} 0$	-1

Boundary value analysis:

Boundary	Values
0	-1, 0, 1
12/13	12, 13, 14
17/18	17, 18, 19
64/65	64, 65, 66

Total test cases: 5 (equivalence) + 12 (boundary) = 17 (some overlap).

Problem 2. Explain the difference between a stub and a mock in unit testing.

Answer

A stub is a simple replacement for a dependency that returns predefined responses. It provides canned answers to calls.

A mock is a more sophisticated replacement that verifies how it was called — it records the calls and can assert that specific methods were called with specific arguments.

Feature	Stub	Mock
Purpose	Provide test data	Verify interactions
Asserts	On return values	On method calls
Complexity	Simple	More complex
Example	Fake database returning fixed records	Verify `send_email()` was called once

Problem 3. Consider the following code. What is the minimum number of test cases to achieve 100% branch coverage?

def classify(x, y):
    if x > 0:
        if y > 0:
            return "Q1"
        else:
            return "Q4"
    else:
        if y > 0:
            return "Q2"
        else:
            return "Q3"

Answer

There are 2 decision points, each with 2 branches → 4 branches total.

2 test cases achieve 100% branch coverage:

classify(1, 1) → "Q1" (both conditions true)
classify(-1, -1) → "Q3" (both conditions false)

This covers all 4 branches: x > 0 (true and false), y > 0 (true and false).

However, for 100% path coverage, we need 4 test cases (one per quadrant):

(1, 1) → Q1
(-1, 1) → Q2
(-1, -1) → Q3
(1, -1) → Q4

Problem 4. Write unit tests for a stack's push, pop, and peek operations. Include edge cases.

Answer

def test_stack():
    s = ArrayStack(5)

    s.push(10)
    assert s.peek() == 10
    assert s.size() == 1

    s.push(20)
    assert s.peek() == 20
    assert s.pop() == 20
    assert s.peek() == 10

    assert s.pop() == 10
    assert s.is_empty()

    try:
        s.pop()
        assert False, "Should have raised"
    except Exception:
        pass

    try:
        s.peek()
        assert False, "Should have raised"
    except Exception:
        pass

Tests cover: push/peek, push/pop order, empty after all pops, pop from empty, peek from empty.

Problem 5. Explain why 100% statement coverage does not guarantee bug-free code. Give a concrete example.

Answer

100% statement coverage means every line of code has been executed at least once, but it does not guarantee:

All combinations of conditions are tested
All data flows are tested
All timing/ordering issues are caught
Integration issues between modules are found

Example:

def process(data):
    result = []
    for item in data:
        if item > 0:
            result.append(item * 2)
        result.append(item)
    return result

Test: process([3]) → [6, 3]. Statement coverage: 100% (all lines executed). But this doesn't test:

Negative items (different branch)
Empty list (edge case)
Zero (boundary)

Problem 6. Describe the difference between top-down and bottom-up integration testing. What are the advantages of each?

Answer

Top-down integration:

Start with the top-level module and integrate downward
Lower-level modules are replaced by stubs (simple stand-ins)
Advantages: High-level design flaws are found early; the system skeleton is visible early
Disadvantages: Stubs may not represent lower modules accurately; testing lower modules in isolation is difficult

Bottom-up integration:

Start with the lowest-level modules and integrate upward
Higher-level modules are replaced by drivers (test harnesses)
Advantages: Low-level modules are thoroughly tested; drivers are simpler than stubs
Disadvantages: The complete system is not visible until late; interface defects between high-level modules may be found late

Problem 7. Create a decision table for a login system where a user can be:

Valid or invalid
Have correct or incorrect password
Account may be locked (after 3 failed attempts)

Answer

Rule	User	Password	Locked	Action
1	Valid	Correct	No	Login success
2	Valid	Incorrect	No	Show error, increment attempts
3	Valid	Correct	Yes	Show "account locked"
4	Valid	Incorrect	Yes	Show "account locked"
5	Invalid	Any	Any	Show "user not found"

Rules 3, 4, 5 could potentially be merged (locked or invalid user always shows an error), but for completeness, they're listed separately.

Problem 8. Explain the concept of regression testing and why it is necessary in iterative development.

Answer

Regression testing is the re-execution of existing test cases after a code change to verify that previously working functionality has not been broken (regressed).

Why necessary in iterative development:

Each sprint modifies existing code → risk of breaking existing features
New features may interact with old features in unexpected ways
Refactoring (improving code structure without changing behaviour) must not introduce bugs
Without regression testing, each iteration could degrade quality, making the system increasingly unstable

Best practices:

Automate regression tests (run them as part of CI/CD pipeline)
Prioritise tests for critical functionality
Run a subset of tests after each change (smoke tests) and the full suite nightly
Use version control to track which tests fail after each change

For revision on software development, see SDLC.

7. Worked Examples: Writing Test Cases

Worked Example: Boundary Value Analysis for a Password Validator

A system requires passwords to be 8-20 characters long, containing at least one uppercase letter, one digit, and one special character.

Boundary value analysis for length:

Boundary	Values	Expected
Min length (8)	7, 8, 9	Reject, Accept, Accept
Max length (20)	19, 20, 21	Accept, Accept, Reject

Equivalence partitioning for character requirements:

Class	Test input	Expected
Valid password	`Abcdef1!`	Accept
No uppercase	`abcdef1!`	Reject
No digit	`Abcdefgh`	Reject
No special char	`Abcdefg1`	Reject
Too short	`Abc1!`	Reject
Too long	`Abcdefghijklmnopqr1!x` (22 chars)	Reject

Worked Example: Equivalence Partitioning for a Date Validator

A function accepts dates in the format DD/MM/YYYY where the year must be between 1900 and 2100.

Equivalence classes:

Class	Description	Test value	Expected
Valid date	Day 1-31, month 1-12, year 1900-2100	`15/06/2024`	Accept
Invalid day	Day > 31 or < 1	`32/01/2024`	Reject
Invalid month	Month > 12 or < 1	`15/13/2024`	Reject
Year too early	Year < 1900	`15/06/1899`	Reject
Year too late	Year > 2100	`15/06/2101`	Reject
Invalid format	Wrong separator or order	`06-15-2024`	Reject

Boundary values for year: 1899, 1900, 1901 and 2099, 2100, 2101

Additional boundary values for day: Test February 29 (leap year): 29/02/2024 (accept), 29/02/2023 (reject)

Worked Example: Test Cases for a Stack with Fixed Size

A stack has a maximum capacity of 5 elements. Operations: push(item), pop(), peek(), is_empty(), is_full().

Test case	Input	Expected output
Push single item	`push(10)`	Stack: `[10]`, size = 1
Push to full	`push` 5 items, then `push(6)`	Error/exception
Pop from full	Push 3 items, `pop()`	Returns 3rd item, size = 2
Pop from empty	`pop()` on empty stack	Error/exception
Peek does not remove	`push(10)`, `peek()`, size	Returns 10, size = 1
Is empty	New stack, `is_empty()`	True
Is full	Push 5 items, `is_full()`	True
Pop then push	Push 5, pop 1, push 1	Stack has 5 items, no error
Order preserved	Push 1, 2, 3; pop twice	Returns 3, then 2

8. Test-Driven Development Workflow

Detailed TDD Cycle

Write failing test → See it fail (Red) → Write minimum code → See it pass (Green) → Refactor → Repeat

Worked Example: TDD for an `is_prime` Function

Step 1 (Red): Write a test.

def test_is_prime():
    assert is_prime(2) == True
    assert is_prime(17) == True
    assert is_prime(1) == False
    assert is_prime(4) == False

Run the test — it fails because is_prime does not exist.

Step 2 (Green): Write the minimum code to pass.

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

Run the test — all tests pass.

Step 3 (Refactor): The loop can be optimised.

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(n**0.5) + 1, 2):
        if n % i == 0:
            return False
    return True

Run the test — still passes. The refactored version is more efficient.

Step 4 (Add more tests): Test edge cases discovered during refactoring.

def test_is_prime():
    assert is_prime(2) == True
    assert is_prime(3) == True
    assert is_prime(17) == True
    assert is_prime(1) == False
    assert is_prime(0) == False
    assert is_prime(-5) == False
    assert is_prime(4) == False
    assert is_prime(9) == False
    assert is_prime(49) == False

Benefits of TDD in Practice

Benefit	Explanation
Design improvement	Writing tests first forces you to think about the interface before implementation
Comprehensive coverage	Every feature has at least one test (written before the feature)
Safe refactoring	Existing tests catch regressions when code is changed
Living documentation	Tests demonstrate how the code is intended to be used
Smaller code	Writing only enough code to pass the test discourages over-engineering

9. Common Pitfalls

Pitfall	Explanation	Avoidance
Testing only the happy path	Edge cases and error conditions are where most bugs hide	Include boundary values, invalid inputs, and empty inputs
Writing tests after the code	Tests become biased toward the implementation, not the specification	Use TDD or write tests from the requirements document
Insufficient branch coverage	A single test passing through `if` does not test the `else`	Use branch coverage analysis to identify untested paths
Test interdependence	Tests that depend on execution order produce false passes/failures	Each test should set up its own state and clean up after itself
Ignoring non-functional testing	Performance, security, and usability bugs reach production	Include load tests, security scans, and user acceptance tests
Over-reliance on code coverage	100% coverage does not mean 100% correctness	Combine coverage metrics with manual test design (equivalence partitioning, BVA)

10. Additional Problem Set

Problem 1. A function calculate_bmi(weight_kg, height_m) returns a BMI category:

BMI < 18.5: "Underweight"
18.5 <= BMI < 25: "Normal"
25 <= BMI < 30: "Overweight"
BMI >= 30: "Obese"

Using boundary value analysis, identify all boundary test cases.

Answer

Boundaries are at BMI values: 18.5, 25, and 30.

For each boundary, test the value, and the values immediately above and below:

Boundary	Test values	Expected
18.5	18.4, 18.5, 18.6	Underweight, Normal, Normal
25.0	24.9, 25.0, 25.1	Normal, Overweight, Overweight
30.0	29.9, 30.0, 30.1	Overweight, Obese, Obese

To produce these BMIs from weight and height, choose fixed height (e.g., 1.7m) and calculate the corresponding weights.

For height = 1.7m: BMI = weight / (1.7 * 1.7) = weight / 2.89

Target BMI	Weight (kg)
18.4	53.2
18.5	53.5
18.6	53.8
24.9	72.0
25.0	72.3
25.1	72.5
29.9	86.4
30.0	86.7
30.1	87.0

Problem 2. A function merge_sorted(a, b) merges two sorted arrays into one sorted array. Write a set of test cases using equivalence partitioning.

Answer

Equivalence classes for inputs:

Class	Description	Test input	Expected
Both non-empty	Normal case	`[1, 3, 5]`, `[2, 4, 6]`	`[1, 2, 3, 4, 5, 6]`
First empty	Edge case	`[]`, `[1, 2]`	`[1, 2]`
Second empty	Edge case	`[1, 2]`, `[]`	`[1, 2]`
Both empty	Edge case	`[]`, `[]`	`[]`
Single element each	Small case	`[1]`, `[2]`	`[1, 2]`
Duplicate values	Data with repeats	`[1, 2, 2]`, `[2, 3]`	`[1, 2, 2, 2, 3]`
Negative values	Include negatives	`[-3, -1]`, `[-2, 0]`	`[-3, -2, -1, 0]`
Different lengths	Unequal arrays	`[1]`, `[2, 3, 4, 5]`	`[1, 2, 3, 4, 5]`
One contains all smaller	No interleaving needed	`[1, 2]`, `[3, 4]`	`[1, 2, 3, 4]`

Problem 3. Explain why a test that achieves 100% branch coverage might still miss a bug. Provide a concrete code example.

Answer

Consider a function that calculates the area of a triangle given three side lengths using Heron's formula:

def triangle_area(a, b, c):
    if a + b > c and b + c > a and a + c > b:
        s = (a + b + c) / 2
        area = (s * (s - a) * (s - b) * (s - c)) ** 0.5
        return area
    return -1

Two test cases achieve 100% branch coverage:

triangle_area(3, 4, 5) returns 6.0 (takes the true branch)
triangle_area(1, 2, 10) returns -1 (takes the false branch)

Both branches are covered. But the following issues are not caught:

Negative side length: triangle_area(-3, 4, 5) — the condition -3 + 4 > 5 is 1 > 5 which is false, so it returns -1. But triangle_area(-3, -4, -5) — the condition -3 + (-4) > -5 is -7 > -5 which is false. These happen to be handled, but only by accident — the function does not explicitly validate for negative inputs.

The key insight: branch coverage tests control flow, not data ranges, not data types, not arithmetic properties. A function can have correct control flow but incorrect logic for specific data values.

Problem 4. Describe how you would apply TDD to develop a function that converts a Roman numeral string to an integer.

Answer

Step 1: Write the simplest failing tests.

def test_roman_to_int():
    assert roman_to_int("I") == 1
    assert roman_to_int("V") == 5
    assert roman_to_int("X") == 10

Step 2: Minimum code to pass.

def roman_to_int(s):
    values = {"I": 1, "V": 5, "X": 10}
    return values[s]

Step 3: Add more tests.

    assert roman_to_int("III") == 3
    assert roman_to_int("IV") == 4
    assert roman_to_int("IX") == 9
    assert roman_to_int("LVIII") == 58
    assert roman_to_int("MCMXCIV") == 1994

Step 4: Expand implementation.

def roman_to_int(s):
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for i in range(len(s)):
        if i + 1 < len(s) and values[s[i]] < values[s[i + 1]]:
            total -= values[s[i]]
        else:
            total += values[s[i]]
    return total

Step 5: Refactor and add edge case tests.

    assert roman_to_int("") == 0

Each TDD cycle adds a test case that captures a new requirement (repeated characters, subtractive notation, multi-character numerals), then implements just enough code to satisfy it.

Problem 5. A software team has 500 test cases. Running all tests takes 2 hours. After a code change, the team only wants to run the tests most likely to fail. Describe a strategy for selecting which tests to run.

Answer

Test prioritisation strategies:

Impact analysis: Identify which modules were changed and run only the tests that cover those modules. If function X was modified, run all tests that call X directly or indirectly.
Test categorisation:
- Smoke tests (5 minutes): Core functionality — login, database connection, API health. Run always.
- Regression tests (30 minutes): Tests for previously-fixed bugs. Run after each commit.
- Full suite (2 hours): Run nightly or before release.
Historical failure rate: Prioritise tests that have failed most frequently in the past. Tests that always pass are less likely to catch new bugs.
Code change proximity: Tests for code that is "close" (in the call graph) to the changed code are more likely to fail than tests for unrelated modules.
Risk-based selection: If the change is to authentication code, prioritise all security-related tests. If the change is to the UI, prioritise UI tests.

The most practical approach: run smoke tests on every commit, run affected module tests on every pull request, and run the full suite nightly.

:::

1. Introduction to Testing​

Definition​

Why Test?​

Verification vs Validation​

2. Levels of Testing​

2.1 Unit Testing​

2.2 Integration Testing​

2.3 System Testing​

2.4 Acceptance Testing​

3. Black-Box Testing​

Definition​

Equivalence Partitioning​

Boundary Value Analysis​

Decision Table Testing​

State Transition Testing​

4. White-Box Testing​

Definition​

Statement Coverage​

Branch (Decision) Coverage​

Path Coverage​

5. Test-Driven Development (TDD)​

Process​

Benefits​

6. Traceability​

Definition​

Problem Set​

7. Worked Examples: Writing Test Cases​

Worked Example: Boundary Value Analysis for a Password Validator​

Worked Example: Equivalence Partitioning for a Date Validator​

Worked Example: Test Cases for a Stack with Fixed Size​

8. Test-Driven Development Workflow​

Detailed TDD Cycle​

Worked Example: TDD for an is_prime Function​

Benefits of TDD in Practice​

9. Common Pitfalls​

10. Additional Problem Set​

1. Introduction to Testing

Definition

Why Test?

Verification vs Validation

2. Levels of Testing

2.1 Unit Testing

2.2 Integration Testing

2.3 System Testing

2.4 Acceptance Testing

3. Black-Box Testing

Definition

Equivalence Partitioning

Boundary Value Analysis

Decision Table Testing

State Transition Testing

4. White-Box Testing

Definition

Statement Coverage

Branch (Decision) Coverage

Path Coverage

5. Test-Driven Development (TDD)

Process

Benefits

6. Traceability

Definition

Problem Set

7. Worked Examples: Writing Test Cases

Worked Example: Boundary Value Analysis for a Password Validator

Worked Example: Equivalence Partitioning for a Date Validator

Worked Example: Test Cases for a Stack with Fixed Size

8. Test-Driven Development Workflow

Detailed TDD Cycle

Worked Example: TDD for an `is_prime` Function

Benefits of TDD in Practice

9. Common Pitfalls

10. Additional Problem Set