Data Representation in Programming
1. Primitive Types and Their Representation
Integer Representation
Programming languages provide integer types of various sizes:
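| Type | Size | Range (signed) | Range (unsigned) |
|---|---|---|---|
| byte | 8 bits | $-2^{7}$ to $2^{7}-1$ ($-128$ to $127$) | $0$ to $2^{8}-1$ ($255$) |
| short | 16 bits | $-2^{15}$ to $2^{15}-1$ ($-32{,}768$ to $32{,}767$) | $0$ to $2^{16}-1$ ($65{,}535$) |
| int | 32 bits | $-2^{31}$ to $2^{31}-1$ (about $\pm 2.1 \times 10^{9}$) | $0$ to $2^{32}-1$ (about $4.3 \times 10^{9}$) |
| long | 64 bits | $-2^{63}$ to $2^{63}-1$ (about $\pm 9.2 \times 10^{18}$) | $0$ to $2^{64}-1$ (about $1.8 \times 10^{19}$) |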
Python integers have arbitrary precision — they grow to accommodate any value, limited only by available memory.
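The ranges of fixed-width integers follow directly from the bit width. A minimal sketch, assuming two's complement for the signed case (`int_range` is a hypothetical helper name, not a standard function):

```python
def int_range(bits, signed=True):
    """Return (min, max) for a fixed-width integer of the given size."""
    if signed:
        # Two's complement: half the patterns are negative, one is zero.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for name, bits in [("byte", 8), ("short", 16), ("int", 32), ("long", 64)]:
    print(name, int_range(bits), int_range(bits, signed=False))
```

Because Python integers are arbitrary precision, the 64-bit limits are computed exactly without overflow.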
Floating-Point Representation
IEEE 754 double precision (64 bits): 1 sign bit, 11 exponent bits, 52 mantissa bits.
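The three fields can be inspected from Python, whose `float` is a 64-bit double on essentially all platforms. The sketch below uses the standard `struct` module; `double_fields` is a hypothetical helper name chosen for illustration.

```python
import struct

def double_fields(x):
    """Split a float into its IEEE 754 sign, exponent, and mantissa fields."""
    # Pack as a big-endian 64-bit double, then reread the same 8 bytes as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits, the fraction after the implicit leading 1
    return sign, exponent, mantissa

print(double_fields(1.0))    # (0, 1023, 0)
print(double_fields(-2.0))   # (1, 1024, 0)
```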
Precision issues:
>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False
Why: $0.1$ cannot be represented exactly in binary floating point (just as $\frac{1}{3}$ cannot be represented exactly in decimal).
Pitfall Never use == to compare floating-point numbers. Use abs(a - b) < epsilon with
a small tolerance (e.g., 1e-9).
def approx_equal(a, b, epsilon=1e-9):
    return abs(a - b) < epsilon
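A fixed absolute tolerance works well for values near 1 but can be too strict for large magnitudes, where neighbouring floats are further apart than `1e-9`. The sketch below contrasts it with the standard library's `math.isclose`, which uses a relative tolerance by default. It assumes Python 3.9 or later for `math.nextafter`.

```python
import math

def approx_equal(a, b, epsilon=1e-9):
    return abs(a - b) < epsilon

# Two neighbouring doubles near one billion: as close as two distinct floats can be.
a = 1e9
b = math.nextafter(a, math.inf)   # requires Python 3.9+

print(approx_equal(a, b))            # False: the gap (about 1.2e-7) exceeds 1e-9
print(math.isclose(a, b))            # True: relative tolerance, default rel_tol=1e-09
print(math.isclose(0.1 + 0.2, 0.3))  # True
```

One caveat: with its default `abs_tol=0.0`, `math.isclose` never reports a non-zero value as close to `0.0`, so comparisons against zero need an explicit `abs_tol`.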
2. Pointers and References
Definition
A pointer is a variable that stores the memory address of another variable. A reference is an alias for an existing variable.
Pointers in Low-Level Languages
int x = 42;
int *ptr = &x; // ptr stores the address of x
*ptr = 10; // dereference: change x to 10
Python's Model: References, Not Pointers
Python does not have explicit pointers. Variables are references to objects in memory.
a = [1, 2, 3]
b = a # b references the SAME list object
b[0] = 99
print(a) # [99, 2, 3] — a is also modified!
Key distinction:
| Operation | Effect |
|---|---|
| b = a | b references the same object as a |
| b = a.copy() | b references a new list (shallow copy: nested objects are still shared) |
| b = list(a) | Shallow copy, same result as a.copy() |
| import copy; b = copy.deepcopy(a) | Deep copy (nested objects are copied too) |
Aliasing
Aliasing occurs when two variables reference the same object. This can lead to unintended side effects.
def append_one(lst):
    lst.append(1)  # Modifies the original list!

my_list = [0]
append_one(my_list)
print(my_list)  # [0, 1]
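One common way to avoid this side effect is to copy the argument inside the function and return the new list. A minimal sketch (`appended_one` is a hypothetical variant of the function above):

```python
def appended_one(lst):
    """Return a new list with 1 appended; the caller's list is left untouched."""
    result = list(lst)   # shallow copy, so the append affects only the copy
    result.append(1)
    return result

my_list = [0]
new_list = appended_one(my_list)
print(my_list)    # [0]
print(new_list)   # [0, 1]
```

The trade-off is an extra copy on every call, which is usually acceptable for small lists.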
3. Strings
Definition
A string is a sequence of characters. Internally, strings are represented as arrays of character codes (e.g., UTF-8 or UTF-16).
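The difference between characters (code points) and their encoded bytes can be seen by encoding a short string. The sample text `"Hi€"` is an arbitrary choice for illustration.

```python
text = "Hi€"

print([ord(c) for c in text])          # code points: [72, 105, 8364]
print(len(text))                       # 3 characters
print(len(text.encode("utf-8")))       # 5 bytes: 1 + 1 + 3
print(len(text.encode("utf-16-le")))   # 6 bytes: 2 per character, no byte-order mark
```

UTF-8 is variable width (1 to 4 bytes per character), so it is compact for ASCII-heavy text, while UTF-16 uses 2 bytes for most characters and 4 for those outside the Basic Multilingual Plane.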
String Operations and Complexity
| Operation | Python method | Time (n, m = string lengths) |
|---|---|---|
| Access character | s[i] | $O(1)$ |
| Length | len(s) | $O(1)$ |
| Concatenation | s1 + s2 | $O(n + m)$ |
| Substring search | s1 in s2 | $O(nm)$ naive, $O(n + m)$ optimized |
| Split | s.split(sep) | $O(n)$ |
| Slice | s[a:b] | $O(b - a)$ |
Pitfall In Python, strings are immutable — you cannot modify individual characters.
s[0] = 'x' raises a TypeError. Use s = 'x' + s[1:] to create a new string.
String Immutability
Strings are immutable for several reasons:
- Security: Prevents sensitive data from being modified in memory
- Thread safety: Immutable objects are inherently thread-safe
- Hashing: Immutable strings can be used as dictionary keys (hash is stable)
- Interning: Python can reuse identical string objects, saving memory
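The hashing point can be checked directly: an immutable string works as a dictionary key, while a mutable list is rejected. A small sketch with made-up data:

```python
counts = {"apple": 3}           # a str is hashable, so it can be a key

try:
    counts[["apple"]] = 1       # a list is mutable and therefore unhashable
    error = None
except TypeError as exc:
    error = str(exc)

print(counts["apple"])
print(error)
```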
Board-specific
- AQA requires ASCII, Unicode (UTF-8, UTF-16), image representation (pixels, colour depth, resolution), and sound sampling (sample rate, bit depth).
- CIE (9618) covers similar topics but may emphasise different aspects; it requires understanding of file sizes and capacity calculations.
- OCR (A) requires character encoding, image representation, and sound representation, with specific detail on compression (lossy vs lossless).
- Edexcel covers data representation fundamentals, including number systems and character encoding.
4. File Handling
File Modes
| Mode | Description | Creates? | Truncates? |
|---|---|---|---|
| 'r' | Read | No | No |
| 'w' | Write | Yes | Yes |
| 'a' | Append | Yes | No |
| 'r+' | Read + Write | No | No |
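The behaviour in the table can be observed with a throwaway file. The sketch below works inside a temporary directory so no real files are touched; the file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with open(path, "w") as f:    # 'w' creates the file
        f.write("first\n")
    with open(path, "a") as f:    # 'a' keeps existing content and adds to the end
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    with open(path, "w") as f:    # 'w' truncates: previous content is discarded
        f.write("third\n")
    with open(path, "r") as f:
        after_write = f.read()

    try:
        open(os.path.join(tmp, "missing.txt"), "r")   # 'r' never creates a file
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_write))    # 'third\n'
print(missing_raises)       # True
```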
Reading Files
with open("data.txt", "r") as f:
    content = f.read()      # Entire file as one string
    f.seek(0)               # Rewind: read() leaves the position at end of file
    lines = f.readlines()   # List of lines, each keeping its trailing "\n"
Writing Files
with open("output.txt", "w") as f:
    f.write("Hello, World!\n")
    f.writelines(["Line 1\n", "Line 2\n"])
CSV Files
import csv

# newline="" lets the csv module handle line endings itself
with open("data.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)  # each row is a list of strings

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])
    writer.writerow(["Alice", 25])
The with Statement
The with statement ensures the file is properly closed, even if an exception occurs during file
operations. This is an example of context management.
with open("file.txt", "r") as f:
    data = f.read()
# File is automatically closed here
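The guarantee can be checked by raising an error inside the block and inspecting the file object afterwards. A minimal sketch, assuming a temporary directory and a simulated failure purely for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    try:
        with open(path, "w") as f:
            f.write("partial data")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    # f still names the file object, but the with statement has closed it.
    closed_after_error = f.closed

print(closed_after_error)   # True
```

Without `with`, the same guarantee needs an explicit `try` / `finally` that calls `f.close()`.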
5. Exception Handling
Definition
An exception is an event that disrupts the normal flow of program execution. Exception handling allows a program to detect and recover from errors gracefully.
Structure
try:
    result = 10 / 0
except ZeroDivisionError as e:
    print(f"Error: {e}")
except (TypeError, ValueError) as e:
    print(f"Type/Value error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
else:
    print("No exceptions occurred")
finally:
    print("This always runs")
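The order in which the clauses run can be traced by recording each step. In this sketch `divide` is a hypothetical function that returns both the result and the list of clauses executed:

```python
def divide(a, b):
    steps = []
    try:
        steps.append("try")
        result = a / b
    except ZeroDivisionError:
        steps.append("except")
        result = None
    else:
        steps.append("else")       # runs only if try raised nothing
    finally:
        steps.append("finally")    # runs in both cases
    return result, steps

print(divide(10, 2))   # (5.0, ['try', 'else', 'finally'])
print(divide(10, 0))   # (None, ['try', 'except', 'finally'])
```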
Exception Hierarchy
BaseException
├── SystemExit
├── KeyboardInterrupt
├── GeneratorExit
└── Exception
    ├── ArithmeticError
    │   ├── ZeroDivisionError
    │   └── OverflowError
    ├── LookupError
    │   ├── IndexError
    │   └── KeyError
    ├── OSError
    │   └── FileNotFoundError
    ├── TypeError
    ├── ValueError
    └── ...
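The parent/child relationships can be verified with `issubclass`, and they explain why catching a parent class also catches its children. `lookup` below is a hypothetical helper used only for illustration.

```python
print(issubclass(ZeroDivisionError, ArithmeticError))   # True
print(issubclass(KeyError, LookupError))                # True
print(issubclass(FileNotFoundError, OSError))           # True
print(issubclass(KeyboardInterrupt, Exception))         # False: derives from BaseException directly

def lookup(container, key):
    """Return the item, or the name of the exception that was caught."""
    try:
        return container[key]
    except LookupError as exc:   # matches both KeyError and IndexError
        return type(exc).__name__

print(lookup({"a": 1}, "b"))     # KeyError
print(lookup([1, 2, 3], 10))     # IndexError
```

This is also why `except Exception` does not swallow Ctrl+C: `KeyboardInterrupt` sits outside the `Exception` branch.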
Best Practices
- Catch specific exceptions, not bare except:
- Use finally for cleanup (closing files, releasing resources)
- Don't use exceptions for normal flow control
- Raise exceptions for truly exceptional conditions
class Person:
    def set_age(self, age):
        # Reject invalid input early instead of storing a bad value
        if age < 0:
            raise ValueError(f"Age cannot be negative: {age}")
        self._age = age
Custom Exceptions
class InsufficientFundsError(Exception):
    def __init__(self, balance, amount):
        self.balance = balance
        self.amount = amount
        super().__init__(f"Insufficient funds: balance={balance}, requested={amount}")
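A custom exception is useful because the handler can read its attributes. The sketch below repeats the class so it runs on its own; `withdraw` is a hypothetical function added for illustration.

```python
class InsufficientFundsError(Exception):
    def __init__(self, balance, amount):
        self.balance = balance
        self.amount = amount
        super().__init__(f"Insufficient funds: balance={balance}, requested={amount}")

def withdraw(balance, amount):
    """Return the new balance, or raise if the account cannot cover the amount."""
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount

try:
    withdraw(50, 80)
except InsufficientFundsError as exc:
    message = str(exc)
    shortfall = exc.amount - exc.balance   # structured data, no string parsing needed

print(message)     # Insufficient funds: balance=50, requested=80
print(shortfall)   # 30
```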
Problem Set
Problem 1. Explain why 0.1 + 0.2 != 0.3 in most programming languages. What is the binary
representation of 0.1?
Answer
$0.1$ in binary: $0.0001100110011\ldots_2$ (the block $0011$ repeats forever). This cannot be represented exactly in a finite number of binary digits. The IEEE 754 double-precision representation stores an approximation, which introduces a small rounding error. When $0.1$ and $0.2$ (both approximations) are added, the result is $0.30000000000000004$, not exactly $0.3$.
Solution: use abs(a - b) < 1e-9 for comparison, or use the decimal module for exact decimal
arithmetic.
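Both points can be seen with the standard library: `float.hex` exposes the stored binary approximation, and `Decimal` built from strings does exact decimal arithmetic. A short sketch:

```python
from decimal import Decimal

# The repeating pattern 1001 1001 ... shows up as hex 9s, rounded to 'a' at the end.
print((0.1).hex())       # 0x1.999999999999ap-4

# Converting the float to Decimal reveals the exact value that is actually stored.
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625

# Decimals built from strings are exact, so the comparison succeeds.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```

Note that `Decimal(0.1)` inherits the float's error; the string form `Decimal("0.1")` is what gives exact decimal values.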
Problem 2. What is the output of the following code? Explain.
a = [1, 2, 3]
b = a
b.append(4)
print(a)
print(a is b)
Answer
[1, 2, 3, 4]
True
b = a makes b reference the same list object as a (aliasing). Modifying b also modifies a.
a is b returns True because they reference the same object.
To avoid this: b = a.copy() or b = a[:].
Problem 3. Write a function that reads a file and counts the occurrences of each word. Handle the case where the file does not exist.
Answer
from collections import Counter

def count_words(filename):
    try:
        with open(filename, "r") as f:
            content = f.read().lower()  # lower-case so "The" and "the" match
        words = content.split()         # split on any whitespace
        return Counter(words)
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return Counter()  # empty Counter keeps the return type consistent
Problem 4. Explain the difference between shallow copy and deep copy. Give an example where they produce different results.
Answer
Shallow copy: Creates a new container but fills it with references to the same objects as the original.
Deep copy: Recursively copies all objects, creating entirely independent copies.
import copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
original[0][0] = 99
print(shallow) # [[99, 2], [3, 4]] — modified!
print(deep) # [[1, 2], [3, 4]] — unchanged
The shallow copy shares the inner lists with the original. The deep copy has independent inner lists.
Problem 5. Write a function that safely divides two numbers, handling division by zero and non-numeric input.
Answer
def safe_divide(a, b):
    try:
        result = float(a) / float(b)
        return result
    except ZeroDivisionError:
        return "Error: Division by zero"
    except (TypeError, ValueError):
        return "Error: Non-numeric input"
Problem 6. Explain why strings are immutable in Python. What are the advantages and disadvantages?
Answer
Advantages:
- Thread safety: No locks needed for string operations
- Hash stability: Strings can be dictionary keys (hash doesn't change)
- Security: Sensitive data (passwords) cannot be modified after creation
- Interning: Python can reuse identical string objects, saving memory
- Simplicity: No confusion about whether s[0] = 'x' modifies the original or a copy
Disadvantages:
- Memory overhead: Every modification creates a new string object
- Performance: Concatenation in a loop is $O(n^2)$ without optimisation
# Inefficient: O(n^2), creates a new string each iteration
s = ""
for i in range(1000):
    s += str(i)

# Efficient: O(n), use join
s = "".join(str(i) for i in range(1000))
Problem 7. A bitmap image has a resolution of pixels and uses 24-bit colour depth. Calculate the uncompressed file size in MB (using bytes). If lossless compression achieves a 3:1 ratio, what is the compressed file size in MB?
Answer
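Uncompressed: file size in bytes $= \text{width} \times \text{height} \times 24 \div 8$, then divide by the number of bytes per MB.
With 3:1 compression: compressed size $= \text{uncompressed size} \div 3$.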
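The same calculation can be sketched in code. The 1920 × 1080 resolution below is a hypothetical placeholder, not necessarily the value intended by the problem, and `image_size_bytes` is an illustrative helper name. The sketch ignores file headers and row padding.

```python
def image_size_bytes(width, height, colour_depth_bits):
    """Uncompressed bitmap size: one colour value per pixel, no header."""
    return width * height * colour_depth_bits // 8

size = image_size_bytes(1920, 1080, 24)   # hypothetical resolution
print(size)                # 6220800 bytes
print(size / 10**6)        # MB if 1 MB = 10^6 bytes
print(size / 2**20)        # MB if 1 MB = 2^20 bytes
print(size / 3 / 10**6)    # after 3:1 compression, in MB (10^6 bytes)
```

Always check which definition of a megabyte the question specifies, since the two conventions differ by about 5%.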
Problem 8. An audio file is recorded at a sample rate of with a bit depth of 16 bits, for a duration of 3 minutes in stereo (2 channels). Calculate the file size in MB (using bytes).
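Answer
File size in bytes $= \text{sample rate} \times (16 \div 8) \times 2 \times 180$, where $180$ is the duration in seconds; divide by the number of bytes per MB to convert.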
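Uncompressed audio size is the product of sample rate, bytes per sample, channel count, and duration. In the sketch below, 44,100 Hz is an assumed sample rate chosen for illustration, and `audio_size_bytes` is a hypothetical helper name.

```python
def audio_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Uncompressed PCM audio size, ignoring any file header."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

size = audio_size_bytes(44_100, 16, 2, 3 * 60)   # assumed 44.1 kHz, 16-bit stereo, 3 minutes
print(size)            # 31752000 bytes
print(size / 10**6)    # MB if 1 MB = 10^6 bytes
print(size / 2**20)    # MB if 1 MB = 2^20 bytes
```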
Problem 9. A text file contains the string "Hello, 世界!" (10 characters). ASCII uses 7 bits
per character. UTF-8 uses 1 byte for ASCII characters and 3 bytes for CJK characters. Calculate the
storage required in bytes for both encodings. Why is Unicode necessary?
Answer
ASCII: ASCII can only represent 128 characters (0–127) and cannot encode "世界". An ASCII
encoder would either raise an error or replace each CJK character with a placeholder (e.g., ?).
Counting only the encodable characters ("Hello, !"), that is 8 characters, each stored in 1 byte
(7 bits used, 1 bit spare) = 8 bytes. The CJK characters cannot be stored.
UTF-8:
| Character | Bytes |
|---|---|
| H, e, l, l, o, ,, space, ! (8 ASCII chars) | 1 byte each = 8 bytes |
| 世 | 3 bytes |
| 界 | 3 bytes |
| Total | $8 + 3 + 3 = 14$ bytes |
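The byte counts can be confirmed by encoding the string in Python. This is a quick check, not part of the expected written answer:

```python
text = "Hello, 世界!"

print(len(text))                    # 10 characters (code points)
print(len(text.encode("utf-8")))    # 14 bytes in UTF-8

# Bytes used by each character individually
per_char = [(ch, len(ch.encode("utf-8"))) for ch in text]
print(per_char)

# Strict ASCII encoding fails; errors="replace" substitutes '?' for each CJK character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
replaced = text.encode("ascii", errors="replace")
print(ascii_ok)    # False
print(replaced)    # b'Hello, ??!'
```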
Why Unicode is needed: ASCII only defines 128 characters, covering basic Latin letters, digits, and symbols. It cannot represent characters from other scripts (Chinese, Arabic, Cyrillic, etc.), most mathematical symbols, or emoji. Unicode provides a universal character set of over 149,000 characters across 161 scripts, with the aim of giving every character in every writing system a unique code point. UTF-8 is backwards compatible with ASCII, so existing ASCII text works without modification while gaining support for all other scripts.
Problem 10. A system stores 1000 images at 4K resolution () with 32-bit colour depth. Calculate the total storage required in GB (using bytes). If lossless compression achieves a 2:1 ratio, what is the compressed total size in GB?
Answer
Uncompressed size of one image:
Total for 1000 images:
With 2:1 lossless compression:
Working directly with bytes:
For revision on number representation, see Number Systems and Floating Point.