Fieldwork and Research

Research Methods in Geography

Primary and Secondary Data

Primary data is collected firsthand by the researcher for the specific purpose of the investigation. Methods include field observations, measurements, questionnaires, interviews, and experiments. Primary data is directly relevant to the research question but can be time-consuming and expensive to collect.

Secondary data is data that has been collected by someone else for another purpose. Sources include government statistics (Census, ONS), academic publications, maps, satellite imagery, historical records, and media reports. Secondary data is readily available and often covers large areas or long time periods, but may not be precisely relevant to the research question and may be of variable quality.

Quantitative and Qualitative Methods

Quantitative methods produce numerical data that can be analysed statistically. Examples: measurements of river velocity, questionnaires with closed questions, sediment size analysis, pedestrian counts. Quantitative data is objective, replicable, and allows comparison and generalisation, but may oversimplify complex phenomena.

Qualitative methods produce non-numerical data (words, images, observations). Examples: interviews, open-ended questionnaires, field sketches, participant observation, photographs. Qualitative data provides depth, context, and rich description but is subjective, difficult to analyse systematically, and may not be generalisable.

Research Strategies

Case studies: in-depth investigation of a single location, event, or group. Provide rich, detailed data and are useful for exploring complex phenomena. Limited generalisability.

Comparative studies: investigation of two or more cases to identify similarities, differences, and patterns. Strengthen the ability to draw general conclusions but require careful selection of cases and control of confounding variables.

Longitudinal studies: data collection over an extended period (months to years). Allow the study of change over time but are time-consuming and vulnerable to attrition.

Cross-sectional studies: data collection at a single point in time. Efficient and practical but cannot establish temporal sequences or causal relationships.

Data Collection Techniques

Physical Geography Techniques

River studies:

Width, depth, and velocity: measured at multiple points across the channel using a tape measure, ranging pole, and a flow meter (impeller) or floats. Systematic sampling across the channel provides data on channel geometry and discharge ($Q = A \times v$, where the cross-sectional area $A \approx w \times \bar{d}$ for width $w$ and mean depth $\bar{d}$, and $v$ is the mean velocity).
Sediment analysis: sediment samples collected from the riverbed, sieved to determine particle size distribution, and analysed using the Wentworth scale or phi ($\phi$) scale. Smaller, rounder particles indicate greater transport distance.
Hydraulic radius: $R = A / P$ (cross-sectional area divided by wetted perimeter). A higher hydraulic radius indicates a more efficient channel, because proportionally less water is in contact with the bed and banks, so less energy is lost to friction.
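The discharge and hydraulic radius calculations can be sketched in Python. This is a minimal illustration under stated assumptions: readings are taken at equally spaced points across the channel, cross-sectional area is approximated as width × mean depth, and the wetted perimeter is approximated as width + 2 × mean depth (treating the channel as a rectangle). A wetted perimeter measured directly along the bed would be more accurate. The function name `channel_stats` and the readings are hypothetical.

```python
from statistics import mean


def channel_stats(width_m, depths_m, velocities_ms):
    """Area, discharge and hydraulic radius for one river cross-section.

    Assumes a roughly rectangular channel: area = width x mean depth and
    wetted perimeter ~= width + 2 x mean depth (an approximation).
    """
    mean_depth = mean(depths_m)
    area = width_m * mean_depth                   # A (m^2)
    discharge = area * mean(velocities_ms)        # Q = A x v (m^3/s, "cumecs")
    wetted_perimeter = width_m + 2 * mean_depth   # P (m), rectangular approximation
    hydraulic_radius = area / wetted_perimeter    # R = A / P (m)
    return area, discharge, hydraulic_radius


# Hypothetical readings at five equally spaced points across a 4 m wide channel
area, q, r = channel_stats(4.0, [0.1, 0.3, 0.4, 0.3, 0.1], [0.2, 0.4, 0.5, 0.4, 0.2])
print(f"A = {area:.2f} m^2, Q = {q:.3f} m^3/s, R = {r:.3f} m")
```

Comparing $R$ between sites downstream gives a simple numerical check on whether channel efficiency increases with distance from the source.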

Coastal studies:

Beach profiling: using a clinometer and ranging poles to measure slope angle at regular intervals from the backshore to the low water mark. Profiles can be compared across seasons or after storm events to assess erosion and deposition.
Sediment analysis: pebble size, shape (roundness, using the Cailleux roundness index or Powers' scale of roundness), and composition assessed at regular intervals along the beach.
Longshore drift: measuring the direction and rate of sediment transport using tracer pebbles (painted or tagged pebbles released at a known point and recovered after a set period).
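Turning beach profile readings into a plotted profile is a trigonometry step. The sketch below assumes each reading is the tape distance measured along the ground between two ranging poles plus the clinometer angle between them, surveyed from the backshore seawards, with negative angles where the beach drops towards the sea. The function name `beach_profile` and the example readings are hypothetical.

```python
import math


def beach_profile(segments):
    """Convert clinometer readings into (horizontal distance, height) points.

    segments: list of (slope_distance_m, angle_deg) between successive ranging
    poles, from the backshore seawards. Negative angles mean the beach drops
    towards the sea. Heights are relative to the first pole.
    """
    x, z = 0.0, 0.0
    points = [(x, z)]
    for distance, angle in segments:
        radians = math.radians(angle)
        x += distance * math.cos(radians)   # horizontal component of the segment
        z += distance * math.sin(radians)   # rise (+) or fall (-) of the segment
        points.append((round(x, 2), round(z, 2)))
    return points


# Hypothetical profile: three segments surveyed from the backshore seawards
print(beach_profile([(5.0, -10), (5.0, -5), (10.0, -2)]))
```

Plotting the returned points for surveys before and after a storm shows where material has been eroded or deposited.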

Ecosystem studies:

Quadrat sampling: placing a quadrat ($0.5\;\mathrm{m} \times 0.5\;\mathrm{m}$ or $1\;\mathrm{m} \times 1\;\mathrm{m}$) at systematic or random intervals within the study area to record species presence, abundance, or percentage cover.
Transect sampling: recording data along a line (belt transect or line transect) across an environmental gradient (e.g., from low water mark to cliff top). Useful for studying zonation.
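Biotic indices: using indicator species to assess environmental quality (e.g., freshwater invertebrates as indicators of water pollution, since pollution-sensitive species such as stonefly nymphs disappear where organic pollution raises the Biological Oxygen Demand (BOD) and lowers dissolved oxygen; lichen species as indicators of air quality).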

Human Geography Techniques

Questionnaires: structured instruments using closed questions (Likert scales, multiple choice) for quantitative data, and open questions for qualitative data. Can be administered in person, by post, or online. Advantages: efficient, standardised, anonymous. Limitations: low response rates, social desirability bias, limited depth.

Interviews: semi-structured or unstructured interviews allow in-depth exploration of attitudes, perceptions, and experiences. Can reveal unexpected insights but are time-consuming, subjective, and difficult to analyse systematically.

Observations: systematic (structured, quantitative) or participant (qualitative, immersive) observation. Observation avoids the bias of self-report data but is limited to observable behaviour and may be influenced by the observer's presence (Hawthorne effect).

Pedestrian and traffic counts: counting people or vehicles at specific locations at set times to assess patterns of movement, land use, and accessibility. Provides objective quantitative data but captures only a snapshot.

Secondary data sources: Census data (population, employment, housing), deprivation indices (Index of Multiple Deprivation), crime statistics, health data, economic indicators.

Sampling Strategies

Random sampling: every member of the target population has an equal chance of selection. Eliminates sampling bias but requires a complete list of the population and may produce an unrepresentative sample by chance.

Systematic sampling: selecting every $n$th item from a list at regular intervals. Simple and ensures even coverage but may coincide with a periodic pattern in the data.

Stratified sampling: the population is divided into strata based on relevant characteristics, and samples are drawn proportionally from each stratum. Ensures representativeness but requires knowledge of the population's composition.

Opportunity sampling: selecting whatever is readily available. Quick and convenient but likely to produce a biased sample.

Cluster sampling: the population is divided into clusters (e.g., geographical areas), and a random sample of clusters is selected. All members of selected clusters are studied (or a random sample within each cluster). Practical for large or dispersed populations.
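Three of the strategies above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a complete numbered sampling frame; the function names and the house data are hypothetical. In the systematic version, `start` should be less than the interval, and a random start within the first interval is common practice.

```python
import random


def random_sample(population, k, seed=None):
    """Random sampling: every member has an equal chance of selection."""
    return random.Random(seed).sample(population, k)


def systematic_sample(population, k, start=0):
    """Systematic sampling: every nth member, where n = population size // k."""
    interval = len(population) // k
    return population[start::interval][:k]


def stratified_sample(strata, k, seed=None):
    """Stratified sampling: draw from each stratum in proportion to its size.

    `strata` maps a stratum name to a list of its members. Allocations are
    rounded, so the total may differ slightly from k.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = []
    for members in strata.values():
        share = round(k * len(members) / total)
        chosen.extend(rng.sample(members, share))
    return chosen


# Hypothetical sampling frame: 100 numbered houses of three types
houses = list(range(1, 101))
house_types = {"terraced": houses[:60], "semi-detached": houses[60:90], "detached": houses[90:]}
print(random_sample(houses, 10, seed=1))
print(systematic_sample(houses, 10))
print(stratified_sample(house_types, 10, seed=1))
```

The stratified version draws 6, 3, and 1 houses from the three types, mirroring their 60:30:10 share of the frame.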

Data Analysis and Presentation

Graphical Techniques

Bar charts: compare discrete categories
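Histograms: display the distribution of continuous data (bars touch; bar area represents frequency, so with unequal class widths the bar height is the frequency density, i.e., frequency ÷ class width)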
Line graphs: show trends over time or along a transect
Scatter graphs: display the relationship between two continuous variables; a line of best fit can be drawn to assess correlation
Pie charts: show proportional composition (limited to a small number of categories)
Choropleth maps: display spatial variation in a variable using shading or colour intensity
Proportional symbol maps: symbols (circles, squares) sized in proportion to the data value at each location
Triangular graphs: display three variables simultaneously (e.g., soil composition: sand, silt, clay)
Radial diagrams: display data as sectors radiating from a central point, useful for comparing multiple variables at one location
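For a histogram with unequal class widths, each bar's height is the frequency density (frequency divided by class width), so that bar area stays proportional to frequency. A minimal sketch, with hypothetical pebble-length classes and a hypothetical function name:

```python
def frequency_density(classes):
    """classes: list of (lower, upper, frequency) tuples.

    Returns (lower, upper, density) where density = frequency / class width.
    """
    return [(lower, upper, freq / (upper - lower)) for lower, upper, freq in classes]


# Hypothetical pebble long-axis lengths (cm) grouped into unequal classes
pebbles = [(0, 2, 4), (2, 4, 10), (4, 8, 12), (8, 16, 8)]
for lower, upper, density in frequency_density(pebbles):
    print(f"{lower}-{upper} cm: frequency density {density:.1f}")
```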

Descriptive Statistics

Measures of central tendency:

Mean: $\bar{x} = \frac{\sum x_i}{n}$. Uses all data points but affected by outliers.
Median: the middle value when data are sorted. Robust to outliers.
Mode: the most frequent value. Useful for categorical data.
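Python's standard `statistics` module computes all three measures. The hypothetical pedestrian counts below include one outlier, which pulls the mean well above the median.

```python
from statistics import mean, median, mode

# Hypothetical pedestrian counts at seven sites; 95 is an outlier (e.g., outside a station)
counts = [12, 15, 15, 18, 20, 22, 95]

print(f"mean   = {mean(counts):.1f}")   # pulled upwards by the outlier
print(f"median = {median(counts)}")     # middle value of the sorted data
print(f"mode   = {mode(counts)}")       # most frequent value
```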

Measures of dispersion:

Range: maximum value minus minimum value. Simple but affected by outliers.
Interquartile range (IQR): $Q_3 - Q_1$. Robust to outliers; represents the middle $50\%$ of the data.
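Standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$. Measures the typical spread of values around the mean; uses all data points. When a sample is used to estimate the spread of a wider population, $n - 1$ is often used in place of $n$.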
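The three measures of dispersion can be sketched with the `statistics` module. One assumption to flag: textbooks and software use different quartile conventions, and this sketch uses Python's default `'exclusive'` method, so a hand calculation may give a slightly different IQR. `pstdev` divides by $n$, matching the $\sigma$ formula in this section. The function name and data are hypothetical.

```python
from statistics import pstdev, quantiles


def dispersion(data):
    """Return (range, interquartile range, standard deviation) for a list of numbers."""
    values = sorted(data)
    data_range = values[-1] - values[0]
    # Quartile conventions differ; this uses Python's default 'exclusive' method
    q1, _, q3 = quantiles(values, n=4)
    sd = pstdev(values)   # divides by n (population standard deviation)
    return data_range, q3 - q1, sd


# Hypothetical pebble long-axis lengths (cm)
print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))
```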

Inferential Statistics

Spearman's Rank Correlation Coefficient ($r_s$)

Measures the strength and direction of the monotonic relationship between two ranked variables.

$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$

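Where $d$ is the difference in ranks for each pair of observations and $n$ is the number of pairs. Both variables must be ranked in the same direction (e.g., smallest value = rank 1), and tied values share the mean of the ranks they would otherwise occupy. $r_s$ ranges from $-1$ (perfect negative correlation) to $+1$ (perfect positive correlation). To test significance, compare the absolute value of $r_s$ with the critical value for $n$ at a chosen significance level (e.g., $p < 0.05$); if it equals or exceeds the critical value, the correlation is significant.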
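The ranking and the formula can be checked with a short Python sketch. It ranks both variables in ascending order and gives tied values the mean of their ranks; with many ties the $r_s$ formula is only an approximation. The function names are hypothetical, and the data are those of Problem 1 in this section.

```python
def rank(values):
    """Rank values from smallest (rank 1) to largest.

    Tied values share the mean of the ranks they would otherwise occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' across a run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s from the formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Problem 1 data: distance from the city centre (km) and house price index
distance = [1, 3, 5, 8, 12, 15]
price = [95, 80, 65, 55, 40, 30]
print(spearman_rs(distance, price))
```

Because both variables are ranked in the same direction, a perfect inverse relationship gives $r_s = -1$, not $+1$.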

Mann-Whitney U Test

A non-parametric test for comparing two independent samples. Tests whether one sample tends to have larger values than the other.

$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$

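Where $n_1$ and $n_2$ are the sample sizes and $R_1$ is the sum of ranks in sample $1$, with both samples ranked together as one combined set. The two possible $U$ values always sum to $n_1 n_2$, so the other is $U' = n_1 n_2 - U$. Take the smaller of $U$ and $U'$ and compare it with the critical value: unlike $r_s$ and $\chi^2$, the result is significant if the calculated value is less than or equal to the critical value.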
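A minimal Python sketch of the calculation, assuming two hypothetical samples (e.g., pebble lengths in cm at two beaches). It ranks the combined data with average ranks for ties, applies the formula for $U$, and returns the smaller of the two $U$ values for comparison with a critical value table. The function name is hypothetical.

```python
def mann_whitney_u(sample1, sample2):
    """Return the smaller U value for two independent samples."""
    combined = sorted(sample1 + sample2)
    # Collect every position each value occupies, so ties get an average rank
    positions = {}
    for pos, value in enumerate(combined, start=1):
        positions.setdefault(value, []).append(pos)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    n1, n2 = len(sample1), len(sample2)
    r1 = sum(avg_rank[v] for v in sample1)    # sum of ranks in sample 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1      # U = n1*n2 + n1(n1+1)/2 - R1
    u_other = n1 * n2 - u                     # the two U values sum to n1*n2
    return min(u, u_other)


# Hypothetical pebble lengths (cm) at two beaches
print(mann_whitney_u([12, 15, 11, 18], [22, 19, 25, 17]))
```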

Chi-Squared ($\chi^2$) Test

Tests whether there is a significant association between two categorical variables.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where $O$ is the observed frequency and $E$ is the expected frequency for each cell, calculated as $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$. The calculated $\chi^2$ value is compared with the critical value for $\mathrm{df} = (r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns; a calculated value greater than the critical value indicates a significant association.
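The expected frequencies and the statistic can be computed directly from a contingency table. A minimal sketch with a hypothetical function name, applied to the observed land-use frequencies from Problem 2 in this section:

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table.

    observed: list of rows, each a list of observed frequencies.
    Returns (chi2, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total   # E for this cell
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Problem 2 data (rows: residential, commercial, industrial, green space; columns: Area A, Area B)
table = [[45, 30], [20, 35], [15, 25], [20, 10]]
chi2, df = chi_squared(table)
print(f"chi-squared = {chi2:.3f}, df = {df}")   # chi-squared = 12.924, df = 3
```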

Student's t-Test

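A parametric test for comparing the means of two groups (independent or paired). It assumes approximately normally distributed interval or ratio data; the standard (Student's) independent t-test also assumes equal variances in the two groups, an assumption relaxed by Welch's t-test. The independent t-test compares two unrelated groups; the paired t-test compares two related samples (e.g., before and after measurements).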
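The section gives no formula for $t$, so the sketch below uses the standard pooled-variance form for the independent test and the mean-difference form for the paired test. It is a minimal illustration with hypothetical function names; the absolute value of $t$ is then compared with a critical value for the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def independent_t(a, b):
    """Student's independent t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    # Pooled variance from the two sample variances (each divides by n - 1)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def paired_t(before, after):
    """Paired t statistic on the before/after differences; returns (t, df)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    standard_error = math.sqrt(variance(diffs)) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical measurements
print(independent_t([1, 2, 3], [4, 5, 6]))
print(paired_t([10, 12, 14], [11, 14, 15]))
```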

Geographical Information Systems (GIS)

What Is GIS?

A GIS is a computer-based system for storing, analysing, manipulating, and displaying spatially referenced data. GIS integrates geographical data (location) with attribute data (characteristics).

Key GIS Functions

Data input and storage: importing data from various sources (GPS, remote sensing, digitised maps, databases)
Data management: organising, editing, and maintaining spatial databases
Spatial analysis: buffer zones (areas within a specified distance of a feature), overlay analysis (combining multiple data layers), network analysis (shortest path, service area), spatial interpolation (estimating values at unsampled locations)
Visualisation: creating maps, 3D models, and animations to communicate spatial patterns and relationships
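Two of the spatial analysis functions can be illustrated without GIS software. This is a simplified sketch, not how any particular GIS package implements them: it assumes projected coordinates in metres (so straight-line distance is valid), treats the buffer feature as a single point, and uses inverse distance weighting (IDW) as one common interpolation method. Function names and data are hypothetical.

```python
import math


def within_buffer(feature, points, radius_m):
    """Buffer query: names of points within radius_m of a point feature."""
    fx, fy = feature
    return [name for name, (x, y) in points.items()
            if math.hypot(x - fx, y - fy) <= radius_m]


def idw(samples, target, power=2):
    """Inverse distance weighting: estimate a value at an unsampled location.

    samples: list of ((x, y), value). Nearer samples receive more weight.
    """
    weighted_sum = weight_total = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value              # target coincides with a sample point
        w = 1 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical survey sites and a 500 m buffer around a feature at (0, 0)
print(within_buffer((0, 0), {"A": (300, 400), "B": (600, 800), "C": (100, 0)}, 500))
# Estimate a value midway between two sampled points
print(idw([((0, 0), 10.0), ((10, 0), 20.0)], (5, 0)))
```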

Applications in Geography

Physical geography: mapping land use change, modelling flood risk, analysing coastal erosion patterns, monitoring deforestation using satellite imagery
Human geography: mapping population density, deprivation, transport accessibility, retail catchment areas, migration flows
Fieldwork: displaying data collection points on a base map, analysing spatial patterns in primary data, creating layered maps that integrate multiple data sources

Remote Sensing

Remote sensing is the acquisition of information about the Earth's surface from a distance, typically using satellite or aerial sensors. Key applications:

Land cover and land use mapping: classifying satellite imagery to identify vegetation, urban areas, water bodies, and bare soil
Change detection: comparing imagery from different dates to identify deforestation, urban expansion, coastal erosion, or glacial retreat
Vegetation monitoring: using the Normalised Difference Vegetation Index (NDVI) to assess plant health and biomass
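NDVI is calculated per pixel from red and near-infrared (NIR) reflectance as $\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$, giving values from $-1$ to $+1$. Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so it scores high, while water usually scores below zero. A minimal sketch; the reflectance values are hypothetical, and returning NaN for pixels with no signal is an assumed convention.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel.

    nir and red are reflectances in the near-infrared and red bands.
    Returns a value from -1 to +1; dense, healthy vegetation scores high.
    """
    total = nir + red
    if total == 0:
        return float("nan")   # no signal in either band: NDVI undefined
    return (nir - red) / total


# Hypothetical reflectances for three pixels: forest, bare soil, water
nir_band = [0.50, 0.30, 0.02]
red_band = [0.08, 0.25, 0.05]
print([round(ndvi(n, r), 2) for n, r in zip(nir_band, red_band)])
```

Applying the same calculation to images from two dates and subtracting the results is one simple form of change detection.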

Limitations of GIS

Data quality depends on the accuracy, resolution, and currency of the source data
GIS is a tool, not a theory: it does not explain why patterns exist, only where they are
The "digital divide": access to GIS technology, training, and data is uneven, potentially excluding developing countries
Spatial data can be politically sensitive (e.g., mapping disputed boundaries, sensitive infrastructure)

Evaluation of Research

Reliability

Reliability refers to the consistency and repeatability of data collection. To improve reliability:

Standardise methods (same equipment, same procedure, same time of day)
Train data collectors to ensure consistency
Use pilot studies to identify and correct problems
Collect sufficiently large samples to reduce the influence of random variation
Record methods in sufficient detail for replication

Validity

Validity refers to whether the data accurately measures what it is intended to measure.

Internal validity: the extent to which the results truly reflect the phenomenon being studied, free from confounding variables
External validity: the extent to which the findings can be generalised to other locations, populations, or times

To improve validity:

Use appropriate sampling strategies to ensure representativeness
Control for confounding variables (e.g., comparing sites with similar geology when studying slope processes)
Triangulate (use multiple methods or data sources to cross-validate findings)
Acknowledge and discuss limitations honestly

Limitations in Fieldwork

Sampling bias: opportunity or convenience sampling may not represent the wider population or area
Observer bias: the researcher's expectations may influence data collection or interpretation
Temporal limitations: fieldwork is typically conducted over a short period and may not capture seasonal or long-term variation
Equipment limitations: measurement instruments have finite precision and may introduce systematic error
Access and safety: some locations may be inaccessible or dangerous, restricting data collection
Ethical considerations: obtaining informed consent from human participants, respecting privacy, minimising environmental impact

Common Pitfalls

Confusing correlation with causation in statistical analysis. A strong correlation between two variables does not mean one causes the other; a confounding variable may be responsible.
Using an inappropriate statistical test for the data type or experimental design. The choice of test depends on whether the data are nominal, ordinal, or interval/ratio, and whether the design involves independent or related samples.
Presenting data without contextual interpretation. Statistical results should be related back to the geographical theory and the research question.
Failing to acknowledge limitations. Every study has limitations; acknowledging them strengthens the evaluation and demonstrates critical thinking.
Using primary and secondary data interchangeably without discussing their different strengths, limitations, and potential inconsistencies.
Confusing the mean, median, and mode, or reporting the mean when the data are skewed (the median is more appropriate for skewed distributions).

Practice Problems

Problem 1: Spearman's Rank Correlation

A student investigates the relationship between distance from the city centre ($\mathrm{km}$) and house prices (index, $0$ to $100$). The data are:

Location	Distance ($\mathrm{km}$)	House Price Index
A	1	95
B	3	80
C	5	65
D	8	55
E	12	40
F	15	30

Calculate Spearman's rank correlation coefficient.

Ranking the data (both variables ranked in ascending order, so the smallest value receives rank 1):

Location	Distance rank	Price rank	$d$	$d^2$
A	1	6	-5	25
B	2	5	-3	9
C	3	4	-1	1
D	4	3	1	1
E	5	2	3	9
F	6	1	5	25

$\sum d^2 = 70$, $n = 6$

$r_s = 1 - \frac{6 \times 70}{6(36 - 1)} = 1 - \frac{420}{210} = -1.0$

$r_s = -1.0$: a perfect negative correlation. As distance from the city centre increases, house prices decrease consistently. At $p < 0.05$, the critical value for $n = 6$ is $0.886$. Since $|r_s| = 1.0 > 0.886$, the correlation is statistically significant.

Problem 2: Chi-Squared Test

A geographer investigates whether land use varies between two areas. The observed frequencies are:

Land Use	Area A	Area B	Row Total
Residential	45	30	75
Commercial	20	35	55
Industrial	15	25	40
Green Space	20	10	30
Column Total	100	100	200

Calculate expected frequencies and the chi-squared statistic.

Expected frequency: $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$

Land Use	$E_A$	$E_B$
Residential	$75 \times 100 / 200 = 37.5$	$75 \times 100 / 200 = 37.5$
Commercial	$55 \times 100 / 200 = 27.5$	$55 \times 100 / 200 = 27.5$
Industrial	$40 \times 100 / 200 = 20.0$	$40 \times 100 / 200 = 20.0$
Green Space	$30 \times 100 / 200 = 15.0$	$30 \times 100 / 200 = 15.0$

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$= \frac{(45-37.5)^2}{37.5} + \frac{(30-37.5)^2}{37.5} + \frac{(20-27.5)^2}{27.5} + \frac{(35-27.5)^2}{27.5} + \frac{(15-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(20-15)^2}{15} + \frac{(10-15)^2}{15}$

$= 1.5 + 1.5 + 2.045 + 2.045 + 1.25 + 1.25 + 1.667 + 1.667 = 12.924$

Degrees of freedom $= (4-1)(2-1) = 3$. Critical value at $p < 0.05$ for $\mathrm{df} = 3$ is $7.815$. Since $12.924 > 7.815$, the result is statistically significant: there is a significant association between land use and area.

Problem 3: Sampling Strategy Evaluation

A student wants to investigate whether soil moisture content decreases with distance from a river. They have time to take $20$ measurements. Evaluate two sampling strategies.

Strategy 1: Random sampling. The student randomly selects $20$ points within the study area, at varying distances from the river, and measures soil moisture at each point.

Advantages: eliminates sampling bias; results are statistically valid and can be generalised. Disadvantages: may not capture the full gradient (some distances may be over- or under-represented by chance); practical difficulties in accessing randomly selected points.

Strategy 2: Systematic sampling (transect). The student lays a transect perpendicular to the river and takes measurements at $2\;\mathrm{m}$ intervals from $2\;\mathrm{m}$ to $40\;\mathrm{m}$ from the riverbank.

Advantages: ensures even coverage of the distance gradient; efficient and practical; clearly shows the pattern of change with distance. Disadvantages: if the relationship is non-linear or if there are local anomalies (e.g., a spring, compacted path), the systematic interval may miss important features.

Recommendation: systematic transect sampling is more appropriate for this investigation because the research question specifically concerns the relationship between soil moisture and distance from the river. A systematic transect ensures that the full range of distances is sampled and provides a clear picture of the gradient. The data can be displayed as a line graph (soil moisture vs. distance) and analysed using Spearman's rank correlation.

Problem 4: GIS Application

Explain how a geographer could use GIS to investigate the impact of a new shopping centre on the surrounding area.

A GIS-based investigation could integrate multiple data layers:

Data collection and input:
- Digitise the location and boundaries of the new shopping centre
- Import data on pedestrian flows (before and after opening) from manual counts or automated sensors
- Import data on retail unit occupancy and types within the shopping centre
- Import Census data on household income, car ownership, and employment in surrounding areas
- Import data on existing retail centres (locations, sizes, types of shops)
Spatial analysis:
- Buffer analysis: create buffer zones around the shopping centre (e.g., $1\;\mathrm{km}$ , $3\;\mathrm{km}$ , $5\;\mathrm{km}$ ) to define zones of influence
- Overlay analysis: overlay the buffer zones with Census data to analyse the demographic characteristics of the shopping centre's catchment area
- Network analysis: use road network data to calculate drive-time isochrones (areas reachable within $10$ , $15$ , $20$ minutes by car), which are more meaningful than straight-line buffers
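- Thiessen polygons: create catchment areas based on the nearest shopping centre, and compare the polygons before and after the opening to identify which existing centres have lost part of their catchment and may therefore have lost trade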
Change detection:
- Compare retail vacancy rates, footfall data, and transport patterns before and after the shopping centre's opening
- Map the spatial distribution of shop closures in the town centre to identify whether decline is concentrated in specific areas
Visualisation:
- Create maps showing catchment areas, demographic profiles, and changes in footfall
- Produce 3D visualisations of the shopping centre's visibility and accessibility
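The Thiessen polygon idea can be tested without drawing the polygons: a location lies in a centre's polygon exactly when that centre is its nearest one by straight-line distance. The sketch below counts households in each catchment before and after a new centre opens. All names, coordinates (projected, in metres), and households are hypothetical, and a real study would more likely use drive times from network analysis.

```python
import math


def nearest_centre(point, centres):
    """Name of the centre closest to point (straight-line distance)."""
    return min(centres, key=lambda name: math.dist(point, centres[name]))


def catchment_counts(households, centres):
    """Count households falling in each centre's Thiessen-style catchment."""
    counts = {name: 0 for name in centres}
    for household in households:
        counts[nearest_centre(household, centres)] += 1
    return counts


households = [(100, 0), (3900, 0), (2100, 1900)]
before = {"Town centre": (0, 0), "District centre": (4000, 0)}
after = {**before, "New mall": (2000, 2000)}
print(catchment_counts(households, before))
print(catchment_counts(households, after))
```

Centres whose counts fall between the two runs are the ones whose catchments the new centre has captured.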

Problem 5: Fieldwork Evaluation

A student conducted a river study at three sites along a river. At each site, they measured width, depth, and velocity. Critically evaluate the reliability and validity of this study.

Reliability:

Strengths:

Standardised methods (same equipment, same measurement procedure) at each site improve consistency
Multiple measurements across the channel (systematic sampling) reduce the influence of local anomalies

Limitations:

Each site was visited only once, so there is no assessment of temporal variation (seasonal changes, flood events). Repeated measurements over time would improve reliability.
If different people measured at different sites, inter-observer variation could affect consistency
Flow meters can be affected by debris or calibration drift; regular calibration is needed

Validity:

Strengths:

The variables measured (width, depth, velocity) are directly relevant to the study of river processes (discharge, efficiency, hydraulic geometry)
Multiple sites along the river's course allow the investigation of downstream changes, supporting geographical theory

Limitations:

Only three sites may not capture the full complexity of downstream changes; additional sites would strengthen the analysis
No consideration of confounding variables such as geology, land use, or tributary inputs, which could affect the results independently of distance downstream
The study captures a snapshot in time; river characteristics vary with discharge, season, and weather. The findings may not be valid at different times of year
No measurement of sediment load or channel roughness, which are important controls on river velocity and efficiency

Improvements:

Increase the number of sites (e.g., $6$ to $8$) for a more complete picture
Conduct measurements at different times of year to assess seasonal variation
Record confounding variables (geology, land use, tributaries) at each site
Use a larger number of depth and velocity measurements across the channel
Calibrate equipment before each data collection session
