Describing Data: Mean, Median, and Mode

The first job of statistics is to summarise data. Three measures of central tendency — the 'typical' or 'average' value — are essential.

Mean: Add all values and divide by how many there are. If five students score 60, 70, 70, 80, and 100, the mean is (60+70+70+80+100) ÷ 5 = 76.

Median: Arrange values in order and find the middle one. In the example above, the median is 70 (the third value when sorted). The median is less affected by extreme values than the mean.

Mode: The most frequently occurring value. Here it is 70 (appearing twice). A dataset can have more than one mode, or none if all values are unique.
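The three measures above can be checked directly with Python's standard statistics module, using the five scores from the example:

```python
from statistics import mean, median, mode

scores = [60, 70, 70, 80, 100]  # the five student scores from the text

print(mean(scores))    # (60 + 70 + 70 + 80 + 100) / 5 = 76
print(median(scores))  # middle value of the sorted list: 70
print(mode(scores))    # most frequent value: 70
```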

Spread: range and standard deviation

Measures of central tendency tell you the typical value. Measures of spread tell you how varied the data is. The range is simply the maximum minus the minimum. Standard deviation measures how far values typically deviate from the mean — a larger standard deviation means more spread-out data.
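Both measures of spread can be sketched with the same five example scores. The standard deviation here is the population form, since the five scores are treated as the whole dataset:

```python
from statistics import mean, pstdev

scores = [60, 70, 70, 80, 100]

# Range: maximum minus minimum.
data_range = max(scores) - min(scores)  # 100 - 60 = 40

# Standard deviation: the square root of the mean squared deviation from the mean.
m = mean(scores)
sd_manual = (sum((x - m) ** 2 for x in scores) / len(scores)) ** 0.5
sd = pstdev(scores)  # the same calculation from the standard library

print(data_range, round(sd, 2))  # 40 13.56
```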

Collecting Data and Avoiding Bias

Good statistics depends on good data. How data is collected affects what conclusions are valid.

Populations and samples

A population is the entire group you want to study. A sample is a subset of that population. Because it is rarely possible to collect data from everyone, statisticians use samples. The key is that the sample must be representative — chosen in a way that reflects the full population.

Random sampling — selecting participants by chance — is the gold standard. It minimises selection bias, where certain groups are over- or under-represented. A survey about school lunches that only asks children who eat in the canteen is not a random sample — it misses those who bring packed lunches.
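The canteen example can be sketched in Python. The population figures below (300 canteen pupils, 200 packed-lunch pupils) are invented for illustration; the point is that a sample drawn purely by chance tends to mirror the population's make-up:

```python
import random

# Invented population: 500 pupils, 60% eat in the canteen, 40% bring packed lunch.
population = ["canteen"] * 300 + ["packed"] * 200

random.seed(1)  # fixed seed so the illustration is reproducible
sample = random.sample(population, 50)  # 50 pupils chosen entirely by chance

# A random sample's proportion should sit close to the population's 60%.
print(sample.count("canteen") / len(sample))
```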

Correlation and causation

Statistics can reveal correlations — relationships between two variables. But correlation does not prove causation. Ice cream sales and drowning rates both rise in summer, but ice cream does not cause drowning. A third variable — hot weather — explains both. Always ask whether a correlation has a plausible causal mechanism before drawing conclusions.
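The ice-cream example can be illustrated with a small Pearson correlation calculation. The numbers below are invented: both variables are driven by temperature, so they correlate strongly even though neither causes the other:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

temperature = [15, 18, 22, 26, 30, 33]           # hot weather: the third variable
ice_cream = [t * 10 + 5 for t in temperature]    # sales rise with temperature
drownings = [t // 5 for t in temperature]        # so do drownings

print(round(pearson(ice_cream, drownings), 2))   # 0.98: strong, but not causal
```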

Statistics in the Real World

Statistics underpins almost every field of human knowledge.

Medicine and public health

Clinical trials use statistics to test whether a treatment works. Patients are randomly divided into a treatment group and a control group. Statistical tests determine whether any difference in outcomes is likely to be real or just the result of chance. The p-value — a measure of statistical significance — helps researchers decide. Statistics drove the development of every major vaccine.
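One simple way to ask "is this difference real or just chance?" is a permutation test: repeatedly shuffle the group labels and count how often chance alone produces a gap as large as the one observed. The trial figures below are invented for illustration:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Invented outcomes (1 = recovered): 18/25 in the treatment group, 11/25 in control.
treatment = [1] * 18 + [0] * 7
control = [1] * 11 + [0] * 14
observed = sum(treatment) / 25 - sum(control) / 25  # 0.28

# If the treatment did nothing, the group labels would be interchangeable,
# so shuffle them and see how often chance matches the observed difference.
pooled = treatment + control
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:25]) / 25 - sum(pooled[25:]) / 25
    if diff >= observed:
        extreme += 1

p_value = extreme / trials  # share of shuffles at least as extreme as the real result
print(p_value)
```

A small p-value here means the observed gap would rarely arise from chance alone, which is exactly what "statistically significant" captures.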

Social science and economics

Governments use census data and survey statistics to allocate resources, plan infrastructure, and track inequality. Economists use statistical models to forecast growth and unemployment. Opinion polling uses sampling theory to estimate voting intentions.

Sport analytics

Modern sports teams use statistics to evaluate players, predict opponent behaviour, and optimise strategies. Baseball's sabermetrics movement showed that on-base percentage predicted winning better than batting average. Football clubs now collect thousands of data points per match.

Frequently asked questions

What is the difference between statistics and probability?
Probability deals with predicting outcomes from known rules — given a fair coin, what is the chance of heads? Statistics works in reverse: given observed data, what can we infer about the underlying process? Probability underpins statistics, but they approach uncertainty from opposite directions. Both are essential in science and data analysis.
What does 'statistically significant' mean?
A result is statistically significant if it is unlikely to have occurred by chance. Scientists typically use p < 0.05: if there were no real effect, results at least this extreme would arise by chance less than 5% of the time. Statistical significance does not mean the effect is large or important, only that it is unlikely to be a fluke. Effect size matters too.
Can statistics be misleading?
Yes — and it often is. Common tricks include cherry-picking data, using small unrepresentative samples, confusing correlation with causation, and using misleading graphs where axes do not start at zero. Headlines often omit crucial context — a treatment that 'doubles your risk' may raise it from 0.01% to 0.02%. Statistical literacy helps you spot these manipulations.
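The risk-doubling example is just arithmetic, which a few lines make concrete:

```python
baseline = 0.0001  # 0.01% absolute risk without the exposure
treated = 0.0002   # 0.02% absolute risk with it

relative = treated / baseline  # 2.0: the headline's "doubles your risk"
absolute = treated - baseline  # 0.0001: one extra case per 10,000 people

print(relative, absolute)
```

The relative figure sounds alarming; the absolute figure shows how small the change really is. Honest reporting gives both.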
What is a histogram and how is it different from a bar chart?
Both use bars to display data, but they show different things. A bar chart compares separate categories — favourite colours, for example. A histogram shows the distribution of a single numerical variable by dividing it into intervals (bins) and showing how many values fall in each. In a histogram, the bars touch; in a bar chart, they are separated.
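A sketch of histogram binning, grouping invented scores into intervals of width 10 and drawing a simple text histogram:

```python
from collections import Counter

# Invented values of a single numerical variable.
scores = [60, 70, 70, 80, 100, 55, 62, 73, 88, 91]

# A histogram divides the values into intervals (bins) and counts each bin.
bin_width = 10
bins = Counter((s // bin_width) * bin_width for s in scores)

for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```

Each row is one bin; adjacent bins cover adjacent intervals, which is why a histogram's bars touch while a bar chart's categories stay separate.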