Statistics Revision Notes

    Subject: Mathematics | Level: GCSE | Exam Board: Edexcel

    Master the art of data with this comprehensive guide to GCSE Statistics. From calculating averages to interpreting complex scatter graphs and cumulative frequency curves, this topic is packed with highly predictable, winnable marks that appear in every single exam series.


    Key Terms & Definitions

    Continuous Data
    Data that can take any value within a range (e.g., height, time, weight). It is measured, not counted.
    Discrete Data
    Data that can only take specific, separate values (e.g., shoe size, number of siblings). It is usually counted rather than measured.
    Extrapolation
    Estimating a value outside the range of the given data points using a line of best fit.
    Outlier (Anomaly)
    A data point that differs significantly from other observations in the same dataset.
    Bivariate Data
    Data for two variables (e.g., height and weight for the same person).
    Frequency Density
    The frequency divided by the class width.

    Study Notes

    Overview

    Welcome to Statistics, one of the most practical and predictable topics in GCSE Mathematics. This topic is fundamentally about making sense of the world through data—how we collect it, present it, and draw meaningful conclusions from it.

    In your exam, Statistics is guaranteed to appear across multiple papers. Examiners love testing this topic because it assesses both your raw calculation skills (like finding the mean from a grouped table) and your analytical reasoning (like interpreting the spread of data). Statistics connects closely with Probability and Algebra, particularly when dealing with equations of lines of best fit.

    Typical exam questions range from quick 1-2 mark calculations to 4-6 mark extended problems where you must construct complex diagrams like histograms or compare two data sets using averages and spread. Mastering this topic is often the key to securing your target grade.


    Key Concepts

    Concept 1: Measures of Central Tendency (Averages)

    An average is a single value that represents the 'middle' or 'typical' value of a data set. You must know three types: Mean, Median, and Mode.

    [Diagram: The three measures of central tendency]

    The Mean is the 'fair share' average. You calculate it by adding all the values together and dividing by the number of values. It uses every value in the data set, but this also means it is easily distorted by extreme values (outliers).

    The Median is the middle value when the data is placed in order of size. If there is an even number of values, the median is halfway between the two middle values. The median is a good choice for data with extreme outliers (like house prices) because it depends only on the middle of the ordered data, not on how large or small the extreme values are.

    The Mode is the most frequently occurring value. It is the only average that can be used for non-numerical (categorical) data, like eye colour.
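
    As a quick check on the three averages described above, here is a minimal Python sketch. The pocket money figures and eye colours are made-up illustration data, not taken from any exam paper. It uses the standard library `statistics` module and shows how a single outlier pulls the mean upwards while the median barely moves.

```python
from statistics import mean, median, multimode

# Hypothetical weekly pocket money (in pounds) for seven students; 60 is an outlier.
pocket_money = [5, 8, 8, 10, 12, 15, 60]

mean_value = mean(pocket_money)        # sum of the values divided by how many there are
median_value = median(pocket_money)    # middle value once the data is in order
modes = multimode(pocket_money)        # list of the most frequent value(s)

print(f"Mean:   {mean_value:.2f}")
print(f"Median: {median_value}")
print(f"Mode:   {modes}")

# The mode also works for categorical data, where a mean or median makes no sense.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]
eye_colour_modes = multimode(eye_colours)
print(f"Modal eye colour: {eye_colour_modes}")
```

    With this made-up data the mean is far above most of the values because of the single 60, which is the kind of situation where the median is usually the better average to quote.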

    Examiner Tip: When asked to calculate the mean from a grouped frequency table, you MUST use the midpoint of each class interval. Multiplying the frequency by a class boundary instead of the midpoint is a classic error that costs candidates marks every year.
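
    The midpoint method in the tip above can be sketched in a few lines of Python. The grouped table and the function name `estimate_grouped_mean` are hypothetical, chosen only to illustrate the $\frac{\sum fx}{\sum f}$ calculation with $x$ as the class midpoint.

```python
# Hypothetical grouped frequency table: (lower bound, upper bound, frequency).
grouped_table = [
    (0, 10, 4),
    (10, 20, 7),
    (20, 30, 6),
    (30, 40, 3),
]


def estimate_grouped_mean(table):
    """Estimate the mean as sum(f * x) / sum(f), where x is each class midpoint."""
    total_fx = 0
    total_f = 0
    for lower, upper, frequency in table:
        midpoint = (lower + upper) / 2   # midpoint, not a class boundary
        total_fx += frequency * midpoint
        total_f += frequency
    return total_fx / total_f


estimated_mean = estimate_grouped_mean(grouped_table)
print(f"Estimated mean: {estimated_mean}")
```

    For this table the fx column is 20, 105, 150 and 105, giving a total of 380 from 20 values, so the estimate is 19.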

    Concept 2: Measures of Spread

    While averages tell us about the 'typical' value, measures of spread tell us how consistent or varied the data is. Examiners almost always ask you to compare two data sets using one average and one measure of spread.

    The Range is the simplest measure: the largest value minus the smallest value. Like the mean, it is heavily affected by outliers.

    The Interquartile Range (IQR) is much more robust. It measures the spread of the middle 50% of the data, ignoring the highest 25% and lowest 25%.

    $$\text{IQR} = \text{Upper Quartile (Q3)} - \text{Lower Quartile (Q1)}$$

    A smaller IQR indicates that the data is more consistent (less varied). A larger IQR indicates the data is more spread out.
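
    To see how differently the range and the IQR react to an outlier, here is a small Python sketch on a made-up data set. It assumes the quartiles sit at the (n + 1)/4 and 3(n + 1)/4 positions in the ordered list, which is what `method="exclusive"` in the standard library does. Other quartile conventions exist and can give slightly different answers.

```python
from statistics import quantiles

# Hypothetical data set with one outlier (40).
values = sorted([7, 3, 9, 40, 5, 11, 8])

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Quartiles at the (n + 1)/4, 2(n + 1)/4 and 3(n + 1)/4 positions of the ordered data.
q1, q2, q3 = quantiles(values, n=4, method="exclusive")
iqr = q3 - q1

print(f"Ordered data: {values}")
print(f"Range: {data_range}")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"IQR: {iqr}")
```

    Here the outlier stretches the range to 37, while the IQR is only 6 because it looks at the middle half of the data.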

    Concept 3: Scatter Graphs and Correlation

    Scatter graphs are used to investigate the relationship between two variables (bivariate data).

    [Diagram: Types of correlation on scatter graphs]

    Correlation describes the nature of this relationship:

    • Positive Correlation: As one variable increases, the other increases (e.g., hours revised and exam score).
    • Negative Correlation: As one variable increases, the other decreases (e.g., age of a car and its value).
    • No Correlation: There is no discernible pattern.

    When describing correlation for marks, you must state both the direction (positive/negative) and the strength (strong/weak).

    The Golden Rule: Correlation does NOT imply causation. Just because ice cream sales and shark attacks both increase in summer (strong positive correlation), it does not mean eating ice cream causes shark attacks. They are both caused by a third variable: warm weather. Examiners will specifically test if you understand this distinction.
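
    In the exam you judge direction and strength by eye from the scatter graph. Purely as an illustration, the sketch below calculates a correlation coefficient (a number between -1 and 1, which is not calculated at GCSE) for two made-up data sets and turns it into a description. The cut-off values 0.7 and 0.2 are arbitrary assumptions for this example, not an official rule, and the function names are hypothetical.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Product-moment correlation coefficient of paired data (between -1 and 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_correlation(r, strong_cutoff=0.7, none_cutoff=0.2):
    """Describe direction and strength; the cut-offs are illustrative assumptions."""
    if abs(r) < none_cutoff:
        return "no correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical bivariate data.
hours_revised = [1, 2, 3, 4, 5, 6]
exam_scores = [35, 42, 50, 55, 68, 74]
car_ages = [1, 2, 3, 4, 5, 6]
car_values = [18, 15, 13, 10, 8, 5]   # in thousands of pounds

revision_description = describe_correlation(
    correlation_coefficient(hours_revised, exam_scores)
)
car_description = describe_correlation(correlation_coefficient(car_ages, car_values))

print(f"Revision vs score: {revision_description}")
print(f"Car age vs value:  {car_description}")
```

    Even a coefficient very close to 1 or -1 only describes the pattern in the data. It says nothing about whether one variable causes the other.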

    Concept 4: Cumulative Frequency

    Cumulative frequency is a running total of the frequencies. When plotted, it creates a characteristic S-shaped curve.

    [Diagram: Reading a cumulative frequency graph]

    To construct the graph, you must plot the cumulative frequency against the upper class boundary of each interval, NOT the midpoint. This is because the cumulative frequency represents all values up to and including that boundary.

    From this curve, you can estimate the median (at 50% of the total frequency), the lower quartile (at 25%), and the upper quartile (at 75%).
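
    The steps above can be sketched in Python using a made-up grouped table. One simplifying assumption is that the plotted points are joined with straight lines rather than a smooth curve, so the readings are estimates and a hand-drawn curve could give slightly different values. The helper name `read_value_at` is hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lowest_boundary = 0
upper_boundaries = [10, 20, 30, 40, 50]
frequencies = [4, 10, 16, 8, 2]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Plot each cumulative frequency against the UPPER class boundary,
# starting from zero at the lowest boundary.
points = [(lowest_boundary, 0)] + list(zip(upper_boundaries, cumulative))


def read_value_at(target_cf, points):
    """Read across from a cumulative frequency, assuming straight-line segments."""
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf1 > cf0 and cf0 <= target_cf <= cf1:
            return x0 + (target_cf - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target cumulative frequency is outside the plotted range")


total = cumulative[-1]
lower_quartile = read_value_at(0.25 * total, points)
median_estimate = read_value_at(0.50 * total, points)
upper_quartile = read_value_at(0.75 * total, points)

print(f"Cumulative frequencies: {cumulative}")
print(f"Lower quartile: {lower_quartile}")
print(f"Median: {median_estimate}")
print(f"Upper quartile: {upper_quartile}")
print(f"IQR: {upper_quartile - lower_quartile}")
```

    With a total frequency of 40, the readings are taken at cumulative frequencies of 10, 20 and 30, matching the 25%, 50% and 75% positions described above.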

    Concept 5: Histograms (Higher Tier Only)

    Unlike bar charts, histograms are used for continuous data and can have unequal class widths. Because the widths vary, the height of the bar no longer represents the frequency. Instead, the area of the bar represents the frequency.

    The y-axis on a histogram is always Frequency Density.

    $$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$$
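
    As an illustration of the frequency density formula, the short Python sketch below works through a made-up table with unequal class widths. It also checks that multiplying each bar's width by its height (the frequency density) gives back the frequency. The class data and the function name are hypothetical.

```python
# Hypothetical histogram classes with unequal widths: (lower bound, upper bound, frequency).
classes = [
    (0, 10, 5),
    (10, 15, 10),
    (15, 20, 12),
    (20, 40, 8),
]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


densities = [frequency_density(lower, upper, f) for lower, upper, f in classes]

# Area of each bar = class width x frequency density, which should equal the frequency.
areas = [(upper - lower) * fd for (lower, upper, _), fd in zip(classes, densities)]

for (lower, upper, f), fd, area in zip(classes, densities, areas):
    print(f"{lower}-{upper}: frequency {f}, frequency density {fd}, bar area {area}")
```

    Notice that the widest class (20 to 40) has a fairly high frequency but the lowest bar, because its 8 values are spread over a width of 20.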


    Mathematical Relationships and Formulas

    | Formula | Equation | Notes |
    |---------|----------|-------|
    | Mean | $\frac{\sum x}{n}$ | Must memorise. Sum of values divided by number of values. |
    | Mean (Grouped) | $\frac{\sum fx}{\sum f}$ | Must memorise. $x$ is the midpoint of the class. |
    | Range | $\text{Max} - \text{Min}$ | Must memorise. |
    | Interquartile Range | $\text{Q3} - \text{Q1}$ | Must memorise. |
    | Frequency Density | $\frac{\text{Frequency}}{\text{Class Width}}$ | Must memorise (Higher Tier). Used for histograms. |

    Practical Applications

    Statistics is one of the most widely applied areas of mathematics outside the classroom:

    • Medical Trials: Using samples and averages to determine if a new drug is effective.
    • Quality Control: Factories use sampling and range to ensure products (like the volume of a soft drink bottle) are consistent.
    • Actuarial Science: Insurance companies use scatter graphs and correlation to assess risk and set premiums.
    • Machine Learning: Modern AI relies heavily on statistical correlation and lines of best fit (regression) to make predictions.

    Practice Questions

    Q1

    Here is a list of numbers: 12, 15, 14, 17, 22, 19, 14. Find the median. (2 marks)

    Difficulty: foundation

    Hint: What is the very first thing you must do with a list of numbers before finding the middle?

    Q2

    A student measures the time taken for 20 people to complete a puzzle. The times are recorded in a grouped frequency table. Explain why calculating the mean from this table only gives an estimate. (1 mark)

    Difficulty: standard

    Hint: Think about what information is lost when data is put into groups.

    Q3

    The scatter graph shows the engine size and the fuel efficiency (mpg) of 15 cars. The line of best fit has been drawn. A car has an engine size of 4.5 litres. This is outside the range of the plotted data. Explain why using the line of best fit to estimate its fuel efficiency might not be reliable. (1 mark)

    Difficulty: standard

    Hint: What is the technical term for predicting outside the data range?

    Q4

    Compare the distribution of test scores for Class A and Class B.
    Class A: Median = 65, IQR = 12
    Class B: Median = 72, IQR = 20
    (2 marks)

    Difficulty: challenging

    Hint: You need two distinct comparisons. One for the average, one for the spread. Use context.

    Q5

    A histogram is drawn to show the weights of parcels. The class interval $2 < w \leq 5$ has a frequency of 18. Calculate the frequency density for this class. (2 marks)

    Difficulty: challenging

    Hint: What is the formula linking frequency, class width, and frequency density?
