Statistics Skills: Scatter Graphs

    Edexcel
    GCSE
    Mathematics

    Scatter graphs are a powerful visual tool for analysing the relationship between two variables, and this topic is essential for both Foundation and Higher tier candidates. You'll learn to plot bivariate data accurately, draw lines of best fit using a ruler, describe correlation with precision, and make reliable estimates through interpolation while recognising the limitations of extrapolation. Master these skills and you'll confidently tackle scatter graph questions worth multiple marks in your exam.

    Study Notes

    Overview

    Scatter graphs, also known as scatter diagrams or scattergrams, are one of the most practical and visually engaging topics in GCSE statistics. They allow you to explore whether a relationship exists between two variables - such as hours of study and exam performance, or the age of a car and its selling price. Edexcel assesses this topic through precise plotting of bivariate data, qualitative analysis of correlation, and the use of lines of best fit to make predictions. Candidates are expected to distinguish between reliable interpolation (estimating within the data range) and unreliable extrapolation (estimating beyond the observed data). This topic connects directly to other areas of statistics, including averages, data handling, and probability, and it frequently appears in both Foundation and Higher tier papers. Typical exam questions ask you to plot points, draw a line of best fit, describe the type of correlation, and make estimates - each step earning marks when executed with precision and clear working.

    Key Concepts

    Concept 1: Bivariate Data and Scatter Graphs

    Bivariate data involves two variables measured for each item or individual. For example, you might record both the height and shoe size of students in a class, or the temperature and ice cream sales at a shop. A scatter graph displays this data visually by plotting one variable on the x-axis (the independent variable) and the other on the y-axis (the dependent variable). Each pair of values becomes a single point on the graph. The beauty of scatter graphs is that they reveal patterns at a glance - you can immediately see whether the variables are related and, if so, how. When plotting points, accuracy is crucial. Edexcel mark schemes typically award one mark for plotting all points correctly, with a tolerance of half a small square on graph paper. This means you must take your time, use a sharp pencil, and double-check each coordinate before moving on.

    Example: Imagine you collect data on 10 students: their hours of revision (x-axis) and their test scores out of 100 (y-axis). Student A revised for 2 hours and scored 35%, so you plot the point (2, 35). Student B revised for 5 hours and scored 60%, giving the point (5, 60). Continue for all 10 students, and you'll have a scatter graph showing the relationship between revision time and performance.

    Concept 2: Correlation

    Correlation describes the relationship between the two variables on your scatter graph. There are three types of correlation you must be able to identify and describe:

    Positive correlation occurs when as one variable increases, the other also increases. On a scatter graph, the points trend upward from left to right. For instance, students who revise for more hours tend to achieve higher test scores - this is positive correlation. The stronger the correlation, the closer the points cluster around an imaginary straight line.

    Negative correlation occurs when as one variable increases, the other decreases. The points trend downward from left to right. A classic example is the age of a car and its value - as cars get older, their price typically falls. Again, the strength of the correlation depends on how tightly the points cluster around a line.

    No correlation means there is no relationship between the variables. The points are scattered randomly across the graph with no discernible pattern. For example, shoe size and exam score have no logical connection, so you would expect no correlation.

    Edexcel examiners are very particular about language here. Simply writing "positive" or "it goes up" is too vague and will not earn full marks. Instead, you must either use precise mathematical terminology ("There is a positive correlation between revision time and test score") or, even better, give a contextual description that references the specific variables: "As the number of hours of revision increases, the test score increases." This shows you understand what the data actually represents, not just the mathematical pattern.
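
    The direction of a correlation can also be checked numerically. The sketch below is one possible approach and goes beyond what this guide covers: it computes a correlation coefficient r (between -1 and 1) and turns its sign into a contextual sentence. The data set, the function names and the cutoff of 0.3 are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical data: (hours of revision, test score) for 10 students.
# Only (2, 35) and (5, 60) appear in this guide; the other pairs are invented.
data = [(1, 30), (2, 35), (3, 45), (4, 50), (5, 60),
        (6, 62), (7, 70), (8, 74), (9, 85), (10, 88)]


def correlation_coefficient(points):
    """Return r, a value from -1 to 1 measuring how closely the
    points follow a straight line (the sign gives the direction)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)


def describe(points, x_name, y_name, cutoff=0.3):
    """Build a contextual description of the correlation.
    The cutoff of 0.3 is an arbitrary assumption, not an exam-board rule."""
    r = correlation_coefficient(points)
    if r > cutoff:
        return f"As {x_name} increases, {y_name} increases."
    if r < -cutoff:
        return f"As {x_name} increases, {y_name} decreases."
    return f"There is no clear correlation between {x_name} and {y_name}."


print(describe(data, "the number of hours of revision", "the test score"))
```

    For this invented data the sentence printed is the contextual description of positive correlation quoted above. In the exam you judge the direction by eye; the calculation is only a way to check your reading of the graph.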

    Concept 3: Line of Best Fit

    The line of best fit is a straight line drawn through a scatter graph to represent the general trend of the data. It is not a dot-to-dot line connecting the points, nor is it a freehand curve. It is a single, straight line drawn with a ruler that passes roughly through the middle of the points, with approximately equal numbers of points above and below the line. Ideally, the line should pass through the mean point - that is, the point where x equals the average of all x-values and y equals the average of all y-values. The line should extend across the full range of the data, from the smallest x-value to the largest.

    Drawing the line of best fit is a skill that improves with practice. Edexcel mark schemes award one mark for a correctly drawn line of best fit, and examiners look for a line that is straight, ruled, passes through or close to the mean point, and covers the full data range. Common mistakes include drawing the line too short, using a freehand squiggle, or forcing the line through outliers (points that don't fit the general pattern). Your line should ignore outliers and represent the overall trend.

    Why does this work? The line of best fit summarises the relationship between the variables in a simple, visual way. It allows you to make predictions and estimates, which is the practical purpose of scatter graphs. By reducing a cloud of points to a single line, you can quickly see the trend and use it to answer questions.
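
    In an exam the line of best fit is drawn by eye with a ruler, but the idea can also be sketched as a calculation. The example below uses the least-squares method, which is a swapped-in technique rather than the by-eye method described above; it is shown because a least-squares line always passes through the mean point. The data set and the function name `least_squares_line` are hypothetical.

```python
# Hypothetical data: (hours of revision, test score) for 10 students.
# Only (2, 35) and (5, 60) appear in this guide; the other pairs are invented.
data = [(1, 30), (2, 35), (3, 45), (4, 50), (5, 60),
        (6, 62), (7, 70), (8, 74), (9, 85), (10, 88)]


def least_squares_line(points):
    """Return (gradient, intercept) of the least-squares line.
    The intercept is chosen so the line passes through the mean point."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


m, c = least_squares_line(data)

# End points for ruling the line across the full range of the data.
x_values = [x for x, _ in data]
start = (min(x_values), m * min(x_values) + c)
end = (max(x_values), m * max(x_values) + c)
print(f"Rule the line from {start} to {end}")
```

    A carefully drawn by-eye line will usually sit close to this calculated one, but the two need not match exactly.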

    Concept 4: Interpolation and Extrapolation

    Once you have drawn your line of best fit, you can use it to estimate values. This is where the distinction between interpolation and extrapolation becomes critical.

    Interpolation is when you estimate a value within the range of your data. For example, if your data covers revision times from 1 hour to 10 hours, and you use your line of best fit to estimate the test score for 6 hours of revision, that is interpolation. These estimates are considered reliable because you are working within the observed pattern. The line of best fit is based on actual data points in this range, so your estimate is grounded in evidence.

    Extrapolation is when you estimate a value outside the range of your data. If your data only goes up to 10 hours and you try to predict the score for 15 hours of revision, that is extrapolation. These estimates are unreliable because you are assuming the trend continues beyond what you have actually measured - and it might not. For instance, there may be a limit to how much revision helps, or the relationship might change at higher values.

    Edexcel loves to test this distinction. A Higher tier question might ask you to estimate a value outside the data range and then comment on the reliability of your estimate. The correct approach is to make the estimate using your line of best fit (because the question asks you to), but then add a critical comment such as: "This estimate may be unreliable because it is based on extrapolation beyond the observed data range" or "This prediction assumes the trend continues, which may not be the case."

    Example: Suppose your scatter graph shows data for revision times from 2 to 10 hours. If asked to estimate the score for 7 hours, you draw a vertical dashed line from x = 7 up to your line of best fit, then a horizontal dashed line across to the y-axis, and read off the value (say, 68%). This is interpolation and is reliable. If asked to estimate the score for 15 hours, you extend your line of best fit and repeat the process, but you must comment that this extrapolation is unreliable.
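
    The same check can be written as a short sketch. The line used here, score = 6 × hours + 26, is a hypothetical line of best fit chosen only so that 7 hours gives the 68% from the example; the data range of 2 to 10 hours is taken from the example, and the function name `estimate` is made up.

```python
# Hypothetical line of best fit: score = 6 * hours + 26.
# Chosen so that 7 hours gives 68%, as in the example; it is not fitted to real data.
GRADIENT = 6
INTERCEPT = 26

# Range of revision times covered by the data in the example.
X_MIN, X_MAX = 2, 10


def estimate(hours):
    """Return the estimated score and whether the estimate is
    interpolation (inside the data range) or extrapolation (outside it)."""
    score = GRADIENT * hours + INTERCEPT
    kind = "interpolation" if X_MIN <= hours <= X_MAX else "extrapolation"
    return score, kind


print(estimate(7))   # (68, 'interpolation')
print(estimate(15))  # (116, 'extrapolation')
```

    With this made-up line, the estimate for 15 hours comes out above 100%, which is impossible for a percentage score. That is exactly the kind of breakdown that makes extrapolation unreliable.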

    Concept 5: Outliers

    An outlier is a data point that does not fit the general pattern of the scatter graph. It lies far away from the other points and the line of best fit. Outliers can occur for various reasons: measurement errors, unusual circumstances, or genuine exceptions to the trend. When drawing your line of best fit, you should ignore outliers - do not force your line to pass through them. If an exam question asks you to identify or comment on an outlier, you should explain why it doesn't fit the trend. For example: "This point is an outlier because the student scored much lower than expected for their revision time, possibly due to illness on the exam day."
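
    One way to make "far away from the line of best fit" concrete is to measure each point's vertical distance from the line. The sketch below assumes a hypothetical line, score = 6 × hours + 26, an invented data set, and an arbitrary threshold of 15 marks; none of these come from a mark scheme.

```python
# Hypothetical line of best fit: score = 6 * hours + 26 (an assumption).
GRADIENT = 6
INTERCEPT = 26

# Invented (hours, score) data; (8, 40) is deliberately far below the trend.
data = [(2, 35), (5, 60), (7, 68), (8, 40), (10, 86)]


def find_outliers(points, threshold=15):
    """Return the points whose vertical distance from the line exceeds
    `threshold` (15 marks is an arbitrary choice, not an exam rule)."""
    outliers = []
    for x, y in points:
        predicted = GRADIENT * x + INTERCEPT
        if abs(y - predicted) > threshold:
            outliers.append((x, y))
    return outliers


print(find_outliers(data))  # [(8, 40)]
```

    Here the student who revised for 8 hours scored 34 marks below the line, so that point is flagged. In the exam you identify outliers by eye and explain them in context.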

    Mathematical Relationships

    Scatter graphs do not typically involve formulas in the way that algebra or geometry do, but there are key mathematical concepts you must understand:

    Mean Point: The mean point of a scatter graph is calculated by finding the mean (average) of all x-values and the mean of all y-values. If you have data points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), then the mean point is:

    Mean point = (x̄, ȳ) where x̄ = (x₁ + x₂ + ... + xₙ) / n and ȳ = (y₁ + y₂ + ... + yₙ) / n

    Your line of best fit should pass through or very close to this mean point.
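
    As a worked check of the mean point calculation, here is a minimal sketch. The data set is hypothetical (only (2, 35) and (5, 60) appear in this guide) and the function name `mean_point` is made up.

```python
# Hypothetical data: (hours of revision, test score) for 10 students.
# Only (2, 35) and (5, 60) appear in this guide; the other pairs are invented.
data = [(1, 30), (2, 35), (3, 45), (4, 50), (5, 60),
        (6, 62), (7, 70), (8, 74), (9, 85), (10, 88)]


def mean_point(points):
    """Return (mean of the x-values, mean of the y-values)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


print(mean_point(data))  # (5.5, 59.9)
```

    The x-values add up to 55 and the y-values to 599, so dividing each total by 10 gives the mean point (5.5, 59.9). Plot this point lightly on the graph and rule your line through or very close to it.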

    Gradient and Intercept (Higher Tier): While not always required, understanding that a line of best fit has a gradient (slope) and y-intercept can help. If the line has equation y = mx + c, then m is the gradient (how steep the line is) and c is where the line crosses the y-axis. A positive gradient indicates positive correlation; a negative gradient indicates negative correlation.
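
    To find m and c in practice, read two points off the ruled line (not two data points) and use the change in y divided by the change in x. The sketch below assumes the two points (2, 38) and (10, 86) have been read from a hypothetical line of best fit; the function name `line_through` is made up.

```python
def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    gradient = (y2 - y1) / (x2 - x1)   # change in y divided by change in x
    intercept = y1 - gradient * x1     # rearranged from y = mx + c
    return gradient, intercept


# Two points assumed to have been read off a hypothetical line of best fit.
m, c = line_through((2, 38), (10, 86))
print(m, c)  # 6.0 26.0
```

    For these assumed readings the line is y = 6x + 26. The gradient is positive, which matches a positive correlation, and it means each extra hour of revision corresponds to an estimated 6 extra marks.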

    Correlation vs Causation: This is not a formula but a critical concept. Just because two variables are correlated does not mean one causes the other. For example, ice cream sales and drowning incidents are positively correlated (both increase in summer), but eating ice cream does not cause drowning. Both are linked to a third factor: hot weather. Examiners may ask you to comment on this, so always think critically about what the data actually shows.

    Practical Applications

    Scatter graphs are used extensively in real-world contexts, and Edexcel often sets questions in practical scenarios to test your understanding:

    • Economics and Business: Analysing the relationship between advertising spend and sales revenue, or between price and demand.
    • Health and Fitness: Exploring the link between exercise hours and weight loss, or between age and reaction time.
    • Science: Investigating how temperature affects the rate of a chemical reaction, or how plant height relates to the amount of fertiliser used.
    • Education: Examining the connection between attendance and exam results, or between hours of study and grades.

    In each case, scatter graphs allow you to visualise the data, identify trends, and make informed predictions. The key is to always interpret the graph in context - don't just describe the mathematical pattern, but explain what it means in the real-world scenario.

    Practice Questions

    Q1

    The table shows the temperature (°C) and number of ice creams sold at a shop over 8 days. Plot this data on a scatter graph. (2 marks)

    2 marks
    foundation

    Hint: Remember to label your axes with the correct variables and units, and plot each point within half a small square of the correct position.

    Q2

    Draw a line of best fit on your scatter graph. (1 mark)

    1 mark
    foundation

    Hint: Use a ruler and make sure your line passes through the mean of the points and extends across the full range of the data.

    Q3

    Describe the relationship between temperature and ice cream sales. (2 marks)

    2 marks
    standard

    Hint: Identify the type of correlation and describe it using the specific variables from the question.

    Q4

    Use your line of best fit to estimate the number of ice creams sold when the temperature is 22°C. Show your working. (2 marks)

    2 marks
    standard

    Hint: Draw dashed lines on your graph to show where you took your reading - one vertical from x = 22 to the line, then one horizontal to the y-axis.

    Q5

    The shop manager uses the line of best fit to predict ice cream sales when the temperature is 35°C. The data only goes up to 28°C. Comment on the reliability of this prediction. (2 marks)

    2 marks
    challenging

    Hint: Think about whether 35°C is inside or outside the range of the data, and what this means for the reliability of the estimate.

    Q6

    On one day, the temperature was 18°C but only 30 ice creams were sold, much lower than expected. Suggest a reason why this might be an outlier. (1 mark)

    1 mark
    standard

    Hint: Think about real-world reasons why ice cream sales might be unusually low on a warm day.

    Q7

    A student says: 'The scatter graph proves that hot weather causes people to buy ice cream.' Explain why this statement is not necessarily correct. (2 marks)

    2 marks
    challenging

    Hint: Think about the difference between correlation and causation. Does the graph actually prove that temperature causes ice cream sales, or just that they are related?
