Interpreting scatterplots
A scatterplot places paired observations on a two-dimensional grid, with one variable on the horizontal axis and the other on the vertical axis. Looking at both variables at once helps you see whether they tend to increase together, move in opposite directions, or show no clear pattern. Scatterplots are used in many fields to reveal patterns, trends, clusters, and outliers.
When you interpret a scatterplot, start by looking for the overall pattern. The points might:
- trend upward
- trend downward
- form a random cloud
These visual cues help you decide whether a linear relationship might exist and whether tools like correlation or regression would be useful.
In practice, you might use a scatterplot to examine the link between advertising spend and sales revenue to see whether higher budgets are associated with greater revenue. Or you might plot patients’ dosage of a drug against their response level to assess efficacy. Scatterplots can also reveal:
- clusters (subgroups in your data that behave differently)
- outliers (points far from the main pattern, possibly due to measurement error or unusual conditions)
Example: Bus routes and ridership
A city records the number of daily bus routes in operation and total ridership (in thousands) for fifteen days. We expect that more routes will generally result in higher ridership. A scatterplot of the data below shows this positive trend:
| Bus routes | Ridership (thousands) |
|---|---|
| 20 | 5.5 |
| 22 | 6.1 |
| 24 | 6.8 |
| 26 | 7.3 |
| 28 | 7.8 |
| 30 | 8.2 |
| 32 | 8.8 |
| 34 | 9.3 |
| 36 | 9.7 |
| 38 | 10.2 |
| 40 | 10.8 |
| 42 | 11.3 |
| 44 | 11.7 |
| 46 | 12.1 |
| 48 | 12.6 |
Answer: The scatterplot shows a clear upward trend, which indicates a strong positive linear relationship between the number of bus routes and ridership. As the number of available bus routes increases, total ridership increases as well. In context, this makes sense: more routes typically make public transportation accessible to more people, which can increase usage.
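One way to back up the visual impression with a number is the Pearson correlation coefficient r, which ranges from -1 to +1. The following is a minimal Python sketch, assuming the table values are typed in by hand; the helper name `pearson_r` and the variable names are illustrative and not part of the original example.

```python
import math

# Bus routes and ridership (thousands), copied from the table above.
routes = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
ridership = [5.5, 6.1, 6.8, 7.3, 7.8, 8.2, 8.8, 9.3, 9.7,
             10.2, 10.8, 11.3, 11.7, 12.1, 12.6]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_bus = pearson_r(routes, ridership)
# A value close to +1 is consistent with a strong positive linear trend.
print(f"r = {r_bus:.3f}")
```

A result near +1 would agree with the upward pattern described in the answer. It measures only how closely the points follow a line, not why they do.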
Example: Pets and shoe size
A survey recorded the number of pets and the shoe size of fifteen participants. There is no reason to expect any relationship between these variables, so the scatterplot forms a random cloud.
Answer the following questions:
- What type of relationship is shown here, how can you tell, and what does it mean in the context of the problem?
- Would it be appropriate to use this data to make predictions about someone’s shoe size based on how many pets they have? Explain why or why not.
- Compare and contrast this scatterplot with the previous bus routes example. What fundamental difference exists between the two data relationships?
| Number of pets | Shoe size |
|---|---|
| 2 | 8 |
| 5 | 11 |
| 7 | 6 |
| 1 | 12 |
| 9 | 7 |
| 0 | 9 |
| 3 | 10 |
| 4 | 7 |
| 6 | 13 |
| 8 | 8 |
| 10 | 11 |
| 2 | 5 |
| 5 | 9 |
| 7 | 6 |
| 3 | 12 |
Answer:
- The scatterplot shows no correlation between number of pets and shoe size. You can tell because the points form a random cloud with no clear upward or downward trend. In context, that means these variables don’t appear to be related.
- No. Because there’s no relationship, knowing how many pets someone has doesn’t give useful information about their shoe size. Any prediction would be essentially a guess.
- The bus route data shows a strong positive linear relationship: more routes are associated with higher ridership. The pets and shoe size data shows random variation with no pattern. The fundamental difference is that the bus route relationship is predictable (at least roughly), while the pets and shoe size relationship has no predictive value.
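To illustrate why a prediction here would be "essentially a guess," one option is to compare two strategies: always guessing the average shoe size, and predicting from a least-squares line through the data. The Python sketch below does this under the assumption that the table values are entered by hand; the variable names are illustrative. The quantity `r_squared` is the fraction of variation in shoe size that the fitted line accounts for.

```python
# Number of pets and shoe size, copied from the table above.
pets = [2, 5, 7, 1, 9, 0, 3, 4, 6, 8, 10, 2, 5, 7, 3]
shoe_size = [8, 11, 6, 12, 7, 9, 10, 7, 13, 8, 11, 5, 9, 6, 12]

n = len(pets)
mean_x = sum(pets) / n
mean_y = sum(shoe_size) / n

# Least-squares line: shoe_size is approximated by intercept + slope * pets.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pets, shoe_size))
sxx = sum((x - mean_x) ** 2 for x in pets)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Total squared error when always guessing the mean shoe size.
sse_mean = sum((y - mean_y) ** 2 for y in shoe_size)
# Total squared error when predicting from the fitted line.
sse_line = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(pets, shoe_size))

# Fraction of the variation in shoe size explained by the line (R squared).
r_squared = 1 - sse_line / sse_mean
print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}")
```

An R squared close to 0 means the line predicts barely better than the plain average, which matches the conclusion that number of pets carries no useful information about shoe size.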
Example: Temperature and hot chocolate sales
A café recorded the temperature and the number of cups of hot chocolate sold over fifteen days. As the weather warms up, sales tend to drop, showing a clear negative trend.
Answer the following questions:
- What type of relationship is shown here, how can you tell, and what does it mean in the context of the problem?
- If the temperature rises to 90°F, approximately how many cups of hot chocolate would you expect to sell? Explain your reasoning.
- Compare this scatterplot with the previous example. What evidence in the graph shows that these variables have a meaningful statistical relationship?
| Temperature (°F) | Cups sold |
|---|---|
| 30 | 137 |
| 34 | 131 |
| 38 | 121 |
| 42 | 110 |
| 46 | 103 |
| 50 | 95 |
| 54 | 85 |
| 58 | 78 |
| 62 | 69 |
| 66 | 60 |
| 70 | 51 |
| 74 | 45 |
| 78 | 35 |
| 80 | 30 |
| 82 | 25 |
Answer:
- The scatterplot shows a strong negative correlation between temperature and hot chocolate sales. You can see this from the clear downward trend from left to right. In context, as outdoor temperature increases, hot chocolate sales decrease, which fits the idea that people buy fewer hot drinks in warm weather.
- Based on the trend, you’d expect roughly 5 to 10 cups at 90°F. Sales fall by about 2 cups for each 1°F increase, so extending the pattern 8°F beyond the last point (25 cups at 82°F) gives about 25 - 17 = 8 cups. Because 90°F lies outside the observed temperature range, treat this as a rough extrapolation.
- Compared with the no-correlation example, these points follow a fairly tight downward line rather than scattering randomly. That close clustering around a downward trend is evidence of a meaningful and predictable relationship.
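The 90°F estimate can also be checked by fitting a straight line to the data and extending it. The Python sketch below is one way to do that, assuming an ordinary least-squares fit and hand-entered table values; the helper names `fit_line` and `predict_cups` are illustrative and not part of the original example.

```python
# Temperature (°F) and cups sold, copied from the table above.
temps_f = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70, 74, 78, 80, 82]
cups_sold = [137, 131, 121, 110, 103, 95, 85, 78, 69, 60, 51, 45, 35, 30, 25]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(temps_f, cups_sold)


def predict_cups(temp_f):
    """Predicted cups sold at a given temperature, using the fitted line."""
    return intercept + slope * temp_f


# 90°F is beyond the highest observed temperature (82°F), so this is an
# extrapolation and should be read as a rough estimate only.
estimate_90 = predict_cups(90)
print(f"slope = {slope:.2f} cups per °F, estimate at 90°F = {estimate_90:.1f} cups")
```

The slope gives the average change in cups sold per degree. Extending the line much further would eventually predict negative sales, which is one reason extrapolated values deserve caution.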
Caution: Correlation does not imply causation
A clear linear trend doesn’t prove that one variable causes the other. A lurking variable (a third factor affecting both variables) or reverse causation can create a correlation that looks meaningful but isn’t causal. Use scatterplots alongside context, subject-matter knowledge, and additional statistical tools.
Example: Correlation versus causation (ice cream and drownings)
| Month | Ice cream sales | Drownings |
|---|---|---|
| June | 20 | 5 |
| July | 25 | 8 |
| Aug | 22 | 7 |
| Sept | 15 | 4 |
Why do both ice cream sales and drownings increase in summer?
Answer: Both are influenced by a lurking variable, warm weather, rather than one causing the other.
Example: Correlation versus causation (firefighters and damage)
| Firefighters | Damage cost |
|---|---|
| 5 | 50 |
| 10 | 200 |
| 20 | 500 |
| 30 | 1000 |
Why do more firefighters tend to be at fires with more damage?
At first glance, the table shows that as the number of firefighters increases, the damage cost also increases. This indicates a positive correlation between firefighters and damage.
However, correlation does not imply causation. The firefighters are not causing the damage.
The true cause is a third variable: the size or severity of the fire.
- Larger fires naturally cause more damage.
- Larger fires also require more firefighters to control them.
Because both variables increase together due to the same underlying cause, they appear correlated even though one does not cause the other.
Answer: Larger fires cause both more damage and the need for more firefighters; the firefighters do not cause the damage.


