2.4 Interpreting scatterplots

Achievable Praxis Core: Math (5733)

2. Data analysis, statistics, and probability

Our Praxis Core: Math course is currently in development and is a work-in-progress.

A scatterplot places paired observations on a two-dimensional grid, with one variable on the horizontal axis and the other on the vertical axis. Looking at both variables at once helps you see whether they tend to increase together, move in opposite directions, or show no clear pattern.

When you interpret a scatterplot, start by looking for the overall pattern. The points might:

trend upward
trend downward
form a random cloud

These visual cues help you decide whether a linear relationship might exist and whether tools like correlation or regression would be useful.

Definitions

Scatterplot: A display of paired data points on coordinate axes showing how two numerical variables relate.
Positive linear relationship: A pattern in which points trend upward from left to right, indicating that both variables increase together.
Negative linear relationship: A pattern in which points trend downward from left to right, indicating that one variable decreases while the other increases.
No correlation: A pattern in which points form a random cloud with no discernible upward or downward trend.
Correlation: A measure of the strength and direction of a linear relationship between two variables, often summarized by a correlation coefficient. A correlation does not necessarily imply a causal relationship between the variables.

In practice, you might use a scatterplot to examine the link between advertising spend and sales revenue to see whether higher budgets are associated with greater revenue. Or you might plot patients’ dosage of a drug against their response level to assess efficacy. Scatterplots can also reveal:

clusters (subgroups in your data that behave differently)
outliers (points far from the main pattern, possibly due to measurement error or unusual conditions)
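To see how much a single outlier can affect a numerical summary, here is a minimal sketch using made-up data. The helper `pearson_r` and all values are hypothetical illustrations, not data from this lesson; the function computes the correlation coefficient r, which is introduced later in this section.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points that fall exactly on the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# Same data plus one hypothetical outlier far below the pattern
r_outlier = pearson_r(xs + [6], ys + [0])

print(f"r without outlier: {r_clean:.2f}")   # 1.00
print(f"r with outlier:    {r_outlier:.2f}")  # 0.14
```

With these made-up values, one unusual point pulls r from 1 down to about 0.14, which is why it helps to spot outliers on the plot before trusting a numerical summary.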

The scatterplot below shows the number of daily bus routes versus ridership across several days. The clear upward trend indicates a strong positive linear relationship: as the number of available routes increases, total ridership tends to increase as well. One plausible explanation is that more routes make public transportation accessible to more people, although the scatterplot alone cannot confirm the cause.

Example: Temperature and hot chocolate sales

The scatterplot shows temperature versus hot chocolate sales at a café over several days. As the weather warms up, sales tend to drop, showing a clear negative trend.

Answer the following questions:

What type of relationship is shown, how can you tell, and what does it mean in context?

If the temperature rises to 90°F, approximately how many cups of hot chocolate would you expect to sell? Explain your reasoning.

Temperature vs. hot chocolate sales

Answer:

The scatterplot shows a strong negative correlation between temperature and hot chocolate sales. You can see this from the clear downward trend from left to right. In context, as outdoor temperature increases, hot chocolate sales decrease, which fits the idea that people buy fewer hot drinks in warm weather.
90°F is outside the observed range (30–82°F), so any prediction would be an extrapolation and is unreliable; the trend may not continue at higher temperatures.
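The risk of extrapolating can be sketched with a short calculation. The temperatures and sales figures below are hypothetical stand-ins chosen to span 30–82°F (they are not the values from the lesson's figure), and `fit_line` is an illustrative helper that fits a least-squares line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations (assumed values, for illustration only)
temps_f = [30, 40, 50, 60, 70, 82]
cups_sold = [125, 100, 80, 55, 35, 10]

slope, intercept = fit_line(temps_f, cups_sold)

# Extending the fitted line to 90°F, beyond the observed range
predicted_at_90 = slope * 90 + intercept
print(f"slope: {slope:.2f} cups per degree")
print(f"predicted cups at 90°F: {predicted_at_90:.1f}")
```

With these made-up values the line predicts a negative number of cups at 90°F (about −9), which is impossible. The linear pattern describes the observed range only; outside it, the line can give nonsensical answers.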

The correlation coefficient r

Scatterplots let you see the direction and rough strength of a relationship visually, but you can also measure it numerically using the correlation coefficient, written as r.

r ranges from −1 to 1.
The sign tells you the direction: positive r means a positive (upward) trend; negative r means a negative (downward) trend.
The magnitude (how far r is from 0) tells you the strength: values near ±1 indicate a strong relationship, values near 0 indicate little or no linear relationship, and values in between indicate moderate strength.

For example, r = 0.91 is a strong positive correlation: the value is close to 1 and the sign is positive. By contrast, r = −0.65 is a moderate negative correlation, and r = 0.07 indicates essentially no linear correlation.
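The reading of sign and magnitude described above can be sketched as a small function. The cutoffs 0.8 and 0.3 are assumptions chosen for illustration so that the three examples come out as described; there is no single standard set of cutoffs, and `describe_r` is a hypothetical helper name.

```python
def describe_r(r, strong_cutoff=0.8, weak_cutoff=0.3):
    """Describe a correlation coefficient in words.

    The cutoffs are illustrative assumptions, not fixed rules.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    magnitude = abs(r)  # distance from 0 gives the strength
    if magnitude < weak_cutoff:
        return "little or no linear relationship"
    strength = "strong" if magnitude >= strong_cutoff else "moderate"
    direction = "positive" if r > 0 else "negative"  # sign gives the direction
    return f"{strength} {direction}"

for r in (0.91, -0.65, 0.07):
    print(r, "->", describe_r(r))
```

Treat the labels as rough guides: a value just above or below a cutoff does not describe a meaningfully different relationship.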

Sidenote

r measures only linear relationships

Two variables can have a strong curved relationship and still produce an r near 0. Always look at the scatterplot alongside r.
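A minimal sketch of this point, using made-up data in which y is completely determined by x through the curve y = x². The helper `pearson_r` is a hypothetical name for a direct calculation of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect U-shaped (curved) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(f"r for y = x^2 on symmetric x values: {r_curved:.2f}")  # 0.00
```

Here r is 0 even though knowing x tells you y exactly. The downward left half and the upward right half cancel out, so only the plot reveals the pattern.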

Caution: correlation does not imply causation

A clear linear trend doesn’t prove that one variable causes the other. A lurking variable (a third factor affecting both variables) or reverse causation can create a correlation that looks meaningful but isn’t causal. With a small sample, a correlation can also arise purely by chance, so a single scatterplot is rarely sufficient evidence of a real relationship. Use scatterplots alongside context, subject-matter knowledge, and additional statistical tools.

Definitions

Causation: A relationship in which changes in one variable directly produce changes in another variable.
Lurking variable: A factor that influences both variables under study, potentially creating a spurious correlation.
Reverse causation: A situation in which the assumed direction of causation is backwards: variable B may cause variable A, rather than A causing B.

Example: Correlation versus causation

More firefighters are associated with higher damage costs, and ice cream sales are correlated with drowning rates in summer. What explains these correlations? And in a study where people who exercise regularly tend to have better health outcomes, could the causation run the other way?

Answer: Neither correlation in the first set of examples is causal. Fire severity drives both higher damage costs and the need for more firefighters; warm weather drives both more swimming and more ice cream purchases. These are classic lurking variable examples: the firefighters don’t cause the damage, and ice cream sales don’t cause drownings.

The exercise-health correlation may involve reverse causation. People who are already healthier may have more energy and fewer physical limitations, making them more likely to exercise. The data show a correlation, but the direction of cause and effect is ambiguous. Sorting out the direction requires more than a scatterplot; it typically requires a controlled experiment.
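The lurking variable idea can be sketched with made-up numbers. All three lists below are hypothetical values invented for illustration (not real data), constructed so that both ice cream sales and drownings rise with temperature; `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical weekly values (assumed for illustration only)
temps_f = [60, 65, 70, 75, 80, 85, 90]            # lurking variable
ice_cream = [300, 330, 345, 380, 400, 430, 450]   # rises with temperature
drownings = [2, 3, 3, 4, 5, 5, 6]                 # also rises with temperature

r_ice_drown = pearson_r(ice_cream, drownings)
r_temp_ice = pearson_r(temps_f, ice_cream)
r_temp_drown = pearson_r(temps_f, drownings)

print(f"ice cream vs. drownings:   r = {r_ice_drown:.2f}")
print(f"temperature vs. ice cream: r = {r_temp_ice:.2f}")
print(f"temperature vs. drownings: r = {r_temp_drown:.2f}")
```

In this sketch, ice cream sales and drownings are strongly correlated with each other only because each one tracks temperature. The value of r cannot tell you which explanation is correct; that takes context and study design.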

When to use a scatterplot

A scatterplot is the right choice when you have two quantitative (numerical) variables and want to explore the relationship between them. Use a different graph type when:

One variable is categorical (e.g., comparing test scores across three teaching methods) → use a bar graph or side-by-side boxplot
You want to show how one variable changes over time → use a line graph
You want to show the distribution of a single variable → use a histogram or dotplot

Key points

Scatterplots display paired quantitative data by plotting each $(x, y)$ pair as a point on a coordinate grid.
The horizontal axis represents one variable, and the vertical axis represents another.
Overall patterns are identified by observing whether points trend upward (positive), downward (negative), or form a random cloud (no linear association).
The strength of a relationship is judged by how closely points cluster around an imaginary straight line.
Outliers are points that fall far from the main pattern and may affect interpretation.
Visual evidence from a scatterplot should be considered before applying numerical measures like correlation.
A visible relationship does not imply causation and should be interpreted in context.
