2.2 Understanding and representing data

Achievable Praxis Core: Math (5733)

2. Data analysis, statistics, and probability

Data can be organized and displayed in many formats. Choosing the right representation makes patterns easier to spot and conclusions easier to support. Being able to read, create, and interpret these displays is a key skill for solving real-world problems.

Types of data

There are two major types of data:

Categorical (Qualitative): describes qualities or characteristics (e.g., favorite color, type of pet)
Numerical (Quantitative): measures or counts something (e.g., height, number of siblings)

Knowing the type of data helps you choose a display that matches what the data can - and can’t - show.

Common types of data displays

Display type	Description	Best used for
Table	Organizes values in rows and columns	All types of raw data
Bar graph	Uses bars to compare counts or frequencies across categories	Categorical data
Line graph	Shows trends over time using points connected by lines	Time-based changes in data
Circle graph	Shows parts of a whole; more commonly called a pie chart	Percentages or proportions
Histogram	A bar-style graph in which each bar represents a range (or bin) of values; bars touch with no gaps	Distribution of numerical data
Stem-and-leaf	Displays quantitative data in a way that preserves actual data points	Small data sets of numbers
Boxplot	Summarizes a dataset using quartiles, median, and outliers	Comparing multiple sets, identifying spread
Scatterplot	Plots two numerical variables to show correlation or relationships	Bivariate numerical data

Choosing the right display

The best display depends on the type of data and what you want the reader to notice.

Example: Categorical data

A teacher surveys students on their favorite ice cream flavor. The results are:

Flavor	Students
Vanilla	10
Chocolate	8
Strawberry	5
Mint Chip	7

What type of graph best displays this data?

(spoiler)

Answer: A bar graph.

A bar graph fits because the data are categories (flavors). Each bar represents one flavor, and the bar height shows how many students chose it. That makes comparisons across categories quick and clear.
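As a rough illustration of the idea that bar length encodes count, the sketch below prints a text version of the bar graph. The function name `text_bar_chart` and the `#` bar style are illustrative choices, not part of the lesson.

```python
# Survey counts copied from the ice cream example.
flavor_counts = {"Vanilla": 10, "Chocolate": 8, "Strawberry": 5, "Mint Chip": 7}

def text_bar_chart(counts):
    """Return one line per category, with a bar of '#' as long as its count."""
    width = max(len(name) for name in counts)
    return [f"{name:<{width}} | {'#' * n} {n}" for name, n in counts.items()]

for line in text_bar_chart(flavor_counts):
    print(line)

# The longest bar identifies the most popular category.
most_popular = max(flavor_counts, key=flavor_counts.get)
print(most_popular)
```

Because every bar starts at the same baseline, comparing lengths is the same as comparing counts.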

Example: Numerical data over time

A student tracks their daily screen time for a week:

Day	Hours
Monday	3
Tuesday	2.5
Wednesday	4
Thursday	3.5
Friday	5

What type of graph best displays this data?

(spoiler)

Answer: A line graph.

A line graph works well because the data are numerical and ordered by time. Plotting the points and connecting them highlights day-to-day changes and makes overall trends easy to see.
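One way to see what the connecting segments communicate is to compute the change from each day to the next. This is a minimal sketch using the screen-time values from the example; the variable names are illustrative.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
hours = [3, 2.5, 4, 3.5, 5]

# Change from each day to the next: the rise or fall of each line segment.
changes = [later - earlier for earlier, later in zip(hours, hours[1:])]
for day, change in zip(days[1:], changes):
    print(f"{day}: {change:+.1f} hours")

# Difference between the last and first day: the overall trend.
overall_change = hours[-1] - hours[0]
print(overall_change)
```

The individual changes alternate between drops and rises, while the overall change is positive, which is the kind of pattern a line graph makes visible at a glance.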

Interpreting visual data

When reading a graph or chart, pay close attention to:

Titles and labels
Read these first: they tell you what the graph represents and what each axis or category means.

Scale
Check whether spacing is consistent and whether units are clearly shown. Be aware that a non-zero baseline or inconsistent axis scaling can make small differences look much larger than they are - always check the scale before drawing conclusions.

Trends
Look for overall increases, decreases, or patterns over time.

Outliers
Identify values that don’t fit the overall pattern, since they can affect averages and interpretations.
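The point about scale can be checked with a little arithmetic. The sketch below compares how tall two bars appear relative to each other when the axis starts at zero versus at a higher baseline. The values 50 and 55, the baseline of 45, and the helper name `apparent_ratio` are all hypothetical.

```python
def apparent_ratio(smaller, larger, baseline=0):
    """Ratio of drawn bar heights when the axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)

# Hypothetical values: two bars representing 50 and 55.
ratio_from_zero = apparent_ratio(50, 55)
ratio_truncated = apparent_ratio(50, 55, baseline=45)

print(ratio_from_zero)   # how much taller the second bar looks with a zero baseline
print(ratio_truncated)   # how much taller it looks when the axis starts at 45
```

With these numbers, the second bar is only 10% larger in value but is drawn twice as tall on the truncated axis.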

Example: Applications of graphical data

In a survey, $1650$ people said SciFi was their favorite movie genre. The circle graph below shows the percentage breakdown of responses by genre; the SciFi slice is labeled $16.2\%$. Use the circle graph to find the total number of people surveyed.

[Figure: Favorite movie genre (circle graph)]

(spoiler)

The circle graph shows that SciFi accounts for $16.2\%$ of all responses. Since $1650$ people represent that share, divide to find the total:

$1650 \div 0.162 \approx 10{,}185$

Because the displayed percent is rounded, the answer is approximate.

Answer: Approximately $10{,}185$ people were surveyed.
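The division can be checked quickly, along with a rough sense of how much the rounding matters. The sketch assumes the label was rounded to the nearest tenth of a percent, so the true share lies between 16.15% and 16.25%; that assumption is not stated in the problem.

```python
scifi_count = 1650
scifi_share = 0.162  # 16.2% written as a decimal

total = scifi_count / scifi_share
print(round(total))

# Assumption: 16.2% is rounded to the nearest tenth of a percent,
# so the true share could be anywhere from 16.15% to 16.25%.
low = scifi_count / 0.1625   # larger share -> smaller total
high = scifi_count / 0.1615  # smaller share -> larger total
print(round(low), round(high))
```

Under that assumption the true total could differ from the estimate by a few dozen people, which is why the answer is stated as approximate.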

Interpreting stem-and-leaf plots

A stem-and-leaf plot shows the distribution of a small numerical dataset while keeping the exact data values. The stem contains the leading digit(s), and the leaf contains the final digit of each number.

Example: Reading a stem-and-leaf plot

A stem-and-leaf plot shows quiz scores for a class:

Stem	Leaf
$6$	$5 \; 8$
$7$	$1 \; 4 \; 7$
$8$	$2 \; 2 \; 9$

Key: $6 \mid 5 = 65$

How many students scored in the 70s?

What is the highest score?

(spoiler)

Each row lists the ones digits for all scores with that tens digit. Stem $7$ has leaves $1$, $4$, and $7$, so three students scored in the 70s: $71$, $74$, and $77$.

The highest score is the largest leaf on the largest stem: stem $8$ with leaf $9$ gives $89$.

Answer: Three students scored in the 70s. The highest score is $89$.
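Reading the plot amounts to rebuilding each score from its stem and leaf. A minimal sketch of that reconstruction, with the plot stored as a dictionary (an illustrative choice):

```python
# Stems (tens digits) mapped to their leaves (ones digits), as in the plot.
plot = {6: [5, 8], 7: [1, 4, 7], 8: [2, 2, 9]}

# Each score is stem * 10 + leaf, following the key 6 | 5 = 65.
scores = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]
in_70s = [s for s in scores if 70 <= s <= 79]

print(scores)
print(len(in_70s), max(scores))
```

Note that the repeated leaf on stem 8 produces two separate scores of 82, so a repeated leaf always counts as a repeated data value.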

Interpreting histograms

A histogram looks similar to a bar graph, but it displays the distribution of numerical data grouped into ranges called bins (or intervals). Unlike bar graphs, histogram bars touch each other - there are no gaps - because the bins represent a continuous range of values.

Each bar’s height shows how many data values fall within that bin (the frequency).

Sidenote

Bin boundaries

Histogram bins are typically half-open intervals - for example, a bin labeled “2-4 hours” includes 2 but excludes 4, written as $[2, 4)$. This means a value of exactly 4 belongs to the next bin, $[4, 6)$, not the current one. This convention ensures each data value falls into exactly one bin. Watch for this on the exam when summing frequencies across adjacent bins.
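The half-open convention can be sketched as a small function that assigns a value to its bin. The name `bin_label` and its defaults (bins of width 2 starting at 0) are assumptions chosen to match the sidenote's example.

```python
def bin_label(value, start=0, width=2):
    """Return the half-open bin [low, high) that contains `value`."""
    index = int((value - start) // width)  # floor division picks the bin
    low = start + index * width
    return (low, low + width)

print(bin_label(3.9))  # just under 4 stays in the 2-4 bin
print(bin_label(4))    # exactly 4 moves to the 4-6 bin
```

Floor division is what makes the left edge inclusive and the right edge exclusive.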

Example: Reading a histogram

A histogram shows the number of hours students spent studying for an exam. The bins are: 0-2 hours, 2-4 hours, 4-6 hours, 6-8 hours. The bar heights are 3, 8, 12, 5 respectively.

Which bin has the most students?

How many students spent fewer than 4 hours studying?

(spoiler)

Read the height of each bar to get the frequency for each bin.

0-2 hours: 3 students
2-4 hours: 8 students
4-6 hours: 12 students
6-8 hours: 5 students

The tallest bar is the 4-6 hours bin with 12 students.

For fewer than 4 hours, add the frequencies of the first two bins: $3 + 8 = 11$.

Answer: The 4-6 hours bin has the most students (12). Eleven students spent fewer than 4 hours studying.
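The same two questions can be answered from the list of frequencies alone. This sketch stores each bin as a `(low, high)` pair, which is an illustrative representation.

```python
bins = [(0, 2), (2, 4), (4, 6), (6, 8)]
frequencies = [3, 8, 12, 5]

# The tallest bar is the bin with the largest frequency.
tallest_bin = bins[frequencies.index(max(frequencies))]

# "Fewer than 4 hours" covers every bin whose upper edge is at most 4.
fewer_than_4 = sum(f for (low, high), f in zip(bins, frequencies) if high <= 4)

total_students = sum(frequencies)
print(tallest_bin, fewer_than_4, total_students)
```

Summing all four bars also gives the total number of students represented, which is useful when a question asks for a fraction or percent.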

Interpreting boxplots

A boxplot (also called a box-and-whisker plot) summarizes a dataset using five key values:

Minimum: the smallest value in the dataset
Q1 (first quartile): the median of the lower half of the data
Median (Q2): the middle value of the entire dataset
Q3 (third quartile): the median of the upper half of the data
Maximum: the largest value in the dataset

The box stretches from Q1 to Q3, and a line inside the box marks the median. In a basic boxplot, the whiskers extend from the box out to the minimum and maximum. The distance from Q1 to Q3 is called the interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$. Some boxplots instead stop the whiskers at the most extreme values that are not unusually far from the box and plot any values beyond them as separate points flagged as outliers.

Example: Reading a boxplot

A boxplot for student test scores has the following five-number summary: Min = $52$, Q1 = $65$, Median = $73.5$, Q3 = $80$, Max = $95$.

What is the IQR?

What percent of students scored between $65$ and $80$?

(spoiler)

The IQR is the distance from Q1 to Q3:

$\text{IQR} = Q_3 - Q_1 = 80 - 65 = 15$

The box spans Q1 to Q3, so it contains the middle $50\%$ of the data.

Answer: The IQR is $15$. About $50\%$ of students (the middle half) scored between $65$ and $80$.
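To see where a five-number summary comes from, the sketch below computes one using the median-of-halves definitions given earlier. The eight scores are hypothetical, invented so that the result matches the summary in this example; the actual data behind the boxplot are not given. The $1.5 \times \text{IQR}$ fences at the end are a common convention for flagging outliers, included here as an assumption rather than something this example specifies.

```python
from statistics import median

# Hypothetical scores, invented so the summary matches the example.
scores = sorted([52, 60, 70, 72, 75, 78, 82, 95])

half = len(scores) // 2
lower = scores[:half]    # lower half of the data
upper = scores[-half:]   # upper half (an odd count would skip the middle value)

q1, q2, q3 = median(lower), median(scores), median(upper)
iqr = q3 - q1

# Common 1.5 * IQR convention for outlier fences (an assumption, not from the example).
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(min(scores), q1, q2, q3, max(scores))
print(iqr, low_fence, high_fence)
```

With these values, every score lies between the two fences, so none would be flagged as an outlier under that convention.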

Interpreting scatterplots

Scatterplots show the relationship between two numerical variables by plotting individual data points on a coordinate grid. The $x$-coordinate represents one variable and the $y$-coordinate represents the other. Each point corresponds to one observation.

When interpreting a scatterplot, look for:

Direction: Does the pattern increase (positive correlation) or decrease (negative correlation)? If there is no clear direction, the variables show little or no correlation.
Form: Is the relationship roughly linear or curved?
Strength: Are the points tightly clustered (strong) or widely scattered (weak)?
Outliers: Are there points far from the overall pattern?

Watch out: correlation vs. causation. A scatterplot can suggest a relationship between two variables, but correlation does not imply causation - a third factor may explain both.
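Direction and strength are judged by eye in this section, but they can also be summarized with a single number. The sketch below computes the Pearson correlation coefficient, a standard statistic that this lesson does not cover and that is shown only for illustration. The data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration.
hours = [1, 2, 3, 4, 5]
scores = [60, 70, 65, 85, 90]

r = pearson_r(hours, scores)
print(round(r, 2))
```

The sign of the result gives the direction (positive or negative), and values closer to 1 or -1 indicate a tighter linear pattern. Like the scatterplot itself, the number says nothing about whether one variable causes the other.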

Example: Study time vs. test scores

The scatterplot below shows the relationship between the number of hours students studied and their test scores. Use it to answer the question that follows.

[Figure: Study time vs. test scores (scatterplot)]

How many students who studied for more than $4$ hours received an A (scored above $90\%$)?

(spoiler)

To answer this question, apply both conditions at the same time:

Study time greater than $4$ hours (points to the right of $4$ on the horizontal axis)
Test scores above $90\%$ (points above $90$ on the vertical axis)

Count only the points that satisfy both conditions.

From the scatterplot, there are $4$ points that lie to the right of $4$ hours and above the $90\%$ score line.

Answer: $4$ students studied more than $4$ hours and scored above $90\%$.
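Applying both conditions at once is a filter on the points. In the sketch below the points are hypothetical, since the actual coordinates appear only in the figure; they were chosen so that the count agrees with the answer above.

```python
# Hypothetical (hours studied, test score) pairs; the real points are in the figure.
points = [(1, 62), (2, 70), (3, 91), (4, 93), (4.5, 88),
          (5, 92), (5.5, 95), (6, 94), (7, 97)]

# Keep only points that satisfy BOTH conditions.
matches = [(h, s) for h, s in points if h > 4 and s > 90]

print(matches)
print(len(matches))
```

Points that meet only one condition are excluded: a student at exactly 4 hours does not count as "more than 4", and a score of 88 does not count as "above 90" even with long study time.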
