Interpreting data
Spotting patterns, trends, and outliers
Patterns and trends describe how data values behave as a group: for example, rising steadily, falling steadily, or clustering around certain values. Outliers are values that lie unusually far from the rest of the data. They matter because they can distort summary measures such as the mean or the range.
Finding the five-number summary
Take the following data set:
The values below show the five-number summary for this data set. (If you need a refresher on how each value is found, refer back to section of this unit.)
Minimum (Min) =
Maximum (Max) =
Median ($Q_2$) =
First quartile ($Q_1$) =
Third quartile ($Q_3$) =
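If you want to check a five-number summary by computer, here is a minimal Python sketch. It assumes the "median of each half" quartile method. For an odd count, it also assumes the median is left out of both halves. Other methods, such as interpolation, can give slightly different quartiles. The function names and the sample data are hypothetical.

```python
from typing import List, NamedTuple


class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float


def median(sorted_values: List[float]) -> float:
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even count: average the two middle values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values: List[float]) -> FiveNumberSummary:
    """Five-number summary using the median of each half for Q1 and Q3.

    Assumption: for an odd count, the median is excluded from both halves.
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]             # values below the median
    upper = data[half + n % 2:]     # values above the median
    return FiveNumberSummary(data[0], median(lower), median(data), median(upper), data[-1])


sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]  # hypothetical data
print(five_number_summary(sample))
# FiveNumberSummary(minimum=2, q1=4, median=7.5, q3=10, maximum=15)
```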
Visual interpretation
Some problems ask you to choose the correct box-and-whisker plot (also called a box plot) for a given data set. To do that, you need to know how the five-number summary maps onto the picture.
- A box plot can be drawn horizontally or vertically.
- In a horizontal box plot, the minimum is on the left and the maximum is on the right.
- In a vertical box plot, the minimum is at the bottom and the maximum is at the top.
Let’s sketch one using the five-number summary from the previous example.
Example: Drawing a box plot from a five-number summary
Suppose you want to construct the box plot by hand.
To create a box plot, start by ordering the data set:
To draw the box plot:
- Draw a box from $Q_1$ to $Q_3$
- Place a vertical line inside the box at the median
- Extend whiskers from the box out to the minimum and maximum
Answer: Draw the box from to , place the median line at , and draw whiskers to and .
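To see how the five values map onto a horizontal box plot, you can use this small Python sketch. It draws the plot as one line of text, with the minimum on the left and the maximum on the right. The whiskers run all the way to the minimum and maximum, as in the steps above. Some plotting libraries instead stop the whiskers at the $1.5 \times \text{IQR}$ cutoffs by default. The function name, the characters, and the default width are illustrative choices, not a standard.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, width=41):
    """Render a horizontal box plot as a single line of text.

    '|' marks the whisker ends and the median, '[' and ']' mark Q1 and Q3,
    '-' draws a whisker, and '=' fills the box.
    """
    span = maximum - minimum

    def col(value):
        # Map a data value to a character column between 0 and width - 1.
        if span == 0:
            return 0
        return round((value - minimum) / span * (width - 1))

    p_min, p_q1, p_med, p_q3, p_max = (col(v) for v in (minimum, q1, median, q3, maximum))
    line = [" "] * width
    for i in range(p_min, p_q1):          # left whisker: Min up to Q1
        line[i] = "-"
    for i in range(p_q3 + 1, p_max + 1):  # right whisker: Q3 up to Max
        line[i] = "-"
    for i in range(p_q1, p_q3 + 1):       # box interior from Q1 to Q3
        line[i] = "="
    line[p_q1], line[p_q3] = "[", "]"     # box edges
    line[p_med] = "|"                     # median line inside the box
    line[p_min] = line[p_max] = "|"       # whisker ends at Min and Max
    return "".join(line)


print(ascii_box_plot(0, 25, 50, 75, 100, width=21))  # |----[====|====]----|
```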
Example: Identify the outlier and compare means
Find the following:
- The outlier
- The mean of all eight values
- The mean of the seven typical values excluding the outlier
Order the data:
- First, compute the IQR by finding $Q_1$ and $Q_3$.
- Recall that the IQR is found by subtracting $Q_1$ from $Q_3$: $\text{IQR} = Q_3 - Q_1$.
By the $1.5 \times \text{IQR}$ rule for outliers, a value in the data set is an outlier if it exceeds $Q_3 + 1.5 \times \text{IQR}$. A value is also an outlier if it falls below $Q_1 - 1.5 \times \text{IQR}$.
- Outlier cutoff: Since , it is an outlier.
- Mean of all values:
- Mean without outlier:
Answer: The outlier is . The mean of all values is , and the mean without the outlier is approximately .
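The effect of an outlier on the mean can be checked with this short Python sketch. It assumes the outlier has already been identified with the $1.5 \times \text{IQR}$ rule. The eight data values and the function name are hypothetical.

```python
from statistics import mean


def compare_means(values, outlier):
    """Return the mean of all values and the mean after removing one copy of `outlier`."""
    remaining = list(values)
    remaining.remove(outlier)  # raises ValueError if `outlier` is not in the list
    return mean(values), mean(remaining)


# Hypothetical data. With median-of-halves quartiles, Q1 = 4 and Q3 = 6.5,
# so the upper cutoff is 6.5 + 1.5 * 2.5 = 10.25, and 29 is an outlier.
data = [3, 4, 4, 5, 6, 6, 7, 29]
with_outlier, without_outlier = compare_means(data, 29)
print(f"{with_outlier:.2f} {without_outlier:.2f}")  # 8.00 5.00
```

In this example, the single outlier pulls the mean from 5 up to 8.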
Example: Detect trend, cluster, and outlier
Given the following data set, describe any clusters, trends, or outliers.
- Two clusters: - and -
- One could also argue for three clusters: -, -, and -
Given the data set:
- The data are already ordered from least to greatest.
- Median ($Q_2$): There are 10 values in the data set. Since 10 is even, the median is the average of the 5th and 6th values:
- Lower quartile ($Q_1$): The lower half of the data consists of the values below the median: This set has 5 values, so $Q_1$ is its middle (third) value:
- Upper quartile ($Q_3$): The upper half of the data consists of the values above the median: This set also has 5 values, so $Q_3$ is its middle (third) value:
- Interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$
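Each half's middle value is its third value, so each half holds five values. That means this data set has ten values in all. Suppose we write the ordered values as $x_1 \le x_2 \le \cdots \le x_{10}$ (notation introduced here for illustration). The steps above then reduce to these position formulas:

```latex
\begin{aligned}
Q_2 &= \frac{x_5 + x_6}{2} \\
Q_1 &= x_3 \quad \text{(middle of } x_1, \dots, x_5\text{)} \\
Q_3 &= x_8 \quad \text{(middle of } x_6, \dots, x_{10}\text{)} \\
\text{IQR} &= Q_3 - Q_1 = x_8 - x_3
\end{aligned}
```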
To determine whether any values in the data set are outliers, we use the $1.5 \times \text{IQR}$ rule.
We already found:
Now calculate the lower and upper bounds for outliers:
- Lower bound = $Q_1 - 1.5 \times \text{IQR} =$
- Upper bound = $Q_3 + 1.5 \times \text{IQR} =$
Any value below the lower bound or above the upper bound is considered an outlier. Since every value in the data set falls between the two bounds, there are no outliers.
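The bound calculation can be written as a short Python sketch. The cutoff multiplier `k` defaults to the 1.5 used above. The function names and the sample quartiles and data are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper cutoffs for the k * IQR rule (k = 1.5 is the usual choice)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values strictly below the lower cutoff or strictly above the upper cutoff."""
    low, high = outlier_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


# Hypothetical data with Q1 = 4 and Q3 = 10, so IQR = 6 and the cutoffs are -5 and 19.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
print(outlier_fences(4, 10))       # (-5.0, 19.0)
print(find_outliers(data, 4, 10))  # []
```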
Answer: The data show two clusters (- and -) and no outliers.
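Clusters in this kind of problem are judged by eye, which is why two or three clusters can both be defended. The Python sketch below makes that judgment explicit with a simple gap heuristic, a technique chosen here for illustration. It starts a new cluster wherever consecutive sorted values jump by more than a chosen `min_gap`. The threshold, the function names, and the sample data are all hypothetical.

```python
def split_into_clusters(values, min_gap):
    """Group sorted values, starting a new cluster wherever the jump
    between neighbouring values is larger than `min_gap`."""
    data = sorted(values)
    if not data:
        return []
    clusters = [[data[0]]]
    for prev, cur in zip(data, data[1:]):
        if cur - prev > min_gap:
            clusters.append([cur])    # large jump: start a new cluster
        else:
            clusters[-1].append(cur)  # small jump: same cluster
    return clusters


def describe_ranges(clusters):
    """Format each cluster as 'low-high'."""
    return [f"{c[0]}-{c[-1]}" for c in clusters]


# Hypothetical data: the cluster count depends on what counts as a "large" gap.
data = [1, 2, 3, 6, 7, 12, 13, 14]
print(describe_ranges(split_into_clusters(data, min_gap=3)))  # ['1-7', '12-14']
print(describe_ranges(split_into_clusters(data, min_gap=2)))  # ['1-3', '6-7', '12-14']
```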
Justifying conclusions with data
Strong conclusions point to specific numbers, trends, or features in the display, and they avoid claims the data can’t support.
Example: Trend description and justification
Monthly website visits are shown below:
| Month | Visits |
|---|---|
| Jan | 1200 |
| Feb | 1500 |
| Mar | 1800 |
Write a sentence describing the trend and justify it with data.
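Visits rose by exactly 300 each month, from 1200 in January to 1800 in March, indicating a consistent upward trend. The table alone cannot explain why visits rose, so attributing the increase to marketing or seasonal factors would go beyond what the data support.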
Answer: Visits increased by 300 each month from January to March.
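You can back up a trend statement by computing the month-to-month changes from the table, as in this minimal Python sketch. The function names are hypothetical, and the visit counts are the ones in the table above.

```python
# Monthly visits from the table (dicts preserve insertion order in Python 3.7+).
visits = {"Jan": 1200, "Feb": 1500, "Mar": 1800}


def month_to_month_changes(series):
    """Difference between each month's value and the previous month's value."""
    values = list(series.values())
    return [later - earlier for earlier, later in zip(values, values[1:])]


def describe_trend(changes):
    """Label a non-empty list of changes as increasing, decreasing, or mixed."""
    if all(c > 0 for c in changes):
        return "increasing"
    if all(c < 0 for c in changes):
        return "decreasing"
    return "mixed"


changes = month_to_month_changes(visits)
print(changes)                  # [300, 300]
print(describe_trend(changes))  # increasing
print(len(set(changes)) == 1)   # True: the same change every month
```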
