Understanding measures of central tendency
Later, you’ll learn formal rules for identifying outliers. For now, treat an outlier as a value that looks unusually far from the rest of the data.
We’ll compare two data sets, one without an extreme value and one with an extreme value, to see how a single extreme value affects the mean, median, and mode.
Example: Dataset without extreme values
- First, order the data: .
- Next, compute the mean by adding the values and dividing by the number of values.
- .
- The median is the middle value in the ordered list. Here, the third value is .
- Check whether any value occurs most often (the mode). Since all values are unique, there is no mode.
Answer: Mean , median , there is no mode.
Example: Dataset with an extreme value
- Order the data: .
- Compute the measures of central tendency: mean, median, and mode.
- .
- Since there are 5 values (an odd number), the median is the 3rd value in the ordered list, which is .
- Since every value is unique, there is no mode.
Answer: Mean , median , there is no mode.
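The comparison above can be sketched with Python's standard-library `statistics` module. The two data sets here are hypothetical (the original values are not reproduced); they differ only in their largest value, so the effect of one extreme value is easy to see. The helper `mode_or_none` is our own illustrative name, not a library function; it encodes this guide's rule that a data set with all unique values has no mode.

```python
from statistics import mean, median, multimode

def mode_or_none(values):
    """Most frequent value(s), or None when every value is unique."""
    if len(set(values)) == len(values):
        # Every value occurs once, which this guide treats as "no mode".
        return None
    # multimode returns every value tied for the highest count.
    return multimode(values)

def describe_center(values):
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode_or_none(values),
    }

without_extreme = [2, 4, 6, 8, 10]   # hypothetical data
with_extreme = [2, 4, 6, 8, 100]     # same data, but 10 replaced by an extreme 100

print(describe_center(without_extreme))
print(describe_center(with_extreme))
```

With these made-up values, the mean jumps from 6 to 24 while the median stays at 6, which is why the median is described as resistant to extreme values.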
Measures of spread
Measures of spread describe how much the values in a data set vary around the center.
Common measures include:
- Range (difference between the highest and lowest values)
- Interquartile range (IQR) (spread of the middle 50% of the data)
- Variance (average squared deviation from the mean)
- Standard deviation (square root of the variance, describing typical distance from the mean)
These measures help you describe consistency versus variability and can hint at outliers or clustering. For the Praxis, focus on range, interquartile range, and standard deviation. Introductory statistics courses go deeper into variance and its role.
Range
- Formula: $\text{Range} = \text{maximum} - \text{minimum}$
- Total span of the data. Sensitive to extreme values.
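Interquartile range (IQR)
- Formula: $\text{IQR} = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are the first and third quartiles
- Spread of the middle fifty percent of the data.
- Ignores extreme values, so it resists outliers.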
Standard deviation (population)
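- Formula: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$, where $\mu$ is the mean and $N$ is the number of values
- Typical distance of values from the mean (the square root of the average squared deviation).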
Example: Computing range and IQR
- To find the interquartile range (IQR), start by ordering the data and identifying the median, which is 7. Then split the data into a lower half and an upper half around the median, leaving the median itself out of both halves.
- The lower half is , so the first quartile is the average of those two values: .
- The upper half is , so the third quartile is the average of those two values: .
- The IQR is then calculated as $Q_3 - Q_1$: .
Answer:
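The range and IQR steps above can be sketched in Python. The function names are illustrative, and the five-value data set is hypothetical (chosen only so that its median is 7, like the example); the original data are not reproduced here. Quartiles are taken as medians of the lower and upper halves, leaving the overall median out when the count is odd, as in the worked example. Software packages use several quartile conventions (Python's `statistics.quantiles`, for instance, has its own methods), so their IQR can differ from this hand method.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    For an odd number of values, the overall median is left out of
    both halves, matching the hand method in the worked example.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]               # values below the middle position
    upper = data[half + (n % 2):]     # skip the middle value when n is odd
    return median(lower), median(upper)

def data_range(values):
    # Largest value minus smallest value.
    return max(values) - min(values)

def iqr(values):
    q1, q3 = quartiles_by_halves(values)
    return q3 - q1

sample = [10, 4, 7, 6, 8]              # hypothetical data with median 7
print(sorted(sample))                  # [4, 6, 7, 8, 10]
print(quartiles_by_halves(sample))     # (5.0, 9.0)
print(data_range(sample), iqr(sample))
```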
Example: Computing standard deviation
Start by computing the mean, which is 7.
Next, find the deviation of each data point from the mean by subtracting 7 from each value: , , , , .
Next, square each deviation to eliminate negatives: , , , , and .
These squared deviations are , which sum to 20.
Divide this sum by the number of data points (5) to find the variance: $\sigma^2 = \dfrac{20}{5} = 4$.
Finally, take the square root of the variance to get the standard deviation: $\sigma = \sqrt{4} = 2$.
Answer: $\sigma = 2$
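The five steps above (mean, deviations, squares, divide by N, square root) can be sketched in Python. The data set is hypothetical: it was chosen so that its mean is 7 and its squared deviations sum to 20, matching the numbers in the text, but it is not the original data. The standard-library `statistics.pstdev` serves as a cross-check; note that `statistics.stdev` divides by $N - 1$ (the sample version), so it would give a different answer.

```python
import math
from statistics import pstdev

def population_sd_steps(values):
    n = len(values)
    mu = sum(values) / n                      # step 1: mean
    deviations = [x - mu for x in values]     # step 2: deviation of each value
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    variance = sum(squared) / n               # step 4: divide by N (population)
    sd = math.sqrt(variance)                  # step 5: square root
    return mu, deviations, squared, variance, sd

data = [4, 6, 7, 8, 10]   # hypothetical: mean 7, squared deviations sum to 20
mu, devs, sq, var, sd = population_sd_steps(data)
print(mu, devs, sq, var, sd)
print(pstdev(data))       # standard-library cross-check
```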
Choosing the right measure
| Situation | Best measure of center | Best measure of spread | Why |
|---|---|---|---|
| Categorical data (e.g., favorite color) | Mode | Not applicable | Mean and median cannot be calculated; only the mode makes sense. |
| Numerical data, no extreme values, symmetric | Mean | Standard deviation | Uses all values; reliable for well-behaved data. |
| Numerical data, no extreme values, skewed | Median | IQR | The median resists skew; the IQR ignores extremes. |
| Numerical data with extreme values (outliers) | Median | IQR | Both are resistant to outliers; the mean and standard deviation would be distorted. |
| Small data set | Median | Range | Easy to compute; IQR and standard deviation are less meaningful with tiny samples. |
Quick rule:
- Mean and standard deviation: use only when the data is numeric, roughly symmetric, and free of outliers.
- Median and IQR: use when data is numeric but skewed or has outliers.
- Mode: use for categorical data or when identifying the most common value matters.
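The quick rule can be written as a small decision function. This is only a sketch of the three bullets above; `suggest_measures` and its two yes/no inputs are hypothetical, and it does not encode the small-data-set row of the table, which calls for judgment rather than a fixed rule.

```python
def suggest_measures(is_numeric, symmetric_and_outlier_free):
    """Return (measure of center, measure of spread) following the quick rule."""
    if not is_numeric:
        # Categorical data: only the mode makes sense.
        return ("mode", "not applicable")
    if symmetric_and_outlier_free:
        # Well-behaved numeric data: mean and SD use every value.
        return ("mean", "standard deviation")
    # Skewed numeric data or data with outliers: use resistant measures.
    return ("median", "IQR")

print(suggest_measures(False, False))  # ('mode', 'not applicable'), e.g. favorite colors
print(suggest_measures(True, True))    # ('mean', 'standard deviation')
print(suggest_measures(True, False))   # ('median', 'IQR')
```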
Effects of transformations
When a constant is added to each value in a data set, the mean, median, and mode all increase by that constant. Measures of spread such as the range, interquartile range (IQR), and standard deviation stay the same, because the distances between values don’t change.
In contrast, when each data value is multiplied by a positive constant, the mean, median, and mode are all multiplied by that constant. The range, IQR, and standard deviation are also multiplied by that constant, because all distances are scaled by the same factor.
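These two rules can be checked numerically with a short Python sketch. The data set and the constants 5 and 3 are hypothetical, chosen only for illustration; quartiles are medians of the two halves, as in the earlier examples, and `summary` is an illustrative helper name.

```python
from statistics import mean, median, pstdev

def summary(values):
    """Center and spread, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    lower, upper = data[: n // 2], data[n // 2 + n % 2 :]  # median left out when n is odd
    return {
        "mean": mean(data),
        "median": median(data),
        "range": data[-1] - data[0],
        "iqr": median(upper) - median(lower),
        "sd": pstdev(data),               # population standard deviation
    }

original = [4, 6, 7, 8, 10]             # hypothetical data
shifted = [x + 5 for x in original]     # add the constant 5 to each value
scaled = [x * 3 for x in original]      # multiply each value by the constant 3

base, plus5, times3 = summary(original), summary(shifted), summary(scaled)
for key in base:
    print(f"{key}: original={base[key]}, +5 -> {plus5[key]}, x3 -> {times3[key]}")
```

In the printed table, the mean and median move by 5 after the shift while the range, IQR, and SD do not; after scaling, every entry is three times the original.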
Example: Applying transformations
The original data set has the following statistics (as computed previously):
- Mean =
- Median =
- Range =
- Interquartile range (IQR) =
- Standard deviation (SD) =
After adding to each value, the new statistics are:
- Mean =
- Median =
- Range = (unchanged)
- IQR = (unchanged)
- SD = (unchanged)
After multiplying each value by , the new statistics are:
- Mean =
- Median =
- Range =
- IQR =
- SD =
Answer:
- Adding a constant shifts the mean and median but does not change the range, IQR, or SD.
- Multiplying by a constant scales the mean, median, range, IQR, and SD by that factor.
Example
For the data set above, find the following:
- Mean
- Median
- Mode
- Range
- Interquartile range (IQR)
- Standard deviation
- Mean: The mean is the sum of the values divided by the number of values, 8.
- Median: The ordered data is: . Since there are 8 values (an even number), the median is the average of the 4th and 5th values.
- Mode: The mode is the value that occurs most often. Here, 7 and 12 each appear twice, so the data set has two modes.
- Range: Subtract the smallest value from the largest value.
- Interquartile range (IQR): The lower half of the data is: . It has 4 values, so $Q_1$ is the average of its 2nd and 3rd values. The upper half is: . Likewise, $Q_3$ is the average of its 2nd and 3rd values. Then $\text{IQR} = Q_3 - Q_1$.
- Standard deviation: Using the mean , compute the deviation of each value from the mean, then square each deviation. The sum of these squared deviations is approximately . Divide by 8 to get the variance, then take the square root to get the standard deviation.
Answer:
- Mean:
- Median:
- Mode: 7 and 12
- Range:
- IQR:
- Standard deviation:
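The same six measures for an even-sized data set can be sketched in Python. The eight values below are hypothetical (the original data set is not reproduced); they were chosen only so that 7 and 12 each appear twice, mirroring the two modes in the example. With an even count, the median averages the 4th and 5th ordered values, and each half keeps 4 values, so each quartile averages that half's 2nd and 3rd values.

```python
import math
from statistics import mean, median, multimode

data = [12, 3, 7, 16, 10, 7, 12, 9]   # hypothetical 8-value data set
ordered = sorted(data)
n = len(ordered)

mu = mean(ordered)
med = median(ordered)                 # even n: average of the 4th and 5th values
modes = multimode(ordered)            # every value tied for the highest count
rng = ordered[-1] - ordered[0]

lower, upper = ordered[: n // 2], ordered[n // 2 :]   # even n: two halves of 4
q1, q3 = median(lower), median(upper)  # average of each half's 2nd and 3rd values
sum_sq = sum((x - mu) ** 2 for x in ordered)
sd = math.sqrt(sum_sq / n)            # divide by n = 8 (population), then take the root

print(f"mean={mu}, median={med}, modes={modes}, range={rng}")
print(f"Q1={q1}, Q3={q3}, IQR={q3 - q1}, SD={sd:.3f}")
```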