Standard deviation and confidence intervals

What is spread?

Spread (also called variation or dispersion) describes how data values are distributed around a central value.

Measures of spread include the following:

  1. Range: The range of a dataset is the difference between its largest (maximum) value and its smallest (minimum) value.
  2. Quartiles: Ordered data is divided into four equal parts, called quartiles, each containing 25% of the data. The boundary between the second and third quartiles is the 50th percentile, which is the median.
  3. Interquartile range: The interquartile range (IQR) describes the central portion of the distribution, from the 25th percentile to the 75th percentile (i.e., it includes the second and third quartiles).
  4. Standard deviation (SD or sigma): Standard deviation measures how spread out a dataset is relative to its mean. A lower SD means less variability; a higher SD means more variability. SD is most useful when the data is normally distributed (symmetric and bell-shaped).
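The range, quartiles and IQR can be computed with Python's standard `statistics` module. This is a minimal sketch on a small hypothetical dataset (the numbers are illustrative, not from the text). There are several conventions for computing quartiles; `statistics.quantiles` defaults to the "exclusive" method, and other software may return slightly different Q1 and Q3 values for the same data.

```python
import statistics

# Hypothetical data (not from the text): 11 observations
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Quartile cut points: Q1 (25th), Q2 (50th = median), Q3 (75th percentile).
# statistics.quantiles sorts the data itself and uses the "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: spread of the central 50% of the data
iqr = q3 - q1

print(f"range={data_range}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

For this dataset the sketch prints `range=13, Q1=4.0, median=7.0, Q3=10.0, IQR=6.0`.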

Steps to calculate the SD:

Step 1. Calculate the arithmetic mean.

Step 2. Subtract the mean from each observation.

Step 3. Square each difference.

Step 4. Sum the squared differences.

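Step 5. Divide the sum of the squared differences by n − 1, where n is the number of observations. (This gives the sample SD; dividing by n instead gives the SD of an entire population.)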

Step 6. Take the square root of the value obtained in Step 5. The result is the standard deviation.
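The six steps translate directly into code. The sketch below uses hypothetical data, follows each step in turn, and cross-checks the result against `statistics.stdev`, which also divides by n − 1. For contrast, `statistics.pstdev` divides by n and gives the population SD. The function name `sample_sd` is only illustrative.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n                    # Step 1: arithmetic mean
    deviations = [x - mean for x in values]   # Step 2: subtract the mean from each value
    squared = [d * d for d in deviations]     # Step 3: square each difference
    total = sum(squared)                      # Step 4: sum the squared differences
    variance = total / (n - 1)                # Step 5: divide by n - 1
    return math.sqrt(variance)                # Step 6: take the square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
sd = sample_sd(data)

print(round(sd, 4))                               # sample SD (divides by n - 1)
print(math.isclose(sd, statistics.stdev(data)))   # cross-check with the standard library
print(statistics.pstdev(data))                    # population SD (divides by n)
```

Here the mean is 5 and the sum of squared differences is 32, so the sample SD is √(32/7) ≈ 2.14, while dividing by n = 8 would give exactly 2.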

Figure: Bell curve and distributions. A bell-shaped curve with standard deviations marked at equal intervals on the x-axis. 68.3% of the data falls between −1 and +1 SD, 95.5% between −2 and +2 SD, and 99.7% between −3 and +3 SD.

Areas included in normal distribution

  • ±1 SD includes 68.3%
  • ±1.96 SD includes 95.0%
  • ±2 SD includes 95.5%
  • ±3 SD includes 99.7%
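These percentages come from the standard normal distribution. A short sketch using only Python's `math` module recovers them from the error function: the fraction of a normal distribution lying within ±k SD of the mean equals erf(k/√2). The function name `coverage` is only illustrative.

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within ±k SD of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"±{k} SD: {coverage(k):.2%}")
```

The printed values are about 68.27%, 95.00%, 95.45% and 99.73%; the exact ±2 SD value is about 95.45%.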

Standard error of the mean (SEM): The SEM measures how far the sample mean is likely to be from the true population mean. The SEM is smaller than the SD whenever there is more than one observation (n > 1).

SEM = SD / √n, where n is the number of observations.

SEM is used to calculate confidence intervals around the arithmetic mean.
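A minimal sketch of the SEM formula, assuming a hypothetical SD of 20, shows that the SEM shrinks as the number of observations grows and stays below the SD whenever n > 1. The function name `sem` is only illustrative.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical SD of 20 for several sample sizes
for n in (4, 16, 100):
    print(n, sem(20, n))
```

The loop prints SEM values of 10.0, 5.0 and 2.0 for n = 4, 16 and 100; quadrupling n from 4 to 16 halves the SEM.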

Confidence intervals or confidence limits (CI): A confidence interval is a range of values that is likely to contain a population parameter (such as the mean). Its confidence level is expressed as a percentage (e.g., 95%, 99%).

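  • The percentage (the confidence level) describes how often intervals built this way capture the true population parameter.
  • For example, if a study were repeated many times, about 95% of the 95% confidence intervals calculated from those samples would contain the true population mean. Informally, this is often stated as being 95% confident that the interval contains the true mean.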
  • As sample size increases, the interval becomes narrower and precision increases.
  • A narrow confidence interval indicates high precision, whereas a wide confidence interval indicates low precision.
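The link between sample size and precision follows from the CI formula: the width of the interval Mean ± Z × SEM is 2 × Z × SD/√n. The sketch below assumes a hypothetical SD of 20 and a 95% CI (Z = 1.96); `ci_width` is an illustrative name, not a library function.

```python
import math

def ci_width(sd, n, z=1.96):
    """Width of a Mean ± z × SEM confidence interval, with SEM = sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

# Hypothetical SD of 20: the interval narrows as n grows
for n in (25, 100, 400):
    print(n, round(ci_width(20, n), 2))
```

The widths are 15.68, 7.84 and 3.92 for n = 25, 100 and 400: halving the width requires four times as many observations.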

The CI is given by the formula:

CI = Mean ± Z × SEM

A CI is reported as a range with a lower and upper limit. For a 95% CI, Z = 1.96. For a 99% CI, Z = 2.58.
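The Z values 1.96 and 2.58 are the points on the standard normal curve that leave (1 − confidence level)/2 in each tail. A short sketch using `statistics.NormalDist` (available in Python 3.8+) derives them; the function name `z_for` is only illustrative.

```python
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(round(z_for(0.95), 2))  # 1.96
print(round(z_for(0.99), 2))  # 2.58
```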

Problem 1: Find the 95% confidence interval for a mean total cholesterol level of 206, given a standard error of the mean of 3.

Using the formula above, upper limit = 206 + 1.96 × 3 = 211.88

Lower limit = 206 − 1.96 × 3 = 200.12

The 95% CI is 200.12 to 211.88.

In other words, the best estimate of the true population mean total cholesterol from the given data is 206, but the true mean could reasonably lie anywhere between 200.12 and 211.88.
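Problem 1 can be checked with a small helper that applies Mean ± Z × SEM. The helper name `confidence_interval` is hypothetical, not from any library.

```python
def confidence_interval(mean, sem, z=1.96):
    """Return (lower, upper) limits for Mean ± z × SEM."""
    margin = z * sem
    return mean - margin, mean + margin

lower, upper = confidence_interval(206, 3)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 200.12 to 211.88
```

Passing z=2.58 instead would give the 99% CI for the same data.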

If the 95% CIs of two groups do not overlap, a statistically significant difference exists between them. If the CIs of two groups overlap, there is usually no significant difference, although overlap alone does not rule one out.
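The overlap rule can be expressed as a simple interval comparison. In this sketch, group A uses the interval from Problem 1, while groups B and C are hypothetical, and `intervals_overlap` is an illustrative name. Treat it as a rule of thumb: non-overlap indicates a significant difference, but overlap does not by itself prove that no difference exists.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a and b, each (lower, upper), share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

group_a = (200.12, 211.88)  # 95% CI from Problem 1
group_b = (215.00, 225.00)  # hypothetical
group_c = (208.00, 218.00)  # hypothetical

print(intervals_overlap(group_a, group_b))  # False: CIs do not overlap
print(intervals_overlap(group_a, group_c))  # True: CIs overlap
```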

Key points

Measures of Spread

  • Describe data distribution around a central value
  • Key measures: range, quartiles, interquartile range (IQR), standard deviation (SD)

Range

  • Difference between maximum and minimum values

Quartiles & Interquartile Range (IQR)

  • Quartiles: divide data into four equal parts (each 25%)
  • IQR: range from 25th to 75th percentile (Q1 to Q3)

Standard Deviation (SD)

  • Measures variability around the mean
  • Lower SD = less spread; higher SD = more spread
  • Calculation steps:
    • Find mean
    • Subtract mean from each value, square differences
    • Sum squared differences, divide by n − 1
    • Take square root

Normal Distribution & SD

  • Bell curve: symmetric distribution
  • ±1 SD: 68.3% of data
  • ±1.96 SD: 95.0% of data
  • ±2 SD: 95.5% of data
  • ±3 SD: 99.7% of data

Standard Error of Mean (SEM)

  • Estimates how far the sample mean is likely to be from the population mean
  • SEM = SD / √n
  • SEM < SD (when n > 1); used for confidence intervals

Confidence Intervals (CI)

  • Range likely to contain population parameter (e.g., mean)
  • Expressed as percentage (e.g., 95%, 99%)
  • Formula: CI = Mean ± Z × SEM
    • Z = 1.96 for 95% CI; Z = 2.58 for 99% CI
  • Larger sample size → narrower CI (higher precision)
  • Non-overlapping CIs between groups = statistically significant difference


All rights reserved ©2016 - 2026 Achievable, Inc.
