
Hypothesis testing


The null hypothesis (H0) states that there is no difference between two groups. In other words, a population parameter (such as the mean, the standard deviation, and so on) is equal to a hypothesized or previously observed value.

The alternative hypothesis (H1) states that there is a difference. This means a population parameter is smaller than, greater than, or different from the hypothesized or previously observed value stated in the null hypothesis.

Type I error means incorrectly rejecting the null hypothesis. In other words, you conclude that there is a difference when there is actually no difference. The probability of making a Type I error is denoted by alpha (α), also called the level of significance.

  • If α is set to a lower value, the chance of a Type I error decreases.
  • If α is set to a higher value, the chance of a Type I error increases.
  • A lower α also means the test is conducted under more rigorous standards.

Type II error means failing to reject the null hypothesis when it is actually false. In other words, you conclude that there is no difference when there is actually a difference. The probability of making a Type II error is denoted by beta (β).

Power of a study is (1 − β). Power is the probability of correctly rejecting the null hypothesis when it is false, meaning it detects a real difference when one truly exists.

There is a trade-off between α and β:

  • If you set α to a lower value, you reduce the chance of a Type I error, but you increase the chance of a Type II error.
  • Increasing β lowers power, because power = 1 − β.

So, when α decreases, power also tends to decrease (and vice versa).
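The trade-off can be illustrated numerically. The following is a minimal sketch, assuming a one-sided z test with a known standard deviation; the function name `power_one_sided_z` and the numbers (a true difference of 5, a standard deviation of 15, a sample of 50) are made up for illustration and do not come from any study.

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z test when the true mean shift is `effect`.

    Assumes a known population standard deviation `sigma` and sample size `n`.
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)         # cutoff for rejecting H0
    shift = effect / (sigma / n ** 0.5)   # true effect in standard-error units
    beta = z.cdf(z_crit - shift)          # P(fail to reject H0 | H1 is true)
    return 1 - beta                       # power = 1 - beta

# Hypothetical numbers: lowering alpha raises beta and lowers power.
for alpha in (0.10, 0.05, 0.01):
    power = power_one_sided_z(effect=5, sigma=15, n=50, alpha=alpha)
    print(f"alpha={alpha:.2f}  beta={1 - power:.3f}  power={power:.3f}")
```

With the sample size and effect held fixed, each drop in α moves the rejection cutoff further out, so more true differences are missed. When the true effect is 0, the function returns α itself, which is the Type I error rate.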

Traditionally:

  • α = 0.05 (5%)
  • β = 0.2 (20%)
  • Minimum power = 0.8 (80%)

If the p value obtained from the study is less than or equal to 0.05 (the chosen α), the null hypothesis is rejected in favor of the alternative hypothesis. The result is said to be statistically significant.

If the 95% confidence interval for a difference between two groups (such as a difference in means) includes 0, the result is not statistically significant and the null hypothesis is not rejected. Likewise, if the 95% confidence interval for a relative risk (RR) or odds ratio includes 1, the null hypothesis is not rejected.
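The p value rule and the confidence interval rule lead to the same decision. Here is a rough sketch, assuming a large-sample two-sided z test on the difference between two group means; the helper `two_sample_z` and the summary statistics are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-sided z test and confidence interval for mean1 - mean2.

    Assumes samples large enough for the normal approximation.
    """
    z = NormalDist()
    diff = mean1 - mean2
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5   # standard error of the difference
    z_stat = diff / se
    p_value = 2 * (1 - z.cdf(abs(z_stat)))        # two-sided p value
    z_crit = z.inv_cdf(1 - alpha / 2)             # about 1.96 when alpha = 0.05
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p_value, ci

# Hypothetical groups of 100 people each, standard deviation 12 in both.
for m1, m2 in ((130, 126), (130, 128)):
    p_value, (low, high) = two_sample_z(m1, 12, 100, m2, 12, 100)
    significant = p_value <= 0.05
    print(f"diff={m1 - m2}  p={p_value:.3f}  95% CI=({low:.2f}, {high:.2f})  "
          f"reject H0: {significant}")
```

In the first comparison the interval excludes 0 and the p value is below 0.05; in the second the interval includes 0 and the p value is above 0.05.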

Correlation and regression: Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables.

  • Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient.
  • In simple linear regression, a single independent variable is used to predict the value of a dependent variable.

If both variables are normally distributed, then Pearson’s correlation coefficient (r) is calculated. If one or both variables are not normally distributed, then a rank correlation coefficient such as Spearman’s rho (ρ) may be calculated.

The coefficient of determination (r²) denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x.

Using linear regression, if the value of the independent variable is known, the value of the dependent variable can be predicted from the fitted line.
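These quantities can be computed by hand for a small data set. The sketch below uses only plain Python and invented data; the helper names (`pearson_r`, `spearman_rho`, `fit_line`) are illustrative, and the ranking helper assumes there are no tied values.

```python
def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Invented data: y rises with x, but not along a perfectly straight line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 7, 12]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, rho = {spearman_rho(x, y):.3f}")
print(f"predicted y at x = 6: {intercept + slope * 6:.1f}")
```

Because y always increases as x increases, the rank correlation is perfect even though Pearson's r is below 1. This shows how the two coefficients can differ when a relationship is monotonic but not linear.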

Correlation analysis quantifies the strength of the association between two numerical variables. A scatter plot (or diagram) is used to depict the relationship between two variables:

  • The independent variable is plotted on the X-axis.
  • The dependent variable is plotted on the Y-axis.

The closer the points (dots) on the scatter plot lie to a straight line, the stronger the linear association between the two variables.

The correlation coefficient quantifies the strength of the relationship between two variables and ranges between −1 and +1:

  • A plus (+) sign denotes a direct relationship.
  • A minus (−) sign denotes an inverse relationship.
  • A value of r close to +1 indicates a strong direct linear relationship (one variable increases as the other increases).
  • A value of r close to −1 indicates a strong inverse linear relationship (one variable decreases as the other increases).
  • A value of r very close to 0 does not necessarily mean there is no association; it means there is no linear association.
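The last point can be checked with a small constructed example. In this sketch, y is completely determined by x (y = x²), yet r is 0 because the relationship is not linear. The data and the helper `pearson_r` are illustrative only.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient r for paired values."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# A perfect U-shaped relationship: every y is exactly x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")   # no linear association, despite a perfect curved one
```

A scatter plot of these points would show a clear curve, which is why inspecting the plot is useful before relying on r alone.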

Correlation shows association; it does not prove causation.

In multiple regression, two or more independent variables are used to predict the value of a dependent variable.
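As a rough illustration, a multiple regression with two independent variables can be fitted by ordinary least squares. The sketch below solves the normal equations in plain Python; the helper names (`solve`, `fit_multiple`) and the data are invented for this example, and statistical software would normally be used in practice.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]        # add intercept column
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

intercept, b1, b2 = fit_multiple(rows, y)
print(f"y = {intercept:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```

Each coefficient estimates how much the dependent variable changes for a one-unit change in that independent variable while the other is held constant.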


All rights reserved ©2016 - 2026 Achievable, Inc.