```mermaid
flowchart LR
H[State H₀, H₁] --> A[Choose α]
A --> T[Choose test statistic]
T --> C[Compute statistic /<br/>p-value]
C --> D{Reject H₀?}
D -- Yes --> R[Reject H₀]
D -- No --> F[Fail to reject H₀]
style H fill:#E3F2FD,stroke:#1565C0
style R fill:#FFEBEE,stroke:#C62828
```
72 Hypothesis Testing
72.1 What is Hypothesis Testing?
A hypothesis is a statement about a population parameter that is to be verified or rejected on the basis of sample evidence. Hypothesis testing is the formal statistical procedure for deciding whether to reject or fail to reject a hypothesis based on a sample.
Two competing hypotheses are stated:
| Hypothesis | Statement | Default position |
|---|---|---|
| Null hypothesis (H₀) | A statement of “no effect”, “no difference”, “status quo” | Assumed true until evidence rejects it |
| Alternative hypothesis (H₁ or Ha) | The contradiction of H₀; what we are trying to support | What we accept if H₀ is rejected |
The null hypothesis is only rejected — never proven. We say “reject” or “fail to reject” — never “accept” H₀.
72.2 Type I and Type II Errors
Because hypothesis testing rests on probability, two errors are possible:
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I error (α) — false positive | Correct (Power = 1 − β) |
| Fail to reject H₀ | Correct | Type II error (β) — false negative |
| Error | Symbol | Consequence |
|---|---|---|
| Type I (False positive) | α | Reject true H₀ — convict the innocent |
| Type II (False negative) | β | Fail to reject false H₀ — let the guilty go free |
| Power | 1 − β | Probability of correctly rejecting false H₀ |
The textbook trade-off: reducing α tends to raise β. A larger sample reduces both.
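The trade-off can be checked by simulation. Below is a minimal sketch using a two-tailed z-test; the sample size (n = 25), σ = 1, and the true effect of 0.4 used for the Type II scenario are all hypothetical numbers chosen for illustration:

```python
import math
import random

random.seed(0)

def z_test_rejects(sample, mu0, sigma, z_crit):
    """Two-tailed z-test: True if |z| exceeds the critical value."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

trials, n, sigma, effect = 20_000, 25, 1.0, 0.4
results = {}
for alpha, z_crit in [(0.05, 1.960), (0.01, 2.576)]:
    # Type I rate: reject even though H0 (mu = 0) is true
    type1 = sum(
        z_test_rejects([random.gauss(0.0, sigma) for _ in range(n)], 0.0, sigma, z_crit)
        for _ in range(trials)
    ) / trials
    # Type II rate: fail to reject even though the true mean is 0.4, not 0
    type2 = sum(
        not z_test_rejects([random.gauss(effect, sigma) for _ in range(n)], 0.0, sigma, z_crit)
        for _ in range(trials)
    ) / trials
    results[alpha] = (type1, type2)
    print(f"alpha={alpha}: Type I ~ {type1:.3f}, Type II ~ {type2:.3f}")
```

The simulated Type I rate tracks α, and tightening α from 0.05 to 0.01 visibly raises the Type II rate at this fixed sample size.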
72.3 The Five-Step Process
| # | Step |
|---|---|
| 1 | State the null and alternative hypotheses |
| 2 | Choose the level of significance (α — typically 0.05 or 0.01) |
| 3 | Choose the appropriate test statistic and identify its sampling distribution |
| 4 | Compute the test statistic from sample data; find the critical value or p-value |
| 5 | Decide: reject H₀ if the test statistic falls in the rejection region (equivalently, if p-value ≤ α) |
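The five steps can be walked through on a small worked example. The scenario and all numbers below are hypothetical (a machine claimed to fill bottles to 500 ml, with σ known):

```python
import math

# Step 1: H0: mu = 500  vs  H1: mu != 500 (two-tailed)
mu0 = 500.0
# Step 2: choose the level of significance
alpha = 0.05
# Step 3: sigma known -> z-test; sampling distribution is standard normal
sigma, n, x_bar = 9.0, 36, 497.0   # assumed population sd and sample summary
# Step 4: compute the test statistic and its two-tailed p-value
z = (x_bar - mu0) / (sigma / math.sqrt(n))                       # = -2.0
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # Phi via erf
# Step 5: decide
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f} -> {decision}")
```

Here z = −2.0 gives p ≈ 0.0455 < 0.05, so H₀ is rejected at the 5% level (but note it would not be rejected at α = 0.01).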
72.4 One-tailed vs Two-tailed Tests
| Type | When to use | Critical region |
|---|---|---|
| Two-tailed | Testing for any difference (H₁: μ ≠ μ₀) | Both tails of the distribution |
| One-tailed (right) | Testing if greater (H₁: μ > μ₀) | Right tail only |
| One-tailed (left) | Testing if less (H₁: μ < μ₀) | Left tail only |
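For the same observed statistic, the three alternatives give different p-values. A sketch using a hypothetical observed z of 1.80 and the standard normal CDF:

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

z = 1.80                        # hypothetical observed z statistic
p_right = 1 - phi(z)            # H1: mu > mu0  (right tail only)
p_left = phi(z)                 # H1: mu < mu0  (left tail only)
p_two = 2 * (1 - phi(abs(z)))   # H1: mu != mu0 (both tails)
print(f"right: {p_right:.4f}, left: {p_left:.4f}, two-tailed: {p_two:.4f}")
```

At α = 0.05 the right-tailed test rejects (p ≈ 0.036) while the two-tailed test does not (p ≈ 0.072), which is why the direction of H₁ must be fixed before looking at the data.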
72.5 Test Statistics — When to Use Which
| Test | Used for | Distribution |
|---|---|---|
| z-test | Mean (large sample, known σ) or proportion | Standard normal |
| t-test | Mean (small sample, unknown σ) | Student’s t |
| Paired t-test | Same units measured twice | Student’s t |
| Independent samples t-test | Compare means of two groups | Student’s t |
| F-test | Compare variances; ANOVA | F |
| Chi-square goodness-of-fit | Observed vs expected categorical frequencies | Chi-square |
| Chi-square independence | Two categorical variables | Chi-square |
| ANOVA | Compare means across 3+ groups | F |
72.5.1 When to use t vs z
| Use z when | Use t when |
|---|---|
| Population σ is known, OR | Population σ is unknown |
| Large sample (n > 30, even with σ unknown) | Small sample with unknown σ |
The t-distribution converges to z as the sample size grows.
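The cost of wrongly using z with a small sample can be simulated: if σ is unknown and estimated from the sample, comparing the statistic against the normal critical value 1.96 inflates the Type I rate for small n but is harmless for large n. The sample sizes below are illustrative:

```python
import math
import random
import statistics

random.seed(1)

def rejection_rate(n, trials=10_000, z_crit=1.960):
    """Fraction of N(0,1) samples where |t| > 1.96, i.e. the actual Type I
    rate if the normal critical value is used despite sigma being unknown."""
    rejects = 0
    for _ in range(trials):
        x = [random.gauss(0, 1) for _ in range(n)]
        t = statistics.fmean(x) / (statistics.stdev(x) / math.sqrt(n))
        if abs(t) > z_crit:
            rejects += 1
    return rejects / trials

small = rejection_rate(5)    # t with 4 df: noticeably above 0.05
large = rejection_rate(100)  # t with 99 df ~ normal: close to 0.05
print(f"n=5: {small:.3f}, n=100: {large:.3f}")
```

With n = 5 the statistic follows a heavy-tailed t distribution with 4 degrees of freedom, so the nominal 5% test rejects far more often than 5%; with n = 100 the t and z critical values nearly coincide.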
72.6 p-Value Approach
The p-value is the probability of observing a sample statistic at least as extreme as the one actually observed, computed under the assumption that the null hypothesis is true.
| Comparison | Decision |
|---|---|
| p-value ≤ α | Reject H₀ |
| p-value > α | Fail to reject H₀ |
A common misinterpretation: the p-value is NOT the probability that H₀ is true. It is the probability of data at least as extreme as what was observed, calculated assuming H₀ holds.
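One consequence worth seeing: when H₀ is true, the p-value is uniformly distributed, so the fraction of p-values at or below α is exactly α — which is why α is the Type I error rate. A simulation sketch with hypothetical settings (two-tailed z-test, μ = 0, σ = 1, n = 30):

```python
import math
import random

random.seed(2)

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Simulate two-tailed z-test p-values when H0 is true
n, trials = 30, 10_000
p_values = []
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(x) / n) * math.sqrt(n)       # z = x_bar / (sigma / sqrt(n)), sigma = 1
    p_values.append(2 * (1 - phi(abs(z))))

frac = sum(p <= 0.05 for p in p_values) / trials
print(f"fraction of p-values <= 0.05 under H0: {frac:.3f}")
```

Roughly 5% of the simulated p-values fall at or below 0.05, matching α.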
72.7 ANOVA — Analysis of Variance
When comparing means of three or more groups, running multiple t-tests inflates the Type I error rate. ANOVA, introduced by Ronald A. Fisher (1925), handles this with a single F-test:
| Source | What it captures |
|---|---|
| Between groups (treatment) | Variation due to group differences |
| Within groups (error) | Variation due to chance within groups |
| Total | Sum of the two |
The F-statistic:
\[F = \frac{\text{Mean Square Between}}{\text{Mean Square Within}}\]
If F > critical value (or p < α), reject the null that all group means are equal.
| Design | When to use |
|---|---|
| One-way ANOVA | One independent variable, three or more groups |
| Two-way ANOVA | Two independent variables; tests main and interaction effects |
| Repeated-measures ANOVA | Same units measured multiple times |
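The one-way F-statistic can be computed by hand from the variance decomposition above. The three groups and their scores below are made-up numbers for illustration:

```python
import statistics

# Hypothetical data: exam scores under three teaching methods
groups = {
    "A": [82, 85, 88, 75, 80],
    "B": [90, 92, 88, 95, 91],
    "C": [70, 72, 68, 75, 71],
}

k = len(groups)                                   # number of groups
n_total = sum(len(g) for g in groups.values())    # total observations
all_scores = [x for g in groups.values() for x in g]
grand_mean = statistics.fmean(all_scores)
group_means = {name: statistics.fmean(g) for name, g in groups.items()}

# Between-group (treatment) and within-group (error) sums of squares
ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                 for name, g in groups.items())
ss_within = sum((x - group_means[name]) ** 2
                for name, g in groups.items() for x in g)

ms_between = ss_between / (k - 1)        # mean square between, df = k - 1
ms_within = ss_within / (n_total - k)    # mean square within,  df = N - k
f_stat = ms_between / ms_within
print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}")
```

The resulting F with (2, 12) degrees of freedom far exceeds the 5% critical value (roughly 3.89), so the null that all three group means are equal would be rejected.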
72.8 Chi-Square Tests
| Test | What it does | Formula |
|---|---|---|
| Goodness-of-fit | Compare observed frequencies with expected | χ² = Σ (O − E)² / E |
| Test of independence | Test whether two categorical variables are independent | χ² = Σ (O − E)² / E (with contingency table) |
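A minimal sketch of the independence test, using a hypothetical gender × purchase contingency table (all counts made up) and expected frequencies Eᵢⱼ = (row total × column total) / N:

```python
# Hypothetical 2x2 contingency table: gender x purchase
observed = [
    [30, 20],   # male:   bought, did not buy
    [20, 30],   # female: bought, did not buy
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# chi-square = sum over cells of (O - E)^2 / E,
# with E_ij = row_i total * col_j total / N under independence
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
print(f"chi-square = {chi_sq:.2f} on {df} df")
```

Here χ² = 4.00 on 1 degree of freedom exceeds the 5% critical value of 3.841, so independence of gender and purchase would be rejected at α = 0.05.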
72.9 Practice Questions
1. The null hypothesis (H₀) is best described as:
2. A Type I error occurs when:
3. The "power" of a statistical test is:
4. A t-test is preferred over a z-test when:
5. If the p-value of a test is 0.02 and α = 0.05, the appropriate decision is:
6. When comparing means across three or more groups, the appropriate test is:
7. A test of whether two categorical variables (gender × purchase) are independent uses:
8. A two-tailed test is appropriate when the alternative hypothesis is:
- Hypothesis test = sample-based decision about a population parameter. H₀ vs H₁; we reject or fail to reject H₀, never “accept”.
- Type I (α) = false positive; Type II (β) = false negative; Power = 1 − β.
- Five-step process: hypotheses → α → test statistic → compute → decide.
- One-tailed (directional) vs two-tailed.
- Tests: z (large/known σ), t (small/unknown σ), F (variance/ANOVA), chi-square (categorical), ANOVA (3+ group means).
- p-value ≤ α → reject H₀.
- ANOVA decomposes variance into between-group + within-group; F = MSbetween/MSwithin.
- Chi-square tests: goodness-of-fit and independence.