73 Hypothesis Testing
73.1 Concept
Hypothesis testing is a statistical procedure for deciding whether sample evidence supports a particular claim about a population. The modern framework was developed by Jerzy Neyman and Egon Pearson (1928, 1933), building on R.A. Fisher (1925) and Karl Pearson (1900).
73.2 Key Concepts
- Null Hypothesis (H₀) — no difference / no effect; the status quo.
- Alternative Hypothesis (H₁ or Hₐ) — researcher’s claim.
- One-tailed test — directional.
- Two-tailed test — non-directional.
- Test statistic — Z, t, χ², F.
- Significance level (α) — maximum acceptable probability of a Type I error, fixed before the test is run (typically 0.05 or 0.01).
- p-value — probability of observing data at least as extreme as the sample, given that H₀ is true.
- Power (1 − β) — probability of correctly rejecting H₀ when H₀ is false.
- Critical region / Rejection region.
- Degrees of freedom (df).
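The link between α, the critical region, and one-tailed versus two-tailed p-values can be sketched for a Z statistic as follows. This is a minimal illustration using only Python's standard library; the function names `z_p_value` and `z_critical` are hypothetical and chosen for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_p_value(z: float, tail: str = "two") -> float:
    """p-value of an observed Z statistic, assuming H0 is true.

    tail: "two" (H1: not equal), "upper" (H1: greater), "lower" (H1: less).
    """
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if tail == "upper":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "lower":
        return STD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def z_critical(alpha: float, tail: str = "two") -> float:
    """Magnitude of the critical value that bounds the rejection region."""
    # A two-tailed test splits alpha equally between both tails.
    tail_area = alpha / 2 if tail == "two" else alpha
    return STD_NORMAL.inv_cdf(1 - tail_area)


print(round(z_critical(0.05, "two"), 2))    # 1.96
print(round(z_critical(0.05, "upper"), 3))  # 1.645
```

For a lower-tailed test the rejection region is Z below the negative of the returned value.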
73.3 Type I and Type II Errors
| Decision | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I error (α) — false positive | Correct (Power) |
| Fail to Reject H₀ | Correct (1 − α) | Type II error (β) — false negative |
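To make the relationship between α, β, and power concrete, the sketch below computes the power of an upper-tailed one-sample Z-test. It assumes a known σ and a specific true mean under H₁; the numbers in the usage line (true mean 52 against a hypothesised mean of 50, σ = 10, n = 100) are illustrative assumptions, and `z_test_power` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_test_power(mu0: float, mu1: float, sigma: float, n: int,
                 alpha: float = 0.05) -> float:
    """Power of an upper-tailed one-sample Z-test (H0: mu = mu0, H1: mu > mu0)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # reject H0 when Z > z_alpha
    shift = (mu1 - mu0) / (sigma / n ** 0.5)  # mean of Z when the true mean is mu1
    beta = std_normal.cdf(z_alpha - shift)    # P(fail to reject | H1 true)
    return 1 - beta


power = z_test_power(mu0=50, mu1=52, sigma=10, n=100)
print(f"power = {power:.2f}, beta = {1 - power:.2f}")
```

When the true mean equals the hypothesised mean, the function returns α itself, which matches the top-left cell of the table: rejecting a true H₀ happens with probability α.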
73.4 Steps in Hypothesis Testing
- State H₀ and H₁.
- Choose significance level α (0.05, 0.01).
- Select appropriate test statistic (Z, t, χ², F).
- Compute test statistic from sample data.
- Determine p-value or critical value.
- Decision — reject or fail to reject H₀.
- Interpret in business / managerial context.
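The steps above can be traced with a short two-tailed Z-test. The figures (n = 100, x̄ = 52, μ = 50, σ = 10) are taken from the practice question later in this chapter; reading μ = 50 as the hypothesised mean and choosing a two-tailed alternative are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses. H0: mu = 50, H1: mu != 50 (two-tailed).
mu0 = 50.0

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: sigma is known and n is large, so the Z statistic is appropriate.
n, sample_mean, sigma = 100, 52.0, 10.0

# Step 4: compute the test statistic from the sample data.
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Step 5: determine the p-value and the critical value.
std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(z)))
z_crit = std_normal.inv_cdf(1 - alpha / 2)

# Step 6: decide.
reject_h0 = p_value < alpha

print(f"Z = {z:.2f}, p = {p_value:.4f}, critical = ±{z_crit:.2f}, reject H0: {reject_h0}")
# Z = 2.00, p = 0.0455, critical = ±1.96, reject H0: True
```

Step 7, the managerial interpretation, stays outside the code: here the sample mean differs from 50 by more than chance alone would plausibly explain at the 5% level.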
73.5 Parametric Tests
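Parametric tests assume that the data follow a specific distribution (usually normal) and that conditions such as independence of observations and homogeneity of variance are met.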
| Test | Use case |
|---|---|
| Z-test | Large samples (n ≥ 30) or known σ |
| One-sample t-test | Single mean, σ unknown, small n |
| Independent samples t-test | Two means, independent samples |
| Paired t-test | Two means, dependent samples (before-after) |
| F-test | Equality of variances |
| ANOVA | 3+ group means; R.A. Fisher |
| One-way ANOVA | Single factor |
| Two-way ANOVA | Two factors with interaction |
| MANOVA | Multiple dependent variables |
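| Pearson Correlation Test | Linear association between two interval/ratio variables |
| Linear Regression Test | Significance of regression coefficients (t) and of the overall model (F) |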
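As one instance from the table, a paired (before-after) t statistic can be computed by hand as the mean of the differences divided by its standard error. The scores below are made-up illustrative data, and the critical value 2.262 is the standard two-tailed t-table entry for α = 0.05 with df = 9; the sketch compares against that value rather than computing a p-value, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after scores for the same 10 subjects.
before = [72, 65, 80, 58, 90, 77, 68, 84, 61, 75]
after = [75, 70, 82, 60, 93, 80, 67, 88, 66, 79]

# A paired test works on the within-subject differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = mean(diffs)
std_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
t_stat = mean_diff / std_error
df = n - 1

T_CRITICAL_DF9 = 2.262  # two-tailed, alpha = 0.05, df = 9 (from t tables)
reject_h0 = abs(t_stat) > T_CRITICAL_DF9

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```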
73.6 Non-Parametric Tests
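Non-parametric tests make few or no assumptions about the population distribution; they suit ordinal / nominal data and cases where parametric assumptions fail.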
| Test | Use case | Parametric equivalent |
|---|---|---|
| Chi-square (χ²) Goodness of Fit | Observed vs expected | |
| Chi-square Test of Independence | Two categorical variables | |
| Mann-Whitney U | Two independent samples | t-test |
| Wilcoxon Signed-Rank | Paired samples | Paired t-test |
| Kruskal-Wallis H | 3+ independent groups | One-way ANOVA |
| Friedman Test | 3+ related groups | Repeated-measures ANOVA |
| Spearman’s Rank Correlation | Monotonic association in ordinal (ranked) data | Pearson |
| Kolmogorov-Smirnov (K-S) | Goodness of fit, normality | |
| Sign Test | Median of one sample or of paired differences | One-sample / paired t-test |
| Runs Test | Randomness of a sequence | |
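Most tests in this table replace raw values with ranks. A minimal sketch of the Mann-Whitney U statistic is given below, using average ranks for ties; the helper names and sample values are hypothetical. Judging significance still requires a U table or a normal approximation, which this sketch omits.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U: the smaller of the two group U values."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


group_a = [12, 15, 18, 21]      # illustrative data
group_b = [22, 25, 27, 30, 19]
print(mann_whitney_u(group_a, group_b))  # 1.0
```

A small U means the two samples barely overlap in rank order, which is evidence against H₀.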
73.7 ANOVA — Analysis of Variance
ANOVA, introduced by R.A. Fisher (1925), tests the equality of three or more means by partitioning total variance into a between-group and a within-group component.
\[F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}\]
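The F ratio can be computed directly from its definition. The sketch below assumes a one-way layout, with MS between = SS between / (k − 1) and MS within = SS within / (N − k) for k groups and N observations in total; the function name and the data are illustrative.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # three hypothetical groups
print(one_way_anova_f(samples))  # 12.0
```

Here SS between is 24 on 2 degrees of freedom and SS within is 6 on 6 degrees of freedom, giving F = 12 / 1 = 12.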
- One-way ANOVA — single factor.
- Two-way ANOVA — two factors + interaction.
- N-way ANOVA.
- Repeated-measures ANOVA.
- ANCOVA — Analysis of Covariance.
- MANOVA — multiple DVs.
- Latin Square Design.
- Randomised Block Design.
- Factorial designs.
73.8 Choice of Test
- Sample size — large (Z) vs small (t).
- Variance known? — Z (known) vs t (unknown).
- Number of groups — 1 (one-sample), 2 (two-sample), 3+ (ANOVA).
- Dependent vs independent samples.
- Distributional assumptions — parametric vs non-parametric.
- Measurement scale — interval/ratio (parametric); ordinal/nominal (non-parametric).
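The criteria above can be read as a small decision rule. The function below is a deliberately simplified sketch that maps them onto the tests named in the two tables of this chapter; the function name, its parameters, and the n ≥ 30 cut-off handling are assumptions for illustration, not a complete selection procedure.

```python
def suggest_test(n_groups, paired=False, parametric=True,
                 sigma_known=False, n=None):
    """Suggest a candidate test from the number of groups and the design.

    paired      -- True for dependent (related) samples
    parametric  -- True if interval/ratio data meet parametric assumptions
    sigma_known -- True if the population standard deviation is known
    n           -- sample size, used only for the one-sample Z vs t choice
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")

    if not parametric:
        if n_groups == 1:
            return "Sign Test"
        if n_groups == 2:
            return "Wilcoxon Signed-Rank" if paired else "Mann-Whitney U"
        return "Friedman Test" if paired else "Kruskal-Wallis H"

    if n_groups == 1:
        large_sample = n is not None and n >= 30
        return "Z-test" if (sigma_known or large_sample) else "One-sample t-test"
    if n_groups == 2:
        return "Paired t-test" if paired else "Independent samples t-test"
    return "Repeated-measures ANOVA" if paired else "One-way ANOVA"


print(suggest_test(2, paired=True))       # Paired t-test
print(suggest_test(3, parametric=False))  # Kruskal-Wallis H
```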
73.9 Effect Size
- Cohen’s d — for mean differences (Small 0.2, Medium 0.5, Large 0.8).
- η² (eta squared) — ANOVA.
- r — correlation.
- Odds Ratio · Hazard Ratio — for categorical and survival (time-to-event) outcomes, respectively.
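Cohen's d is the difference between two group means divided by a standard deviation. The sketch below assumes the pooled-standard-deviation form for two independent groups and labels the result with the 0.2 / 0.5 / 0.8 benchmarks listed above; the label "below small" for values under 0.2 and the data are assumptions of this example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def label_effect(d):
    """Map |d| onto the conventional small / medium / large benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


treatment = [5, 6, 7, 8, 9]  # illustrative data
control = [3, 4, 5, 6, 7]
d = cohens_d(treatment, control)
print(f"d = {d:.2f} ({label_effect(d)})")  # d = 1.26 (large)
```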
73.10 p-value Controversy and Modern Critique
- The p < 0.05 threshold is an arbitrary convention, popularised by Fisher.
- The American Statistical Association (2016 statement) cautions against over-reliance on p-values.
- Calls for confidence intervals, effect sizes, Bayesian alternatives.
- Replication crisis in social sciences.
- Pre-registration of hypotheses is becoming standard practice.
73.11 Bayesian Hypothesis Testing
- Updates prior beliefs with observed evidence to form a posterior distribution.
- Bayes Factor — ratio of the probability of the data under H₁ to that under H₀; measures strength of evidence for H₁ vs H₀.
- Avoids many p-value pitfalls.
- Computational: MCMC, Stan, JAGS.
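A Bayes Factor can be computed in closed form for simple cases, without MCMC. The sketch below assumes binomial data, a point null H₀: θ = 0.5, and a Uniform(0, 1) prior on θ under H₁; with that prior the marginal likelihood of any outcome is 1 / (n + 1). The function name and the 60-out-of-100 example are illustrative assumptions.

```python
from math import comb


def bayes_factor_10(successes: int, n: int, theta0: float = 0.5) -> float:
    """Bayes Factor for H1 (theta ~ Uniform(0, 1)) against H0 (theta = theta0)."""
    # Under H1, integrating the binomial likelihood over a uniform prior
    # gives the same marginal probability, 1 / (n + 1), for every outcome.
    marginal_h1 = 1 / (n + 1)
    # Under H0, the likelihood is the binomial probability at theta0.
    likelihood_h0 = (
        comb(n, successes) * theta0 ** successes * (1 - theta0) ** (n - successes)
    )
    return marginal_h1 / likelihood_h0


bf = bayes_factor_10(successes=60, n=100)
print(f"BF10 = {bf:.2f}")
```

Values above 1 favour H₁ and values below 1 favour H₀. For 60 successes in 100 trials the factor comes out slightly below 1, so under these assumptions the data do not favour H₁ even though the result sits close to the conventional 0.05 boundary in a classical test.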
73.12 Modern Trends
- Bayesian methods.
- Multiple-testing correction — Bonferroni, FDR (Benjamini-Hochberg).
- A/B testing at internet scale.
- Causal inference — RCT, IV, regression discontinuity.
- Pre-registration.
- Open data and reproducibility.
- Effect-size focus.
- Bootstrapping (Efron 1979).
- Permutation tests.
- Machine learning-driven testing.
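The two multiple-testing corrections named in the list above can be sketched in a few lines. Bonferroni compares every p-value with α / m; the Benjamini-Hochberg step-up procedure sorts the p-values, finds the largest rank k with p(k) ≤ (k / m) · α, and rejects the k smallest. The function names and p-values are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose sorted p-value is at most (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(bonferroni(p_vals)))          # 1
print(sum(benjamini_hochberg(p_vals)))  # 2
```

Without correction, five of these eight p-values fall below 0.05; Bonferroni keeps one and Benjamini-Hochberg keeps two, which shows the trade-off between strict family-wise error control and the less conservative FDR control.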
73.13 Practice Questions
The Null Hypothesis (H₀) typically asserts:
A Type I error is:
A p-value less than 0.05 indicates:
ANOVA was developed by:
The non-parametric equivalent of an independent t-test is:
Kruskal-Wallis is the non-parametric equivalent of:
Cohen's d = 0.5 indicates an effect size that is:
Two-tailed test is appropriate when:
Chi-square Test of Independence tests:
Power of a test is:
Before-after design uses:
F-test compares:
The modern hypothesis-testing framework (1928-1933) was developed by:
Bonferroni correction addresses:
Bootstrap technique was introduced in 1979 by:
73.13.1 Advanced Format Questions
Assertion (A): A one-tailed test rejects H₀ in one direction only.
Reason (R): The critical region is split equally between the two tails.
Hypothesis testing steps: (i) State H₀/H₁. (ii) Choose α. (iii) Compute statistic. (iv) Decide.
Sample n = 100; x̄ = 52; μ = 50; σ = 10. Z-statistic:
At α = 0.05 two-tailed, critical Z value:
73.14 Quick Recall
- Hypothesis testing — Neyman-Pearson (1928, 1933) building on Fisher (1925).
- H₀ vs H₁; One-tailed vs Two-tailed.
- α (Type I) vs β (Type II); Power = 1 − β.
- 7 steps: H₀/H₁ → α → test stat → compute → p-value → decide → interpret.
- Parametric: Z · t (one-sample, independent, paired) · F · ANOVA (Fisher 1925) · ANCOVA · MANOVA · Pearson.
- Non-parametric: χ² · Mann-Whitney U · Wilcoxon Signed-Rank · Kruskal-Wallis H · Friedman · Spearman · K-S · Sign · Runs.
- Choice: based on sample size, σ, # groups, samples dependent?, scale.
- Effect size: Cohen’s d (0.2 / 0.5 / 0.8) · η² · r · OR.
- p-value critique — ASA 2016; replication crisis.
- Bayesian: posterior ∝ prior × likelihood; Bayes Factor.
- Modern: Bayesian · Bonferroni · FDR (Benjamini-Hochberg) · A/B testing · causal inference · pre-registration · effect-size focus · Bootstrap (Efron 1979) · permutation · ML-driven.