72 Sampling — Concept, Process and Techniques
72.1 Concept
Sampling = the process of selecting a subset (sample) from a population to make inferences about the population. P.C. Mahalanobis (ISI, 1937) pioneered large-scale sample surveys in India. Jerzy Neyman (1934) formalised modern survey sampling theory.
72.2 Key Terms
- Population (Universe) — entire group of interest (N).
- Sample — subset selected (n).
- Sampling Frame — list from which sample is drawn.
- Sampling Unit — element to be selected.
- Parameter — population characteristic (μ, σ).
- Statistic — sample characteristic (X̄, s).
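- Sampling Error — the difference between a sample statistic and the population parameter that arises because only a subset of the population is observed.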
- Non-sampling Error — bias from other sources.
- Sampling Distribution — distribution of statistic across all possible samples.
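The idea of a sampling distribution can be made concrete with a small simulation. The sketch below is only an illustration under assumed inputs: a hypothetical population of the integers 1 to 100, repeated simple random samples of size 10, and a fixed seed so the run is reproducible. It draws many samples and looks at how the sample mean X̄ varies around the parameter μ.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (assumed for illustration only).
population = list(range(1, 101))
mu = statistics.mean(population)  # population parameter, 50.5


def sample_means(population, n, draws, rng):
    """Draw `draws` simple random samples of size n and return each sample mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
means = sample_means(population, n=10, draws=2000, rng=rng)

centre = statistics.mean(means)   # centre of the simulated sampling distribution
spread = statistics.stdev(means)  # empirical standard error of the sample mean

print(f"mu = {mu}, mean of sample means = {centre:.2f}, spread = {spread:.2f}")
```

Each individual sample mean differs from μ (sampling error), but the sample means cluster around μ, and their spread is what the standard error measures.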
72.3 Sampling Process
- Define target population.
- Identify sampling frame.
- Choose sampling technique.
- Determine sample size.
- Execute sampling and data collection.
72.4 Sampling Techniques
72.4.1 Probability Sampling
| Method | Description |
|---|---|
| Simple Random Sampling (SRS) | Every unit, and every possible sample of size n, has an equal chance of selection |
| Systematic Sampling | Every k-th unit |
| Stratified Sampling | Population divided into strata; sample from each |
| Cluster Sampling | Population divided into clusters; sample whole clusters |
| Multi-stage Sampling | Successive stages (state → district → village → household) |
| Probability Proportional to Size (PPS) | Larger units have higher selection probability |
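Three of the methods in the table can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the frame of 100 numbered units, the village names and the household counts are all hypothetical, and the PPS draw shown here is the simple with-replacement variant.

```python
import random


def simple_random_sample(frame, n, rng):
    """SRS without replacement: every subset of size n is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]


def pps_sample(units, sizes, n, rng):
    """PPS with replacement: selection probability proportional to unit size."""
    return rng.choices(units, weights=sizes, k=n)


rng = random.Random(42)          # fixed seed so the sketch is reproducible
frame = list(range(1, 101))      # hypothetical frame of 100 numbered units

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)

villages = ["A", "B", "C", "D"]  # hypothetical primary units
households = [50, 150, 300, 500] # hypothetical size measure for each village
pps = pps_sample(villages, households, 2, rng)

print(srs)
print(sys_sample)
print(pps)
```

In the systematic sample the sampling interval is k = N/n = 100/10 = 10, so consecutive selected units are always 10 apart.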
72.4.2 Non-Probability Sampling
| Method | Description |
|---|---|
| Convenience Sampling | Accessibility-based |
| Judgemental / Purposive | Researcher’s judgement |
| Quota Sampling | Quotas filled (similar to stratified but non-random) |
| Snowball Sampling | Referrals; rare populations |
| Self-Selection Sampling | Volunteers |
72.5 Stratified vs Cluster Sampling
| Dimension | Stratified | Cluster |
|---|---|---|
| Subgroups | Homogeneous within, heterogeneous between | Heterogeneous within, homogeneous between |
| Sampling | From each stratum | Sample whole clusters |
| Goal | Reduce variance | Reduce cost |
| Example | Sample by income strata | Sample by villages |
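The contrast in the table can be shown in a few lines of Python. The sketch below assumes hypothetical data (three income strata of 60, 30 and 10 people, and 20 villages of 5 households each) and uses proportional allocation for the stratified draw. Both choices are illustrative assumptions.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = []
    for members in strata.values():
        n_h = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, n_h))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical income strata with 60, 30 and 10 people.
strata = {
    "low": [f"L{i}" for i in range(60)],
    "middle": [f"M{i}" for i in range(30)],
    "high": [f"H{i}" for i in range(10)],
}
strat = stratified_sample(strata, fraction=0.1, rng=rng)

# Hypothetical clusters: 20 villages with 5 households each.
clusters = {f"village_{v:02d}": [f"v{v:02d}_hh{h}" for h in range(5)] for v in range(20)}
chosen, cluster_units = cluster_sample(clusters, n_clusters=3, rng=rng)

print(len(strat), chosen, len(cluster_units))
```

Every stratum is represented in the stratified sample, whereas the cluster sample covers only the chosen villages but includes all of their households.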
72.6 Sample Size Determination
- Population size.
- Confidence level (90 %, 95 %, 99 %).
- Margin of error / Precision (E).
- Variability (σ or p×q).
- Population strata.
72.6.1 Sample Size Formulas
For mean estimation: \[n = \left(\frac{Z \cdot \sigma}{E}\right)^2\]
For proportion estimation: \[n = \frac{Z^2 \cdot p \cdot q}{E^2}\]
Where Z = z-score for confidence level, σ = std dev, p = proportion, q = 1−p, E = margin of error.
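Slovin’s Formula: n = N / (1 + N·e²), where N = population size and e = margin of error; a quick approximation, typically used when the population's variability is unknown.
Cochran’s Formula (1977): the proportion formula above, n₀ = Z²·p·q / E², with a finite-population correction n = n₀ / (1 + (n₀ − 1)/N) when N is small; widely used.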
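The formulas in this subsection are easy to check numerically. The sketch below is a minimal set of helper functions (the function names are illustrative, not from any library) that round the result up to the next whole unit. The finite-population correction is the commonly stated form n = n₀ / (1 + (n₀ − 1)/N), applied here to the unrounded n₀.

```python
import math


def n_for_mean(z, sigma, e):
    """n = (Z * sigma / E)^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)


def n_for_proportion(z, p, e):
    """n = Z^2 * p * q / E^2 with q = 1 - p, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


def slovin(N, e):
    """Slovin: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(N / (1 + N * e ** 2))


def proportion_with_fpc(z, p, e, N):
    """Proportion formula followed by the finite-population correction."""
    n0 = z ** 2 * p * (1 - p) / e ** 2      # unrounded infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))


print(n_for_mean(z=1.96, sigma=15, e=3))               # 97
print(n_for_proportion(z=1.96, p=0.5, e=0.05))         # 385
print(slovin(N=1000, e=0.05))                          # 286
print(proportion_with_fpc(z=1.96, p=0.5, e=0.05, N=1000))  # 278
```

With p = 0.5, E = 5 % and Z = 1.96 the unrounded value is 384.16, which is why this case is often quoted as either 384 or 385 depending on the rounding convention.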
72.7 Sampling Errors and Bias
- Sampling error — random; reducible by larger n.
- Non-sampling error:
  - Coverage error — frame issues.
  - Non-response bias.
  - Response bias.
  - Measurement error.
  - Processing error.
  - Selection bias — non-random.
72.8 Confidence Intervals
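A 95 % CI means that, if sampling were repeated many times, about 95 % of the intervals constructed this way would contain the true population parameter. When σ is known, the interval for the mean is: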
\[\text{CI} = \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}\]
Z-values: 90 % = 1.645 · 95 % = 1.96 · 99 % = 2.576.
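The interval formula and the listed Z-values can be tried out directly. The sketch below assumes a known σ and uses made-up numbers (X̄ = 50, σ = 10, n = 100). It also runs a small simulation, under assumed normal data and a fixed seed, to estimate how often such intervals cover the true mean.

```python
import math
import random

# Z-values as listed in the text.
Z = {90: 1.645, 95: 1.96, 99: 2.576}


def z_interval(mean, sigma, n, level=95):
    """CI = mean ± Z * sigma / sqrt(n), for a known population sigma."""
    margin = Z[level] * sigma / math.sqrt(n)
    return mean - margin, mean + margin


def coverage(mu, sigma, n, reps, rng, level=95):
    """Share of simulated intervals that contain the true mean mu."""
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        lower, upper = z_interval(xbar, sigma, n, level)
        hits += lower <= mu <= upper
    return hits / reps


# Hypothetical example: sample mean 50, sigma 10, n = 100, 95 % level.
low, high = z_interval(mean=50.0, sigma=10.0, n=100)
print(f"95 % CI: ({low:.2f}, {high:.2f})")  # 95 % CI: (48.04, 51.96)

share = coverage(mu=50.0, sigma=10.0, n=30, reps=4000, rng=random.Random(1))
print(f"simulated coverage: {share:.3f}")
```

The simulated coverage should land close to 0.95, which is the long-run sense in which the confidence level is meant.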
72.9 Indian National Surveys
- NSS / NSO Surveys — household surveys; the large consumer-expenditure and employment rounds were traditionally quinquennial.
- NFHS — National Family Health Survey.
- PLFS — Periodic Labour Force Survey.
- CMIE Consumer Pyramids.
- IRS — Indian Readership Survey.
- CES — Consumer Expenditure Survey.
- ASI — Annual Survey of Industries.
- Election polls — C-Voter, CSDS.
72.10 Modern Trends
- Big-data sampling — non-probability digital footprints.
- Adaptive sampling.
- Respondent-driven sampling (RDS) for hidden populations.
- Bayesian sampling.
- Sequential sampling.
- Online panel-based sampling.
- Synthetic samples / AI-generated.
- Differential privacy in sampling.
72.11 Practice Questions
Which is NOT a probability sampling method?
In stratified sampling, strata are:
Z-value for 95 % confidence interval is:
Sampling all households in 10 randomly chosen villages is:
Snowball sampling is best for:
A "sampling frame" is:
Large-scale survey sampling in India was pioneered by:
Selecting every 10th customer is:
Slovin's formula is:
PPS sampling stands for:
Quota sampling differs from stratified in that it is:
NFHS in India stands for:
Modern survey-sampling theory (1934) was formalised by:
A numerical characteristic of a population is called:
Match:
| No. | Method | Key | Description |
|---|---|---|---|
| (i) | SRS | (a) | Non-random |
| (ii) | Stratified | (b) | Whole groups |
| (iii) | Cluster | (c) | Equal chance |
| (iv) | Convenience | (d) | Strata-based |
72.11.1 Advanced Format Questions
A: Probability sampling permits statistical inference.
R: Each unit has a known, non-zero probability of being selected.
Probability techniques: (i) SRS. (ii) Systematic. (iii) Stratified. (iv) Cluster.
For p = 0.5, E = 5%, 95% CI, sample size n =
Population N = 1,000; e = 5%. Slovin's formula n =
72.12 Quick Recall
- Sampling — subset of population. Neyman (1934) · Mahalanobis Indian surveys.
- Terms: Population N · Sample n · Frame · Unit · Parameter (μ) · Statistic (X̄) · Sampling error · Sampling distribution.
- Process (5 steps): define population → frame → technique → size → execute.
- Probability: SRS · Systematic · Stratified · Cluster · Multi-stage · PPS.
- Non-probability: Convenience · Judgemental · Quota · Snowball · Self-selection.
- Stratified (homogeneous within strata) vs Cluster (heterogeneous within clusters).
- Sample size: n = (Zσ/E)² for mean; n = Z²pq/E² for proportion; Slovin n = N/(1+Ne²); Cochran (1977).
- CI: 95 % → Z = 1.96; 99 % → 2.576.
- Errors: Sampling (random) vs Non-sampling (coverage, non-response, response, measurement, processing).
- India: NSS/NSO · NFHS · PLFS · ASI · IRS · Census · CMIE · CSDS · C-Voter.
- Modern: big-data sampling · RDS · Bayesian · sequential · online panels · synthetic data · differential privacy.