72  Sampling — Concept, Process and Techniques

72.1 Concept

Sampling is the process of selecting a subset (the sample) of a population in order to make inferences about the whole population. P.C. Mahalanobis (ISI, 1937) pioneered large-scale sample surveys in India. Jerzy Neyman (1934) formalised modern survey sampling theory.

72.2 Key Terms

Tip: Sampling terminology
  • Population (Universe) — entire group of interest (N).
  • Sample — subset selected (n).
  • Sampling Frame — list from which sample is drawn.
  • Sampling Unit — element to be selected.
  • Parameter — population characteristic (μ, σ).
  • Statistic — sample characteristic (X̄, s).
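  • Sampling Error — difference between a sample statistic and the population parameter that arises because only part of the population is observed.
  • Non-sampling Error — error from any other source (frame, non-response, measurement, processing); it does not shrink as n grows.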
  • Sampling Distribution — distribution of statistic across all possible samples.
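
The idea of a sampling distribution can be illustrated by simulation. The sketch below is only an approximation: it draws 2,000 simple random samples rather than all possible samples, and the population values, the sample size n = 25 and the helper name `sampling_distribution_of_mean` are assumptions made for illustration.

```python
import random
import statistics

def sampling_distribution_of_mean(population, n, draws, seed=0):
    """Return the means of `draws` simple random samples of size n (without replacement)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical population of N = 1,000 values (illustrative data only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(1000)]

mu = statistics.fmean(population)        # parameter: population mean
sigma = statistics.pstdev(population)    # parameter: population standard deviation
n = 25
means = sampling_distribution_of_mean(population, n=n, draws=2000)

mean_of_means = statistics.fmean(means)  # centre of the sampling distribution
sd_of_means = statistics.stdev(means)    # spread of the sampling distribution

print(f"population mean (mu)  = {mu:.2f}")
print(f"mean of sample means  = {mean_of_means:.2f}")
print(f"SD of sample means    = {sd_of_means:.2f}")
print(f"sigma / sqrt(n)       = {sigma / n ** 0.5:.2f}")
```

The mean of the sample means should sit close to μ, and their standard deviation close to σ/√n, which is the quantity used later in the confidence-interval formula.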

72.3 Sampling Process

Tip: Cooper-Schindler sampling design (5 steps)
  1. Define target population.
  2. Identify sampling frame.
  3. Choose sampling technique.
  4. Determine sample size.
  5. Execute sampling and data collection.

72.4 Sampling Techniques

72.4.1 Probability Sampling

Tip: Probability (random) sampling methods
Method | Description
Simple Random Sampling (SRS) | Every unit has an equal chance of selection
Systematic Sampling | Every k-th unit after a random start (k = N/n)
Stratified Sampling | Population divided into strata; a random sample is drawn from each stratum
Cluster Sampling | Population divided into clusters; whole clusters are selected at random
Multi-stage Sampling | Sampling in successive stages (state → district → village → household)
Probability Proportional to Size (PPS) | Larger units have a higher selection probability
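
Three of the methods above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the frame of 100 numbered units, the three villages and their sizes are hypothetical, and PPS is drawn with replacement for simplicity.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS without replacement: every unit has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Every k-th unit after a random start, with interval k = N // n."""
    k = len(frame) // n
    start = rng.randrange(k)          # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]

def pps_sample(units, sizes, n, rng):
    """PPS with replacement: selection probability proportional to size."""
    return rng.choices(units, weights=sizes, k=n)

rng = random.Random(2024)
frame = list(range(1, 101))           # hypothetical frame: units numbered 1..100

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)

villages = ["V1", "V2", "V3"]         # hypothetical units
households = [100, 300, 600]          # hypothetical size measure
pps = pps_sample(villages, households, 5, rng)

print("SRS:       ", sorted(srs))
print("Systematic:", systematic)
print("PPS:       ", pps)
```

In the PPS draw, V3 is six times as likely as V1 to be picked on each draw because it holds six times as many households.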

72.4.2 Non-Probability Sampling

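Tip: Non-probability sampling methods
Method | Description
Convenience Sampling | Units chosen because they are easy to reach
Judgemental / Purposive | Units chosen on the researcher’s judgement
Quota Sampling | Quotas filled per subgroup (similar to stratified, but selection within groups is non-random)
Snowball Sampling | Existing respondents refer further respondents; suited to rare or hidden populations
Self-Selection Sampling | Respondents volunteer themselves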
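
To show why quota sampling is non-random, the sketch below fills quotas in arrival order: whoever turns up first is accepted until the group's quota is full. The respondent stream, the `gender` field and the quota of 3 per group are hypothetical.

```python
from collections import Counter

def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every group's quota is full."""
    filled = Counter()
    sample = []
    for person in respondents:
        group = key(person)
        if filled[group] < quotas.get(group, 0):
            sample.append(person)
            filled[group] += 1
        if all(filled[g] >= q for g, q in quotas.items()):
            break                      # all quotas met: stop interviewing
    return sample

# Hypothetical arrival stream: every third respondent is "F", the rest "M".
stream = [{"id": i, "gender": "F" if i % 3 == 0 else "M"} for i in range(1, 31)]
quotas = {"F": 3, "M": 3}

sample = quota_sample(stream, quotas, key=lambda person: person["gender"])
ids = [person["id"] for person in sample]
print(ids)   # [1, 2, 3, 4, 6, 9]
```

No random draw is involved, so selection probabilities are unknown; this is the difference from stratified sampling.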

72.5 Stratified vs Cluster Sampling

Tip: Stratified vs Cluster
Dimension | Stratified | Cluster
Subgroups | Homogeneous within, heterogeneous between | Heterogeneous within, homogeneous between
Sampling | Random sample from every stratum | All units in randomly chosen clusters
Goal | Reduce variance (more precision) | Reduce cost
Example | Sample within income strata | Sample whole villages
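
The contrast can be made concrete with a small sketch. The data are hypothetical (20 villages of 10 households, each household tagged with an income band), and the stratified draw assumes proportional allocation, which is only one of several possible allocation rules.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw an SRS of the same fraction from every stratum (proportional allocation)."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for members in strata.values():
        n_h = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n_h))
    return sample

def cluster_sample(units, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = defaultdict(list)
    for u in units:
        clusters[cluster_of(u)].append(u)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]

# Hypothetical population: 20 villages x 10 households, three income bands.
bands = ["low", "mid", "high"]
households = [
    {"village": v, "income": bands[(v + h) % 3]}
    for v in range(20)
    for h in range(10)
]

rng = random.Random(11)
strat = stratified_sample(households, lambda u: u["income"], 0.10, rng)
clus = cluster_sample(households, lambda u: u["village"], 3, rng)

print("stratified:", len(strat), "households from bands", sorted({u["income"] for u in strat}))
print("cluster:   ", len(clus), "households from villages", sorted({u["village"] for u in clus}))
```

The stratified sample is guaranteed to cover every income band, while the cluster sample covers only three villages but needs field visits to just those three.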

72.6 Sample Size Determination

Tip: Factors affecting sample size
  • Population size.
  • Confidence level (90 %, 95 %, 99 %).
  • Margin of error / Precision (E).
  • Variability (σ or p×q).
  • Number of strata or subgroups that must be analysed separately.

72.6.1 Sample Size Formulas

For mean estimation: \[n = \left(\frac{Z \cdot \sigma}{E}\right)^2\]

For proportion estimation: \[n = \frac{Z^2 \cdot p \cdot q}{E^2}\]

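Here Z is the z-score for the chosen confidence level, σ the population standard deviation, p the expected proportion, q = 1 − p, and E the margin of error (in the units of the mean, or as a proportion). When p is unknown, p = 0.5 gives the largest and therefore most conservative n.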
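
Both formulas are easy to check numerically. In the sketch below the function names are illustrative and the inputs σ = 15 and E = 2 for the mean are hypothetical; results are rounded up because a fractional respondent cannot be sampled.

```python
import math

# z-scores for the usual confidence levels (two-sided)
Z = {90: 1.645, 95: 1.96, 99: 2.576}

def n_for_mean(sigma, E, confidence=95):
    """n = (Z * sigma / E)^2, rounded up."""
    return math.ceil((Z[confidence] * sigma / E) ** 2)

def n_for_proportion(p, E, confidence=95):
    """n = Z^2 * p * q / E^2 with q = 1 - p, rounded up."""
    return math.ceil(Z[confidence] ** 2 * p * (1 - p) / E ** 2)

print(n_for_proportion(p=0.5, E=0.05))   # 384.16 rounds up to 385
print(n_for_mean(sigma=15, E=2))         # 216.09 rounds up to 217
```

Halving E roughly quadruples n, because E enters the formula squared.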

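Slovin’s Formula: n = N / (1 + N·e²), where N is the population size and e the margin of error. It is a quick approximation for when nothing is known about the population's variability.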

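Cochran’s Formula (Sampling Techniques, 1977): n₀ = Z²·p·q / E², with the finite-population correction n = n₀ / (1 + (n₀ − 1)/N) when N is small. It is widely used for proportions.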
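
A short sketch comparing the two shortcuts. It assumes the common textbook statement of Cochran's formula, n₀ = Z²pq/E² with finite-population correction n = n₀ / (1 + (n₀ − 1)/N); the function names and the example values N = 1,000 and e = 5 % are illustrative.

```python
import math

def slovin(N, e):
    """Slovin: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(N / (1 + N * e ** 2))

def cochran(p, E, z=1.96, N=None):
    """Cochran: n0 = z^2 * p * q / E^2; if N is given, apply the finite-population correction."""
    n0 = z ** 2 * p * (1 - p) / E ** 2
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)

print(slovin(N=1000, e=0.05))              # 285.7 rounds up to 286
print(cochran(p=0.5, E=0.05))              # 384.16 rounds up to 385
print(cochran(p=0.5, E=0.05, N=1000))      # about 277.7, rounds up to 278
```

For N = 1,000 the two approaches give similar answers (286 vs 278); Slovin's formula has no explicit confidence level or variability term.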

72.7 Sampling Errors and Bias

Tip: Sources of error
  • Sampling error — random; reducible by larger n.
  • Non-sampling error:
    • Coverage error — frame issues.
    • Non-response bias.
    • Response bias.
    • Measurement error.
    • Processing error.
  • Selection bias — arises when units are chosen non-randomly, so some have no chance (or an unknown chance) of selection.
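
The difference between sampling error and coverage error can be seen in a small simulation. Everything here is hypothetical: a population in which 70 % of units are "listed" (values around 40) and 30 % are missing from the frame (values around 60). Samples drawn from the full population show an error that shrinks with n; samples drawn from the incomplete frame stay biased however large n becomes.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: listed units centre on 40, unlisted units on 60.
listed = [rng.gauss(40, 5) for _ in range(7000)]
unlisted = [rng.gauss(60, 5) for _ in range(3000)]
population = listed + unlisted
mu = statistics.fmean(population)       # true population mean (about 46)

def mean_abs_error(frame, n, reps=200):
    """Average |sample mean - mu| over `reps` simple random samples from `frame`."""
    errors = [abs(statistics.fmean(rng.sample(frame, n)) - mu) for _ in range(reps)]
    return statistics.fmean(errors)

results = {}
for n in (25, 100, 400, 1600):
    full = mean_abs_error(population, n)    # sampling error only
    partial = mean_abs_error(listed, n)     # sampling error + coverage error
    results[n] = (full, partial)
    print(f"n = {n:>4}: full frame error = {full:.2f}, incomplete frame error = {partial:.2f}")
```

The first column falls roughly in proportion to 1/√n, while the second stays near the gap between the listed units' mean and μ.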

72.8 Confidence Intervals

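A 95 % CI means that, if the sampling were repeated many times, about 95 % of the intervals constructed this way would contain the true population parameter. It is not a 95 % probability statement about any single computed interval. For a mean with known σ: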

\[\text{CI} = \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}\]

Z-values: 90 % = 1.645 · 95 % = 1.96 · 99 % = 2.576.
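
The formula and the repeated-sampling interpretation can both be checked numerically. In this sketch the inputs (X̄ = 50, σ = 10, n = 100) and the simulated normal population are hypothetical, and σ is treated as known, as the formula above assumes.

```python
import math
import random
import statistics

Z = {90: 1.645, 95: 1.96, 99: 2.576}

def confidence_interval(x_bar, sigma, n, confidence=95):
    """CI = x_bar +/- Z * sigma / sqrt(n), with sigma assumed known."""
    margin = Z[confidence] * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

lo, hi = confidence_interval(x_bar=50, sigma=10, n=100)
print(f"95 % CI: ({lo:.2f}, {hi:.2f})")   # 95 % CI: (48.04, 51.96)

# Repeated sampling: how often does the interval cover the true mean?
rng = random.Random(1)
mu, sigma, n, trials = 50, 10, 30, 1000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo_i, hi_i = confidence_interval(statistics.fmean(sample), sigma, n)
    covered += lo_i <= mu <= hi_i
coverage = covered / trials
print(f"share of intervals covering mu: {coverage:.3f}")
```

The coverage share should land close to 0.95, which is what the confidence level describes.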

72.9 Indian National Surveys

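Tip: Major Indian sample surveys
  • NSS / NSO Surveys — National Sample Survey rounds; the large consumer-expenditure and employment rounds were traditionally quinquennial.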
  • NFHS — National Family Health Survey.
  • PLFS — Periodic Labour Force Survey.
  • CMIE Consumer Pyramids.
  • IRS — Indian Readership Survey.
  • CES — Consumer Expenditure Survey.
  • ASI — Annual Survey of Industries.
  • Election polls — C-Voter, CSDS.

72.11 Practice Questions

Q 01 · Probability · Easy

Which is NOT a probability sampling method?

  • A. SRS
  • B. Stratified
  • C. Cluster
  • D. Convenience

Correct Option: D
Convenience sampling is a non-probability method.
Q 02 · Stratified · Medium

In stratified sampling, strata are:

  • A. Heterogeneous within, homogeneous between
  • B. Homogeneous within, heterogeneous between
  • C. Randomly chosen
  • D. Geographic only

Correct Option: B
Homogeneous strata reduce the variance of the estimate.
Q 03 · 95 % Z · Medium

The Z-value for a 95 % confidence interval is:

  • A. 1.645
  • B. 1.96
  • C. 2.576
  • D. 3.00

Correct Option: B
A 95 % confidence level corresponds to Z = 1.96.
Q 04 · Cluster · Medium

Sampling all households in 10 randomly chosen villages is:

  • A. Cluster sampling
  • B. Stratified
  • C. Systematic
  • D. Quota

Correct Option: A
Whole clusters (villages) are selected and every household in them is surveyed.
Q 05 · Snowball · Medium

Snowball sampling is best for:

  • A. Large populations
  • B. Hidden / hard-to-reach populations
  • C. Random studies
  • D. Government surveys

Correct Option: B
Referral chains reach members who appear in no sampling frame.
Q 06 · Sampling frame · Easy

A "sampling frame" is:

  • A. List from which sample is drawn
  • B. The questionnaire
  • C. Sample size formula
  • D. Bias correction

Correct Option: A
The frame is the list or source of units from which the sample is drawn.
Q 07 · Mahalanobis · Hard

Large-scale survey sampling in India was pioneered by:

  • A. P.C. Mahalanobis
  • B. C.R. Rao
  • C. Manmohan Singh
  • D. V.K.R.V. Rao

Correct Option: A
P.C. Mahalanobis, ISI (1937).
Q 08 · Systematic · Medium

Selecting every 10th customer is:

  • A. SRS
  • B. Systematic
  • C. Stratified
  • D. Convenience

Correct Option: B
Every k-th unit is selected (here k = 10).
Q 09 · Slovin · Hard

Slovin's formula is:

  • A. n = N / (1 + Ne²)
  • B. n = Z²σ²/E²
  • C. n = pq/E
  • D. n = N²·E

Correct Option: A
Slovin's formula is a quick approximation that needs only N and e.
Q 10 · PPS · Hard

PPS sampling stands for:

  • A. Probability Proportional to Size
  • B. Pre-Probability Selection
  • C. Pilot-Proportional Survey
  • D. Population-Proportional Sampling

Correct Option: A
Larger units have higher selection probability.
Q 11 · Quota · Medium

Quota sampling differs from stratified in that it is:

  • A. Non-random
  • B. Random
  • C. Hierarchical
  • D. Always larger

Correct Option: A
Quota sampling is non-probability; stratified sampling selects randomly within each stratum.
Q 12 · NFHS · Medium

NFHS in India stands for:

  • A. National Family Health Survey
  • B. National Food and Health Survey
  • C. National Financial House Survey
  • D. National Fishery Health Survey

Correct Option: A
Conducted by IIPS Mumbai.
Q 13 · Neyman · Hard

Modern survey-sampling theory (1934) was formalised by:

  • A. Jerzy Neyman
  • B. Fisher
  • C. Cochran
  • D. Pearson

Correct Option: A
Jerzy Neyman (1934).
Q 14 · Population vs sample · Easy

A numerical characteristic of a population is called a:

  • A. Statistic
  • B. Parameter
  • C. Variable
  • D. Estimate

Correct Option: B
Population → Parameter; Sample → Statistic.
Q 15 · Match · Hard

Match each method with its defining feature:

(i) SRS | (a) Non-random
(ii) Stratified | (b) Whole groups
(iii) Cluster | (c) Equal chance
(iv) Convenience | (d) Strata-based

  • A. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
  • D. (i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)

Correct Option: A
SRS → equal chance; Stratified → strata-based; Cluster → whole groups; Convenience → non-random.

72.11.1 Advanced Format Questions

AR 1 · Assertion-Reason · Hard

A: Probability sampling permits statistical inference.
R: Each unit has a known, non-zero probability of being selected.

  • A. Both true; R explains A
  • B. Both true; R does not explain A
  • C. A true, R false
  • D. A false, R true

Correct Option: A
Known selection probabilities are what allow sampling error to be quantified, so R explains A.
S 1 · Statement-based · Medium

Which of the following are probability sampling techniques? (i) SRS. (ii) Systematic. (iii) Stratified. (iv) Cluster.

  • A. All four
  • B. (i) and (ii) only
  • C. (iii) and (iv) only
  • D. (i), (ii), (iii) only

Correct Option: A
N 1 · Numerical · Medium

For p = 0.5, E = 5 % and a 95 % confidence level, the required sample size n is:

  • A. ≈ 385
  • B. ≈ 100
  • C. ≈ 1,000
  • D. ≈ 50

Correct Option: A
n = Z²pq/E² = (1.96)²(0.5)(0.5)/(0.05)² = 384.16, rounded up to 385.
N 2 · Numerical · Hard

Population N = 1,000; e = 5 %. By Slovin's formula, n is:

  • A. ≈ 286
  • B. ≈ 50
  • C. ≈ 1,000
  • D. ≈ 100

Correct Option: A
n = 1000/(1 + 1000 × 0.0025) = 1000/3.5 ≈ 285.7, rounded up to 286.

72.12 Quick Recall

Important: Quick recall
  • Sampling — selecting a subset of the population to make inferences about it. Neyman (1934) theory · Mahalanobis (1937) Indian surveys.
  • Terms: Population N · Sample n · Frame · Unit · Parameter (μ) · Statistic (X̄) · Sampling error · Sampling distribution.
  • Process (5 steps): define population → frame → technique → size → execute.
  • Probability: SRS · Systematic · Stratified · Cluster · Multi-stage · PPS.
  • Non-probability: Convenience · Judgemental · Quota · Snowball · Self-selection.
  • Stratified (homogeneous within strata) vs Cluster (heterogeneous within clusters).
  • Sample size: n = (Zσ/E)² for mean; n = Z²pq/E² for proportion; Slovin n = N/(1+Ne²); Cochran (1977).
  • CI: 95 % → Z = 1.96; 99 % → 2.576.
  • Errors: Sampling (random) vs Non-sampling (coverage, non-response, response, measurement, processing).
  • India: NSS/NSO · NFHS · PLFS · ASI · IRS · Census · CMIE · CSDS · C-Voter.
  • Modern: big-data sampling · RDS · Bayesian · sequential · online panels · synthetic data · differential privacy.