72 Sampling — Concept, Process and Techniques

72.1 Concept

Sampling is the process of selecting a subset (a sample) of a population in order to draw inferences about the whole population. Jerzy Neyman (1934) formalised modern survey sampling theory, and P.C. Mahalanobis (ISI, 1937) pioneered large-scale sample surveys in India.

72.2 Key Terms

Sampling terminology

Population (Universe) — entire group of interest (N).
Sample — subset selected (n).
Sampling Frame — list from which sample is drawn.
Sampling Unit — element to be selected.
Parameter — population characteristic (μ, σ).
Statistic — sample characteristic (X̄, s).
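Sampling Error — difference between a sample statistic and the population parameter that arises because only a sample, not the whole population, is observed.
Non-sampling Error — error from sources other than sampling (coverage, non-response, measurement, processing).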
Sampling Distribution — distribution of statistic across all possible samples.
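
The terms parameter, statistic, sampling error and sampling distribution can be made concrete with a tiny enumeration. The sketch below is illustrative only: the five-value population and the sample size of 2 are made-up assumptions, chosen so that every possible sample can be listed.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population of N = 5 values (not from the text).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# Sampling distribution of the mean: one statistic per possible sample.
sample_means = [mean(s) for s in samples]

mu = mean(population)                # parameter (population mean)
mean_of_means = mean(sample_means)   # centre of the sampling distribution

# Sampling error of each sample: statistic minus parameter.
sampling_errors = [m - mu for m in sample_means]

print(len(samples), mu, mean_of_means)
```

Individual samples miss the parameter (non-zero sampling error), but in this toy case the sample means average out to the population mean.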

72.3 Sampling Process

Cooper-Schindler sampling design (5 steps)

Define target population.
Identify sampling frame.
Choose sampling technique.
Determine sample size.
Execute sampling and data collection.

72.4 Sampling Techniques

72.4.1 Probability Sampling

Probability (random) sampling methods

Method	Description
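Simple Random Sampling (SRS)	Every unit (and every possible sample of size n) has an equal chance of selection
Systematic Sampling	Every k-th unit after a random start, with interval k = N/n
Stratified Sampling	Population divided into strata; a random sample is drawn from each stratum
Cluster Sampling	Population divided into clusters; clusters are selected at random and all units in the chosen clusters are surveyed
Multi-stage Sampling	Sampling in successive stages (state → district → village → household)
Probability Proportional to Size (PPS)	Selection probability of a unit is proportional to its size measure, so larger units are more likely to be selected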
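
Three of the methods in the table can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the frame of 100 numbered units, the village names and the household counts are hypothetical, and the PPS draw here is with replacement.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS without replacement: every unit has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Every k-th unit after a random start, with interval k = N // n."""
    k = len(frame) // n
    start = rng.randrange(k)  # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]

def pps_sample(frame, sizes, n, rng):
    """PPS with replacement: draw probability proportional to each unit's size."""
    return rng.choices(frame, weights=sizes, k=n)

rng = random.Random(42)          # fixed seed so the run is repeatable
frame = list(range(1, 101))      # hypothetical frame of 100 unit IDs

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)

villages = ["V1", "V2", "V3", "V4"]     # hypothetical clusters
household_counts = [50, 100, 250, 600]  # hypothetical size measures
pps = pps_sample(villages, household_counts, 3, rng)

print(srs, systematic, pps, sep="\n")
```

The systematic sample needs only one random number (the start), which is why it is popular in the field, but it can mislead if the frame has a periodic pattern that matches the interval k.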

72.4.2 Non-Probability Sampling

Non-probability sampling methods

Method	Description
Convenience Sampling	Accessibility-based
Judgemental / Purposive	Researcher’s judgement
Quota Sampling	Quotas filled (similar to stratified but non-random)
Snowball Sampling	Referrals; rare populations
Self-Selection Sampling	Volunteers

72.5 Stratified vs Cluster Sampling

Stratified vs Cluster

Dimension	Stratified	Cluster
Subgroups	Homogeneous within, heterogeneous between	Heterogeneous within, homogeneous between
Sampling	From each stratum	Sample whole clusters
Goal	Reduce variance	Reduce cost
Example	Sample by income strata	Sample by villages
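
The contrast in the table can be sketched in code: stratified sampling draws a few units from every stratum, while cluster sampling takes every unit from a few randomly chosen clusters. The household records below (ID, income band, village) are hypothetical, as are the helper names.

```python
import random

def group_by(units, key):
    """Group units into a dict keyed by stratum or cluster label."""
    groups = {}
    for u in units:
        groups.setdefault(key(u), []).append(u)
    return groups

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample from EVERY stratum."""
    sample = []
    for members in group_by(units, stratum_of).values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(units, cluster_of, n_clusters, rng):
    """Pick a few clusters at random and keep ALL units in them."""
    clusters = group_by(units, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]

# Hypothetical households: (id, income band, village).
bands = ["low", "middle", "high"]
households = [(i, bands[i % 3], f"village_{i % 6}") for i in range(60)]

rng = random.Random(7)
strat = stratified_sample(households, lambda h: h[1], 4, rng)
clus = cluster_sample(households, lambda h: h[2], 2, rng)

print(len(strat), {h[1] for h in strat})  # all three income bands appear
print(len(clus), {h[2] for h in clus})    # only the two chosen villages appear
```

Every income band is guaranteed representation in the stratified sample, whereas the cluster sample visits only two villages, which is cheaper in the field but usually less precise for the same number of households.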

72.6 Sample Size Determination

Factors affecting sample size

Population size.
Confidence level (90 %, 95 %, 99 %).
Margin of error / Precision (E).
Variability (σ or p×q).
Number of strata or subgroups that need separate estimates.

72.6.1 Sample Size Formulas

For mean estimation: \[n = \left(\frac{Z \cdot \sigma}{E}\right)^2\]

For proportion estimation: \[n = \frac{Z^2 \cdot p \cdot q}{E^2}\]

Here Z = z-score for the chosen confidence level, σ = population standard deviation, p = expected proportion, q = 1 − p, and E = margin of error.

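Slovin’s Formula: n = N / (1 + N·e²), where N is the population size and e the margin of error; a quick approximation used when the population variability is unknown.

Cochran’s Formula (1977): n₀ = Z²pq/E², with the finite-population adjustment n = n₀ / (1 + (n₀ − 1)/N); widely used for proportions.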
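
The formulas in this section can be checked numerically with a short sketch. The function names are illustrative, results are rounded up because a sample size must be a whole number, and the finite-population adjustment n = n₀ / (1 + (n₀ − 1)/N) is the commonly cited form of Cochran's correction, assumed here because the text names the formula without stating it.

```python
import math

def n_for_mean(z, sigma, e):
    """n = (Z * sigma / E)^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)

def n_for_proportion(z, p, e):
    """n = Z^2 * p * q / E^2 with q = 1 - p, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

def n_slovin(N, e):
    """Slovin: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(N / (1 + N * e**2))

def n_cochran_fpc(z, p, e, N):
    """Cochran n0 with finite-population adjustment (assumed form), rounded up."""
    n0 = z**2 * p * (1 - p) / e**2
    return math.ceil(n0 / (1 + (n0 - 1) / N))

print(n_for_proportion(1.96, 0.5, 0.05))      # 385
print(n_slovin(1000, 0.05))                   # 286
print(n_cochran_fpc(1.96, 0.5, 0.05, 1000))   # 278
print(n_for_mean(1.96, 15, 3))                # 97 (sigma = 15, E = 3 are made-up inputs)
```

Using p = 0.5 maximises p·q, so it gives the most conservative (largest) sample size when the true proportion is unknown.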

72.7 Sampling Errors and Bias

Sources of error

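Sampling error — random; shrinks as n grows (standard error falls in proportion to 1/√n).
Non-sampling error:
- Coverage error — the sampling frame omits, duplicates or misclassifies units.
- Non-response bias — non-respondents differ systematically from respondents.
- Response bias — respondents give inaccurate answers.
- Measurement error — faulty instrument or question wording.
- Processing error — mistakes in coding, data entry or tabulation.
Selection bias — units enter the sample through a non-random mechanism, so some groups are systematically over- or under-represented.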

72.8 Confidence Intervals

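A 95 % CI means that, under repeated sampling, about 95 % of intervals constructed this way would contain the true population parameter (the parameter is fixed; the interval varies from sample to sample):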

\[\text{CI} = \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}\]

Z-values: 90 % = 1.645 · 95 % = 1.96 · 99 % = 2.576.
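
The Z-values and the interval formula can be reproduced with the standard library. This is a minimal sketch assuming σ is known; the inputs X̄ = 50, σ = 10 and n = 100 are made up for illustration.

```python
import math
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value: Z such that P(-Z < N(0,1) < Z) = level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def mean_ci(x_bar, sigma, n, level=0.95):
    """CI = x_bar +/- Z * sigma / sqrt(n), assuming sigma is known."""
    half_width = z_for_confidence(level) * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))

low, high = mean_ci(50.0, 10.0, 100)  # hypothetical sample summary
print(round(low, 2), round(high, 2))
```

Quadrupling n halves the interval width, since the half-width scales with 1/√n.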

72.9 Indian National Surveys

Major Indian sample surveys

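NSS / NSO Surveys — National Sample Survey rounds; the large-sample consumption and employment rounds were traditionally quinquennial.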
NFHS — National Family Health Survey.
PLFS — Periodic Labour Force Survey.
CMIE Consumer Pyramids.
IRS — Indian Readership Survey.
CES — Consumer Expenditure Survey.
ASI — Annual Survey of Industries.
Election polls — C-Voter, CSDS.

72.10 Modern Trends

Modern sampling trends

Big-data sampling — non-probability digital footprints.
Adaptive sampling.
Respondent-driven sampling (RDS) for hidden populations.
Bayesian sampling.
Sequential sampling.
Online panel-based sampling.
Synthetic samples / AI-generated.
Differential privacy in sampling.

72.11 Practice Questions

Q 01 · Probability · Easy

Which is NOT a probability sampling method?

A. SRS
B. Stratified
C. Cluster
D. Convenience

Correct Option: D

Convenience is non-probability.

Q 02 · Stratified · Medium

In stratified sampling, strata are:

A. Heterogeneous within, homogeneous between
B. Homogeneous within, heterogeneous between
C. Randomly chosen
D. Geographic only

Correct Option: B

Strata that are homogeneous within and heterogeneous between reduce the variance of the estimate.

Q 03 · 95 % Z · Medium

Z-value for 95 % confidence interval is:

A. 1.645
B. 1.96
C. 2.576
D. 3.00

Correct Option: B

95 % CI → Z = 1.96.

Q 04 · Cluster · Medium

Sampling all households in 10 randomly chosen villages is:

A. Cluster sampling
B. Stratified
C. Systematic
D. Quota

Correct Option: A

Whole clusters sampled.

Q 05 · Snowball · Medium

Snowball sampling is best for:

A. Large populations
B. Hidden / hard-to-reach populations
C. Random studies
D. Government surveys

Correct Option: B

Referral chains.

Q 06 · Sampling frame · Easy

A "sampling frame" is:

A. List from which sample is drawn
B. The questionnaire
C. Sample size formula
D. Bias correction

Correct Option: A

List/source of units.

Q 07 · Mahalanobis · Hard

Large-scale survey sampling in India was pioneered by:

A. P.C. Mahalanobis
B. C.R. Rao
C. Manmohan Singh
D. V.K.R.V. Rao

Correct Option: A

P.C. Mahalanobis, ISI (1937).

Q 08 · Systematic · Medium

Selecting every 10th customer is:

A. SRS
B. Systematic
C. Stratified
D. Convenience

Correct Option: B

Every k-th unit.

Q 09 · Slovin · Hard

Slovin's formula is:

A. n = N / (1 + Ne²)
B. n = Z²σ²/E²
C. n = pq/E
D. n = N²·E

Correct Option: A

Quick approximation.

Q 10 · PPS · Hard

PPS sampling stands for:

A. Probability Proportional to Size
B. Pre-Probability Selection
C. Pilot-Proportional Survey
D. Population-Proportional Sampling

Correct Option: A

Larger units have higher selection probability.

Q 11 · Quota · Medium

Quota sampling differs from stratified in that it is:

A. Non-random
B. Random
C. Hierarchical
D. Always larger

Correct Option: A

Quota is non-probability; stratified is random.

Q 12 · NFHS · Medium

NFHS in India stands for:

A. National Family Health Survey
B. National Food and Health Survey
C. National Financial House Survey
D. National Fishery Health Survey

Correct Option: A

Conducted by IIPS Mumbai.

Q 13 · Neyman · Hard

Modern survey-sampling theory (1934) was formalised by:

A. Jerzy Neyman
B. Fisher
C. Cochran
D. Pearson

Correct Option: A

Jerzy Neyman (1934).

Q 14 · Population vs sample · Easy

A measure of population is called:

A. Statistic
B. Parameter
C. Variable
D. Estimate

Correct Option: B

Population → Parameter; Sample → Statistic.

Q 15 · Match · Hard

Match:

(i)	SRS	(a)	Non-random
(ii)	Stratified	(b)	Whole groups
(iii)	Cluster	(c)	Equal chance
(iv)	Convenience	(d)	Strata-based

A. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
D. (i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)

Correct Option: A

SRS — Equal chance; Stratified — Strata; Cluster — Whole groups; Convenience — Non-random.

72.11.1 Advanced Format Questions

AR 1 · Assertion-Reason · Hard

A: Probability sampling permits statistical inference.
R: Each unit has a known, non-zero probability of being selected.

A. Both true; R explains A
B. Both true; R does not explain A
C. A true, R false
D. A false, R true

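Correct Option: A

Known, non-zero selection probabilities are what make unbiased estimates and standard errors computable, so R explains A.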

S 1 · Statement-based · Medium

Probability techniques: (i) SRS. (ii) Systematic. (iii) Stratified. (iv) Cluster.

A. All four
B. (i) and (ii) only
C. (iii) and (iv) only
D. (i), (ii), (iii) only

Correct Option: A

N 1 · Numerical · Medium

For p = 0.5, E = 5%, 95% CI, sample size n =

A. ≈ 385
B. ≈ 100
C. ≈ 1,000
D. ≈ 50

Correct Option: A

n = Z²pq/E² = (1.96)²(0.5)(0.5)/(0.05)² ≈ 384.16, rounded up to 385.

N 2 · Numerical · Hard

Population N = 1,000; e = 5%. Slovin's formula n =

A. ≈ 286
B. ≈ 50
C. ≈ 1,000
D. ≈ 100

Correct Option: A

n = 1000/(1 + 1000×0.0025) = 1000/3.5 ≈ 285.7, rounded up to 286.

72.12 Quick Recall

Quick recall

Sampling — subset of population. Neyman (1934) · Mahalanobis Indian surveys.
Terms: Population N · Sample n · Frame · Unit · Parameter (μ) · Statistic (X̄) · Sampling error · Sampling distribution.
Process (5 steps): define population → frame → technique → size → execute.
Probability: SRS · Systematic · Stratified · Cluster · Multi-stage · PPS.
Non-probability: Convenience · Judgemental · Quota · Snowball · Self-selection.
Stratified (homogeneous within strata) vs Cluster (heterogeneous within clusters).
Sample size: n = (Zσ/E)² for mean; n = Z²pq/E² for proportion; Slovin n = N/(1+Ne²); Cochran (1977).
CI: 95 % → Z = 1.96; 99 % → 2.576.
Errors: Sampling (random) vs Non-sampling (coverage, non-response, response, measurement, processing).
India: NSS/NSO · NFHS · PLFS · CES · ASI · IRS · CMIE · CSDS · C-Voter.
Modern: big-data sampling · RDS · Bayesian · sequential · online panels · synthetic data · differential privacy.