71  Data Collection and Questionnaire Design

71.1 Concept

Data collection is the process of gathering and measuring information on variables of interest in a systematic, established manner, so that research questions can be answered and hypotheses tested. Leading textbook authors on the subject include Donald Cooper and Pamela Schindler (Business Research Methods), Naresh Malhotra, Earl Babbie and C.R. Kothari.

71.2 Types of Data

Tip: Primary vs Secondary

Primary data: collected first-hand by the researcher for a specific purpose.
  • Methods: Survey · Observation · Experiment · Interview · Focus group.

Secondary data: data already collected by someone else and reused for new research.
  • Sources: government publications, industry reports, databases, internet.

71.3 Data-Collection Methods

Tip: Major data-collection methods

| Method | Description |
|---|---|
| Survey / Questionnaire | Structured questions |
| Personal Interview | One-to-one |
| Telephone Interview / CATI | Computer-Assisted Telephone Interview |
| Mail / Postal Survey | Self-administered |
| Online Survey | Web-based: Google Forms, SurveyMonkey, Qualtrics |
| Mobile Survey | SMS, WhatsApp, app |
| Observation | Direct or participant |
| Experiment | Lab or field |
| Focus Group | Discussion among 6-10 participants |
| Depth Interview | Long, qualitative |
| Projective Techniques | Indirect (word association, TAT, Rorschach) |
| Ethnography | Immersive observation |
| Content Analysis | Documents, social media |
| Big Data / Web Scraping | Digital footprints |

71.4 Survey Errors

Tip: Cooper-Schindler survey error types
  • Sampling errors: random variation that arises because only a sample, not the whole population, is observed.
  • Non-sampling errors:
    • Response bias — social desirability, acquiescence.
    • Non-response bias.
    • Interviewer bias.
    • Measurement / instrument bias.
    • Processing errors.
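
The random (sampling) part of survey error can be quantified, which the non-sampling errors generally cannot. A minimal sketch, assuming simple random sampling and the normal approximation for a proportion; the function name and the sample sizes are illustrative and do not come from Cooper and Schindler:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling and a normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)

# Quadrupling the sample size halves the sampling error.
moe_400 = margin_of_error(0.5, 400)
moe_1600 = margin_of_error(0.5, 1600)
print(round(moe_400, 4), round(moe_1600, 4))
```

A larger sample reduces only this random component. Response, non-response, interviewer, measurement and processing errors are unaffected by sample size and have to be controlled through design.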

71.5 Questionnaire Design

A questionnaire is a formalised set of questions to obtain information from respondents. Naresh Malhotra’s 10-step questionnaire design process is widely taught.

71.5.1 Malhotra’s 10-step Questionnaire Design

Tip: Questionnaire design in 10 steps (Malhotra)
  1. Specify the information needed.
  2. Specify the type of interviewing method.
  3. Determine the content of individual questions.
  4. Design questions to overcome the respondent’s inability and unwillingness to answer.
  5. Decide on the question structure.
  6. Determine the question wording.
  7. Arrange the questions in proper order.
  8. Identify the form and layout.
  9. Reproduce the questionnaire.
  10. Pretest, revise and prepare the final questionnaire.

71.6 Types of Questions

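Tip: Question types

| Type | Description |
|---|---|
| Open-ended | Free response |
| Close-ended | Predefined options |
| Dichotomous | Two options (Yes/No) |
| Multiple choice | Several options |
| Rating scale | Likert, Semantic Differential |
| Ranking | Order preferences |
| Filter / Screener | Routes respondents to the questions that apply to them |
| Contingency | Conditional: asked only if an earlier answer qualifies |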
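
Filter and contingency questions work together as skip logic. A minimal sketch of that routing, assuming a hypothetical three-question instrument; the question ids, wording and the `route` structure are invented for illustration and do not reflect any particular survey tool:

```python
from typing import Optional

# Hypothetical mini-questionnaire: each question names the next question
# to ask, optionally depending on the answer given (skip logic).
QUESTIONS = {
    "Q1": {"text": "Do you own a car?", "type": "filter",
           "route": {"Yes": "Q2", "No": "Q3"}},
    "Q2": {"text": "Which brand is your car?", "type": "contingency",
           "route": {None: "Q3"}},
    "Q3": {"text": "What is your age group?", "type": "multiple_choice",
           "route": {None: None}},
}

def next_question(current: str, answer: str) -> Optional[str]:
    """Return the id of the next question, or None at the end."""
    route = QUESTIONS[current]["route"]
    # Use the answer-specific route if there is one, else the default (None key).
    return route.get(answer, route.get(None))

def path(answers: dict) -> list:
    """Walk the questionnaire for one respondent and list the questions asked."""
    asked, current = [], "Q1"
    while current is not None:
        asked.append(current)
        current = next_question(current, answers.get(current, ""))
    return asked

owner_path = path({"Q1": "Yes", "Q2": "Maruti", "Q3": "25-34"})
non_owner_path = path({"Q1": "No", "Q3": "25-34"})
print(owner_path)      # the car owner is asked the contingency question Q2
print(non_owner_path)  # the non-owner is routed past Q2
```

Q1 is the filter: its answer decides whether the contingency question Q2 is shown at all.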

71.7 Scaling Techniques

Tip: Major scaling techniques

| Scale | Description | Inventor |
|---|---|---|
| Likert | 5- or 7-point agreement scale (Strongly Disagree → Strongly Agree) | Rensis Likert (1932) |
| Semantic Differential | Bipolar adjective pairs (e.g. Good-Bad) | Charles Osgood (1957) |
| Thurstone | Equal-appearing intervals | L.L. Thurstone (1928) |
| Guttman | Cumulative scale | Louis Guttman (1944) |
| Stapel Scale | Single adjective rated from +5 to −5, with no neutral zero | Jan Stapel |
| Bogardus Social Distance | Degree of closeness a respondent will accept | Emory Bogardus (1925) |
| Q-Sort | Forced-choice ranking | William Stephenson (1953) |
| Constant Sum | Distribute a fixed number of points across items | |
| Paired Comparison | Items compared two at a time; the preferred one in each pair is chosen | |
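
The Likert scale is a summated rating scale: the attitude score is the sum of the item scores. A minimal sketch of that scoring, assuming a 5-point format and that negatively worded items are reverse-coded before summing; the function name and the sample answers are illustrative:

```python
def likert_total(responses, reverse_items=(), points=5):
    """Sum item scores on a `points`-point Likert scale.

    responses: list of integers from 1 (Strongly Disagree) to `points`.
    reverse_items: zero-based positions of negatively worded items, which
    are flipped so that a high score always means a favourable attitude.
    """
    total = 0
    for i, score in enumerate(responses):
        if not 1 <= score <= points:
            raise ValueError(f"item {i}: score {score} outside 1..{points}")
        total += (points + 1 - score) if i in reverse_items else score
    return total

answers = [4, 5, 2, 4]                            # third item is negatively worded
score = likert_total(answers, reverse_items={2})  # the 2 is flipped to 4
print(score)
```

Reverse-coding keeps every item pointing in the same direction, so the total can be read as a single attitude score.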

71.8 Validity and Reliability

Tip: Validity
  • Content validity: the items cover the full domain of the construct.
  • Construct validity: Convergent + Discriminant.
  • Criterion validity: Concurrent + Predictive.
  • Face validity: the instrument appears, on inspection, to measure what it should.

Tip: Reliability
  • Test-retest: consistency over time.
  • Internal consistency: Cronbach’s Alpha (1951); α ≥ 0.7 is generally treated as acceptable.
  • Parallel forms.
  • Inter-rater reliability: Cohen’s Kappa.
  • Split-half.
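
Two of the reliability coefficients above are easy to compute by hand. A minimal sketch using only the standard library, assuming the usual textbook formulas: Cronbach's α = k/(k−1) · (1 − Σ item variances / variance of total scores), and Cohen's κ = (observed agreement − chance agreement) / (1 − chance agreement). The function names and the small data sets are made up for illustration:

```python
from collections import Counter
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all the same length."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_var_sum = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters on the same items, corrected for chance."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must code the same non-empty item list")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Five respondents answering three Likert items (illustrative data).
scores = [[4, 5, 4], [3, 3, 3], [5, 5, 4], [2, 2, 3], [4, 4, 4]]
alpha = cronbach_alpha(scores)

# Two coders classifying ten open-ended answers (illustrative data).
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
coder_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohen_kappa(coder_1, coder_2)

print(round(alpha, 3), round(kappa, 3))
```

With this toy data α is above the 0.7 rule of thumb. The coders agree on 8 of 10 answers, but κ is lower than 0.8 because part of that agreement would be expected by chance.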

71.9 Pretest and Pilot Study

Run the questionnaire with a small sample (15-30 respondents) before the main study. The pretest identifies confusing questions, the time required to complete the questionnaire, and unexpected response patterns.
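
A pretest can be summarised with very simple statistics. A minimal sketch, assuming pilot responses are stored as dictionaries in which `None` marks a skipped question; the data, the field names and the 20% skip-rate threshold are all illustrative assumptions rather than a standard:

```python
from statistics import median

# Hypothetical pilot data: completion time in minutes and answers per question.
pilot = [
    {"minutes": 12, "answers": {"Q1": "Yes", "Q2": 4, "Q3": None}},
    {"minutes": 9,  "answers": {"Q1": "No",  "Q2": 3, "Q3": "Price"}},
    {"minutes": 15, "answers": {"Q1": "Yes", "Q2": None, "Q3": None}},
    {"minutes": 11, "answers": {"Q1": "Yes", "Q2": 5, "Q3": "Quality"}},
    {"minutes": 10, "answers": {"Q1": "No",  "Q2": 2, "Q3": "Service"}},
]

def skip_rates(responses):
    """Share of respondents who left each question unanswered."""
    n = len(responses)
    questions = responses[0]["answers"].keys()
    return {q: sum(r["answers"][q] is None for r in responses) / n
            for q in questions}

def flag_questions(rates, threshold=0.2):
    """Questions skipped by more than `threshold` of respondents (assumed cut-off)."""
    return [q for q, rate in rates.items() if rate > threshold]

typical_minutes = median(r["minutes"] for r in pilot)
rates = skip_rates(pilot)
flagged = flag_questions(rates)
print(typical_minutes, rates, flagged)
```

A flagged question is only a candidate for rewording. Debriefing the pilot respondents is still needed to find out why they skipped it.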

71.10 Errors to Avoid in Questions

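Tip: Question wording errors
  • Leading questions: push the respondent towards a particular answer.
  • Loaded questions: carry an emotional charge.
  • Double-barrelled questions: two ideas in one question.
  • Ambiguous questions: rely on vague terms.
  • Negative wording: negatives and double negatives are easily misread.
  • Jargon and complex vocabulary.
  • Assumptive questions: presuppose something about the respondent.
  • Generalisation questions: ask for generalisations or estimates instead of specifics.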
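
Some of these wording errors leave surface traces that a draft questionnaire can be screened for. A crude sketch, assuming simple keyword patterns; the patterns and the function name are illustrative, they will produce false positives and misses, and they only shortlist questions for human review:

```python
import re

# Keyword screens (assumed heuristics, not an established instrument).
CHECKS = {
    "double-barrelled": re.compile(r"\b(and|or)\b", re.IGNORECASE),
    "negative wording": re.compile(r"\b(not|never|no)\b|n['’]t\b", re.IGNORECASE),
}

def screen(question: str) -> list:
    """Return the names of the checks that a question triggers."""
    return sorted(name for name, pattern in CHECKS.items()
                  if pattern.search(question))

drafts = [
    "Do you find the website fast and easy to use?",
    "Do you not dislike the new layout?",
    "How satisfied are you with delivery speed?",
]
results = {q: screen(q) for q in drafts}
for question, issues in results.items():
    print(question, "->", issues or "no flags")
```

Leading, loaded and assumptive questions depend on meaning rather than keywords, so they still need a human reader and a pretest.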

71.11 Indian Research Agencies

Tip: Indian market research agencies
  • IMRB / Kantar India.
  • Nielsen India.
  • AC Nielsen ORG-MARG (now Nielsen).
  • GfK India.
  • Hansa Research.
  • Ipsos India.
  • TNS India.
  • C-Voter (political).
  • CMIE (economic data).
  • Crisil Research.

71.13 Practice Questions

Q 01 · Likert · Easy

The Likert scale (1932) was developed by:

  • A. Rensis Likert
  • B. Charles Osgood
  • C. Thurstone
  • D. Guttman
Correct Option: A
Rensis Likert (1932).
Q 02 · Semantic · Medium

Semantic Differential scale uses:

  • A. Bipolar adjective pairs
  • B. 5-point agreement
  • C. Forced ranking
  • D. Yes/No
Correct Option: A
Osgood (1957): pairs such as Good-Bad, Hot-Cold.
Q 03 · Cronbach · Medium

Cronbach's Alpha (1951) measures:

  • A. Internal-consistency reliability
  • B. Validity
  • C. Skewness
  • D. Statistical significance
Correct Option: A
Acceptable α ≥ 0.7.
Q 04 · Bogardus · Hard

Social Distance scale (1925) is by:

  • A. Emory Bogardus
  • B. Likert
  • C. Thurstone
  • D. Guttman
Correct Option: A
Bogardus (1925).
Q 05 · Pretest · Easy

Pretest is conducted to:

  • A. Identify problems in questionnaire
  • B. Inflate sample size
  • C. Avoid sampling
  • D. Forecast results
Correct Option: A
A pretest catches confusing questions and timing issues.
Q 06 · Double-barrel · Medium

"Do you find the website fast and easy to use?" is an example of:

  • A. Leading question
  • B. Double-barrelled question
  • C. Loaded question
  • D. Ambiguous question
Correct Option: B
Two attributes (fast + easy) in one question.
Q 07 · Q-Sort · Hard

Q-Sort technique is by:

  • A. William Stephenson
  • B. Likert
  • C. Osgood
  • D. Guttman
Correct Option: A
William Stephenson (1953).
Q 08 · Open-ended · Easy

"What did you like about our product?" is:

  • A. Open-ended
  • B. Close-ended
  • C. Dichotomous
  • D. Multiple choice
Correct Option: A
Free response.
Q 09 · Focus group · Medium

Optimal size of a focus group is:

  • A. 2-3
  • B. 6-10
  • C. 20-30
  • D. 50+
Correct Option: B
6-10 participants is the norm.
Q 10 · Secondary data · Easy

Census of India data is:

  • A. Primary data
  • B. Secondary data
  • C. Sample data
  • D. Experimental data
Correct Option: B
The data are already collected; a researcher who reuses them is working with secondary data.
Q 11 · Validity · Medium

"Does the scale measure what it claims to measure?" refers to:

  • A. Reliability
  • B. Validity
  • C. Sensitivity
  • D. Stability
Correct Option: B
Validity = accuracy of measurement.
Q 12 · Projective · Hard

Word association and TAT are examples of:

  • A. Projective techniques
  • B. Likert scaling
  • C. Stratified sampling
  • D. Quantitative methods
Correct Option: A
Both are indirect, qualitative techniques.
Q 13 · Thurstone · Hard

Equal-appearing intervals scale is by:

  • A. L.L. Thurstone
  • B. Likert
  • C. Stapel
  • D. Bogardus
Correct Option: A
L.L. Thurstone (1928).
Q 14 · CATI · Hard

CATI stands for:

  • A. Computer-Assisted Telephone Interview
  • B. Cumulative Attitude Test Index
  • C. Computer Algorithm for Test Interpretation
  • D. Centralised Audit Tool
Correct Option: A
A telephone survey in which the interviewer works from a computer-displayed questionnaire.
Q 15 · Match scales · Hard

Match:

(i) Likert                    (a) Bipolar
(ii) Semantic Differential    (b) Cumulative
(iii) Guttman                 (c) Agreement
(iv) Bogardus                 (d) Social distance

  • A. (i)-(c), (ii)-(a), (iii)-(b), (iv)-(d)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
  • D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)
Correct Option: A
Likert: Agreement; Semantic Differential: Bipolar; Guttman: Cumulative; Bogardus: Social distance.

71.13.1 Advanced Format Questions

AR 1 · Assertion-Reason · Hard

A: Likert is summated rating.
R: It uses bipolar adjective pairs.

  • A. Both true; R explains A
  • B. Both true; R does not explain A
  • C. A true, R false
  • D. A false, R true
Correct Option: C
Bipolar pairs are used in Semantic Differential, not Likert.
S 1 · Statement-based · Medium

Reliability measures: (i) Test-retest. (ii) Internal consistency. (iii) Parallel forms. (iv) Inter-rater.

  • A. All four
  • B. (i) and (ii) only
  • C. (iii) and (iv) only
  • D. (ii) only
Correct Option: A
S 2 · Statement-based · Hard

Questionnaire pitfalls: (i) Leading. (ii) Double-barrelled. (iii) Ambiguous. (iv) Loaded.

  • A. All four
  • B. (i) and (ii) only
  • C. (iii) and (iv) only
  • D. (i), (ii), (iii) only
Correct Option: A

71.14 Quick Recall

Important: Quick recall
  • Data: Primary vs Secondary.
  • Methods: Survey · Interview · Observation · Experiment · Focus Group · Projective · Ethnography · Web scraping.
  • Errors: Sampling vs Non-sampling (response, non-response, interviewer, measurement, processing).
  • Malhotra’s 10-step questionnaire process.
  • Question types: Open · Close · Dichotomous · MCQ · Rating · Ranking · Filter · Contingency.
  • Scales: Likert (1932) · Semantic Differential (Osgood 1957) · Thurstone (1928) · Guttman (1944) · Stapel · Bogardus Social Distance (1925) · Q-Sort (Stephenson 1953) · Constant Sum · Paired Comparison.
  • Validity: Content · Construct (Convergent/Discriminant) · Criterion (Concurrent/Predictive) · Face.
  • Reliability: Test-retest · Cronbach’s α (1951) ≥ 0.7 · Parallel · Inter-rater (Cohen’s κ) · Split-half.
  • Question errors: Leading · Loaded · Double-barrelled · Ambiguous · Negative · Jargon · Assumptive.
  • Indian agencies: IMRB/Kantar · Nielsen · Ipsos · CMIE · Crisil · C-Voter · Hansa · GfK.
  • Modern trends: online/mobile · big data · social listening · biometrics · neuro · AI/synthetic data · conversational · geo-location · privacy-first.