88  Data Warehousing, Mining and Knowledge Management

88.1 Data Warehouse — Concept

Data Warehouse (DW) = “a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process”W.H. Inmon (1990). Inmon is the “Father of Data Warehousing”; Ralph Kimball is the proponent of the dimensional / star-schema approach.

88.2 Inmon vs Kimball

TipInmon vs Kimball
Dimension Inmon Kimball
Approach Top-down Bottom-up
Data model 3NF / Normalised Dimensional / Star
Build order EDW first, then data marts Data marts first, integrated later
Data marts Dependent Conformed
Use Enterprise integration Fast analytics

88.3 Data Warehouse Architecture

TipDW components
  • Source systems — OLTP DBs, files.
  • ETL / ELT (Extract, Transform, Load).
  • Data Warehouse / Lake / Lakehouse.
  • OLAP cubes.
  • Data Marts — subject-specific.
  • Metadata repository.
  • BI tools — Tableau, Power BI.

88.4 Schemas

TipDW schemas
  • Star schema — central fact + dimension tables.
  • Snowflake schema — normalised dimensions.
  • Galaxy / Fact-constellation — multiple facts.
  • Slowly Changing Dimensions (SCD) — Type 1, 2, 3, 6.

88.5 OLTP vs OLAP

TipOLTP vs OLAP
Dimension OLTP OLAP
Purpose Daily operations Analysis
Workload Many small txns Few large queries
Schema Normalised Star/Snowflake
Data age Current Historical
Users Operational staff Analysts, executives

88.6 Data Mining — Concept

Data Mining = the process of discovering patterns, correlations and useful information from large datasets using statistical, machine-learning and database techniques. Han & Kamber’s textbook is the standard reference. The acronym KDD (Knowledge Discovery in Databases) is from Usama Fayyad (1996).

88.7 KDD Process

TipKDD process (Fayyad)
  1. Data Selection.
  2. Pre-processing (clean, integrate).
  3. Transformation — feature engineering.
  4. Data Mining — algorithms.
  5. Interpretation / Evaluation.

88.8 CRISP-DM

Cross-Industry Standard Process for Data Mining (1999) — six phases: Business Understanding → Data Understanding → Data Preparation → Modelling → Evaluation → Deployment.

88.9 Data Mining Tasks

TipMining task types
  • Classification — supervised; predict class.
  • Regression — predict numeric.
  • Clustering — unsupervised grouping.
  • Association Rule Mining — Apriori, FP-Growth (market-basket).
  • Sequential pattern mining — time-ordered.
  • Anomaly detection — fraud, outliers.
  • Text / Web / Image / Graph mining.
  • Process mining — discover business processes from logs.

88.10 Market-Basket and Apriori

TipMarket-basket analysis
  • Support = P(A ∩ B) = count(A∩B)/N.
  • Confidence = P(B|A) = support(A∩B) / support(A).
  • Lift = confidence / P(B).
  • Apriori algorithm — Agrawal & Srikant (1994).
  • FP-Growth — Han 2000.
  • Diapers and Beer — famous anecdote.

88.11 Knowledge Management — Concept

Knowledge Management (KM) = the systematic process of capturing, developing, sharing and effectively using organisational knowledge for competitive advantage. Ikujiro Nonaka & Hirotaka Takeuchi wrote The Knowledge-Creating Company (1995). Karl-Erik Sveibyintellectual capital. Thomas Davenport & Laurence PrusakWorking Knowledge (1998). Peter Drucker — coined “knowledge worker” (1959).

88.12 Tacit vs Explicit Knowledge

TipKnowledge types
  • Tacit — personal, hard to codify; experience, intuition. Michael Polanyi (1966) — “We can know more than we can tell.”
  • Explicit — documented, codifiable.

88.13 Nonaka’s SECI Model

TipSECI Model (Nonaka 1995)
Mode From → To Example
Socialisation Tacit → Tacit Apprenticeship
Externalisation Tacit → Explicit Manuals, writing
Combination Explicit → Explicit Reports, training
Internalisation Explicit → Tacit Learning by doing

Plus Ba — shared context for knowledge creation (Nonaka, Konno 1998).

88.14 DIKW Pyramid

TipDIKW Pyramid

Data → Information → Knowledge → Wisdom — Russell Ackoff (1989).

88.15 KM Strategies — Hansen-Nohria-Tierney (1999)

TipKM strategies
  • Codification — store in databases (Accenture, Andersen).
  • Personalisation — person-to-person (McKinsey, BCG).

88.16 KM Technologies

TipKM tools
  • Document Management Systems (DMS).
  • Content Management Systems (CMS).
  • Intranets / Portals.
  • Knowledge bases / Wikis.
  • Communities of Practice (CoP) — Wenger 1998.
  • Expert systems.
  • Social collaboration — Slack, Teams.
  • Semantic search.
  • AI-based KM — Gen-AI chatbots over corporate knowledge (RAG).

88.17 Indian Context

TipIndia context
  • TCS, Infosys, Wipro — pioneers of large-scale KM.
  • Infosys KM programme (KShop) was an early case study.
  • CII Centre for Excellence in KM.
  • ISRO — knowledge re-use across missions.
  • IITs, IIMs — academic KM research (NMIMS, IIM-Bangalore).
  • Data Mining Indian firms: Mu Sigma, Fractal, Tiger Analytics, Tredence.
  • Government: PRAGATI dashboard (PM monitoring), DBT-Aadhaar linkage.

88.19 Practice Questions

Q 01InmonMedium

"Father of Data Warehousing" is:

  • AW.H. Inmon
  • BRalph Kimball
  • CCodd
  • DHan
View solution
Correct Option: A
Bill Inmon, 1990.
Q 02KimballMedium

Dimensional / star-schema approach is by:

  • ARalph Kimball
  • BInmon
  • CHan
  • DCodd
View solution
Correct Option: A
Ralph Kimball.
Q 03SECIMedium

SECI Model is by:

  • ANonaka & Takeuchi
  • BDavenport
  • CPolanyi
  • DDrucker
View solution
Correct Option: A
Nonaka-Takeuchi (1995).
Q 04CRISP-DMMedium

CRISP-DM has:

  • A6 phases
  • B4 phases
  • C7 phases
  • D5 phases
View solution
Correct Option: A
Business → Data → Prep → Modelling → Evaluation → Deployment.
Q 05AprioriHard

Apriori algorithm (1994) is by:

  • AAgrawal & Srikant
  • BHan
  • CQuinlan
  • DCodd
View solution
Correct Option: A
Rakesh Agrawal and Ramakrishnan Srikant (IBM Almaden).
Q 06DIKWMedium

DIKW pyramid order is:

  • AData → Information → Knowledge → Wisdom
  • BKnowledge → Data → Information → Wisdom
  • CWisdom → Knowledge → Information → Data
  • DData → Wisdom → Knowledge → Info
View solution
Correct Option: A
Ackoff 1989.
Q 07TacitMedium

"We can know more than we can tell" is by:

  • AMichael Polanyi
  • BDrucker
  • CSenge
  • DNonaka
View solution
Correct Option: A
Michael Polanyi (1966).
Q 08Star schemaMedium

Star schema has central:

  • AFact table
  • BDimension table
  • CIndex
  • DView
View solution
Correct Option: A
Fact in middle, dimensions around it.
Q 09Knowledge workerHard

"Knowledge worker" was coined by:

  • APeter Drucker
  • BNonaka
  • CPolanyi
  • DSenge
View solution
Correct Option: A
Drucker (1959).
Q 10HansenHard

McKinsey & BCG use which KM strategy?

  • APersonalisation
  • BCodification
  • CEncryption
  • DDocumentation
View solution
Correct Option: A
Person-to-person, Hansen-Nohria-Tierney 1999.
Q 11CoPHard

Communities of Practice is by:

  • AEtienne Wenger
  • BDrucker
  • CPolanyi
  • DNonaka
View solution
Correct Option: A
Etienne Wenger, 1998.
Q 12LiftHard

In association rules, Lift = 1 means:

  • AIndependence between A and B
  • BStrong association
  • CNo data
  • DNegative
View solution
Correct Option: A
Lift > 1 → positive association; = 1 independent; < 1 negative.
Q 13Data meshHard

"Data Mesh" architecture is by:

  • AZhamak Dehghani
  • BInmon
  • CKimball
  • DCodd
View solution
Correct Option: A
Zhamak Dehghani (2019).
Q 14OLAPMedium

OLAP optimised for:

  • AAnalytical queries
  • BTransaction processing
  • CEncryption
  • DStorage
View solution
Correct Option: A
On-Line Analytical Processing.
Q 15MatchHard

Match SECI:

(i) Socialisation (a) Tacit→Explicit
(ii) Externalisation (b) Tacit→Tacit
(iii) Combination (c) Explicit→Tacit
(iv) Internalisation (d) Explicit→Explicit
  • A(i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)
  • B(i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C(i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • D(i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)
View solution
Correct Option: A
S = T→T; E = T→Ex; C = Ex→Ex; I = Ex→T.

88.19.1 Advanced Format Questions

AR 1Assertion-ReasonHard

A: Inmon used 3NF and top-down approach.
R: Kimball used dimensional star schema and bottom-up.

  • ABoth true; R explains A
  • BBoth true; R does not explain A
  • CA true, R false
  • DA false, R true
View solution
Correct Option: B
S 1Statement-basedMedium

SECI Model (Nonaka): (i) Socialisation. (ii) Externalisation. (iii) Combination. (iv) Internalisation.

  • AAll four
  • B(i) and (ii) only
  • C(iii) and (iv) only
  • D(i) only
View solution
Correct Option: A
S 2Statement-basedHard

Mining tasks: (i) Classification. (ii) Clustering. (iii) Association. (iv) Anomaly detection.

  • AAll four
  • B(i) and (ii) only
  • C(iii) and (iv) only
  • D(i), (ii), (iii) only
View solution
Correct Option: A

88.20 Quick Recall

ImportantQuick recall
  • DW (Inmon 1990) — subject-oriented, integrated, time-variant, non-volatile.
  • Inmon (top-down, 3NF) vs Kimball (bottom-up, dimensional/star).
  • Schemas: Star · Snowflake · Galaxy · SCD types.
  • OLTP vs OLAP — operations vs analysis.
  • Architecture: Source → ETL/ELT → DW/Lake → OLAP → Marts → Metadata → BI.
  • Data mining = KDD — Fayyad (1996) 5 steps; CRISP-DM 6 phases.
  • Mining tasks: Classification · Regression · Clustering · Association rules (Apriori — Agrawal-Srikant 1994; FP-Growth — Han 2000) · Sequential · Anomaly · Text/Web/Image/Graph · Process mining.
  • Market basket: Support · Confidence · Lift; Diapers-Beer.
  • KM: Davenport-Prusak · Nonaka-Takeuchi 1995 · Sveiby · Drucker (knowledge worker 1959).
  • Tacit vs Explicit (Polanyi 1966).
  • SECI: Socialisation (T→T) · Externalisation (T→Ex) · Combination (Ex→Ex) · Internalisation (Ex→T); + Ba.
  • DIKW — Data → Information → Knowledge → Wisdom (Ackoff 1989).
  • KM strategies (Hansen-Nohria-Tierney 1999): Codification vs Personalisation.
  • Tools: DMS · CMS · portals · wikis · CoP (Wenger 1998) · ES · Slack · semantic search · LLM+RAG.
  • India: TCS · Infosys KShop · CII KM · ISRO · IIM-B · Mu Sigma · Fractal.
  • Modern: Cloud DW (Snowflake, Databricks, BigQuery) · Lakehouse · Kafka+Spark · AutoML · MLOps · DataOps · Knowledge graphs · Vector DBs · Semantic layer · Data Mesh (Dehghani 2019) · differential privacy.