88 Data Warehousing, Mining and Knowledge Management

88.1 Data Warehouse — Concept

Data Warehouse (DW) = “a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process” — W.H. Inmon (1990). Inmon is the “Father of Data Warehousing”; Ralph Kimball is the proponent of the dimensional / star-schema approach.

88.2 Inmon vs Kimball

Inmon vs Kimball

Dimension	Inmon	Kimball
Approach	Top-down	Bottom-up
Data model	3NF / Normalised	Dimensional / Star
Build order	EDW first, then data marts	Data marts first, integrated later
Data marts	Dependent	Conformed
Use	Enterprise integration	Fast analytics

88.3 Data Warehouse Architecture

DW components

Source systems — OLTP DBs, files.
ETL / ELT (Extract, Transform, Load).
Data Warehouse / Lake / Lakehouse.
OLAP cubes.
Data Marts — subject-specific.
Metadata repository.
BI tools — Tableau, Power BI.

88.4 Schemas

DW schemas

Star schema — central fact + dimension tables.
Snowflake schema — normalised dimensions.
Galaxy / Fact-constellation — multiple facts.
Slowly Changing Dimensions (SCD) — Type 1, 2, 3, 6.

88.5 OLTP vs OLAP

OLTP vs OLAP

Dimension	OLTP	OLAP
Purpose	Daily operations	Analysis
Workload	Many small txns	Few large queries
Schema	Normalised	Star/Snowflake
Data age	Current	Historical
Users	Operational staff	Analysts, executives

88.6 Data Mining — Concept

Data Mining = the process of discovering patterns, correlations and useful information from large datasets using statistical, machine-learning and database techniques. Han & Kamber’s textbook is the standard reference. The acronym KDD (Knowledge Discovery in Databases) is from Usama Fayyad (1996).

88.7 KDD Process

KDD process (Fayyad)

Data Selection.
Pre-processing (clean, integrate).
Transformation — feature engineering.
Data Mining — algorithms.
Interpretation / Evaluation.

88.8 CRISP-DM

Cross-Industry Standard Process for Data Mining (1999) — six phases: Business Understanding → Data Understanding → Data Preparation → Modelling → Evaluation → Deployment.

88.9 Data Mining Tasks

Mining task types

Classification — supervised; predict class.
Regression — predict numeric.
Clustering — unsupervised grouping.
Association Rule Mining — Apriori, FP-Growth (market-basket).
Sequential pattern mining — time-ordered.
Anomaly detection — fraud, outliers.
Text / Web / Image / Graph mining.
Process mining — discover business processes from logs.

88.10 Market-Basket and Apriori

Market-basket analysis

Support = P(A ∩ B) = count(A∩B)/N.
Confidence = P(B|A) = support(A∩B) / support(A).
Lift = confidence / P(B).
Apriori algorithm — Agrawal & Srikant (1994).
FP-Growth — Han 2000.
Diapers and Beer — famous anecdote.

88.11 Knowledge Management — Concept

Knowledge Management (KM) = the systematic process of capturing, developing, sharing and effectively using organisational knowledge for competitive advantage. Ikujiro Nonaka & Hirotaka Takeuchi wrote The Knowledge-Creating Company (1995). Karl-Erik Sveiby — intellectual capital. Thomas Davenport & Laurence Prusak — Working Knowledge (1998). Peter Drucker — coined “knowledge worker” (1959).

88.12 Tacit vs Explicit Knowledge

Knowledge types

Tacit — personal, hard to codify; experience, intuition. Michael Polanyi (1966) — “We can know more than we can tell.”
Explicit — documented, codifiable.

88.13 Nonaka’s SECI Model

SECI Model (Nonaka 1995)

Mode	From → To	Example
Socialisation	Tacit → Tacit	Apprenticeship
Externalisation	Tacit → Explicit	Manuals, writing
Combination	Explicit → Explicit	Reports, training
Internalisation	Explicit → Tacit	Learning by doing

Plus Ba — shared context for knowledge creation (Nonaka, Konno 1998).

88.14 DIKW Pyramid

DIKW Pyramid

Data → Information → Knowledge → Wisdom — Russell Ackoff (1989).

88.15 KM Strategies — Hansen-Nohria-Tierney (1999)

KM strategies

Codification — store in databases (Accenture, Andersen).
Personalisation — person-to-person (McKinsey, BCG).

88.16 KM Technologies

KM tools

Document Management Systems (DMS).
Content Management Systems (CMS).
Intranets / Portals.
Knowledge bases / Wikis.
Communities of Practice (CoP) — Wenger 1998.
Expert systems.
Social collaboration — Slack, Teams.
Semantic search.
AI-based KM — Gen-AI chatbots over corporate knowledge (RAG).

88.17 Indian Context

India context

TCS, Infosys, Wipro — pioneers of large-scale KM.
Infosys KM programme (KShop) was an early case study.
CII Centre for Excellence in KM.
ISRO — knowledge re-use across missions.
IITs, IIMs — academic KM research (NMIMS, IIM-Bangalore).
Data Mining Indian firms: Mu Sigma, Fractal, Tiger Analytics, Tredence.
Government: PRAGATI dashboard (PM monitoring), DBT-Aadhaar linkage.

88.18 Modern Trends

Modern DM / KM trends

Cloud data warehouse / Lakehouse — Snowflake, Databricks, BigQuery.
Real-time analytics — Kafka + Spark.
AutoML — Google, DataRobot.
MLOps — production ML pipelines.
DataOps — agile data delivery.
Knowledge graphs — Google, Microsoft.
Generative-AI-powered KM — LLM + RAG over enterprise data.
Vector databases — Pinecone, Weaviate, Milvus.
Semantic layer.
Data mesh (Dehghani 2019) — decentralised.
Federated KM.
Data contracts.
Privacy-preserving analytics — differential privacy.

88.19 Practice Questions

Q 01InmonMedium

"Father of Data Warehousing" is:

AW.H. Inmon
BRalph Kimball
CCodd
DHan

View solution

Correct Option: A

Bill Inmon, 1990.

Q 02KimballMedium

Dimensional / star-schema approach is by:

ARalph Kimball
BInmon
CHan
DCodd

View solution

Correct Option: A

Ralph Kimball.

Q 03SECIMedium

SECI Model is by:

ANonaka & Takeuchi
BDavenport
CPolanyi
DDrucker

View solution

Correct Option: A

Nonaka-Takeuchi (1995).

Q 04CRISP-DMMedium

CRISP-DM has:

A6 phases
B4 phases
C7 phases
D5 phases

View solution

Correct Option: A

Business → Data → Prep → Modelling → Evaluation → Deployment.

Q 05AprioriHard

Apriori algorithm (1994) is by:

AAgrawal & Srikant
BHan
CQuinlan
DCodd

View solution

Correct Option: A

Rakesh Agrawal and Ramakrishnan Srikant (IBM Almaden).

Q 06DIKWMedium

DIKW pyramid order is:

AData → Information → Knowledge → Wisdom
BKnowledge → Data → Information → Wisdom
CWisdom → Knowledge → Information → Data
DData → Wisdom → Knowledge → Info

View solution

Correct Option: A

Ackoff 1989.

Q 07TacitMedium

"We can know more than we can tell" is by:

AMichael Polanyi
BDrucker
CSenge
DNonaka

View solution

Correct Option: A

Michael Polanyi (1966).

Q 08Star schemaMedium

Star schema has central:

AFact table
BDimension table
CIndex
DView

View solution

Correct Option: A

Fact in middle, dimensions around it.

Q 09Knowledge workerHard

"Knowledge worker" was coined by:

APeter Drucker
BNonaka
CPolanyi
DSenge

View solution

Correct Option: A

Drucker (1959).

Q 10HansenHard

McKinsey & BCG use which KM strategy?

APersonalisation
BCodification
CEncryption
DDocumentation

View solution

Correct Option: A

Person-to-person, Hansen-Nohria-Tierney 1999.

Q 11CoPHard

Communities of Practice is by:

AEtienne Wenger
BDrucker
CPolanyi
DNonaka

View solution

Correct Option: A

Etienne Wenger, 1998.

Q 12LiftHard

In association rules, Lift = 1 means:

AIndependence between A and B
BStrong association
CNo data
DNegative

View solution

Correct Option: A

Lift > 1 → positive association; = 1 independent; < 1 negative.

Q 13Data meshHard

"Data Mesh" architecture is by:

AZhamak Dehghani
BInmon
CKimball
DCodd

View solution

Correct Option: A

Zhamak Dehghani (2019).

Q 14OLAPMedium

OLAP optimised for:

AAnalytical queries
BTransaction processing
CEncryption
DStorage

View solution

Correct Option: A

On-Line Analytical Processing.

Q 15MatchHard

Match SECI:

(i)	Socialisation	(a)	Tacit→Explicit
(ii)	Externalisation	(b)	Tacit→Tacit
(iii)	Combination	(c)	Explicit→Tacit
(iv)	Internalisation	(d)	Explicit→Explicit

A(i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)
B(i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C(i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
D(i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)

View solution

Correct Option: A

S = T→T; E = T→Ex; C = Ex→Ex; I = Ex→T.

88.19.1 Advanced Format Questions

AR 1Assertion-ReasonHard

A: Inmon used 3NF and top-down approach.
R: Kimball used dimensional star schema and bottom-up.

ABoth true; R explains A
BBoth true; R does not explain A
CA true, R false
DA false, R true

View solution

Correct Option: B

S 1Statement-basedMedium

SECI Model (Nonaka): (i) Socialisation. (ii) Externalisation. (iii) Combination. (iv) Internalisation.

AAll four
B(i) and (ii) only
C(iii) and (iv) only
D(i) only

View solution

Correct Option: A

S 2Statement-basedHard

Mining tasks: (i) Classification. (ii) Clustering. (iii) Association. (iv) Anomaly detection.

AAll four
B(i) and (ii) only
C(iii) and (iv) only
D(i), (ii), (iii) only

View solution

Correct Option: A

88.20 Quick Recall

Quick recall

DW (Inmon 1990) — subject-oriented, integrated, time-variant, non-volatile.
Inmon (top-down, 3NF) vs Kimball (bottom-up, dimensional/star).
Schemas: Star · Snowflake · Galaxy · SCD types.
OLTP vs OLAP — operations vs analysis.
Architecture: Source → ETL/ELT → DW/Lake → OLAP → Marts → Metadata → BI.
Data mining = KDD — Fayyad (1996) 5 steps; CRISP-DM 6 phases.
Mining tasks: Classification · Regression · Clustering · Association rules (Apriori — Agrawal-Srikant 1994; FP-Growth — Han 2000) · Sequential · Anomaly · Text/Web/Image/Graph · Process mining.
Market basket: Support · Confidence · Lift; Diapers-Beer.
KM: Davenport-Prusak · Nonaka-Takeuchi 1995 · Sveiby · Drucker (knowledge worker 1959).
Tacit vs Explicit (Polanyi 1966).
SECI: Socialisation (T→T) · Externalisation (T→Ex) · Combination (Ex→Ex) · Internalisation (Ex→T); + Ba.
DIKW — Data → Information → Knowledge → Wisdom (Ackoff 1989).
KM strategies (Hansen-Nohria-Tierney 1999): Codification vs Personalisation.
Tools: DMS · CMS · portals · wikis · CoP (Wenger 1998) · ES · Slack · semantic search · LLM+RAG.
India: TCS · Infosys KShop · CII KM · ISRO · IIM-B · Mu Sigma · Fractal.
Modern: Cloud DW (Snowflake, Databricks, BigQuery) · Lakehouse · Kafka+Spark · AutoML · MLOps · DataOps · Knowledge graphs · Vector DBs · Semantic layer · Data Mesh (Dehghani 2019) · differential privacy.