# Chapter 21 Project — Real-Time Customer 360 Case Study

This shared project accompanies Chapter 21, **Capstone 1: Real-Time Customer 360 with CDC, Streaming, and Lakehouse Data Products**. It is intentionally lightweight so readers can run the core ideas locally without Kafka, Flink, or a cloud lakehouse, while still preserving the same data-product thinking used in production systems.

The project simulates a retail organization, TuranMart, that wants a trusted Customer 360 profile for personalization, support, marketing, analytics, and ML features. The code generates reproducible operational events, stages them as bronze data, resolves customer identities into silver entities, creates gold Customer 360 marts, and produces a quality report that mimics production service-level checks.

## Learning outcomes

After completing the project, you should be able to explain how a real-time Customer 360 product combines **source-system capture**, **identity resolution**, **event-time feature aggregation**, **consent-aware activation**, and **operational quality controls**. You should also be able to map the local CSV implementation to a production architecture based on CDC, event streams, stream processors, and lakehouse serving tables.

## Directory layout

| Path | Purpose |
|---|---|
| `scripts/generate_data.py` | Creates deterministic sample operational data: customers, orders, clicks, tickets, campaigns, and consent events. |
| `scripts/build_customer360.py` | Builds bronze, silver, and gold data products using only the Python standard library. |
| `scripts/validate_outputs.py` | Produces a quality and freshness report for the generated Customer 360 outputs. |
| `data/raw/` | Source-like CSV files generated by the project. |
| `data/bronze/` | Immutable raw snapshots copied from the source-like files. |
| `data/silver/` | Cleaned entities and resolved customer profile base tables. |
| `data/gold/` | Serving-ready Customer 360, segment, and KPI data products. |
| `reports/` | Validation and operating-metric reports. |

## Quick start

From the repository root, run the following commands.

```bash
cd shared/projects/ch21_realtime_customer360
python3 scripts/generate_data.py
python3 scripts/build_customer360.py
python3 scripts/validate_outputs.py
```

The pipeline is deterministic and does not require external services. It generates compact CSV artifacts that can be inspected with a spreadsheet, pandas, DuckDB, or any BI tool.
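
For a quick look without installing anything, the standard library is enough. The following is a minimal sketch, not part of the project scripts: `preview_csv` is a hypothetical helper, and it makes no assumption about the column names in `data/gold/customer_360.csv` beyond the file having a header row.

```python
import csv
from pathlib import Path


def preview_csv(path, limit=5):
    """Return the header and the first `limit` rows of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # zip stops after `limit` rows, so large files are not fully read.
        rows = [row for _, row in zip(range(limit), reader)]
        return reader.fieldnames or [], rows


if __name__ == "__main__":
    target = Path("data/gold/customer_360.csv")
    if target.exists():
        columns, rows = preview_csv(target)
        print(f"{len(columns)} columns: {', '.join(columns)}")
        for row in rows:
            print(row)
    else:
        print(f"{target} not found; run the three scripts first.")
```

Run it from the project directory after the build step to see which columns the gold table actually exposes.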

## Expected validation output

Each script prints a confirmation message when it succeeds. Together, the three commands should produce output similar to the following.

```text
Generated source data in .../shared/projects/ch21_realtime_customer360/data/raw
Built Customer 360 outputs in .../shared/projects/ch21_realtime_customer360/data
Wrote validation report to .../shared/projects/ch21_realtime_customer360/reports
```

The validation step writes `reports/customer360_quality_report.csv` and `reports/customer360_quality_report.md`. The report should include passing checks for profile coverage, duplicate profile count, and critical null rate. Freshness checks are tuned for the deterministic sample data; in a production system, the thresholds would normally be measured in seconds or minutes rather than hours.
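
To make the three named checks concrete, here is a rough sketch of how they could be computed over a list of profile rows. It is an illustration only and may differ from what `scripts/validate_outputs.py` does: the function name, the `customer_key` and `segment` column names, and the zero-tolerance thresholds are all assumptions.

```python
from collections import Counter


def run_quality_checks(profiles, expected_keys, key="customer_key",
                       critical_fields=("customer_key", "segment")):
    """Return (check, value, status) tuples for a list of profile dicts.

    Column names and thresholds are assumptions for illustration.
    """
    keys = [row.get(key, "") for row in profiles]
    present = {k for k in keys if k}
    expected = set(expected_keys)

    # Share of expected customers that have at least one profile row.
    coverage = len(present & expected) / len(expected) if expected else 1.0

    # Surplus rows: a key seen n times contributes n - 1 duplicates.
    duplicates = sum(count - 1 for k, count in Counter(keys).items() if k)

    # Share of critical cells that are empty or missing.
    cells = [row.get(field, "") for row in profiles for field in critical_fields]
    null_rate = sum(1 for c in cells if not c) / len(cells) if cells else 0.0

    return [
        ("profile_coverage", coverage, "pass" if coverage == 1.0 else "warn"),
        ("duplicate_profile_count", duplicates, "pass" if duplicates == 0 else "warn"),
        ("critical_null_rate", null_rate, "pass" if null_rate == 0.0 else "warn"),
    ]
```

In a real deployment the thresholds would usually be configurable per data product rather than hard-coded to zero tolerance.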

## Case-study mapping

The local project replaces each production technology with a component that readers can inspect directly. In production, the source tables and events would be captured through Debezium or another CDC connector, transported through Kafka topics, processed by Flink or Spark Structured Streaming, and persisted in a lakehouse. In this project, CSV files represent topic snapshots, and standard-library Python transformations represent the data-product logic, so the design remains transparent.

| Production concept | Local project implementation |
|---|---|
| CDC stream from operational databases | `data/raw/orders.csv`, `data/raw/customers.csv`, and bronze snapshots. |
| Clickstream topic | `data/raw/clickstream.csv` with session and product events. |
| Support and CRM events | `data/raw/support_tickets.csv`, `data/raw/campaign_events.csv`, and `data/raw/consent_events.csv`. |
| Bronze landing layer | `data/bronze/*_bronze.csv`, preserving source-shaped records. |
| Silver identity and entity layer | `data/silver/customer_identity.csv` and `data/silver/customer_profile_base.csv`. |
| Gold Customer 360 serving mart | `data/gold/customer_360.csv`, `data/gold/segment_summary.csv`, and `data/gold/kpi_summary.csv`. |
| Operational data-quality report | `reports/customer360_quality_report.csv` and `reports/customer360_quality_report.md`. |
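
The silver identity layer is the least obvious row in the mapping, so a small illustration may help. The sketch below shows one simple, deterministic approach: records from different sources that share a normalized email receive the same customer key. This is an assumed technique for illustration, not necessarily the rule used in `scripts/build_customer360.py`; the `source_id` and `email` field names and the `CK0001` key format are hypothetical.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different emails match."""
    return email.strip().lower()


def resolve_identities(records, id_field="source_id", email_field="email"):
    """Map each source record ID to exactly one resolved customer key.

    Records sharing a normalized email get the same key. Records with no
    email cannot be matched, so each receives its own key.
    """
    key_by_email = {}
    resolved = {}
    next_key = 1
    for record in records:
        email = normalize_email(record.get(email_field) or "")
        key = key_by_email.get(email) if email else None
        if key is None:
            key = f"CK{next_key:04d}"
            next_key += 1
            if email:
                key_by_email[email] = key
        resolved[record[id_field]] = key
    return resolved


records = [
    {"source_id": "crm-1", "email": "Ada@Example.com "},
    {"source_id": "web-9", "email": "ada@example.com"},
    {"source_id": "pos-3", "email": ""},
]
print(resolve_identities(records))
```

Because keys are assigned in input order, the same input always yields the same mapping, which keeps the run reproducible. Production systems often go further with fuzzy or graph-based matching across several identifiers.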

## Deliverables for readers

The completed project should produce a concise data-product review. The review should include the generated Customer 360 table, the validation report, and a short explanation of which fields are safe for marketing activation based on current consent state. Strong submissions also explain the difference between a silver profile base and a gold serving mart.

| Deliverable | Evidence to include |
|---|---|
| Profile design | Key columns in `data/gold/customer_360.csv`, especially segment, consent, value, risk, and support metrics. |
| Quality and freshness review | Rows from `reports/customer360_quality_report.md` with interpretation of pass, warn, and info statuses. |
| Production mapping | A brief mapping from local files to CDC topics, streaming jobs, lakehouse tables, and activation APIs. |
| Improvement proposal | One prioritized enhancement, such as late-event handling, graph-based identity resolution, or a low-latency profile API. |
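
When explaining which fields are safe for marketing activation, it can help to show how "current consent state" is derived. The sketch below takes the most recent consent event per customer by event time and blanks marketing fields for anyone without a current grant. The column names (`customer_key`, `event_time`, `consent_status`), the `granted` status value, and the choice of marketing fields are assumptions; check them against the generated `data/raw/consent_events.csv` before reusing this.

```python
from datetime import datetime


def current_consent(events):
    """Return {customer_key: bool} from the latest event per customer.

    Events may arrive out of order, so compare event times rather than
    relying on list position.
    """
    latest = {}
    for event in events:
        ts = datetime.fromisoformat(event["event_time"])
        key = event["customer_key"]
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, event["consent_status"] == "granted")
    return {key: granted for key, (_, granted) in latest.items()}


def activation_view(profile, consent, marketing_fields=("email", "segment")):
    """Blank marketing fields unless the customer currently consents.

    Customers with no consent record are treated as not consenting.
    """
    allowed = consent.get(profile["customer_key"], False)
    return {
        field: (value if allowed or field not in marketing_fields else "")
        for field, value in profile.items()
    }
```

Defaulting unknown customers to "no consent" is a deliberately conservative design choice: a missing record should never widen activation.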

## Extension path

Readers can extend the case study by adding a real Kafka topic, replacing CSVs with a lakehouse table format, exposing the gold profile through a small FastAPI service, or adding a model that predicts churn from the generated Customer 360 features. The most important constraint is that extensions must preserve the same governance contract: current consent must be carried into activation, customer profiles must remain unique, and validation must fail or warn when serving quality degrades.

## Troubleshooting

If the build script fails with a missing-file error, run `python3 scripts/generate_data.py` first. If validation reports duplicate or missing customer profiles, inspect `data/silver/customer_identity.csv` and verify that every customer has exactly one resolved customer key. If freshness warnings appear after you alter the sample dates, update the deterministic `NOW` timestamp in the scripts or explain why the new sample data intentionally represents stale events.
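
The role of the fixed `NOW` timestamp is easier to see in code. The following sketch measures freshness as the lag between a pinned reference time and the latest event time, which is why changing the sample dates without changing `NOW` triggers warnings. The `NOW` value, the 24-hour threshold, and the `freshness_status` helper are assumptions for illustration; the project scripts define their own values.

```python
from datetime import datetime, timedelta

# Assumed reference time for illustration; the project scripts pin their own.
NOW = datetime(2024, 1, 15, 12, 0, 0)


def freshness_status(latest_event_time, now=NOW, warn_after=timedelta(hours=24)):
    """Return (status, lag) for an ISO-8601 event timestamp.

    Using a fixed `now` instead of the wall clock keeps results reproducible.
    """
    lag = now - datetime.fromisoformat(latest_event_time)
    status = "pass" if lag <= warn_after else "warn"
    return status, lag


print(freshness_status("2024-01-15T06:00:00"))
print(freshness_status("2024-01-10T12:00:00"))
```

With these assumed values, the first call is six hours behind `NOW` and passes, while the second is five days behind and warns.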
