# Chapter 21 Extension Exercises — Real-Time Customer 360

These exercises extend the Chapter 21 capstone project after the base pipeline has successfully generated `data/gold/customer_360.csv` and `reports/customer360_quality_report.md`. Each exercise reinforces a production design choice from the chapter while remaining small enough to complete locally.

## Exercise 1 — Add a late-arriving order scenario

Modify `data/raw/orders.csv` or `scripts/generate_data.py` so that one customer receives a paid order whose `event_time` is older than the latest clickstream event but which, once ingested, must still update revenue and recency metrics. Then update `scripts/build_customer360.py` to document whether the pipeline processes by event time, by ingestion time, or from a deterministic replay snapshot.

A strong answer explains how the production implementation would use watermarks, event-time windows, and replayable input topics to avoid losing legitimate late events.
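
As a starting point, the watermark idea can be sketched locally in plain Python. This is a minimal illustration, not the chapter's implementation: the `Order` class, the `split_by_watermark` helper, the 24-hour allowed lateness, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Order:
    customer_id: str
    amount: float
    event_time: datetime


def split_by_watermark(orders, max_event_time_seen, allowed_lateness):
    """Partition orders into (accepted, too_late) using an event-time watermark.

    The watermark trails the newest event time seen so far by
    `allowed_lateness`. Orders at or after the watermark are accepted
    even though they arrive out of order.
    """
    watermark = max_event_time_seen - allowed_lateness
    accepted = [o for o in orders if o.event_time >= watermark]
    too_late = [o for o in orders if o.event_time < watermark]
    return accepted, too_late


# Hypothetical sample data: the latest clickstream event is at noon on 1 May.
latest_click = datetime(2024, 5, 1, 12, 0)
orders = [
    Order("C001", 40.0, datetime(2024, 5, 1, 11, 30)),  # late, inside the bound
    Order("C002", 15.0, datetime(2024, 4, 28, 9, 0)),   # older than the watermark
]

accepted, too_late = split_by_watermark(orders, latest_click, timedelta(hours=24))
```

In this sketch the `too_late` list is returned rather than discarded, so a replay or backfill job could still apply those orders to revenue and recency metrics.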

## Exercise 2 — Add a consent-aware activation export

Create a new output file named `data/gold/activation_candidates.csv`. It should include only customers whose `marketing_consent` and `personalization_consent` are both `true`, and it should include a short activation reason based on `segment`, `churn_risk_score`, and `retention_priority`.

A strong answer proves that the export excludes customers who have withdrawn consent, even when those customers have high commercial value.
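
One possible shape for the filter is shown below, using only the standard library. The column names come from the exercise text; the `activation_candidates` helper, the `activation_reason` column name, the reason wording, and the sample rows (including the segment labels) are assumptions for illustration.

```python
import csv
import io

# Hypothetical sample of customer_360.csv rows; real values will differ.
SAMPLE = """customer_id,marketing_consent,personalization_consent,segment,churn_risk_score,retention_priority
C001,true,true,loyal,0.82,high
C002,true,false,vip,0.90,high
C003,false,false,vip,0.10,low
C004,true,true,new,0.20,low
"""


def is_true(value):
    """Treat only the literal string 'true' (any case) as consent."""
    return str(value).strip().lower() == "true"


def activation_candidates(rows):
    """Keep rows where both consent flags are true and attach a short reason."""
    candidates = []
    for row in rows:
        if is_true(row["marketing_consent"]) and is_true(row["personalization_consent"]):
            reason = (
                f"{row['segment']} segment, churn risk {row['churn_risk_score']}, "
                f"{row['retention_priority']} retention priority"
            )
            candidates.append(
                {"customer_id": row["customer_id"], "activation_reason": reason}
            )
    return candidates


candidates = activation_candidates(csv.DictReader(io.StringIO(SAMPLE)))
exported_ids = {c["customer_id"] for c in candidates}
```

Here `C002` and `C003` are deliberately high-value customers without full consent, so asserting that neither appears in `exported_ids` is one way to provide the proof the exercise asks for.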

## Exercise 3 — Strengthen identity resolution evidence

Add one additional identifier to the generated customer data, such as `device_hash` or `crm_contact_id`. Update `customer_identity.csv` so the identity rule records which identifier resolved the profile. Then describe how the same logic would become a graph or entity-resolution service in a large enterprise.

A strong answer distinguishes between deterministic matching, probabilistic matching, and manual stewardship queues.
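
The deterministic part of this exercise could look roughly like the sketch below, which tries identifiers in priority order and records which one matched. The `RULE_ORDER` list, the `resolve` function, the `matched_on` field, and the sample profiles are hypothetical names chosen for the example; probabilistic matching and stewardship queues are out of scope here.

```python
# Strongest identifier first; the ordering is an assumption for this example.
RULE_ORDER = ["crm_contact_id", "device_hash"]


def resolve(record, known_profiles):
    """Return the matched customer_id and the identifier that resolved it.

    Deterministic matching only: an identifier matches when its value is
    exactly equal to the value stored on a known profile.
    """
    for identifier in RULE_ORDER:
        value = record.get(identifier)
        if not value:
            continue  # identifier missing on the incoming record
        for profile in known_profiles:
            if profile.get(identifier) == value:
                return {"customer_id": profile["customer_id"], "matched_on": identifier}
    # Nothing matched: a candidate for probabilistic matching or manual review.
    return {"customer_id": None, "matched_on": "unresolved"}


# Hypothetical known profiles and incoming records.
profiles = [
    {"customer_id": "C001", "crm_contact_id": "crm-17", "device_hash": "d-aa"},
    {"customer_id": "C002", "crm_contact_id": "crm-42", "device_hash": "d-bb"},
]

by_crm = resolve({"crm_contact_id": "crm-42", "device_hash": "d-aa"}, profiles)
by_device = resolve({"crm_contact_id": "", "device_hash": "d-aa"}, profiles)
unknown = resolve({"device_hash": "d-zz"}, profiles)
```

The first record carries conflicting identifiers on purpose: because `crm_contact_id` is tried first, it resolves to `C002`, which shows why the rule order itself belongs in the recorded evidence.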

## Exercise 4 — Add a serving contract check

Extend `scripts/validate_outputs.py` with a check that verifies required serving columns are present in `data/gold/customer_360.csv`. The minimum required columns are `customer_id`, `marketing_consent`, `personalization_consent`, `customer_value_score`, `churn_risk_score`, `segment`, and `retention_priority`.

A strong answer fails validation when a required column is missing and explains why serving contracts prevent accidental breaking changes for downstream applications.
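
A minimal sketch of such a check follows. The required column list is taken from the exercise; the function names `missing_serving_columns` and `check_serving_contract` are hypothetical, and how the check plugs into the existing `scripts/validate_outputs.py` is left open because that script's structure is not shown here.

```python
import csv
import io

REQUIRED_SERVING_COLUMNS = [
    "customer_id",
    "marketing_consent",
    "personalization_consent",
    "customer_value_score",
    "churn_risk_score",
    "segment",
    "retention_priority",
]


def missing_serving_columns(csv_file):
    """Return required columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_SERVING_COLUMNS if col not in present]


def check_serving_contract(csv_file):
    """Raise ValueError when the serving contract is violated."""
    missing = missing_serving_columns(csv_file)
    if missing:
        raise ValueError(f"customer_360 is missing serving columns: {missing}")


# Example with an in-memory header that lacks two required columns.
# Against the real file, one would pass:
#   open("data/gold/customer_360.csv", newline="")
broken = io.StringIO(
    "customer_id,marketing_consent,personalization_consent,"
    "customer_value_score,churn_risk_score\n"
)
missing = missing_serving_columns(broken)
```

Raising an exception, rather than printing a warning, lets the validation script exit with a non-zero status so the failure is visible to whatever runs it.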

## Exercise 5 — Design a production migration plan

Write a one-page plan for migrating the local CSV project to a production stack. Your plan should specify source capture, event transport, stream processing, lakehouse tables, low-latency serving, monitoring, and rollback. It should also define one freshness SLO and one correctness SLO.

A strong answer maps every local artifact to a production component and avoids vague statements such as “use real-time data” without specifying where state is stored and how quality is measured.
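
To make the two SLOs concrete, it can help to write them as checks that could actually be evaluated. The sketch below is one possible formulation; the function names, the five-minute threshold, the target ratio, and the revenue tolerance are illustrative assumptions, not values from the chapter.

```python
from datetime import timedelta


def freshness_slo_met(lags, threshold, target_ratio):
    """True when at least `target_ratio` of profile updates landed within `threshold`.

    `lags` holds the delay between each event's time and the moment the
    updated profile became readable from the serving layer.
    """
    if not lags:
        return False  # no measurements means the SLO cannot be shown to hold
    within = sum(1 for lag in lags if lag <= threshold)
    return within / len(lags) >= target_ratio


def correctness_slo_met(source_total, served_total, tolerance):
    """True when served revenue is within a relative `tolerance` of the source total."""
    return abs(source_total - served_total) <= tolerance * abs(source_total)


# Hypothetical measurements: three fast updates and one slow one.
lags = [timedelta(seconds=s) for s in (30, 45, 60, 400)]

strict = freshness_slo_met(lags, timedelta(minutes=5), target_ratio=0.95)
relaxed = freshness_slo_met(lags, timedelta(minutes=5), target_ratio=0.75)
revenue_ok = correctness_slo_met(source_total=1000.0, served_total=999.5, tolerance=0.001)
```

Stating each SLO as a ratio against a threshold forces the plan to name where the lag is measured and which system of record the served totals are reconciled against.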
