# Chapter 2 Exercises: Data Models, Formats, and Quality

These exercises extend the Chapter 2 guided lab. Work from the repository root after running `shared/notebooks/ch02_formats_quality_lab.ipynb` at least once.

| Level | Exercise | Deliverable |
|---|---|---|
| Basic | Add a `returned` order status to `orders.csv` and update the notebook validation rule so that `returned` passes only because it has been explicitly added to the allowed statuses. | A short note explaining the business meaning of `returned`, plus the updated quality report. |
| Basic | Add one duplicate `order_id` row and confirm the uniqueness check fails. | A screenshot or copied notebook output showing the failing check. |
| Intermediate | Extract `campaign_id` and `campaign_channel` from the event data and design a `dim_campaign` table. | A table definition and one DuckDB query that aggregates revenue by campaign channel. |
| Intermediate | Create a small benchmark that compares CSV and Parquet file sizes after generating 10,000 synthetic order rows. | A Markdown table with file size observations and one paragraph interpreting the result. |
| Advanced | Design a slowly changing `dim_customer` table that preserves customer segment history. | SQL DDL with surrogate key, natural key, effective dates, and current flag. |
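
As a starting point for the two Basic exercises, the sketch below shows one possible shape for an allow-list status check and an `order_id` uniqueness check. It is not the notebook's actual validation code: the `status` column name, the sample rows, every status value other than `returned`, and the helper names (`load_rows`, `check_allowed_statuses`, `check_unique_order_ids`, `quality_report`) are assumptions made for illustration, and it uses only the Python standard library rather than whatever the lab notebook relies on.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for orders.csv; the `status` column name
# and every status value except `returned` are assumptions.
SAMPLE_ORDERS_CSV = """order_id,status
1001,placed
1002,shipped
1003,returned
1003,shipped
"""

# Assumed allow-list; `returned` is added deliberately, not by loosening the rule.
ALLOWED_STATUSES = {"placed", "shipped", "cancelled", "returned"}


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def check_allowed_statuses(rows, allowed):
    """Return the sorted status values that are not in the allow-list."""
    return sorted({row["status"] for row in rows} - set(allowed))


def check_unique_order_ids(rows):
    """Return the sorted order_id values that appear more than once."""
    counts = Counter(row["order_id"] for row in rows)
    return sorted(order_id for order_id, n in counts.items() if n > 1)


def quality_report(rows, allowed):
    """Summarize both checks; a check passes when its violation list is empty."""
    unexpected = check_allowed_statuses(rows, allowed)
    duplicates = check_unique_order_ids(rows)
    return {
        "status_check_passed": not unexpected,
        "unexpected_statuses": unexpected,
        "uniqueness_check_passed": not duplicates,
        "duplicate_order_ids": duplicates,
    }


if __name__ == "__main__":
    report = quality_report(load_rows(SAMPLE_ORDERS_CSV), ALLOWED_STATUSES)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With this sample data the status check passes and the uniqueness check fails on the repeated `order_id` `1003`. Removing `returned` from the assumed allow-list makes the status check fail instead, which is the behavior the first exercise asks you to demonstrate and justify.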

## Submission Guidance

Your answer should separate **modeling choices**, **format choices**, and **quality checks**. A strong solution does more than show that the code runs: it explains why each decision helps downstream users trust and query the data.
