# Chapter 15 Lab: Cost, Performance, and Scalability Benchmark

This lab helps you compare data-layout and query-design choices using a deterministic synthetic workload. The goal is not a universal benchmark score. It is to practice an engineering habit: measure the same workload before and after an optimization, record the trade-offs, and decide whether the change improves the product-level service objective.

## What you will build

You will generate a synthetic TuranMart event dataset, load it into SQLite, run repeatable aggregation queries, and compare a baseline table with an indexed variant. If DuckDB, Spark, or a cloud warehouse is available in your environment, you can reuse the same dataset and benchmark protocol as an extension.
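The build steps above can be sketched as follows. This is a minimal illustration, not the contents of `benchmark_runner.py`: the table names, column names, category values, and the `generate_events` and `build_database` helpers are all assumptions made for the example. The key idea is that a fixed random seed makes the dataset identical on every run, so timing differences come from the layout rather than from the data.

```python
import random
import sqlite3

def generate_events(n_rows, seed=42):
    """Yield hypothetical (event_id, user_id, category, amount_cents) rows."""
    rng = random.Random(seed)  # fixed seed: the same rows on every run
    categories = ["grocery", "electronics", "clothing", "home"]
    for event_id in range(n_rows):
        yield (
            event_id,
            rng.randrange(1000),         # assumed pool of 1000 users
            rng.choice(categories),
            rng.randrange(100, 50000),   # integer cents keep sums exact
        )

def build_database(path, n_rows, seed=42):
    """Load identical data into a baseline table and an indexed variant."""
    conn = sqlite3.connect(path)
    for table in ("events_baseline", "events_indexed"):
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(
            f"CREATE TABLE {table} ("
            "event_id INTEGER PRIMARY KEY, user_id INTEGER, "
            "category TEXT, amount_cents INTEGER)"
        )
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
            generate_events(n_rows, seed),
        )
    # Only the variant table gets the secondary index.
    conn.execute("CREATE INDEX idx_events_category ON events_indexed (category)")
    conn.commit()
    return conn

conn = build_database(":memory:", 1000)
query = (
    "SELECT category, COUNT(*), SUM(amount_cents) "
    "FROM {table} GROUP BY category ORDER BY category"
)
baseline = conn.execute(query.format(table="events_baseline")).fetchall()
indexed = conn.execute(query.format(table="events_indexed")).fetchall()
print(baseline == indexed)  # both layouts should return the same answer
```

Checking that both tables return the same result before timing anything guards against comparing a fast query with a wrong one.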

## Files

| File | Purpose |
|---|---|
| `benchmark_runner.py` | Starter script that generates data and measures baseline versus optimized SQLite queries. |
| `../../solutions/ch15_cost_performance_scalability/optimization_checklist.md` | Completed checklist template for interpreting benchmark results. |

## Run the starter benchmark

From the repository root, run the following command.

```bash
python shared/labs/ch15_cost_performance_scalability/benchmark_runner.py --rows 100000
```

The script writes generated data and timing results to `shared/labs/ch15_cost_performance_scalability/output/`. Re-run the benchmark with several different `--rows` values and, for each run, record the median runtime, p95 runtime, database size, and rows processed per second.
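If you want to compute the recorded metrics yourself, the sketch below shows one way to do it with the standard library. The `time_query` and `summarize` helpers and the tiny `t` table are hypothetical and are not taken from the starter script. The p95 uses `statistics.quantiles` with the inclusive method so that the estimate never exceeds the slowest observed run.

```python
import sqlite3
import statistics
import time

def time_query(conn, sql, repeats=20):
    """Run `sql` `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the full query runs
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations, rows_processed):
    """Reduce raw durations to median, p95, and rows per second."""
    median = statistics.median(durations)
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(durations, n=20, method="inclusive")[-1]
    rows_per_s = rows_processed / median if median > 0 else float("inf")
    return {"median_s": median, "p95_s": p95, "rows_per_s": rows_per_s}

# Usage with a small throwaway table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i % 10, i) for i in range(10000)])
conn.commit()

runs = time_query(conn, "SELECT k, SUM(v) FROM t GROUP BY k", repeats=10)
print(summarize(runs, rows_processed=10000))
```

Reporting the median alongside p95 matters because a single slow run, for example one that hits a cold cache, can distort a mean while leaving the median almost unchanged.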

## Extension tasks

Run the same workload with a larger row count and add a second index. Explain whether the index speeds up the query enough to justify its cost in slower writes and extra storage. Then port the same dataset to Parquet and evaluate it with DuckDB or Spark if those tools are available in your local environment.
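To put numbers on the write and storage cost of an index, one option is to load the same rows into a fresh database file with and without the index and compare insert time and file size. The sketch below assumes a simplified `events` schema and a hypothetical `measure_index_cost` helper; neither comes from the starter script.

```python
import os
import sqlite3
import tempfile
import time

def measure_index_cost(n_rows, index_sql=None):
    """Return (insert_seconds, db_bytes) for a load with or without an index."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE events "
            "(event_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
        )
        if index_sql:
            # Create the index before loading so every insert maintains it.
            conn.execute(index_sql)
        rows = [(i, i % 1000, float(i % 97)) for i in range(n_rows)]
        start = time.perf_counter()
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        conn.commit()
        elapsed = time.perf_counter() - start
        conn.close()
        size = os.path.getsize(path)  # read the size before the directory is removed
    return elapsed, size

plain_s, plain_bytes = measure_index_cost(20000)
indexed_s, indexed_bytes = measure_index_cost(
    20000, "CREATE INDEX idx_events_user ON events (user_id)"
)
print(f"insert time: {plain_s:.4f}s without index, {indexed_s:.4f}s with index")
print(f"file size:   {plain_bytes} bytes without index, {indexed_bytes} bytes with index")
```

A single timing like this is noisy, so repeat each measurement several times and compare medians before drawing a conclusion. The file-size difference is more stable and gives a direct estimate of the storage overhead of the index.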