# Chapter 9 Solution Guide: Streaming Clickstream Windows

The reference solution confirms that a 60-second tumbling window with 20 seconds of allowed lateness accepts 15 of the 16 deterministic events. One event is deliberately too late: `evt-0009`. Its event time falls in the 09:00 window, but it arrives only after the watermark has passed that window's end plus the 20-second allowance, so it is excluded from the window metrics.

| Window | Event Count | Unique Users | Purchases | Revenue | Interpretation |
|---|---:|---:|---:|---:|---|
| `2026-06-02T09:00:00Z` | 4 | 3 | 0 | `0.00` | The first minute captures page and cart activity. The delayed campaign event (`evt-0009`) is excluded under the 20-second lateness policy. |
| `2026-06-02T09:01:00Z` | 5 | 5 | 2 | `125.00` | The second minute contains two purchases and one delayed but still acceptable page view. |
| `2026-06-02T09:02:00Z` | 4 | 4 | 2 | `188.50` | The third minute contains cart and purchase activity. |
| `2026-06-02T09:03:00Z` | 2 | 1 | 0 | `0.00` | The final minute has browsing and cart intent from one user. |
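The four metric columns can be reproduced with a small aggregation over the accepted events. The sketch below is not the lab's `streaming_window_simulator.py`; it assumes hypothetical event fields (`user_id`, `event_type`, `event_time`, `amount`), treats `event_type == "purchase"` as a purchase, and uses made-up sample events rather than the lab dataset.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def window_metrics(events, window_seconds=60):
    """Aggregate already-accepted events into per-window metrics.

    Hypothetical event shape: user_id, event_type, event_time (ISO 8601
    with a trailing Z), and an optional string amount for purchases.
    """
    buckets = defaultdict(lambda: {"event_count": 0, "users": set(),
                                   "purchases": 0, "revenue": Decimal("0.00")})
    for event in events:
        ts = datetime.fromisoformat(event["event_time"].replace("Z", "+00:00"))
        epoch = int(ts.timestamp())
        # Tumbling window: align the event time down to the window boundary.
        start = datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)
        bucket = buckets[start]
        bucket["event_count"] += 1
        bucket["users"].add(event["user_id"])
        if event["event_type"] == "purchase":
            bucket["purchases"] += 1
            # Decimal avoids float rounding in revenue totals.
            bucket["revenue"] += Decimal(event.get("amount", "0"))
    rows = []
    for start in sorted(buckets):
        b = buckets[start]
        rows.append({
            "window_start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "event_count": b["event_count"],
            "unique_users": len(b["users"]),
            "purchases": b["purchases"],
            "revenue": f"{b['revenue']:.2f}",
        })
    return rows


# Made-up sample events, not the lab's clickstream_events.jsonl.
sample = [
    {"user_id": "u1", "event_type": "page_view", "event_time": "2026-06-02T09:01:02Z"},
    {"user_id": "u2", "event_type": "purchase", "event_time": "2026-06-02T09:01:15Z", "amount": "50.00"},
    {"user_id": "u1", "event_type": "purchase", "event_time": "2026-06-02T09:01:40Z", "amount": "75.00"},
    {"user_id": "u3", "event_type": "add_to_cart", "event_time": "2026-06-02T09:02:05Z"},
]

for row in window_metrics(sample):
    print(row)
```

Unique users are counted with a per-window set, so a user who views and buys in the same minute counts once, which is why the unique-user column can be smaller than the event count.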

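The following commands should reproduce these metrics exactly, and the validator should pass against the expected `lateness_20_metrics.csv`.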

```bash
python3 shared/labs/ch09_streaming_clickstream/streaming_window_simulator.py \
  --input shared/labs/ch09_streaming_clickstream/data/clickstream_events.jsonl \
  --window-seconds 60 \
  --allowed-lateness-seconds 20 \
  --output /tmp/ch09_window_metrics.csv \
  --late-output /tmp/ch09_late_events.csv

python3 shared/labs/ch09_streaming_clickstream/validate_outputs.py \
  --actual /tmp/ch09_window_metrics.csv \
  --expected shared/labs/ch09_streaming_clickstream/expected_output/lateness_20_metrics.csv
```

With zero allowed lateness, several out-of-order events are classified as late. This is not a simulator bug; it is the intended lesson. In a production Flink or Kafka Streams job, a tighter watermark or lateness bound (Kafka Streams calls the lateness bound a grace period) finalizes window results sooner, but it also increases the probability that valid delayed events are excluded from the primary metrics. The design review should therefore document both the technical setting and the business meaning of late corrections.
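The trade-off can be seen by classifying the same arrival sequence under two lateness settings. This is a minimal sketch, not the lab simulator: it assumes the watermark is simply the maximum event time seen so far (no extra out-of-orderness bound) and that an event is late once the watermark reaches its window end plus the allowed lateness. The event IDs and times are made up for illustration.

```python
from datetime import datetime


def to_epoch(iso_time: str) -> int:
    """Parse an ISO 8601 UTC timestamp with a trailing Z into epoch seconds."""
    return int(datetime.fromisoformat(iso_time.replace("Z", "+00:00")).timestamp())


def classify(events, window_seconds, allowed_lateness_seconds):
    """Split events, given in arrival order, into accepted and late IDs.

    Assumed rule: watermark = max event time seen so far; an event is late
    if watermark >= its window end + allowed lateness. The real simulator's
    watermark rule may differ.
    """
    watermark = None
    accepted, late = [], []
    for event_id, iso_time in events:
        event_time = to_epoch(iso_time)
        window_end = event_time - event_time % window_seconds + window_seconds
        if watermark is not None and watermark >= window_end + allowed_lateness_seconds:
            late.append(event_id)
        else:
            accepted.append(event_id)
        # Late events carry older timestamps, so they never move the watermark back.
        watermark = event_time if watermark is None else max(watermark, event_time)
    return accepted, late


arrivals = [
    ("a", "2026-06-02T09:00:10Z"),
    ("b", "2026-06-02T09:01:05Z"),
    ("c", "2026-06-02T09:00:50Z"),  # 15 s behind the watermark when it arrives
    ("d", "2026-06-02T09:01:30Z"),
    ("e", "2026-06-02T09:00:40Z"),  # 50 s behind the watermark when it arrives
]

for lateness in (0, 20):
    print(lateness, classify(arrivals, 60, lateness))
# 0 (['a', 'b', 'd'], ['c', 'e'])
# 20 (['a', 'b', 'c', 'd'], ['e'])
```

With zero lateness, event `c` is dropped even though it is only 15 seconds out of order; a 20-second allowance rescues it, while `e` remains late under both settings.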

## Replay runbook answer

A safe replay should read the immutable raw input, deploy the revised windowing logic to a new output namespace such as `clickstream_metrics_v2`, compare row counts and revenue totals with the previous version, and only then switch dashboards. The team should avoid overwriting the old output in place because stakeholders need a clear audit trail for why real-time numbers changed.
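The comparison step of the runbook might look like the following sketch, which diffs an old metrics version against a replayed one before anyone switches dashboards. The row shape (`window_start`, `revenue` as a string) and the `compare_versions` helper are hypothetical, and the two versions are made-up values rather than real lab output.

```python
from decimal import Decimal


def compare_versions(old_rows, new_rows):
    """Summarize row-count and revenue differences between two metric versions.

    Hypothetical row shape: {"window_start": str, "revenue": str}.
    """
    old_revenue = sum((Decimal(r["revenue"]) for r in old_rows), Decimal("0"))
    new_revenue = sum((Decimal(r["revenue"]) for r in new_rows), Decimal("0"))
    old_windows = {r["window_start"] for r in old_rows}
    new_windows = {r["window_start"] for r in new_rows}
    return {
        "row_count_delta": len(new_rows) - len(old_rows),
        "revenue_delta": new_revenue - old_revenue,
        "windows_only_in_old": sorted(old_windows - new_windows),
        "windows_only_in_new": sorted(new_windows - old_windows),
    }


# Made-up versions: v1 is the existing output, v2 the replay in a new namespace.
v1 = [
    {"window_start": "2026-06-02T09:00:00Z", "revenue": "0.00"},
    {"window_start": "2026-06-02T09:01:00Z", "revenue": "125.00"},
]
v2 = [
    {"window_start": "2026-06-02T09:00:00Z", "revenue": "0.00"},
    {"window_start": "2026-06-02T09:01:00Z", "revenue": "125.00"},
    {"window_start": "2026-06-02T09:02:00Z", "revenue": "40.00"},
]

summary = compare_versions(v1, v2)
print(summary)
```

Because both versions stay side by side, the summary itself becomes part of the audit trail: it records which windows appeared or disappeared and how much revenue moved before the dashboards were switched.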
