Opening Scenario: TuranMart Needs Trusted Analytics, Not Just Files
At the end of Chapter 6, TuranMart had a disciplined object-storage lake. Raw order events landed in Bronze, validated records moved to Silver, and daily revenue aggregates appeared in Gold. The storage design was a major improvement over scattered CSV exports, but the business still had a familiar problem: every team interpreted revenue differently. Finance wanted booked revenue by fiscal date, marketplace operations wanted seller performance by channel, product managers wanted category trends, and the customer team wanted to measure cohort behavior after address and segment changes.
The platform team realizes that a lake is not the same thing as an analytics product. A lake stores evidence. A data warehouse organizes facts and dimensions so business users can query trusted metrics quickly. A lakehouse adds warehouse-like table management, transactions, schema evolution, and time travel on top of open files in object storage. TuranMart now needs both: a dimensional model for stable business reporting and a lakehouse foundation that can keep open data usable across Spark, SQL engines, notebooks, and BI tools.
Figure 1: TuranMart’s analytics reference architecture connects lake zones, warehouse marts, lakehouse table formats, governance, and BI consumption.
Learning Objectives
By the end of this chapter, you should be able to:
Explain the architectural differences between data warehouses, data lakes, and lakehouses.
Describe why cloud warehouses separate storage, compute, and services for elastic analytics.
Design a dimensional model by declaring a business process, grain, dimensions, and additive facts.
Implement the core idea of a Slowly Changing Dimension Type 2 table for historical analysis.
Compare Delta Lake, Apache Iceberg, and Apache Hudi as open lakehouse table formats.
Build a deterministic local warehouse mart from lake-style source files and validate the resulting facts, dimensions, and SCD2 history.
7.1 From Data Lake to Analytical Platform
A data lake answers the question, “Where can we land many kinds of analytical data cheaply and durably?” A data warehouse answers a different question: “How do we publish trusted, governed, query-efficient tables for repeatable business analysis?” A lakehouse tries to reduce the operational gap between these two worlds by bringing transaction logs, metadata, schema management, and table-level operations to open files on object storage.
A data warehouse is an analytical system optimized for integrated, governed, high-performance reporting and decision support. A lakehouse is an architecture that combines lake storage with warehouse-style table semantics such as ACID transactions, schema evolution, and time travel.
The distinction matters because a platform can have excellent storage and still produce poor analytics. If source files have ambiguous keys, changing customer attributes overwrite history, and metric definitions live only in scripts, analysts will not trust the platform. Conversely, a carefully designed warehouse or lakehouse can make the same raw data usable by finance, operations, machine learning, and product teams.
| Platform pattern | Primary storage model | Primary modeling style | Main consumers | Typical strength | Typical risk |
|---|---|---|---|---|---|
| Data warehouse | Curated managed tables | Dimensional models, marts, semantic views | BI, finance, operations, executives | Fast trusted analytics with clear governance | Less flexible for raw, semi-structured, or exploratory workloads |
| Data lake | Open files on object storage | Zones, prefixes, file formats, catalogs | Data engineers, data scientists, batch jobs | Cheap landing zone for diverse data and ML exploration | Can become a swamp without quality gates and ownership |
| Lakehouse | Open files plus table metadata | Medallion tables, transactional open formats, marts | BI, notebooks, ML, multiple query engines | One copy of data can serve many analytical engines | Metadata, compaction, access control, and interoperability require discipline |
Cloud warehouses changed the economics of analytics by decoupling infrastructure concerns that were historically tied together. Snowflake describes a cloud architecture in which persistent storage is shared across compute while independent virtual warehouses process queries without sharing compute resources with each other.[1] BigQuery similarly presents a fully managed serverless data platform whose storage and compute layers operate independently, with columnar storage optimized for analytical queries.[2] Amazon Redshift Serverless automatically provisions and scales warehouse capacity for variable workloads, while Alibaba Cloud MaxCompute provides a serverless enterprise data warehouse for large-scale analysis.[3] [4]
These services differ in implementation, pricing, and ecosystem, but they share a common direction. Analytical platforms increasingly separate data, compute, metadata, and governance so each can scale and evolve independently. That principle will reappear throughout the chapter.
7.2 Modern Cloud Data Warehouses
A traditional on-premises data warehouse was often a specialized appliance: expensive, powerful, and governed by capacity planning cycles. Modern cloud warehouses replaced much of that rigidity with managed services, elastic compute, columnar storage, and workload isolation. The result is not merely “the same warehouse on someone else’s servers.” It is a different operating model for analytics engineering.
The first innovation is separation of storage and compute. Storage can grow as data grows, while compute clusters or serverless slots scale according to workload. This matters for TuranMart because month-end finance reporting, campaign-performance dashboards, and ad hoc fraud investigations do not all need the same compute resources at the same time. The second innovation is columnar analytical storage. Warehouse queries usually scan a few columns across many rows, so column-oriented layouts and compression reduce I/O compared with row-oriented transaction storage. The third innovation is massively parallel processing. Large scans, joins, and aggregations are split across many workers, allowing interactive analytics over datasets that would overwhelm a single server.
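The I/O argument for columnar storage can be illustrated with a toy comparison. This is a minimal sketch, assuming plain Python lists as a stand-in for storage; the order lines and the "values touched" counter are hypothetical, and real engines add compression, encoding, and vectorized execution on top of the layout.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The order lines below are made-up illustration data.
rows = [
    {"order_id": "O-1", "product": "kettle", "region": "Tashkent", "net_revenue": 120.0},
    {"order_id": "O-2", "product": "lamp", "region": "Samarkand", "net_revenue": 80.0},
    {"order_id": "O-3", "product": "kettle", "region": "Bukhara", "net_revenue": 45.5},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_revenue_row_store(table):
    """Sum revenue while counting every field, as if whole records were read."""
    touched = 0
    total = 0.0
    for record in table:
        touched += len(record)  # the whole record comes off storage
        total += record["net_revenue"]
    return total, touched


def sum_revenue_column_store(table):
    """Sum revenue by reading only the single column the query needs."""
    values = table["net_revenue"]
    return sum(values), len(values)


row_total, row_touched = sum_revenue_row_store(rows)
col_total, col_touched = sum_revenue_column_store(columns)
print(row_total, row_touched)  # same total, 12 values touched
print(col_total, col_touched)  # same total, 3 values touched
```

Both layouts return the same answer; the difference is how much data must be read to produce it, which grows with the width of the table.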
| Capability | Why it matters in production | Example design implication |
|---|---|---|
| Managed infrastructure | Engineers spend less time patching clusters and more time modeling data products. | Platform teams define cost, security, and workload policies instead of managing disks and daemons. |
| Independent compute | Teams can isolate BI, ELT, and data science workloads. | Finance dashboards use a stable warehouse; backfills use a temporary larger warehouse or batch pool. |
| Columnar storage | Analytical scans read only the columns needed for a query. | Wide fact tables can remain queryable when partitioning, clustering, and statistics are maintained. |
| Workload governance | Access, cost, and concurrency can be managed by team, role, or workload. | Development, production, and sensitive-data marts receive separate permissions and quotas. |
| SQL ecosystem | BI tools, analysts, and governance systems can use familiar interfaces. | A dimensional model can be exposed through SQL views and semantic layers. |
For TuranMart, the platform decision is not “warehouse or lake.” The lake remains the system of analytical evidence and replay. The warehouse or warehouse-style mart is the product interface for business questions. In many organizations, this interface is built in Snowflake, BigQuery, Redshift, MaxCompute, Hologres, or another managed warehouse. In a lakehouse design, the same logical interface may also be built on open table formats and queried by engines such as Spark, Trino, Flink, or managed cloud services.
Alibaba Cloud’s analytics stack illustrates the hybrid direction. MaxCompute is positioned as a serverless enterprise cloud data warehouse, while Hologres provides real-time warehousing, PostgreSQL compatibility, MPP execution, federated query, and acceleration paths for MaxCompute and lake data.[4] [5] The architectural lesson is broader than any single vendor: modern analytics platforms combine batch storage, real-time serving, metadata management, and SQL access rather than forcing all workloads into one engine.
7.3 Dimensional Modeling: Turning Events into Business Questions
Dimensional modeling is the practical craft of making analytical data understandable. It organizes measurements into fact tables and descriptive context into dimension tables. Kimball Group guidance emphasizes that fact tables are the foundation of the warehouse and that the first and most important design step is declaring the grain: the business definition of what one fact row represents.[6]
A dimensional model begins with the business process, not with the source system. TuranMart’s source application may store orders, payments, inventory reservations, refunds, and shipment updates in operational schemas optimized for transactions. The analytical model should instead ask: what event are we measuring, at what level of detail, and which dimensions should describe it?
Figure 2: A TuranMart retail star schema places order-line revenue facts at the center and surrounds them with date, product, customer, channel, and region dimensions.
The four-step design process is simple, but each step must be explicit.
| Step | Design question | TuranMart answer |
|---|---|---|
| Choose the business process | What activity are we measuring? | Marketplace order-line sales. |
| Declare the grain | What does one row in the fact table mean? | One row per order line after validation and currency normalization. |
| Identify dimensions | What context describes the event? | Date, product, customer, sales channel, region, and seller. |
| Identify facts | What numeric measurements are additive at the declared grain? | Quantity, gross amount, discount amount, net revenue, tax amount, and cost amount. |
The grain decision protects the entire model. If the fact table mixes one row per order, one row per order line, and one row per shipment, metrics become ambiguous. If revenue is stored at order level but product category is attached at line level, category revenue queries will double-count or misallocate values. If customer attributes are overwritten without history, past cohort reports will change when a customer moves to a different region.
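One way to make the declared grain enforceable is to express it as a key constraint. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a dependency-free stand-in for a warehouse; the column names order_id and line_number and the sample values are hypothetical, not taken from the lab files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_id  TEXT NOT NULL,
        category    TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        order_id     TEXT NOT NULL,
        line_number  INTEGER NOT NULL,
        product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity     INTEGER NOT NULL,
        net_revenue  REAL NOT NULL,
        -- The declared grain: exactly one row per order line.
        PRIMARY KEY (order_id, line_number)
    );
    """
)
conn.execute("INSERT INTO dim_product VALUES (1, 'P-10', 'Kitchen')")
conn.execute("INSERT INTO fact_sales VALUES ('O-1', 1, 1, 2, 90.0)")

# A second row for the same order line violates the grain and is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES ('O-1', 1, 1, 2, 90.0)")
    grain_violation_rejected = False
except sqlite3.IntegrityError:
    grain_violation_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(grain_violation_rejected, fact_rows)
```

Many cloud warehouses treat primary keys as informational rather than enforced, so in practice the same rule is often checked by a data test that counts duplicate (order_id, line_number) pairs.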
A strong warehouse model distinguishes additive, semi-additive, and non-additive facts. Revenue and quantity are usually additive across dates, products, and customers. Inventory balance is semi-additive because it can be summed across products or stores at a point in time but not across dates without changing the meaning. Ratios such as conversion rate and average order value are non-additive; they should usually be computed from numerator and denominator facts rather than stored as standalone facts.
| Fact type | Example | Safe aggregation | Common mistake |
|---|---|---|---|
| Additive | net_revenue, quantity | Sum across most dimensions. | Storing duplicate facts at multiple grains. |
| Semi-additive | inventory_on_hand, account_balance | Sum across entities at a point in time; across time periods, use a closing balance or an average instead of a sum. | Summing daily balances to report monthly inventory. |
| Non-additive | margin_percent, conversion_rate | Recompute from additive components. | Averaging percentages without weighting. |
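The three aggregation rules in the table can be checked with a few lines of arithmetic. This is an illustrative sketch with made-up daily figures; reporting the closing balance is one reasonable treatment of a semi-additive fact, not the only one.

```python
# Hypothetical daily figures for one product.
daily = [
    {"date": "2026-05-28", "visits": 1000, "orders": 10, "inventory_on_hand": 50},
    {"date": "2026-05-29", "visits": 100, "orders": 10, "inventory_on_hand": 40},
]

# Additive: orders can be summed across dates.
total_orders = sum(d["orders"] for d in daily)

# Non-additive: recompute the ratio from its additive components.
total_visits = sum(d["visits"] for d in daily)
conversion_rate = total_orders / total_visits

# Common mistake: averaging daily percentages without weighting.
naive_rate = sum(d["orders"] / d["visits"] for d in daily) / len(daily)

# Semi-additive: do not sum balances across time; take the closing balance.
closing_inventory = max(daily, key=lambda d: d["date"])["inventory_on_hand"]
summed_inventory = sum(d["inventory_on_hand"] for d in daily)  # misleading figure

print(f"weighted rate: {conversion_rate:.4f}, naive average: {naive_rate:.4f}")
print(f"closing inventory: {closing_inventory}, summed: {summed_inventory}")
```

The unweighted average reports a rate about three times higher than the true rate because the low-traffic day counts as much as the high-traffic day.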
7.4 Slowly Changing Dimensions and Historical Truth
Operational systems usually store the current state of an entity. Analytical systems often need to know what was true at the time an event happened. If a customer moves from Samarkand to Tashkent, should last month’s revenue move with the customer? Finance may want revenue by the customer region at purchase time. Marketing may want the current region for campaign targeting. Both are valid questions, but they require explicit modeling.
A Slowly Changing Dimension Type 2 design preserves history by inserting a new dimension row when tracked attributes change. The old row is closed with an end date, and the new row becomes current. The fact table references the surrogate key that was valid when the event occurred. dbt documentation describes snapshots as a mechanism for recording changes in mutable source tables over time and explicitly connects this pattern to Type 2 Slowly Changing Dimensions.[7]
Figure 3: SCD Type 2 keeps historical customer versions by closing the previous row and inserting a new current row when tracked attributes change.
| SCD type | Behavior | When to use | Main limitation |
|---|---|---|---|
| Type 0 | Preserve original value and never change it. | Immutable attributes such as original signup date. | Cannot represent corrected or evolving attributes. |
| Type 1 | Overwrite the old value. | Error correction or attributes where history is irrelevant. | Historical reports may change after updates. |
| Type 2 | Add a new row with validity dates and a current flag. | Customer segment, region, loyalty tier, seller status, product category. | More joins, more rows, and careful effective-date logic. |
| Type 3 | Store limited previous value in another column. | Simple “current versus previous” analysis. | Tracks only a small fixed history window. |
The SCD2 pattern introduces two important keys. The business key identifies the real-world entity, such as customer_id. The surrogate key identifies a specific historical version, such as customer_key. Facts should join to the surrogate key, not only to the business key, when historical reporting matters.
For TuranMart, the customer dimension will track customer_name, region, and loyalty_tier. If customer C-1002 moves from Samarkand to Tashkent on 2026-05-30, the warehouse inserts a new customer dimension version effective on that date. Orders from 2026-05-29 still point to the Samarkand version. Orders from 2026-05-30 onward point to the Tashkent version. This design is slightly more complex than overwriting a customer row, but it prevents analytical time travel from becoming fiction.
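The close-and-insert step can be sketched in plain Python. This is not the lab's starter code; it is a minimal illustration that assumes half-open validity windows (valid_from inclusive, valid_to exclusive), a 9999-12-31 sentinel for the current row, and invented values for the customer name, loyalty tier, and initial start date.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still current"
TRACKED = ("customer_name", "region", "loyalty_tier")


def apply_scd2_change(dim_rows, customer_id, new_attrs, effective):
    """Close the current version of customer_id and append a new version."""
    next_key = max((r["customer_key"] for r in dim_rows), default=0) + 1
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if all(row[k] == v for k, v in new_attrs.items()):
                return dim_rows  # no tracked attribute changed
            base = {k: row[k] for k in TRACKED}
            row["valid_to"] = effective  # close the previous version
            row["is_current"] = False
            break
    else:
        base = dict.fromkeys(TRACKED)  # first version of a new customer
    dim_rows.append({
        "customer_key": next_key, "customer_id": customer_id,
        **base, **new_attrs,
        "valid_from": effective, "valid_to": OPEN_END, "is_current": True,
    })
    return dim_rows


def customer_key_on(dim_rows, customer_id, event_date):
    """Return the surrogate key of the version valid on event_date."""
    for row in dim_rows:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= event_date < row["valid_to"]):
            return row["customer_key"]
    return None


dim_customer = [{
    "customer_key": 1, "customer_id": "C-1002", "customer_name": "Example Customer",
    "region": "Samarkand", "loyalty_tier": "silver",
    "valid_from": date(2026, 1, 1), "valid_to": OPEN_END, "is_current": True,
}]
apply_scd2_change(dim_customer, "C-1002", {"region": "Tashkent"}, date(2026, 5, 30))
key_before = customer_key_on(dim_customer, "C-1002", date(2026, 5, 29))
key_after = customer_key_on(dim_customer, "C-1002", date(2026, 5, 30))
print(key_before, key_after)
```

Half-open windows mean the old row's valid_to equals the new row's valid_from, so every event date matches exactly one version and no gap or overlap appears at the boundary.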
7.5 Lakehouse Architecture: Warehouse Semantics on Open Data
A two-tier lake-and-warehouse architecture often copies data from object storage into a warehouse for BI. This can work well, especially when the warehouse provides governance, performance, and a familiar SQL experience. It also introduces duplication, latency, and reconciliation work. The lakehouse pattern tries to reduce those costs by adding warehouse-grade table semantics directly to open files in the lake.
Figure 4: Traditional warehouse and lakehouse designs differ in where table metadata, transactions, and query engines operate, but both need governance and business modeling.
The critical addition is a transactional metadata layer. Plain Parquet files in object storage do not automatically provide atomic commits, schema evolution, row-level updates, or time travel. A lakehouse table format records which files belong to a table snapshot, which schema is valid, which partitions exist, and how readers should find consistent data. Query engines can then treat open files more like reliable database tables.
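The idea of a snapshot that names the files belonging to a table can be shown with a toy in-memory model. This sketch is an assumption-laden teaching aid, not how Delta Lake, Iceberg, or Hudi are implemented: the class ToyTable, its methods, and the file names are hypothetical, the "files" hold Python tuples rather than Parquet data, and there is no concurrency control.

```python
class ToyTable:
    """In-memory stand-in for a table format's metadata layer."""

    def __init__(self):
        self.files = {}        # file name -> list of rows (the "data files")
        self.snapshots = [[]]  # snapshot id -> file names; snapshot 0 is empty

    def commit(self, new_files, remove=()):
        """Write data files first, then publish one new snapshot."""
        for name, rows in new_files.items():
            self.files[name] = rows  # not visible to readers yet
        current = [f for f in self.snapshots[-1] if f not in remove]
        self.snapshots.append(current + list(new_files))  # single commit point
        return len(self.snapshots) - 1

    def read(self, snapshot_id=None):
        """Read one consistent snapshot; older ids act as time travel."""
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1
        return [row
                for name in self.snapshots[snapshot_id]
                for row in self.files[name]]


table = ToyTable()
s1 = table.commit({"part-001.parquet": [("O-1", 90.0)]})
# A row-level update is modeled as replacing one file with another.
s2 = table.commit({"part-002.parquet": [("O-1", 95.0)]}, remove=("part-001.parquet",))
latest = table.read()
as_of_s1 = table.read(s1)
print(latest, as_of_s1)
```

Readers only ever follow a snapshot's file list, so a half-written file is invisible until the snapshot that references it is published, and the old file stays readable for time travel until retention removes it.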
Lakehouse architecture does not remove the need for modeling. A poorly named Iceberg or Delta table is still a poor data product. A Hudi table with unclear ownership can still become a swamp. The lakehouse gives engineers table mechanics; dimensional design, governance, testing, and metric definitions still turn those mechanics into business value.
| Lakehouse capability | Problem solved | Engineering responsibility |
|---|---|---|
| ACID-style commits | Readers should not see partial writes during pipeline updates. | Use table-format writers correctly and avoid unsafe direct file modification. |
| Schema enforcement and evolution | Producers change fields over time. | Define compatible changes, review breaking changes, and test downstream consumers. |
| Time travel and snapshots | Analysts need reproducibility and rollback. | Retain metadata and files according to audit, cost, and compliance policy. |
| Compaction and clustering | Small files and poor layout slow queries. | Schedule maintenance and monitor file sizes, partitions, and query plans. |
| Multi-engine access | Different teams prefer different tools. | Standardize catalogs, permissions, version compatibility, and supported engines. |
7.6 Delta Lake, Apache Iceberg, and Apache Hudi
Delta Lake, Apache Iceberg, and Apache Hudi are the three table-format families most engineers encounter when building lakehouses. They overlap in purpose but differ in design center. Delta Lake documentation highlights batch and streaming reads and writes, updates, deletes, merges, change data feed, constraints, concurrency control, table optimization, and connectors across several engines.[8] Apache Iceberg defines itself as an open table format for huge analytic datasets and emphasizes schema evolution, hidden partitioning, time travel, rollback, metadata pruning, serializable isolation, and optimistic concurrency.[9] Apache Hudi positions itself as an open data lakehouse platform purpose-built for high-performance writes on incremental data pipelines, with transactions, upserts, deletes, indexes, compaction, clustering, and incremental processing.[10]
Figure 5: Lakehouse table formats maintain metadata that maps table snapshots to schemas, manifests, transaction logs, and data files on object storage.
| Format | Design center | Strong fit | Watch carefully |
|---|---|---|---|
| Delta Lake | Transaction log and operational simplicity, especially in Spark-centered environments. | Teams using Spark heavily, Databricks ecosystems, merge-heavy ELT, time travel, and change data feed. | Cross-engine behavior depends on connector maturity and table feature compatibility. |
| Apache Iceberg | Open table specification with rich metadata, hidden partitioning, and multi-engine interoperability. | Multi-engine lakehouses using Spark, Flink, Trino, Presto, Hive, Impala, or managed services. | Catalog choice, maintenance procedures, and write conflict patterns must be operationalized. |
| Apache Hudi | Incremental pipelines, upserts, deletes, indexes, and streaming-friendly lake ingestion. | Near-real-time ingestion, CDC pipelines, record-level updates, and incremental consumption. | Table type, compaction schedule, indexing strategy, and query mode affect complexity. |
The right choice depends on workload patterns rather than slogans. A Spark-first team that wants straightforward merges may start with Delta. A platform team standardizing tables across many engines may prefer Iceberg. A team ingesting frequent CDC updates into large lake tables may choose Hudi. Large organizations may use more than one format, but every additional format increases support, governance, and training work.
For a first production lakehouse, TuranMart should select one default table format, one catalog strategy, one maintenance policy, and one set of supported engines. The team should document which workloads require merges, which require append-only history, which require streaming ingestion, and which can remain ordinary Parquet files. Not every dataset needs the complexity of a transactional table format.
7.7 Design Pattern: TuranMart’s Analytics Mart and Lakehouse Path
TuranMart decides to publish a sales mart for trusted reporting while preparing its lake for future table-format adoption. The design has three layers. First, Bronze and Silver remain in object storage as described in Chapter 6. Second, a warehouse mart creates dim_date, dim_product, dim_customer, and fact_sales tables from curated source files. Third, a lakehouse path gradually promotes selected high-change Silver datasets into transactional tables when schema evolution, upserts, or multi-engine reads justify the operational cost.
Figure 6: The guided lab creates a small dimensional mart with SCD2 customer history and a fact table at order-line grain.
| Data product | Grain | Key users | Quality gate | Future lakehouse trigger |
|---|---|---|---|---|
| dim_date | One row per calendar date | All analytical marts | Complete calendar attributes for fact dates. | Usually remains a simple managed dimension. |
| dim_product | One row per current product | Category managers and finance | Valid product category and list price. | Add SCD2 if product category history becomes report-critical. |
| dim_customer | One row per customer version | Marketing, CRM, finance | Non-overlapping validity windows per customer. | Promote if CDC updates arrive continuously. |
| fact_sales | One row per validated order line | BI, finance, operations | Every row joins to valid date, product, and customer keys. | Promote if late-arriving facts and merges become frequent. |
This design teaches a durable principle: use the simplest reliable table mechanism that satisfies the business requirement. A small static dimension may not need a lakehouse table format. A high-volume CDC customer profile table may need one early. A revenue fact table may start as a warehouse table and later become an Iceberg, Delta, or Hudi table if multiple engines need consistent access to the same open data.
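Two of the quality gates in the table above, non-overlapping validity windows and complete fact-to-dimension joins, can be sketched as small checks. The function names and sample rows here are hypothetical and are not the lab's validator; windows are assumed to be half-open.

```python
from datetime import date
from itertools import groupby

OPEN_END = date(9999, 12, 31)  # assumed sentinel for the current version


def customers_with_overlaps(dim_rows):
    """Return customer_ids whose half-open validity windows overlap."""
    bad = []
    ordered = sorted(dim_rows, key=lambda r: (r["customer_id"], r["valid_from"]))
    for customer_id, group in groupby(ordered, key=lambda r: r["customer_id"]):
        versions = list(group)
        for earlier, later in zip(versions, versions[1:]):
            if later["valid_from"] < earlier["valid_to"]:
                bad.append(customer_id)
                break
    return bad


def orphan_facts(fact_rows, dim_rows):
    """Return fact rows whose customer_key has no matching dimension row."""
    valid_keys = {r["customer_key"] for r in dim_rows}
    return [f for f in fact_rows if f["customer_key"] not in valid_keys]


dim_customer = [
    {"customer_key": 1, "customer_id": "C-1002",
     "valid_from": date(2026, 1, 1), "valid_to": date(2026, 5, 30)},
    {"customer_key": 2, "customer_id": "C-1002",
     "valid_from": date(2026, 5, 30), "valid_to": OPEN_END},
]
fact_sales = [
    {"order_id": "O-7", "customer_key": 1},
    {"order_id": "O-9", "customer_key": 2},
    {"order_id": "O-10", "customer_key": 99},  # deliberately broken reference
]

overlaps = customers_with_overlaps(dim_customer)
orphans = orphan_facts(fact_sales, dim_customer)
print(overlaps, orphans)
```

Checks like these are cheap to run after every load, and a failing gate is a reason to block publication rather than to patch the dashboard.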
7.8 Guided Lab: Build a Dimensional Sales Mart with SCD2 History
7.8.1 Lab Goal
In this lab, you will build a local TuranMart analytical mart from deterministic source files. The starter pipeline reads order lines, product records, and customer snapshots; creates warehouse-style dimensions; applies SCD Type 2 logic to customer history; writes fact rows at order-line grain; and validates the final metrics. The lab uses DuckDB so readers can practice SQL analytics without provisioning a cloud warehouse.
7.8.2 Lab Materials
| Lab material | Purpose | Link |
|---|---|---|
| Lab README | Step-by-step instructions, model explanation, cleanup, and troubleshooting. | README |
| Requirements | Python packages for DuckDB, Pandas, and deterministic validation. | requirements.txt |
| Starter pipeline | Builds dimensions, SCD2 customer history, fact table, and summary exports. | starter.py |
| Validator | Checks row counts, surrogate keys, SCD2 windows, revenue totals, and report outputs. | tests |
| Source data | Order lines, product records, and customer snapshots. | data |
| Expected output | Reference metrics and table manifest. | expected_output |
| Solution guide | Explanation of design choices and expected results. | solution |
7.8.3 Quick Start
From the repository root, run the lab locally.
cd shared/labs/ch07_data_warehouse_lakehouse
python3 -m pip install -r requirements.txt
python3 starter.py --warehouse-root .warehouse/turanmart
python3 tests/validate_lab.py --warehouse-root .warehouse/turanmart
The starter writes a DuckDB database file and CSV exports so you can inspect both the relational tables and the published mart outputs. The design is intentionally small, but the same modeling steps apply to cloud warehouses and lakehouse tables.
7.8.4 Expected Output
A successful run prints a compact validation summary. The exact file paths can differ by operating system, but the table counts and revenue totals should match.
DIM_DATE rows: 3
DIM_PRODUCT rows: 4
DIM_CUSTOMER rows: 5
FACT_SALES rows: 6
CUSTOMER C-1002 versions: 2
TOTAL net_revenue: 575.50
VALIDATION PASSED
| order_date | order_count | quantity | net_revenue |
|---|---|---|---|
| 2026-05-28 | 2 | 3 | 203.50 |
| 2026-05-29 | 2 | 2 | 192.00 |
| 2026-05-30 | 2 | 5 | 180.00 |
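The daily rows above should reconcile with the printed TOTAL net_revenue. A short check, using Decimal to avoid floating-point rounding in currency sums, might look like the following; the tuple layout is only an assumption made for this illustration.

```python
from decimal import Decimal

# (order_date, order_count, quantity, net_revenue) copied from the table above.
daily_report = [
    ("2026-05-28", 2, 3, Decimal("203.50")),
    ("2026-05-29", 2, 2, Decimal("192.00")),
    ("2026-05-30", 2, 5, Decimal("180.00")),
]

total_net_revenue = sum((row[3] for row in daily_report), Decimal("0"))
total_quantity = sum(row[2] for row in daily_report)

print(f"TOTAL net_revenue: {total_net_revenue}")  # 575.50
print(f"TOTAL quantity: {total_quantity}")        # 10
```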
7.8.5 Completion Checklist
You have completed the lab when the validator prints VALIDATION PASSED, the fact_sales table contains six order-line rows, customer C-1002 has two non-overlapping SCD2 versions, the 2026-05-29 sale joins to the Samarkand customer version, and the 2026-05-30 sale joins to the Tashkent customer version. You should also be able to explain why the fact table uses surrogate keys rather than only business keys.
7.8.6 Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| ModuleNotFoundError: duckdb | Requirements were not installed in the active Python environment. | Run python3 -m pip install -r requirements.txt from the lab folder. |
| Validator reports wrong revenue | The output database is stale or source data was edited. | Delete .warehouse/ and rerun starter.py before validating. |
| Customer version count is wrong | SCD2 logic overwrote customer rows instead of inserting new versions. | Compare validity windows for C-1002 against the expected output. |
| Fact rows fail to join to dimensions | The fact loader joined by business key only or ignored effective dates. | Join facts to dimensions using the event date and the dimension validity range. |
| Daily report has duplicate revenue | The fact table grain was changed or customer history joined many-to-many. | Enforce one row per order line and one valid customer version per order date. |
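The last two rows of the table describe the same fix: join on the business key and the validity range together. A minimal sketch follows, assuming Python's built-in sqlite3 module instead of the lab's DuckDB so that it runs without extra packages; the staging table stg_order_lines, the order ids, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_id  TEXT, region TEXT,
        valid_from   TEXT, valid_to TEXT   -- ISO dates, half-open window
    );
    CREATE TABLE stg_order_lines (
        order_id TEXT, line_number INTEGER,
        customer_id TEXT, order_date TEXT, net_revenue REAL
    );
    INSERT INTO dim_customer VALUES
        (1, 'C-1002', 'Samarkand', '2026-01-01', '2026-05-30'),
        (2, 'C-1002', 'Tashkent',  '2026-05-30', '9999-12-31');
    INSERT INTO stg_order_lines VALUES
        ('O-7', 1, 'C-1002', '2026-05-29', 100.0),
        ('O-9', 1, 'C-1002', '2026-05-30', 60.0);
    """
)

# Business-key-only join: every order line matches every customer version.
wrong_row_count = conn.execute(
    "SELECT COUNT(*) FROM stg_order_lines s "
    "JOIN dim_customer d ON d.customer_id = s.customer_id"
).fetchone()[0]

# Point-in-time join: the event date must fall inside the validity window.
point_in_time = conn.execute(
    """
    SELECT s.order_id, d.customer_key, d.region
    FROM stg_order_lines s
    JOIN dim_customer d
      ON d.customer_id = s.customer_id
     AND s.order_date >= d.valid_from
     AND s.order_date <  d.valid_to
    ORDER BY s.order_id
    """
).fetchall()

print(wrong_row_count)  # 4 rows from 2 order lines: duplicated revenue
print(point_in_time)
```

Because the join uses the order date rather than the load date, the same query also handles a late-arriving order correctly.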
7.8.7 Cleanup
rm -rf .warehouse
Common Pitfalls
Warehouses and lakehouses fail when teams focus on engines before semantics. A fast query engine cannot repair an ambiguous grain. A table format cannot define revenue. A transaction log cannot decide whether customer region should be historical or current. The engineering work is both technical and semantic.
| Pitfall | Why it hurts | Better practice |
|---|---|---|
| Starting with source tables instead of business processes | Operational schemas leak implementation details into analytics. | Model facts around measurable business events and declared grain. |
| Mixing grains in one fact table | Aggregations double-count or produce inconsistent metrics. | Keep one fact table at one grain and create separate facts for different processes. |
| Overwriting dimensions that need history | Past reports change when current attributes change. | Use SCD2 for attributes that affect historical analysis. |
| Publishing marts without data tests | Broken joins and duplicate keys silently corrupt dashboards. | Validate primary keys, foreign keys, row counts, freshness, and metric totals. |
| Choosing a lakehouse format only by popularity | Operational needs may not match the table format’s strengths. | Select based on engines, update patterns, CDC needs, governance, and team skill. |
| Ignoring table maintenance | Small files, stale metadata, and poor clustering degrade performance. | Schedule compaction, statistics, vacuum/retention review, and metadata monitoring. |
Mini-Capstone: Analytics Architecture Decision Record for TuranMart
Write a two-page architecture decision record for TuranMart’s first production analytics mart. Your record should define the business process, fact grain, dimensions, SCD policy, warehouse or lakehouse serving engine, table ownership, quality tests, access model, and future migration trigger for open table formats. The decision should be concrete enough that another engineer could build the first mart and know which trade-offs were intentional.
| Decision area | Minimum answer required |
|---|---|
| Business process and grain | The measured event and one-row meaning for the primary fact table. |
| Dimension strategy | Required dimensions, surrogate keys, conformed dimensions, and SCD policy. |
| Serving architecture | Warehouse, lakehouse, or hybrid design with reasoning. |
| Table format policy | Which datasets remain simple files, which become managed warehouse tables, and which may become Delta, Iceberg, or Hudi tables. |
| Quality and governance | Required tests, owners, lineage, access controls, retention, and cost controls. |
| Migration path | Conditions that would justify changing engines or adopting a lakehouse table format. |
Exercises
| Difficulty | Exercise | Expected result |
|---|---|---|
| Easy | Add a new order line for an existing customer and product. | fact_sales grows by one row, and the daily revenue report changes predictably. |
| Medium | Add a customer loyalty-tier change on 2026-05-31. | dim_customer receives a new SCD2 version with a closed previous row. |
| Medium | Add a dim_channel table and replace channel text in the fact table with a surrogate key. | The fact table remains at order-line grain and joins to a channel dimension. |
| Challenge | Add a late-arriving order from 2026-05-29 after a customer attribute change. | The fact joins to the dimension version valid on the event date, not the load date. |
| Team | Compare Delta Lake, Iceberg, and Hudi for TuranMart’s customer snapshot table. | The team can justify a table-format choice using update frequency, engines, CDC, governance, and maintenance requirements. |
Review Questions
| Question | What a strong answer should include |
|---|---|
| Why is a data warehouse still useful after a data lake exists? | Warehouses publish trusted, modeled, governed, query-efficient tables for repeatable business analytics. |
| What does it mean to declare the grain of a fact table? | It defines what one row represents as a business measurement event and constrains valid facts and dimensions. |
| Why are surrogate keys important in SCD2 dimensions? | They identify a specific historical version of an entity, while the business key identifies the real-world entity. |
| How does a lakehouse table format improve plain Parquet files? | It adds metadata, consistent snapshots, schema management, transaction semantics, and maintenance operations. |
| When might Iceberg be a better fit than Delta or Hudi? | When multi-engine interoperability, hidden partitioning, and open table specification governance are central requirements. |
| When might Hudi be a better fit? | When frequent upserts, deletes, CDC ingestion, and incremental processing dominate the workload. |
| What should TuranMart validate before publishing a sales mart? | Primary keys, foreign keys, SCD2 windows, row counts, revenue totals, freshness, and metric definitions. |
Summary
A data lake stores analytical evidence, but a warehouse or lakehouse turns that evidence into trusted data products. Modern cloud warehouses use managed infrastructure, independent compute, columnar storage, MPP execution, and SQL ecosystems to serve business analytics at scale. Dimensional modeling remains essential because it defines the business process, grain, dimensions, facts, and historical behavior that make metrics trustworthy.
Lakehouse table formats extend the lake with table metadata, consistent snapshots, schema evolution, time travel, and update semantics. Delta Lake, Apache Iceberg, and Apache Hudi all pursue this goal, but they optimize for different ecosystems and workload patterns. The correct architectural decision depends on TuranMart’s engines, update patterns, governance maturity, and operating discipline.
In the next chapter, the book moves from storage and analytical table design into processing frameworks. Apache Spark and Apache Flink will show how engineers transform, enrich, aggregate, and stream the data products that warehouses and lakehouses depend on.