OLAP on Tap: The Art of Letting Go (of Normalization)
Like the previous few articles, this one shows how the fundamental differences in the implementation and purpose of OLAP databases make working with them different from what we’re used to in other paradigms.
Here, OLAP databases like ClickHouse and DuckDB:
- Per the name, are analytics-focused, not transaction-focused: they want to retrieve large batches of data from huge datasets and run calculations over them, rather than retrieve or amend specific rows.
- Are columnar, focusing on retrieving many values of one column and computing against them, rather than retrieving whole rows (even where parts of those rows aren’t related to the analytics you are doing).
This article will talk specifically about how these two differences (which are really two sides of the same coin) affect data modelling and schema design.
Now you’re just a star schema that I used to know
When we think about traditional databases, we immediately reach for third normal form for OLTP, and star schemas for classic analytical use-cases.
Both designs were built around reducing storage duplication and ensuring the correctness (immediate correctness, that is) and integrity of the data.
A quick refresher on star schemas
- You have a central “fact table” (orders, transactions) that references other tables with dimensions
- You have many “dimension tables” (users, products) that color the fact tables
- To do almost anything, you need to join your fact and dimension tables (often with many joins)
This made sense because storage was expensive, and for the type of queries you were doing, joins were relatively cheap. You’d grab the few rows you wanted to enrich, and only compute joins against them. So why not save a bunch of money by storing your data more efficiently? Made sense!
This is a pretty great design for a system where transactions, atomicity, and reading, editing, or deleting individual rows are important. E.g. if you have 10k rows for “Joj” as a customer, and he lets you know he prefers to be known as “Johanan”, you edit one row in one table, and all 10k rows are “edited”.
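In SQL, that rename is a single-row write against the dimension table (a minimal sketch, using the toy `dim_customer` table from the appendix; Joj isn’t actually in the toy data):

```sql
-- One row changes; every fact row pointing at Joj's customer_key
-- now reflects the new name, with no fact-table rewrite.
UPDATE dim_customer
SET customer_name = 'Johanan'
WHERE customer_name = 'Joj';
```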
If you want to be able to sum across a few hundred thousand rows though, a star schema would make your query orders of magnitude more expensive.
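To see the join cost concretely, here’s what that aggregation looks like against the star schema from the appendix: two joins have to resolve before the sum can even start (a sketch against the toy tables; at real scale each join touches vastly more rows):

```sql
-- Total spend by Portland customers in January 2025, star-schema style.
SELECT sum(f.total_amount) AS portland_spend
FROM fact_orders AS f
JOIN dim_customer AS c ON f.customer_key = c.customer_key
JOIN dim_date AS d ON f.order_date_key = d.date_key
WHERE c.city = 'Portland'
  AND d.month = 'January'
  AND d.year = 2025;
```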
So, star schemas:
- Low storage costs
- High efficiency for row retrieval and use-cases that require mutability
- Higher compute costs because of required joins, especially for large analytical queries
The OLAP approach
As discussed in yesterday’s article, ClickHouse, DuckDB, and many other OLAP databases are columnar. So they come with different design guidance, preferring fewer, wider tables.
Why?
- Storage is cheap
- Compression is highly efficient (especially for low cardinality columns)
- Joins are expensive
- Scans are lightning fast if you get your `ORDER BY` right (see yesterday’s article).
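To make that concrete, here’s a hedged sketch of how the flat table from the appendix might be declared in ClickHouse (the column types and sort key are my assumptions, not from the original):

```sql
CREATE TABLE orders_flat
(
    order_id      UInt64,
    date          Date,
    day_of_week   LowCardinality(String),  -- low cardinality: compresses extremely well
    month         LowCardinality(String),
    year          UInt16,
    customer_name String,
    city          LowCardinality(String),
    state         LowCardinality(String),
    product_name  String,
    category      LowCardinality(String),
    unit_price    Decimal(10, 2),
    quantity      UInt32,
    total_amount  Decimal(10, 2)
)
ENGINE = MergeTree
-- Sort key matches the most common filters, so scans skip irrelevant data.
ORDER BY (state, date);
```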
There’s a bit more going on under the hood, specifically in the way OLAP databases store, scan, and compute over data.
- The database engine doesn’t need to read columns that aren’t part of the query, so making the table wider doesn’t increase the cost of a scan or computation. E.g. if you are looking for customers in Portland, in the past month, and how much they spent, you’d only read `date`, `state`, and `total_amount`, ignoring all other columns. The fact that you added a `category` column wouldn’t affect the computation speed.
- Once the database engine finds the right “section” of the column, it can use low-level CPU optimizations to increase computation efficiency: SIMD (“Single Instruction, Multiple Data”) lets the CPU process many values with a single instruction, so the engine can chew through thousands of values per CPU tick.
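That Portland example, written against the flat appendix table (a sketch assuming ClickHouse syntax; the date arithmetic is illustrative):

```sql
-- Only three columns are read from disk: state, date, and total_amount.
-- The other ten columns in the wide table are never touched.
SELECT sum(total_amount) AS portland_spend
FROM orders_flat
WHERE state = 'OR'                         -- Portland, Oregon
  AND date >= today() - INTERVAL 1 MONTH;  -- "the past month"
```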
OLAP schema design heuristics
How should you organize your table?
- Fewer tables, fewer joins
- Precompute relationships
- Wider tables, more duplication
- Flatter tables, less nesting
Negatives:
- This increases storage duplication (but not by as much as you think, because of extremely efficient compression!)
- This increases the cost and difficulty of editing and deleting rows
- Increases ingest complexity (depending on the source format): you must enrich data on ingest rather than at query time (see the sketch after this list)
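Here’s what enriching on ingest can look like, sketched with the appendix toy tables (assuming the normalized tables exist upstream; in a real pipeline this join often lives in the ETL layer instead):

```sql
-- Pay the join cost once, at ingest time, so queries never have to.
INSERT INTO orders_flat
SELECT
    f.order_id,
    d.date, d.day_of_week, d.month, d.year,
    c.customer_name, c.city, c.state,
    p.product_name, p.category, p.unit_price,
    f.quantity, f.total_amount
FROM fact_orders AS f
JOIN dim_date     AS d ON f.order_date_key = d.date_key
JOIN dim_customer AS c ON f.customer_key   = c.customer_key
JOIN dim_product  AS p ON f.product_key    = p.product_key;
```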
Positives:
- This drastically reduces the cost and time for retrieving and calculating over data.
In analytical use-cases, this positive completely outweighs the increased cost of data duplication.
Exceptions
You can store rarely used data in separate tables, but if a column regularly appears in `WHERE`, `GROUP BY`, or `SELECT`, it probably deserves to be part of the denormalized table (and depending on frequency, it might also deserve a spot in `ORDER BY`).
Nesting
We’ll talk more about JSON and unpredictably nested data in the future, but for now it’s sufficient to note that flattened data is almost always more efficient in OLAP use-cases than nested data. This can increase the complexity of ingestion (you need to flatten the data as you ingest it). ClickHouse has some tooling for this: if your nested JSON is predictable and typed, ClickHouse’s JSON type can present the data as nested while inferring and materializing sub-columns automatically.
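A minimal sketch of that JSON behavior (assuming a recent ClickHouse version where the `JSON` type is available; the event payload shape is invented for illustration):

```sql
-- Declare a JSON column; ClickHouse infers typed sub-columns as data arrives.
CREATE TABLE events
(
    ts      DateTime,
    payload JSON
)
ENGINE = MergeTree
ORDER BY ts;

INSERT INTO events VALUES
    (now(), '{"user": {"city": "Portland"}, "amount": 40.0}');

-- Sub-columns are addressable by path and stored columnar under the hood.
SELECT payload.user.city, payload.amount FROM events;
```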
Conclusion: your schema is your query plan
In OLAP, your schema is your query plan.
Every design choice (what you flatten, how you order, what you precompute) shapes how fast the engine can retrieve and aggregate.
Storage is cheap. Compute isn’t. Design for the queries you’ll run most often, not the data you’ll edit.
Agent docs
🐙 https://github.com/oatsandsugar/olap-agent-ref/blob/main/0-schema-design.md
Appendices
Star schema toy example
At this toy scale, the storage benefits don’t favor star schemas, but at great scale (and with heavy repetition), they become more and more storage-efficient.
fact_orders
| order_id | order_date_key | customer_key | product_key | quantity | total_amount |
|---|---|---|---|---|---|
| 1 | 20250101 | 10 | 100 | 2 | 40.00 |
| 2 | 20250101 | 11 | 101 | 1 | 15.00 |
| 3 | 20250102 | 10 | 102 | 3 | 90.00 |
dim_customer
| customer_key | customer_name | city | state |
|---|---|---|---|
| 10 | Alice | Portland | OR |
| 11 | Bob | Seattle | WA |
dim_product
| product_key | product_name | category | unit_price |
|---|---|---|---|
| 100 | Coffee Mug | Kitchen | 20.00 |
| 101 | T-Shirt | Apparel | 15.00 |
| 102 | Water Bottle | Fitness | 30.00 |
dim_date
| date_key | date | day_of_week | month | year |
|---|---|---|---|---|
| 20250101 | 2025-01-01 | Wed | January | 2025 |
| 20250102 | 2025-01-02 | Thu | January | 2025 |
Flat schema of the same data
orders_flat
| order_id | date | day_of_week | month | year | customer_name | city | state | product_name | category | unit_price | quantity | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025-01-01 | Wed | January | 2025 | Alice | Portland | OR | Coffee Mug | Kitchen | 20.00 | 2 | 40.00 |
| 2 | 2025-01-01 | Wed | January | 2025 | Bob | Seattle | WA | T-Shirt | Apparel | 15.00 | 1 | 15.00 |
| 3 | 2025-01-02 | Thu | January | 2025 | Alice | Portland | OR | Water Bottle | Fitness | 30.00 | 3 | 90.00 |
Schema goals comparison table
| Aspect | OLTP (e.g. Postgres) | OLAP (e.g. ClickHouse) |
|---|---|---|
| Primary Goal | Transaction integrity, fast row-level updates | Analytical speed, fast columnar scans |
| Workload | Many small reads/writes | Fewer, large aggregations |
| Schema Design | Normalized | Denormalized (Wide, Flat) |
| Data Access | Point lookups (by PK / index) | Range scans, GROUP BY, aggregations |
| Joins | Common, cheap | Expensive, avoid when possible |
| Mutation Pattern | Frequent UPDATE / DELETE | Append-only, periodic rebuilds |
| Optimization Strategy | Indexing, normalization | Clustering, compression, partitioning |
| Duplication | Avoid duplication | Embrace duplication for speed |
| Schema Evolution | Flexible, mutable | Write-once, evolve with ETL |
You’ve modeled your OLAP data and set up CDC—now it’s time to ship it. Moose makes it effortless to expose your ClickHouse models through typed, validated APIs. Whether you use Moose’s built-in Api class or integrate with Express or FastAPI, you’ll get OpenAPI specs, auth, and runtime validation out of the box.