OLAP migration complexity is the cost of fast reads
TL;DR
Migrations are hard because of the same tradeoffs that make OLAP fast.
• Immutable parts and physical layouts: schema changes rewrite data, not metadata.
• Materialized views and dependencies: read speed creates rigid, cascading relationships.
• Local vs remote drift: shared ownership and upstream systems mean your code’s schema may not match production.
Migrations feel straightforward in transactional systems: you add a column, tweak an index, maybe rebuild a B-tree, and you’re done. DDL is easy to reason about; most changes are metadata updates or narrowly scoped rewrites. Views and joins are evaluated on read, so upstream edits rarely trigger expensive cascades: the database handles that work at query time.
The analytical world is different. The same kind of change that would be mere “decoration” in OLTP becomes a physical rewrite in OLAP:
- Changing ORDER BY means physically rewriting every column. In OLTP this isn’t even part of table design (the closest transactional analogue, an index rebuild, is minor by comparison).
- Changing an upstream table forces downstream materialized views to recompute, and requires awareness of data lineage so that changes cascade safely through dependent tables and views. In OLAP, these dependencies are an intentional part of schema design, not incidental.
- Changing anything can be expensive simply because of scale: terabytes of columnar data, background merges, and heavy computation that happens on write rather than on read.
This post maps the problem space: the database realities that make schema change costly, the architectural factors that amplify that cost, and the developer expectations that make it easy to underestimate.
Later posts will show how MooseStack helps manage those challenges.
Database fundamentals: the engineering consequences of read optimization
OLAP databases earn their speed by optimizing storage for reads: columnar compression, immutable parts, and pre-aggregated tables. Those same design choices make schema change expensive. This section assumes a MergeTree-based engine (the backbone of ClickHouse), where data lives in immutable parts merged asynchronously in the background. The physics of that layout drive every migration cost that follows.
Asynchronous vs. transactional rewrites
In analytical systems, data is stored as compressed, immutable parts on disk. Once written, those parts aren’t modified in place. So when you run an ALTER TABLE … MODIFY COLUMN in ClickHouse, it isn’t applied immediately: it schedules a mutation, a background rewrite of those parts. During that process, queries can see a mix of old and new data, and performance may fluctuate as merges compete for resources.
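A minimal sketch of what that looks like (the table and column names here are hypothetical): the ALTER returns almost immediately, while the actual rewrite shows up as a pending mutation you can watch in system.mutations.

```sql
-- Hypothetical table and column; the statement returns almost immediately.
ALTER TABLE events MODIFY COLUMN user_id UInt64;

-- The rewrite itself runs in the background as a mutation;
-- parts_to_do shrinks as each immutable part is rewritten.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'events' AND is_done = 0;
```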
At scale, even simple schema changes share bandwidth with other background jobs: part merges, TTL cleanup, replication, and ingestion. These tasks never stop running, so DDL isn’t just a metadata update; it’s a queued job competing for the same I/O and CPU that keep the cluster fast.
The risk in OLAP systems is operational unpredictability: partial rewrites, mixed states, and long-running background work. Many teams avoid that risk by rebuilding tables from scratch and swapping them in atomically.
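In ClickHouse, that rebuild-and-swap usually means creating the new table alongside the old one, backfilling it, then exchanging the two atomically. A hedged sketch, with illustrative names and schema:

```sql
-- Build the replacement with the desired layout.
CREATE TABLE events_new
(
    id UUID,
    user_id UInt64,
    created_at DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (user_id, created_at);

-- Backfill: one big, predictable read/write instead of a long-running mutation.
INSERT INTO events_new SELECT id, user_id, created_at FROM events;

-- Swap the tables atomically (requires the Atomic database engine).
EXCHANGE TABLES events_new AND events;
```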
In transactional systems, the same change runs inside a single transaction: the database locks the table, applies the change, and commits. Once done, all reads see the new schema simultaneously.
The risk in OLTP systems is downtime.
Physical keys are fast, but rigid
Let’s build a simple example table where the goal is fast retrieval of events by the date they were created.
In Postgres, we’d use DDL along these lines (exact columns are illustrative):
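```sql
CREATE TABLE events (
    id UUID PRIMARY KEY,          -- uniqueness enforced by the primary key
    user_id BIGINT,
    event_type TEXT,
    created_at TIMESTAMP NOT NULL
);

-- A secondary index, stored separately from the table itself.
CREATE INDEX idx_events_created_at ON events (created_at);
```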
This is pretty simple: it creates a table with the columns above and guarantees that the id is unique. An index, idx_events_created_at, stored separately from the table, points the query engine to rows by that TIMESTAMP, allowing fast retrieval of individual rows by date.
In ClickHouse, we’d use something along these lines:
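```sql
CREATE TABLE events
(
    id UUID,
    user_id UInt64,
    event_type String,
    created_at DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)  -- one partition per month
ORDER BY created_at;               -- the physical sort order on disk
```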
Here, the ORDER BY key isn’t a side structure: it defines the table’s physical layout on disk, sorted by created_at within each monthly partition. This layout allows ClickHouse to skip vast ranges of data and scan what remains at columnar speed.
The same design that makes reads and aggregations so fast comes with a trade-off: if you want to change your ORDER BY, you have to rewrite the entire table. The order is baked into the data itself, not stored as separate metadata.
What’s a relatively cheap operation in an OLTP database (adding or changing an index) can be stupendously expensive in OLAP.
Materialized views are common (and complex), and introduce dependencies into DDL execution
In OLAP environments, it is commonplace to do much more work “on write” than in OLTP systems. Precomputing aggregations, transformations, and the like means you can retrieve data extremely quickly. When new data hits, materialized views or aggregating tables pre-compute and store derived results (sums, counts, averages, rollups, etc.).
In OLTP systems, the “logic” for these transformations is usually executed at query time: you’re typically retrieving only a few rows and operating on them, so the cost is expected to be small. In OLAP systems, these aggregations are physically stored on disk and executed at write time.
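For example, a daily per-user rollup might look like this sketch (names are illustrative, continuing the events table from above):

```sql
-- The target table must exist first, with a schema the view's query can write to.
CREATE TABLE user_daily_stats
(
    user_id UInt64,
    day Date,
    event_count UInt64
)
ENGINE = SummingMergeTree
ORDER BY (user_id, day);

-- Runs on every insert into events, writing derived rows into user_daily_stats.
CREATE MATERIALIZED VIEW user_daily_stats_mv
TO user_daily_stats AS
SELECT
    user_id,
    toDate(created_at) AS day,
    count() AS event_count
FROM events
GROUP BY user_id, day;
```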
Here, when a new batch of rows is inserted into the events table, it triggers the computation in this query, which physically populates the user_daily_stats table. It’s also important to note that before you can run this, the target table (user_daily_stats) must already exist, with a schema that conforms to the query you’re running.
So you can probably see the rub here. You get super fast performance on daily stats, but this work is done ahead of time. If you change the materialized view, you need to recompute over all the source data.
But there’s a slightly more subtle challenge too. The materialized view relies on the upstream table. If you have a set of materialized views or aggregations, they have dependencies. Naively recreating your database or applying changes to upstream tables can break those dependency chains. These views form an implicit dependency graph, which most DDL tools don’t track explicitly.
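To make that concrete, here’s a hedged continuation of the earlier sketch: a monthly rollup stacked on the daily one. Drop and recreate events or user_daily_stats naively and nothing rebuilds this chain for you; the views have to come back in dependency order.

```sql
-- Hypothetical second-level rollup, fed by the first view's target table.
CREATE TABLE user_monthly_stats
(
    user_id UInt64,
    month Date,
    event_count UInt64
)
ENGINE = SummingMergeTree
ORDER BY (user_id, month);

-- Triggered by inserts into user_daily_stats, i.e. by the first view's output.
CREATE MATERIALIZED VIEW user_monthly_stats_mv
TO user_monthly_stats AS
SELECT
    user_id,
    toStartOfMonth(day) AS month,
    sum(event_count) AS event_count
FROM user_daily_stats
GROUP BY user_id, month;
```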
Migrations in OLAP need to understand and preserve these dependency graphs, and these dependencies are a defining feature of OLAP schema design.
Why this makes migrations hard
Each of these patterns (immutable parts, physical keys, and on-write materialization) exists to make reads fast, but that same design makes change hard. In transactional systems, migrations are mostly logical; the database hides the physical cost and guarantees atomicity. In analytical systems, migrations are physical by nature; data is rewritten, merged, and recomputed in the background. What seems like a simple schema edit becomes an operational event with real I/O cost and cascading dependencies.
OLAP is part of a complex system, and this also makes migrations hard
In transactional systems, the application team typically controls schema change. A migration often lives in code, runs within a transaction, and ships alongside the application. Tools like ORMs make that relationship even tighter, embedding DDL in the same codebase as the business logic. Other systems might write data, but structural changes tend to flow through a single path, tied to deploys and version control rather than ad-hoc edits.
In analytical systems, the warehouse is a shared space. CDC might add or widen columns, an ELT job might create a materialized view, and an analyst might tweak a table used for exploration. None of that is tied to your code deploy.
Your local code can now have a different understanding of reality than production. You generate a migration that looks valid against your local schema, but by the time you apply it, the database has moved on. A CDC job might have added or widened a column, or an analyst might have created a new materialized view referencing the same table. What was a clean diff locally now fails remotely. Or worse, partially succeeds, leaving mismatched parts or broken dependencies. In OLTP, this sort of drift is rare and usually caught at deploy. In OLAP, it’s common, because the schema can change between plan and execution without you ever touching it.
Migration conflicts between local and prod
Let’s say you have a simple users table in production:
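```sql
-- Production (columns here are illustrative):
CREATE TABLE users
(
    id UInt64,
    email String,
    name String
)
ENGINE = MergeTree
ORDER BY id;
```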
You pull that schema locally and make a change:
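```sql
-- Hypothetical local migration: add a column for sign-up attribution.
ALTER TABLE users ADD COLUMN signup_source String;
```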
Your local table now looks like:
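```sql
-- Local:
CREATE TABLE users
(
    id UInt64,
    email String,
    name String,
    signup_source String
)
ENGINE = MergeTree
ORDER BY id;
```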
But while you were working, a CDC process upstream widened a column in production:
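```sql
-- Applied upstream, outside your deploy pipeline:
ALTER TABLE users MODIFY COLUMN name Nullable(String);
```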
So production now looks like:
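```sql
-- Production now:
CREATE TABLE users
(
    id UInt64,
    email String,
    name Nullable(String)
)
ENGINE = MergeTree
ORDER BY id;
```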
When you apply your migration, your plan assumes the remote column is String, not Nullable(String). To ClickHouse itself, this isn’t necessarily fatal. But to any migration planner that checks remote state, it’s drift. The tool will abort to avoid unsafe rewrites, because your plan was built against a world that no longer exists.
That’s what makes OLAP migrations tricky: even small, harmless changes by another actor can invalidate your understanding of the schema.
Wrapping up
OLAP migrations are hard for structural reasons: the systems optimize for reads, distribute ownership, and operate at massive scale. But the biggest source of pain isn’t just physical. It’s conceptual. Your code and your database can drift apart. You write models and generate a migration that looks correct locally, but the live schema may have changed underneath you: a CDC job added a column, an analyst tweaked a view, or a mutation is still rewriting parts. What you think you’re deploying no longer matches what’s actually running.
In the next post, we’ll look at how to keep your code and your database in sync: generating migrations from the live environment, validating drift before deploy, and letting you move fast locally without losing control in production. By grounding local development in the real state of the warehouse, you can trust that local work will apply cleanly in prod. That makes local iteration meaningful again: safe to experiment, refine models, and test expensive schema changes before committing to hours of rewrites or recomputation.