In an OLAP (analytical) database, there are no updates, just merges. Every transaction is just writing a new row.
This is fundamental to the performance of OLAP databases, and very different from the update paradigm in OLTP (transactional) databases. So, you have to make certain design decisions related to this, especially when you are trying to capture data that includes updates, like CDC (change data capture) from your OLTP database. In ClickHouse, the key design choice is the table engine (plus your sorting key and partitions). In other OLAP systems you’ll tune partitioning, clustering, and dedup/compaction instead of picking an engine.
This article shows how your choice of ClickHouse table engine affects behaviour when you are ingesting CDC data. You can capture changes by reading your OLTP write-ahead log (WAL) (e.g., Postgres WAL via Debezium) and streaming those events into ClickHouse using Redpanda. The table engine you pick determines how inserts/updates/deletes are reconciled into a queryable, up-to-date view without constantly doing deduplication at query time.
Recommendation:
When ingesting CDC into ClickHouse, model updates and deletes explicitly and pick a table engine that matches your CDC semantics. MooseStack can handle this logic for you automatically.
We receive data in a form something like the below:
| op | id | name | email | balance | before.balance | source.lsn (may be a hex pair)* | ts_ms | notes |
|---|---|---|---|---|---|---|---|---|
| c | 1 | Alice | alice@example.com | 100.00 | null | 100001000 | 2025-10-14T14:00:01Z | initial insert |
| u | 1 | Alice | alice@example.com | 125.00 | 100.00 | 100001500 | 2025-10-14T14:00:10Z | balance update |
| u | 1 | Alice | alice@example.com | 150.00 | 125.00 | 100002000 | 2025-10-14T14:00:20Z | another update |
| d | 1 | null | null | null | 150.00 | 100002500 | 2025-10-14T14:00:30Z | delete event |
*LSN is Postgres’ Log Sequence Number: a strictly increasing WAL position. Perfect as a monotonic `ver` for ClickHouse engines to order updates.
If you want to run a query to sum the balance, you can very easily foot-gun:
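For example, something like the following (the table name `accounts_raw` is illustrative):

```sql
-- Naive sum over the raw CDC rows: every historical version is counted.
SELECT sum(balance)
FROM accounts_raw
```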
This returns 375 (even though the final balance is 0 after the delete) because you summed every historical value.
So, how can we turn the above into a set of data that is queryable?
First thing, we use a streaming function to tidy up the data a little (the final form of this tidying may depend on the table engine chosen, as you’ll see below), giving us:
| id | name | email | balance | ver (from lsn) | ts_ms | is_deleted |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 0 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 0 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 2025-10-14T14:00:20Z | 0 |
| 1 | Alice | alice@example.com | 150.00 | 100002500 | 2025-10-14T14:00:30Z | 1 |
All we are doing is grabbing the relevant columns, creating the `ver` column, and explicitly calling out row deletes.
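As a rough sketch of that tidying step, here is a plain TypeScript transform (the event shape and field names are illustrative, not the exact Debezium envelope; note that for deletes, the source nulls the row image, so we carry the last known values forward):

```typescript
// Hypothetical shape of a Debezium-style CDC event after flattening.
interface CdcEvent {
  op: "c" | "u" | "d";
  id: number;
  name: string | null;
  email: string | null;
  balance: number | null;
  lsn: number; // Postgres Log Sequence Number
  ts_ms: string;
}

// The cleaned row we write to ClickHouse.
interface CleanRow {
  id: number;
  name: string;
  email: string;
  balance: number;
  ver: number; // monotonic version derived from the LSN
  ts_ms: string;
  is_deleted: 0 | 1;
}

// `prev` is the last known state for this id (kept by the pipeline),
// needed to fill in the column values on a delete event.
function toCleanRow(ev: CdcEvent, prev?: CleanRow): CleanRow {
  if (ev.op === "d") {
    if (!prev) throw new Error(`delete for unknown id ${ev.id}`);
    return { ...prev, ver: ev.lsn, ts_ms: ev.ts_ms, is_deleted: 1 };
  }
  return {
    id: ev.id,
    name: ev.name ?? "",
    email: ev.email ?? "",
    balance: ev.balance ?? 0,
    ver: ev.lsn,
    ts_ms: ev.ts_ms,
    is_deleted: 0,
  };
}
```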
Then, we’ll write them to ClickHouse.
The behaviour of a ClickHouse table is defined by the engine used. We’ll compare the following engines and their behaviour for writing the above CDC in (and how you’d query from them):

- MergeTree
- ReplacingMergeTree
- CollapsingMergeTree
- VersionedCollapsingMergeTree
MergeTree is ClickHouse’s default table engine, and it provides a great balance of fast inserts and optimized queries. However, it does not update rows automatically.
- To read the current state, you must deduplicate at query time: pick the row with the highest `ver` or latest `ts_ms` per `id`.
- To actually remove a row, you need a manual `ALTER TABLE … DELETE` query, or you need to add logic around the `is_deleted` column in your query.

| Pros | Cons |
|---|---|
| Fastest writes | Slow reads if you want to deduplicate your data |
| | Easy to make mistakes in your queries or subqueries |
| | Manual deletes if you need real deletes |
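A query-time deduplication sketch using `argMax` (the table name `accounts` is illustrative):

```sql
-- Latest state per id from a plain MergeTree, deduplicated at read time.
SELECT
    id,
    argMax(name, ver)    AS name,
    argMax(balance, ver) AS balance
FROM accounts
GROUP BY id
HAVING argMax(is_deleted, ver) = 0
```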
Any OLAP table created with Moose OLAP by default uses MergeTree. Just create your OlapTable using defaults:
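A minimal sketch (package and class names follow the MooseStack TypeScript docs; your schema will differ, and exact typing such as key annotations should be checked against the current API):

```typescript
import { OlapTable } from "@514labs/moose-lib";

interface Account {
  id: number;
  name: string;
  email: string;
  balance: number;
  ver: number;
  ts_ms: string;
  is_deleted: number;
}

// No engine specified, so the table defaults to MergeTree.
const accounts = new OlapTable<Account>("accounts");
```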
ReplacingMergeTree is becoming the community standard for use in CDC. Essentially, it extends the MergeTree engine: duplicate rows are cleaned up on merge. It always keeps the newest version of a row, discarding the older duplicates during background merges.
ReplacingMergeTree deduplicates by the ORDER BY
(sorting) key. Among rows with the same sorting key, it keeps the highest ver
(or the last inserted if no ver is configured). If you also pass an is_deleted
column, a delete is represented as a tombstone (latest version with is_deleted=1
). Queries then filter WHERE is_deleted=0
for active rows; physical cleanup of tombstones requires OPTIMIZE … FINAL CLEANUP
with the appropriate setting enabled.
- Use `SELECT … FINAL` to deduplicate on read (expensive), or `OPTIMIZE TABLE … FINAL` to trigger a merge.
- Use `OPTIMIZE … FINAL CLEANUP` to get rid of the tombstone rows.

Safe query:
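Something like the following (table name assumed):

```sql
-- FINAL deduplicates by the sorting key at read time; the tombstone
-- filter hides deleted rows that haven't been cleaned up yet.
SELECT id, name, email, balance
FROM accounts
FINAL
WHERE is_deleted = 0
```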
| Pros | Cons |
|---|---|
| Fast writes | Stale versions are visible and queryable until a merge occurs |
| Very easy implementation | Need to be aware of the tombstone approach / have appropriate fields in your ingested data |
| | There are edge cases where you need to understand how your data is partitioned to ensure “finality” of your data |
To set up an OlapTable with ReplacingMergeTree, we use `ClickHouseEngines.ReplacingMergeTree`:
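A sketch of such a definition (option names follow the MooseStack docs at time of writing; verify against the current API):

```typescript
import { OlapTable, ClickHouseEngines } from "@514labs/moose-lib";

interface Account {
  id: number;
  name: string;
  email: string;
  balance: number;
  ver: number;
  ts_ms: string;
  is_deleted: number;
}

const accounts = new OlapTable<Account>("accounts", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],   // deduplication happens per sorting key
  ver: "ver",              // keep the row with the highest ver
  isDeleted: "is_deleted", // treat latest version with is_deleted=1 as a tombstone
});
```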
This requires an `ORDER BY` to be specified, and has optional `ver` and `isDeleted` parameters for controlling deduplication with more granularity.
CollapsingMergeTree uses a `Sign` column (±1) to cancel rows during merges (in what feels a bit like double-entry accounting).
On merge, pairs of positive and negative rows are removed.
This is easier to explain with a worked example (I’ll use the same data as above).
First transaction (create) creates the row below:
| id | name | email | balance | ver (from lsn) | ts_ms | sign |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
Second transaction (update) creates the second and third rows below:
| id | name | email | balance | ver (from lsn) | ts_ms | sign |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |
At this moment, if you sum the balance, taking into account the sign, you get the correct balance of 125 (the first and second row cancel).
The third transaction (update) creates the fourth and fifth rows, and the fourth transaction (delete) creates the sixth row below:
| id | name | email | balance | ver (from lsn) | ts_ms | sign |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:20Z | -1 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 2025-10-14T14:00:20Z | 1 |
| 1 | Alice | alice@example.com | 150.00 | 100002500 | 2025-10-14T14:00:30Z | -1 |
For aggregates, computing with sign
(e.g., SUM(balance * sign)
) yields the correct result immediately. For row-level views, you’ll still need FINAL
(or a GROUP BY
that encodes the collapsing) until background merges complete.
Safe query:
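For example (table name assumed; on the data above this returns no row for id 1, since the delete cancels everything out):

```sql
-- Correct even before background merges collapse the rows.
SELECT
    id,
    sum(balance * sign) AS balance
FROM accounts_collapsing
GROUP BY id
HAVING sum(sign) > 0
```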
| Pros | Cons |
|---|---|
| Explicit deletes | Complex to use / query |
| Immediately correct for certain queries like `SUM(value * Sign)` | Only immediately correct for certain queries, not arbitrary `SELECT *`s |
| | Shifts complexity onto the CDC / data pipeline / preparation layer (you need to know / query for the previous version to be able to write the negative row) |
| | “Immediate correctness” requires you don’t have bad luck with partitions |
| | Engine doesn’t track a global version, which can lead to inconsistency |
This engine extends the CollapsingMergeTree engine to solve that final con on the list, adding a version column to the collapsing algorithm to allow the engine to track global version.
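In plain ClickHouse DDL, the version column is passed alongside the sign column (table and column names here are illustrative):

```sql
CREATE TABLE accounts_versioned
(
    id      UInt64,
    name    String,
    email   String,
    balance Decimal(18, 2),
    ver     UInt64,  -- e.g. the Postgres LSN
    sign    Int8
)
ENGINE = VersionedCollapsingMergeTree(sign, ver)
ORDER BY id
```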
Given the flexibility of ReplacingMergeTree, we haven’t seen demand for support for CollapsingMergeTree or VersionedCollapsingMergeTree. Interested in having it supported? Let us know.
For almost every CDC use-case, ReplacingMergeTree will be the perfect balance of fast write speed, low write complexity and correctness (albeit eventual correctness). For most analytical use-cases, the potential temporary incorrectness is marginal compared to the size of data being queried. If you need perfect row level data immediately, it may be that your OLTP source would be the better database to query.
If you need strict correctness under out‑of‑order ingestion or in replicated clusters with frequent updates/deletes, VersionedCollapsingMergeTree is the safest collapsing choice, but it introduces operational complexity.
In OLAP databases (specifically, ClickHouse with the table engines analysed above), every transaction is just writing a new row. The way the transactions are dealt with on merge depends on the engine, and you can do some operator math to ensure correctness before a merge.
But all rows are still writes. You can get around this, but you pay the cost of that in complexity at write time or query time.
In tomorrow’s article, we’ll show you how to navigate these complexities in the CDC from OLTP use-case.