6 minute read
What: CDC from Postgres to Clickhouse as code
Why: Keep OLTP safe and fast while mirroring data near-real-time to OLAP for heavy reads and analytics.
How (at a glance): Postgres OLTP → Debezium CDC → Redpanda streaming ingest → ClickHouse OLAP (ReplacingMergeTree for upserts/deletes on merge).
Why MooseStack: Everything lives in code—from CDC streams through to OLAP tables. Moose dev server makes it trivial to spin up the full pipeline (ClickHouse, Redpanda, and your OLTP infrastructure too) locally in seconds.
Result: Safe OLTP, lightning-fast OLAP, and schema changes stay sane. Regenerate external models, adjust transform/table, keep streaming.
Replicating data from an OLTP (transactional) system to an OLAP (analytical) system should be simple. But there are fundamental differences between these two architectures that make it difficult. From optimizing writes to your OLAP system, to making sure you have the right ClickHouse table engine set up, there are many complexities to writing Change Data Capture (CDC) data to an OLAP database in a performant and easily queryable way.
So, why would you want to replicate your data to OLAP? To massively speed up your queries, and to be able to do so at scale without jeopardizing your transactional application.
There are two main ways to replicate data from OLTP to OLAP:
This article focuses on the as-code approach. It’s aimed at developers who’d rather define a pipeline once and have it run consistently in local dev, staging, and production.
That’s where MooseStack comes in: it’s the developer layer that sits on top of Redpanda and ClickHouse, giving you code-defined data infrastructure and a local development server that mirrors your full stack.
You model ingestion, transforms, and OLAP tables as code. That means: typed CDC events, typed tables, controlled migrations, and replay—all without hand‑rolling consumers.
In this demo application, the application runs a MooseStack instance that creates and connects:
ReplacingMergeTree
.This blog walks you through how to quickstart the app, and get everything running—so you can see how you can manipulate (create, update, delete) the data in the Postgres database live with Drizzle Studio, and see the changes in the data flow through in real time to ClickHouse. Then we’ll dive deep to help you understand what’s happening in each part of the application infrastructure, so you can apply the same architecture to your own app.
The application repo is here: https://github.com/514-labs/debezium-cdc
Prerequisites:
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
Make sure Docker Desktop is running.
This may take a minute. It will run the application’s infrastructure. Open your Docker Desktop app, you can see what it set up!
The diagram shows exactly what the app spins up (how containers map to the architecture). As soon as it starts, you’ll see the initial sync kick off (lots of logs) as data flows from Postgres into ClickHouse.
E.g.
You now have your Postgres and your ClickHouse databases set up and in sync.
If you want to see the CDC in action, in a new terminal, run Drizzle Studio:
Then open Drizzle Studio in your browser, at http://local.drizzle.studio/
Edit the data in the database: create a new row, delete a row, edit a row.
In your application terminal, you’ll be able to see the Moose Dev logs showing you the change data being streamed, transformed, and written to the database.
This will be written into your ClickHouse table, and because the table uses the ReplacingMergeTree table engine, on merge, those events will be deduplicated.
You can see the data in ClickHouse using your IDE’s database client. You can find the instructions for connecting to your local ClickHouse in the MooseOLAP docs. The deleted row CDCd from above is shown rationalized in ClickHouse in the screenshot be
This section will walk you through how this application achieves the above.
Running your MooseStack application in Development Mode (moose dev
) allows you to spin up your OLAP infrastructure locally for development purposes. It reads your moose.config.toml
and starts the required services, including ClickHouse and Redpanda.
Then it watches your application files, and hot-reloads your schema and function changes to your local ClickHouse and Redpanda. Super useful for when you are iterating on your schemas, streaming transformations, APIs, etc.
But sometimes, like in this demo, your development surface area extends outside of your OLAP infrastructure. That’s expected, and you probably still expect the same development experience.
So, moose dev
allows you to extend the services that are spun up when you run the command. By way of example,this application adds Postgres, Kafka Connect with Debezium and Apicurio’s schema registry.
From docker-compose.dev.override.yaml
Now, running moose dev
spins up the MooseStack services (ClickHouse and Redpanda) and the above: you’ve got your end to end dev stack for building and testing your CDC implementation managed with a single command
moose dev
startup script hooksmoose dev
also gives us hooks to be able to run scripts at startup (see /moose.config.toml
, where the application references on_first_start_script = "./setup-cdc.sh"
). Let’s go through the interesting parts:
From setup-cdc.sh:
This is doing three main things:
db:push
(which runs the Drizzle CLI’s drizzle-kit push
command), Drizzle uses the data models in app/oltp/schema.ts
to create tables in Postgres.db:seed 10
(which runs app/oltp/seed.ts
) 10 rows of data are written into Postgres (change this if you want more data).curl
command at the end, the application configures a Kafka Connect connection with reference to postgres-connector.json
: allowing the CDC data that is pushed from Postgres with Debezium to be published to Redpanda.With this, Debezium creates Redpanda topics for each table fitting the configuration (here "table.include.list": "public.*",
).
So, to be clear, at this moment, the application has (see blue, below):
Now that the OLTP database and CDC provider are emitting events, it’s time to transform those events for the ReplacingMergeTree engine and make sure they’re being written efficiently.
To be able to land the data in ClickHouse, the remaining steps are:
In creating this application, we used the moose kafka pull
command to generate TypeScript Stream
objects from the Debezium managed Redpanda topics. You can optionally supply a URL to your schema registry to generate types for each topic (JSON supported, Avro coming soon).
This writes (and overwrites) generated files under app/cdc/1-sources/
. These streams are externally managed—MooseStream doesn’t create the topics; it connects to them.
From app/cdc/1-sources/externalTopics.ts
:
Define a stable, versioned representation of your upstream streaming data sources in code. This code-first foundation lets you build the rest of your data pipeline (routing, transforming, and syncing into OLAP storage). When the schema changes, just regenerate and your app stays in lockstep.
Generated streams are untyped by default; the application derives types to preserve type safety during transformations.
Re‑use your Drizzle model so transforms are fully typed.
From app/cdc/1-sources/typed-topics.ts
:
Carrying Drizzle models into the CDC pipeline keeps OLTP and OLAP code in sync. No duplicate model drift.
Adapt the base OLTP models to OLAP by adding CDC‑specific columns (used by ReplacingMergeTree
) and ClickHouse-optimized typing. (Low‑cardinality strings are a nice win on country
/state
.)
(If you want a deep dive into why this application uses ReplacingMergeTree
, see yesterday’s blog on Table Engines and CDC).
From app/models.ts
:
Then configure the OlapTable
for proper use with the ReplacingMergeTree
table engine.
From app/cdc/3-destinations/olap-tables.ts
:
ClickHouse stays append‑only (fast), and merges reconcile state (updates/deletes) using your version (ver
) and delete (isDeleted
) fields.
Unwrap once, keep it boring (that’s the goal). You can also denormalize here if you like.
From app/cdc/2-transforms/payload-handler.ts
:
Use the above to create the transform function.
From app/cdc/2-transforms/customer-addresses.ts
:
Wire the transform to a sink stream that writes to the table.
From app/cdc/3-destinations/sink-topics.ts
:
This gets you from raw Debezium output:
To a processed row (ready for ClickHouse’s ReplacingMergeTree
tables):
You now have safe OLTP and near-real-time OLAP. Debezium emits WAL changes, Redpanda buffers them, and ClickHouse lands the latest state with append-only speed. MooseStack helps you glue it all together without hand-rolling connectors, keeps schemas and transforms in code, and runs everything locally so you can iterate quickly.
The real advantage shows up over time: because ingestion, transforms, and OLAP models all live in code, schema changes and data quality issues are governed with git, not guessed at. Moose generates migrations, isolates misaligned events in DLQs, and keeps your data contracts versioned and reviewable: the same way you manage your application code.
In the next releases, we’ll build on this foundation: modeling analytical views and derivative tables for fast metrics, time-series, and drill-downs. All still defined as code and typed end to end.