If you've worked with SQL databases like ClickHouse, you know that changing one object—say, a table—can have ripple effects on dependent objects like Materialized Views (MVs) or other views. Applying these Data Definition Language (DDL) changes in the wrong order often leads to errors and headaches. This post explains how Moose handles these SQL dependencies to make your infrastructure updates reliable during development with moose dev
.
The implementation of the concepts from this post can be found in the Moose GitHub repository Star:
When you define SQL resources, especially MVs that read from source tables and write to target tables, or views that query other tables, there's an inherent order of operations for any changes. For example, you can't create an MV if its source table doesn't exist yet, or alter a table's schema if a dependent view still locks it.
As Moose projects began to incorporate more of these complex, interconnected SQL resources, our initial method of processing each infrastructure change independently hit its limits. If you modified an MV and its source table in the same moose dev
cycle, the lack of guaranteed DDL execution order could cause errors, leaving your ClickHouse database in an inconsistent state. This isn't a unique problem—many tools in the data and CI/CD space address migration ordering—but it was something we needed to solve systematically for Moose users.
To address this, we focused on two main areas: making sure Moose clearly understands the dependencies you define, and then using that understanding to execute DDL changes in a safe, phased order.
Moose needs to know how your SQL resources relate to each other. Here's how that information flows:
// Your TypeScript code (e.g., in your app/ directory)
import { sourceTable1 } from './my_source_table'; // An OlapTable instance
import { MaterializedView, sql } from '@514labs/moose-lib';
interface MyMVSchema {
N: number;
dim1: string;
}
new MaterializedView<MyMVSchema>({
materializedViewName: 'my_mv',
tableName: 'my_mv_target_table', // The physical table storing MV results
selectStatement: sql`
SELECT
count(*) as N,
dim1
FROM ${sourceTable1}
GROUP BY dim1`,
selectTables: [sourceTable1], // Explicitly declares read dependency
});
Using an SDK here allows you to define these resources and their connections within the same programming environment you might be using for other parts of your Moose application, like stream processing logic. The sql
template literal is a way to embed SQL clearly, with type-safe interpolation of resource names from your Moose project.
Structured JSON for SDK-to-CLI Communication: When you run moose dev
, the Moose SDKs process your TypeScript/Python code. They compile these definitions into a structured JSON payload—internally, we call this a "partial infrastructure map." This JSON describes all your resources and their explicitly declared dependencies (e.g., what an MV pullsDataFrom
or pushesDataTo
). This JSON is then passed to the Moose Rust CLI.
Rust InfrastructureMap
and DataLineage
: The Rust CLI parses this JSON and builds a comprehensive in-memory representation called the InfrastructureMap
. Rust structs representing your resources (like SqlResource
for MVs/Views, Table
) hold the resolved dependency information. These structs implement a DataLineage
trait, providing methods like pulls_data_from()
and pushes_data_to()
. This allows the subsequent diffing and ordering logic to query the dependency graph.
This flow—from your explicit code declarations, to a JSON representation, to a queryable InfrastructureMap
in Rust—ensures Moose has the dependency information it needs for the next steps.
Knowing the dependencies is half the battle. The other half is using that information to apply changes correctly. This happens after Moose diffs your desired state (from your code) against the current database state but before any DDL is run:
DataLineage
trait.graph TD
T1[Table: source_table_1] --> MV1[MV: my_mv]
T2[Table: source_table_2] --> MV1
MV1 --> T3[Table: target_table_for_mv1]
T3 --> V1[View: my_view_on_target]
Deconstruct Changes & Plan Phases: Each change (e.g., a table alteration, an MV update) is broken into atomic DDL operations (ALTER TABLE
, DROP VIEW
, CREATE VIEW
, etc.). We then create a two-phased execution plan:
DROP VIEW
). These are ordered using a reverse topological sort of the dependency graph. This means dependents are torn down before the things they depend on are altered or dropped.CREATE TABLE
, ALTER TABLE ADD COLUMN
). These are ordered using a forward topological sort. This ensures dependencies exist and are correct before their dependents are created or updated.Separating teardown from setup/modification helps ensure that, for example, a view locking a table is dropped before the table is altered, and a source table exists before an MV attempts to read from it. This is a common pattern in managing infrastructure changes to avoid DDL conflicts.
sequenceDiagram
participant User as User Code (Desired State)
participant MooseCLI as Moose CLI
participant DepAnalysis as Dependency Analysis & Phasing
participant ClickHouse
User->>MooseCLI: moose dev (changes to T1, MV1)
MooseCLI->>MooseCLI: 1. Diff (Current vs Desired)
Note over MooseCLI: Changes: Update T1, Update MV1
MooseCLI->>DepAnalysis: 2. Pass Changes & Dependencies
DepAnalysis->>DepAnalysis: 3. Build DAG
DepAnalysis->>DepAnalysis: 4. Identify Atomic Ops (Teardown: DROP MV1, Setup: ALTER T1, CREATE MV1)
DepAnalysis->>DepAnalysis: 5. Teardown Plan (Topological Sort): [DROP MV1]
DepAnalysis->>DepAnalysis: 6. Setup/Modify Plan (Topological Sort): [ALTER T1, CREATE MV1]
MooseCLI->>ClickHouse: 7. Execute Teardown Plan: DROP MV1
ClickHouse-->>MooseCLI: Success
MooseCLI->>ClickHouse: 8. Execute Setup/Modify Plan: ALTER T1
ClickHouse-->>MooseCLI: Success
MooseCLI->>ClickHouse: 9. Execute Setup/Modify Plan: CREATE MV1
ClickHouse-->>MooseCLI: Success
With the ordered teardown_plan
and setup_modify_plan
, the execution itself is straightforward:
teardown_plan
. Stop and report on the first error.setup_modify_plan
. Stop and report on the first error.The complexity of ordering is handled before touching the database.
Sometimes, the actual state of your ClickHouse database might differ from what Moose last knew (e.g., due to a previously failed moose dev
run or manual ALTER TABLE
commands run directly against the database).
To handle this, Moose first queries the database to determine the actual current state of your managed resources. The diffing process then compares this actual state to your target state (defined in your Moose code). The generated plan therefore aims to bridge the gap from the database's current reality to your intended state.
This means if you manually add a column to a table, but that column isn't in your Moose code, the next moose dev
run will plan to remove it to align the database with your declarative source of truth. This is how declarative systems work: your code defines the desired state. If you intend for a database change to be permanent, you should reflect that change in your Moose resource definition first.
While we're tailoring this for the Moose ecosystem and ClickHouse, the core ideas—like using DAGs for dependency tracking and phased execution—are well-established. Tools like dbt use ref()
for DAGs in SQL transformation workflows, and Infrastructure-as-Code tools like Terraform manage resource dependencies for broader cloud infrastructure. Migration tools like Flyway or Liquibase also rely on ordered execution, though often manually sequenced.
Moose applies these patterns to manage live DDL changes for SQL resources that you define as part of your application, ensuring that operations happen in a predictable order.
By implementing this dependency-aware, ordered execution system, Moose provides a more robust way to manage changes to your SQL resources in ClickHouse. You get:
moose dev
.Moose aims to handle the complexities of DDL ordering so you can focus on defining your data infrastructure. This means no more manually figuring out which DROP
or CREATE
statement needs to run first when you're refactoring interconnected tables and views. 👍