Getting DDL Dependencies in order with Moose & SQL

Published Sun, May 25, 2025 ∙ Product ∙ by Nicolas Joseph

If you've worked with SQL databases like ClickHouse, you know that changing one object—say, a table—can have ripple effects on dependent objects like Materialized Views (MVs) or other views. Applying these Data Definition Language (DDL) changes in the wrong order often leads to errors and headaches. This post explains how Moose handles these SQL dependencies to make your infrastructure updates reliable during development with moose dev.

Explore the Code

The implementation of the concepts from this post can be found in the Moose GitHub repository:

  • Part 1: Add explicit dependency tracking for materialized views (PR #2282)
  • Part 2: Order changes for materialized views based on dependency (PR #2294)
  • Part 3: Implement state reconciliation for infrastructure planning (PR #2341)

The Challenge: Interdependent SQL Objects

When you define SQL resources, especially MVs that read from source tables and write to target tables, or views that query other tables, there's an inherent order of operations for any changes. For example, you can't create an MV if its source table doesn't exist yet, or alter a table's schema while a dependent view still references it.

As Moose projects began to incorporate more of these complex, interconnected SQL resources, our initial method of processing each infrastructure change independently hit its limits. If you modified an MV and its source table in the same moose dev cycle, the lack of guaranteed DDL execution order could cause errors, leaving your ClickHouse database in an inconsistent state. This isn't a unique problem—many tools in the data and CI/CD space address migration ordering—but it was something we needed to solve systematically for Moose users.

The Moose Solution: Explicit Dependencies & Ordered Execution

To address this, we focused on two main areas: making sure Moose clearly understands the dependencies you define, and then using that understanding to execute DDL changes in a safe, phased order.

Part 1: Defining Resources and Their Dependencies

Moose needs to know how your SQL resources relate to each other. Here's how that information flows:

  1. Your Code as the Source of Truth: With Moose, you define your infrastructure—tables, MVs, streams, APIs—programmatically using TypeScript (or Python). This code is the source of truth. When you define, say, a Materialized View, you explicitly specify its source tables and the schema for its target (output) table within your TypeScript code.
// Your TypeScript code (e.g., in your app/ directory)
import { sourceTable1 } from './my_source_table'; // An OlapTable instance
import { MaterializedView, sql } from '@514labs/moose-lib';

interface MyMVSchema {
  N: number;
  dim1: string;
}

new MaterializedView<MyMVSchema>({
  materializedViewName: 'my_mv',
  tableName: 'my_mv_target_table', // The physical table storing MV results
  selectStatement: sql`
    SELECT
      count(*) as N,
      dim1
    FROM ${sourceTable1}
    GROUP BY dim1`,
  selectTables: [sourceTable1],  // Explicitly declares read dependency
});

Using an SDK here allows you to define these resources and their connections within the same programming environment you might be using for other parts of your Moose application, like stream processing logic. The sql template literal is a way to embed SQL clearly, with type-safe interpolation of resource names from your Moose project.
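To make the interpolation idea concrete, here is a minimal sketch of how a tagged template like sql could substitute resource names into a query string. This is an illustration only, not the actual @514labs/moose-lib implementation, and the NamedResource shape is an assumption made for the example.

```typescript
// Hypothetical stand-in for a Moose resource that knows its table name.
interface NamedResource {
  name: string;
}

// A minimal sql tagged template: joins the literal SQL fragments and
// replaces each interpolated resource with its table name.
function sql(strings: TemplateStringsArray, ...resources: NamedResource[]): string {
  return strings.reduce(
    (acc, fragment, i) =>
      acc + fragment + (i < resources.length ? resources[i].name : ""),
    ""
  );
}

const sourceTable1: NamedResource = { name: "source_table_1" };
const query = sql`SELECT count(*) AS N, dim1 FROM ${sourceTable1} GROUP BY dim1`;
// query === "SELECT count(*) AS N, dim1 FROM source_table_1 GROUP BY dim1"
```

Because the interpolated values are typed resources rather than raw strings, the compiler can catch a reference to a table that doesn't exist in your project.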

  2. Structured JSON for SDK-to-CLI Communication: When you run moose dev, the Moose SDKs process your TypeScript/Python code. They compile these definitions into a structured JSON payload—internally, we call this a "partial infrastructure map." This JSON describes all your resources and their explicitly declared dependencies (e.g., what an MV pullsDataFrom or pushesDataTo). This JSON is then passed to the Moose Rust CLI.

  3. Rust InfrastructureMap and DataLineage: The Rust CLI parses this JSON and builds a comprehensive in-memory representation called the InfrastructureMap. Rust structs representing your resources (like SqlResource for MVs/Views, Table) hold the resolved dependency information. These structs implement a DataLineage trait, providing methods like pulls_data_from() and pushes_data_to(). This allows the subsequent diffing and ordering logic to query the dependency graph.
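A rough idea of what the partial infrastructure map payload could look like, sketched as a TypeScript object. The field names pullsDataFrom and pushesDataTo come from the description above, but the overall shape is illustrative, not the actual Moose JSON schema.

```typescript
// Illustrative shape only — not the real Moose partial infrastructure map.
const partialInfraMap = {
  sqlResources: {
    my_mv: {
      kind: "materialized_view",
      pullsDataFrom: ["source_table_1"],    // read dependencies
      pushesDataTo: ["my_mv_target_table"], // write dependency
    },
  },
  tables: {
    source_table_1: { columns: { dim1: "String" } },
    my_mv_target_table: { columns: { N: "UInt64", dim1: "String" } },
  },
};

// The CLI can serialize this and reconstruct the dependency graph on the
// Rust side from the declared edges.
const payload = JSON.stringify(partialInfraMap);
```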

This flow—from your explicit code declarations, to a JSON representation, to a queryable InfrastructureMap in Rust—ensures Moose has the dependency information it needs for the next steps.

Part 2: Building and Executing an Ordered DDL Plan

Knowing the dependencies is half the battle. The other half is using that information to apply changes correctly. This happens after Moose diffs your desired state (from your code) against the current database state but before any DDL is run:

  1. Build Dependency Graph: First, we construct a Directed Acyclic Graph (DAG) of all relevant SQL resources using the relationships from the DataLineage trait.
graph TD
    T1[Table: source_table_1] --> MV1[MV: my_mv]
    T2[Table: source_table_2] --> MV1
    MV1 --> T3[Table: target_table_for_mv1]
    T3 --> V1[View: my_view_on_target]
  2. Deconstruct Changes & Plan Phases: Each change (e.g., a table alteration, an MV update) is broken into atomic DDL operations (ALTER TABLE, DROP VIEW, CREATE VIEW, etc.). We then create a two-phased execution plan:

    • Teardown Phase: Collect all DDL operations that remove or modify resources in a way that might break dependents (like DROP VIEW). These are ordered using a reverse topological sort of the dependency graph. This means dependents are torn down before the things they depend on are altered or dropped.
    • Setup/Modify Phase: Collect all DDL operations that create or modify resources (like CREATE TABLE, ALTER TABLE ADD COLUMN). These are ordered using a forward topological sort. This ensures dependencies exist and are correct before their dependents are created or updated.

    Separating teardown from setup/modification helps ensure that, for example, a view locking a table is dropped before the table is altered, and a source table exists before an MV attempts to read from it. This is a common pattern in managing infrastructure changes to avoid DDL conflicts.
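The two-phase ordering can be sketched with a depth-first topological sort. This is a simplified illustration, assuming the dependency graph is a string-keyed adjacency list with no cycles; the actual Moose code is in Rust and differs in the details.

```typescript
// node -> the nodes it depends on (must exist before it). Assumed acyclic.
type Dag = Map<string, string[]>;

// Depth-first topological sort: emit a node only after all its dependencies.
function topoSort(dag: Dag): string[] {
  const visited = new Set<string>();
  const order: string[] = [];
  const visit = (node: string) => {
    if (visited.has(node)) return;
    visited.add(node);
    for (const dep of dag.get(node) ?? []) visit(dep); // dependencies first
    order.push(node);
  };
  for (const node of dag.keys()) visit(node);
  return order; // dependencies before dependents
}

// Hypothetical graph mirroring the diagram above.
const dag: Dag = new Map([
  ["my_mv", ["source_table_1"]],
  ["target_table", ["my_mv"]],
  ["source_table_1", []],
]);

const setupOrder = topoSort(dag);                // create sources first
const teardownOrder = [...setupOrder].reverse(); // drop dependents first
```

The forward order drives the setup/modify phase; reversing it gives the teardown phase, so a dependent MV is dropped before its source table is touched.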

sequenceDiagram
    participant User as User Code (Desired State)
    participant MooseCLI as Moose CLI
    participant DepAnalysis as Dependency Analysis & Phasing
    participant ClickHouse

    User->>MooseCLI: moose dev (changes to T1, MV1)
    MooseCLI->>MooseCLI: 1. Diff (Current vs Desired)
    Note over MooseCLI: Changes: Update T1, Update MV1
    MooseCLI->>DepAnalysis: 2. Pass Changes & Dependencies
    DepAnalysis->>DepAnalysis: 3. Build DAG
    DepAnalysis->>DepAnalysis: 4. Identify Atomic Ops (Teardown: DROP MV1, Setup: ALTER T1, CREATE MV1)
    DepAnalysis->>DepAnalysis: 5. Teardown Plan (Topological Sort): [DROP MV1]
    DepAnalysis->>DepAnalysis: 6. Setup/Modify Plan (Topological Sort): [ALTER T1, CREATE MV1]
    MooseCLI->>ClickHouse: 7. Execute Teardown Plan: DROP MV1
    ClickHouse-->>MooseCLI: Success
    MooseCLI->>ClickHouse: 8. Execute Setup/Modify Plan: ALTER T1
    ClickHouse-->>MooseCLI: Success
    MooseCLI->>ClickHouse: 9. Execute Setup/Modify Plan: CREATE MV1
    ClickHouse-->>MooseCLI: Success

Part 3: Simplified Execution

With the ordered teardown_plan and setup_modify_plan, the execution itself is straightforward:

  1. Run all operations in teardown_plan. Stop and report on the first error.
  2. If teardown succeeded, run all operations in setup_modify_plan. Stop and report on the first error.

The complexity of ordering is handled before touching the database.
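The execution step above can be sketched in a few lines. Here runDdl is a hypothetical stand-in for a ClickHouse client call; the point is only that each phase runs in order and halts at the first failure.

```typescript
// Run each operation in order; the first throw aborts the rest of the plan.
function executePlan(plan: string[], runDdl: (ddl: string) => void): void {
  for (const ddl of plan) runDdl(ddl);
}

// Teardown must fully succeed before setup/modify begins.
function applyChanges(
  teardownPlan: string[],
  setupModifyPlan: string[],
  runDdl: (ddl: string) => void
): void {
  executePlan(teardownPlan, runDdl);    // e.g. ["DROP VIEW my_mv"]
  executePlan(setupModifyPlan, runDdl); // e.g. ["ALTER TABLE t1", "CREATE VIEW my_mv"]
}
```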

Handling Out-of-Sync Databases: State Reconciliation

Sometimes, the actual state of your ClickHouse database might differ from what Moose last knew (e.g., due to a previously failed moose dev run or manual ALTER TABLE commands run directly against the database).

To handle this, Moose first queries the database to determine the actual current state of your managed resources. The diffing process then compares this actual state to your target state (defined in your Moose code). The generated plan therefore aims to bridge the gap from the database's current reality to your intended state.

This means if you manually add a column to a table, but that column isn't in your Moose code, the next moose dev run will plan to remove it to align the database with your declarative source of truth. This is how declarative systems work: your code defines the desired state. If you intend for a database change to be permanent, you should reflect that change in your Moose resource definition first.
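The reconciliation idea reduces to a diff between the actual and desired state. Here is a minimal sketch for a single table's columns; the column names and types are made up for illustration and this is not the actual Moose diffing code.

```typescript
// column name -> ClickHouse type, e.g. { dim1: "String" }
type Columns = Record<string, string>;

// Diff actual DB columns against the columns declared in code.
function diffColumns(actual: Columns, desired: Columns) {
  const add: string[] = [];
  const drop: string[] = [];
  for (const col of Object.keys(desired)) {
    if (!(col in actual)) add.push(col);  // declared in code, missing from DB
  }
  for (const col of Object.keys(actual)) {
    if (!(col in desired)) drop.push(col); // manual addition not in code
  }
  return { add, drop };
}

// A manually added debug_flag column gets planned for removal, and the
// declared-but-missing N column gets planned for addition:
const plan = diffColumns(
  { dim1: "String", debug_flag: "UInt8" }, // actual DB state
  { dim1: "String", N: "UInt64" }          // desired state from code
);
```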

Building on Established Patterns

While we're tailoring this for the Moose ecosystem and ClickHouse, the core ideas—like using DAGs for dependency tracking and phased execution—are well-established. Tools like dbt use ref() for DAGs in SQL transformation workflows, and Infrastructure-as-Code tools like Terraform manage resource dependencies for broader cloud infrastructure. Migration tools like Flyway or Liquibase also rely on ordered execution, though often manually sequenced.

Moose applies these patterns to manage live DDL changes for SQL resources that you define as part of your application, ensuring that operations happen in a predictable order.

Conclusion: More Predictable SQL Schema Evolution

By implementing this dependency-aware, ordered execution system, Moose provides a more robust way to manage changes to your SQL resources in ClickHouse. You get:

  • More reliable infrastructure updates during moose dev.
  • Fewer DDL errors caused by incorrect operation sequencing.
  • A clearer path to evolving your database schemas alongside your application code.

Moose aims to handle the complexities of DDL ordering so you can focus on defining your data infrastructure. This means no more manually figuring out which DROP or CREATE statement needs to run first when you're refactoring interconnected tables and views. 👍
