In our experience, AI agents have been able to deliver data to gold-level standards in a matter of minutes (low single-digit hours if you include context gathering), at a level of complexity where such a job ordinarily took days.
Medallion architecture was widely adopted alongside Databricks’ lakehouse architecture, largely to help enterprises with diverse data needs build scalable data engineering teams. Its primary principles are that data products sit at different “medallion levels”, and that each level carries different quality guarantees.
This allows a team to explore unrefined “bronze” data, or rely on “gold” or “platinum” data products to produce trusted reports. Each level’s standards must be met before a data product is promoted, and the strictness of those standards helps build trust in the product. Ideally, a consumer can use the “gold” layer without needing to validate the data’s provenance or quality: a potentially huge increase in velocity on the consumption side.
On the production side, it also allows data engineering orgs that sit apart from business orgs to work on promoting those data products by following the quality standards set for each level.
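As an illustration (the specific checks below are invented for the example, not a prescribed standard), a team might encode its per-layer promotion standards as data that both humans and, later, agents can test against:

```ts
// Illustrative only: the layers are standard, but these particular checks are
// hypothetical examples of what a team might require before promotion.
type MedallionLayer = "bronze" | "silver" | "gold";

interface LayerStandard {
  layer: MedallionLayer;
  checks: string[]; // rules a data product must satisfy before promotion to this layer
}

const promotionStandards: LayerStandard[] = [
  { layer: "bronze", checks: ["raw schema captured as-is", "loads are reproducible"] },
  { layer: "silver", checks: ["types enforced", "duplicates removed", "null handling documented"] },
  { layer: "gold", checks: ["business definitions applied", "tested against agreed KPIs", "owner and SLA documented"] },
];
```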
The above all looks really great, and that’s why the architecture is so popular amongst data engineering and data platform teams.
As data products get built, promoted and consumed, demand for these high-quality data products increases, and demand for the data engineering services of the teams administering the architecture tends to increase too. As you can guess, herein lies the rub: demand increases, but the capacity of the data engineering team doesn’t. This capacity constraint is driven by budget limits and the sub-linear return on labor input in engineering (with overheads in management and coordination, and limits to parallelizability: two people can’t make 2-minute noodles in one minute).
So for this to succeed, the data engineering team needs to become more efficient. This is usually attempted in a couple of ways: (1) systematize the process (which can lead to being process-driven rather than results- or value-driven); (2) relax the standards for each layer.
Often it is both: creating standards for compliance and then pushing work through as quickly as demanded, leaving some of those standards behind. And backlogs still grow.
A data engineering lead we spoke to told us that his engineers simply didn’t have enough hours in the day to follow, and test against, all the data quality standards they had signed up for. Our experience tells us this is common.
Consumers lose trust in the data products they consume, and data engineers have to spend increasing amounts of time fixing the errors caught by the consumers (if they are caught at all).
Two solutions are possible here. In the first, a data-mesh-like federated approach pushes the work to upstream data producers, giving them the responsibility to deliver quality data. The second leverages advances in data engineering frameworks and generative AI.
These solutions are individually effective, jointly more effective still, and neither requires the other.
Federated data architectures and processes shift data product work away from the central data team: production of the data products usually falls to the producers (i.e. the team that produces the data is responsible for transforming it to the bronze or silver layer), and the consumer builds the consumption layer (i.e. the gold layer).
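As a minimal sketch of that split (the record shapes and field names below are hypothetical), the producing team might own a bronze-to-silver cleanup like this, because it understands its own data best:

```ts
// Hypothetical record shapes for a producer-owned bronze → silver transformation.
interface BronzeOrder { order_id: string; amount: string; placed_at: string }
interface SilverOrder { orderId: string; amountUsd: number; placedAt: Date }

function toSilver(rows: BronzeOrder[]): SilverOrder[] {
  const seen = new Set<string>();
  const out: SilverOrder[] = [];
  for (const r of rows) {
    if (!r.order_id || seen.has(r.order_id)) continue; // drop empty keys and duplicates
    seen.add(r.order_id);
    const amountUsd = Number(r.amount);
    if (Number.isNaN(amountUsd)) continue;             // drop rows that fail basic validation
    out.push({ orderId: r.order_id, amountUsd, placedAt: new Date(r.placed_at) });
  }
  return out;
}
```

The consuming team would then define the gold-layer views or aggregations it needs on top of that silver output.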
The biggest limitation of this approach is the assumption that data producers and consumers have the data engineering capacity to build these layers themselves. This is the converse of the problem with centralizing: central data engineering teams lack the subject matter expertise to create meaningful data products.
Accordingly, there is pressure for the data engineering team to become more of a data platform team: empowering producers and consumers by managing tools and platforms, and usually building abstractions over the data processing platforms they manage so that people can more easily build what they intend. Whilst this sometimes reduces the flexibility of the platforms, the gain in ease of use is usually a worthwhile trade-off.
We’re building one of those latest-generation developer frameworks: Moose. Data producers can write the data engineering flows needed to create data products in the languages they already use to write their own software (Python, TypeScript); the framework abstracts best-in-class open source tools (Temporal, Redpanda, ClickHouse, Rust web servers, etc.) so that teams without data engineers can build data products scalably.
Data consumers can create their consumption layer similarly: define the egress API or materialized view they want in the language of their choosing, and it is instantiated in the infrastructure.
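For illustration, a producer-side data model might look roughly like the sketch below. It assumes an IngestPipeline-style API along the lines of Moose’s documentation; the exact imports, option names and behavior may differ in current releases, so treat it as indicative rather than copy-paste-ready:

```ts
// Sketch only: exact Moose APIs and options may differ; see the Moose docs.
import { Key, IngestPipeline } from "@514labs/moose-lib";

// A single typed data model...
interface OrderEvent {
  orderId: Key<string>;   // key for the OLAP table
  customerId: string;
  amountUsd: number;
  placedAt: Date;
}

// ...from which the framework derives the infrastructure: an ingest API,
// a stream topic, and an OLAP table, without hand-written plumbing.
export const orderEvents = new IngestPipeline<OrderEvent>("OrderEvent", {
  ingest: true,
  stream: true,
  table: true,
});
```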
As we discussed, why can’t the central data engineering team scale? Increasing demand, steady productivity, a growing number of standards, growing minutiae and, really, endlessly detail-oriented data cleaning / validating / deduplicating / …
We said both solutions add data engineering capacity, but that capacity can be an AI agent. In fact, AI agents are particularly good at this kind of work.
Agents are defined by their loops: they have a goal (e.g. pass some test), they have context, and they can use tools to work toward that goal. The tools usually do something with side effects (e.g. create a data pipeline; the side effects being data in a database, logs in a dev server, etc.) that add further context for the agent. The agent acts and tests until it solves the problem or runs out of attempts.
Our current best approach to this is to keep the agent loop simple.
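In TypeScript-flavored pseudocode (every name below is hypothetical, not an API from any particular library), such a loop might look something like this:

```ts
// Illustrative sketch of the loop described above; all names are hypothetical.
interface Tool { name: string; run(input: string): Promise<string> }
interface Action { tool: Tool; input: string }

async function agentLoop(
  goal: string,                                               // e.g. "promote this dataset to silver"
  tools: Tool[],
  plan: (context: string, tools: Tool[]) => Promise<Action>,  // the model picks the next tool call
  checkGoal: () => Promise<boolean>,                          // e.g. run the data quality test suite
  maxAttempts = 10
): Promise<boolean> {
  let context = goal;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await checkGoal()) return true;                       // standards met: done
    const action = await plan(context, tools);                // decide the next step from the context so far
    const observation = await action.tool.run(action.input);  // act; side effects land in pipelines, tables, logs
    context += `\n${action.tool.name} -> ${observation}`;     // feed the result back in as new context
  }
  return false;                                               // out of attempts
}
```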
The agents don’t care about the tedium of data cleaning work; they can work in parallel, and they can do the repetitive data quality improvement work tirelessly and effectively.
A data architect we spoke with recently said: “the agents are doing data engineering the way our data engineers would do if they had more time”.
They deliver a data product at the required quality and, unlike human actors, generate without any extra work a literal checklist showing that all the data quality standards are met (or, if not, which ones aren’t and why).
Agents that aren’t limited to an IDE can also gather further context (e.g. via Slack, Teams, email, Jira, etc.) from data-producing or data-consuming teams, helping to draw together the information needed to write a specification for a gold-level data product.
One of the hard things about developing with AI agents is ensuring that they can “one-shot” the solution or, if not, that they can draw in enough additional context to keep moving towards a real (non-hallucinated) solution to the problem.
At Fiveonefour, we’ve approached this problem with three main pillars (which together make up our Aurora product):
Our developer framework, Moose, drastically reduces the “surface area” the agent has to deal with. By way of example, a single Moose data model file lets the agent instantiate one or more of an ingest API, a data stream, and an OLAP table (as in the data model sketch above).
Our developer environments (local or cloud) feed AI-comprehensible telemetry and errors back to the agent, as well as giving it access to the infrastructure (reading data models and schemas, streams, tables, etc.) so it can see the results of its actions.
Our tools help the agents plan, manage context and evolve their prompts to move towards a solution.
Medallion architectures are an enterprise-scale solution for building high-quality data products that, surprisingly, have growing pains of their own at scale. Most of these problems derive from the architecture’s success: better products mean more demand, and with static capacity either lead times blow out completely or quality drops.
There are solutions to this problem that preserve the premise of medallion architecture: federate the work to data producers and consumers, and add capacity with modern developer frameworks and AI agents.
With MCPs and the latest generation of models (really, since Claude 3.5), agents are writing good enough code to do this work (see the example below).
To get started, it can be as simple as pointing an agent-enabled IDE (like Cursor or Windsurf) at your existing data engineering code and quality standards documentation. MCPs like ClickHouse’s, Fiveonefour’s Aurora MCP, DuckDB’s MCP and many others can arm your agents with the tools they need to deliver high-quality data products at scale.
An early prototype where the agents created data models representing the bronze data product (redacted_data_bronze), compared that against the condensed data standards (data_standards.md), generated a proposed silver data model (redacted_data_silver.ts) and a set of transformations based on the data product promotion plan (RedactedDataPipeline_quality.md).