Mon, November 24, 2025 · 12 min read

From Parquet files in S3 to a production ClickHouse + Next.js app hosted on Boreal and Vercel

Johanan Ottensooser, Product

TL;DR

Scope of this guide

Covers the end-to-end workflow for bootstrapping the MooseStack MCP template with real Parquet data sourced from S3.

  • Walks through installing the Moose CLI, running the MooseStack + Next.js dev services, staging sample data under `packages/moosestack-service/context/`, generating the TypeScript ingest pipelines, bulk-loading local ClickHouse, and validating via the MCP-enabled chat UI.
  • Covers local testing of the chat, and developing the Next.js template alongside the local MooseStack.
  • Covers deploying the MooseStack project to Boreal and the web app to Vercel, as well as authentication concerns for both.

Not in scope

  • Continuous data ingestion. This covers the initial bulk load only. If you are considering continuous ingestion, use MooseStack’s Workflows.

Setup

Install Moose CLI

Loading...

Initialize your project locally and install dependencies

Loading...

Set Anthropic Key env var

If you want to use the in-app chat (which has MCP-based access to the local ClickHouse managed by MooseDev), make sure to set your Anthropic key as an environment variable.

Loading...
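For example (assuming the template reads the key from the standard `ANTHROPIC_API_KEY` variable):

```shell
# Assumption: the template reads the standard ANTHROPIC_API_KEY variable
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```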

If you don’t have an Anthropic API key, you can get one here: Claude Console

Run MooseStack and web app

Make sure you have Docker running before you run the following commands

Loading...

Alternatively, you can run either service individually

Loading...

Your web application is available at http://localhost:3000. Don’t be alarmed! This is a blank canvas for you to build whatever user-facing analytics you want. The chat is already configured: click the chat icon at the bottom right to open it. Our engineer GRoy put some work into that chat panel, and you’ll find that it is resizable, has differentiated scrolling, and more! Feel free to use and modify it however you like.

Note: since you’ve not added any data to your MooseStack project, the tools available in the chat won’t be able to retrieve any data. In the next section, we’ll explore and model the data.

Model your data

You can model your data manually (docs) or you can generate a data model from your data. I’ll walk through the generative approach.

This guide assumes you have direct access to the files in S3.

If the files are relatively small, we recommend building up a packages/moosestack-service/context/ directory to gather context. I set mine up like this (you can gitignore your context directory):

Loading...
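For reference, my layout looked roughly like this (the directory names are my own convention, not required by Moose):

```
packages/moosestack-service/context/
├── data/    # sample .parquet files copied down from S3
└── rules/   # OLAP data-modeling guidance for the LLM
```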

Copy data from S3

There are many ways to copy data from S3 into this directory; I like this:
S3 CLI docs: https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html

Loading...
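A sketch of the copy command (bucket and paths are placeholders; substitute your own):

```sh
# Copy only the Parquet files under a prefix into the context directory
aws s3 cp s3://<bucket-name>/<path-to-data>/ \
  packages/moosestack-service/context/data/ \
  --recursive --exclude "*" --include "*.parquet"
```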

If you prefer, you can also do it with the MooseDev SQL client (which uses ClickHouse SQL)
MooseDev Docs: https://docs.fiveonefour.com/moose/moose-cli#query
ClickHouse SQL docs: https://clickhouse.com/docs/sql-reference

Loading...

You can now query your data (and if you like, write that context to a local file!), e.g.

Loading...
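For example, you can peek at the Parquet files in place with ClickHouse’s `s3` table function (credentials and paths are placeholders; public buckets can omit the keys):

```sql
-- Inspect the schema ClickHouse infers from the Parquet file
DESCRIBE TABLE s3('https://<bucket-name>.s3.amazonaws.com/<path-to-file>.parquet',
                  '<aws-access-key-id>', '<aws-secret-access-key>', 'Parquet');

-- Sample a few rows
SELECT *
FROM s3('https://<bucket-name>.s3.amazonaws.com/<path-to-file>.parquet',
        '<aws-access-key-id>', '<aws-secret-access-key>', 'Parquet')
LIMIT 10;
```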

Add other context

OLAP, and especially ClickHouse, benefits from rigorous data modeling. LLMs aren’t perfect at understanding the nuances of OLAP data modeling, so I add some guidance docs I created to the context. I just clone this repo https://github.com/514-labs/olap-agent-ref into the context/rules directory.

Loading...

Model your data (TypeScript)

To create table(s) in my local dev ClickHouse (and later my production database) that the LLM chat can access via the template’s MCP server, I want to create:

  1. A data model object
  2. An OlapTable object, declaring the table
  3. A reference in the index at the root of the MooseStack project (so that MooseOLAP will create the table).
Loading...

I do this sequentially, with instructions like this:

Loading...

(Optionally, define the OrderBy fields; the LLM should make a good guess at these.)
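The generated model might come out looking roughly like this sketch (it assumes the `Key` and `OlapTable` exports from `@514labs/moose-lib`; the `Trip` fields are hypothetical, so derive yours from your own sample data):

```typescript
import { Key, OlapTable } from "@514labs/moose-lib";

// Hypothetical data model — yours should mirror your sample Parquet files
export interface Trip {
  tripId: Key<string>;
  pickupAt: Date;
  fare: number;
}

// Declares the ClickHouse table; orderByFields becomes the table's ORDER BY
export const TripTable = new OlapTable<Trip>("Trip", {
  orderByFields: ["tripId", "pickupAt"],
});
```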

Make sure to then add it to moosestack-service/index.ts too:

Loading...

Repeat this for each table you want to model.

Verify the tables were created correctly with moose query

You can verify the tables were created correctly with another Moose query (or the LLM can do the same via the MooseDev MCP; see the MooseDev MCP docs):

Loading...

It is also good practice to double-check the generated model against the sample data (LLMs can make wrong assumptions about data types, even when presented with sample data):

Loading...

If you are interested, you can have the LLM justify its data modeling decisions in a document it generates:

Loading...

Bulk add data locally

Create a SQL file to load up your data from your remote source to your local ClickHouse:

Loading...

Make sure to apply any transformations needed to conform your S3 data to the data model you’ve created. ClickHouse will handle many of these conversions natively. Notably, though:

  • Column renames have to be done in SQL.
  • Default values in ClickHouse only apply when the column is omitted from an insert entirely. If the column is null in the source, you will have to cast or coalesce it in the insert.
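Putting both points together, the load query might look like this sketch (the table, columns, and credentials are placeholders):

```sql
INSERT INTO <table-name> (trip_id, pickup_at, fare)
SELECT
    tripId AS trip_id,           -- renames happen in the SELECT
    pickup_at,
    coalesce(fare, 0) AS fare    -- NULLs are not replaced by column DEFAULTs
FROM s3('https://<bucket-name>.s3.amazonaws.com/<path-to-file>.parquet',
        '<aws-access-key-id>', '<aws-secret-access-key>', 'Parquet');
```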

Execute that SQL query:

Loading...

It will return:

Loading...

The query doesn’t return any rows of data to the query client. To validate that it worked, use this command:

Loading...

Chat with your data

Just go to http://localhost:3000: everything should be good to go. Chat away!

Create your frontend application

The template ships with a bootstrapped Next.js application and an Express app that you can extend.

I like to point the LLM at a shadcn/ui component I am interested in, the Express / MooseStack documentation, and the Next.js application folder:

Loading...

You should see your frontend application update here http://localhost:3000.

Deploy

We’ll prepare the application for deployment by setting up authentication, then deploy the MooseStack application to Boreal and the web app to Vercel. You can set up authentication and hosting to your preferences; if we haven’t covered your preferred option in our docs, reach out in our Slack community: https://slack.moosestack.com/.

Back-end authentication

MooseStack supports token-based authentication for its different API types; both JWT and API keys are supported. I’ll use API keys here. Docs: https://docs.fiveonefour.com/moose/apis/auth.

Ingest & Consumption API Authentication

Generate hash tokens for your core APIs:

Loading...

This will output:

Loading...

Set your environment variables locally:

Loading...

Express API Authentication

⚠️ Important: Custom Express APIs also need authentication! MooseStack now provides createAuthMiddleware() in @514labs/express-pbkdf2-api-key-auth:

Generate an API key:

Loading...

This outputs:

  • ENV API Key: Hashed key for MOOSE_WEB_APP_API_KEY (store on server)
  • Bearer Token: Plain-text token for clients (use in Authorization headers)

Set the Express API environment variable:

Loading...
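For example (the variable name comes from the template; the value shown is a placeholder for the hashed key the generator prints):

```shell
# Store the hashed "ENV API Key" on the server; clients get the bearer token
export MOOSE_WEB_APP_API_KEY="hashed-env-api-key-from-generator"
```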

Protect your Express endpoints:

Loading...
Loading...

Making Authenticated Requests

All APIs use Bearer token authentication. For Express APIs, the token format includes the salt:

Loading...
Loading...
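To see why the salt rides along in the token, here is an illustrative sketch of salt-carrying API keys using Node’s built-in crypto. This is not the actual `@514labs/express-pbkdf2-api-key-auth` implementation; the iteration count, key length, and token layout are assumptions:

```typescript
import { pbkdf2Sync, randomBytes, timingSafeEqual } from "node:crypto";

// The server stores only the hash; the client's bearer token is "salt.secret"
function issueToken(): { stored: string; token: string } {
  const salt = randomBytes(16).toString("hex");
  const secret = randomBytes(32).toString("hex");
  const hash = pbkdf2Sync(secret, salt, 100_000, 32, "sha256").toString("hex");
  return { stored: hash, token: `${salt}.${secret}` };
}

// Re-derive the hash from the token's embedded salt, compare in constant time
function verifyToken(token: string, stored: string): boolean {
  const [salt, secret] = token.split(".");
  const hash = pbkdf2Sync(secret, salt, 100_000, 32, "sha256").toString("hex");
  return timingSafeEqual(Buffer.from(hash), Buffer.from(stored));
}
```

Because the salt is embedded in the token, the server never needs to store it separately: it can re-derive and check the hash from the token alone.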

You may want to move to JWT-based security when you move your application to production.

Deploy MooseStack to Boreal

Go to boreal.cloud and set up an account, then create an organization.

Import a MooseStack Project:

![][image1]

Set the path to the root of the MooseStack service:

Loading...

And set the Environment Variables used in the project (like the authentication we set up above):

Loading...

![][image2]

Continue, selecting Boreal default hosting (or point it at your own managed ClickHouse and Redpanda instances if you like).

![][image3]

Click Deploy, and you’ll see your MooseStack application being deployed.

Deploy web-app to Vercel

Authentication

⚠️ Important: The authentication set up above only ensures your backend and frontend can communicate securely. We are going to set environment variables here that give the frontend access to the data, so make sure you add proper authentication to the frontend itself. Vercel offers this natively if your deployment is a preview deployment; to productionize, you may have to implement authentication using something like NextAuth, Auth0, or Clerk.

See Vercel docs

Deployment

  1. From the Vercel home-page, add a new project.
  2. Point it at this project, and set the root directory to packages/web-app
  3. Set the following environment variables
Loading...

Finding your Boreal MooseStack endpoint URL (it is right at the top of the project overview page):
![][image4]

Bulk add data to Boreal ClickHouse

Your project is now deployed and linked.

You have a Vercel hosted frontend. You have a Boreal hosted backend, with tables, APIs etc. set up for you.

Your backend, however, is still unpopulated on Boreal.

Find your Boreal connection string / database details

It is in the Database tab of your deployment overview:
![][image5]

Make sure to select the appropriate connection string type:
![][image6]

Connect your SQL client, and run the following ClickHouse SQL query:

Loading...

Where:

  • <clickhouse-database>: ClickHouse Cloud database name
  • <table-name>: target table for the data
  • <bucket-name>: S3 bucket name
  • <path-to-file>: path to the Parquet file in S3
  • <aws-access-key-id>: AWS Access Key ID
  • <aws-secret-access-key>: AWS Secret Access Key

This will again return 0 rows, but you can validate that the transfer worked correctly as follows:

Loading...
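A simple sanity check is a row count against the newly loaded table (names are placeholders):

```sql
SELECT count() AS rows_loaded
FROM <clickhouse-database>.<table-name>;
```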

Conclusion

You now have a fully fledged Next.js application, with charts and chat, running off a production-ready MooseStack backend on Boreal.

Go to your Vercel deployment, chat with your data!

Next steps

Context is everything in LLM performance. To improve your chat experience, consider embedding more context in the data.

Two methods to try:

Embed your “data dictionary” in your data

ClickHouse allows you to embed table- and column-level metadata.

Loading...
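ClickHouse supports comments at both levels. A sketch (the table and column names are hypothetical):

```sql
-- Table-level metadata
ALTER TABLE trips MODIFY COMMENT 'One row per taxi trip, bulk loaded from S3 Parquet';

-- Column-level metadata
ALTER TABLE trips COMMENT COLUMN fare 'Total fare in USD, including tips';
```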

Add rich context

You can create another context table in ClickHouse holding arbitrary JSON context. This might not be performant at scale, but it can be useful for providing small amounts of context to the MCP (e.g. report formats).
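A minimal sketch of such a context table (the schema is a suggestion, not part of the template):

```sql
CREATE TABLE chat_context (
    topic String,      -- e.g. 'report_format:weekly_summary'
    context String     -- arbitrary JSON, stored as a string
) ENGINE = MergeTree
ORDER BY topic;
```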

Interested in learning more?

Sign up for our newsletter — we only send one when we have something actually worth saying.
