Goodreads Book Review Dataset Template
*Aurora has been rebranded as Sloan. See the Sloan docs here: docs.fiveonefour.com/sloan*
My wife is an author, so of course, when I was looking for additional templates to add to the AI playground, the Goodreads database I stumbled across on Kaggle hit pretty hard!
Unlike the ADSB dataset, this is a single bulk ingestion from a website. However, I think it shows off a couple of cool things our advanced tools were able to do, including (1) figuring out how to use Kaggle's authentication and Python SDK; (2) turning the bulk ingest into a stream of individual book datapoints for the ingestion API; and (3) previewing the data for the user in the ingestion script.
Also, it is just super interesting to analyse book data!
Here's a sample of the data:
| bookID | title | authors | average_rating | isbn | isbn13 | language_code | num_pages | ratings_count | text_reviews_count | publication_date | publisher |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Harry Potter and the Half-Blood Prince (Harry Potter #6) | J.K. Rowling/Mary GrandPré | 4.57 | 0439785960 | 9780439785969 | eng | 652 | 2095690 | 27591 | 9/16/2006 | Scholastic Inc. |
Here's how you get it running…
Requirements
- NodeJS
- Docker Desktop
- A Kaggle API key
- An Anthropic API key (if you don't have one, here's the setup guide)
- Python 3 (we recommend using pyenv or something similar)
Nice to haves
These will be required if you want to interrogate this data with our MCP tools:
- Claude Desktop or Cursor, or both!
Install Moose / Aurora
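The command below is the Fiveonefour installer as documented at the time of writing; if it has drifted, the Sloan docs linked above have the current version.

```bash
# Install the Moose and Aurora CLIs
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose,aurora
```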
This will install Moose (our open source developer platform for data engineering) and Aurora (our AI data engineering product).
🔑 It will ask you for your Anthropic API key. Again, if you don't have one, here's the setup guide.
Create a new project using the Goodreads template configured with Claude Desktop
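The project name, template name, and --mcp flag below follow the init syntax from the Aurora docs at the time of writing; check the Sloan docs if your CLI version differs.

```bash
# Create a "books" project from the "goodreads" template and
# wire up Claude Desktop as the MCP client
aurora init books goodreads --mcp claude-desktop
```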
This will use Aurora to initialize a new project called "books" based on the "goodreads" template, while configuring Claude Desktop with the Aurora MCP tools pointed at this project.
Then, you'll need to run a few more commands to get things ready to go:
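From inside the new project directory:

```bash
cd books
npm install
```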
This will install the project's dependencies.
☸️ Make sure Docker Desktop is running before the next step!
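Start the dev server with the standard Moose CLI command:

```bash
moose dev
```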
This will run the Moose local dev server, spinning up all your local data infrastructure, including ClickHouse, Redpanda, Temporal, and our Rust ingest servers.
Add your Kaggle authentication key to the project's directory
Add your Kaggle authentication key to the project root, saved as books/kaggle.json (this is the credentials file Kaggle's settings page generates for you). It should have the structure:
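```json
{
  "username": "<your-kaggle-username>",
  "key": "<your-kaggle-api-key>"
}
```

(This is the standard format of the credentials file Kaggle generates for you.)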
If you don't have one yet, you can get one here: https://www.kaggle.com/docs/api
Run the ingestion script
In a new terminal, navigate back to the project directory:
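```bash
cd books
```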
You'll know you're in the right place if moose-config.toml is present.
Then run the Python ingest script!
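The dependency file and script name below are assumptions (check the project directory for the actual filenames):

```bash
# Install the script's Python dependencies, if the template ships a requirements file
pip install -r requirements.txt
# Run the ingest script
python3 ingest.py
```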
This will grab a sample of the data, ask you to confirm it matches your expectations, then send batches of data to the data model's ingestion endpoint.
If you go back to your original terminal running the Moose dev server, you'll see hundreds of incoming datapoints.
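If you're curious what the script is doing under the hood, here's a minimal sketch of the flow described above: download via the Kaggle SDK, preview a sample, then stream rows to the ingest endpoint. Every name in it (dataset slug, CSV filename, data model name, port) is an assumption, not the template's actual code.

```python
# Illustrative sketch only; the template's real script will differ.
import csv
import requests
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # picks up the kaggle.json credentials

# Dataset slug assumed; the template may pin a different one.
api.dataset_download_files("jealousleopard/goodreadsbooks", path="data", unzip=True)

with open("data/books.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Preview a sample so you can confirm it matches your expectations.
for row in rows[:3]:
    print(row)
if input("Does this look right? [y/N] ").strip().lower() != "y":
    raise SystemExit("Aborted.")

# Stream individual datapoints to the Moose ingest endpoint
# (localhost:4000 is Moose's default dev port; the model name is assumed).
for row in rows:
    requests.post("http://localhost:4000/ingest/Book", json=row)
```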
Explore your data in Claude
Ask any questions you might like of Claude!
We had fun with:
"What's the greatest correlate with book ratings? Year? Number of pages? Whether the book was a sequel?"
and:
"What are the types of books that I should invest in as a publisher if I want to maximize my return (the best investments being non-obvious books that are more affordable to acquire with higher sales)?"
Productionize your results with Cursor
So first, let's configure Cursor with the Aurora MCP tool-suite pointed at this project.
Navigate to the project directory and open Cursor
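If you've installed Cursor's `cursor` shell command, that's:

```bash
cd books
cursor .
```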
Then, run the Aurora command to configure the MCP for cursor
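The exact subcommand may differ by CLI version; if the form below errors, check the Sloan docs for the current one:

```bash
# Configure Cursor's MCP settings for this project
aurora setup --mcp cursor
```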
This will create a .cursor/mcp.json file configured for Aurora's MCP. Whenever this Cursor project is open, the MCP server will be started.
If you go to Cursor > Settings > Cursor Settings > MCP, you'll see the server.
Click enable and refresh, and you should be ready to go!
One line of questioning we liked here was:
"Create an API that takes in a year and returns the top 50 books for that year, ranked by review (excluding any outliers with small numbers of reviews)."