Goodreads book review dataset template

Published Thu, April 3, 2025 ∙ Templates, Product, AI ∙ by Johanan Ottensooser

My wife is an author, so of course, when looking for additional templates to add to the AI playground, the Goodreads database I stumbled across on Kaggle hit pretty hard!

Unlike the ADSB dataset, this is a single bulk ingestion from a website. However, I think it shows off a couple of cool things our advanced tools were able to do, including (1) figuring out how to use Kaggle's authentication and Python SDK, (2) turning the bulk ingest into a stream of individual book datapoints for the ingestion API, and (3) previewing the data for the user in the ingestion script.

Also, it is just super interesting to analyse book data!

Here's a sample of the data:

bookID: 1
title: Harry Potter and the Half-Blood Prince (Harry Potter #6)
authors: J.K. Rowling/Mary GrandPré
average_rating: 4.57
isbn: 0439785960
isbn13: 9780439785969
language_code: eng
num_pages: 652
ratings_count: 2095690
text_reviews_count: 27591
publication_date: 9/16/2006
publisher: Scholastic Inc.
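Each row like this becomes a single datapoint sent to the ingestion API. Here's a rough sketch of that record shape in Python (field names are taken from the sample above; the template's actual data model may differ):

from typing import TypedDict

class BookRecord(TypedDict):
    """One Goodreads book, mirroring the fields in the sample above (sketch only)."""
    bookID: int
    title: str
    authors: str             # multiple authors are "/"-separated
    average_rating: float
    isbn: str
    isbn13: str
    language_code: str
    num_pages: int
    ratings_count: int
    text_reviews_count: int
    publication_date: str    # e.g. "9/16/2006"
    publisher: str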

Here's how you get it running…

Requirements

Nice to haves

These will be required if you want to interrogate this data with our MCP tools:

Install Moose / Aurora

bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose,aurora

This will install Moose (our open source developer platform for data engineering) and Aurora (our AI data engineering product).

🔑 It will ask you for your Anthropic API key; if you don't have one yet, here's the setup guide.

Create a new project using the Goodreads template configured with Claude Desktop

aurora init books goodreads --mcp claude-desktop

This will use Aurora to initialize a new project called "books" from the "goodreads" template, while configuring Claude Desktop with the Aurora MCP tools pointed at this project.

Then, you'll need to run a few more commands to get things ready to go:

cd books
npm install

This will install the project's dependencies.

☸️ Make sure Docker Desktop is running before the next step!

moose dev

This will run the Moose local dev server, spinning up all your local data infrastructure including ClickHouse, Redpanda, Temporal, and our Rust ingest servers.

Add your Kaggle authentication key to the project's directory

Add your Kaggle authentication key to the root of the project directory. It should live at books/Kaggle Settings.json and have the structure:

{"username":"your_username","key":"your_key"

If you don't have one yet, you can get one here: https://www.kaggle.com/docs/api
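Under the hood, the ingest script uses Kaggle's Python SDK with these credentials to download the dataset. As a rough sketch of what that looks like (the dataset slug is an assumption; the template's script handles this for you):

# Sketch: authenticate with the Kaggle Python SDK and download the dataset.
# The kaggle package normally reads ~/.kaggle/kaggle.json or the
# KAGGLE_USERNAME / KAGGLE_KEY environment variables, so we export the values
# from the project's credentials file before importing kaggle.
import json, os

with open("Kaggle Settings.json") as f:     # the file you created above
    creds = json.load(f)
os.environ["KAGGLE_USERNAME"] = creds["username"]
os.environ["KAGGLE_KEY"] = creds["key"]

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
# "jealousleopard/goodreadsbooks" is an assumption -- check the template's
# ingest script for the exact dataset slug it downloads.
api.dataset_download_files("jealousleopard/goodreadsbooks", path="data", unzip=True)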

Run the ingestion script

In a new terminal, navigate back to the project directory

cd path/to/books

You'll know you are in the correct directory if moose-config.toml is there.

Then run the Python ingest script!

python ingest_goodreads_data.py

This will grab a sample of the data, ask you to confirm it matches your expectations, then send batches of data to the data model's ingestion endpoint.
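Conceptually, the script is doing something like the following (a minimal sketch; the port, endpoint path, model name, and CSV file name are assumptions based on Moose's local dev defaults, so check the template's script for the real ones):

# Sketch: stream the bulk CSV into Moose as individual datapoints.
# Assumes Moose's local ingest server is on localhost:4000 and the data model
# is exposed at /ingest/<ModelName>; the real script also batches rows rather
# than posting them one at a time.
import csv
import requests

INGEST_URL = "http://localhost:4000/ingest/Book"   # hypothetical model name

with open("data/books.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        requests.post(INGEST_URL, json=row)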

If you go back to your original terminal running the Moose dev server, you'll see hundreds of incoming datapoints.

Explore your data in Claude

Ask Claude any questions you like!

We had fun with

What's the greatest correlate with book ratings? year? number of pages? whether the book was a sequel?

and

what are the type of books that I should invest in as a publisher if I want to maximize my return (the best investments being non-obvious books that are more affordable to acquire with higher sales).
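If you want to sanity-check the first question yourself, the raw correlations are easy to compute straight from the CSV with pandas (a quick sketch; the file path is an assumption from the download step):

# Sketch: which numeric columns correlate most strongly with average_rating?
import pandas as pd

df = pd.read_csv("data/books.csv", on_bad_lines="skip")
df.columns = df.columns.str.strip()   # some headers in the raw CSV may carry stray whitespace
df["publication_year"] = pd.to_datetime(
    df["publication_date"], format="%m/%d/%Y", errors="coerce"
).dt.year

numeric = df[["average_rating", "num_pages", "ratings_count",
              "text_reviews_count", "publication_year"]]
print(numeric.corr()["average_rating"].sort_values(ascending=False))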

Productionize your results with Cursor

First, let's configure Cursor with the Aurora MCP tool suite pointed at this project.

Navigate to the project directory and open Cursor

cd path/to/your/project
cursor .

Then, run the Aurora command to configure the MCP for Cursor

aurora setup --mcp cursor-project

This will create a .cursor/mcp.json file configured for Aurora's MCP; whenever this Cursor project is open, the MCP server will be started.

If you go to Cursor > Settings > Cursor Settings > MCP, you'll see the server.

Click enable and refresh, and you should be ready to go!

One line of questioning we liked here was

Create an API that takes in a year and returns the top 50 books for that year ranked by review (excluding any outliers with small numbers of reviews)
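Under the hood, that API boils down to a single ClickHouse query over the ingested table. Here's a minimal sketch of the logic in Python using clickhouse-connect (the host, port, credentials, database, table name, and 1,000-rating cutoff are all assumptions based on Moose's local dev defaults; the Aurora tools in Cursor will generate the actual Moose API for you):

# Sketch: top 50 books for a given year, ranked by average rating,
# excluding books with very few ratings (an arbitrary 1,000-rating floor).
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost", port=18123,            # assumed Moose local ClickHouse port
    username="panda", password="pandapass",  # assumed local dev credentials
    database="local",                        # assumed local dev database
)

def top_books(year: int, limit: int = 50):
    query = """
        SELECT title, authors, average_rating, ratings_count
        FROM Book                             -- hypothetical table name
        WHERE toYear(parseDateTimeBestEffortUSOrNull(publication_date)) = %(year)s
          AND ratings_count >= 1000
        ORDER BY average_rating DESC
        LIMIT %(limit)s
    """
    return client.query(query, parameters={"year": year, "limit": limit}).result_rows

print(top_books(2006))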
