OLAP on Tap: You're just my type

Published Mon, September 29, 2025 βˆ™ ClickHouse, OLAP βˆ™ by Johanan Ottensooser

I've been learning a bunch about OLAP fundamentals. I will be trying to encode that learning into blog posts and agent guides. Let me know your thoughts!

When I was prototyping schemas in PostgreSQL, I'd often be fast an loose with my typing. Storage was cheap, and the flexibility afforded to me in not thinking about the types was more important to me then saving those few cents.

What was important to me was retrieval, and in OLTP, the important thing were the indices. The type efficiency wasn't that material.

When you move to OLAP, you move to a different paradigm. It isn't about finding the one record, using an index. It's about retrieving thousands or millions of records out of potentially billions, and doing aggregations over them.

So, like PostreSQL's indices, we have order by fields. But they help us find batches of data.

Then, you need to extract from each of those potentially millions of rows in that batch. The size of each row therefore becomes critical; with inefficient types potentially doubling or more the size of the data, bloating storage, CPU usage, and query latency.

That’s why typing matters in ClickHouse. Use UInt16 instead of UInt32 when you can.Prefer Float32 over Float64 unless you need the precision.And for small sets of repeating strings (states, manufacturers, etc.),wrap them in LowCardinality(String) β€” ClickHouse will store them as a dictionary under the hood.

ClickHouse can help you out here a bit too, with low cardinality tags allowing you to decrease the size of a column dramatically by mapping the data in that column (even if they are typed like a string) into a dictionary. So if you only have a few strings (like states, manufacturers, etc), this becomes an optimal, easy solution. More on that later.

The downsides to strict typing are:

  1. don't be too strict, it is not trivial to change a type of a column in ClickHouse
  2. it takes some thought to get the right types

That second point becomes moot with LLM assisted development. And, to help your agents with typing correctly, I've created a guide you can import into your project, ready to be used by your copilot of choice πŸ‘‡

πŸ™ https://github.com/oatsandsugar/olap-agent-ref/blob/main/1-type-rules.md

References

Appendix: Example with UInt and Float

Use UInt16 instead of UInt32 when you can If your IDs, counts, or measurements never exceed 65,535, UInt16 is enough. Each value takes half the space. For example, if you’re tracking sensor states, small category codes, or regional IDs, you don’t need 4 billion possibilities.

Prefer Float32 over Float64 unless you need extra precision In analytics, you usually care about trends and aggregates, not 17 decimal places. Float32 is accurate to ~7 digits β€” plenty for metrics like latitude/longitude, temperatures, percentages, and financials rounded to cents. Reserve Float64 for scientific or high-precision math.