OLAP on Tap: High time for low cardinality
As we discussed yesterday, the strictness of a type is very much tied to the read and aggregate performance of OLAP databases. The amount of variance in a column is called "cardinality". Fields with higher cardinality have more variance, like UUIDs. Fields with lower cardinality have lower variance: like boolean.
The corollary of this is compression and encoding efficiency. Low cardinality, great compression. High cardinality, bad compression.
| Column | Distinct Count | Compression | Storage Strategy |
|---|---|---|---|
| country | 200 | Excellent | LowCardinality(String) |
| state | 50 | Excellent | Enum('CA','TX',...) |
| user_id | 50M | Poor | Plain UInt64 |
Cardinality comparison table
Low Cardinality Wrapper
You notice there in country, that the column type is string, but that it uses a wrapper, LowCardinality().
What that does is encode each of those 200 countries as a dictionary, drastically reducing the possible "amount of information" that could be possible in that column, allowing for improved compression.
This is really powerful, and you should use it if you have less than 10-50k unique values in a column (assuming OLAP scale tables, in the millions of rows). As you get past this limit, the overhead of dictionary lookup can obviate the performance gains from using this wrapper.
Enums
When the possible space of results is small, manually defining them as enumerates (e.g. CA = 1, NY = 2, TX = 3) makes for an even faster and smaller column than using LowCardinality(). Don't use this when you expect new values (will require a migration to update).
Agent docs
Use these in with your Claude, Cursor or other agents:
References
- 🔗 CH Low Cardinality: https://clickhouse.com/docs/sql-reference/data-types/lowcardinality
- 🔗 CH Enum: https://clickhouse.com/docs/sql-reference/data-types/enum
Appendix: Practical Heuristics for Cardinality
| Cardinality | Type | Notes |
|---|---|---|
| ≤ 10 | Enum8 | perfect static set |
| 10–10k | LowCardinality(String) | best trade-off |
| 10k–100k | LowCardinality (watch memory) | depends on workload |
| >100k | base String / numeric | avoid dictionary |
Table showing heuristics for Types / Wrappers and expected cardinality
Interested in learning more?
Sign up for our newsletter — we only send one when we have something actually worth saying.
Related posts
All Blog Posts
OLAP
OLAP migration complexity is the cost of fast reads
In transactional systems, migrations are logical and predictable. In analytical systems, every schema change is a physical event—rewrites, recomputations, and dependency cascades. This post explains why OLAP migrations are inherently complex, the architectural trade-offs behind them, and how to manage that complexity.

OLAP, Product
Ship your data with Moose APIs
You’ve modeled your OLAP data and set up CDC—now it’s time to ship it. Moose makes it effortless to expose your ClickHouse models through typed, validated APIs. Whether you use Moose’s built-in Api class or integrate with Express or FastAPI, you’ll get OpenAPI specs, auth, and runtime validation out of the box.