As we discussed yesterday, how strictly you type a column is closely tied to the read and aggregate performance of OLAP databases. The number of distinct values in a column is called its "cardinality". Fields with higher cardinality have more distinct values, like UUIDs; fields with lower cardinality have fewer, like booleans.
The direct consequence is compression and encoding efficiency. Low cardinality, great compression. High cardinality, poor compression.
| Column | Distinct Count | Compression | Storage Strategy |
|---|---|---|---|
| country | 200 | Excellent | LowCardinality(String) |
| state | 50 | Excellent | Enum('CA','TX',...) |
| user_id | 50M | Poor | Plain UInt64 |
Cardinality comparison table
Notice that for country, the column type is String, but it's wrapped in LowCardinality().
What that wrapper does is dictionary-encode the column: the 200 country strings are stored once, and each row holds only a small integer key into that dictionary. This drastically reduces the amount of information the column can carry, which makes it far more compressible.
This is really powerful, and you should use it when a column has fewer than roughly 10-50k unique values (assuming OLAP-scale tables, in the millions of rows). Past that limit, the overhead of the dictionary lookup can outweigh the performance gains from the wrapper.
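Here's a minimal sketch of what this looks like in ClickHouse DDL; the events table and its column names are hypothetical, echoing the comparison table above:

```sql
-- Hypothetical events table: 'country' is dictionary-encoded,
-- 'user_id' is left as a plain integer because of its high cardinality.
CREATE TABLE events
(
    ts      DateTime,
    user_id UInt64,                  -- ~50M distinct values: no wrapper
    country LowCardinality(String)   -- ~200 distinct values: dictionary-encoded
)
ENGINE = MergeTree
ORDER BY ts;
```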
When the space of possible values is small and fixed, manually defining them as enums (e.g. CA = 1, NY = 2, TX = 3) makes for an even faster and smaller column than LowCardinality(). Don't use enums when you expect new values, since adding one requires a schema migration, as the sketch below shows.
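A sketch of the enum approach, with a hypothetical events_enum table; the ALTER at the end shows the migration cost of admitting a new value:

```sql
-- Small, fixed set: each value is stored as a 1-byte integer.
CREATE TABLE events_enum
(
    ts    DateTime,
    state Enum8('CA' = 1, 'NY' = 2, 'TX' = 3)
)
ENGINE = MergeTree
ORDER BY ts;

-- Adding a new state later requires a schema migration:
ALTER TABLE events_enum
    MODIFY COLUMN state Enum8('CA' = 1, 'NY' = 2, 'TX' = 3, 'WA' = 4);
```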
Use these heuristics with Claude, Cursor, or other agents:
| Cardinality | Type | Notes |
|---|---|---|
| ≤ 10 | Enum8 | perfect static set |
| 10–10k | LowCardinality(String) | best trade-off |
| 10k–100k | LowCardinality (watch memory) | depends on workload |
| >100k | base String / numeric | avoid dictionary |
Table showing heuristics for Types / Wrappers and expected cardinality
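To check which row of this table a real column lands in, you can measure cardinality and compression directly. A sketch against the hypothetical events table from earlier, using ClickHouse's uniq() function and the system.columns table:

```sql
-- Approximate distinct count per column
SELECT uniq(country) AS country_cardinality,
       uniq(user_id) AS user_id_cardinality
FROM events;

-- On-disk compression ratio per column
SELECT name,
       type,
       formatReadableSize(data_compressed_bytes)   AS compressed,
       formatReadableSize(data_uncompressed_bytes) AS uncompressed,
       round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE table = 'events';
```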