The Lookup Kit: ClickHouse® Dictionaries and JOINs
You have data in two tables, and you need to combine them at query time. In OLTP-land you JOIN without a second thought. In ClickHouse, that reflex is wrong about 80% of the time.
ClickHouse gives you different tools for combinatorial work, and the wrong choice costs about 20×. We measured it on our own cluster this week: 11 seconds with dictGet, 225 seconds with the equivalent JOIN, on the same 10-billion-row enrichment. The full numbers are in the benchmark section below.
This post is a deep dive into dictGet (the right default), an operational nuance on JOINs (the escape hatch), and the move that beats both when the enrichment is stable: resolving it once at ingest so the query-time lookup disappears. At the end, a one-page decision tree that you can save.
Why OLAP makes joining expensive
ClickHouse stores data on disk, sorted by ORDER BY, with a sparse index in RAM. We covered the mechanics in the first post in this series. Joining two tables breaks that sorted access pattern. At least one side has to be reorganized in memory, usually as a hash table, and sometimes as a sorted stream. That reorganization is what costs you.
The tools each solve a different shape of the problem:
- Dimension lookup with low-cardinality reference data that updates slowly: a dictionary, via
dictGet. - Two large analytical tables combined ad-hoc: a JOIN, with the right algorithm.
- A stable lookup you read constantly while the dim barely moves: bake it in at ingest, either upstream in your pipeline or with
dictGetinside a materialized view, and the query-time lookup disappears entirely.
The cheaper alternatives hierarchy from the query-optimization post puts dictGet at the top, JOINs at the bottom, but in this case, we are going deeper.
Dictionaries
A dictionary is a lookup primitive, and you don't query it like a table most of the time; you call dictGet from inside another query. The dict lives in RAM on the server, refreshed from an external source on a schedule, and serves O(1) lookups instead of the JOIN's hash-table-build-then-scan.
The materialized-views post had the canonical war story: an MV joined a 500-million-row dimension table on every insert, and INSERT latency went from 50ms to 2 seconds. The fix was replacing the JOIN with dictGet. That story is still the best one-paragraph summary of why this tool exists. Read it here if you missed it.
Sources
You can create a dictionary with PostgreSQL, MySQL, another ClickHouse table, an HTTP endpoint, an executable, or a file on disk. The source is decoupled from how ClickHouse stores the dict in memory. The most common production case is "PostgreSQL holds the truth, ClickHouse holds the lookup":
CREATE DICTIONARY product_metadata (
sku UInt64,
name String,
category UInt8,
base_price Decimal(10, 2)
) PRIMARY KEY sku
SOURCE(POSTGRESQL(
host 'pg.internal' port 5432
user 'reader' password 'secret'
db 'inventory' table 'products'
invalidate_query 'SELECT max(updated_at) FROM products'
))
LIFETIME(MIN 60 MAX 300)
LAYOUT(HASHED());
For same-cluster dictionaries that read from another ClickHouse table, use SOURCE(CLICKHOUSE(...)) and pass explicit USER and PASSWORD. Managed clusters (including ours) do not let dicts authenticate implicitly. We hit that one during the benchmark setup for this post; the dict failed to load with a misleading "AUTHENTICATION_FAILED" until we added the credentials inline.
Layouts: pick by key shape and memory budget
The layout is the decision that determines memory usage and lookup speed. ClickHouse has more layouts than most teams need to know about. Here are the ones you actually pick from:
-
flatis the fastest layout that exists. It's a plain array indexed directly by the key, so a lookup is one array offset with no hashing involved. The constraint is baked into that design: keys have to be denseUInt64values under themax_array_sizecap, 500K by default and configurable higher. Sequential integer keys, an auto-incrementidor aCounterID, are the sweet spot. Feed it sparse or non-integer keys, and you either burn memory on the gaps or it refuses to build. -
Reach for
hashedwhenflatdoesn't fit, which is most of the time. It's a standard hash table: any single key type goes in, gets hashed to a bucket, and walks a short chain on collision. Strings, wide integer ranges, and sparse keys, all handled. You pay a little more RAM thanflatand a hash computation per lookup, but it stays O(1) effectively, and you stop worrying about key shape. When you don't know which layout you need, this is the right place to start. -
complex_key_hashedis the same hash table with a composite key. Identical memory profile, except the key is a tuple of columns instead of a single one. Whenever your real key is "these two or three columns together," this is the layout.
Those three cover the large majority of dictionaries you'll build. ClickHouse ships about a dozen more for narrower jobs: range_hashed for time-versioned lookups like currency rates by day, ip_trie for CIDR and GeoIP matching, cache and ssd_cache for dicts too large to keep fully in RAM, and complex_key_hashed_array for keys carrying many attributes. If your key shape doesn't match the three above, skim the layouts documentation before committing.
Refresh strategies
Three mechanisms, each with a tradeoff.
-
LIFETIME(MIN n MAX m)refreshes the dict at a random interval betweennandmseconds. The randomization prevents thundering herd across replicas refreshing the same source at the same instant. UseLIFETIME(0)to disable auto-refresh entirely. -
invalidate_queryis a SQL query whose result ClickHouse treats as a watermark with massive load reduction for slow-moving dictionaries. It compares against the previous result and the dict only refreshes when the watermark moves. Pattern:
SOURCE(POSTGRESQL(... invalidate_query 'SELECT max(updated_at) FROM products' ...))
update_fieldis the incremental option. ClickHouse appendsWHERE update_field >= last_update_timeto the source query and pulls only changed rows. Worth its weight in gold for large dictionaries where a full reload is operationally painful.
The atomicity guarantee is worth knowing. Queries see either the old version or the new version of a dict, never a mix. Failed refresh keeps the previous snapshot live and only logs the error. Monitor system.dictionaries.last_exception because nothing else will tell you a refresh broke.
The dictGet family
The function you call most of the time is dictGet. The variants exist for a reason:
dictGet(name, attr, key)is the base call: it returns the attribute value for the key. On a miss, it falls back to the default declared in the dictionary's attribute definition (or the type's zero value if none is set).dictGetOrDefault(name, attr, key, default)returns a typed fallback for missing keys. Prefer this when missing keys are normal in your data; it keepsNULLhandling out of downstream queries.dictGetOrNullreturnsNULLon miss. Right choice when missing should be distinguishable from a legitimate empty/zero value (v21.4+).dictHasis a yes/no check, cheaper thandictGetwhen you only need to know whether a key exists.dictGetAllreturns all matching attribute values forregexp_treedictionaries that may match multiple patterns (v23.5+).dictGetKeysis the new reverse lookup: value to keys. Available in v25.12+. Useful when a JOIN-style operation is really "find all keys whose attribute matches this value."- The hierarchical family (
dictGetHierarchy,dictGetChildren,dictGetDescendants,dictIsIn) handles parent-child dictionaries.
Type-specific variants exist too: dictGetUInt64, dictGetString, dictGetIPv4, etc. They return typed scalars without dynamic-type wrapping, which matters on hot paths. The polymorphic dictGet is more readable; the typed ones are faster. Pick based on whether you're optimizing or expressing.
SELECT
transaction_id,
dictGetOrDefault(
'address_labels',
'label',
(chain_id, from_address),
'unknown'
) AS sender_label
FROM transfers
WHERE block_date = today();
Dictionaries in ObsessionDB
In self-hosted ClickHouse, in-memory dicts live in RAM on every replica that may serve the query. ObsessionDB's architecture is decoupled, with data on S3 and compute on stateless nodes, but dictionaries are the deliberate exception.
Each compute node holds a full in-memory copy of every dict, refreshed exactly the way a self-hosted single-node instance would do it. S3 round-trips would defeat the entire point of dictGet (sub-millisecond enrichment), so we accept the RAM cost on every node and keep its behavior identical to vanilla ClickHouse.
For the article you're reading, this is what it means in practice: a 6,506-row dim takes 1.62 MiB of RAM and loads in 27 milliseconds on every node. The numbers below come from running dictGet on a stateless compute node reading 40 GB from S3-backed storage.
dictGet vs JOIN, on our own cluster
We ran the comparison this week. ClickHouse 26.3.9.1 LTS. ObsessionDB SharedMergeTree. The standard clickbench hits_10b table (10 billion rows, 54 GiB) joined against a 6,506-row synthetic counter dim. Filesystem cache disabled per query for cold-vs-cold fairness. Output discarded via outer aggregation, so client-side overhead doesn't pollute the numbers. Three runs each, log_comment markers, results pulled from system.query_log.
| Approach | Wall time | Memory |
|---|---|---|
| Baseline (no enrichment) | 11.0 s | 22 MiB |
dictGet (HASHED) | 11.5 s | 22 MiB |
dictGet (FLAT) | 11.3 s | 22 MiB |
JOIN parallel_hash (default 26.3) | ~225 s | 52 MiB |
JOIN hash (single-threaded) | 141–212 s | 40 MiB |
The headline reads itself: dictGet is statistically indistinguishable from the no-enrichment baseline. The lookup happens for every row of a 10-billion-row scan and adds zero measurable overhead. The equivalent JOIN takes about 20× longer with 2.4× the memory.
Now the part I didn't expect. Single-threaded hash was faster than parallel_hash on this workload, and parallel_hash is the default in 26.3. The reason: with a 6,506-row right side, the parallel hash-table-build coordination overhead doesn't pay off. The single-threaded build finishes faster than the parallel orchestration. The 24.12 default flip from hash to parallel_hash is the right call for most dim sizes, but small-dim workloads benefit from explicitly setting join_algorithm='hash'. Always benchmark against your actual data shape before trusting defaults.
How does the gap evolve as the dim grows? We ran the comparison at 1K, 6.5K, 10K, 100K, and 1M dim sizes:
The dictGet curve stays in the 10–23 second band as dim size scales 1K→1M; the JOIN curve sits in the 100–225 second band. The variance on the JOIN side at the 6,506-row dim, between the original 225s and a later 105s run, is real and cluster-load-dependent. Even at the favorable end, JOIN is ~10× slower.
When dictGet doesn't win
The 20× headline is true at scale on the right shape of problem. There's a shape where it isn't, and we have internal benchmark data on it from a quantile pipeline we ran a while back.
Threshold table: ~500 rows. Fact table: full table scan. Tested four combinations of percentile function and lookup strategy:
| Approach | Pipeline latency |
|---|---|
quantile + window-function JOIN | 21 ms (winner) |
quantileTDigest + dictionary lookup | 49 ms |
quantileTDigest + window-function JOIN | 93 ms |
quantile + dictionary lookup | 436 ms |
Why the JOIN won: with a 500-row right side and a window function on the left, ClickHouse compiles the join plan once. Hash the small threshold table once, probe in lockstep, done. dictGet is a per-row function call with no bulk optimization. On dim tables small enough that JOIN's hash-table build is essentially free, JOIN beats dictGet because dictGet's per-row overhead doesn't get amortized.
Rule of thumb: dicts win when the dim is big enough that the per-row lookup is cheap relative to the JOIN's hash-build cost. On dim tables of a few hundred rows accessed via window functions, the math flips.
JOINs: today's behavior
The query-optimization post had the full algorithm matrix for JOINs in ClickHouse. Read that for the comparison table; we'll keep this section to the operational nuance.
The right-table-in-memory rule is the key one, a hash JOIN puts the right side into a hash table. For this reason, always put the smaller table on the right. The IMDB benchmark from ClickHouse's own blog post puts a number on it: 1M actors on the right runs in ~0.8s with 716 MB; 100M roles on the right takes 5+ seconds and 8.95 GB. For the same JOIN and same result, that's 6× slowdown purely from table order.
Regarding algorithm selection, parallel_hash is the default, and that's the right one for most dim sizes (more than ~5K rows). For small dims, see the surprise from the benchmark section above: explicit join_algorithm='hash' can beat the default by ~30%. For tables physically sorted on the join key, full_sorting_merge skips the sort entirely and runs at hash-join speed with much less memory. For right sides that don't fit in RAM, grace_hash spills to disk in tunable buckets. partial_merge is the memory-of-last-resort.
There are some extra specific learnings that can impact your joins depending on the specific cluster conditions. For example, in environments where duplicates can happen ANY versus ALL is the silent Cartesian killer. If your right side has K duplicates per join key, ALL multiplies the left's row count by K as any default database, but ANY can mitigate this with a low effort fix:
-- Table B has 3 rows for id=1
SELECT a.id, b.value FROM A INNER JOIN B ON a.id = b.id;
-- Returns 3 rows. Silent multiplication.
SELECT a.id, b.value FROM A INNER ANY JOIN B ON a.id = b.id;
-- Returns 1 row. What you wanted.
The production failure mode is depressingly common: a dim table accumulates duplicates over time (slowly changing dimensions, late inserts, missed deduplication), and a JOIN that ran fine for months suddenly OOMs or returns a 10× row count. Switch to ANY to mitigate quickly, but deduplicate the dim or model it as a dictionary to fix.
GLOBAL JOIN on distributed setups is also something to keep in mind: the cost model is right_size × num_shards bytes broadcast on every query. Worth it when the right side is genuinely small (low-MB) and changes too often to replicate.
Enrich at ingest, not at query
The fastest lookup at read time is the one you already did at write time.
Everything above assumes you resolve the lookup when the query runs. dictGet makes that cheap, but cheap isn't free, and you pay it on every query that touches the column. When the enrichment is stable, a label, a category, or a denormalized attribute, there's a better move: resolve it once, on the way in, and store the result. Read-time queries see a plain column with nothing left to resolve.
There are two places to do it. The first is upstream, before the data ever reaches ClickHouse. If your producer, stream processor, or ETL job already has the reference data on hand, attach the label there and let the rows arrive enriched. ClickHouse stores what it's given and never looks anything up. This is often the simplest path when you're already running something like Kafka with a consumer, or Flink, in front of the database.
The second is in-database, when the reference data is best kept as a ClickHouse dictionary. Put dictGet in a materialized view: rows land in the raw table, the MV fires, the lookup runs once per row at insert, and the resolved value gets written into the target table.
CREATE MATERIALIZED VIEW transfers_enriched_mv TO transfers_enriched AS
SELECT
transaction_id,
chain_id,
from_address,
dictGetOrDefault(
'address_labels', 'label',
(chain_id, from_address), 'unknown'
) AS sender_label
FROM transfers_raw;
Either way, the tradeoff is that data is frozen. Query-time dictGet always reflects the current dictionary. A value enriched at ingest reflects the reference data as it was when the row landed. Relabel a user tomorrow, and every row already written keeps yesterday's label until you backfill. So the rule splits cleanly:
- Enrich at ingest when the historical value is the correct one, or close enough that staleness doesn't bite. Transaction labels, geo lookup at event time, and the category a product had when it was sold.
- Keep query-time
dictGetwhen you need the always-current answer. Live ownership, current tier, anything where "what is it now" beats "what was it then."
The decision tree
The takeaway artifact is what you should learn from this article.
Need to combine data from two sources?
│
├── Reference data (lookup), updates infrequently, fits in RAM
│ └── DICTIONARY
│ ├── Dense UInt64 keys < 500K → flat
│ ├── Composite keys → complex_key_hashed
│ ├── Time-ranged (rates, ownership) → range_hashed
│ ├── CIDR / IP ranges → ip_trie
│ └── Many attributes per key → complex_key_hashed_array
│
├── Reference data doesn't fit in RAM
│ └── DICTIONARY with cache or ssd_cache
│
├── Two analytical tables, ad-hoc, both large
│ ├── Right is Dict / RocksDB / Join engine, ANY ok → direct
│ ├── Right fits in RAM, dim ≥ ~5K rows → parallel_hash
│ ├── Right fits in RAM, dim < ~5K rows → hash (less coordination)
│ ├── Both share physical order on join key → full_sorting_merge
│ ├── Right doesn't fit → grace_hash
│ └── Distributed, right on every shard → GLOBAL JOIN
│
└── Stable lookup, read constantly, dim barely changes
└── Enrich at INGEST → pre-enrich upstream, or dictGet in an MV
(value frozen into the row, zero query-time lookup)
Next time you reach for LEFT JOIN, pause and run this tree first. The default (LEFT JOIN against a hash-built right side) is the right call maybe 20% of the time. The rest of the time, dicts win.
What's next
dictGet is the right default for enrichment at query time. JOIN is the escape hatch when the data shape doesn't fit a dictionary. And when the value is stable, the cheapest lookup of all is the one you resolve once at ingest and never run again.
These levers work on any ClickHouse deployment, including ObsessionDB.
Continue Reading
Originally written for obsessionDB. Read the original article here.
ClickHouse is a registered trademark of ClickHouse, Inc. https://clickhouse.com