Datalenses
Data Analytics · Data Pipelines · BigQuery · DBT
Datalenses was our first data product at Numia: an analytics dashboard for the Cosmos ecosystem covering Osmosis, Cosmos Hub, dYdX, Celestia, and several other chains. Protocol teams and institutions used to dig through raw blockchain data or build their own indexers just to get basic metrics. We gave them aggregated analytics they could check daily, built on top of NumiaSQL, our indexed blockchain data warehouse.
The problem
Blockchain data is public, but "public" and "usable" are very different things. If you wanted meaningful insights from Cosmos chain data, your options were grim: spin up your own infrastructure and maintain it forever, or write raw SQL against blockchain tables that weren't designed for analytics. Neither worked for protocol teams tracking growth or institutions running due diligence.
What was missing was an analytics layer. Not another block explorer for looking up individual transactions, but aggregated metrics: TVL trends, trading volumes, user retention, cross-chain IBC flows. The kind of data that helps teams make decisions.
The product
We sat with protocol teams and investors until we understood what they actually needed day to day. The dashboard covers three areas:
- Chain metrics. TVL, trading volume, cross-chain IBC flows, and liquidity data per chain. The daily overview that protocol teams and investors check to track ecosystem health.
- Historical snapshots. Export balances and metrics as CSV for any point in time. Governance participants use it before votes, and institutions pull it for research reports. (The query sketch after this list shows the idea.)
- Calculators. Tools like the Celestia cost savings estimator that let teams model scenarios against real data.
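To make the snapshot export concrete, here is a minimal sketch of the kind of point-in-time query that could sit behind it, in BigQuery SQL. The table and column names (`account_balances`, `block_timestamp`, `denom`) are illustrative assumptions, not the production NumiaSQL schema:

```sql
-- Hypothetical point-in-time snapshot: for each (address, denom) pair,
-- take the most recent balance at or before the requested timestamp.
SELECT
  address,
  denom,
  balance
FROM `numia.osmosis.account_balances`  -- assumed table name
WHERE block_timestamp <= TIMESTAMP('2024-03-01 00:00:00+00')
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY address, denom
  ORDER BY block_timestamp DESC
) = 1;
```

The result is exactly what a CSV export needs: one row per holder, frozen at the chosen timestamp.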
The architecture
We did the math on real-time analytics across 5+ chains: it came out to $50k+/month in compute. We were a startup; that budget didn't exist.
So we asked: do our users actually need real-time data? We talked to protocol teams. They check dashboards once a day, maybe twice. Institutions run reports weekly. The answer was no. Most analytics use cases work fine with 15-minute to 1-hour latency. That realization cut our infrastructure bill by roughly 80% compared to the real-time estimate.
The data foundation comes from NumiaSQL, which already handles ingestion, reorg handling, and normalization for the Cosmos chains we cover. Instead of building data pipelines from scratch, we built DBT transformation models on top of NumiaSQL's clean tables, adding Datalenses-specific aggregations. The system has three layers:
- Ingestion and transformation. NumiaSQL captures raw blockchain data from 5+ Cosmos chains into BigQuery. On top of that, DBT incremental models transform the data into dashboard-specific metrics on scheduled intervals (see the model sketch after this list). Data quality checks validate outputs before serving.
- Query layer. BigQuery is too slow for dashboard reads, so we sync processed data to Postgres, sketched below, for sub-second responses: p99 query latency is under 150ms.
- Dashboard. The frontend layer serving aggregated metrics to protocol teams, investors, and the broader community.
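As a concrete example of the transformation layer, here is a minimal sketch of what a DBT incremental model for daily per-chain metrics could look like. The source reference, column names, and two-day reprocessing window are assumptions for illustration, not our production models:

```sql
-- models/daily_chain_metrics.sql (illustrative sketch)
-- Incremental: each scheduled run merges only recent days instead of
-- rescanning full history, which is where the cost savings come from.
{{ config(
    materialized='incremental',
    unique_key=['metric_date', 'chain_id']
) }}

SELECT
  DATE(block_timestamp)  AS metric_date,
  chain_id,
  COUNT(DISTINCT sender) AS active_addresses,  -- assumed column names
  SUM(amount_usd)        AS volume_usd
FROM {{ source('numiasql', 'transfers') }}     -- assumed source table
{% if is_incremental() %}
  -- Reprocess a trailing window so late-arriving blocks are absorbed
  WHERE block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
{% endif %}
GROUP BY 1, 2
```

Standard DBT tests (unique, not_null) then gate the output before it reaches the serving layer, which is the "data quality checks validate outputs before serving" step above.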
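On the serving side, here is a sketch of what the synced Postgres table might look like, with names again assumed rather than taken from production. The composite primary key turns the typical dashboard read into an index range scan, which is what keeps p99 latency low:

```sql
-- Illustrative Postgres serving table, refreshed from BigQuery after
-- each DBT run (schema is an assumption, not the production layout).
CREATE TABLE IF NOT EXISTS daily_chain_metrics (
  metric_date      date    NOT NULL,
  chain_id         text    NOT NULL,
  active_addresses bigint  NOT NULL,
  volume_usd       numeric NOT NULL,
  PRIMARY KEY (chain_id, metric_date)
);

-- A typical dashboard read: 90 days of one chain's metrics,
-- answered from the primary-key index without a full scan.
SELECT metric_date, active_addresses, volume_usd
FROM daily_chain_metrics
WHERE chain_id = 'osmosis-1'
  AND metric_date >= CURRENT_DATE - INTERVAL '90 days'
ORDER BY metric_date;
```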
Results
Datalenses was Numia's first analytics product, and it proved the model. Osmosis adopted it as their official metrics source and deprecated their own info page. Institutions started referencing our numbers in research reports. The snapshot tool saw heavy traffic around governance votes and airdrops, a use case we didn't design for but that found us anyway.
- p99 dashboard queries under 150ms
- ~80% cost reduction vs. real-time processing
- Adding new chains takes days, not weeks
- Data quality reliable enough for institutional reports
The approach worked well enough that we spun off two specialized products from it. Celestia Data adapted the same model to the specific needs of a DA layer, where the metrics revolve around blob submissions and rollup economics rather than DeFi activity. Token Pulse took the opposite direction, pushing latency below one second for real-time holder behavior and exchange flow tracking. Both built on the same foundation but served use cases that a general-purpose dashboard couldn't handle well enough on its own.
What I learned
Batch processing isn't exciting. Nobody gets on stage to talk about scheduled DBT runs. But that 80% cost reduction is the reason Datalenses exists today instead of being a burned-out experiment we couldn't afford. We watched competitors launch with real-time dashboards and shut down six months later when the cloud bills caught up.
Every early conversation with protocol teams included "we need real-time data." But when we looked at actual behavior, it was daily check-ins and weekly reports. The snapshot tool, which we almost didn't build because nobody asked for it, turned into one of the most-used features during governance votes and airdrops. Listening to requests matters, but observing behavior is where real product insights come from.
Building on NumiaSQL meant we spent our time on Datalenses-specific models and the dashboard itself rather than data plumbing. The hard parts of blockchain data (ingestion, reorgs, schema changes) were already solved. That let a small team ship a multi-chain analytics product that would have taken months longer if we'd started from scratch.