Celestia Data
Data Analytics · Data Engineering · BigQuery · DBT
The Celestia Foundation needed answers about their own network that no existing tool could provide. They'd built the leading data availability layer, and rollups were posting data to it every day, but nobody could easily answer basic questions. How much data is being posted? Which rollups are the biggest consumers? What do the token economics look like? Getting those answers meant building custom infrastructure, and they didn't want to divert their engineering team from protocol work to do it.
The problem
Generic blockchain explorers are built for transaction-level queries on monolithic chains. Celestia is a DA layer with a fundamentally different data model: rollups post blobs, pay in TIA, and operate under economics that revolve around data volume rather than transaction counts. None of the existing tools understood that architecture, and adapting them would have meant fighting their abstractions the whole way.
The product
We designed the dashboard around the specific questions the Foundation kept asking in meetings, calculating by hand in spreadsheets, and fielding from investors. DA layer economics are fundamentally about data volume and cost efficiency. Every rollup cares about how much it costs to post data here versus the alternatives. We made that comparison front and center.
Three focus areas:
- DA Layer Economics. Total data posted per rollup, TIA revenue from blob submissions, cost comparisons against alternatives. Currently tracking 3.6TB across 51 networks.
- Rollup Health. Each network gets its own profile: data volume trends, transaction activity, TVS (total value secured). Heatmaps for cross-rollup comparisons, because the Foundation wanted ecosystem health at a glance, not individual charts one by one.
- Token Dynamics. Staking distribution, inflation rate, APR, and the relationship between DA fee revenue and inflation rewards. Not just the sticker numbers, but how everything connects (a sketch of this view follows below).
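To make that last view concrete, the query behind it has roughly this shape. A minimal sketch, assuming hypothetical daily models blob_fees_daily and staking_rewards_daily rather than the production schema:

```sql
-- Illustrative only: daily DA fee revenue versus inflation rewards.
-- blob_fees_daily and staking_rewards_daily are hypothetical model names.
select
    f.day,
    f.tia_fees,                  -- TIA paid for blob submissions that day
    r.tia_inflation_rewards,     -- TIA minted as staking rewards that day
    f.tia_fees / nullif(r.tia_inflation_rewards, 0) as fee_to_inflation_ratio
from blob_fees_daily f
join staking_rewards_daily r using (day)
order by f.day
```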
The architecture
The data foundation comes from NumiaSQL, our indexed blockchain data warehouse that already handles ingestion, reorg handling, and normalization for 30+ chains including Celestia. Instead of building data pipelines from scratch, we built DBT transformation models on top of NumiaSQL's clean tables, adding Celestia-specific aggregations: blob submission metrics, per-rollup cost breakdowns, token economics calculations.
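A per-rollup aggregation model in that stack might look like the sketch below. The numia.celestia_blobs source and the namespace-to-rollup seed are assumptions for illustration, not NumiaSQL's actual schema:

```sql
-- DBT model: daily blob volume and fees per rollup.
-- Source table and mapping seed are hypothetical, not the production schema.
select
    date(b.block_timestamp)            as day,
    coalesce(m.rollup_name, 'unknown') as rollup,
    count(*)                           as blob_count,
    sum(b.blob_size_bytes)             as bytes_posted,
    sum(b.fee_utia) / 1e6              as tia_fees  -- utia -> TIA (6 decimals)
from {{ source('numia', 'celestia_blobs') }} b
left join {{ ref('namespace_rollup_map') }} m
    on b.namespace_id = m.namespace_id
group by 1, 2
```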
BigQuery handles the heavy transformations through incremental models. The ecosystem keeps growing, 51 networks now with more joining regularly, and incremental updates keep costs predictable: each run processes only new blocks instead of rescanning the full history. That matters when you're processing terabytes.
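The pattern is standard DBT incremental materialization on BigQuery, scanning only partitions newer than what the table already holds. A minimal sketch, reusing the hypothetical source from above:

```sql
{{ config(
    materialized = 'incremental',
    partition_by = {'field': 'day', 'data_type': 'date'},
    incremental_strategy = 'insert_overwrite'
) }}

select
    date(block_timestamp) as day,
    namespace_id,
    sum(blob_size_bytes)  as bytes_posted
from {{ source('numia', 'celestia_blobs') }}
{% if is_incremental() %}
  -- Only scan from the newest partition already in the table onward,
  -- so each run's cost tracks new data, not total history.
  where date(block_timestamp) >= (select max(day) from {{ this }})
{% endif %}
group by 1, 2
```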
For the serving layer, we sync processed data into Postgres. BigQuery is great for batch transformations, but too slow for a dashboard that needs sub-second responses. Postgres gives us p99 under 150ms on all views. Each layer does what it's best at.
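The serving side needs no cleverness: narrow, pre-aggregated tables keyed so that every chart resolves to a bounded index scan. Roughly, with illustrative names:

```sql
-- Synced daily aggregates land in a narrow, indexed Postgres table.
create table if not exists rollup_daily_metrics (
    day          date    not null,
    rollup       text    not null,
    bytes_posted bigint  not null,
    tia_fees     numeric not null,
    primary key (rollup, day)
);

-- A dashboard chart is then a single index-range scan:
select day, bytes_posted, tia_fees
from rollup_daily_metrics
where rollup = 'example-rollup'          -- hypothetical rollup id
  and day >= current_date - interval '90 days'
order by day;
```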
Results
The dashboard is fully public at celestiadata.com. Anyone can check Celestia's DA metrics without running infrastructure or writing a single query.
The Celestia team uses it as their primary source for ecosystem health. Internal meetings, investor updates, public reports, all pulling from the same data. One source of truth instead of scattered spreadsheets and ad-hoc queries.
Rollups building on Celestia use it to benchmark against the rest of the ecosystem: data volume, costs, activity trends. It became the reference point for teams deciding where to post their data.
What I learned
Purpose-built tools beat generic solutions when the domain is different enough. We could have spent months customizing a general blockchain explorer to sort of work for a DA layer. Instead, we sat with the Foundation until we understood their real questions: what kept coming up in meetings, what they were manually calculating, what investors kept asking. Then we built exactly that. The best features weren't the ones we imagined; they were the ones that solved problems we heard about directly.
Building on top of NumiaSQL's data layer made the whole project feasible within the timeline. The hard parts of blockchain data (ingestion, reorgs, schema changes) were already solved. We spent our time on Celestia-specific models and the dashboard itself rather than data plumbing.
Once the Foundation started using the dashboard for investor decks and public reports, accuracy became non-negotiable. Every number had to be right, every metric consistent, every update reliable. We built reconciliation checks and anomaly detection on our own outputs. Being the canonical source for an ecosystem's metrics means trust is earned through boring, relentless correctness.
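Those reconciliation checks fit naturally into the same DBT project as singular tests, where any returned row fails the run. A sketch of one such check, using the same hypothetical names as above:

```sql
-- Reconciliation sketch: daily aggregates must sum back to the raw blob
-- totals. DBT treats any returned rows as a test failure. Names illustrative.
with agg as (
    select sum(bytes_posted) as agg_bytes
    from {{ ref('rollup_daily_metrics') }}
),
raw as (
    select sum(blob_size_bytes) as raw_bytes
    from {{ source('numia', 'celestia_blobs') }}
)
select agg_bytes, raw_bytes
from agg, raw
where agg_bytes is distinct from raw_bytes
```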