NumiaAI
AI Agents · Real-Time · BigQuery · ClickHouse
Blockchain data is hard to work with. Dashboards that crawl, SQL queries that break every time a chain upgrades, cached snapshots that go stale before you even look at them. We wanted something radically simpler: ask a question in plain English, get a real answer from live on-chain data. No batching, no waiting. So we built it.
NumiaAI launched in partnership with dYdX and now serves protocols and institutions that want answers, not dashboards. You can ask "What's the current TVL on Osmosis?" or "Show me the biggest trades on dYdX in the last hour" and get live data back in seconds. The real challenge was making it reliable enough to trust in production: 99.9% uptime, p99 under 200ms, thousands of queries handled every day.
The Problem
Most blockchain data tools assume you know SQL or can navigate complex dashboards. That works fine for analysts. It falls apart for a fund manager who just needs a quick check, or a protocol team that wants to monitor their metrics without hiring a data engineer.
Connecting LLMs to live financial data is risky in ways people underestimate. Models hallucinate. "The current price of ATOM is $47.32" sounds perfectly convincing even when it's entirely fabricated. Queries can get expensive fast if you don't watch them. And freshness matters more than people think: nobody wants yesterday's price when they're making a decision right now.
So the challenge was never about building a chat interface. It was about getting deterministic, verifiable answers out of probabilistic models while keeping latency low and costs under control.
The Product
We designed and built the data infrastructure and AI controller from scratch. Three layers that work together: live data pipelines, a semantic layer for consistency, and an AI controller with serious guardrails.
The BigQuery warehouse runs continuous ingestion from multiple Cosmos chains. The trick was finding the right balance between freshness and cost. Price data has to be current. Historical volume can afford to lag a bit. We defined freshness SLAs per table type so we're not burning money on real-time updates for data that doesn't actually need it.
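The per-table freshness idea can be sketched as a small policy check. All table names and SLA values here are illustrative assumptions, not Numia's actual configuration:

```python
from datetime import timedelta

# Hypothetical freshness SLAs per table type (illustrative values only).
# Price data must be near-real-time; historical volume can lag for hours.
FRESHNESS_SLA = {
    "prices": timedelta(seconds=30),
    "trades": timedelta(minutes=5),
    "historical_volume": timedelta(hours=6),
}

def needs_refresh(table_type: str, staleness: timedelta) -> bool:
    """Return True when a table has gone stale past its SLA."""
    return staleness > FRESHNESS_SLA[table_type]
```

A scheduler built on a check like this only pays for real-time ingestion where the SLA actually demands it.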
DBT handles all the transformations. Pre-aggregated tables for common queries keep reads under 200ms. The semantic layer standardizes definitions across protocols so that TVL means the same thing whether you're asking about Osmosis or dYdX. It sounds obvious, but honestly, getting that right took way more work than we expected.
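A semantic layer of this kind can be thought of as one canonical metric definition mapped to each protocol's underlying table. The entry below is a minimal sketch; the metric spec, SQL template, and table names are all hypothetical:

```python
# Hypothetical semantic-layer entry: one canonical definition of TVL,
# mapped per protocol to an underlying table (names are illustrative).
METRICS = {
    "tvl": {
        "description": "Total value locked in USD, summed across pools",
        "sql_template": (
            "SELECT SUM(usd_value) FROM {table} "
            "WHERE ts = (SELECT MAX(ts) FROM {table})"
        ),
        "tables": {
            "osmosis": "osmosis_pool_balances",
            "dydx": "dydx_vault_balances",
        },
    },
}

def metric_sql(metric: str, protocol: str) -> str:
    """Render the canonical SQL for a metric against one protocol's table."""
    spec = METRICS[metric]
    return spec["sql_template"].format(table=spec["tables"][protocol])
```

The point is that "TVL" resolves to the same aggregation logic everywhere; only the source table changes per protocol.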
The AI side is where natural language turns into SQL. The controller routes between GPT and Gemini depending on the query type and cost constraints. Simple lookups go to the cheaper model. Complex analysis gets routed to the bigger one.
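The routing decision itself can be a few lines. This is a sketch under simple assumptions; the model names, query-type labels, and token threshold are illustrative, not the production values:

```python
# Hypothetical router: cheap model for simple lookups, bigger model
# for anything analytical or expensive (names/threshold are illustrative).
def route_model(query_type: str, est_tokens: int) -> str:
    if query_type == "lookup" and est_tokens < 2_000:
        return "gemini-flash"   # cheaper, faster model
    return "gpt-4o"             # larger model for complex analysis
```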
Prompt schemas constrain the output format tightly. The model can't just freeform hallucinate on financial data because we force it to return structured queries that we validate before executing anything. If a query doesn't make sense (asking about a chain we don't index, or requesting data that doesn't exist), it gets rejected with a clear explanation instead of returning garbage.
The Architecture
Retry logic handles model failures gracefully. If GPT returns something malformed, we fall back to Gemini. If both fail, we return an honest error instead of a confident wrong answer. In production, reliability always beats optimism.
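The fallback chain can be sketched as trying each model in order and returning an explicit error when nothing validates. The model client and SQL validator are passed in as callables here because their real interfaces aren't described in the source:

```python
# Minimal fallback sketch. `call_model` and `is_valid_sql` are stand-ins
# for the real model client and query validator (hypothetical interfaces).
def generate_sql(prompt, call_model, is_valid_sql, models=("gpt", "gemini")):
    """Try each model in order; return an honest error if all fail."""
    for model in models:
        try:
            candidate = call_model(model, prompt)
            if is_valid_sql(candidate):
                return {"ok": True, "sql": candidate, "model": model}
        except Exception:
            continue  # malformed output or API failure: fall through
    return {"ok": False, "error": "No model produced a valid query."}
```

The structured error in the all-fail case is what lets the product say "I don't know" instead of guessing.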
The system is three layers deep. At the bottom, BigQuery with continuous ingestion and DBT transformations. In the middle, the semantic layer that standardizes metrics across chains and protocols. On top, the AI controller that takes natural language, generates validated SQL, routes between models, and returns structured answers.
Everything runs on GCP. The data layer refreshes on per-table SLAs. The AI layer is stateless and horizontally scalable. Monitoring covers the full path from ingestion lag to model response time to end-user latency.
Results
The system has been running in production since the dYdX launch:
- 99.9% uptime since launch
- p99 latency under 200ms on AI-backed queries
- Thousands of queries processed daily
- Adopted by protocols including dYdX and institutional data teams
What We Learned
LLMs without structure are just confident liars. Point a language model at raw blockchain data and it will happily make things up. The semantic layer is what makes the whole thing work: defined metrics, validated schemas, constrained outputs. The model needs to know what's actually possible before it tries to answer anything.
We spent a lot of early effort chasing blanket real-time freshness before realizing it was wasteful. Not everything needs to be current to the second. Figuring out which data does and which can lag a bit saved us significant compute costs without anyone noticing a difference.
The most underrated design decision was teaching the system to say "I don't know." Users trust it because it tells them honestly when it can't answer something. A confident wrong answer destroys trust way faster than a straightforward "I don't have that data." We built the guardrails not as restrictions but as a core part of what makes the product reliable.