Web3 API Infrastructure
Data Architecture · Real-Time · API Infrastructure · ClickHouse
When TradingView needs blockchain data for their charts, when Osmosis users check their portfolio, when CoinGecko updates token prices, the request goes through this API. We built the data layer that sits between raw blockchain transactions and state on one side and the products people actually use on the other. 10M+ requests per day, p99 latency under 150ms, 99.99% uptime over 18 months. The kind of infrastructure that only gets noticed when it breaks, and it hasn't.
The Real Problem
Blockchain nodes prune state. Yesterday's balances, last week's pool prices, the transaction you made three months ago. Most of that data no longer exists on-chain. Nodes keep current state and discard the rest. Even transactions get pruned from most nodes eventually.
So when someone asks "show me my portfolio value in USD with 24h change," there's nowhere to look it up. That data has to be captured before it disappears, indexed off-chain, and then computed on demand: fetching token balances, resolving prices across multiple DEXs, handling tokens with no direct USD pair, all in under 150ms.
When we started, every protocol was solving this independently. Building their own price feeds, writing custom indexers, spending engineering time on the same off-chain data problems instead of their core product. Most of these solutions weren't battle-tested for high volatility, which is exactly when users needed them most. We got good at solving this problem, and now we build it custom for each client.
Why This Was Hard
Blockchain state changes constantly. You can't just query "current price"; you need custom algorithms that calculate it from pool states in real-time. Add 3,000+ pools on a single large chain, edge cases like low-liquidity pools and circular routes, and clients who need answers in milliseconds.
The harder part is that most useful answers require mixing current chain state with historical data. APR calculations need current pool balances plus weeks of past fee and incentive events. Concentrated liquidity positions need the current tick range plus historical fee accrual to show actual returns. Portfolio tracking needs live balances but also every past swap, stake, and claim to build transaction history. None of this lives in one place, and the current and historical parts come from completely different systems.
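To make that blend concrete, here's a minimal sketch of the APR case. It assumes a simplified pool model; PoolState, RewardEvent, and estimateApr are illustrative names, not our production interfaces, and the live and historical inputs that really come from PostgreSQL and ClickHouse are passed in as plain values.

```typescript
// Illustrative only: the real engine reads live pool state from PostgreSQL and
// historical fee/incentive events from ClickHouse; here both are plain inputs.

interface PoolState {
  poolId: string;
  tvlUsd: number; // current total value locked, in USD
}

interface RewardEvent {
  poolId: string;
  timestamp: number; // unix seconds
  usdValue: number;  // fees or incentives paid out, in USD
}

// Annualize a trailing window of fee + incentive events against current TVL.
function estimateApr(pool: PoolState, events: RewardEvent[], windowDays = 7): number {
  const cutoff = Date.now() / 1000 - windowDays * 86_400;
  const windowUsd = events
    .filter(e => e.poolId === pool.poolId && e.timestamp >= cutoff)
    .reduce((sum, e) => sum + e.usdValue, 0);

  if (pool.tvlUsd <= 0) return 0; // don't divide by an empty pool
  const dailyRate = windowUsd / windowDays / pool.tvlUsd;
  return dailyRate * 365 * 100;   // simple (non-compounded) APR, in percent
}

// $3,500 of rewards over the last 7 days on $1M of TVL ≈ 18.25% APR
const apr = estimateApr(
  { poolId: 'pool-1', tvlUsd: 1_000_000 },
  [{ poolId: 'pool-1', timestamp: Date.now() / 1000 - 3_600, usdValue: 3_500 }],
);
console.log(apr.toFixed(2));
```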
Each client also wanted something different. TradingView needed OHLCV data formatted for their charting engine. CoinMarketCap wanted standardized token metadata. DexScreener needed real-time pool updates. Osmosis required portfolio tracking with transaction history. We had to build custom solutions for each while keeping a shared data layer underneath.
The Architecture
A lot of moving pieces, and honestly each one was its own project. Three main layers: a data pipeline that captures blockchain state before nodes prune it, a calculation engine that turns raw indexed data into the answers clients actually need, and a reliability layer that keeps everything running at scale.
Data Pipeline
The fundamental challenge is capturing blockchain state fast enough that nothing gets lost, and storing it in a way that makes real-time queries possible. We process 1,500+ blocks per minute across 25+ chains, each block containing transactions, state changes, and events that need to be parsed and stored before the node prunes them. A simplified sketch of the ingestion step follows the component list.
- Self-hosted nodes for reliable blockchain data capture
- Pub/Sub for event streaming with automatic retries and dead-letter queues
- BigQuery and dbt for complex, non-real-time historical queries
- ClickHouse cluster for real-time time-series data
- PostgreSQL for live entity state: pools, tokens, validators
- Multi-region deployment on Cloudflare with complex dynamic caching
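As a rough illustration of the capture step, here's a simplified ingestion loop. It assumes a Tendermint-style RPC (the block_results endpoint) and a hypothetical EventSink standing in for the Pub/Sub and ClickHouse writers; the production pipeline adds batching, retries, and schema-specific parsers.

```typescript
// Simplified ingestion loop. The RPC shape (block_results) follows Tendermint-style
// chains; `EventSink` is a hypothetical stand-in for the Pub/Sub + ClickHouse writers.

interface ChainEvent {
  chain: string;
  height: number;
  type: string;
  attributes: Record<string, string>;
}

interface EventSink {
  publish(events: ChainEvent[]): Promise<void>;
}

async function ingestBlock(rpcUrl: string, chain: string, height: number, sink: EventSink) {
  // Fetch finalized block results before the node prunes them.
  const res = await fetch(`${rpcUrl}/block_results?height=${height}`);
  if (!res.ok) throw new Error(`RPC ${res.status} at height ${height}`);
  const body = await res.json();

  // Flatten transaction events into rows the downstream stores can index.
  const events: ChainEvent[] = (body.result?.txs_results ?? []).flatMap(
    (tx: any) =>
      (tx.events ?? []).map((e: any) => ({
        chain,
        height,
        type: e.type,
        attributes: Object.fromEntries(
          (e.attributes ?? []).map((a: any) => [a.key, a.value]),
        ),
      })),
  );

  // Hand off to the streaming layer; retries and dead-lettering live behind the sink.
  if (events.length > 0) await sink.publish(events);
}
```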
Custom Calculation Engine
Raw indexed data is useless to a frontend. Nobody wants to parse pool state and calculate a token price themselves. Off-the-shelf solutions didn't exist for any of this, so we built custom algorithms that sit between the raw data and the API responses:
- Token pricing across 3,000+ pools per chain with multi-hop route resolution (sketched after this list)
- APR/APY calculations accounting for incentives, fees, and compounding
- Historical user balances for portfolio tracking
- Liquidity depth analysis for slippage estimation
- TVL aggregation that handles double-counting and LP token valuation
- Historical snapshots for charting and analytics providers
- Any custom calculation a client may need
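For a sense of what the pricing piece looks like, here's a stripped-down sketch of multi-hop route resolution. It assumes a hypothetical Pool shape with a precomputed spot price and routes everything toward USDC; the production algorithm also weighs liquidity depth, TWAPs, and a long tail of edge cases.

```typescript
// Hypothetical pool shape: spot price of `base` quoted in `quote`, plus USD liquidity
// so thin pools can be skipped during routing.
interface Pool {
  base: string;
  quote: string;
  spotPrice: number;    // how many `quote` per 1 `base`
  liquidityUsd: number;
}

const MIN_LIQUIDITY_USD = 10_000; // ignore pools too thin to price against (illustrative)

// Breadth-first search from `token` to USDC, multiplying spot prices along the route.
function resolveUsdPrice(token: string, pools: Pool[], maxHops = 3): number | null {
  const usable = pools.filter(p => p.liquidityUsd >= MIN_LIQUIDITY_USD);
  const queue: { token: string; price: number; hops: number }[] = [
    { token, price: 1, hops: 0 },
  ];
  const visited = new Set<string>([token]);

  while (queue.length > 0) {
    const { token: current, price, hops } = queue.shift()!;
    if (current === 'USDC') return price;
    if (hops >= maxHops) continue;

    for (const pool of usable) {
      // The pool can be traversed in either direction.
      const next =
        pool.base === current ? { token: pool.quote, rate: pool.spotPrice } :
        pool.quote === current ? { token: pool.base, rate: 1 / pool.spotPrice } :
        null;
      if (next && !visited.has(next.token)) {
        visited.add(next.token);
        queue.push({ token: next.token, price: price * next.rate, hops: hops + 1 });
      }
    }
  }
  return null; // no route to a USD pair within the hop limit
}
```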
Making It Reliable
When this goes down, traders see stale prices and UIs break. We serve the frontend for protocols where real money moves every second, so reliability isn't optional. The optimizations that got us to 99.99% uptime:
- Smart caching that cut latency and cost by 5x by knowing which data can tolerate staleness (sketched after this list)
- SQL optimization that reduced database costs by 70%
- DDoS protection for 100K+ requests/sec bursts
- Rate limiting per client with graceful degradation instead of hard failures
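The caching item deserves a sketch, since it did most of the heavy lifting. This is the basic idea, with illustrative data classes and TTLs rather than our production values: serve fresh entries directly, serve stale entries while a background refresh runs, and only make a request wait when the cache is cold.

```typescript
// Illustrative freshness budgets per data class (not the production values).
const TTL_MS: Record<string, number> = {
  'token-metadata': 60 * 60 * 1000,     // rarely changes, can be very stale
  'pool-state': 5 * 1000,               // drives prices, must stay near real-time
  'historical-candles': 5 * 60 * 1000,  // closed candles never change; only the latest does
};

interface CacheEntry<T> { value: T; storedAt: number; refreshing: boolean; }

const cache = new Map<string, CacheEntry<unknown>>();

// Serve from cache when fresh, serve stale while revalidating when expired,
// and only block the request when there is nothing cached at all.
async function cached<T>(
  dataClass: string,
  key: string,
  load: () => Promise<T>,
): Promise<T> {
  const fullKey = `${dataClass}:${key}`;
  const entry = cache.get(fullKey) as CacheEntry<T> | undefined;
  const ttl = TTL_MS[dataClass] ?? 0;

  if (entry && Date.now() - entry.storedAt < ttl) return entry.value;

  if (entry) {
    if (!entry.refreshing) {
      entry.refreshing = true;
      // Fire-and-forget refresh; the caller gets the stale value immediately.
      load()
        .then(value => cache.set(fullKey, { value, storedAt: Date.now(), refreshing: false }))
        .catch(() => { entry.refreshing = false; });
    }
    return entry.value;
  }

  const value = await load(); // cold cache: the only case that pays full latency
  cache.set(fullKey, { value, storedAt: Date.now(), refreshing: false });
  return value;
}
```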
Results
After 18 months in production:
- 99.99%+ uptime, less than 4 minutes of downtime per month
- 10M+ daily requests, p99 latency under 150ms
- 500K+ daily users across all integrated platforms
- $100M+/day trading volume flowing through these endpoints
- 5x cost efficiency vs. competitors; four protocols migrated to us after running the comparison
- Zero security incidents
Who Uses It
Two categories of clients with different needs:
- Data providers: TradingView, CoinMarketCap, CoinGecko, DexScreener, DefiLlama, Token Terminal. They need standardized feeds for charting and analytics.
- Protocol frontends: Osmosis, Neutron, Xion, Quasar. They need custom endpoints for portfolio tracking, transaction history, and DeFi metrics.
We also built a partner SDK that reduced integration time from weeks to hours. Onboarding used to be a multi-week project. Now it's an afternoon.
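The SDK itself isn't shown here, but the shape of an integration looks roughly like the sketch below. Every name in it, PartnerClient, the endpoints, the response fields, is hypothetical; the point is that a partner writes a few typed calls instead of their own indexer.

```typescript
// Hypothetical shape of the partner SDK; class name, endpoints, and response
// fields are illustrative only, not the published interface.
class PartnerClient {
  constructor(private baseUrl: string, private apiKey: string) {}

  private async get<T>(path: string): Promise<T> {
    const res = await fetch(`${this.baseUrl}${path}`, {
      headers: { 'x-api-key': this.apiKey },
    });
    if (!res.ok) throw new Error(`API error ${res.status} on ${path}`);
    return res.json() as Promise<T>;
  }

  tokenPrice(chain: string, denom: string) {
    return this.get<{ priceUsd: number; change24h: number }>(
      `/v1/${chain}/tokens/${encodeURIComponent(denom)}/price`,
    );
  }

  portfolio(chain: string, address: string) {
    return this.get<{ totalUsd: number; assets: unknown[] }>(
      `/v1/${chain}/accounts/${address}/portfolio`,
    );
  }
}

// An integration becomes a handful of typed calls instead of a custom indexer.
async function demo() {
  const api = new PartnerClient('https://api.example.com', process.env.API_KEY ?? '');
  const { priceUsd, change24h } = await api.tokenPrice('osmosis', 'uosmo');
  console.log(`OSMO: $${priceUsd} (${change24h.toFixed(2)}% 24h)`);
}
```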
What I Learned
Eighteen months of running this taught me that reliability at scale is mostly boring work. Good caching. Proper failover. Knowing which calculations can be batched and which need real-time computation. The custom algorithms everyone asks about are maybe 20% of the work. The other 80% is infrastructure that just keeps running. Nobody is excited about cache invalidation strategies, but that's what keeps 10M daily requests flowing without anyone noticing.
The biggest architectural win was realizing every client thinks their use case is unique, but the data model underneath isn't. TradingView wants OHLCV candles. CoinGecko wants token metadata. Osmosis wants portfolio values. They all need the same thing: accurate token prices computed from pool states in real-time. Once we stopped treating each integration as a custom project and started treating it as a different view on the same engine, onboarding went from months to weeks.
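Here's a rough sketch of what "different views on the same engine" means in practice, with illustrative names: one shared price series, and thin adapters that reshape it per client.

```typescript
// Illustrative only: one shared engine call, several client-specific shapes on top.
interface PricePoint { timestamp: number; priceUsd: number; volumeUsd: number; }

// Stand-in for the shared engine: prices computed once from pool state.
type PriceEngine = (token: string, from: number, to: number) => Promise<PricePoint[]>;

// Charting-style OHLCV candle derived from the shared series.
async function ohlcvView(engine: PriceEngine, token: string, from: number, to: number) {
  const points = await engine(token, from, to);
  if (points.length === 0) return null;
  const prices = points.map(p => p.priceUsd);
  return {
    open: prices[0],
    high: Math.max(...prices),
    low: Math.min(...prices),
    close: prices[prices.length - 1],
    volume: points.reduce((sum, p) => sum + p.volumeUsd, 0),
  };
}

// Aggregator-style "current price + 24h change" view over the exact same data.
async function tickerView(engine: PriceEngine, token: string) {
  const now = Date.now() / 1000;
  const points = await engine(token, now - 86_400, now);
  if (points.length === 0) return null;
  const first = points[0].priceUsd;
  const last = points[points.length - 1].priceUsd;
  return { priceUsd: last, change24h: ((last - first) / first) * 100 };
}
```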
If I had to point at one thing that defines this system, it's edge cases. Computing a token price when there's a clean USD pair is trivial. Computing it when you need to route through three pools, one with $500 of liquidity and another that just got exploited, that's where the real work lives. Our pricing algorithm handles thousands of these cases. Each one was a production incident that broke something for a real user. The algorithm is basically a record of every weird thing we've seen on-chain.
I also can't overstate how much monitoring mattered. We track p99 latency per endpoint, per client, per chain. When we cut database costs by 70%, it wasn't one big optimization. It was hundreds of small query rewrites, each informed by watching actual production patterns. The monitoring infrastructure took almost as long to build as the API itself, and it paid for itself within the first month.
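For the curious, the instrumentation implied here looks roughly like the sketch below, using prom-client-style labeled histograms; label names and bucket boundaries are illustrative, not our production config.

```typescript
import { Histogram } from 'prom-client';

// Latency histogram labeled by endpoint, client, and chain, so p99 can be
// sliced along each dimension. Buckets straddle the 150ms target.
const requestLatency = new Histogram({
  name: 'api_request_duration_ms',
  help: 'Request latency in milliseconds',
  labelNames: ['endpoint', 'api_client', 'chain'],
  buckets: [10, 25, 50, 100, 150, 250, 500, 1000],
});

// Wrap any handler so every request records its latency with full labels.
async function timed<T>(
  labels: { endpoint: string; api_client: string; chain: string },
  handler: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await handler();
  } finally {
    requestLatency.observe(labels, Date.now() - start);
  }
}

// Example (hypothetical handler):
// await timed({ endpoint: '/v1/prices', api_client: 'tradingview', chain: 'osmosis' }, fetchPrices);
```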
Want to see more?
- Technical docs, endpoints, and integration guides
- Read about the launch and architecture decisions
- Live integration powering portfolio and transaction history
- Protocol frontend with real-time DeFi data
- Another live integration with wallet activity feeds