Convea AI
Data Engineering · Data Analytics · AI agents · ClickHouse · DBT
eCommerce teams have too many dashboards. Shopify says one thing, Google Ads says another. Marketing thinks ROAS is up, Finance says revenue is flat. Everyone's looking at different numbers.
The founders came from the eCommerce world. At every company, the same problem: connecting ads data with Shopify to figure out which campaigns actually worked. They built custom pipelines, over and over. They decided to productize it. I joined to build the data infrastructure.
Convea pulled everything into one place and let AI answer questions instead of making people dig through charts. We built it. It worked. Then we couldn't sell it. The project shut down after eight months.
My Role
I owned the data architecture end to end. That meant designing how data flowed from 50+ sources into ClickHouse, building the DBT models that turned raw events into business metrics, and creating the semantic layer that made AI useful instead of confidently wrong.
The team was small, so I also handled infrastructure, monitoring, and the occasional frontend fix when something broke. Most of my time went into the modeling layer. Getting the abstractions right there determined whether the AI could actually answer questions or just hallucinate plausible-sounding nonsense.
Ingestion
We built native connectors for Shopify, Klaviyo, Google Ads, Meta Ads, TikTok, and dozens of other platforms. Each one was its own mess: webhooks here, batch pulls there, rate limits that varied wildly, schemas that changed without warning. The connectors handled backfills, incremental syncs, and automatic recovery when APIs broke.
Most of the unglamorous work was edge cases. Partial failures mid-sync. Duplicate events from retry logic. Timezone inconsistencies between platforms. API version migrations that break everything. The kind of stuff that doesn't make it into architecture diagrams but eats most of your debugging time.
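The core of that recovery logic can be sketched in a few lines. This is a minimal illustration, not Convea's actual connector code: `fetch_page`, `EventStore`, and the event shape are all assumed names. The two ideas it shows are cursor-based incremental pulls (a crash just re-pulls from the last persisted cursor) and idempotent writes (duplicate events from retries are dropped by id).

```python
# Illustrative sketch of an incremental sync loop; names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class EventStore:
    """Idempotent sink: duplicate event ids from retries are dropped."""
    seen: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def upsert(self, events):
        for ev in events:
            if ev["id"] in self.seen:  # a retry already delivered this one
                continue
            self.seen.add(ev["id"])
            self.rows.append(ev)


def sync(fetch_page, store, cursor):
    """Pull pages since `cursor`; advance the cursor only after the page
    is written, so a mid-sync failure simply re-pulls the same page."""
    while True:
        events, next_cursor = fetch_page(cursor)
        if not events:
            return cursor
        store.upsert(events)
        cursor = next_cursor  # in production, persist this durably
```

Because the sink is idempotent, replaying from a stale cursor after a partial failure is safe rather than a source of duplicates.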
Data Modeling
ClickHouse ran the analytical queries. We tuned partitioning by date ranges that matched how eCommerce teams actually look at data (daily, weekly, monthly cohorts). Sorting keys optimized for time-series rollups, cohort breakdowns, attribution windows. Materialized views pre-computed the expensive joins so dashboards loaded fast. The target was p99 under 200ms. We hit it.
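A sketch of what such a table definition might look like; the table and columns are hypothetical, not Convea's schema, but the shape of the tuning is the point: partitions that match how teams slice time, and a sorting key that keeps time-series scans contiguous.

```sql
-- Illustrative only: a hypothetical orders fact table.
CREATE TABLE fct_orders
(
    order_date   Date,
    customer_id  UInt64,
    campaign_id  UInt64,
    revenue      Decimal(18, 2)
)
ENGINE = MergeTree
-- Monthly partitions line up with daily/weekly/monthly reporting windows.
PARTITION BY toYYYYMM(order_date)
-- Time first in the sorting key, so rollups read contiguous ranges.
ORDER BY (order_date, campaign_id, customer_id);
```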
DBT transformed raw events using dimensional modeling. Staging tables to clean the mess from each source. Dimension tables for customers, products, campaigns. Fact tables for orders and ad spend. Classic Kimball, every metric traced back to one source of truth. You could click on any number and see exactly where it came from. Incremental models kept costs sane when we were processing 100M+ events per day.
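An incremental fact model in dbt roughly follows this pattern (the model, source, and columns here are illustrative): on the first run it builds the full table, and on every later run the `is_incremental()` block restricts the scan to rows newer than what the target already holds.

```sql
-- Hypothetical dbt incremental model; names are illustrative.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_date,
    net_revenue
from {{ ref('stg_shopify_orders') }}

{% if is_incremental() %}
  -- Only reprocess rows newer than the current high-water mark.
  where order_date >= (select max(order_date) from {{ this }})
{% endif %}
```

The `unique_key` lets dbt merge late-arriving updates instead of duplicating rows, which is what keeps reprocessing cheap at 100M+ events per day.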
Semantic Layer
This is where most analytics platforms fall apart, and honestly where we spent more time than I expected. "Revenue" means something different in Shopify (gross), Stripe (net), and whatever payment processor you're using. We built a semantic layer that defined each metric once with explicit logic everyone could read.
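The "define each metric once, with logic everyone can read" idea reduces to something like a registry keyed by metric and source. This is a minimal sketch with assumed field names (`gross`, `refunds`), not Convea's implementation:

```python
# Minimal metric registry sketch; field names are illustrative.

METRICS = {
    # One explicit definition per (metric, source) pair.
    ("revenue", "shopify"): lambda r: r["gross"],                 # gross sales
    ("revenue", "stripe"):  lambda r: r["gross"] - r["refunds"],  # net of refunds
}


def metric(name, source, rows):
    """Resolve a metric against one source using its declared logic."""
    fn = METRICS[(name, source)]
    return sum(fn(r) for r in rows)
```

The payoff is that "revenue" can legitimately differ by source, but the difference is written down in one place instead of living in five dashboards.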
Attribution models were configurable: first touch, last touch, linear, time decay. Campaign taxonomies got standardized across Google, Meta, and TikTok so you could actually compare them. The layer also tracked data quality. Freshness checks, row count anomalies, schema drift. If something looked wrong, you'd know before it hit a dashboard.
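Configurable attribution boils down to one function that distributes a conversion's credit across ordered touchpoints. A toy sketch, with the half-life as an assumed parameter; touches are given as days-before-conversion, oldest first:

```python
# Toy configurable attribution: return a credit weight per touchpoint.
import math


def attribute(touches, model, half_life_days=7.0):
    """`touches` = days before conversion, in chronological order.
    Weights always sum to 1."""
    n = len(touches)
    if model == "first_touch":
        return [1.0 if i == 0 else 0.0 for i in range(n)]
    if model == "last_touch":
        return [1.0 if i == n - 1 else 0.0 for i in range(n)]
    if model == "linear":
        return [1.0 / n] * n
    if model == "time_decay":
        # More recent touches (smaller day counts) get more credit.
        raw = [math.pow(2.0, -t / half_life_days) for t in touches]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(model)
```

Making the model a parameter, rather than a baked-in choice, is what let teams compare "which campaign worked" under different assumptions without re-running pipelines.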
AI Layer
The AI sat on top of the semantic model with context about the business. Campaign history, seasonality patterns, what normal benchmarks look like, which experiments were running. When someone asked "why did ROAS drop last week," the model could actually investigate. Check if spend changed. See if a campaign paused. Look for CPM spikes or conversion rate shifts. Without that structured context feeding it, you just get a language model making confident guesses.
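The "actually investigate" step is mechanical once metrics are structured: decompose ROAS into its drivers and check which ones moved. A toy sketch (metric names and the 10% threshold are assumptions, not the production logic):

```python
# Toy driver decomposition for a ROAS move; names/threshold are assumed.

def investigate_roas(prev, curr, threshold=0.10):
    """Compare two metric snapshots (dicts with revenue, spend, cpm, cvr)
    and report which drivers moved by more than `threshold`."""
    findings = []
    roas_prev = prev["revenue"] / prev["spend"]
    roas_curr = curr["revenue"] / curr["spend"]
    findings.append(f"ROAS {roas_prev:.2f} -> {roas_curr:.2f}")
    for driver in ("spend", "cpm", "cvr"):
        delta = (curr[driver] - prev[driver]) / prev[driver]
        if abs(delta) > threshold:
            findings.append(f"{driver} moved {delta:+.0%}")
    return findings
```

Feeding the model a findings list like this, instead of raw rows, is the difference between "CPM spiked 40% while conversion held flat" and a confident guess.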
Natural language queries came back in under 3 seconds. The frontend showed the reasoning too, not just the answer. Which metrics it checked, what it ruled out, where the anomaly appeared. Marketers don't trust black boxes, so we made it explainable.
What We Built
The infrastructure worked. p99 under 200ms on analytical queries. 50+ connectors processing 100M+ daily events. AI insights in under 3 seconds. Metric definitions that Marketing and Finance could finally agree on. Pipeline orchestration with alerting when SLOs slipped. Monitoring across the whole stack.
We had a product. We just didn't have enough customers to make it profitable.
What I Learned
You can build something that works and still fail. We solved a real problem. But real problems aren't enough. You need buyers who feel pain urgently, who can actually sign a check, and who you can reach without burning through your runway. We had the first part.
Consistency matters more than speed in analytics. Making queries fast is the easy part. Getting "revenue" to mean the same thing across Shopify, Stripe, and your payment processor is hard. We spent more time on the semantic layer than on ClickHouse tuning.
AI without structure is useless. Point a language model at raw data and it makes things up. The semantic layer is what made it work. The model knew what metrics existed, what values were normal, what context mattered. Garbage in, garbage out applies to LLMs too.
Incremental or die. Full refreshes stop working past a certain scale. Every model, every materialization, every sync had to be incremental or we'd spend all our compute budget fighting data volume instead of adding features.
This one didn't work out. Eight months of building real infrastructure, figuring out what makes analytics actually useful, learning why distribution matters as much as product. I'd rather ship something that fails than never ship at all.