DEX Anomaly Detection
Real-time ML pipeline on GCP that detects trading anomalies on Osmosis, estimates market impact, and raises alerts within seconds of transaction confirmation to anticipate price moves.
Streaming ML on GCP with Pub/Sub, Cloud Functions, BigQuery, and Vertex AI; unsupervised models (dense autoencoder, One-Class SVM, Isolation Forest) evaluated with the Silhouette score and Mann-Whitney U tests; sub-second processing latency.
Introduction
Built a real-time ML pipeline to detect on-chain trading anomalies on Osmosis, estimate market impact, and trigger alerts within seconds of confirmation, surfacing signal before price moves are obvious. The system runs streaming ingestion on Pub/Sub and Cloud Functions, with BigQuery and Vertex AI for training and serving, and relies on unsupervised models validated with the Silhouette score and Mann-Whitney U tests.
The Challenge
The challenge was to separate signal from noise in a continuous stream of millions of transactions. The models had to work without labels (unsupervised) and cope with pattern drift, and the infrastructure had to operate with sub-second latency for the signal to be actionable.
Solution & Approach
The solution was an end-to-end research platform that combines unsupervised models with a real-time infrastructure designed to be reliable and efficient:
ML/DL Model Research
- Dense autoencoder, One-Class SVM, and Isolation Forest as primary models.
- K-Means and baseline approaches for contrast; outlier ratio fixed at 5% for tests.
- Feature engineering: gas patterns, wallet clustering, cross-chain activity.
- Label-free evaluation with the Silhouette score and population comparison (Mann-Whitney U tests).
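The fixed 5% outlier ratio and the Silhouette-based evaluation above can be sketched in a few lines. This is a minimal illustration, not the project's code: the anomaly score here is a plain distance to the per-feature median, standing in for autoencoder reconstruction error or an Isolation Forest score, and the data is synthetic. The function names (`distance_scores`, `flag_top_fraction`, `make_demo_points`) are hypothetical.

```python
import math
import random
import statistics


def distance_scores(points):
    """Stand-in anomaly score: Euclidean distance to the per-feature median.

    Assumption: a real run would use model output (e.g. reconstruction error).
    """
    center = [statistics.median(col) for col in zip(*points)]
    return [math.dist(p, center) for p in points]


def flag_top_fraction(scores, fraction=0.05):
    """Label the highest-scoring `fraction` of points 1 (outlier), the rest 0.

    Ties at the cutoff are all flagged, so the count can exceed the target.
    """
    n_out = max(1, round(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[n_out - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two groups")
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton group contributes 0 by convention
        # a: mean distance to the other members of the point's own group
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to any other group
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def make_demo_points(seed=7):
    """Synthetic data: 95 'normal' points near (0, 0), 5 outliers near (8, 8)."""
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
    outliers = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(5)]
    return normal + outliers


if __name__ == "__main__":
    pts = make_demo_points()
    labels = flag_top_fraction(distance_scores(pts), fraction=0.05)
    print("flagged:", sum(labels), "silhouette:", round(silhouette_score(pts, labels), 3))
```

A score near 1 means the flagged and unflagged groups are well separated in feature space; it says nothing on its own about whether the flagged transactions matter economically, which is why a second, population-level comparison is useful.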
Real-Time Infrastructure (GCP)
- Pub/Sub for ingestion from blockchain nodes.
- Cloud Functions processing transactions with <1 s latency.
- Firestore for real-time alert delivery to dashboards.
- BigQuery for historical analysis and model training.
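The ingestion path above can be illustrated with a handler shaped like a Pub/Sub-triggered background Cloud Function, where the message payload arrives base64-encoded in `event["data"]`. This is a sketch under assumptions: the transaction fields (`tx_hash`, `amount_in`, `pool_liquidity`), the threshold, and the scoring rule are hypothetical, and the alert sink is passed in as a plain callable where a real deployment would write to Firestore.

```python
import base64
import json
import time

ALERT_THRESHOLD = 0.8  # hypothetical cutoff on the anomaly score


def score_transaction(tx):
    """Placeholder scorer; a real deployment would call the served model.

    Assumption: larger swaps relative to pool liquidity are more anomalous.
    """
    ratio = tx["amount_in"] / max(tx["pool_liquidity"], 1e-9)
    return min(1.0, ratio * 10)


def handle_pubsub_event(event, context=None, publish_alert=print):
    """Decode one Pub/Sub message, score it, and emit an alert if needed.

    The return value exists only to make the sketch easy to inspect;
    a background function's return value is not used by the platform.
    """
    started = time.perf_counter()
    tx = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    score = score_transaction(tx)
    alert = None
    if score >= ALERT_THRESHOLD:
        alert = {"tx_hash": tx["tx_hash"], "score": round(score, 3)}
        publish_alert(alert)  # stand-in for a Firestore write
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {"alert": alert, "latency_ms": latency_ms}


def encode_event(tx):
    """Build a local test event in the same shape Pub/Sub delivers."""
    payload = json.dumps(tx).encode("utf-8")
    return {"data": base64.b64encode(payload).decode("ascii")}


if __name__ == "__main__":
    demo = encode_event({"tx_hash": "ABC", "amount_in": 500.0, "pool_liquidity": 1000.0})
    print(handle_pubsub_event(demo))
```

Measuring latency inside the handler only covers the processing step. The time from on-chain confirmation to alert also includes node propagation and Pub/Sub delivery, so it has to be measured separately.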
MLOps & Automation
- Vertex AI for automated training and deployment.
- CI/CD for zero-downtime model releases.
- A/B testing across model versions.
- Monitoring for prediction accuracy and latency.
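One common way to run A/B tests across model versions is deterministic, hash-based traffic splitting, so the same transaction always reaches the same version. The sketch below shows that technique in isolation; it is an assumption about how such a split could be done, not a description of the project's routing or of a Vertex AI feature. The variant names and weights are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(tx_hash, weights):
    """Map a transaction hash to a model version, reproducibly.

    `weights` maps variant name -> traffic share; shares must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    digest = hashlib.sha256(tx_hash.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a uniform number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for variant, share in weights.items():
        cumulative += share
        if u < cumulative:
            return variant
    return variant  # covers floating-point shortfall in the cumulative sum


def split_counts(tx_hashes, weights):
    """Count how many transactions each variant would receive."""
    return Counter(assign_variant(h, weights) for h in tx_hashes)


if __name__ == "__main__":
    shares = {"champion": 0.9, "challenger": 0.1}  # hypothetical split
    print(split_counts((f"tx-{i}" for i in range(10_000)), shares))
```

Because the assignment depends only on the hash, replaying historical transactions from BigQuery gives the same split as live traffic, which keeps offline and online comparisons aligned.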
Results & Impact
The platform consistently surfaced actionable patterns: the dense autoencoder achieved a Silhouette score of 0.909, alerts fired in under 5 seconds from transaction confirmation, and whale accumulation was detected ahead of rallies. Sub-second processing and automated deployments made the research repeatable and suitable for production-grade anomaly detection.
Research Findings
- Silhouette score of 0.909 with the dense autoencoder; stronger results than One-Class SVM and Isolation Forest.
- Alerts in <5 s from transaction confirmation.
- Consistently detected whale accumulation patterns prior to rallies.
- Mann-Whitney tests showed significant differences in volatility, price, and volume.
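The Mann-Whitney comparison can be illustrated with a small self-contained implementation. This sketch assumes the two populations are a metric (for example volatility) measured around flagged versus unflagged transactions; the source does not spell out the exact grouping. It uses the normal approximation with tie and continuity corrections; in practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice. The sample values are made up.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic for x, p-value). Ties receive average ranks and
    the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share the average.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    # Continuity correction, clipped so the statistic never goes negative.
    z = max(0.0, (abs(u_x - mean_u) - 0.5) / math.sqrt(var_u))
    p_value = math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))
    return u_x, p_value


if __name__ == "__main__":
    # Made-up volatility readings, for illustration only.
    flagged = [0.031, 0.042, 0.038, 0.055, 0.047, 0.060, 0.052, 0.044]
    unflagged = [0.012, 0.018, 0.015, 0.021, 0.011, 0.019, 0.016, 0.014]
    u, p = mann_whitney_u(flagged, unflagged)
    print(f"U = {u}, p = {p:.4f}")
```

The test is rank-based, so it makes no normality assumption about volatility, price, or volume, which suits heavy-tailed market data.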
Technical Achievement
- End-to-end ML pipeline with sub-second latency.
- Automated releases that cut each model update from hours to minutes.
- Efficient GCP architecture processing millions of transactions.
- Reusable framework for on-chain ML research.
The project shows that real-time on-chain analysis can yield actionable market intelligence ahead of price moves. While research-focused, the infrastructure and models are applicable to production environments, where they could detect significant moves before they are broadly recognized.