DEX Anomaly Detection

Real-time ML pipeline running on GCP that detects trading anomalies in Osmosis, estimates market impact, and triggers sub-second alerts to anticipate price moves.

Streaming ML on GCP with Pub/Sub, Cloud Functions, BigQuery, and Vertex AI; unsupervised models (dense autoencoder, One-Class SVM, Isolation Forest) evaluated with the Silhouette score and Mann-Whitney tests; end-to-end latency under a second.

Anomaly Detection · On-Chain Analytics · DeFi · MLOps · Real-Time Systems
Silhouette 0.909 · Latency < 1s from tx to alert · 10M+ tx/day · Validated with Mann-Whitney
GCP · Pub/Sub · Cloud Functions · BigQuery · Python · Firestore · TensorFlow · scikit-learn

Introduction

Built a real-time ML pipeline to detect on-chain trading anomalies in Osmosis, estimate market impact, and trigger alerts in under a second, surfacing signal before price moves become obvious. Streaming ingestion runs on Pub/Sub and Cloud Functions, with BigQuery and Vertex AI for training and serving; the unsupervised models are validated with the Silhouette score and Mann-Whitney tests.

[Figure: GCP pipeline architecture]

The Challenge

The challenge was to separate signal from noise in a continuous stream of millions of transactions. The models had to work without labels (unsupervised) and cope with pattern drift, and the infrastructure had to operate with sub-second latency so that detections arrived while they were still actionable.

Solution & Approach

The solution was an end-to-end research platform that combines unsupervised models with a real-time infrastructure designed to be reliable and efficient:

ML/DL Model Research

  • Dense autoencoder, One-Class SVM, and Isolation Forest as primary models.
  • K-Means and baseline approaches for contrast; outlier ratio fixed at 5% for tests.
  • Feature engineering: gas patterns, wallet clustering, cross-chain activity.
  • Label-free evaluation with Silhouette Score and population comparison.
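
The label-free evaluation above can be illustrated with a minimal, standard-library sketch. It assumes each transaction already has an anomaly score (for example an autoencoder's reconstruction error; here a distance from the data centroid stands in for it), flags the top 5% as outliers, and computes the mean Silhouette coefficient of the resulting two-group partition. The data, the scoring rule, and the function names are illustrative assumptions, not the project's actual code.

```python
import math
import random


def flag_top_fraction(scores, fraction=0.05):
    """Label the highest-scoring `fraction` of points 1 (anomaly), the rest 0.

    Ties at the cutoff are all flagged, so slightly more than `fraction`
    of the points can be labeled when scores repeat.
    """
    n_flag = max(1, round(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in data: 95 ordinary points and 5 far-away ones.
random.seed(0)
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(95)]
outliers = [(random.gauss(12, 1), random.gauss(12, 1)) for _ in range(5)]
points = normal + outliers

# Hypothetical anomaly score: distance from the centroid of all points.
center = tuple(sum(coord) / len(points) for coord in zip(*points))
scores = [math.dist(p, center) for p in points]

labels = flag_top_fraction(scores, fraction=0.05)
score = silhouette_score(points, labels)
print(f"flagged={sum(labels)} silhouette={score:.3f}")
```

A score close to 1 means the flagged group sits far from the rest relative to its own spread, which is the sense in which a higher Silhouette value indicates a cleaner anomaly/normal separation. The pairwise loop is quadratic, so a sketch like this only suits small samples.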

Real-Time Infrastructure (GCP)

  • Pub/Sub for ingestion from blockchain nodes.
  • Cloud Functions processing transactions with <1 s latency.
  • Firestore for real-time alert delivery to dashboards.
  • BigQuery for historical analysis and model training.
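
As a rough sketch of the processing step described above, the handler below decodes a Pub/Sub-style event (Pub/Sub-triggered functions receive the message payload base64-encoded under `data`), scores the transaction, and hands an alert to a writer callable. The payload fields (`tx_hash`, `amount`, `pool_depth`, `confirmed_at`), the threshold, the placeholder scorer, and `write_alert` (standing in for a Firestore write) are all assumptions made for illustration.

```python
import base64
import json
import time

ALERT_THRESHOLD = 0.8  # hypothetical cutoff on the anomaly score


def score_transaction(tx):
    """Placeholder scorer; a real deployment would call the trained model."""
    # Hypothetical heuristic: swaps that are large relative to pool depth score higher.
    return min(1.0, tx["amount"] / tx["pool_depth"])


def handle_message(event, write_alert, now=time.time):
    """Decode a Pub/Sub-style event, score it, and emit an alert if warranted.

    `event["data"]` holds base64-encoded JSON. `write_alert` is any callable
    that persists the alert (a stand-in for a Firestore document write).
    """
    tx = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    score = score_transaction(tx)
    if score < ALERT_THRESHOLD:
        return None  # ordinary transaction, nothing to deliver
    alert = {
        "tx_hash": tx["tx_hash"],
        "score": score,
        # Time from on-chain confirmation to alert creation, in seconds.
        "latency_s": now() - tx["confirmed_at"],
    }
    write_alert(alert)
    return alert


# Usage with an in-memory sink and a fixed clock so the result is reproducible.
sent = []
payload = {
    "tx_hash": "ABC123",
    "amount": 90_000,
    "pool_depth": 100_000,
    "confirmed_at": 1_000.0,
}
event = {"data": base64.b64encode(json.dumps(payload).encode("utf-8"))}
alert = handle_message(event, sent.append, now=lambda: 1_000.4)

quiet_payload = dict(payload, amount=1_000)
quiet_event = {"data": base64.b64encode(json.dumps(quiet_payload).encode("utf-8"))}
quiet = handle_message(quiet_event, sent.append, now=lambda: 1_000.4)
```

Passing the writer and the clock in as arguments keeps the handler testable without cloud credentials, and recording `latency_s` on each alert gives a direct measurement to compare against the latency target.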

MLOps & Automation

  • Vertex AI for automated training and deployment.
  • CI/CD for zero-downtime model releases.
  • A/B testing across model versions.
  • Monitoring for prediction accuracy and latency.
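
One common way to run an A/B test across model versions is a deterministic, hash-based traffic split. The sketch below is an assumption about how such routing could look, not a description of the project's setup; the variant names and the 10% candidate share are hypothetical.

```python
import hashlib


def assign_variant(tx_hash, candidate_share=0.1):
    """Route a transaction to the 'candidate' or 'baseline' model version.

    The same transaction hash always maps to the same variant, so replays
    and retries are scored by the same model.
    """
    digest = hashlib.sha256(tx_hash.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "candidate" if bucket < candidate_share else "baseline"


# Check the observed split on synthetic transaction identifiers.
variants = [assign_variant(f"tx-{i}") for i in range(10_000)]
observed_share = variants.count("candidate") / len(variants)
print(f"candidate share: {observed_share:.3f}")
```

Because the assignment depends only on the transaction hash, no routing state needs to be stored, and the two versions can later be compared on disjoint, reproducible slices of traffic.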

[Figure: How a liquidity pool works]
[Figure: OSMO price distribution]
[Figure: Distribution of OSMO amount per trade]
[Figure: Dense autoencoder diagram]

Results & Impact

The platform consistently surfaced actionable patterns: the dense autoencoder achieved a Silhouette score of 0.909, alerts fired within 5 seconds of transaction confirmation, and whale accumulation was detected ahead of rallies. Sub-second processing and automated deployments made the research repeatable and suitable for production-grade anomaly detection.

Research Findings

  • Silhouette 0.909 with the dense autoencoder; stronger results than One-Class SVM and Isolation Forest.
  • Alerts in <5 s from transaction confirmation.
  • Consistently detected whale accumulation patterns prior to rallies.
  • Mann-Whitney tests showed significant differences in volatility, price, and volume between the compared populations.
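
For readers unfamiliar with the Mann-Whitney test used in the population comparison, here is a minimal standard-library sketch. It uses average ranks for ties, a tie-corrected variance, and the normal approximation without continuity correction, so it suits illustration rather than small-sample inference. The two samples are made-up volatility values, not project data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j shared by the tie group
        group_size = j - i
        tie_term += group_size**3 - group_size
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


# Made-up example: volatility after flagged trades vs. after ordinary trades.
flagged = [0.9, 1.1, 1.3, 1.4, 1.6, 1.8, 2.0, 2.2]
ordinary = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0]
u_stat, p_value = mann_whitney_u(flagged, ordinary)
print(f"U={u_stat:.1f} p={p_value:.4f}")
```

The test compares rank distributions rather than means, so it makes no normality assumption about the underlying values, which is useful for heavy-tailed quantities such as trade volume.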

Technical Achievement

  • End-to-end ML pipeline with sub-second latency.
  • Automated releases that cut each model update from hours to minutes.
  • Efficient GCP architecture processing millions of transactions.
  • Reusable framework for on-chain ML research.

The project shows that real-time on-chain analysis can produce actionable market intelligence ahead of price moves. While research-focused, the infrastructure and models are applicable to production environments, where they can detect significant moves before they are broadly recognized.