DEX Anomaly Detection

Real-time ML pipeline on GCP that detects trading anomalies on Osmosis, estimates market impact, and raises sub-second alerts ahead of price moves.

Streaming ML on GCP with Pub/Sub, Cloud Functions, BigQuery, and Vertex AI; unsupervised models (dense autoencoder, One-Class SVM, Isolation Forest) evaluated with the Silhouette score and Mann-Whitney tests; end-to-end latency under a second.

Anomaly Detection, On-Chain Analytics, DeFi, MLOps, Real-Time Systems

Silhouette 0.909, Latency < 1 s from tx to alert, 10M+ tx/day, Validated with Mann-Whitney

GCP, Pub/Sub, Cloud Functions, BigQuery, Python, Firestore, TensorFlow, scikit-learn

Duration

Jan 2023 to Jun 2023

Introduction

Built a real-time ML pipeline to detect on-chain trading anomalies on Osmosis, estimate market impact, and trigger alerts with sub-second processing latency, surfacing signal before price moves are obvious. The system runs streaming ingestion on Pub/Sub and Cloud Functions, with BigQuery and Vertex AI for training and serving, and relies on unsupervised models validated with Silhouette scores and Mann-Whitney tests.

Figure: Real-time ML pipeline architecture on GCP.

The Challenge

The challenge was to separate signal from noise in a continuous stream of millions of transactions. Beyond working without labels (unsupervised) and handling pattern drift, the infrastructure had to operate with sub-second latency to turn data into truly actionable signal.

Solution & Approach

The solution was an end-to-end research platform that combines unsupervised models with a real-time infrastructure designed to be reliable and efficient:

ML/DL Model Research

Dense autoencoder, One-Class SVM, and Isolation Forest as primary models.
K-Means and baseline approaches for contrast; outlier ratio fixed at 5% for tests.
Feature engineering: gas patterns, wallet clustering, cross-chain activity.
Label-free evaluation with Silhouette Score and population comparison.
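
The label-free comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic three-feature data, the `make_synthetic_trades` and `evaluate_detectors` helpers, and the hyperparameters other than the 5% outlier ratio are all assumptions. It fits an Isolation Forest and a One-Class SVM, then scores the resulting inlier/outlier split with the Silhouette score.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

OUTLIER_RATIO = 0.05  # from the text: outlier ratio fixed at 5% for tests


def make_synthetic_trades(n_normal=950, n_odd=50, seed=0):
    """Hypothetical stand-in for engineered trade features (not real data)."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(loc=0.0, scale=1.0, size=(n_normal, 3))
    odd = rng.normal(loc=8.0, scale=1.0, size=(n_odd, 3))
    return np.vstack([normal, odd])


def evaluate_detectors(X):
    """Fit each detector and score its inlier/outlier partition without labels."""
    X_scaled = StandardScaler().fit_transform(X)
    detectors = {
        "isolation_forest": IsolationForest(
            contamination=OUTLIER_RATIO, random_state=0
        ),
        # nu bounds the fraction of training points treated as outliers.
        "one_class_svm": OneClassSVM(nu=OUTLIER_RATIO, kernel="rbf", gamma="scale"),
    }
    results = {}
    for name, model in detectors.items():
        labels = model.fit_predict(X_scaled)  # +1 = inlier, -1 = outlier
        results[name] = {
            "outlier_fraction": float(np.mean(labels == -1)),
            "silhouette": float(silhouette_score(X_scaled, labels)),
        }
    return results


results = evaluate_detectors(make_synthetic_trades())
for name, metrics in results.items():
    print(name, metrics)
```

A higher Silhouette score here means the flagged points sit further from the bulk of the data. It measures separation, not correctness, which is why the project pairs it with a population comparison.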

Real-Time Infrastructure (GCP)

Pub/Sub for ingestion from blockchain nodes.
Cloud Functions processing transactions with <1 s latency.
Firestore for real-time alert delivery to dashboards.
BigQuery for historical analysis and model training.
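
As a rough sketch of the ingestion path above, a Pub/Sub-triggered function receives an event whose `data` field is a base64-encoded payload. The example below assumes that payload is JSON. The field names (`tx_hash`, `amount_osmo`), the `score_fn` and `alert_sink` callables, and the threshold are hypothetical; in a deployment the sink would presumably write to Firestore, which is left out here to keep the example self-contained.

```python
import base64
import json
import time


def decode_pubsub_event(event):
    """Decode the base64-encoded JSON payload carried in a Pub/Sub event dict."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def handle_transaction(event, score_fn, alert_sink, threshold=0.5):
    """Score one transaction and emit an alert if it looks anomalous.

    score_fn and alert_sink are injected so the handler can be exercised
    without any cloud dependencies (both are assumptions of this sketch).
    """
    started = time.perf_counter()
    tx = decode_pubsub_event(event)
    score = score_fn(tx)
    alert = None
    if score >= threshold:
        alert = {"tx_hash": tx.get("tx_hash"), "score": score}
        alert_sink(alert)
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    return alert, elapsed_ms


# Usage with a toy scoring rule and an in-memory sink.
sent_alerts = []
payload = {"tx_hash": "ABC123", "amount_osmo": 250000.0}
event = {
    "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
}
alert, elapsed_ms = handle_transaction(
    event,
    score_fn=lambda tx: 1.0 if tx["amount_osmo"] > 100000 else 0.0,
    alert_sink=sent_alerts.append,
)
print(alert, f"{elapsed_ms:.3f} ms")
```

Returning the elapsed time alongside the alert makes it straightforward to log per-message processing latency against the sub-second target.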

MLOps & Automation

Vertex AI for automated training and deployment.
CI/CD for zero-downtime model releases.
A/B testing across model versions.
Monitoring for prediction accuracy and latency.
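
One common way to run A/B tests across model versions is deterministic hash-based routing, sketched below. The source does not say how traffic was split, so the `assign_model_version` helper, the version names, and the 10% candidate share are illustrative assumptions only.

```python
import hashlib


def assign_model_version(tx_hash, versions=("stable", "candidate"), candidate_share=0.1):
    """Route a transaction to a model version based on a hash of its ID.

    The same tx_hash always maps to the same version, so replays and
    retries are scored consistently.
    """
    digest = hashlib.sha256(tx_hash.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return versions[1] if bucket < candidate_share else versions[0]


# Usage: roughly 10% of transactions should reach the candidate model.
assignments = [assign_model_version(f"tx-{i}") for i in range(10000)]
candidate_fraction = assignments.count("candidate") / len(assignments)
print(f"candidate share: {candidate_fraction:.3f}")
```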

Figure: OSMO price distribution, with support and resistance levels visible.

Figure: Distribution of OSMO amount per trade; small amounts predominate.

Figure: Dense autoencoder with n encoding and n decoding layers.
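
A dense autoencoder of the kind pictured can be sketched in TensorFlow/Keras as below. The layer widths, activations, training settings, and random input data are assumptions for illustration, not the project's configuration; only the 5% outlier ratio comes from the text. Points with the largest reconstruction error are flagged as anomalies.

```python
import numpy as np
import tensorflow as tf


def build_dense_autoencoder(n_features, hidden_units=(16, 8)):
    """Symmetric dense autoencoder; the last entry of hidden_units is the bottleneck."""
    layers = [tf.keras.Input(shape=(n_features,))]
    for units in hidden_units:  # encoder
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    for units in reversed(hidden_units[:-1]):  # decoder mirrors the encoder
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    layers.append(tf.keras.layers.Dense(n_features, activation="linear"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model


def flag_by_reconstruction_error(model, X, outlier_ratio=0.05):
    """Flag the samples whose reconstruction error is in the top outlier_ratio."""
    reconstructed = model.predict(X, verbose=0)
    errors = np.mean((X - reconstructed) ** 2, axis=1)
    threshold = np.quantile(errors, 1.0 - outlier_ratio)
    return errors, errors > threshold


# Usage on random stand-in features (hypothetical, not real trade data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6)).astype("float32")
autoencoder = build_dense_autoencoder(n_features=X.shape[1])
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)
errors, flags = flag_by_reconstruction_error(autoencoder, X)
print("flagged fraction:", float(flags.mean()))
```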

Results & Impact

The platform consistently surfaced actionable patterns: the dense autoencoder achieved a Silhouette score of 0.909, alerts fired in under 5 seconds from transaction confirmation, and whale accumulation was detected ahead of rallies. Sub-second processing and automated deployments turned research into repeatable intelligence suitable for production-grade anomaly detection.

Research Findings

Silhouette 0.909 with the dense autoencoder; stronger results than One-Class SVM and Isolation Forest.
Alerts in <5 s from transaction confirmation.
Consistently detected whale accumulation patterns prior to rallies.
Mann-Whitney tests showed significant differences in volatility, price, and volume.
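
The population comparison can be illustrated with SciPy's `mannwhitneyu`, which tests whether two samples come from the same distribution without assuming normality. The lognormal volumes, the sample sizes, the `compare_populations` helper, and the 0.05 significance level below are assumptions for the sketch, not the project's data or settings.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_populations(normal, anomalous, alpha=0.05):
    """Two-sided Mann-Whitney U test between normal and flagged samples."""
    result = mannwhitneyu(normal, anomalous, alternative="two-sided")
    return {
        "u_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }


# Usage on synthetic trade volumes (hypothetical values).
rng = np.random.default_rng(0)
normal_volume = rng.lognormal(mean=1.0, sigma=0.5, size=500)
anomalous_volume = rng.lognormal(mean=2.0, sigma=0.5, size=50)
report = compare_populations(normal_volume, anomalous_volume)
print(report)
```

The same function would be applied separately to each metric of interest, such as volatility, price, and volume.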

Technical Achievement

End-to-end ML pipeline with sub-second latency.
Automated releases cut each model update from hours to minutes.
Efficient GCP architecture processing millions of transactions.
Reusable framework for on-chain ML research.

The project shows that real-time on-chain analysis can yield actionable market intelligence ahead of price moves. While research-focused, the infrastructure and models are applicable to production environments to detect significant moves before they are broadly recognized.