Reproducible Benchmark · NAB + ETT Datasets

33,000–48,000 Predictions/Second.
Lower Error Than Deep Learning on ETTh2.

DriftMind was benchmarked against Adaptive ARIMA, Prophet, and OneNet — the leading deep learning architecture for online forecasting. It ran on a standard CPU with no GPU, no pre-training, and no retraining.

33–48K · Predictions per second (CPU) · Across 4 NAB datasets
0.82 · MAE on NAB Machine Temp. · Best accuracy on drift-heavy data
140× · Faster than OneNet (GPU) · 25 seconds vs 58 minutes
0 · Retraining cycles · vs up to 17,486 for Prophet

Benchmark 1 — NAB Multi-Dataset

Three-Way Direct Comparison

All three models were evaluated on six datasets from the Numenta Anomaly Benchmark (NAB), spanning machine sensors, cloud infrastructure, urban demand, ad-exchange pricing, and social-media volume, at three forecast horizons (h = 1, 6, 24). Every model ran under the same streaming protocol on the same hardware, with no HTTP overhead and no artificial batching. Results are reported per horizon, including the cases where DriftMind does not win: ARIMA is competitive on stable, non-seasonal signals at short horizons, while DriftMind dominates seasonal multi-step forecasting and is 46–635× faster even at its most demanding horizon (and over 1,000× faster at h = 1) with zero retraining.

NAB — Machine Temperature System Failure · 22,695 points · 5-min interval

Model | MAE (lower is better) | Throughput (pred/s) | Total Time | Retraining Cycles
DriftMind | 0.8526 | 10,274 | 2.21 s | Continuous
Adaptive ARIMA | 0.8529 | 47 | 483.10 s | 901
Triggered Prophet | 2.9901 | 13 | 1,740.02 s | 4,105

Methodology. DriftMind uses one engine per horizon, with the input window sized to the horizon (h = 1 → 15 steps, h = 6 → 30, h = 24 → 120); MAE, throughput and time are therefore reported per horizon. ARIMA and Prophet have no input-window parameter — they fit once on the 200 most-recent points and are scored at all three horizons from the same forecast vector, so their throughput, total time and retraining count are engine-level (unchanged across the horizon tabs) while only their MAE varies. ARIMA / Prophet "Total Time" covers a single all-horizon engine pass; DriftMind "Total Time" is for that horizon's engine alone — a conservative comparison for DriftMind.
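The streaming protocol described above can be illustrated with a minimal predict-then-score loop. This is only a sketch under stated assumptions: `LastValueForecaster` is a hypothetical stand-in model (it is not DriftMind, ARIMA, or Prophet), and `evaluate_stream` is an illustrative name, not part of the benchmark code. Only the horizon-to-window mapping is taken from the text.

```python
import time
from collections import deque

# Input-window size per horizon, as stated in the methodology above.
WINDOW_FOR_HORIZON = {1: 15, 6: 30, 24: 120}


class LastValueForecaster:
    """Hypothetical stand-in model: repeats the latest value for every step ahead."""

    def __init__(self, window):
        self.history = deque(maxlen=window)

    def update(self, value):
        self.history.append(value)

    def predict(self, horizon):
        return [self.history[-1]] * horizon


def evaluate_stream(series, horizon, window):
    """Stream the series once; score each forecast when its target point arrives."""
    model = LastValueForecaster(window)
    pending = {}      # target index -> forecast issued `horizon` steps earlier
    abs_errors = []
    start = time.perf_counter()
    for t, value in enumerate(series):
        # Score the forecast made for this point before the model sees it.
        if t in pending:
            abs_errors.append(abs(pending.pop(t) - value))
        # Learn from the new observation, then forecast `horizon` steps ahead.
        model.update(value)
        pending[t + horizon] = model.predict(horizon)[-1]
    elapsed = time.perf_counter() - start
    mae = sum(abs_errors) / len(abs_errors)
    throughput = len(series) / elapsed if elapsed > 0 else float("inf")
    return mae, throughput


if __name__ == "__main__":
    ramp = [float(i) for i in range(1000)]
    for h, w in WINDOW_FOR_HORIZON.items():
        mae, pred_per_s = evaluate_stream(ramp, h, w)
        print(f"h={h:>2} window={w:>3} MAE={mae:.2f} throughput={pred_per_s:,.0f}/s")
```

Because every forecast is stored before its target is observed, no future data leaks into the score, and MAE and throughput come out per horizon, as in the tables.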

Key finding: On seasonal workloads (NYC Taxi, CPU Utilization, Twitter Volume), DriftMind wins accuracy at every horizon — by 1.4–5× at h = 6 and h = 24 — and is orders of magnitude faster. On non-seasonal, low-amplitude signals (Machine / Ambient Temperature, Exchange-2) ARIMA is competitive or better at short and long horizons, where its drift model fits well; the gap is single-digit percent at h = 1. Throughput is decisive on every dataset and horizon: DriftMind sustains thousands of predictions per second with zero retraining, while ARIMA triggers 5,837 retrains on CPU Utilization and Prophet 17,486 — turning a sub-10-second DriftMind run into tens of minutes.

[Charts: Mean Absolute Error · Throughput (predictions/second, log scale) · Total Execution Time (seconds, log scale) · Retraining Cycles]

Benchmark 2 — ETTh2 & ETTm1 vs OneNet

DriftMind vs. State-of-the-Art Deep Learning

DriftMind was benchmarked against OneNet, the leading deep learning architecture for online time series forecasting under concept drift (NeurIPS 2023), which itself outperforms PatchTST, FEDformer, FSNet, and OnlineTCN across all reported settings. OneNet ran on an NVIDIA RTX 3080 Ti GPU. DriftMind ran entirely on CPU.

ETTh2 — Hourly Electricity Transformer Temperature

Model | MAE | MSE | Total Runtime | Hardware | Warm-up Required
DriftMind | 0.232 | 0.145 | 00:00:25 | CPU only | None (cold-start)
OneNet (NeurIPS 2023) | 0.348 | 0.380 | 00:58:32 | RTX 3080 Ti GPU | 25% dataset

ETTm1 — 15-minute Interval (Higher Frequency)

Model | MAE | MSE | Total Runtime | Hardware | Warm-up Required
DriftMind | 0.218 | 0.138 | 00:03:56 | CPU only | None (cold-start)
OneNet (NeurIPS 2023) | 0.187 | 0.082 | 01:48:11 | RTX 3080 Ti GPU | 25% dataset

What this shows: On ETTh2, DriftMind is both faster and more accurate than OneNet (MAE 0.232 vs 0.348, 25 seconds vs 58 minutes). On ETTm1 (the larger, higher-frequency dataset), OneNet achieves an accuracy edge at longer horizons, while DriftMind remains 27–28× faster and requires no warm-up window: it produces valid forecasts from the very first observation, whereas OneNet requires 25% of the dataset as a pre-training window before inference begins.
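
The speed-up factors quoted for the ETT runs follow directly from the runtimes in the two tables above. The short sketch below reproduces that arithmetic; the helper names `to_seconds` and `speedup` are illustrative only.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' runtime string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Runtimes copied from the ETTh2 and ETTm1 tables above.
RUNTIMES = {
    "ETTh2": {"DriftMind": "00:00:25", "OneNet": "00:58:32"},
    "ETTm1": {"DriftMind": "00:03:56", "OneNet": "01:48:11"},
}


def speedup(dataset):
    """Ratio of OneNet runtime to DriftMind runtime for one dataset."""
    runs = RUNTIMES[dataset]
    return to_seconds(runs["OneNet"]) / to_seconds(runs["DriftMind"])


if __name__ == "__main__":
    for name in RUNTIMES:
        print(f"{name}: {speedup(name):.1f}x")
```

ETTh2 works out to 3,512 s / 25 s, roughly 140×, and ETTm1 to 6,491 s / 236 s, roughly 27.5×, matching the figures quoted in the text.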

[Charts: MAE on ETTh2 (horizon = 1) · Runtime on ETTh2 (seconds, log scale)]

What the Numbers Mean

The Latency of Learning

The benchmark exposes not just a performance gap, but a structural one. The real constraint in production systems is not how well a model predicts — it is the delay between a change in the system and the model's ability to incorporate it.

Retraining Lag

ARIMA and Prophet spend most of their time rebuilding internal models as new data arrives. Between retraining cycles, the model operates on assumptions that are already becoming outdated. At scale — thousands of time series — this lag becomes structural.
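
To make the retraining-lag argument concrete, the sketch below counts how often a window-fitted model has to be rebuilt after a level shift. It rests on loud assumptions: the "model" is just the mean of the last 200 points (the fit-window size is from the benchmark methodology, but the mean is a hypothetical stand-in for an ARIMA or Prophet fit), and the error-threshold trigger is an assumed rule, since the page does not state how retraining is triggered.

```python
from collections import deque

FIT_WINDOW = 200  # the benchmark fits ARIMA and Prophet on the 200 most recent points


def count_retrains(series, error_threshold):
    """Count refits of a window-mean model rebuilt whenever its error is too large.

    Assumptions: the window mean stands in for an expensive model fit, and the
    absolute-error threshold is an illustrative trigger, not the benchmark's rule.
    """
    window = deque(maxlen=FIT_WINDOW)
    fitted_level = None
    retrains = 0
    for value in series:
        # Error of the currently deployed (possibly stale) model on the new point.
        error = None if fitted_level is None else abs(value - fitted_level)
        window.append(value)
        if fitted_level is None:
            fitted_level = sum(window) / len(window)   # initial fit, not counted
        elif error > error_threshold:
            fitted_level = sum(window) / len(window)   # full refit on the window
            retrains += 1
    return retrains


if __name__ == "__main__":
    stable = [1.0] * 600
    shifted = [0.0] * 300 + [10.0] * 300
    print("stable series retrains: ", count_retrains(stable, 1.0))
    print("shifted series retrains:", count_retrains(shifted, 1.0))
```

In this toy setup a single level shift keeps the trigger firing until the fit window is dominated by post-shift data, which is the lag the paragraph above describes: each refit still averages in history that no longer reflects the system.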

GPU Dependency

OneNet achieves competitive accuracy but requires a high-end GPU and 25% of the dataset as a warm-up window. In edge environments, new sensor deployments, or cold-start scenarios, this dependency is a hard blocker.

Continuous Adaptation

DriftMind does not pause, does not retrain, and does not reprocess history. Each new observation updates the internal representation immediately. Learning and inference are the same operation — happening as data flows, at 33,000–48,000 predictions per second.

CPU-Only Economics

Running on commodity hardware eliminates GPU cluster costs. In telecom, IoT, and industrial environments — where thousands of independent time series must be monitored simultaneously — this changes what is economically and operationally feasible.

Benchmark Methodology

Setup & Reproducibility

Both benchmarks were designed to be as fair and straightforward as possible. All models were exposed to the same data in the same streaming order. No cherry-picking of segments. No hyperparameter tuning on test data.

1. NAB Dataset

Machine Temperature System Failure series from the Numenta Anomaly Benchmark. 22,695 data points. Continuous streaming simulation with no HTTP overhead and no artificial batching.

2. ETTh2 & ETTm1 Datasets

Publicly available electricity transformer temperature datasets. Univariate (OT column) for fair comparison with OneNet's published protocol. MAE/MSE computed on standardised data as per OneNet paper.

3. Hardware

DriftMind: Intel Core i7-12700K @ 3.60 GHz, 32 GB DDR4, JVM 17 (OpenJDK). CPU-only. OneNet: same machine, NVIDIA RTX 3080 Ti (12 GB), PyTorch 2.1.0 with CUDA.

4. Cold-Start Condition

DriftMind benchmarks were conducted with no warm-up and no pre-loaded history. Prediction begins from the very first observation. OneNet uses the first 25% of each dataset as its mandatory pre-training window.

5. OneNet Replication

The official OneNet GitHub implementation was cloned and executed without modification. Published MAE and MSE results were successfully replicated, confirming benchmark integrity.

6. DriftMind Settings

Input length: 20–60. Max clusters: 200. Sliding window gap rate: 2.0. These defaults were fixed before benchmark runs and not tuned on test data.
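
The ETT metrics are reported on standardised data, which puts errors in units of standard deviations rather than raw degrees. A minimal sketch of that computation is shown below. Which portion of the series supplies the scaling statistics is an assumption here (passed in as `fit_values`); the page says only that the OneNet paper's protocol was followed. Function names are illustrative.

```python
import math


def standardise(values, fit_values):
    """Z-score `values` using the mean and standard deviation of `fit_values`."""
    mean = sum(fit_values) / len(fit_values)
    variance = sum((v - mean) ** 2 for v in fit_values) / len(fit_values)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("fit_values has zero variance; cannot standardise")
    return [(v - mean) / std for v in values]


def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error over paired observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    raw_actual = [20.0, 22.0, 25.0, 24.0]
    raw_forecast = [21.0, 21.0, 24.0, 26.0]
    # Assumption: scale with statistics from an earlier slice of the series.
    reference = [18.0, 20.0, 22.0, 24.0]
    z_actual = standardise(raw_actual, reference)
    z_forecast = standardise(raw_forecast, reference)
    print(f"MAE={mae(z_actual, z_forecast):.3f} MSE={mse(z_actual, z_forecast):.3f}")
```

Both the actual and forecast values must be scaled with the same statistics, otherwise the errors are not comparable across models.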

Reproduce the NAB benchmark yourself — run the full comparison locally with a single Docker command, then open the Jupyter notebook:
docker run -p 8080:8080 -p 8888:8888 thngbk/driftmind-benchmark
# Then open: http://localhost:8888/notebooks/dm_vs_arima_prophet.ipynb

The academic paper detailing the full architecture and ETT benchmark methodology is available here: DriftMind: A Self-Adaptive, Cold-Start Framework for Time Series Forecasting.

See It Running on Your Data

Run DriftMind on your own time series — CSV upload, no setup, no GPU required. Results in seconds.
