Reproducible Benchmark · NAB + ETT Datasets

33,000–48,000 Predictions/Second.
Lower Error Than Deep Learning.

DriftMind was benchmarked against Adaptive ARIMA, Prophet, and OneNet — the leading deep learning architecture for online forecasting. DriftMind ran on a standard CPU with no GPU, no pre-training, and no retraining.

33–48K · Predictions per second (CPU), across 4 NAB datasets
0.82 · MAE on NAB Machine Temperature, best accuracy on drift-heavy data
140× · Faster than OneNet (GPU): 25 seconds vs 58 minutes
0 · Retraining cycles, vs up to 17,486 for Prophet

Benchmark 1 — NAB Multi-Dataset

Three-Way Direct Comparison

All three models were evaluated on four datasets from the Numenta Anomaly Benchmark (NAB) — spanning machine sensors, cloud infrastructure, and urban demand forecasting. Same streaming protocol, same hardware, no HTTP overhead, no artificial batching. Results are honest: ARIMA wins accuracy on stable datasets; DriftMind wins on datasets with concept drift and wins speed on every dataset by 250–1,225×.

NAB — Machine Temperature System Failure · 22,695 points · 5-min interval

| Model | MAE (lower is better) | Throughput (pred/s) | Total Time | Retraining Cycles |
| --- | --- | --- | --- | --- |
| DriftMind | 0.8213 | 33,672 | 0.67 s | Continuous |
| Adaptive ARIMA | 0.8529 | 98 | 228.51 s | 901 |
| Prophet (triggered) | 2.9901 | 21 | 1,093.97 s | 4,105 |
Key finding: On drift-heavy datasets (Machine Temperature, CPU Utilization), DriftMind wins both accuracy and speed. On stable datasets (NYC Taxi, Ambient Temperature), ARIMA achieves slightly lower MAE with zero retrains — but DriftMind is still 250–362× faster and requires no warm-up window. The retraining cost becomes catastrophic on longer datasets: ARIMA triggers 5,837 retrains on the CPU utilization series.
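The retraining overhead in the table above can be sanity-checked with simple division. The figures below are copied from the table; the per-cycle cost is a back-of-envelope estimate that assumes retraining dominates each model's runtime:

```python
# Back-of-envelope: implied cost per retraining cycle (NAB Machine Temperature).
# Totals and retrain counts are taken from the benchmark table above.
arima_total_s, arima_retrains = 228.51, 901
prophet_total_s, prophet_retrains = 1093.97, 4105

# Assumes retraining dominates runtime (per-prediction cost is small for both).
print(f"ARIMA:   ~{arima_total_s / arima_retrains * 1000:.0f} ms per retrain")
print(f"Prophet: ~{prophet_total_s / prophet_retrains * 1000:.0f} ms per retrain")
```

At roughly a quarter of a second per cycle, a model that retrains thousands of times per series cannot keep up once thousands of series must be monitored concurrently.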

[Charts: Mean Absolute Error · Throughput (predictions/second, log scale) · Total Execution Time (seconds, log scale) · Retraining Cycles]

Benchmark 2 — ETTh2 & ETTm1 vs OneNet

DriftMind vs. State-of-the-Art Deep Learning

DriftMind was benchmarked against OneNet, the leading deep learning architecture for online time series forecasting under concept drift (NeurIPS 2023), which itself outperforms PatchTST, FEDformer, FSNet, and OnlineTCN across all reported settings. OneNet ran on an NVIDIA RTX 3080 Ti GPU. DriftMind ran entirely on CPU.

ETTh2 — Hourly Electricity Transformer Temperature

| Model | MAE | MSE | Total Runtime | Hardware | Warm-up Required |
| --- | --- | --- | --- | --- | --- |
| DriftMind | 0.232 | 0.145 | 00:00:25 | CPU only | None (cold-start) |
| OneNet (NeurIPS 2023) | 0.348 | 0.380 | 00:58:32 | RTX 3080 Ti GPU | 25% of dataset |

ETTm1 — 15-minute Interval (Higher Frequency)

| Model | MAE | MSE | Total Runtime | Hardware | Warm-up Required |
| --- | --- | --- | --- | --- | --- |
| DriftMind | 0.218 | 0.138 | 00:03:56 | CPU only | None (cold-start) |
| OneNet (NeurIPS 2023) | 0.187 | 0.082 | 01:48:11 | RTX 3080 Ti GPU | 25% of dataset |
What this shows: On ETTh2, DriftMind is both faster and more accurate than OneNet — the strongest available deep learning forecaster under concept drift. On ETTm1 (the larger, higher-frequency dataset), OneNet achieves a marginal accuracy edge at longer horizons, while DriftMind remains 27–28× faster and requires no warm-up window. DriftMind produces valid forecasts from the very first observation. OneNet requires 25% of the dataset as a pre-training window before inference begins.
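The speed-up figures quoted above follow directly from the runtime columns. A quick check, with runtimes copied from the tables:

```python
# Speed-up ratios implied by the runtime columns above (hh:mm:ss -> seconds).
def secs(hms: str) -> int:
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

etth2 = secs("00:58:32") / secs("00:00:25")   # OneNet runtime / DriftMind runtime
ettm1 = secs("01:48:11") / secs("00:03:56")
print(f"ETTh2: {etth2:.0f}x   ETTm1: {ettm1:.1f}x")
```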

[Charts: MAE — ETTh2 (Horizon = 1) · Runtime — ETTh2 (seconds, log scale)]

What the Numbers Mean

The Latency of Learning

The benchmark exposes not just a performance gap, but a structural one. The real constraint in production systems is not how well a model predicts — it is the delay between a change in the system and the model's ability to incorporate it.

Retraining Lag

ARIMA and Prophet spend most of their time rebuilding internal models as new data arrives. Between retraining cycles, the model operates on assumptions that are already becoming outdated. At scale — thousands of time series — this lag becomes structural.

GPU Dependency

OneNet achieves competitive accuracy but requires a high-end GPU and 25% of the dataset as a warm-up window. In edge environments, new sensor deployments, or cold-start scenarios, this dependency is a hard blocker.

Continuous Adaptation

DriftMind does not pause, does not retrain, and does not reprocess history. Each new observation updates the internal representation immediately. Learning and inference are the same operation — happening as data flows, at 33,000–48,000 predictions per second.
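DriftMind's internal algorithm is not reproduced here. Purely to illustrate the pattern this paragraph describes — forecast and state update as a single constant-time step, with no retraining phase — here is a minimal exponentially weighted forecaster; the class name and smoothing constant are hypothetical:

```python
class OnlineEWMA:
    """Illustrative online forecaster: learning and inference are one O(1) step.
    This is NOT DriftMind's algorithm, only the structural pattern described."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha   # smoothing factor (illustrative default)
        self.level = None    # internal state: current smoothed level

    def step(self, y: float) -> float:
        """Forecast the incoming point, then fold it into the state immediately."""
        forecast = y if self.level is None else self.level
        self.level = y if self.level is None else (
            self.alpha * y + (1 - self.alpha) * self.level
        )
        return forecast

m = OnlineEWMA()
forecasts = [m.step(y) for y in [10.0, 10.0, 20.0, 20.0]]
print(forecasts)  # the state shifts toward 20.0 as soon as the jump is observed
```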

CPU-Only Economics

Running on commodity hardware eliminates GPU cluster costs. In telecom, IoT, and industrial environments — where thousands of independent time series must be monitored simultaneously — this changes what is economically and operationally feasible.

Benchmark Methodology

Setup & Reproducibility

Both benchmarks were designed to be as fair and straightforward as possible. All models were exposed to the same data in the same streaming order. No cherry-picking of segments. No hyperparameter tuning on test data.
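This streaming evaluation is the standard prequential (test-then-train) loop: each point is forecast before its true value is revealed, then the revealed value is immediately available for adaptation. A minimal sketch, with a naive last-value baseline standing in for any of the benchmarked models:

```python
class LastValue:
    """Naive baseline forecaster: predict the most recent observation."""
    def __init__(self):
        self.last = None
    def predict(self):
        return self.last
    def update(self, y: float):
        self.last = y

def prequential_mae(model, series) -> float:
    """Test-then-train: forecast each point before its true value is revealed."""
    abs_err, n = 0.0, 0
    for y in series:
        y_hat = model.predict()
        if y_hat is not None:        # the very first point has no history to score
            abs_err += abs(y - y_hat)
            n += 1
        model.update(y)              # reveal the true value, adapt immediately
    return abs_err / max(n, 1)

print(prequential_mae(LastValue(), [1.0, 2.0, 4.0, 4.0]))  # (1 + 2 + 0) / 3 = 1.0
```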

1. NAB Dataset

Machine Temperature System Failure series from the Numenta Anomaly Benchmark. 22,695 data points. Continuous streaming simulation — no HTTP overhead, no artificial batching.

2. ETTh2 & ETTm1 Datasets

Publicly available electricity transformer temperature datasets. Univariate (OT column) for fair comparison with OneNet's published protocol. MAE/MSE computed on standardised data as per OneNet paper.
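For reference, a minimal sketch of how MAE/MSE are computed on z-score-standardised data; the series values and scaler statistics below are illustrative, not taken from ETT:

```python
import statistics

def standardise(values, mean, std):
    """Z-score transform with a scaler fitted elsewhere (e.g. on a training split)."""
    return [(v - mean) / std for v in values]

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative data: fit the scaler on a training split, then apply the SAME
# statistics to both ground truth and forecasts before scoring.
train = [10.0, 12.0, 11.0, 13.0]
mu, sigma = statistics.mean(train), statistics.pstdev(train)
y_true = standardise([12.5, 11.5], mu, sigma)
y_pred = standardise([12.0, 12.0], mu, sigma)
print(round(mae(y_true, y_pred), 4), round(mse(y_true, y_pred), 4))
```

Scoring in standardised units is what makes the MAE/MSE figures comparable across datasets with different physical scales.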

3. Hardware

DriftMind: Intel Core i7-12700K @ 3.60GHz, 32 GB DDR4, JVM 17 (OpenJDK). CPU-only. OneNet: same machine, NVIDIA RTX 3080 Ti (12 GB), PyTorch 2.1.0 with CUDA.

4. Cold-Start Condition

DriftMind benchmarks were conducted with no warm-up and no pre-loaded history. Prediction begins from the very first observation. OneNet uses the first 25% of each dataset as its mandatory pre-training window.
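The difference between the two protocols amounts to where forecasting begins. A minimal sketch, with the 25% fraction taken from OneNet's published protocol and `warmup_split` a hypothetical helper:

```python
def warmup_split(series, warmup_frac: float = 0.25):
    """OneNet-style protocol: reserve the first fraction of the series for
    pre-training; forecasts (and error metrics) begin only after that window."""
    k = int(len(series) * warmup_frac)
    return series[:k], series[k:]

warmup, stream = warmup_split(list(range(100)))
# DriftMind's cold-start condition corresponds to warmup_frac = 0.0:
# every point, including the first, is part of the scored stream.
print(len(warmup), len(stream))
```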

5. OneNet Replication

The official OneNet GitHub implementation was cloned and executed without modification. Published MAE and MSE results were successfully replicated, confirming benchmark integrity.

6. DriftMind Settings

Input length: 20–60. Max clusters: 200. Sliding window gap rate: 2.0. These defaults were fixed before benchmark runs and not tuned on test data.

Reproduce the NAB benchmark yourself — run the full comparison locally with a single Docker command, then open the Jupyter notebook:
docker run -p 8080:8080 -p 8888:8888 thngbk/driftmind-benchmark
# Then open: http://localhost:8888/notebooks/dm_vs_arima_prophet.ipynb

The academic paper detailing the full architecture and ETT benchmark methodology is available here: DriftMind: A Self-Adaptive, Cold-Start Framework for Time Series Forecasting.

See It Running on Your Data

Run DriftMind on your own time series — CSV upload, no setup, no GPU required. Results in seconds.
