We Cannot Analyse Everything

I have spent most of my career working in high-frequency data environments where data moves continuously and at a large scale. In telecom networks, every cell, every subscriber interaction, every network function generates a time series, each one carrying relevant information about performance, behaviour, and risk.

Yet for years, one assumption remained largely unchallenged:

We cannot analyse everything.

Not because it wasn't valuable, but because it wasn't practical. It has traditionally been too expensive, too computationally heavy, and too complex to operate at scale. In many cases, the business case simply did not justify the infrastructure required to sustain it. That assumption wasn't wrong — it was a reflection of the tools we had.

The problem was never the absence of data, but the inability to keep up with it sustainably. As systems scale, a gap emerges between how fast the system evolves and how fast models can adapt — and it is within this gap that many failures occur.

From Accuracy to Adaptation: The Bottleneck Shift

Over the past decade, time series modelling has improved significantly: better handling of non-stationarity and drift, increased robustness to noise, support for multiple forecasting horizons, and strong performance in controlled benchmark environments. These are meaningful advances.

But in production, a different limitation becomes visible. The challenge is no longer how well models can predict, but how fast they can adapt without stopping. Most approaches still rely on a cycle of training on historical data, deploying the model, and retraining as performance degrades — creating an unavoidable delay between a change in the system and adaptation of the model. At small scale, this delay is manageable. At large scale, it becomes structural.

What I realised over time is that the problem was misframed. We were optimising for accuracy — obsessing over it — when the real constraint was the cost and latency of learning.

In environments where thousands of time series evolve continuously, a model that adapts too slowly, or requires too many resources to stay relevant, is effectively operating on outdated assumptions — regardless of how accurate it once was.

Why This Matters

This is why I spent years building a system capable of processing tens of thousands of predictions per second, while continuously absorbing drift and maintaining competitive accuracy on challenging datasets. Not to make forecasting faster for its own sake, but to remove the constraint that made large-scale, continuous analysis economically and operationally unfeasible.

Because once prediction becomes the bottleneck, you are no longer limited by data — you are limited by your ability to learn in time.

The Hidden Assumption: Train → Freeze → Predict

Most time series systems — statistical, machine learning, or deep learning — follow the same cycle: train on historical data, freeze the model, use it for prediction, and retrain when performance degrades. This is not a limitation of specific algorithms, but of an underlying assumption about how learning should occur: only when necessary.
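In Python, the cycle looks something like this — a deliberately toy sketch, where the "model" is just a windowed mean and the drift trigger is a hypothetical MAE threshold standing in for any real retraining policy:

```python
def train(history):
    # Stand-in for any batch learner (ARIMA, Prophet, a neural net):
    # here the "model" is simply the mean of the training window.
    return sum(history) / len(history)

def degraded(model, recent, threshold=1.0):
    # Hypothetical drift trigger: recent mean absolute error exceeds a threshold.
    return sum(abs(x - model) for x in recent) / len(recent) > threshold

history = [10.0] * 50
model = train(history)                    # 1. train on historical data
retrains, window = 0, []

stream = [10.0] * 30 + [20.0] * 30        # regime shift halfway through
for x in stream:
    prediction = model                    # 2. the frozen model predicts
    history.append(x)
    window = (window + [x])[-10:]
    if len(window) == 10 and degraded(model, window):
        model = train(history[-50:])      # 3. stop and retrain on history
        retrains += 1
```

Between the regime shift and the first retrain, the model keeps predicting from outdated assumptions — that window is the learning gap.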

A Model That Stops to Think

While this approach is intuitive — learn from the past, deploy, update when needed — it introduces a critical limitation in continuously evolving systems: the model cannot learn while it is acting. Each adaptation requires retraining, creating a learning gap where the system has already changed but the model still operates on outdated assumptions, missing new patterns and potential anomalies.

This paradigm performs well in controlled environments where training data is stable, retraining is manageable, and computing resources are abundant. But in production systems — non-stationary, distributed, resource-constrained, and continuously evolving — retraining shifts from an update mechanism to a fundamental bottleneck.

The Scalability Challenge

As the number of time series grows, the cost of retraining scales rapidly: more signals require more models, more drift demands more frequent updates, and more updates increase compute, latency, and operational complexity. This leads to a structural trade-off: either simplify models to keep them deployable, or centralise processing and absorb the latency cost — but in both cases, adaptability is compromised.
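Rough back-of-envelope arithmetic shows how quickly this compounds. Every number below is hypothetical, chosen only to illustrate the shape of the cost:

```python
# Hypothetical fleet: cost of periodic retraining at scale
n_series         = 50_000   # independent time series (hypothetical)
retrains_per_day = 24       # hourly retraining under heavy drift (hypothetical)
train_seconds    = 5.0      # CPU-seconds per single retrain (hypothetical)

cpu_seconds_per_day = n_series * retrains_per_day * train_seconds
cpu_core_days = cpu_seconds_per_day / 86_400

print(f"{cpu_core_days:.0f} CPU core-days of retraining per day")  # → 69
```

Doubling either the number of signals or the retrain frequency doubles the bill — and none of that compute produces a single prediction.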

Underlying this is a deeper assumption: intelligence can be centralised, periodically refreshed, and scaled by adding compute. Yet in practice, this scales infrastructure, not adaptation.

From Functions to Transitions

Most time series approaches treat the problem as function approximation: learn a global relationship from past data and use it to predict the future. But in continuously evolving systems, signals do not behave like stable functions. They behave like sequences of transitions between states.

Instead of asking "what is the function that fits this data?", a different question emerges:

"What are the patterns, and how do they evolve over time?"

In this view, learning is no longer a periodic event — it becomes a continuous process. Each new observation updates the internal representation, refines existing patterns, or creates new ones as the system evolves. There is no retraining cycle and no freeze phase. Learning and inference are no longer separate steps; they become the same operation, happening continuously as data flows.

Rather than fitting a model to the entire history, the system builds a structure of observed transitions. Patterns are identified online, relationships between them are tracked, and future behaviour is inferred from how these transitions have occurred over time. This can be understood as a graph of evolving states, rather than a fixed equation.
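To make the idea concrete — this is an illustrative toy, not DriftMind's actual internals, which are not public — a minimal transition-graph learner over discretised states could look like this:

```python
from collections import defaultdict

class TransitionPredictor:
    """Toy online learner: discretise values into states, count state
    transitions, and predict the most frequent successor state."""

    def __init__(self, bin_width=1.0):
        self.bin_width = bin_width
        self.counts = defaultdict(lambda: defaultdict(int))  # state -> next -> count
        self.last_state = None

    def _state(self, x):
        return round(x / self.bin_width)

    def observe(self, x):
        """Single pass: update the graph AND return the next-step prediction."""
        s = self._state(x)
        if self.last_state is not None:
            self.counts[self.last_state][s] += 1   # learning: one counter bump
        self.last_state = s
        successors = self.counts.get(s)
        if not successors:
            return x                               # unseen state: persist
        best = max(successors, key=successors.get) # inference: likeliest transition
        return best * self.bin_width

m = TransitionPredictor()
for x in [1, 2, 3, 1, 2, 3, 1, 2]:
    pred = m.observe(x)                            # learn and predict in one step
```

One observation means one counter update and one lookup: learning and inference are literally the same constant-time step, and no history is ever reprocessed.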

Removing the need for retraining makes adaptation immediate: the learning latency disappears. Complexity no longer scales with data volume, but with the diversity of patterns the system encounters.

The system no longer needs to "catch up" with reality. It evolves with it.

The Benchmark

To evaluate this approach under realistic conditions, I ran a direct benchmark against two widely used baselines: Adaptive ARIMA using Kalman Filters and Triggered Prophet. The setup was intentionally simple — all models exposed to the same input data, simulating a continuous streaming scenario, with no HTTP overhead and no artificial batching.
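In spirit, the harness is a plain one-step-ahead streaming loop. The snippet below is an illustrative stand-in, not the actual multi_benchmark.py, and the persistence "model" is a hypothetical placeholder for any of the three contenders:

```python
import time

def stream_benchmark(model, series):
    """One-step-ahead streaming evaluation: predict, reveal, update.
    No batching, no HTTP layer -- just the learning loop itself."""
    abs_err, t0 = 0.0, time.perf_counter()
    prev = series[0]
    for x in series[1:]:
        pred = model(prev)            # predict the next value from what is known
        abs_err += abs(pred - x)      # score against the revealed value
        prev = x                      # the observation becomes history
    elapsed = time.perf_counter() - t0
    n = len(series) - 1
    return abs_err / n, n / elapsed   # MAE, points per second

# A naive persistence baseline ("predict the last value") as the model
mae, throughput = stream_benchmark(lambda prev: prev, [1.0, 2.0, 4.0, 4.0])
```

The same loop, fed the same points in the same order, gives each model its MAE, wall-clock time, and throughput.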

To avoid the classic trap of single-dataset benchmarks, I ran the same protocol on four datasets from the Numenta Anomaly Benchmark (NAB), chosen to span different domains and drift profiles.

Model                  MAE      Throughput   Time      Retrains

Machine Temperature · 22,695 points
  DriftMind            0.8213   33,672/s     0.67s     continuous
  Adaptive ARIMA       0.8529   98/s         228.5s    901
  Triggered Prophet    2.9901   21/s         1,094s    4,105

CPU Utilization (ASG) · 18,050 points
  DriftMind            7.5389   35,531/s     0.51s     continuous
  Adaptive ARIMA       9.2491   29/s         618.9s    5,837
  Triggered Prophet    9.7014   16/s         1,138s    17,486

NYC Taxi Demand · 10,320 points
  DriftMind            1,755    47,778/s     0.22s     continuous
  Adaptive ARIMA       981      132/s        76.7s     0
  Triggered Prophet    6,393    29/s         354.8s    2,715

Ambient Temperature · 7,267 points
  DriftMind            0.729    33,032/s     0.22s     continuous
  Adaptive ARIMA       0.706    133/s        53.0s     0
  Triggered Prophet    1.843    19/s         379.0s    1,755

On the drift-heavy datasets (Machine Temperature, CPU Utilization), DriftMind is both more accurate than ARIMA and orders of magnitude faster — and it remains orders of magnitude faster on all four datasets. On the stable datasets (NYC Taxi, Ambient Temperature), ARIMA achieves lower MAE with zero retrains — exactly what theory predicts. But DriftMind is still 248–362× faster and requires no warm-up window.

More importantly, DriftMind achieves this without any retraining phase. ARIMA and Prophet spend most of their execution time repeatedly rebuilding their internal models as new data arrives. DriftMind, by contrast, updates continuously. It does not pause, does not retrain, and does not need to reprocess history to remain relevant.

The Latency of Learning

What this benchmark exposes is not just a performance gap, but a structural one. In most systems, learning is not continuous. It happens in discrete steps, triggered by retraining cycles. Between those steps, the model operates on assumptions that are already becoming outdated.

This introduces what can be described as the latency of learning: the delay between a change in the system and the model's ability to incorporate it.

In static environments, this delay is acceptable. In continuously evolving systems, it becomes the defining limitation: when learning is delayed, everything downstream is affected.

Removing retraining does more than improve performance. It removes the delay between observation and adaptation. Learning becomes immediate. Prediction becomes a byproduct of continuous state tracking. The system no longer reacts to change — it evolves with it.

Reproducibility

The full benchmark can be reproduced locally with a single command:

docker run -p 8080:8080 -p 8888:8888 thngbk/driftmind-edge-lab
# Then run: python3 /notebooks/multi_benchmark.py

No API keys, no cloud credentials, no hidden configuration. You can verify any number in the table above on your own laptop in under an hour.

In real-time systems, intelligence is not defined by how well you predict the future, but by how fast you adapt to the present.