Overview
In many real-world systems, forecasting is not an offline optimisation problem. Data arrives continuously, distributions shift without warning, and predictions must be delivered under hard latency and infrastructure constraints. Telecom networks, industrial IoT, energy systems, financial platforms, and edge-deployed sensors all share this property: the system must forecast while the process is changing.
DriftMind was designed explicitly for this regime. It avoids a separate training phase, processes each observation at most once, and keeps per-observation cost bounded. Instead of continuously updating a global model, it organises incoming windows into reusable behavioural patterns and forecasts through structural memory.
What DriftMind assumes
Streaming data, concept drift, immediate forecasting, bounded compute, and large-scale deployment.
What DriftMind avoids
Warm-up dependence, batch retraining, GPU-heavy inference, replay buffers, and global normalisation.
Why Conventional Forecasting Breaks in Fast Data Environments
Most modern forecasting architectures, especially deep-learning systems, are designed under assumptions that often do not hold in operational streaming environments:
- Warm-up dependence: they need historical context before they can produce reliable forecasts.
- Parameter-based adaptation: concept drift is handled through weight updates, which are intrinsically delayed and computationally expensive.
- Global normalisation: many models rely on statistics that become invalid as soon as level or variance shifts occur.
- Fixed inference cost: even simple series incur the same heavyweight computational pipeline as complex ones.
DriftMind starts from a different premise: forecasting under drift is often better framed as reuse of observed behaviour than as repeated parameter optimisation.
DriftMind’s Design Principles
1. Adaptation without retraining
DriftMind does not attempt to constantly recalibrate a global model. It answers a simpler and more operationally relevant question: has this behaviour been observed before, and if so, how did it evolve?
2. Structural memory as a first-class primitive
Each behavioural pattern is stored explicitly as a prototype sequence together with information about how that behaviour evolved in the past. Memory is inspectable, updateable, and prunable.
3. Cost proportional to predictive complexity
Simple patterns should be cheap to forecast. More complex or novel behaviours can trigger richer internal processing. This keeps compute aligned with actual forecasting difficulty.
4. Cold-start as a recurring condition
DriftMind treats cold-start as normal, not exceptional. This matters because in fast data systems, abrupt resets and regime shifts repeatedly invalidate prior assumptions.
Online Forecasting Pipeline
DriftMind operates on a sliding window of length L and produces a forecast horizon H whenever sufficient observations are available. It processes each observation once and routes each window through one of three paths:
| Path | Used when | Forecasting behaviour |
|---|---|---|
| Naïve extension | Simple pattern detected (e.g. linear / geometric) | Direct analytical extension with minimal latency |
| Cluster + TTG | Input matches an existing behavioural prototype | Sequence-to-sequence forecast using structural memory |
| Fallback extrapolation | No sufficiently similar cluster exists | Momentum-style extrapolation to guarantee a forecast |
Per-request standardisation
Before clustering, DriftMind separates shape from scale. Each input window is standardised using statistics computed on the current request only, not global history. This makes matching robust to level shifts and scale changes and removes dependency on warm-up statistics.
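This shape/scale separation can be sketched as a per-window z-score. This is a minimal illustration under stated assumptions (mean and population standard deviation of the current window only); DriftMind's exact statistics are not specified here.

```python
import statistics

def standardise_window(window):
    """Standardise one input window using only its own statistics.

    No global history is consulted, so the result is invariant to
    level shifts and scale changes between requests.
    """
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window) or 1.0   # guard against flat windows
    shape = [(x - mu) / sigma for x in window]
    return shape, mu, sigma                    # keep (mu, sigma) to descale later

def descale(forecast, mu, sigma):
    """Map a shape-space forecast back onto the original scale."""
    return [y * sigma + mu for y in forecast]
```

Because `(mu, sigma)` travel with the request, the forecast produced in shape space can be descaled at the end of the pipeline without touching any global state.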
Incoming window
├─ Detect simple pattern? → extend analytically
├─ Else standardise current window
├─ Find best matching cluster
│ ├─ match found → forecast through TTG
│ └─ no match → fallback extrapolation
├─ descale forecast
└─ update memory / decay unused clusters
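The routing above can be sketched as a small dispatcher. The function names and the similarity threshold here are illustrative assumptions, not DriftMind's internal API:

```python
def route_window(window, clusters, is_simple, best_match, threshold=0.9):
    """Dispatch one window to one of the three forecasting paths.

    `is_simple` and `best_match` stand in for DriftMind's actual pattern
    detector and cluster matcher; `threshold` is an assumed similarity cut-off.
    """
    if is_simple(window):
        return "naive_extension"          # analytical extension, minimal latency
    cluster, score = best_match(window, clusters)
    if cluster is not None and score >= threshold:
        return "cluster_ttg"              # forecast through structural memory
    return "fallback_extrapolation"       # momentum-style guess: a forecast is guaranteed
```

The key property is that every branch returns a forecast path: there is no "not ready" state, which is what makes cold-start a non-event.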
Online Behavioural Clustering
DriftMind continuously organises incoming windows into a dynamic set of behavioural clusters. Each cluster contains:
- a prototype sequence that represents the canonical shape of the behaviour,
- a Temporal Transition Graph that captures how the behaviour evolves over time,
- a usage score that determines whether the pattern remains relevant.
Why Pearson correlation?
Similarity is measured using Pearson correlation, not Euclidean distance and not rank-based measures such as Spearman or Kendall. This is deliberate.
What correlation preserves
Shape similarity under affine transformation, making it robust to offset and scale drift.
Why that matters
Inside a cluster, future evolution must remain locally coherent so the transition graph stays stable and usable.
If the best similarity exceeds a threshold, the window is assigned to the corresponding cluster. Otherwise, a new cluster is created immediately. This means novel behaviour can be incorporated at first occurrence, with zero retraining latency.
Temporal Transition Graphs (TTGs)
Once a window is assigned to a cluster, forecasting is performed using a Temporal Transition Graph. This is DriftMind’s core forecasting structure.
A TTG is a lightweight, non-parametric memory of the temporal transitions observed inside a cluster. Instead of fitting a regression function or training a neural predictor, DriftMind stores the transitions explicitly and forecasts by deterministic traversal.
What a TTG does
- discretises cluster-specific sequences into temporal states,
- stores one-step transitions and short continuations,
- uses a compact signature to select candidate entry points,
- traverses the graph deterministically to build the forecast horizon.
Why this matters
Because clustering already constrains each cluster to a locally coherent behavioural manifold, TTG traversal remains stable, interpretable, and computationally light. Confidence bands can be derived empirically from stored transitions rather than from a learned probabilistic head.
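Under stated assumptions (a scalar bin discretisation and one-step transitions only; DriftMind's signature selection and traversal details are not specified here), the TTG idea can be sketched as:

```python
from collections import Counter, defaultdict

class TinyTTG:
    """Toy Temporal Transition Graph: explicit, non-parametric transition memory."""

    def __init__(self, n_bins=8, lo=-3.0, hi=3.0):
        self.n_bins, self.lo, self.hi = n_bins, lo, hi
        self.step = (hi - lo) / n_bins
        self.transitions = defaultdict(Counter)  # state -> Counter of next states

    def _state(self, x):
        # Discretise a standardised value into a temporal state (bin index).
        return min(self.n_bins - 1, max(0, int((x - self.lo) / self.step)))

    def _value(self, s):
        # Represent a state by its bin centre.
        return self.lo + (s + 0.5) * self.step

    def observe(self, sequence):
        # Store the one-step transitions of a (standardised) cluster sequence.
        states = [self._state(x) for x in sequence]
        for a, b in zip(states, states[1:]):
            self.transitions[a][b] += 1

    def forecast(self, last_value, horizon):
        # Deterministic traversal: always follow the most frequent transition;
        # on a dead end, hold the current state.
        s = self._state(last_value)
        out = []
        for _ in range(horizon):
            if self.transitions[s]:
                s = self.transitions[s].most_common(1)[0][0]
            out.append(self._value(s))
        return out
```

Empirical confidence bands fall out naturally: the spread of the stored next-state counts at each step bounds the forecast without any learned probabilistic head.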
Anomaly Scoring
DriftMind does not define anomaly as raw forecast error alone. Instead, anomaly reflects structural novelty relative to memory.
When a simple pattern is recognised, DriftMind tracks the relevance of that pattern family over time. When no naïve pattern is found, it evaluates the current window against known clusters and derives anomaly from the strength and recency of the best candidate match.
| Situation | Interpretation | Anomaly tendency |
|---|---|---|
| Frequently re-observed pattern | Behaviour is well represented in memory | Low |
| Weak best cluster match | Behaviour resembles memory only partially | Moderate / High |
| Novel or long-dormant behaviour | Pattern is structurally unusual in current context | High |
This makes anomaly scoring immediately useful for real-time monitoring, because it measures how surprising the current behaviour is relative to recent operational memory.
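A hedged sketch of that idea, combining match strength and recency. The multiplicative weighting and the exponential decay half-life here are illustrative choices, not DriftMind's actual formula:

```python
import math

def anomaly_score(best_similarity, steps_since_last_seen, half_life=200.0):
    """Score structural novelty relative to memory.

    A strong, recently re-observed match yields a low score; a weak or
    long-dormant match yields a high one. Output lies in [0, 1].
    """
    match_strength = max(0.0, min(1.0, best_similarity))
    recency = math.exp(-math.log(2) * steps_since_last_seen / half_life)
    familiarity = match_strength * recency
    return 1.0 - familiarity  # 0 = well represented in memory, 1 = structurally novel
```

This reproduces the table above: a frequent strong match scores near 0, a weak best match scores in the mid range, and a long-dormant pattern scores near 1 even if its shape once matched well.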
Benchmark Results
NAB Multi-Dataset — vs ARIMA & Prophet
DriftMind was benchmarked against Adaptive ARIMA and Triggered Prophet across four NAB datasets spanning industrial sensors, cloud infrastructure, and urban demand forecasting. Same streaming protocol, same CPU, same triggered-retrain logic for baselines (MASE + correlation breach with 20-step debounce).
| Dataset / Model | MAE | Throughput | Time | Retrains |
|---|---|---|---|---|
| Machine Temperature · 22,695 points | | | | |
| DriftMind | 0.8213 | 33,672 / s | 0.67 s | continuous |
| Adaptive ARIMA | 0.8529 | 98 / s | 228.5 s | 901 |
| Triggered Prophet | 2.9901 | 21 / s | 1,094 s | 4,105 |
| CPU Utilization (ASG) · 18,050 points | | | | |
| DriftMind | 7.5389 | 35,531 / s | 0.51 s | continuous |
| Adaptive ARIMA | 9.2491 | 29 / s | 618.9 s | 5,837 |
| Triggered Prophet | 9.7014 | 16 / s | 1,138 s | 17,486 |
| NYC Taxi Demand · 10,320 points | | | | |
| DriftMind | 1,755 | 47,778 / s | 0.22 s | continuous |
| Adaptive ARIMA | 981 | 132 / s | 76.7 s | 0 |
| Triggered Prophet | 6,393 | 29 / s | 354.8 s | 2,715 |
| Ambient Temperature · 7,267 points | | | | |
| DriftMind | 0.7290 | 33,032 / s | 0.22 s | continuous |
| Adaptive ARIMA | 0.7060 | 133 / s | 53.0 s | 0 |
| Triggered Prophet | 1.8431 | 19 / s | 379.0 s | 1,755 |
OneNet (Deep Learning) — ETTh2 & ETTm1
DriftMind was also benchmarked against OneNet (NeurIPS 2023), the leading deep learning architecture for online forecasting under concept drift. OneNet ran on an NVIDIA RTX 3080 Ti GPU. DriftMind ran entirely on CPU.
Values shown as MSE / MAE. Bold marks the better result for each dataset and horizon.
| Dataset | Method | Horizon = 1 | Horizon = 24 | Horizon = 48 |
|---|---|---|---|---|
| ETTh2 | OneNet (GPU) | 0.348 / 0.380 | 0.407 / 0.535 | 0.436 / 0.609 |
| ETTh2 | DriftMind (CPU) | **0.232 / 0.145** | **0.351 / 0.284** | **0.399 / 0.332** |
| ETTm1 | OneNet (GPU) | **0.187 / 0.082** | **0.225 / 0.098** | **0.238 / 0.108** |
| ETTm1 | DriftMind (CPU) | 0.218 / 0.138 | 0.424 / 0.357 | 0.481 / 0.430 |
Runtime comparison (ETTh2 / ETTm1)
| Dataset | Method | Horizon = 1 | Horizon = 24 | Horizon = 48 |
|---|---|---|---|---|
| ETTh2 | OneNet (GPU) | 00:58:32 | 01:01:27 | 01:01:56 |
| ETTh2 | DriftMind (CPU) | 00:00:25 | 00:00:52 | 00:01:55 |
| ETTm1 | OneNet (GPU) | 01:48:11 | 01:49:36 | 01:51:02 |
| ETTm1 | DriftMind (CPU) | 00:03:56 | 00:03:53 | 00:06:19 |
To reproduce these results, run `docker run -p 8080:8080 -p 8888:8888 thngbk/driftmind-edge-lab`, then run `python3 /notebooks/multi_benchmark.py`. See the interactive benchmark results.
Operational Implications
Deploy everywhere — one engine, every scale
DriftMind deploys identically from managed cloud to a Raspberry Pi: same REST API, same model, same results. The deployment target changes; the intelligence doesn't.
Cloud / SaaS
Managed platform at api.thingbook.io. Start in minutes, free tier, elastic scaling.
On-Prem / Kubernetes
Full SaaS replica inside your infrastructure boundary. Helm chart, air-gapped capable.
Edge / Docker
One container, ~15 MB binary. Any Linux box. Offline capable. `thngbk/driftmind-edge`.
On-Device
Native binary on ARM or x86. Raspberry Pi, industrial gateways, embedded controllers.
Agent-ready by design
DriftMind is natively accessible to AI agents via MCP (Model Context Protocol, Anthropic), A2A (Agent-to-Agent, Google), and OpenAPI 3.0. Agents can create forecasters, push observations, and read predictions without integration code.
Industry applications
- Telecom networks: anomaly detection across RAN, core, and transport KPIs. FM integration via TMF642 / TMF656. Deploys in under 2 weeks.
- Industrial IoT: edge-deployed predictive monitoring. OPC-UA, MQTT, Modbus, Profinet, Profibus, MSMQ. PoC validated: 3–6% energy reduction in RO desalination.
- Data centers: PUE optimization, thermal hotspot prediction, cooling predictive maintenance. Reads via SNMP, IPMI/Redfish, BACnet, Modbus.
Resources
Benchmark & Articles
- Interactive Benchmark Results — 4-dataset NAB + OneNet, with dataset switcher and animated charts
- Real-Time Forecasting Without Retraining — full benchmark narrative with retrain trigger methodology
- Engineering Blog — all technical articles
Developer
- DriftMind Developer Guide
- Live API / Swagger
- DriftMind Python Client
- Docker Hub — `thngbk/driftmind-edge` and `thngbk/driftmind-edge-lab`
- Thingbook GitHub