
Case Study · EDF via NeoStair · Gas Forecasting

PrevCtGazTr — Remediating
Systematic Data Drift in Production.

847 Non-residential PCEs
1.83% Final MAPE (48h horizon)
−25% vs 2.44% baseline

01 / The Problem

A Drift That Was Invisible Until It Wasn't

PrevCtGazTr is a Python package for daily gas consumption forecasting of non-residential industrial gas clients (PCEs) for EDF. Horizon: 48 hours. Frequency: daily retraining on fresh meter data.

The pipeline was in production and seemingly stable — until a systematic pattern emerged in residual analysis: the 6_11_h feature (consumption from 6am to 11am) was being systematically underestimated at the 13h10 prediction window for a significant subset of non-LI PCEs.

The drift was subtle enough to pass monitoring thresholds on individual PCEs but compounded across the portfolio. The business impact: systematic under-supply forecasts for high-consumption industrial clients during morning peak hours — exactly when accuracy matters most.

02 / Root Cause Analysis

The feature wasn't late — it was systematically incomplete. At 13h10, the 6_11_h feature was being populated with partial data for a subset of PCEs with delayed telemetry. The model had learned to compensate — but in the wrong direction, overcorrecting for the delay and underestimating true morning consumption.

The anti-leakage discipline made it worse. Strict anti-data-leakage rules prevented the model from using any information that wouldn't be available at inference time. Correct in principle — but it meant the model had no way to signal "this feature value is unreliable." The fix required making the uncertainty explicit as a feature, not hiding it.

03 / The Fix — 18-Feature Redesign

6_11h_available flag

Boolean feature signaling whether the 6_11_h value was complete at inference time. Made the uncertainty explicit — the model could now distinguish 'zero consumption' from 'missing data'.
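
A minimal sketch of the flag, assuming a pandas frame carrying the raw 6_11_h column (the helper name and defaults are illustrative, not the production API):

```python
import numpy as np
import pandas as pd

def add_availability_flag(df: pd.DataFrame,
                          feature: str = "6_11_h",
                          flag: str = "6_11h_available") -> pd.DataFrame:
    """Mark whether the morning-window feature was complete at
    inference time, so the model can tell 'missing' from 'zero'."""
    out = df.copy()
    # 1 = value present at inference time, 0 = telemetry still delayed
    out[flag] = out[feature].notna().astype(int)
    # Impute the gap with 0.0 -- safe now that the flag disambiguates it
    out[feature] = out[feature].fillna(0.0)
    return out
```

With the flag present, an imputed zero no longer looks like a genuine zero-consumption morning to the model.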

Per-PCE Exponentially Weighted Mean

groupby().transform() with per-PCE EWM weights. Preserved individual industrial client consumption rhythms. Critical: a global EWM would have smoothed the variance that defined high-consumption PCE behavior.
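
The per-PCE smoothing can be sketched with pandas (column names and the halflife are assumptions, not the production values):

```python
import pandas as pd

def per_pce_ewm(df: pd.DataFrame,
                value_col: str = "consumption",
                group_col: str = "pce_id",
                halflife: int = 7) -> pd.Series:
    # groupby().transform() keeps the result aligned with df's index,
    # so each PCE is smoothed against its own history only -- a global
    # ewm() over the whole column would blend clients together.
    return df.groupby(group_col)[value_col].transform(
        lambda s: s.ewm(halflife=halflife).mean()
    )
```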

Feature store snapshot system

Captures the exact feature vector seen at inference time, stored alongside predictions. Enables post-hoc drift detection: compare what the model saw vs what actually arrived later. First systematic proof of drift in 6 months of production data.
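
A sketch of the post-hoc comparison, assuming snapshots and late-arriving actuals are keyed by pce_id (the function name and the 5% tolerance are illustrative):

```python
import pandas as pd

def detect_drift(snapshot: pd.DataFrame,
                 actual: pd.DataFrame,
                 feature: str = "6_11_h",
                 tol: float = 0.05) -> list:
    """Compare the feature value captured at inference time against the
    value that eventually arrived; return PCEs whose relative gap
    exceeds the tolerance."""
    merged = snapshot.merge(actual, on="pce_id",
                            suffixes=("_seen", "_actual"))
    gap = (merged[f"{feature}_actual"] - merged[f"{feature}_seen"]).abs()
    rel = gap / merged[f"{feature}_actual"].abs().clip(lower=1e-9)
    return merged.loc[rel > tol, "pce_id"].tolist()
```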

Sample weight redesign

3× weight on winter morning peaks (highest business impact), 2× on summer anomalies, 1× baseline. Trained the model to prioritize accuracy precisely where the systematic error was largest.
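
One way the weighting scheme could look, assuming a daily date column and a precomputed anomaly flag (the month-based season masks are a simplification of the real rule):

```python
import numpy as np
import pandas as pd

def build_sample_weights(dates: pd.Series,
                         is_anomaly: pd.Series) -> np.ndarray:
    """3x on winter peaks, 2x on summer anomalies, 1x baseline."""
    winter = dates.dt.month.isin([12, 1, 2])
    summer = dates.dt.month.isin([6, 7, 8])
    w = np.ones(len(dates))
    w[winter.to_numpy()] = 3.0
    w[(summer & is_anomaly).to_numpy()] = 2.0
    return w
```

The resulting array is passed to the estimator's `sample_weight` argument at fit time.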

04 / Architecture Decisions

LightGBM over deep learning for 48h gas forecasting

Chosen: LightGBM with Optuna hyperparameter optimization · Rejected: LSTM / Temporal Fusion Transformer

LSTM and TFT require large sequence lengths to learn temporal dependencies. With 847 PCEs and heterogeneous consumption profiles, deep learning introduced training instability and poor generalization on rare consumption patterns (holiday shutdowns, maintenance windows). LightGBM with explicit lag features and domain-informed feature engineering outperformed by 12% on MAPE and was 40× faster to retrain — critical for daily retraining pipelines.

Per-PCE EWM over global rolling mean for drift correction

Chosen: groupby().transform() with per-PCE exponentially weighted mean · Rejected: Global EWM across all PCEs

The drift was not uniform. Non-LI industrial PCEs had consumption profiles 3–8× more volatile than residential PCEs. A global EWM smoothed away the signal that mattered most for high-consumption clients. Per-PCE EWM via groupby().transform() preserved individual consumption rhythms while correcting for the systematic 6_11_h underestimation at 13h10.

Stateless package design over stateful pipeline

Chosen: Stateless ProjectConfig — all state externalized · Rejected: Embedded pipeline state

A stateful ML package accumulates technical debt as config options multiply. By externalizing all state into a ProjectConfig object with a COLUMN_SPECS/Pandera validation system, the package becomes independently testable, version-controllable, and deployable across environments without runtime mutation. Discovered this the hard way after version 3.

05 / Package Architecture

ProjectConfig
Single source of truth — all state externalized, validated at construction time
COLUMN_SPECS
Pandera schema contracts — enforced at ingestion and before model training
score.py
Stateless inference — receives config, returns predictions, no side effects
cron_predict_daily.py
Daily orchestration — feature capture → predict → store → alert on drift
cron_capture_features.py
Snapshot system — captures feature vectors at inference time for post-hoc analysis
supervision.py
Drift detection — compares inference-time features vs actuals, triggers alerts

06 / Results

−25%

MAPE

2.44% (baseline) → 1.831% post-remediation

Fixed

Morning peak accuracy

Systematic underestimation → Error distribution centered on zero across non-LI PCEs

New

Drift detection

No capability → Feature snapshot system catches delayed telemetry within 1 prediction cycle

v4

Package design

Stateful v3 with embedded config → Stateless v4 with externalized ProjectConfig + Pandera contracts

Stack

Python · LightGBM · Optuna · Pandas · Pandera · scikit-learn · pytest · Docker · Cron