An automated toolchain for the data-driven and dynamical modeling of combined sewer systems
Authors: Sara Troutman, Nathaniel Schambach, Nancy Love, Branko Kerkez
Year: 2017
Tags: combined-sewer-modeling, gaussian-processes, system-identification, smart-water-systems, real-time-recalibration, urban-hydrology
TL;DR
A two-component data-driven toolchain separates combined sewer flows into dry-weather (Gaussian Process) and wet-weather (linear transfer function via System Identification) sub-models that are continuously re-calibrated on streaming sensor data. The central empirical finding is that an optimal, site-specific training lookback window exists — roughly 6–9 months for dry-weather and ~15 months for wet-weather — beyond which additional historical data degrades forecast accuracy.
First pass — the five C's
Category. Research prototype / applied methodology with real-world evaluation.
Context. Urban hydrology / smart water infrastructure. Builds on: Rasmussen & Williams (2006) GP regression framework; Ljung (1999) System Identification and transfer function theory; Dawson & Wilby (2001) and Mounce et al. (2014) for NN-based sewer flow modeling as the implicit alternative; Vanrolleghem et al. (2005) on physical-model limitations for real-time control.
Correctness. Load-bearing assumptions: (1) combined flow is a linear superposition of dry- and wet-weather components (Eq. 1); (2) wet-weather response is adequately captured by a linear ODE / transfer function (violated at pipe capacity, acknowledged); (3) a periodic + rational-quadratic kernel structure captures all relevant dry-weather covariance; (4) three nearest rain gauges are sufficient inputs for System Identification. All appear reasonable for typical operating conditions; assumption (2) is explicitly known to fail for large storms.
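Assumption (3) — the periodic + rational-quadratic kernel structure — can be made concrete with a minimal sketch. This uses scikit-learn kernel classes as stand-ins for the paper's kernel; the synthetic data, hyperparameters, and hourly resolution are all illustrative, not the paper's actual configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, RationalQuadratic, WhiteKernel

# Illustrative diurnal dry-weather flow: 3 days at hourly resolution (synthetic)
t = np.arange(0, 72, 1.0).reshape(-1, 1)  # hours
rng = np.random.default_rng(0)
flow = 1.0 + 0.5 * np.sin(2 * np.pi * t.ravel() / 24) + 0.05 * rng.standard_normal(t.size)

# Periodic term (24 h diurnal cycle) + rational-quadratic term (slower variation) + noise
kernel = (ExpSineSquared(length_scale=1.0, periodicity=24.0)
          + RationalQuadratic(length_scale=24.0, alpha=1.0)
          + WhiteKernel(noise_level=0.05**2))

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, flow)

# Predict the next day with a posterior standard deviation, as the
# dry-weather sub-model does when hindcasting the dry-weather signal
t_new = np.arange(72, 96, 1.0).reshape(-1, 1)
mean, std = gp.predict(t_new, return_std=True)
```

The posterior `std` is what gives the dry-weather sub-model its uncertainty estimate; the wet-weather sub-model has no analogous quantity (see critique below).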
Contributions.
- Automated end-to-end toolchain (GP + System ID) requiring only paired flow and rainfall data, with no manual parameterization.
- Empirical characterization of site-specific optimal training lookback periods for each sub-model, demonstrating that more data beyond a threshold degrades performance.
- Cloud-deployed implementation (AWS + InfluxDB + Grafana) with open-source code release (GitHub: kLabUM/DRIPS).
- Evidence that the dry-weather GP hindcast enables reliable wet-weather signal reconstruction where frequency-based filtering fails.
Clarity. Writing is clear and well-structured; however, the System Identification derivation (Eqs. 13–21) reproduces textbook material at length without adding insight for readers familiar with Ljung (1999), consuming space that could instead address experimental limitations.
Second pass — content
Main thrust: A lightweight, automatically re-calibrated toolchain combining GP dry-weather modeling and linear System Identification for wet-weather response achieves good combined-flow forecasts in a real urban sewer system, with site-specific optimal training windows of ~6–9 months (dry) and ~15 months (wet) that differ between sub-models and across sites, motivating continuous automated re-calibration rather than one-time calibration.
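The additive decomposition (Eq. 1) and the linear wet-weather response can be sketched as follows. The 3rd-order transfer-function coefficients, the storm pulse, and the idealized diurnal pattern are all invented for illustration; in the paper the coefficients are fit per site via System Identification and the dry-weather component comes from the GP.

```python
import numpy as np
from scipy import signal

dt = 5 / 60.0             # 5-minute timestep, in hours
t = np.arange(0, 24, dt)  # one day

# Dry-weather component: idealized diurnal pattern (the paper models this with a GP)
q_dry = 1.0 + 0.4 * np.sin(2 * np.pi * (t - 8) / 24)

# Rainfall input: a single 1-hour storm pulse, purely illustrative
rain = np.where((t >= 6) & (t < 7), 10.0, 0.0)

# Wet-weather sub-model: 3rd-order linear transfer function
# (coefficients invented; fit per site in the paper)
tf = signal.TransferFunction([0.2], [1.0, 3.0, 3.0, 1.0])
_, q_wet, _ = signal.lsim(tf, U=rain, T=t)

# Eq. 1: combined flow is the superposition of the two components
q_combined = q_dry + q_wet
```

Note that a fixed linear `tf` is exactly where the saturation nonlinearity at pipe capacity breaks the model (see critique below): the simulated response scales linearly with rainfall no matter how large the storm.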
Supporting evidence:
- Dataset: 10 flow sites, 18 tipping-bucket rain gauges, 3 years (2013–2016) at 5-minute resolution across one real combined sewer system.
- GP dry-weather NRMSE reported as 0.7427 for Site Q02 (representative example); full per-site values deferred to SI Table 1 (not included in the manuscript text).
- Average optimal dry-weather lookback: 6–9 months; using fewer than ~3 months causes overfitting; using more than ~9 months degrades average NRMSE.
- Best wet-weather model was a 3rd-order transfer function at all 10 sites; average optimal lookback ~15 months; minimum ~9 months required for satisfactory performance.
- Storm count within a 9-month window ranged from 5 to 17, illustrating why the wet-weather model needs longer lookbacks than the dry-weather model.
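The NRMSE values quoted here appear to follow the goodness-of-fit convention used by the System Identification literature (1 = perfect, 0 = mean-flow baseline, negative = worse than the mean), which is consistent with the critique below that the implicit baseline is NRMSE = 0. A sketch under that assumption:

```python
import numpy as np

def nrmse_fit(y_obs, y_pred):
    """Normalized RMSE as a fit score: 1 is a perfect match, 0 matches
    the mean-flow baseline, negative is worse than the mean."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.linalg.norm(y_obs - y_pred) / np.linalg.norm(y_obs - y_obs.mean())

y = np.array([1.0, 2.0, 3.0, 4.0])
print(nrmse_fit(y, y))                          # 1.0 (perfect prediction)
print(nrmse_fit(y, np.full_like(y, y.mean())))  # 0.0 (mean baseline)
```

Under this convention the reported 0.7427 means the model closes about 74% of the gap between the mean baseline and a perfect fit.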
Figures & tables: Figure 4 (GP predictions vs. observed dry-weather) and Figures 5/7 (NRMSE vs. lookback per site) carry the core argument. Axes are labeled and units are present. No error bars or confidence intervals appear on any performance curve. No statistical significance is reported for differences between lookback periods or between sites. Figure 8 usefully documents failure modes (rain-gauge mismatch, pipe capacity exceedance). SI Tables 1 and 2 contain full per-site NRMSE data but are not reproduced in the paper.
Follow-up references:
- Rasmussen & Williams (2006) — GP theory and kernel design underlying the dry-weather model.
- Ljung (1999) — System Identification foundations; the entire wet-weather methodology derives from this.
- Mounce et al. (2014) — NN-based sewer flow modeling; the implicit comparison class the authors discuss but do not benchmark against.
- Vanrolleghem et al. (2005) — physical-model limitations for real-time control; motivates the data-driven approach.
Third pass — critique
Implicit assumptions:
- Linearity of wet-weather flow across storm magnitudes: explicitly violated during large events (pipe capacity exceedance, Figure 8b); the transfer function cannot represent the saturation nonlinearity. This would break predictions for the highest-consequence events.
- Strict additivity (Eq. 1) assumes dry- and wet-weather dynamics are fully decoupled; upstream control actions (pumps, gates, storage dams) create coupling that the model does not represent and that the authors acknowledge introduces systematic errors at five of ten sites.
- Stationarity within the chosen lookback window: the whole motivation for re-calibration is non-stationarity, yet the model itself assumes stationarity over the training period.
- Three nearest gauges are adequate rainfall inputs: no analysis of gauge-distance sensitivity or spatial interpolation is provided; the paper notes unexplained cases of rainfall with no flow response and vice versa.
Missing context or citations:
- No quantitative comparison to any baseline: not to ARIMA/seasonal decomposition, not to a simple moving-average diurnal model, not to a neural network, and not to SWMM or any physical model. The claim of "good performance" is therefore relative only to an implicit mean-flow baseline (NRMSE = 0).
- Sparse GP approximations (e.g., inducing-point methods) are not discussed despite the acknowledged O(N³) scaling of exact GP inference; this omission leaves scalability for large sensor networks unaddressed.
- No quantitative engagement with existing diurnal wastewater generation models (Butler & Schütze, 2005; Gernaey et al., 2011).
Possible experimental / analytical issues:
- Only 10 flow sites: the "average optimal lookback" conclusions are drawn from a very small sample with high site-to-site variability; the generalizability of the 6–9 month (dry) and 15 month (wet) guidelines is unsubstantiated.
- No uncertainty quantification for wet-weather predictions: the GP sub-model produces a posterior variance, but the System Identification sub-model does not, so combined-flow prediction intervals are not reported.
- Raw data cannot be shared (privacy agreement); the anonymized example dataset may not reproduce the spatial-heterogeneity findings.
- Cross-validation methodology is incompletely described: the exact number of training/test storm splits, how storms are defined (duration, threshold), and how the "best" rain-gauge pair is selected are described qualitatively but lack the algorithmic precision needed for reproduction.
- No physiographic analysis successfully explains inter-site variability in optimal lookback; the paper reports a null result but offers no alternative explanatory framework, leaving practitioners without guidance for new deployments.
- NRMSE for individual sites beyond the Q02/Q03/Q05 examples is relegated to unreproduced SI tables.
Ideas for future work:
1. Add upstream control-action states (pump on/off, gate position) as exogenous inputs to the System Identification model (ARX/ARMAX structure) to handle the five sites where infrastructure operations distort the flow signal.
2. Benchmark against a NN baseline and a lumped physical model (e.g., SWMM with simplified parameterization) on identical train/test splits to quantify what the toolchain gains or loses versus alternatives.
3. Apply sparse GP approximations (e.g., variational inducing points) and profile runtime vs. accuracy as sensor count scales, to validate the cloud deployment claim of "low computational overhead" for city-scale networks.
4. Conduct a prospective experiment where the toolchain's automated re-calibration trigger is varied (e.g., every storm, weekly, monthly) and measure downstream impact on forecast NRMSE and CSO prediction accuracy, to produce operational guidelines.
Methods
- Gaussian Process regression
- dynamical System Identification
- transfer function modeling
- Butterworth bandpass filtering
- Gauss-Newton optimization
- normalized root mean square error (NRMSE) evaluation
- rolling cross-validation
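The Butterworth bandpass filtering listed above is the frequency-based alternative the paper finds inadequate for wet-weather extraction. A minimal sketch of that approach, with an entirely synthetic signal and illustrative band edges:

```python
import numpy as np
from scipy import signal

fs = 1 / 300.0                      # 5-minute sampling, in Hz
t = np.arange(0, 3 * 86400, 300.0)  # three days, in seconds

# Synthetic flow: base flow + diurnal dry-weather cycle + one short wet-weather transient
diurnal = np.sin(2 * np.pi * t / 86400)
storm = np.exp(-((t - 1.5 * 86400) / 3600.0) ** 2)
flow = 2.0 + diurnal + storm

# Bandpass keeping periods between ~30 min and ~6 h (illustrative band edges),
# rejecting both the daily cycle and the constant base flow
low, high = 1 / (6 * 3600), 1 / (30 * 60)
sos = signal.butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
wet_estimate = signal.sosfiltfilt(sos, flow)
```

The fixed band edges are the weakness the paper exploits: any dry-weather energy inside the band leaks into the wet-weather estimate, whereas subtracting a GP hindcast of the dry-weather signal makes no frequency-separation assumption.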
Datasets
- Real-world combined sewer sensor dataset (10 flow sites, 18 precipitation sites, 2013–2016, 5-minute resolution, anonymized)
Claims
- A data-driven toolchain using Gaussian Processes for dry-weather flows and System Identification for wet-weather flows can accurately forecast combined sewer flows.
- There exists a near-optimal 'data age' for model training: using too little or too much historical data degrades forecasting performance.
- On average, 6-9 months of lookback data optimizes dry-weather model performance, while 15 months optimizes wet-weather model performance.
- Dry-weather and wet-weather sub-models require different volumes of historical data for optimal performance, necessitating a flexible per-component re-calibration approach.
- Continuous automated re-calibration is more critical to forecast accuracy than model complexity itself.
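The "data age" claims above amount to a lookback-sweep experiment: train on the most recent W samples, score on a held-out horizon, and pick the window maximizing NRMSE. A minimal sketch with a synthetic drifting flow series and a placeholder weekly-mean model (all data, window sizes, and the model itself are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def nrmse(y, yhat):
    # Fit convention: 1 = perfect, 0 = mean baseline
    return 1.0 - np.linalg.norm(y - yhat) / np.linalg.norm(y - y.mean())

# Synthetic daily-mean flow with a slow drift, so very old data can mislead
days = np.arange(730)
flow = (1.0 + 0.001 * days + 0.3 * np.sin(2 * np.pi * days / 7)
        + 0.05 * rng.standard_normal(days.size))

train, test = flow[:700], flow[700:]
scores = {}
for lookback in (30, 90, 180, 365, 700):   # days of history used for training
    idx = np.arange(700 - lookback, 700)   # global day indices of the window
    window = train[-lookback:]
    # Placeholder model: mean flow per weekly phase observed in the window
    phase_mean = np.array([window[idx % 7 == p].mean() for p in range(7)])
    yhat = phase_mean[np.arange(700, 730) % 7]
    scores[lookback] = nrmse(test, yhat)

best = max(scores, key=scores.get)  # the empirically optimal lookback
```

The paper's finding is that `best` is neither the shortest nor the longest window, is site-specific, and differs between the dry- and wet-weather sub-models, which is why per-component automated re-calibration matters.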