Flooding and Overflow Mitigation Using Deep Reinforcement Learning Based on Koopman Operator of Urban Drainage Systems
[doi] urban-drainage, reinforcement-learning, koopman-operator, flood-mitigation, real-time-control, surrogate-modeling
Authors: Wenchong Tian, Zhenliang Liao, Zhiyu Zhang, Hao Wu, Kunlun Xin Year: 2022 Tags: urban-drainage-systems, reinforcement-learning, koopman-operator, surrogate-emulator, real-time-control, combined-sewer-overflow
TL;DR
A Koopman-operator-based emulator (DLEDMD) is trained on SWMM simulations and used as a drop-in replacement for SWMM when training RL agents for pump control in a combined sewer system. The approach achieves a 79.67× higher data usage rate (DUR) and faster training than SWMM-based RL, with similar flooding and CSO mitigation outcomes. The authors claim the Koopman emulator's explicit linear structure makes it more interpretable than black-box MLP emulators.
First pass — the five C's
Category. Research prototype — new methodology applied to a single real-world case study.
Context. Urban drainage real-time control (RTC) subfield. Builds on: Mullapudi et al. (2020) and Saliba et al. (2020) — SWMM-based RL for UDS control (primary baseline); Li et al. (2017) — DLEDMD algorithm for Koopman approximation via deep learning; Lund et al. (2020) — linear surrogate model inside MPC for CSO mitigation; Korda & Mezić (2018) — Koopman operator theory applied to system control.
Correctness. Load-bearing assumptions: (a) a 4-variable aggregated state (CSO+flooding volume, stored volume, inflow, outflow) captures sufficient system dynamics for a 139-node network; (b) an emulator trained on Chicago-histogram synthetic rainfalls generalizes to real rainfall and to the RL sampling distribution; (c) on/off binary pump control is adequate for the operational task. All three are plausible but not formally verified in the paper.
Contributions.
- Koopman emulator (DLEDMD) for nonlinear UDS dynamics, achieving higher recursive-prediction NSE (0.931–0.994) than linear regression (0.072–0.961) and comparable to MLP, with an explicit linear structure enabling interpretability.
- Emulator-based RL framework substituting SWMM with the Koopman emulator during training, yielding a 79.67× higher data usage rate (DUR).
- New DUR metric quantifying the ratio of control-relevant state data to total environment output data during RL sampling.
- Uncertainty analysis evaluating robustness to 50 Monte Carlo rainfall events and ±5% state input noise.
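The DUR metric can be sketched as a simple consumed-vs-produced ratio. The paper's exact accounting is not reproduced in these notes, so the per-step value counts below are illustrative assumptions, not the authors' numbers:

```python
def data_usage_rate(values_used_per_step: int, values_produced_per_step: int) -> float:
    """Illustrative DUR: fraction of environment output actually consumed by the
    RL agent each step. The paper's exact accounting may differ (assumption)."""
    return values_used_per_step / values_produced_per_step

# SWMM emits the full network state (e.g. 139 nodes x several variables per
# step, hypothetical counts), while the agent reads only 4 aggregated
# variables; the emulator emits exactly those 4, so nothing is wasted.
dur_swmm = data_usage_rate(4, 139 * 3)
dur_emulator = data_usage_rate(4, 4)
```

The gap between these two ratios is what the paper's 79.67× figure quantifies for its actual variable counts.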
Clarity. Well-structured with a clear route-map figure; mathematical notation is dense but consistently defined; the "interpretability" discussion in Section 6.1.2 is qualitative rather than demonstrating quantitative physical insight from the eigenfunctions.
Second pass — content
Main thrust: Replacing SWMM with a Koopman emulator (DLEDMD) during RL training for pump scheduling reduces training cost and data waste while achieving flood/CSO mitigation performance comparable to SWMM-trained agents, and outperforms linear and MLP baseline emulators on recursive prediction accuracy.
Supporting evidence:
- DLEDMD recursive-prediction NSE: 0.931–0.994 across 4 test rainfalls; linear regression NSE: 0.072–0.961; MLP NSE: 0.817–0.963 (Table 5, unitless).
- DUR of emulator-based RL is 79.67× higher than SWMM-based RL (Tables 10–11; absolute DUR values not reproduced in the extracted text, but the ratio is stated).
- Training runs on a single office laptop (Intel Core i7-9750 @ 2.60 GHz, 16 GB RAM); emulator-based RL training with n = 1,000 steps completes faster than SWMM-based RL at n = 200 steps (specific wall-clock times in Tables 10–11, not fully extracted here).
- CSO + flooding volumes (10³ m³) at n = 20: DLEDMD-DQN 20.4–61.3 vs. SWMM-DQN 16.3–55.4 vs. water-level system 19.8–59.1 across Rain1–8 (Table 6); emulator-based agents approach SWMM-based performance by n = 1,000.
- Uncertainty analysis: 50 rainfall Monte Carlo runs yielding RSI distributions; 20 noisy-input trajectories per test event using U(0.95, 1.05) multiplicative noise.
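For reference, the NSE values quoted above follow the standard Nash–Sutcliffe efficiency definition (1 is a perfect fit; 0 is no better than predicting the observed mean):

```python
import numpy as np

def nse(observed, simulated) -> float:
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared prediction error
    to the spread of the observations around their mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

NSE can go below 0 for a model worse than the observed mean, which puts the linear baseline's 0.072 lower bound in context: barely better than a constant predictor.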
Figures & tables:
- Figures 9–10 show recursive and one-step prediction trajectories — axes appear labeled, but no uncertainty bands on predictions.
- Figure 11 shows Koopman eigenvalue/singular-value spectra and 2D-projected eigenfunctions — axes labeled (s1, s2); interpretation is visual only, with no quantitative link to physical UDS states.
- Tables 5–9 carry the core quantitative argument; no confidence intervals or statistical significance tests are reported anywhere in the paper.
- Visualization of eigenfunction projections (Figures 11c–11d) is illustrative but not connected to physical meaning.
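The spectra and eigenfunctions shown in Figure 11 can in principle be recovered from any finite Koopman matrix K. A minimal sketch, assuming psi(x_next) = K @ psi(x) for a lifting dictionary psi and that K is diagonalizable:

```python
import numpy as np

def koopman_spectrum(K):
    """Eigenvalues and eigenfunction coefficients of a finite Koopman matrix K.
    Row j of `left` defines the eigenfunction phi_j(x) = left[j] @ psi(x),
    which evolves linearly: phi_j(x_next) = eigvals[j] * phi_j(x).
    Assumes K is diagonalizable."""
    eigvals, right = np.linalg.eig(K)
    left = np.linalg.inv(right)  # rows are left eigenvectors of K
    return eigvals, left
```

Modes with |lambda| well below 1 decay quickly; those near 1 dominate long-horizon behavior — this is the kind of quantitative reading of the Figure 11 spectra that the paper stops short of.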
Follow-up references:
- Mullapudi et al. (2020) — primary SWMM-RL baseline this work extends; essential context for the computational-burden claim.
- Li et al. (2017) — DLEDMD algorithm; needed to understand the emulator's approximation machinery.
- Lund et al. (2020) — linear surrogate in MPC for CSO mitigation; a natural direct comparison the paper references but does not benchmark against.
- Saliba et al. (2020) — source of the input-noise uncertainty methodology and another SWMM-RL baseline.
Third pass — critique
Implicit assumptions:
- The 4-variable aggregated emulator state is sufficient to characterize a 139-node, 140-pipe network — spatial information is entirely lost; if inter-node dynamics matter for pump scheduling, this could break control performance.
- Emulator and RL training rainfalls must be drawn from the same Chicago-histogram distribution — explicitly acknowledged in Section 3.3, but left as a hard design constraint with no solution; limits deployment flexibility.
- Real rainfall events (Rain5–8) were scaled by a factor of 20 in intensity — this is physically extreme and unmotivated; results under these events may not reflect realistic performance.
- On/off binary pump control covers the full operational space — throttling or variable-speed pumps are not considered.
Missing context or citations:
- No benchmark against MPC with the same linear or DLEDMD surrogate (Lund et al., 2020 does this for a different system); a direct same-system MPC comparison is absent.
- Model-based RL methods that integrate surrogate models with online uncertainty (e.g., MBPO, PETS frameworks cited by Chua et al., 2018) are mentioned but not tested.
- The interpretability claim is not compared against post-hoc explanation methods for neural networks (e.g., SHAP, LIME), so whether the Koopman linear structure adds practical interpretability over explainable MLP approaches is undemonstrated.
- No engagement with transfer-learning or domain-randomization literature for generalizing RL agents across different UDS configurations.
Possible experimental / analytical issues:
- SWMM-based RL is tested only up to n = 200 steps (a stated resource constraint), while emulator-based RL runs to n = 3,000; the comparison is therefore unequal — SWMM agents might improve further with more steps.
- Emulator training cost (18 SWMM simulations) is excluded from the DUR and training-time comparisons; the 79.67× data efficiency advantage does not account for this overhead.
- A single case study (one Eastern China combined sewer system) precludes generalizability claims; no sensitivity analysis on network topology or size.
- 50 Monte Carlo runs for the rainfall uncertainty analysis may undersample tail events; no convergence test for the Monte Carlo estimator is reported.
- No statistical significance testing (e.g., t-tests, confidence intervals) on CSO/flooding volume differences between methods; numerical differences in Tables 6–9 are reported without uncertainty quantification.
- Error accumulation in recursive prediction over the 2 hr simulation horizon is not systematically analyzed (only final NSE is reported); long-horizon drift could corrupt RL reward signals in a way not captured by aggregate MSE/NSE.
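The recursive-prediction drift concern above is easy to expose with a toy emulator: feeding predictions back compounds any model bias, while one-step prediction restarts from the true state and never accumulates error. A minimal sketch with hypothetical 1-D linear dynamics (not the paper's emulator):

```python
import numpy as np

def recursive_rollout(A, x0, n_steps):
    # feed each prediction back in: any bias in A compounds over the horizon
    xs, x = [], x0
    for _ in range(n_steps):
        x = A @ x
        xs.append(x)
    return np.stack(xs)

def one_step_predictions(A, true_traj):
    # restart from the true state at every step: errors never accumulate
    return np.stack([A @ x for x in true_traj[:-1]])
```

Reporting NSE per horizon step for both modes, rather than a single aggregate NSE, would reveal exactly the drift the paper leaves unanalyzed.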
Ideas for future work:
- Test the framework on multiple UDS topologies and climatic regions to establish generalizability; include a case where the emulator training distribution differs from the RL sampling distribution to stress-test the distribution-matching constraint.
- Provide a same-system head-to-head comparison against MPC using the DLEDMD surrogate as the internal model, controlling for the same rainfall events.
- Develop an online emulator update scheme (e.g., Dyna-style) that refines the Koopman emulator during RL training when the agent encounters out-of-distribution states, relaxing the distribution-matching requirement.
- Quantitatively link Koopman eigenfunctions (Figures 11c–11d) to physical UDS states (e.g., pump-station water levels, sub-catchment storage) to substantiate the interpretability claim beyond the mathematical existence of a linear structure.
Methods
- Koopman operator approximation
- dictionary learning extended dynamic mode decomposition (DLEDMD)
- deep Q-network (DQN)
- proximal policy optimization (PPO)
- multilayer perceptron (MLP)
- linear regression
- SWMM simulation
- Monte Carlo uncertainty analysis
- Chicago histogram rainfall generation
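As a sketch of the approximation machinery behind DLEDMD: EDMD lifts snapshot pairs through a dictionary psi and fits a linear operator K by least squares; DLEDMD additionally learns psi with a neural network. The fixed polynomial dictionary below is a simplifying stand-in for that learned network:

```python
import numpy as np

def dictionary(x):
    # fixed polynomial dictionary; in DLEDMD this map is a trained neural net
    return np.concatenate([[1.0], x, x ** 2])

def edmd_koopman(X, Y):
    """EDMD: given snapshot pairs (x_i, y_i = f(x_i)) stacked as rows of X, Y,
    lift both and solve least squares so that psi(y) ~= K @ psi(x)."""
    PsiX = np.stack([dictionary(x) for x in X])
    PsiY = np.stack([dictionary(y) for y in Y])
    K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
    return K.T

def predict(K, x):
    # advance one step in the lifted linear coordinates, then read the state
    # back out (assumes the raw state components sit in the dictionary)
    z = K @ dictionary(x)
    return z[1:1 + len(x)]
```

Recursive multi-step prediction then just iterates `predict`, which is what makes the fitted K a cheap stand-in for SWMM during RL sampling.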
Datasets
- Eastern China combined sewer system SWMM model (139 nodes, 140 pipelines, 3 pump stations)
- Chicago histogram designed rainfall events
- real rainfall monitoring data from a surrounding city
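The Chicago (Keifer–Chu) design storms in the dataset are derived from an IDF relation. A minimal sketch assuming the common IDF form i = a/(t + b)^c and a peak-position ratio r; the paper's actual IDF coefficients are not reproduced in these notes, so a, b, c, r below are placeholders:

```python
def chicago_intensity(t, a, b, c, r, duration):
    """Instantaneous rainfall intensity at time t (minutes) of a Chicago design
    storm peaking at r * duration. Keifer-Chu form, with a, b, c taken from an
    assumed IDF relation i = a / (t + b)**c (placeholder coefficients)."""
    t_peak = r * duration
    # time measured away from the peak, rescaled by the peak-position ratio
    tb = (t_peak - t) / r if t <= t_peak else (t - t_peak) / (1.0 - r)
    return a * ((1.0 - c) * tb + b) / (tb + b) ** (1.0 + c)
```

Sampling this curve on the SWMM reporting step yields the synthetic hyetographs used for both emulator training and RL sampling, which is where the distribution-matching constraint enters.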
Claims
- The Koopman emulator achieves better recursive prediction accuracy than linear regression and MLP baseline emulators for urban drainage system dynamics.
- RL agents trained with the Koopman emulator achieve similar flooding and overflow mitigation control effects as SWMM-based RL agents while requiring fewer training steps.
- The emulator-based RL framework achieves a 79.67× higher data usage rate (DUR) and faster training than SWMM-based RL.
- RL agents based on the Koopman emulator exhibit acceptable robustness under diverse rainfall events and imperfect state inputs as shown by Monte Carlo uncertainty analysis.
- The linear structure of the Koopman emulator provides interpretability via eigenfunction and singular function analysis, unlike black-box neural network emulators.