Statistical Learning Approaches for the Control of Stormwater Systems

Abhiram Mullapudi · University of Michigan (doctoral dissertation) · 2020

Tags: stormwater-control, reinforcement-learning, bayesian-optimization, smart-water-systems, urban-hydrology, real-time-control

TL;DR

This dissertation develops and evaluates two statistical learning methods, deep reinforcement learning (RL) and Bayesian optimization (BO), for autonomous real-time control of distributed stormwater assets (valves, gates) to meet watershed-scale flood and water quality objectives. It also delivers two open-source artifacts: a modular simulation toolchain coupling hydraulic and water quality models, and pystorms, a Python benchmark library for quantitatively comparing stormwater control algorithms.

First pass — the five C's

Category. Dissertation comprising a position/framework chapter, one real-world field study, two simulation-based research prototypes, and one open-source software contribution.

Context. Urban stormwater real-time control subfield; builds on: Kerkez et al. (smart stormwater systems review and sensor-actuator deployment in Ann Arbor); EPA SWMM (hydraulic simulator underlying all computational experiments); deep Q-network (DQN) RL methodology from the ML literature; genetic algorithms as the incumbent search-based stormwater control baseline.

Correctness. Load-bearing assumptions: (1) SWMM simulation is a sufficiently faithful proxy for real watershed dynamics to train and evaluate control policies; (2) valve position can be held approximately constant through a storm event, motivating BO as a pre-storm planner; (3) sensor readings are noise-free and communications are reliable (both assumed throughout simulation chapters). All three are asserted rather than validated.

Contributions.
- First formulation and simulation evaluation of a deep RL (DQN) controller for urban stormwater networks, including reward-function sensitivity and multi-basin scalability analysis.
- First field demonstration of coordinated, multi-asset real-time flow shaping in a real watershed (Ann Arbor), achieving a flat set-point hydrograph over a 6 km stream reach.
- Bayesian optimization framework for pre-storm valve setting that outperforms genetic algorithms at 30-iteration budgets (30 random seeds) and quantifies rainfall uncertainty via a multi-output latent heteroscedastic Gaussian process (MLH-GP).
- pystorms: open-source Python library providing anonymized real-world stormwater scenarios, a SWMM interface, and a standardized performance metric for algorithm benchmarking.

Clarity. Generally well-organized for a multi-paper dissertation. Chapter 2 is more conceptual and review-heavy than the others. The provided text is truncated before results appear, so the abstract must substitute for the missing detail in several places.

Second pass — content

Main thrust: Model-free statistical learning (RL for reactive control, BO for proactive pre-storm planning) can manage distributed stormwater assets to achieve system-scale flood and water quality benefits without requiring explicit linearized dynamics, outperforming or matching incumbent rule-based and genetic-algorithm approaches in simulation and, for flow shaping, in the field.
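
The reactive half of this thrust rests on Q-learning, so a minimal sketch of its two core steps may help: epsilon-greedy action selection and the one-step temporal-difference target that a DQN regresses toward. This is the textbook formulation, not the dissertation's implementation; the three-valve-setting action space, the Q-values, and the reward below are hypothetical, and a real DQN would produce the Q-values with a neural network rather than hard-coded lists.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no bootstrapping past the end of an episode
    return reward + gamma * max(next_q_values)


# Hypothetical example: three discrete valve settings (closed, half, open).
rng = random.Random(0)
q_now = [0.2, 0.7, 0.1]    # assumed Q(s, a) for the current state
q_next = [0.5, 0.4, 0.9]   # assumed Q(s', a') for the next state

action = epsilon_greedy(q_now, epsilon=0.0, rng=rng)  # greedy choice
target = td_target(reward=-1.0, next_q_values=q_next, gamma=0.9, done=False)
```

With `epsilon=0.0` the agent always takes the highest-valued action; training would typically start with a larger epsilon and decay it so that early episodes explore more valve settings.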

Supporting evidence:
- Ch. 3 (field): 30-minute pulse releases from a single controlled basin, dispersed over a 6 km stream, produce a flat hydrograph at the watershed outlet; coordinated interleaved releases from two basins successfully superpose and offset flow peaks over an 18–44 hour window.
- Ch. 4 (simulation): RL agent evaluated on a 25-year, 6-hour storm and a 10-year, 24-hour storm on an 11-basin urban network; batch-normalized DQN achieves consistently higher rewards and faster convergence than a generic DQN; in the multi-basin scenario the agent preferentially exploits the most upstream controllable asset (basin 4) to shift the peak hydrograph.
- Ch. 4 (appendix): Equal-filling degree heuristic successfully maintains outflows below threshold for all tested storm events, providing a strong deterministic baseline that RL must compete with.
- Ch. 5 (simulation): BO identifies lower-cost control decisions than a genetic algorithm across 30 iterations × 30 random seeds (Table 5.1 reports mean ± SD; lower values = better; exact numbers not legible in provided text); MLH-GP uncertainty bounds align more closely with empirically computed bounds than standard GP for the same 200 samples.
- Ch. 6 (demonstration): Equal-filling degree controller holds outlet flow below 0.5 m³s⁻¹ in Scenario theta; outperforms rule-based control, which outperforms uncontrolled (Table 6.3 gives performance metric values; exact numbers not provided in the excerpt).
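
The equal-filling degree heuristic cited above as a baseline can be illustrated with a short sketch. The rule below is a simplified proportional variant written for these notes: each basin's filling degree is its depth divided by its maximum depth, and basins fuller than the network average open their valves in proportion to the excess. The `gain` parameter, the function name, and the example depths are assumptions; the dissertation's exact formulation (for example, how it enforces the downstream flow threshold) is not given in the provided text.

```python
def equal_filling_actions(depths, max_depths, gain=1.0):
    """Return one valve opening in [0, 1] per basin.

    Basins fuller than the network-average filling degree open in
    proportion to their excess; the others stay closed.
    """
    filling = [d / m for d, m in zip(depths, max_depths)]
    mean_fill = sum(filling) / len(filling)
    return [min(1.0, max(0.0, gain * (f - mean_fill))) for f in filling]


# Hypothetical three-basin network, depths in metres.
depths = [1.5, 0.5, 1.0]
max_depths = [2.0, 2.0, 2.0]
openings = equal_filling_actions(depths, max_depths, gain=2.0)
```

Here only the first basin, at 75% full against a network average of 50%, releases water. The rule uses no forecast and no learned model, which is what makes it a useful deterministic yardstick for the RL controller.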

Figures & tables:
- Fig. 4.3 (RL single-basin): six-panel layout comparing reward functions, training curves, controlled vs. uncontrolled flows and water levels; Y-axis scales differ across reward columns (noted in caption); no confidence intervals on training curves; results shown for the single highest-reward episode only, with no cross-run statistics.
- Fig. 4.7 (storm spectrum performance map): normalized performance heat map for uncontrolled vs. RL-controlled across a range of storm magnitudes and durations; axes appear labeled; no uncertainty quantification.
- Fig. 5.5 (uncertainty quantification): GP vs. MLH-GP uncertainty bands compared to empirical values; 200 samples; shows BO's acquisition function concentrating evaluations around 0.23; informative, but no formal coverage statistics are reported.
- Table 5.1: only summary shown is the key BO-vs-GA comparison with mean ± SD; no formal hypothesis test for statistical significance stated.
- Table 6.3: three-row performance metric table (equal-filling, rule-based, uncontrolled); no confidence intervals; single-run values.
- General weakness: most figures lack error bars or cross-replicate confidence intervals; axis label legibility is not assessable from text alone.
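
The missing coverage statistic for Fig. 5.5 is cheap to compute once the predicted bands and the empirical values are available. As a sketch, assuming each sample comes with a lower and upper bound from the GP or MLH-GP, empirical coverage is the fraction of observed values that fall inside their band; a nominal 95% band should cover roughly 95% of them. The function name and the four data points are made up for illustration and are not taken from the dissertation.

```python
def empirical_coverage(observed, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(
        1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi
    )
    return inside / len(observed)


# Placeholder values, NOT data from Fig. 5.5.
observed = [0.9, 1.2, 2.5, 0.4]
lower = [0.5, 1.0, 1.0, 0.5]
upper = [1.5, 2.0, 2.0, 1.5]

coverage = empirical_coverage(observed, lower, upper)  # 2 of 4 inside
```

Reporting this number for both models over the same 200 samples would turn the visual claim that MLH-GP bands "align more closely" into a comparable quantity.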

Follow-up references:
- Kerkez et al. [3]: foundational smart stormwater review and deployment context; essential background.
- Emerson et al. [6]: demonstrates that locally optimal SCM placement can degrade watershed-scale outcomes; motivates the coordinated control problem.
- Borsányi et al. [34]: identified need for quantitative benchmarking of stormwater control algorithms; conceptual predecessor to pystorms.
- Langergraber et al. [30, 84]: finite-element water quality model for subsurface flow wetlands; relevant to extending RL/BO to water quality objectives.

Third pass — critique

Implicit assumptions:
- Sim-to-real fidelity: RL and BO controllers are trained and evaluated entirely in SWMM simulation (Ch. 4–5); the gap between simulated and real hydraulic behavior is never quantified; if violated, trained policies may fail or require significant re-training on the physical system.
- Near-constant optimal valve position: the motivation for pre-storm BO relies on the claim (attributed to prior RL work [32]) that reactive controllers tend to hold near-constant valve settings throughout a storm; this is illustrated anecdotally but not proved theoretically; storm variability could invalidate the pre-storm planning assumption.
- Perfect sensing and communication: all simulation experiments assume noise-free, synchronous sensor readings; real deployments (Ch. 3) involve polling-based asynchronous communication with unquantified latency and dropout.
- Stationarity of rainfall statistics: RL training uses fixed design storms (25-year 6-hour, 10-year 24-hour); climate non-stationarity would shift the distribution and may degrade policy performance without retraining.
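
One way to probe the perfect-sensing assumption in simulation is to corrupt each reading before the controller sees it. The sketch below is a hypothetical wrapper, not something the dissertation describes: with probability `dropout_prob` the poll fails and the controller keeps the last value it received, otherwise zero-mean Gaussian noise is added. The noise level, dropout rate, and depth series are illustrative assumptions.

```python
import random


def corrupt_reading(true_value, last_received, noise_sd, dropout_prob, rng):
    """Simulate one sensor poll.

    On dropout the controller sees the last received value again;
    otherwise it sees the true value plus Gaussian noise.
    """
    if rng.random() < dropout_prob:
        return last_received
    return true_value + rng.gauss(0.0, noise_sd)


# Hypothetical basin depths (m) over five polling intervals.
rng = random.Random(42)
true_depths = [0.2, 0.4, 0.8, 1.1, 0.9]

seen = []
last = 0.0  # assumed initial reading before any poll succeeds
for depth in true_depths:
    last = corrupt_reading(depth, last, noise_sd=0.02, dropout_prob=0.2, rng=rng)
    seen.append(last)
```

Re-running a trained policy against `seen` instead of `true_depths`, across a range of noise and dropout settings, would give a first estimate of how sensitive the reported results are to this assumption.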

Missing context or citations:
- Model predictive control (MPC) is cited as state-of-the-art (Ch. 4 intro) but is never directly compared to RL or BO in a controlled experiment; the dissertation cannot claim superiority over MPC without this comparison.
- Multi-agent RL (MARL) is not discussed despite coordinated multi-asset control being a central goal; relevant MARL literature for networked infrastructure is absent.
- Water quality objectives are addressed only in Ch. 2 (simulation) and Ch. 3 (conceptually); the RL and BO chapters focus exclusively on hydraulic (flow/level) objectives, leaving the water quality control gap largely open.
- The equal-filling degree algorithm, a competitive heuristic from the literature, appears only in Appendix B as a post-hoc comparison with RL; it is not introduced as a formal baseline in the main Ch. 4 results.

Possible experimental / analytical issues:
- Cherry-picking in Ch. 4: single-basin RL results are presented for "the episode that resulted in the highest reward" (Fig. 4.3 caption) rather than mean performance across training runs or held-out evaluations; this overstates policy reliability.
- BO vs. GA budget fairness: comparison uses only 30 function evaluations; GA convergence typically requires many more; the comparison may structurally favor BO without establishing whether GA would eventually match BO given a larger budget.
- No formal statistical tests: Table 5.1 reports mean ± SD but no p-values or confidence intervals on the BO-vs-GA comparison; statistical significance is not established.
- Generalization of RL: training on specific design storms and testing on nearby storm magnitudes (Fig. 4.7) is not a true out-of-distribution test; performance on prolonged low-intensity events or back-to-back events (addressed briefly in Appendix B but described as the controller treating two events as one) reveals a fundamental limitation of the state-space design.
- pystorms reproducibility: scenarios are derived from "anonymized" real-world networks; the anonymization prevents independent verification that scenarios are representative of actual system dynamics.
- No safety or stability guarantees: the RL controller has no formal constraint mechanism; it could produce valve actions that cause flooding or structural stress, yet no safety layer or constraint formulation is presented.
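
The missing significance test for the BO-vs-GA comparison could be as simple as a permutation test on the per-seed final costs. The sketch below assumes one final cost per random seed for each method and tests the difference in means without any normality assumption. The cost lists are placeholders invented for illustration, not values from Table 5.1, and the function name and resample count are likewise assumptions.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns an estimated p-value with the usual +1 correction so the
    estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


# Placeholder per-seed costs, NOT values from Table 5.1.
bo_costs = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
ga_costs = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8]

p_value = permutation_test(bo_costs, ga_costs, n_resamples=2000)
```

With the 30 seeds per method that the chapter reports, the same function applies unchanged. Running it at several evaluation budgets, not only 30, would also speak to the budget-fairness concern above.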

Ideas for future work:
- Deploy trained RL or BO controllers on the Ann Arbor sensor-actuator network (Ch. 3) and measure sim-to-real performance degradation, quantifying the fidelity gap and informing necessary re-training or domain randomization.
- Formulate a constrained multi-agent RL problem where each controlled basin has a local agent sharing a global reward, enabling decentralized execution and reducing the combinatorial action space as networks scale beyond 3–4 basins.
- Extend the RL/BO reward and objective functions to include water quality metrics (e.g., nitrate load reduction from Ch. 2), testing whether joint hydraulic-quality optimization degrades performance on either objective individually.
- Investigate transfer learning or meta-RL across pystorms scenarios to determine whether a policy trained on one network topology generalizes to others without full retraining, a prerequisite for practical deployment at scale.

Methods

Deep Q-Learning
Bayesian Optimization
Gaussian Processes
model-predictive control
equal-filling degree algorithm
coupled hydraulic-water-quality modeling
sensor-actuator networks
genetic algorithms

Datasets

Ann Arbor wireless sensor-actuator network watershed
simulated urban stormwater network with 11 basins
combined sewer network with inflatable storage dams

Claims

Real-time coordinated control of distributed stormwater assets can achieve system-scale watershed outcomes that localized best management practices cannot guarantee.
Deep reinforcement learning can serve as a model-free algorithm for controlling stormwater networks without requiring explicit dynamical assumptions or surrogate models.
Bayesian Optimization identifies optimal pre-storm control actions more efficiently than genetic algorithms and can quantify rainfall uncertainty impacts on control decisions.
A wireless sensor-actuator network can precisely shape hydrographs at the outlet of an urban watershed through coordinated multi-site control.
The open-source pystorms library enables systematic quantitative evaluation and comparison of stormwater control algorithms across standardized scenarios.