The role of deep learning in urban water management: A critical review

Guangtao Fu, Yiwen Jin, Siao Sun, Zhiguo Yuan, David Butler · Water Research · 2022

Tags: deep-learning, urban-water-systems, anomaly-detection, flood-forecasting, reinforcement-learning, critical-review

TL;DR

A narrative critical review of deep learning (CNN, LSTM, GNN, DRL, autoencoder) applications across urban water management sub-domains: demand forecasting, leakage/contamination detection, sewer defect assessment, wastewater state prediction, cyber security, urban flooding, and real-time control. Most applications remain at the experimental stage validated on synthetic, benchmark, or laboratory data; leakage detection is identified as closest to operational adoption. Five research challenges — data privacy, algorithmic development, explainability, multi-agent systems, and digital twins — are proposed as priorities.


First pass — the five C's

Category. Survey / narrative critical review.

Context. Urban water systems engineering + AI/ML subfield. Builds on: LeCun et al. (2015) — foundational deep representation learning theory; Shen (2018) — deep learning in hydrology; Makropoulos and Savic (2019) — ML in the water sector broadly; IWA (2019) — water sector digitalisation agenda.

Correctness. Three load-bearing assumptions: (1) deep learning outperforms conventional ML for UWS tasks — partially supported within individual cited studies but not systematically synthesised; (2) synthetic/lab/benchmark-derived results generalise to real operational systems — authors themselves acknowledge this is unproven for most domains; (3) the five identified challenges are the correct and complete set — asserted rather than derived from evidence.

Contributions.

  • Taxonomic mapping of DL architectures (CNN, LSTM, GAN, GNN, DRL) to specific UWS problem types (Table 1), including data requirements and available open datasets.
  • Empirical finding that practical DL adoption across UWS is early-stage; no domain except leakage detection has reported day-to-day operational deployment.
  • Identification of DRL as an emerging paradigm for UWS real-time control, previously unreviewed in a water-focused survey.
  • Articulation of five forward-looking research challenges linking DL maturation to UWS digitalisation goals.

Clarity. Well-structured and readable for an interdisciplinary audience. The supplied manuscript is truncated mid-sentence in Section 3.5.2 and omits Sections 4–5 entirely, so the key research challenges and conclusions cannot be evaluated from the text provided.


Second pass — content

Main thrust: Across the major UWS sub-domains, the cited studies mostly report accuracy gains for DL over traditional ML in controlled settings, though some comparisons show marginal or context-dependent gains, and real-world operational adoption is rare; leakage detection is the exception, and DRL is the most nascent application area with the highest theoretical upside for autonomous control.

Supporting evidence:

  • Fang et al. (2019): best CNN achieved 97.33% leakage detection accuracy with 21 pressure sensors on a 400 m lab network, falling to 92.11% with 8 sensors; multi-leak accuracy dropped from 96.43% (single leak) to 91.56% (three-point leak).
  • Cody et al. (2020): variational autoencoder on acoustic spectrograms achieved 97.2% accuracy detecting a 0.25 L/s leak on a lab testbed connected to a municipal supply line.
  • Qian et al. (2020): balanced LSTM achieved F1 = 0.7819 on the GECCO water quality dataset (1.452% anomaly rate); Muharemi et al. (2019) achieved F-score = 0.9 on the same dataset using time-series cross-validation, though both models generalised poorly to a new dataset.
  • Hajgató et al. (2021): GNN reconstructed nodal pressures with <5% relative error on average at a 5% sensor observation ratio across three benchmark networks.
  • Bowes et al. (2021): DDPG actor-critic outperformed model predictive control and rule-based strategies for detention pond flood control on a hypothetical catchment; shown robust to rainfall forecast and water-level measurement uncertainties.
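The GECCO results above are reported as F1 rather than accuracy, and a short calculation shows why that matters at a 1.452% anomaly rate. The sketch below is illustrative only: it assumes a hypothetical test set of 100,000 samples at that anomaly rate and scores a trivial detector that never raises an alarm. The sample size and all function names are assumptions, not taken from the cited studies.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks an anomaly."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0 when there are no true positives to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical test set: size is an assumption, anomaly rate is from the notes.
N_SAMPLES = 100_000
ANOMALY_RATE = 0.01452
n_anomalies = round(N_SAMPLES * ANOMALY_RATE)

y_true = [1] * n_anomalies + [0] * (N_SAMPLES - n_anomalies)
y_never_alarm = [0] * N_SAMPLES  # detector that always predicts "normal"

acc = accuracy(y_true, y_never_alarm)
f1 = f1_score(y_true, y_never_alarm)
print(f"accuracy = {acc:.5f}, F1 = {f1:.2f}")  # accuracy = 0.98548, F1 = 0.00
```

A detector that finds nothing still scores about 98.5% accuracy here, which is why the accuracy figures for balanced lab leak experiments and the F1 figures for GECCO should not be read on the same scale.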

Figures & tables:

  • Fig. 1 (architecture diagrams for MLP, autoencoder, LSTM, CNN, DRL, GNN): conceptual/schematic; no quantitative axes; useful for orientation but adds no empirical content.
  • Fig. 2 (DL algorithm-to-application mapping): categorical bubble-style map; no quantitative axes, no counts of studies per cell.
  • Table 1 (summary across all problems): lists algorithm, data type, case study, open dataset, and qualitative advantage per sub-domain; no numerical performance metrics, no error bars, no statistical significance; the most useful single artefact in the paper but lacks quantitative synthesis.
  • No confidence intervals or error bars appear anywhere in the review text.

Follow-up references:

  • LeCun et al. (2015): theoretical foundation for representation learning underlying all reviewed architectures.
  • Taormina and Galelli (2018) + Taormina et al. (2018): autoencoder cyber-attack detection with the public BATADAL dataset; most rigorous single benchmark study cited.
  • Bowes et al. (2021): DDPG flood control with publicly available data and code; best-documented DRL case.
  • Guo et al. (2021b): CNN-based urban flood depth prediction with publicly available data and code; strongest urban flood forecasting result cited.


Third pass — critique

Implicit assumptions:

  • Performance achieved on synthetic, lab, or benchmark data generalises to real operational networks. This is the most consequential assumption; the paper acknowledges it is largely untested but does not quantify the generalisation gap.
  • The reviewed literature is representative of the field. No systematic search protocol, database list, date range, or inclusion/exclusion criteria is stated anywhere in the provided text, making coverage unverifiable.
  • Reported accuracy metrics across independent studies (different datasets, sensor densities, leak sizes) are comparable. They are not, and the review does not normalise or flag this.
  • DL is strictly advantageous over conventional ML across all UWS domains. Several cited comparisons show marginal gains or context-dependence (e.g., Muharemi et al. found LSTM did not outperform SVM on the GECCO set).

Missing context or citations:

  • No engagement with physics-informed neural networks (PINNs), which directly address the gap between physically-based and purely data-driven models for UWS surrogate modelling.
  • Uncertainty quantification for DL predictions (e.g., Bayesian deep learning, conformal prediction) is absent despite being critical for operational decision support.
  • No comparison with or positioning against Shen (2018) or other existing DL-in-hydrology reviews to delineate what is new here.
  • Transfer learning between water networks of different scales and geographies is mentioned only briefly, not reviewed.
  • Federated learning as a privacy-preserving approach (relevant to the stated data privacy challenge) is not discussed.

Possible experimental / analytical issues:

  • No meta-analysis or quantitative synthesis; cross-study performance comparison is entirely qualitative, making it impossible to determine which architecture class is actually superior for any given UWS task.
  • High accuracies for leakage detection (e.g., 97.33%) come from lab networks with sensor densities (8–21 sensors per 400 m) that are impractical in real systems; the review notes this but does not weight the findings accordingly.
  • The claim that leakage detection is "at the forefront of practical implementation" is asserted without a systematic tally of deployment reports or practitioner case studies.
  • Reproducibility: the review notes only a handful of studies share data or code; no reproducibility scoring or systematic assessment is provided.
  • The supplied manuscript is truncated in Section 3.5.2 and omits Sections 4–5 entirely; the five stated research challenges and the conclusions cannot be evaluated from the provided text.
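To make the sensor-density concern concrete, the arithmetic behind the Fang et al. (2019) figures quoted in these notes can be worked through in a few lines. This is a rough sketch that assumes sensors are spread evenly along the 400 m of pipe, which the notes do not state; the variable names are illustrative.

```python
NETWORK_LENGTH_M = 400.0

# Accuracy (%) by number of pressure sensors, as quoted in the notes.
accuracy_by_sensor_count = {21: 97.33, 8: 92.11}

# Assumption: sensors are evenly spaced, so spacing = length / count.
spacing_m = {n: NETWORK_LENGTH_M / n for n in accuracy_by_sensor_count}

# Accuracy lost, in percentage points, when thinning from 21 to 8 sensors.
drop_sensors_pp = round(accuracy_by_sensor_count[21] - accuracy_by_sensor_count[8], 2)

# Accuracy lost going from a single leak to a three-point leak (same study).
drop_multi_leak_pp = round(96.43 - 91.56, 2)

for n, spacing in spacing_m.items():
    print(f"{n} sensors -> one sensor per {spacing:.1f} m")
print(f"21 -> 8 sensors: -{drop_sensors_pp} percentage points")
print(f"1 -> 3 leaks:    -{drop_multi_leak_pp} percentage points")
```

Under that even-spacing assumption, even the sparse configuration has a sensor roughly every 50 m, and accuracy already falls by about five percentage points relative to the dense one. The notes do not give a sensor density for operational networks, so the size of the remaining gap is left open here.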

Ideas for future work:

  • Conduct a PRISMA-compliant systematic review with explicit search terms, databases, and inclusion criteria to make literature coverage auditable and reproducible.
  • Develop standardised, real-operational-data benchmarks (analogous to BATADAL for cyber-attack detection and BattLeDIM for leakage) for demand forecasting, wastewater state prediction, and flood forecasting to enable fair cross-algorithm comparison.
  • Empirically benchmark physics-informed DL against purely data-driven DL on the same UWS datasets to quantify whether physical constraints improve generalisation to unseen network configurations.
  • Conduct field trials comparing lab/synthetic-trained models against models fine-tuned on real operational data to directly measure the generalisation gap that the review identifies but cannot quantify.
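One simple way the generalisation gap mentioned in the last item could be measured is as the drop in F1 between a held-out set from the training domain and a set from the deployment domain. The sketch below is a toy illustration under stated assumptions: the detector is a plain mean-plus-three-sigma threshold, not a DL model, and every number in it is made up for illustration rather than drawn from any cited study.

```python
from statistics import mean, stdev


def fit_threshold(normal_values, k=3.0):
    """Flag anything above mean + k * sample standard deviation."""
    return mean(normal_values) + k * stdev(normal_values)


def f1_at_threshold(samples, threshold):
    """samples: list of (value, label) pairs, label 1 = anomaly."""
    tp = sum(1 for v, y in samples if v > threshold and y == 1)
    fp = sum(1 for v, y in samples if v > threshold and y == 0)
    fn = sum(1 for v, y in samples if v <= threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical "lab" readings used to fit the detector (made-up values).
lab_normal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0]
threshold = fit_threshold(lab_normal)

# Hypothetical held-out lab set: same baseline as the training data.
lab_test = [(1.0, 0), (0.95, 0), (2.0, 1), (1.05, 0), (2.2, 1)]

# Hypothetical field set: the normal baseline has shifted upwards.
field_test = [(1.6, 0), (1.7, 0), (2.6, 1), (1.65, 0), (1.5, 0), (2.8, 1)]

f1_lab = f1_at_threshold(lab_test, threshold)
f1_field = f1_at_threshold(field_test, threshold)
generalisation_gap = f1_lab - f1_field

print(f"F1 lab = {f1_lab:.2f}, F1 field = {f1_field:.2f}, gap = {generalisation_gap:.2f}")
```

In this toy case the detector is perfect on lab data but raises false alarms on every normal field reading, because the field baseline sits above the lab-fitted threshold. Reporting both scores and their difference for each model would give the kind of quantified gap that these notes find missing from the review.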

Methods

  • CNN
  • LSTM
  • GRU
  • Autoencoder
  • GNN
  • Deep Reinforcement Learning
  • GAN
  • hybrid CNN-LSTM
  • DDPG
  • DQN
  • 1D CNN
  • U-Net
  • Faster R-CNN
  • YOLO
  • stochastic gradient descent

Datasets

  • BattLeDIM 2020
  • GECCO challenge dataset
  • BATADAL dataset
  • CANARY dataset
  • Sewer-ML dataset

Claims

  • Deep learning applications in urban water management are still at an early stage, with most studies relying on benchmark networks, synthetic data, or pilot systems rather than practical deployment.
  • Leakage detection is at the forefront of practical implementation among urban water management problems reviewed.
  • CNNs are the predominant deep learning approach for leakage detection using flow/pressure data or acoustic/vibration signals.
  • Five key research challenges—data privacy, algorithmic development, explainability and trustworthiness, multi-agent systems, and digital twins—must be addressed to advance deep learning in water management.
  • Deep reinforcement learning (DDPG) outperformed model predictive control and rule-based strategies for automated flood control of urban drainage systems in the cited case study on a hypothetical catchment (Bowes et al., 2021).