RivRetrieve-Python: A Python package for facilitating and unifying access to global streamflow data

Simon Moulds, Thiago Nascimento, Ryan Riggs, George Allen, Frederik Kratzert · EGU General Assembly 2026 · 2026



Authors: Simon Moulds, Thiago Nascimento, Ryan Riggs, George Allen, Frederik Kratzert
Year: 2026
Tags: streamflow-data, hydrometric-apis, large-sample-hydrology, open-source-python, river-monitoring, data-unification

TL;DR

RivRetrieve-Python is an open-source Python library that wraps more than 18 hydrometric APIs—covering more than 60,000 gauge stations globally—behind a single object-oriented interface for retrieving streamflow, stage, and river temperature. It targets known shortcomings of static large-sample datasets (CAMELS and kin): difficulty updating, missing variables, absent sub-daily data, inconsistent naming conventions, and stripped quality flags.

First pass — the five C's

Category. Research prototype / software tool description (conference abstract, not a full methods paper).

Context. Large-sample hydrology subfield; explicitly builds on CAMELS-style datasets as the motivating prior art. No other specific prior works or authors are named in the abstract.

Correctness. Load-bearing assumptions: (1) fragmented, inconsistent API access is a genuine bottleneck for global hydrological research; (2) a single abstraction layer can faithfully preserve enough metadata (quality flags, variable definitions) to make unified retrieval scientifically useful. Both are plausible but undemonstrated in this abstract.

Contributions.

  • Unified Python interface abstracting more than 18 hydrometric APIs with a consistent object-oriented design.
  • Coverage of more than 60,000 gauge stations globally for streamflow, stage, and river temperature as of January 2026.
  • Helper functions for bulk retrieval across multiple catchments.
  • Preservation and exposure of quality flags assigned by measuring authorities, which existing large-sample datasets strip out.
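The abstract does not show the package's actual classes, so the following is only a minimal sketch of what a unified object-oriented interface with a bulk-retrieval helper could look like. Every name here (`Observation`, `HydrometricSource`, `DemoAgencySource`, `fetch_many`) is hypothetical and is not taken from RivRetrieve-Python; the demo provider reads from an in-memory table rather than a real web API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    """One record in a provider-independent schema (hypothetical)."""
    station_id: str
    day: date
    variable: str                 # e.g. "discharge", "stage", "water_temperature"
    value: Optional[float]        # None marks a missing value
    unit: str
    quality_flag: Optional[str]   # flag exactly as issued by the measuring authority


class HydrometricSource(ABC):
    """Common interface that each provider-specific fetcher would implement."""

    @abstractmethod
    def fetch(self, station_id: str, variable: str) -> List[Observation]:
        ...


class DemoAgencySource(HydrometricSource):
    """Stand-in provider backed by an in-memory table instead of a web API."""

    # (date, raw value, provider flag); an empty value string means "missing".
    _RAW = {
        ("A001", "discharge"): [("2026-01-01", "12.5", "G"), ("2026-01-02", "", "M")],
        ("A002", "discharge"): [("2026-01-01", "3.1", "E")],
    }

    def fetch(self, station_id: str, variable: str) -> List[Observation]:
        rows = self._RAW.get((station_id, variable), [])
        return [
            Observation(
                station_id=station_id,
                day=date.fromisoformat(d),
                variable=variable,
                value=float(v) if v else None,
                unit="m3/s",
                quality_flag=flag,   # passed through untouched
            )
            for d, v, flag in rows
        ]


def fetch_many(source: HydrometricSource, station_ids, variable: str) -> Dict[str, List[Observation]]:
    """Bulk helper: one fetch per station, results keyed by station id."""
    return {sid: source.fetch(sid, variable) for sid in station_ids}


records = fetch_many(DemoAgencySource(), ["A001", "A002"], "discharge")
```

In this sketch the calling code depends only on `HydrometricSource`, so adding a provider means adding one subclass; whether the real package is organised this way is not stated in the abstract.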

Clarity. Clear and well-structured for a 300-word abstract; necessarily light on implementation detail, evaluation, and design trade-offs.

Second pass — content

Main thrust: A Python package hides the heterogeneity of 18+ national/international hydrometric APIs behind one interface, enabling researchers to pull comparable streamflow time series (including sub-daily and quality-flagged records) at continental to global scale without custom API wrappers.

Supporting evidence:

  • More than 18 hydrometric APIs supported as of January 2026.
  • More than 60,000 gauge stations in total coverage.
  • Variables retrieved: streamflow, stage, river temperature.
  • Sub-daily temporal resolution explicitly supported (not stated whether all APIs provide it).
  • Quality flags from measuring authorities retained (mechanism not described).

Figures & tables: None — this is a conference abstract only. No figures, axes, error bars, or statistical reporting are present.

Follow-up references: Only CAMELS is named (no author, year, or venue given in the abstract). No other cited works appear. Not stated which specific APIs are wrapped.

Third pass — critique

Implicit assumptions:

  • That abstracting API differences does not silently discard or homogenize scientifically important metadata (unit conventions, datum references, missing-data encodings). If this assumption fails, the "unified" interface introduces hidden errors.
  • That API stability and terms of service across 18+ providers will not degrade the package's reliability over time. This is a significant maintenance burden that the abstract does not discuss.
  • That more than 60,000 stations constitute meaningful global coverage, even though no spatial or temporal bias assessment is offered.
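To make the first assumption concrete, here is a minimal sketch of a harmonization step that converts units and missing-data sentinels while keeping the raw value and unit alongside the result, so nothing is discarded silently. The function name, the `Harmonized` record, and the sentinel set are assumptions made for illustration; the abstract does not describe how RivRetrieve-Python handles these cases.

```python
from dataclasses import dataclass
from typing import Optional

# Conversion factors to cubic metres per second (1 ft = 0.3048 m exactly).
TO_M3S = {"m3/s": 1.0, "l/s": 1e-3, "ft3/s": 0.3048 ** 3}

# Hypothetical set of strings that providers might use to encode "no data".
DEFAULT_SENTINELS = frozenset({"", "-999", "NaN"})


@dataclass(frozen=True)
class Harmonized:
    value_m3s: Optional[float]      # converted value, or None if missing
    raw_value: str                  # original string, kept for provenance
    raw_unit: str                   # original unit, kept for provenance
    missing_reason: Optional[str]   # why value_m3s is None, if it is


def harmonize(raw_value: str, raw_unit: str, sentinels=DEFAULT_SENTINELS) -> Harmonized:
    """Convert one raw discharge reading to m3/s without losing the original."""
    if raw_unit not in TO_M3S:
        # Fail loudly rather than guess a unit.
        raise ValueError(f"unknown unit: {raw_unit!r}")
    if raw_value.strip() in sentinels:
        return Harmonized(None, raw_value, raw_unit, "provider sentinel")
    return Harmonized(float(raw_value) * TO_M3S[raw_unit], raw_value, raw_unit, None)


converted = harmonize("100", "ft3/s")
missing = harmonize("-999", "m3/s")
```

The design choice worth noting is that an unknown unit raises an error instead of passing the number through, which is the failure mode the assumption above is worried about.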

Missing context or citations:

  • No engagement with existing Python hydrology retrieval tools (e.g., dataretrieval, hydrofunctions, pynwis) or comparable R packages; it is unclear how RivRetrieve-Python differs from or improves on these.
  • No citation of specific CAMELS variants (CAMELS-US, CAMELS-GB, CAMELS-BR, etc.) that motivate the problem statement.
  • No discussion of data licensing heterogeneity across the 18+ APIs.

Possible experimental / analytical issues:

  • No benchmark or validation: retrieval correctness, completeness, or latency against raw API calls is not tested or reported.
  • No reproducibility information beyond a GitHub URL; version, license, dependency stack, and CI status are not stated.
  • The claim of "more than 60,000 gauge stations" has no breakdown by region, variable availability, or temporal coverage, making it impossible to assess effective usability.
  • No ablation or comparison demonstrating that the unified interface adds value over calling individual APIs directly.
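The kind of breakdown that the station-count claim lacks could be tabulated along the following lines. This is a sketch on made-up station metadata: the field names (`region`, `variables`, `sub_daily`) and the three example stations are hypothetical and do not come from the package or the abstract.

```python
from collections import Counter

# Hypothetical station metadata; real values would come from provider catalogues.
stations = [
    {"id": "A001", "region": "Europe", "variables": {"discharge", "stage"}, "sub_daily": True},
    {"id": "A002", "region": "Europe", "variables": {"discharge"}, "sub_daily": False},
    {"id": "B001", "region": "South America", "variables": {"discharge", "water_temperature"}, "sub_daily": True},
]


def coverage_table(stations):
    """Count stations per region, per variable, and per region with sub-daily data."""
    by_region = Counter(s["region"] for s in stations)
    by_variable = Counter(v for s in stations for v in s["variables"])
    sub_daily = Counter(s["region"] for s in stations if s["sub_daily"])
    return by_region, by_variable, sub_daily


by_region, by_variable, sub_daily = coverage_table(stations)
```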

Ideas for future work:

  • Systematic spatial and temporal coverage audit: map which of the 60,000+ stations have sub-daily records, quality flags, and multi-variable overlap to identify the true usable sample for large-sample studies.
  • Benchmark retrieval fidelity by cross-checking package output against manually downloaded raw API responses for a stratified sample of stations.
  • Publish a versioned, DOI-stamped dataset snapshot generated via the package to enable reproducible large-sample studies that reference a fixed data state.
  • Evaluate robustness to API deprecation or breaking changes across providers to quantify long-term maintenance risk.
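The fidelity benchmark proposed above could start from a comparison like the one sketched here, which checks a package-produced series against a manually downloaded reference for the same station. The `compare_series` function, the `{timestamp: value}` representation, and the example values are all assumptions for illustration; they are not part of RivRetrieve-Python.

```python
import math


def compare_series(package, reference, rel_tol=1e-6):
    """Compare two {timestamp: value} mappings; None marks a missing value."""
    missing = sorted(set(reference) - set(package))   # in reference only
    extra = sorted(set(package) - set(reference))     # in package only
    mismatched = []
    for t in sorted(set(package) & set(reference)):
        a, b = package[t], reference[t]
        if a is None or b is None:
            # A gap must stay a gap: None matches only None.
            if a is not b:
                mismatched.append(t)
        elif not math.isclose(a, b, rel_tol=rel_tol):
            mismatched.append(t)
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Made-up example: the package turned a gap into 0.0 and shifted one timestamp.
reference = {"2026-01-01": 12.5, "2026-01-02": None, "2026-01-03": 9.0}
package = {"2026-01-01": 12.5, "2026-01-02": 0.0, "2026-01-04": 7.0}
report = compare_series(package, reference)
```

Treating a gap that has been replaced by a number as a mismatch, rather than ignoring missing values, is deliberate: it catches the silent missing-data re-encoding raised under the implicit assumptions.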

Methods

  • object-oriented API abstraction
  • hydrometric API integration
  • helper functions for batch data retrieval

Datasets

  • CAMELS (named as motivating prior art, not used directly)
  • 18+ hydrometric APIs with 60,000+ gauge stations

Claims

  • RivRetrieve-Python provides a unified interface to streamflow, stage, and river temperature data from more than 18 hydrometric APIs covering over 60,000 gauge stations.
  • An object-oriented design abstracts implementation details of hydrometric APIs, offering a consistent interface regardless of data source or variable.
  • The library addresses shortcomings of existing large-sample hydrology datasets, including difficulty of updates, limited variable coverage, and inconsistent naming conventions.
  • RivRetrieve-Python is expected to enable future research on real-time river monitoring, digital twins, hydrological prediction, and sub-daily hydrological variability and extremes.