RivRetrieve-Python: A Python package for facilitating and unifying access to global streamflow data
Authors: Simon Moulds, Thiago Nascimento, Ryan Riggs, George Allen, Frederik Kratzert
Year: 2026
Tags: streamflow-data, hydrometric-apis, large-sample-hydrology, open-source-python, river-monitoring, data-unification
TL;DR
RivRetrieve-Python is an open-source Python library that wraps more than 18 hydrometric APIs, covering more than 60,000 gauge stations globally, behind a single object-oriented interface for retrieving streamflow, stage, and river temperature. It targets known shortcomings of static large-sample datasets such as CAMELS: they are difficult to update, omit some variables, lack sub-daily data, use inconsistent naming conventions, and strip the quality flags assigned by measuring authorities.
First pass — the five C's
Category. Research prototype / software tool description (conference abstract, not a full methods paper).
Context. Large-sample hydrology subfield; explicitly builds on CAMELS-style datasets as the motivating prior art. No other specific prior works or authors are named in the abstract.
Correctness. Load-bearing assumptions: (1) fragmented, inconsistent API access is a genuine bottleneck for global hydrological research; (2) a single abstraction layer can faithfully preserve enough metadata (quality flags, variable definitions) to make unified retrieval scientifically useful. Both are plausible but undemonstrated in this abstract.
Contributions.
- Unified Python interface abstracting more than 18 hydrometric APIs with a consistent object-oriented design.
- Coverage of more than 60,000 gauge stations globally for streamflow, stage, and river temperature as of January 2026.
- Helper functions for bulk retrieval across multiple catchments.
- Preservation and exposure of quality flags assigned by measuring authorities, which existing large-sample datasets omit.
Clarity. Clear and well-structured for a 300-word abstract; necessarily light on implementation detail, evaluation, and design trade-offs.
Second pass — content
Main thrust: A Python package hides the heterogeneity of more than 18 national and international hydrometric APIs behind one interface, enabling researchers to pull comparable streamflow time series (including sub-daily and quality-flagged records) at continental to global scale without writing custom API wrappers.
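To make the abstraction idea concrete, here is one common way to hide provider heterogeneity: an abstract base class with one adapter per provider, each translating native field names and units into a shared record type. The class names (`GaugeSource`, `ProviderA`, `ProviderB`, `Observation`) and the hard-coded stand-in responses are hypothetical. The abstract does not describe RivRetrieve-Python's actual classes or method signatures, so this illustrates the adapter pattern in general, not the package's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

CFS_TO_CMS = 0.028316846592  # 1 ft^3 = 0.028316846592 m^3 (exact)


@dataclass(frozen=True)
class Observation:
    """Shared record type: timestamp, discharge in m^3/s, provider quality flag."""
    time: datetime
    value: Optional[float]
    quality_flag: Optional[str]


class GaugeSource(ABC):
    """Hypothetical common interface that every provider adapter implements."""

    @abstractmethod
    def get_streamflow(self, station_id: str) -> List[Observation]:
        ...


class ProviderA(GaugeSource):
    """Hypothetical provider that already reports m^3/s as (time, value, flag) tuples."""

    def _fetch_raw(self, station_id: str):
        # Stand-in for an HTTP request; returns provider-native rows.
        return [("2026-01-01T00:00", 12.5, "G")]

    def get_streamflow(self, station_id: str) -> List[Observation]:
        return [
            Observation(datetime.fromisoformat(t), q, flag)
            for t, q, flag in self._fetch_raw(station_id)
        ]


class ProviderB(GaugeSource):
    """Hypothetical provider that reports ft^3/s under different field names."""

    def _fetch_raw(self, station_id: str):
        return [{"dateTime": "2026-01-01T00:00", "cfs": 100.0, "qualifier": "A"}]

    def get_streamflow(self, station_id: str) -> List[Observation]:
        return [
            Observation(
                datetime.fromisoformat(row["dateTime"]),
                row["cfs"] * CFS_TO_CMS,  # convert to m^3/s
                row["qualifier"],
            )
            for row in self._fetch_raw(station_id)
        ]


def fetch(source: GaugeSource, station_id: str) -> List[Observation]:
    # Caller code is identical regardless of which provider backs the request.
    return source.get_streamflow(station_id)


a = fetch(ProviderA(), "station-1")
b = fetch(ProviderB(), "station-2")
```

The design choice being illustrated is that renaming and unit conversion live inside each adapter, so caller code never branches on the provider.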
Supporting evidence:
- More than 18 hydrometric APIs supported at the time of writing (January 2026).
- More than 60,000 gauge stations in total coverage.
- Variables retrieved: streamflow, stage, river temperature.
- Sub-daily temporal resolution explicitly supported (the abstract does not state whether all APIs provide it).
- Quality flags from measuring authorities retained (mechanism not described).
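Because the abstract claims both sub-daily support and retained quality flags, a reader may ask how flags survive a change of temporal resolution. This minimal sketch aggregates sub-daily readings to daily means and keeps, for each day, the number of valid readings and the most severe flag observed. The flag vocabulary and severity ranking are assumptions for illustration; real measuring authorities use their own codes, and the abstract does not say how (or whether) RivRetrieve-Python aggregates them.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical flag vocabulary: higher number = lower confidence.
# Unknown flags (including None) are treated as the most severe (99).
FLAG_SEVERITY = {"good": 0, "estimated": 1, "suspect": 2}


def daily_from_subdaily(records):
    """Aggregate (timestamp, value, flag) records to daily means.

    For each calendar day, keep the mean of non-missing values, the number of
    valid readings, and the most severe flag seen, so quality information is
    not lost when the resolution changes.
    """
    by_day = defaultdict(list)
    for ts, value, flag in records:
        by_day[ts.date()].append((value, flag))

    daily = {}
    for day, rows in sorted(by_day.items()):
        values = [v for v, _ in rows if v is not None]
        worst = max((f for _, f in rows), key=lambda f: FLAG_SEVERITY.get(f, 99))
        daily[day] = {
            "mean": mean(values) if values else None,
            "n_valid": len(values),
            "worst_flag": worst,
        }
    return daily


readings = [
    (datetime(2026, 1, 1, 0), 10.0, "good"),
    (datetime(2026, 1, 1, 12), 20.0, "estimated"),
    (datetime(2026, 1, 2, 0), None, "suspect"),
    (datetime(2026, 1, 2, 6), 5.0, "good"),
]
daily = daily_from_subdaily(readings)
```

Taking the worst flag is a conservative choice; an alternative is to store the fraction of readings per flag category, at the cost of a wider output.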
Figures & tables: None — this is a conference abstract only. No figures, axes, error bars, or statistical reporting are present.
Follow-up references: Only CAMELS is named (no author, year, or venue given in the abstract), and no other cited works appear. The abstract also does not state which specific APIs are wrapped.
Third pass — critique
Implicit assumptions:
- That abstracting API differences does not silently discard or homogenize scientifically important metadata (unit conventions, datum references, missing-data encodings). If this assumption fails, the "unified" interface introduces hidden errors.
- That API stability and terms of service across 18+ providers will not degrade the package's reliability over time, a significant maintenance burden the abstract does not discuss.
- That more than 60,000 stations constitute meaningful global coverage, even though no assessment of spatial or temporal bias is reported.
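The first assumption above can be made concrete with a small normalization step. This sketch converts discharge to m³/s and maps provider-specific missing-data sentinels to `None`, returning a provenance record alongside the values so the transformation stays auditable. The provider names, units, and sentinel values in `PROVIDER_META` are hypothetical, and datum references are not handled here; only the ft³ to m³ factor (exactly 0.028316846592) is a fixed fact.

```python
import math

CFS_TO_CMS = 0.028316846592  # 1 ft^3 = 0.028316846592 m^3 (exact)

# Hypothetical per-provider conventions; real values must come from each
# provider's documentation, not from this table.
PROVIDER_META = {
    "provider_a": {"unit": "m3/s", "missing": {-999.0}},
    "provider_b": {"unit": "ft3/s", "missing": {-9999.0}},
}


def normalize(provider, raw_values):
    """Convert raw discharge to m^3/s and map sentinel codes, None, and NaN to None.

    Returns the cleaned series plus a provenance dict recording what was done,
    so downstream users can audit the transformation instead of trusting it.
    """
    meta = PROVIDER_META[provider]
    factor = CFS_TO_CMS if meta["unit"] == "ft3/s" else 1.0
    cleaned, n_missing = [], 0
    for v in raw_values:
        if v is None or v in meta["missing"] or math.isnan(v):
            cleaned.append(None)
            n_missing += 1
        else:
            cleaned.append(v * factor)
    provenance = {
        "provider": provider,
        "source_unit": meta["unit"],
        "factor": factor,
        "n_missing": n_missing,
    }
    return cleaned, provenance


vals_b, prov_b = normalize("provider_b", [100.0, -9999.0, float("nan")])
vals_a, prov_a = normalize("provider_a", [5.0, -999.0, None])
```

Returning provenance with every series is what lets a user check, rather than assume, that nothing scientifically important was homogenized away.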
Missing context or citations:
- No engagement with existing Python hydrology retrieval tools (e.g., dataretrieval, hydrofunctions, pynwis) or comparable R packages; it is unclear how RivRetrieve-Python differs from or improves on these.
- No citation of the specific CAMELS variants (CAMELS-US, CAMELS-GB, CAMELS-BR, etc.) that motivate the problem statement.
- No discussion of data licensing heterogeneity across the 18+ APIs.
Possible experimental / analytical issues:
- No benchmark or validation: retrieval correctness, completeness, and latency relative to raw API calls are neither tested nor reported.
- No reproducibility information beyond a GitHub URL; version, license, dependency stack, and CI status are not stated.
- The claim of "more than 60,000 gauge stations" has no breakdown by region, variable availability, or temporal coverage, making it impossible to assess effective usability.
- No ablation or comparison demonstrating that the unified interface adds value over calling individual APIs directly.
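A retrieval-fidelity check of the kind the first point calls for could be as simple as diffing the package's output against a manually downloaded raw response for the same station and period. This is a minimal sketch assuming both series have already been parsed into dicts keyed by timestamp string; the function name `compare_series` and the default relative tolerance are illustrative choices, not part of any published benchmark.

```python
import math


def compare_series(package_values, raw_values, rel_tol=1e-6):
    """Diff a package-returned series against manually downloaded raw values.

    Both inputs map timestamp strings to floats (or None for missing).
    Reports timestamps present on only one side and those whose values
    disagree beyond rel_tol, or where only one side is missing.
    """
    pkg_keys, raw_keys = set(package_values), set(raw_values)
    report = {
        "only_in_package": sorted(pkg_keys - raw_keys),
        "only_in_raw": sorted(raw_keys - pkg_keys),
        "value_mismatch": [],
    }
    for ts in sorted(pkg_keys & raw_keys):
        a, b = package_values[ts], raw_values[ts]
        if a is None or b is None:
            if (a is None) != (b is None):  # missing on one side only
                report["value_mismatch"].append(ts)
        elif not math.isclose(a, b, rel_tol=rel_tol):
            report["value_mismatch"].append(ts)
    return report


package_out = {"2026-01-01T00:00": 1.0, "2026-01-01T01:00": 2.0, "2026-01-01T02:00": None}
raw_download = {
    "2026-01-01T00:00": 1.0,
    "2026-01-01T01:00": 2.5,
    "2026-01-01T02:00": None,
    "2026-01-01T03:00": 4.0,
}
report = compare_series(package_out, raw_download)
```

Run over a stratified sample of stations, the three report fields map directly onto completeness (`only_in_*`) and correctness (`value_mismatch`).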
Ideas for future work:
- Systematic spatial and temporal coverage audit: map which of the 60,000+ stations have sub-daily records, quality flags, and multi-variable overlap to identify the truly usable sample for large-sample studies.
- Benchmark retrieval fidelity by cross-checking package output against manually downloaded raw API responses for a stratified sample of stations.
- Publish a versioned, DOI-stamped dataset snapshot generated via the package to enable reproducible large-sample studies that reference a fixed data state.
- Evaluate robustness to API deprecation or breaking changes across providers to quantify long-term maintenance risk.
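The coverage-audit idea can be prototyped per station before scaling to the full catalogue. This sketch infers the sampling step from the median gap between timestamps, estimates completeness over a target window, and checks for quality flags and multiple variables; `usable_fraction` then reports the share of stations meeting all criteria. The thresholds (a step under 24 h counts as sub-daily, 90% completeness) and the input layout are assumptions for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def audit_station(timestamps, flags, variables, start, end):
    """Summarize one station's usability (hypothetical criteria).

    The sampling step is inferred from the median gap between consecutive
    timestamps; completeness is observed / expected readings in [start, end].
    """
    ts = sorted(timestamps)
    gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(ts, ts[1:])]
    step_h = median(gaps_h) if gaps_h else None
    in_window = [t for t in ts if start <= t <= end]
    expected = (end - start) / timedelta(hours=step_h) + 1 if step_h else 0
    return {
        "sub_daily": step_h is not None and step_h < 24,
        "completeness": len(in_window) / expected if expected else 0.0,
        "has_flags": any(f is not None for f in flags),
        "multi_variable": len(set(variables)) > 1,
    }


def usable_fraction(audits, min_completeness=0.9):
    """Share of audited stations that are sub-daily, flagged, and sufficiently complete."""
    ok = [
        a for a in audits
        if a["sub_daily"] and a["has_flags"] and a["completeness"] >= min_completeness
    ]
    return len(ok) / len(audits) if audits else 0.0


start, end = datetime(2026, 1, 1, 0), datetime(2026, 1, 1, 23)
full = [start + timedelta(hours=h) for h in range(24)]
gappy = [t for t in full if not 5 <= t.hour <= 8]  # four missing hours

audits = [
    audit_station(full, ["good"] * 24, ["discharge", "stage"], start, end),
    audit_station(gappy, ["good"] * 20, ["discharge"], start, end),
]
share = usable_fraction(audits)
```

Using the median gap rather than the minimum keeps a few irregular readings from misclassifying a daily station as sub-daily.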
Methods
- object-oriented API abstraction
- hydrometric API integration
- helper functions for batch data retrieval
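The abstract mentions helper functions for batch retrieval without describing them. One plausible design fans out single-station requests over a small thread pool and records per-station failures instead of aborting the whole batch. `fetch_many` and the `fake_fetch` stand-in are hypothetical names. A real helper would also need to respect each provider's rate limits and terms of service, which is one reason `max_workers` defaults to a small value here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_many(fetch_one, station_ids, max_workers=4):
    """Retrieve data for many stations, isolating per-station failures.

    fetch_one is any callable that takes a station id and returns its data.
    Returns (results, failures): two dicts keyed by station id.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, sid): sid for sid in station_ids}
        for fut in as_completed(futures):
            sid = futures[fut]
            try:
                results[sid] = fut.result()
            except Exception as exc:  # record the failure and keep going
                failures[sid] = repr(exc)
    return results, failures


def fake_fetch(station_id):
    # Hypothetical stand-in for a single-station request to one provider.
    if station_id == "bad-id":
        raise ValueError("station not found")
    return [1.0, 2.0, 3.0]


results, failures = fetch_many(fake_fetch, ["st-1", "st-2", "bad-id"])
```

Threads suit this workload because the time is spent waiting on network I/O rather than computing.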
Datasets
- CAMELS
- 18+ hydrometric APIs with 60,000+ gauge stations
Claims
- RivRetrieve-Python provides a unified interface to streamflow, stage, and river temperature data from more than 18 hydrometric APIs covering over 60,000 gauge stations.
- An object-oriented design abstracts implementation details of hydrometric APIs, offering a consistent interface regardless of data source or variable.
- The library addresses shortcomings of existing large-sample hydrology datasets, including difficulty of updates, limited variable coverage, and inconsistent naming conventions.
- RivRetrieve-Python is expected to enable future research on real-time river monitoring, digital twins, hydrological prediction, and sub-daily hydrological variability and extremes.