[DATA CONTRIBUTION] GAGES II #24

Open
1 of 3 tasks
thodson-usgs opened this issue Sep 21, 2023 · 3 comments

Comments

@thodson-usgs

thodson-usgs commented Sep 21, 2023

Basin prefix

gages-ii

Zenodo DOI

TBD

Number of catchments

~9,000

Location of catchments

United States

For which periods are streamflow records available in your dataset?

1900-present

Please list any sources of the data you contributed.

The GAGES II dataset consists of gages which have had either 20+ complete years (not necessarily continuous) of discharge record since 1950, or are currently active, as of water year 2009, and whose watersheds lie within the United States, including Alaska, Hawaii, and Puerto Rico. Reference gages were identified based on indicators that they were the least-disturbed watersheds within the framework of broad regions, based on 12 major ecoregions across the United States. Of the 9,322 total sites, 2,057 are classified as reference, and 7,265 as non-reference. Of the 2,057 reference sites, 1,633 have (through 2009) 20+ years of record since 1950. Some sites have very long flow records: a number of gages have been in continuous service since 1900 (at least), and have 110 years of complete record (1900-2009) to date.

License

CC0

Additional context

Is there interest in adding USGS's GAGES II dataset of ~9,000 sites? We could easily pull the polygons from the National Hydrography Dataset (NHD). Data summary above. Of note, these are long-term streamgages, but they may have data gaps, so you may prefer a subset: only reference sites, only sites with complete records, etc.
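As a rough illustration of the subsetting options mentioned here (reference only, long records only), a minimal Python sketch follows. The record layout (`site_id`, `is_reference`, `complete_years_since_1950`) and the sample values are hypothetical; they are not the actual GAGES II field names or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gage:
    site_id: str                     # hypothetical ID, not a real USGS site number
    is_reference: bool               # assumed flag for "reference" classification
    complete_years_since_1950: int   # assumed count of complete record years


def select_gages(gages, reference_only=False, min_complete_years=0):
    """Return the gages that satisfy the requested subset criteria."""
    return [
        g for g in gages
        if (g.is_reference or not reference_only)
        and g.complete_years_since_1950 >= min_complete_years
    ]


# Made-up sample records, for illustration only.
sample = [
    Gage("A", True, 35),
    Gage("B", True, 12),
    Gage("C", False, 48),
    Gage("D", False, 5),
]

# Reference gages with at least 20 complete years of record since 1950.
reference_long = select_gages(sample, reference_only=True, min_complete_years=20)
print([g.site_id for g in reference_long])
```

The same helper with `reference_only=False` would keep non-reference sites that meet the record-length threshold.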

Checklist

  • I have uploaded my dataset on Zenodo, where it is accessible under the DOI provided above.
  • I used a basin prefix that is not yet used by any other Caravan sub-dataset (you can check this via the Data Contributions discussion thread, where all accepted Caravan contributions are listed).
  • Permissive License: My data is available under a license that is compatible with the Caravan CC-BY-4.0 license (the easiest way to be sure about this is if your data uses CC-BY-4.0, too).
@thodson-usgs
Author

thodson-usgs commented Sep 21, 2023

Given the size, I thought it best to inquire first.
Also, HYSETS must include a substantial subset of GAGES II, so I'll investigate why they didn't include the whole thing.

@thodson-usgs
Author

HYSETS claims 14,425 North American watersheds, which makes me think they include all of GAGES II. Yet, Caravan only includes 4621 HYSETS watersheds. Perhaps you've filtered the other GAGES locations based on some other criteria?

@kratzert
Copy link
Owner

kratzert commented Sep 22, 2023

Hi Timothy,

Thanks for reaching out and suggesting that we add GAGES-II. Last things first: regarding HYSETS, I am not sure whether they used GAGES-II. From a quick glance at their paper, they simply grabbed all station data from the USGS streamflow portal. We have only included 4621 basins so far because we initially only considered basins smaller than 2000 km² for Caravan. This will likely change soon, and we could then add all of HYSETS (and the other datasets).

That being said, I see no reason not to have a dedicated GAGES-II extension. GAGES-II is an established dataset, and a dedicated extension would make it easier for people to run, e.g., global models on all GAGES-II stations without having to check which stations from CAMELS-US/HYSETS are in GAGES-II and which are not.

Also, I have a slight preference for extensions that come directly from the data providers rather than from data taken from other published datasets, as that sends a stronger signal to the community (at least that is my thinking). "USGS" creating a Caravan extension carries a different meaning than "me" grabbing data from another published dataset and republishing it in Caravan.

More than 9000 gauges is a lot to process, though, and even with the provided code it still takes some effort (you would need to download hourly data in CSV format from Earth Engine and process it locally with the second provided notebook). I have offered this to other groups interested in publishing such large numbers of stations, so I will offer the same to you: I can get all meteorological forcings and catchment attributes for you if you merge the data with streamflow and upload it. If that is of interest, reach out to me via email (I think you should have my address).
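To make the "merge with streamflow" step concrete, here is a minimal Python sketch of a date-keyed join for a single basin. It is only an illustration under stated assumptions, not the processing done by the Caravan notebooks: the variable names (`precip_mm`, `temp_c`, `streamflow`), the units, and the sample values are all hypothetical.

```python
from datetime import date


def merge_by_date(forcings, streamflow):
    """Left-join streamflow onto daily forcing rows.

    Days without a streamflow observation keep a None value, so gaps in
    the gauge record stay visible instead of being dropped.
    """
    merged = []
    for day in sorted(forcings):
        row = {"date": day, **forcings[day], "streamflow": streamflow.get(day)}
        merged.append(row)
    return merged


# Made-up daily forcings for one basin (hypothetical names and values).
forcings = {
    date(1950, 1, 2): {"precip_mm": 3.2, "temp_c": 0.4},
    date(1950, 1, 1): {"precip_mm": 0.0, "temp_c": -2.1},
    date(1950, 1, 3): {"precip_mm": 1.1, "temp_c": 1.0},
}

# Made-up streamflow record with a gap on 1950-01-02.
streamflow = {date(1950, 1, 1): 1.8, date(1950, 1, 3): 2.4}

merged = merge_by_date(forcings, streamflow)
for row in merged:
    print(row)
```

Keeping the forcing dates as the backbone of the join means each basin has a continuous time axis even where the streamflow record has gaps.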

Another thing: in HYSETS we only have data starting in 1980, but for another extension that will be published soon we started with data from 1950 (the start of ERA5). We could do the same here, which would add a substantial amount of data for all US gauges.


2 participants