Sharing New Data
Caravan is a community project. It depends on contributions from people around the world who share data. If you have data for one of the white spots on our map, this document is for you. It describes how to share data in a way that makes it usable with the rest of Caravan.
To make your data a part of Caravan, the following two things are required:
- The data itself. This includes:
  - Catchment attributes
  - Timeseries data
  - A Shapefile of catchments
- License information. Your data must be shared with a permissive license. We recommend you use CC-BY-4.0, but other licenses may be compatible, too.
This guide assumes that you have followed the tutorial on how to extend Caravan and successfully run the two notebooks that download and locally post-process data from Google Earth Engine. The following sections will explain what to do when you have collected all data and are ready to share it with the community.
After following the Caravan extension guide with your data, you should have a folder on your disk with the structure described below. The root folder of your dataset must contain the following folders:
- attributes
- timeseries, with subfolders csv/ and netcdf/
- shapefiles
- licenses
Each of these folders (for timeseries, each of its csv/ and netcdf/ subfolders) must contain a subfolder {BASIN_PREFIX} and nothing else. {BASIN_PREFIX} is the short string that describes your dataset and must be unique within the Caravan data space. This way, your new sub-dataset can easily be merged with the existing Caravan data by copying the {BASIN_PREFIX} folders into the official Caravan folders.
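The layout rule above can be checked before uploading. The following is a minimal sketch, not an official Caravan tool: the function name check_layout and the constant PREFIX_PARENTS are hypothetical, and the check is strict in that any extra entry (including hidden files such as .DS_Store) is reported.

```python
from pathlib import Path

# Folders that must each hold exactly one {BASIN_PREFIX} subfolder.
# For timeseries, the rule applies to its csv/ and netcdf/ subfolders.
PREFIX_PARENTS = [
    "attributes",
    "timeseries/csv",
    "timeseries/netcdf",
    "shapefiles",
    "licenses",
]


def check_layout(root: Path, basin_prefix: str) -> list[str]:
    """Return a list of human-readable layout problems (empty if none)."""
    problems = []
    for parent in PREFIX_PARENTS:
        folder = root / parent
        if not folder.is_dir():
            problems.append(f"missing folder: {parent}/")
            continue
        # The folder may contain the prefix subfolder and nothing else.
        entries = sorted(p.name for p in folder.iterdir())
        if entries != [basin_prefix] or not (folder / basin_prefix).is_dir():
            problems.append(
                f"{parent}/ must contain only {basin_prefix}/, found: {entries}"
            )
    return problems
```

Calling check_layout(Path("my_dataset"), "myprefix") with your own root folder and prefix returns an empty list when the layout matches the description.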
The attributes/{BASIN_PREFIX}/ folder should contain two comma-separated .csv files with exactly the following names:
- attributes_caravan_{BASIN_PREFIX}.csv
- attributes_hydroatlas_{BASIN_PREFIX}.csv
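Because the file names must match exactly, a quick check can catch typos. This is an illustrative sketch only; missing_attribute_files is a hypothetical helper name, and it only reports which of the two required files are absent.

```python
from pathlib import Path


def missing_attribute_files(root: Path, basin_prefix: str) -> list[str]:
    """Return the required attribute file names that are not present."""
    folder = root / "attributes" / basin_prefix
    expected = [
        f"attributes_caravan_{basin_prefix}.csv",
        f"attributes_hydroatlas_{basin_prefix}.csv",
    ]
    # Compare names exactly; a differently spelled file counts as missing.
    return [name for name in expected if not (folder / name).is_file()]
```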
The timeseries folder must contain the time-series data (meteorological forcings, streamflow) both as CSV files (in the csv/{BASIN_PREFIX}/ subfolder) and as netCDF files (in the netcdf/{BASIN_PREFIX}/ subfolder).
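Since the same data has to exist in both formats, it can help to confirm that the two subfolders cover the same basins. The sketch below assumes one file per basin, with the same file stem in both folders and the extensions .csv and .nc; this naming is an assumption, not something stated in this guide, and unpaired_basins is a hypothetical helper name.

```python
from pathlib import Path


def unpaired_basins(root: Path, basin_prefix: str) -> tuple[set[str], set[str]]:
    """Return (basins with CSV only, basins with netCDF only)."""
    csv_dir = root / "timeseries" / "csv" / basin_prefix
    nc_dir = root / "timeseries" / "netcdf" / basin_prefix
    # Assumption: the file stem identifies the basin in both formats.
    csv_ids = {p.stem for p in csv_dir.glob("*.csv")}
    nc_ids = {p.stem for p in nc_dir.glob("*.nc")}
    return csv_ids - nc_ids, nc_ids - csv_ids
```

Two empty sets mean that every basin is available in both formats.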
The shapefiles/{BASIN_PREFIX}/ folder should contain a shapefile of all catchments that you are contributing to Caravan.
The licenses/{BASIN_PREFIX}/ folder should contain a single markdown file called license_{BASIN_PREFIX}.md, which contains information on the license, sources, and references for your data.
Take a look at the license files from existing Caravan sub-datasets to get an idea of what this file should look like.
Note that your data must be shareable under a permissive license that is compatible with CC-BY-4.0.
When all data is in the correct format and folder structure, you are ready to upload it to the Zenodo data archive. Zenodo is a free service where you can upload your data and get a DOI for it.
Once your data is published on Zenodo, head over to the issues section of the Caravan GitHub page and create a new entry, using the "Data Contribution" template that's provided there. There, you will need to fill in some information on the dataset. Once all information is complete, a Caravan maintainer will post the information about your contribution in the New Data discussion thread.
That's it! Congratulations and a big thank you for sharing your data!