Document download scripts for models #118

Open
znichollscr opened this issue Aug 29, 2024 · 11 comments

Comments

@znichollscr
Collaborator

@vnaik60 a question for you!

It seems pretty clear that we're going to have more data on input4MIPs than any one modelling center needs (e.g. you don't need 5 different resolutions of greenhouse gas concentrations, you'll only need one). Hence, our advice to modelling centers will be more nuanced than, "Download all the data".

I was thinking that it would be very helpful if we started a collection of example downloads. I was hoping we could start with your model.

I think the requirements would be relatively simple. We would need to know, for your model:

  • which solar data do you need (answer is probably all)
  • which volcanic data do you need (answer is probably all)
  • which greenhouse gas data do you need (global-mean for co2, n2o and some equivalent species, maybe 15-degree data for CH4?)
  • which emissions do you need (I have no idea on this one)
  • etc.

Then, probably it's also helpful to document any post-processing steps which are likely to be used by multiple models. For example, processing Thomas' data onto the wavelengths of interest for your model.

My instinct is to put this documentation in this repository, so we can update it as soon as we have new data landing. It might make more sense to put it elsewhere of course, so open to suggestions!

cc @durack1

@vnaik60
Collaborator

vnaik60 commented Sep 3, 2024

Hi @znichollscr, could you please remind me of the motivation for providing this ("collection of example downloads")? I would like to understand the reasons behind this effort before sending you what our model(s) use or getting into a long thread on why these "example downloads", despite being well-intended, may not be so useful :-).

@znichollscr
Collaborator Author

znichollscr commented Sep 3, 2024

That's a good point, the ask was too vague.

The intent: for some data sets, we have quite a lot of data, not all of which modelling groups will need. My hope was to build up a resource that shows groups how they can pick just the data they need. That should help them navigate the available data, save them from simply downloading it all and hopefully avoid some confusion.

I started thinking about this because of the GHG concentration and emissions data. For the GHG concentrations, we're providing data on 5 different grids and for 43 different species. No group needs all that, so showing them how to filter for just what they're interested in seemed a good idea, and I figured that I may as well use a real use case rather than making up a pretend group that only uses global-means. For emissions, they're providing data on 2 grids and have this split between 'main' and 'supplementary'. My thinking was basically the same: it's better to use a real use case rather than inventing a group that uses all the main data on a 0.5 degree grid and then a few (but not all) of the files from the supplementary.
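
To make the filtering idea concrete, here is a minimal Python sketch. It is illustrative only: the `DatasetRecord` class, the `select` helper and the grid labels are all hypothetical, and none of them reflect the actual input4MIPs metadata or any download tool's API. The point is just that a group's needs can be written down as a short list of (species, grid) pairs and applied to a catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a hypothetical catalogue of available files."""

    variable: str  # species name, e.g. "co2" or "ch4"
    grid: str      # hypothetical grid label, e.g. "global-mean"


def select(records, wanted):
    """Keep only the records whose (variable, grid) pair is in `wanted`."""
    wanted = set(wanted)
    return [r for r in records if (r.variable, r.grid) in wanted]


# Toy catalogue: 3 species x 2 grids (the real one is 43 species x 5 grids).
catalogue = [
    DatasetRecord(v, g)
    for v in ("co2", "ch4", "n2o")
    for g in ("global-mean", "15-degree-lat")
]

# Example need: global means for CO2 and N2O, latitudinal data for CH4.
needs = [
    ("co2", "global-mean"),
    ("n2o", "global-mean"),
    ("ch4", "15-degree-lat"),
]

selected = select(catalogue, needs)
for record in selected:
    print(record.variable, record.grid)
```

A real recipe would replace the toy catalogue with the results of a search against ESGF, but the selection step would look much the same.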

Perhaps a better, much narrower set of questions to get us started then:

  • for solar data, I assume you use the piControl. On top of that, do you use the monthly or daily data?
  • for greenhouse gas data, what data do you use? For example, just global-mean for CO2, CH4, N2O and CFC12eq? Or global-mean for CO2, N2O, CFC12, CFC11eq and HFC134aeq, then 15 degree latitudinal data for CH4? Or something else of course
  • for anthropogenic emissions, do you use the 0.5 degree or 0.1 degree data? Do you need all the emissions or just some of them? Do you need any of the supplementary data that Steve's group provides?

@durack1
Contributor

durack1 commented Sep 3, 2024

@vnaik60 I believe what @znichollscr is pointing out here is that my vague "use esgpull to get input4MIPs data" is a useless comment. Whereas, if you have a little recipe that allows a group (e.g. NOAA-GFDL) to get its target data beginning-to-end, then this is a more tangible and usable example that gets modeling groups moving far more quickly than having to work all this stuff out themselves with little guidance (or examples).

@vnaik60
Collaborator

vnaik60 commented Sep 3, 2024

Thanks both!

We do not have a recipe at GFDL for downloading input4MIPs datasets. We download all that is available with the thought that someone in the lab may need the dataset at some point. Of course, I will acknowledge that we have a never-ending archive which facilitates this, so at GFDL we are privileged!

I can see how this may be a useful endeavor, especially for newcomer modeling groups who are just spinning up on running CMIP simulations. However, I would not recommend doing this exercise and would rather focus on documenting each dataset with the data provider's recommendations on dos and don'ts related to their datasets. My reasons are as follows:

  • Modeling centers know what datasets are needed for running their models (see, for example, the very informative paper on the implementation of forcings in the IPSL model). They have people or teams who pre-process forcing datasets or modify model code to read them in. For example, at GFDL we have a forcings subgroup with folks responsible for specific forcing datasets and we have a Data Services group that supports our research/CMIP activities by providing a unified repository of datasets of interest to lab folks. I guess every modeling center would have something akin to this whether they participate in CMIP or not.
  • Forcing data needs for each model (within or across modeling centers) depend on the complexity of the models. For example, at GFDL, we run models that represent chemistry in two ways - simple aerosol chemistry or the comprehensive gas-phase and aerosol chemistry, which means that we need to download emissions for all aerosol species and most if not all gaseous species (including the supplemental VOC species). The complexity of the chemical mechanism determines which gaseous species are needed (our chemical mechanism is on the lighter side compared to that in CESM2, for example).
  • Models continually evolve in their representation of physics and chemistry processes, which means that their forcing data needs also evolve. For example, the radiation code in our previous generation model is really old and uses a minimal number of species (global mean CO2, CH4, N2O, CFC-12, CFC-113, and HCFC-22) but we are moving towards the new generation RRTMG which would allow a more realistic representation of the GHG distributions (latitudinal/vertical distribution) and more species. And this means that our GHG forcing data needs will change. Models are also evolving in their spatial resolution, requiring higher resolution forcing datasets. For example, at least three US modeling centers have a variable resolution version of their global models (possibly EU models as well), which means that they need higher resolution forcing datasets, especially emissions for realistic representation of chemistry/air quality. These models do not participate in CMIP (at least not just yet) but are used for focussed regional climate, air quality etc. assessments (more on this in the last point). The 0.1 degree resolution CEDS data is very useful for these models and possibly for HighResMIP, though I am not totally sure.
  • Finally, forcings data collected/produced for CMIP is not just used for CMIP simulations but is used for a variety of purposes by modeling/user groups outside of the CMIP effort (this is not very well recognized in the community but I think it should be). For example, models participating in AeroCom, which is not a CMIP registered/endorsed MIP, used CMIP6 forcings. These models vary in their complexity (ranging from chemistry transport models to the comprehensive ESMs) and therefore in their forcing dataset needs.  

I think what is definitely needed is a forcing dataset guide or manual (in addition to the nice table @znichollscr and @durack1 have worked on) that describes in a little more detail what is available on ESGF and how it can or cannot be used (separate from a journal paper), just like here, and more specifically here, here, and here.

Short answers to your specific Qs:

  • for solar data, I assume you use the piControl. On top of that, do you use the monthly or daily data?

yes, monthly.

  • for greenhouse gas data, what data do you use? For example, just global-mean for CO2, CH4, N2O and CFC12eq? Or global-mean for CO2, N2O, CFC12, CFC11eq and HFC134aeq, then 15 degree latitudinal data for CH4? Or something else of course

Already mentioned above. For chemistry, we use latitudinally varying CH4 concentrations for lower boundary conditions. And there are other configurations of the model that have different dataset needs - for example, we also run with CH4 emissions, in which case we do not specify CH4 concentrations.

  • for anthropogenic emissions, do you use the 0.5 degree or 0.1 degree data? Do you need all the emissions or just some of them? Do you need any of the supplementary data that Steve's group provides?

CMIP class simulations use 0.5deg but, as I mentioned above, 0.1deg is used by our variable-grid resolution model. Here are the species needed by our most comprehensive ESM:
"NO","CO","H2","NH3","CH2O", \ ; NOx is as NO2 in anthro but as NO in BB emissions
"C2H4","C2H6","C3H6",
"C3H8","C4H10",
"CH3OH","C2H5OH",
"ACETONE", "BC", "OC", "SO2"
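
The NOx note in the list above is the kind of detail that is easy to get wrong when ingesting the data. As a minimal sketch, assuming a model wants NOx mass expressed as NO while the anthropogenic files express it as NO2, the conversion is a molar-mass ratio. The function name below is hypothetical and this is not GFDL's actual pre-processing code.

```python
# Molar masses in g/mol, from standard atomic weights (N 14.007, O 15.999).
M_NO = 14.007 + 15.999
M_NO2 = 14.007 + 2 * 15.999


def no2_mass_to_no_mass(flux_as_no2):
    """Re-express a NOx mass flux given as NO2 as the same moles of NO.

    The units of the flux are unchanged (e.g. kg m-2 s-1 in, kg m-2 s-1
    out); only the molecule used to count the mass changes.
    """
    return flux_as_no2 * M_NO / M_NO2


# One mole's worth of NO2 mass maps to one mole's worth of NO mass.
print(no2_mass_to_no_mass(M_NO2))
```

Whether a conversion like this is needed at all depends on what the model's emission reader expects, so it is worth checking against the data provider's documentation.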

@znichollscr
Collaborator Author

Thanks @vnaik60, super helpful to understand and very well explained!

I think what is definitely needed is a forcing dataset guide or manual (in addition to the nice table @znichollscr and @durack1 have worked on) that describes in a little more detail what is available on ESGF and how it can or cannot be used (separate from a journal paper), just like here, and more specifically here, here, and here.

Got it, that's a good next step then!

@znichollscr
Collaborator Author

When we get to writing these docs, we should try to blend them with the air table FAQs too. FAQs here: https://wcrp-cmip.org/cmip7-task-teams/forcings/#frequently_asked_questions_faqs

@eleanororourke

Can I suggest that, for the documentation, we have a Zenodo versioned document instead of a Google doc like the CMIP6 Forcings Dataset summary?

@znichollscr
Collaborator Author

Absolutely! I was hoping to pull together a demo that was built like our current docs (i.e. managed and updated as code as part of this repo, so everything lives in one place). We could then automatically publish updates on Zenodo as needed/desired.

@vnaik60
Collaborator

vnaik60 commented Sep 27, 2024

@znichollscr a demo would be wonderful! And if our data providers can have access to enter the information in the doc, that would be a total dream come true!

I was thinking about additional (more than that contained in the metadata) information from modelers' perspective to ensure that the forcings datasets are implemented as intended by the data providers. Here is an initial draft (pasting editable table messed up the format, hence the image)

[image: screenshot of the draft table]

Additionally, there was a request for summary statistics "Could summary statistics (e.g., timeseries of global emission totals) be provided together with the released datasets, so that modellers could confirm that they are ingesting these datasets correctly into their models?" and we also discussed documentation here. So looks like we know what we want (at least to begin with), it is just a matter of producing the right information in the right format.
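
As a rough illustration of the kind of summary statistic requested above, here is a minimal Python sketch of a global emission total computed from a gridded flux. It assumes a regular latitude-longitude grid, a flux in kg m-2 s-1, a spherical Earth and a 365-day year. The function names are hypothetical and real files would of course be read with a NetCDF library rather than built by hand.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (assumed spherical Earth)
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day (no-leap) year


def cell_area_m2(lat_south_deg, lat_north_deg, dlon_deg):
    """Area of a latitude-longitude cell on a sphere."""
    band = math.sin(math.radians(lat_north_deg)) - math.sin(
        math.radians(lat_south_deg)
    )
    return EARTH_RADIUS_M**2 * math.radians(dlon_deg) * band


def global_total_kg_per_year(flux, lat_edges_deg, dlon_deg):
    """Integrate flux[i][j] (kg m-2 s-1) over the globe.

    Row i of `flux` spans lat_edges_deg[i] to lat_edges_deg[i + 1];
    every column is dlon_deg wide.
    """
    total_kg_per_s = 0.0
    for i, row in enumerate(flux):
        area = cell_area_m2(lat_edges_deg[i], lat_edges_deg[i + 1], dlon_deg)
        total_kg_per_s += area * sum(row)
    return total_kg_per_s * SECONDS_PER_YEAR


# Toy 4 x 8 grid of 45-degree cells with a uniform flux.
lat_edges = [-90, -45, 0, 45, 90]
flux = [[1e-12] * 8 for _ in range(4)]
total = global_total_kg_per_year(flux, lat_edges, dlon_deg=45)
print(f"{total:.3e} kg/yr")
```

A modelling group could compare a number like this, computed from what their model actually ingested, against a total published alongside the dataset.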

@durack1
Contributor

durack1 commented Sep 29, 2024

It is totally a good idea to keep information about the datasets close to this registered info (input4MIPs_CVs). Markdown is very adaptable for this purpose and could easily be lifted into Zenodo as well.

@znichollscr
Collaborator Author

This is still on my to-do list, but while that hasn't happened, another source of info to keep our eyes on: https://padlet.com/cmipipo/the-now-cmip7-deck-forcing-suite-zhtlaqh8qrmltktt/wish/lDK1ZRb7z7RMZJ9z
