# ORNL DAAC ICARTT to netCDF4

## quick start

Requirements: Python 3

Create or edit a configuration YAML file (e.g. `ACTAMERICA_B200.yml`) and point it at the ICARTT input directory (`DIR_ICARTT`) and the output directory (`DIR_OUTPUT`):

```yaml
# Path to some input ICARTT files. Will crawl recursively.
DIR_ICARTT: inputs/ACTAMERICA_Merge/B200/

# Path to write output resource files and netCDFs.
DIR_OUTPUT: outputs/ACTAMERICA_Merge_B200/

# Path to variable reference table; maps ICARTT<>netCDF<>CF.json names.
VARIABLES: references/actamerica/VARIABLES_B200.csv

# Path to JSON representation of netCDF output structure.
STRUCTURE: references/actamerica/STRUCTURE_B200.json

# Paths within resources subdirectory of DIR_OUTPUT (no need to change).
RESOURCES:

    # Parsed header + helpful info from each ICARTT file are written here.
    ICARTT_HEADERS: icartt_headers/

    # Variable references parsed from ICARTT headers are written here.
    ICARTT_VARIABLES: icartt_variables/
```

### execute

Then pass the configuration file as the only argument to the `ornldaac_icartt_to_netcdf` module (i.e. to `ornldaac_icartt_to_netcdf/__main__.py`):

```shell
python -m ornldaac_icartt_to_netcdf [CONFIG].yml
```

As long as the ICARTTs are in an accessible folder and the full path is given in the `CONFIG.yml` file, everything should work.

## to do

- fill value replacement: the `OUTPUT_FILL` column has no effect for now
- user function: support for `USER_FUNC` will be added soon

## overview

### what

This script translates ICARTT files into archive-ready netCDFs.

### why

The ORNL DAAC requires netCDF data sets to be self-describing according to CF conventions (as much as is reasonable) because this increases the longevity of the data and provides conveniences to data users, in ways the people reading this will surely understand.

### how

The code is described line by line in the `_utils.py` and `__main__.py` scripts. To summarize:

#### 1. read and validate config file

Example: `ACTAMERICA_B200.yml`

```yaml
DIR_ICARTT: inputs/ACTAMERICA_Merge/B200/
DIR_OUTPUT: outputs/ACTAMERICA_Merge_B200/
VARIABLES: references/actamerica/VARIABLES_B200.csv
STRUCTURE: references/actamerica/STRUCTURE_B200.json
RESOURCES:
  ICARTT_HEADERS: icartt_headers/
  ICARTT_VARIABLES: icartt_variables/
```

- `DIR_ICARTT: inputs/ACTAMERICA_Merge/B200/`

  Path to the directory containing the input ICARTT files.

- `DIR_OUTPUT: outputs/ACTAMERICA_Merge_B200/`

  Path to the output directory where resource files (ICARTT header and variable metadata) are written along with the converted netCDF files.

- `VARIABLES: references/actamerica/VARIABLES_B200.csv`

  The variable table maps the input ICARTT variable names (`ICARTT_NAME`) to the output netCDF variable names (`OUTPUT_NAME`) and, most importantly, to the JSON reference file (`VARIABLE_MAP`).

  The other fields should be self-explanatory. To apply an arithmetic function to the data during the netCDF write, put a string expression in terms of `x` into the `USER_FUNC` column of the variables table:

  ```
  "{x}+2.5"
  ```

  The string above is used in the ACT-America 5-second merge to shift time (seconds) by 2.5 seconds. The value `x` in braces is filled with the data and the resulting string is evaluated as a Python expression. Do not expose this code in a production environment; it relies on Python's insecure `eval`.

- `STRUCTURE: references/actamerica/STRUCTURE_B200.json`

  Path to the JSON representation of the output netCDF structure (the ORNL DAAC copies are located at `references/actamerica/STRUCTURE_[AIRCRAFT].json`).

- `RESOURCES:` Don't worry about these. Input ICARTTs are parsed, and their headers and variable metadata are written to JSON and CSV files in these subdirectories of `DIR_OUTPUT`. The files are compared against the reference files as the netCDFs are written, and warnings are printed when a match is not found.
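The `USER_FUNC` behavior described above might be sketched as follows. This is a minimal, hypothetical illustration (the `apply_user_func` name is not from this project) of how a `"{x}+2.5"` expression could be applied to a column of data:

```python
def apply_user_func(user_func: str, values):
    """Apply a USER_FUNC expression such as "{x}+2.5" to a sequence of numbers.

    WARNING: eval() executes arbitrary code; never use this on untrusted input.
    """
    # Substitute each value for the {x} placeholder, then evaluate the expression.
    return [eval(user_func.format(x=v)) for v in values]

# Shift a time column (seconds) by 2.5 seconds, as in the ACT-America 5-second merge.
shifted = apply_user_func("{x}+2.5", [0.0, 5.0, 10.0])
# shifted == [2.5, 7.5, 12.5]
```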

#### 2. loop over input ICARTT files and write resource files

The script loops over the ICARTTs in the directory specified by `DIR_ICARTT`, parses the header and variable metadata from each, and writes them to JSON and CSV resource files.
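For readers unfamiliar with the format: an ICARTT (`.ict`) file begins with a line giving the header length and file format index (e.g. `85, 1001`), so the header can be split from the data without knowing its contents in advance. A rough sketch of that split (not this project's actual parser):

```python
def split_icartt(lines):
    """Return (header_lines, data_lines) for an ICARTT file given as a list of lines."""
    # The first field of line 1 (NLHEAD) is the number of header lines.
    nlhead = int(lines[0].split(",")[0])
    return lines[:nlhead], lines[nlhead:]

# Tiny made-up example: a 3-line header followed by 2 data records.
sample = ["3, 1001\n", "PI name\n", "Time_Start, CO2\n", "0, 400.1\n", "5, 400.3\n"]
header, data = split_icartt(sample)
# header has 3 lines; data has 2
```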

#### 3. validate input variables against reference variables as they are written to netCDF

The script attempts to match every unique variable name from the input ICARTT files to a reference JSON file (`VARIABLE_MAP`) and output variable name (`OUTPUT_NAME`). Any variables not represented in the table will not be written to the output netCDF files.
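The matching step amounts to a table lookup. A sketch under assumptions (the column names follow the README; the sample rows and `match_variable` helper are invented for illustration):

```python
import csv
import io

# Stand-in for the VARIABLES CSV on disk; only three of its columns are shown.
VARIABLES_CSV = """\
ICARTT_NAME,OUTPUT_NAME,VARIABLE_MAP
Time_Start,time,time.json
AIRCRAFT_SUN_AZIMUTH,AircraftSunAzimuth,AircraftSunAzimuth.json
"""

# Index the table by the input (ICARTT) variable name.
table = {row["ICARTT_NAME"]: row for row in csv.DictReader(io.StringIO(VARIABLES_CSV))}

def match_variable(icartt_name):
    """Return the table row for an input variable, or None (with a warning) if absent."""
    row = table.get(icartt_name)
    if row is None:
        print(f"WARN: '{icartt_name}' has no reference. Skip")
    return row

match_variable("INDEX")            # prints the warning, returns None
row = match_variable("Time_Start") # row["OUTPUT_NAME"] == "time"
```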

Looping over the ICARTTs, the script pairs and concatenates multileg flights, then translates the ICARTT data into netCDF files.

Each variable in the ICARTT is written to the output netCDF with the attributes given in the matching variable reference JSON, e.g. `references/actamerica/variables/B200/AircraftSunAzimuth.json`:

```json
{
  "datatype": "float64",
  "dimensions": [
    "time"
  ],
  "attributes": {
    "_FillValue": -999999.0,
    "units": "degree",
    "long_name": "aircraft sun azimuth",
    "standard_name": "platform_azimuth_angle",
    "coordinates": "time LATITUDE LONGITUDE GPS_ALT"
  }
}
```

The output variable is assigned the type specified in the `datatype` field, the dimensions given in the `dimensions` field, and the attributes given in the `attributes` field.
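One wrinkle worth noting when using a reference JSON like the one above: the `netCDF4` package accepts the fill value only at variable creation time, so `_FillValue` has to be split out of the attributes before the rest are applied. A sketch of that step (illustrative only; `json.loads` on an inline string stands in for reading the reference file):

```python
import json

# Abbreviated copy of the reference JSON shown above.
reference = json.loads("""
{
  "datatype": "float64",
  "dimensions": ["time"],
  "attributes": {
    "_FillValue": -999999.0,
    "units": "degree",
    "long_name": "aircraft sun azimuth"
  }
}
""")

attrs = dict(reference["attributes"])
# _FillValue must be passed to createVariable(), not set as a plain attribute later.
fill_value = attrs.pop("_FillValue", None)

# With the netCDF4 package, the write would then look roughly like:
#   var = nc.createVariable("AircraftSunAzimuth", reference["datatype"],
#                           tuple(reference["dimensions"]), fill_value=fill_value)
#   var.setncatts(attrs)
```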

Any variables that cannot be matched to one of these JSON files will **not** be written to the output netCDF. Skipped variables are printed to stdout to clue you in; `INDEX` is deliberately excluded:

```
 - [ 85 / 85 ] ACTAMERICA-mrg05-c130_merge_20160803_R3.ict 
   WARN: 'INDEX' has no reference. Skip
```