Data Blending and Interpolation
The workflow combines multiple datasets and interpolates them over a grid, which can then be plotted and output as GIS files or GMT compatible data files. The entry point for the workflow is moho_workflow, which takes a JSON config file as its only argument.
The config file controls every stage of the workflow. An example config is shown below:
{
    "methods":
    [
        {
            "name": "ccp_1",
            "data": "sandbox/ccp_conversion/ccp_sample1.csv",
            "val_name": "Depth",
            "sw_name": "Weight",
            "weight": 1.0,
            "scale_length": 0.2
        },
        {
            "name": "ccp_2",
            "data": "sandbox/ccp_conversion/ccp_sample2.csv",
            "val_name": "Depth",
            "sw_name": "Weight",
            "weight": 1.0,
            "scale_length": 0.5
        }
    ],
    "plotting":
    {
        "output_plot": true,
        "plot_parameters":
        {
            "scale": [10.0, 60.0],
            "format": "png",
            "show": true,
            "title": "Moho depth from blended data",
            "cb_label": "Moho depth (km)"
        },
        "output_gmt": true,
        "output_gis": true
    },
    "bounds": [130.0, -20.0, 140.0, -17.0],
    "grid_interval": 0.25,
    "output_directory": "./output"
}
- methods: A list of inversion/stacking/etc. methods, defining the data and parameters for blending and gridding. Each method is stored as a dictionary.
  - name: The name of the method, which is required and must be unique.
  - data: The data file. See the section below on 'Data Format' for more information.
  - val_name: Name of the column containing the value to be blended and plotted.
  - sw_name: Name of the column to be used as per-sample weighting. This is optional, and if not provided, all sample weights default to 1.
  - weight: The overall weight of the method. This is multiplied by each sample weight to produce a total relative weight for each sample. See [1] for more information about weighting.
  - scale_length: The scale length variable for the spatial spread algorithm, in decimal degrees. See [1] for more information about the spatial spread algorithm.
- plotting: A dictionary containing plotting parameters. This is optional, and if it is not included no plots or map data will be produced.
  - output_plot: A boolean flag. If true, a plot of the grid and vector map will be produced. This figure will be saved as plot.{format} in the output directory.
  - plot_parameters: Parameters used for generating the plot. All of these parameters are optional.
    - scale: The minimum and maximum values that define the scale of the colormap used when plotting the grid. If not provided, the values will be derived from the minimum and maximum of the grid data.
    - format: Output format of the plot, which must be compatible with matplotlib. By default, 'png' is used.
    - show: A boolean flag. Whether or not to display the plot as part of the workflow. If true, the plot will be displayed after generation, which is useful for verification and debugging.
    - title: The title of the plot. Defaults to 'Moho depth from blended data'.
    - cb_label: Label for the grid colorbar. Defaults to 'Moho depth (km)'.
  - output_gmt: A boolean flag. Whether or not to produce GMT compatible data files. See the section below on GMT mapping for more information.
  - output_gis: A boolean flag. Whether or not to produce GIS data. See the section below on GIS mapping for more information.
- bounds: A bounding box of format [min lon, min lat, max lon, max lat]. This defines the extent of the grid to be interpolated and the extent of plots and maps produced. If not provided, the extent is derived from the minimum and maximum bounds of the aggregate datasets.
- grid_interval: Required. The grid interval in decimal degrees. E.g. 0.25 will interpolate the data to a grid with cell size 0.25 x 0.25 degrees.
- output_directory: Directory to contain the output. If not provided, the current working directory is used.
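To make the bounds and grid_interval parameters concrete, here is a minimal sketch of how grid node coordinates could be derived from them. The function names grid_axes and bounds_from_points are hypothetical, and placing nodes on both edges of the bounding box is an assumption; the workflow's own gridding code may differ.

```python
def grid_axes(bounds, grid_interval):
    """Return (lons, lats) node coordinates covering the bounding box.

    bounds is [min lon, min lat, max lon, max lat], as in the config.
    Assumption: nodes sit on both edges of the box.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    # Count the nodes first, then multiply, to avoid accumulating float error.
    n_lon = int(round((max_lon - min_lon) / grid_interval)) + 1
    n_lat = int(round((max_lat - min_lat) / grid_interval)) + 1
    lons = [min_lon + i * grid_interval for i in range(n_lon)]
    lats = [min_lat + j * grid_interval for j in range(n_lat)]
    return lons, lats


def bounds_from_points(points):
    """Fallback when bounds is omitted: the extent of all (lon, lat) samples."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return [min(lons), min(lats), max(lons), max(lats)]
```

With the example config, grid_axes([130.0, -20.0, 140.0, -17.0], 0.25) yields 41 longitude nodes and 13 latitude nodes.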
python moho_workflow.py config_moho_workflow_example.json
will run the workflow with the configured parameters.
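Before running the workflow, it can help to check a config against the rules described above. The following is a minimal sketch only: check_config is a hypothetical helper, not the validation shipped in the moho_config module, and it covers just three of the documented rules (grid_interval is required, method names are required and unique, bounds has four values in min/max order).

```python
import json


def check_config(config):
    """Return a list of problems found in a workflow config dictionary."""
    problems = []
    if "grid_interval" not in config:
        problems.append("grid_interval is required")
    names = [m.get("name") for m in config.get("methods", [])]
    if any(n is None for n in names):
        problems.append("every method requires a name")
    if len(set(names)) != len(names):
        problems.append("method names must be unique")
    bounds = config.get("bounds")
    if bounds is not None:
        # Expected order: [min lon, min lat, max lon, max lat]
        if len(bounds) != 4 or bounds[0] >= bounds[2] or bounds[1] >= bounds[3]:
            problems.append("bounds must be [min lon, min lat, max lon, max lat]")
    return problems


# A deliberately broken config: duplicate method name, no grid_interval.
example = json.loads("""
{
  "methods": [
    {"name": "ccp_1", "data": "ccp_sample1.csv", "val_name": "Depth"},
    {"name": "ccp_1", "data": "ccp_sample2.csv", "val_name": "Depth"}
  ],
  "bounds": [130.0, -20.0, 140.0, -17.0]
}
""")
for problem in check_config(example):
    print(problem)
```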
Data Format
The method data files must be CSV and follow a particular format, as in this example:
# Comments can be placed anywhere in the file, so long as they are
# preceded by '#' and don't occur in the first two rows after the
# 'START' flag (these rows are reserved for the optional timestamp, and
# the data header)
# We can also place miscellaneous values before the 'START' flag
# survey_location, lat, lon
GA,-35.343,149.158
# For 'TIME' below, we can use Unix epoch (e.g. 1600131276)
# or ISO (e.g. 2020-09-15T00:54:36)
# START
# TIME 1600131276
# Net,Sta,Lon,Lat,Depth,Weight,Additional_Column,Another_Column
OA,BS24,133.036,-19.4734,45.5,1.0,this_is_a_comment,another_comment
# Comment rows can be added within the data
# These rows can be used to denote the start of profile lines,
# and other changes in the data.
OA,BW20,134.909,-19.572,47.9,1.0,this_is_a_comment,another_comment
The format is intended to be flexible and able to contain comments and additional values useful to human readers, while also containing the structured data required for point blending.
Additional comments can be placed anywhere in the file, so long as they are not placed within the first two rows after the START flag (reserved for the optional timestamp and the data header).
Data values that are not part of the structured samples can be added, so long as they are placed in the rows before the START flag.
The START flag (# START) separates unstructured user data from the structured sample data. Data following this flag will be blended and plotted.
The row following the flag can be an optional timestamp, formatted as # TIME <timestamp>. The time itself can be given as a Unix timestamp (e.g. 1600131276) or in ISO format (e.g. 2020-09-15T00:54:36).
This timestamp, if provided, can be used by quality selection algorithms (TODO: add link to section once complete).
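As an illustration of the two accepted timestamp forms, the sketch below parses a timestamp row into a datetime. The function name parse_time_row is hypothetical, and treating ISO timestamps without a timezone as UTC is an assumption, not something the workflow is confirmed to do.

```python
from datetime import datetime, timezone


def parse_time_row(row):
    """Parse a '# TIME <timestamp>' row into an aware datetime, or None."""
    # Tolerate trailing commas (e.g. from Excel) and the leading '#'.
    parts = row.strip().rstrip(",").lstrip("#").split()
    if len(parts) != 2 or parts[0] != "TIME":
        return None
    stamp = parts[1]
    try:
        # Unix epoch seconds
        return datetime.fromtimestamp(float(stamp), tz=timezone.utc)
    except ValueError:
        # ISO format; assumption: naive timestamps are treated as UTC
        parsed = datetime.fromisoformat(stamp)
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
```

Under that UTC assumption, the two example timestamps above (1600131276 and 2020-09-15T00:54:36) parse to the same instant.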
The data header can be placed after the START flag, or after the timestamp if provided.
The data header contains the column names. These columns do not need to be in any specific order. As many columns as you want can be provided. Only some columns are utilised by the data blending workflow:
- Lon: Longitude of the data sample. If a valid network/station code pair and inventory file aren't provided, then it's necessary to provide sample locations as lat/lon. Other acceptable forms: lon, LON, longitude, Longitude, LONGITUDE
- Lat: Latitude of the data sample. If a valid network/station code pair and inventory file aren't provided, then it's necessary to provide sample locations as lat/lon. Other acceptable forms: lat, LAT, latitude, Latitude, LATITUDE
- Net: Network code for the data sample. If provided, it is used as a label when outputting map data. Can also be used to derive the location from an inventory file (see 'Deriving location' below). Other acceptable forms: net, NET, network, Network, NETWORK
- Sta: Station code for the data sample. If provided, it is used as a label when outputting map data. Can also be used to derive the location from an inventory file (see 'Deriving location' below). Other acceptable forms: sta, STA, station, Station, STATION
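The accepted spellings listed above could be resolved against a header row along these lines. This is a sketch under assumptions: COLUMN_FORMS and locate_columns are hypothetical names, and the workflow's own header handling may be implemented differently.

```python
# Accepted spellings, taken from the list above.
COLUMN_FORMS = {
    "lon": ("Lon", "lon", "LON", "longitude", "Longitude", "LONGITUDE"),
    "lat": ("Lat", "lat", "LAT", "latitude", "Latitude", "LATITUDE"),
    "net": ("Net", "net", "NET", "network", "Network", "NETWORK"),
    "sta": ("Sta", "sta", "STA", "station", "Station", "STATION"),
}


def locate_columns(header):
    """Map each recognised field to its column index in the header row."""
    found = {}
    for index, column in enumerate(header):
        for field, forms in COLUMN_FORMS.items():
            if column.strip() in forms:
                found[field] = index
    return found
```

Because the lookup is by name rather than position, the columns can appear in any order, and unrecognised columns are simply left out of the mapping.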
In addition, a column must be provided as the value to be blended. In this example, the Depth column would be provided as the val_name parameter in the config.
An optional sample weight column can be provided as the sw_name parameter in the config.
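The relationship between the method weight and the per-sample weights can be sketched as follows. The helper total_weights is hypothetical; it only restates the rule from the config description (method weight multiplied by sample weight, with sample weights defaulting to 1) and is not the workflow's own code.

```python
def total_weights(rows, method_weight, sw_name=None):
    """Return one total relative weight per sample row.

    rows is a list of dicts keyed by column name; sw_name is optional.
    """
    weights = []
    for row in rows:
        # Sample weights default to 1 when no sample weight column is configured.
        sample_weight = float(row[sw_name]) if sw_name else 1.0
        weights.append(method_weight * sample_weight)
    return weights
```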
Deriving location
The point blending workflow requires the location of each sample point.
You can provide per-sample coordinates as latitude and longitude. Alternatively, you can provide a network and station code for each sample, together with an inventory file.
To provide the inventory file, you must add it as the inventory_file parameter in the config:
...
"methods":
[
    {
        "name": "ccp_1",
        "data": "sandbox/ccp_conversion/ccp_sample1.csv",
        "val_name": "Depth",
        "sw_name": "Weight",
        "weight": 1.0,
        "scale_length": 0.2,
        "inventory_file": "ccp_inventory.xml"
    }
...
The network and station codes will be joined as 'NET.STA' and used to look up the station location in the inventory, so it's important that they are spelt and formatted exactly as they appear in the inventory file.
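The lookup can be pictured as below. This is a sketch only: station_coords is a hypothetical dictionary standing in for whatever is read from the inventory file, and preferring explicit Lon/Lat columns over the inventory is an assumption about precedence.

```python
def sample_location(row, station_coords):
    """Return (lon, lat) for a sample row.

    station_coords is a hypothetical dict of 'NET.STA' -> (lon, lat),
    standing in for the contents of the inventory file.
    """
    # Assumption: explicit coordinates take precedence when present.
    if "Lon" in row and "Lat" in row:
        return float(row["Lon"]), float(row["Lat"])
    key = "{}.{}".format(row["Net"], row["Sta"])
    if key not in station_coords:
        raise KeyError("station {} not found in inventory".format(key))
    return station_coords[key]
```

A code that differs from the inventory only in case or padding (for example 'oa.bs24') would not match, which is why the spelling matters.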
Excel can be used to generate this file by saving your spreadsheet as CSV. For example, the spreadsheet below:
[Figure: example spreadsheet]
results in the following CSV file:
# Comment,,,,
# Misc. values,,,,
# A,B,C,,
1,2,3,,
,,,,
# START,,,,
# TIME 1600131276,,,,
# Net,Sta,Lon,Lat,Depth
OA,BS24,133.036,-19.463,45.5
OA,BW20,134.909,-19.572,47.9
(the trailing commas are due to Excel padding the file according to the empty cells of the spreadsheet - these are ignored by the workflow when read).
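Putting the format rules together, a reader for these files could look roughly like the sketch below. It is an illustration under assumptions, not the workflow's parser: read_samples is a hypothetical name, and details such as how blank rows are treated are guesses.

```python
import csv
import io


def read_samples(text):
    """Parse the structured part of a data file into (header, rows)."""
    lines = [line.strip() for line in text.splitlines()]

    def clean(line):
        # Drop the trailing commas Excel adds, then any leading '#'.
        return line.rstrip(",").lstrip("#").strip()

    # Everything up to and including the START flag is unstructured user data.
    start = next(i for i, line in enumerate(lines) if clean(line) == "START")
    body = lines[start + 1:]
    # An optional timestamp row may sit between the flag and the header.
    if body and clean(body[0]).split()[:1] == ["TIME"]:
        body = body[1:]
    header = next(csv.reader(io.StringIO(clean(body[0]))))
    rows = []
    for line in body[1:]:
        if not line.strip(",") or line.startswith("#"):
            continue  # blank or comment row
        values = next(csv.reader(io.StringIO(line.rstrip(","))))
        rows.append(dict(zip(header, values)))
    return header, rows
```

Applied to the Excel output above, this returns the header Net, Sta, Lon, Lat, Depth and two sample rows; the miscellaneous values before the START flag are skipped.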
GIS mapping
If enabled, GIS data will be produced and stored in the output directory under gis_data.
A single-band GeoTIFF of the grid is produced. A multiband GeoTIFF of the vector map is also produced, with the bands containing the U and V components respectively. This can be rendered as a vector field using ArcGIS' vector field renderer or the equivalent function in your GIS software. Each method used will also produce a shapefile containing the locations of the samples used, along with other information such as station name and weight.
GMT mapping
If enabled, GMT compatible data files will be produced and stored in the output directory under gmt_data. A grid.txt file of format LON LAT DEPTH is produced, which can be converted to a NetCDF grid using gmt xyz2grd and plotted using gmt grdimage. A gradient.txt file of format LON LAT ANGLE MAGNITUDE can be used to plot the vector field using gmt psxy -Sv. For each method used, a text file of format LON LAT TOTAL_WEIGHT is produced, which can be plotted using gmt psxy with the weight used as symbol size or other differentiator.
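When converting grid.txt with gmt xyz2grd, note that GMT's -R option expects west/east/south/north, which differs from the order of the bounds parameter in the config. The sketch below builds the command arguments from the config values. The helper name xyz2grd_args and the output name grid.nc are hypothetical, and it assumes the nodes in grid.txt line up with the configured bounds and grid interval.

```python
def xyz2grd_args(bounds, grid_interval, xyz_file="grid.txt", out_file="grid.nc"):
    """Build the argument list for 'gmt xyz2grd' from the config values.

    bounds uses the config order [min lon, min lat, max lon, max lat],
    while GMT's -R option expects west/east/south/north.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    region = "-R{}/{}/{}/{}".format(min_lon, max_lon, min_lat, max_lat)
    return ["gmt", "xyz2grd", xyz_file, "-G" + out_file, region,
            "-I{}".format(grid_interval)]


print(" ".join(xyz2grd_args([130.0, -20.0, 140.0, -17.0], 0.25)))
```

The returned list can be passed to subprocess.run, or joined into a single line and pasted into a shell script.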
This workflow is developed around Moho depth, but any data can be gridded, so long as the data
files follow the format of:
# Sta,Lon,Lat,Value,Weight
The workflow supports applying corrections and other preprocessing to the data used. Currently, CCP correction by H-k values is supported. Config example:
{
    "data_preperation":
    [
        {
            "data": "/home/bren/data_passive/MOHO/hk_ccp_correction/ccp_weights.csv",
            "correction_data": "/home/bren/data_passive/MOHO/hk_ccp_correction/hk_weights.csv",
            "correction_func": "ccp_correction",
            "output_file": "/home/bren/data_passive/MOHO/hk_ccp_correction/ccp_corrected.csv"
        }
    ],
Adding this block corrects the CCP readings using the H-k readings: for each station, the difference between the H-k median value and the CCP median value is applied to the CCP readings.
The corrected data (specified by output_file) can then be blended and plotted by providing it as the data for a method.
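The median-difference correction described above can be sketched as follows. This is an illustration, not the ccp_correction function itself: the name correct_ccp, the dictionary inputs, the sign convention (CCP values are shifted so their median matches the H-k median) and leaving stations without H-k values unchanged are all assumptions.

```python
from statistics import median


def correct_ccp(ccp, hk):
    """Shift each station's CCP values so their median matches the H-k median.

    ccp and hk map station code -> list of values for that station.
    Assumption: stations without H-k values are left unchanged.
    """
    corrected = {}
    for station, values in ccp.items():
        if station in hk and hk[station]:
            shift = median(hk[station]) - median(values)
        else:
            shift = 0.0
        corrected[station] = [v + shift for v in values]
    return corrected
```

For example, a station with CCP values 44, 46 and 48 (median 46) and H-k values 47 and 49 (median 48) would have every CCP value raised by 2.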
The moho_config module contains constants for the configuration keys, along with config validation. If the config schema is modified, it's recommended to give the new key a constant and add it to the relevant SUPPORTED_KEYS list so it can pass validation.
If further correction or preprocessing functions are added, a mapping of the config key to the correction function must be added to CORR_FUNC_MAP.
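The dispatch pattern behind such a mapping might look like the sketch below. Only the name CORR_FUNC_MAP and the correction_func config key come from the text above; the function signature, the placeholder body and the run_preparation helper are hypothetical, and the real module may be organised differently.

```python
def ccp_correction(prep_config):
    """Placeholder body; the real function lives in the workflow code."""
    return "corrected {}".format(prep_config["data"])


# Maps the 'correction_func' value from the config to a callable.
# Assumption: every correction function takes the preparation entry as input.
CORR_FUNC_MAP = {"ccp_correction": ccp_correction}


def run_preparation(prep_configs):
    """Apply the configured correction function to each preparation entry."""
    results = []
    for prep in prep_configs:
        name = prep["correction_func"]
        if name not in CORR_FUNC_MAP:
            raise ValueError("unsupported correction function: {}".format(name))
        results.append(CORR_FUNC_MAP[name](prep))
    return results
```

Under this pattern, supporting a new correction means writing the function and adding one entry to the dictionary.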