Releases: LBL-EESA/TECA
TECA 6.0.0
TECA 6.0.0 Release Highlights
This is a major release that contains numerous improvements and fixes. TECA BARD is
fully GPUized. Temporal reductions have been ported to C++ and optimized. The data
and execution models have been extended for batching (processing multiple steps per
request). New spatial parallel and space-time parallel execution patterns allow the full
space-time extent of high resolution data to be processed in memory. The new spatial parallelism is
used in the low, high, and band pass filters as well as the temporal percentile calculation.
Numerous I/O optimizations have been introduced, including the use of MPI collective
buffering for spatial parallel execution.
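The idea behind space-time partitioning can be illustrated with a minimal sketch. This is not TECA's partitioner or API: the function names and the extent layout (x0, x1, y0, y1, t0, t1, all inclusive indices) are assumptions made for illustration only. Each resulting block could be assigned to a different rank, and a block that spans several time steps corresponds to processing multiple steps per request.

```python
def split_range(lo, hi, n_blocks):
    """Split the inclusive index range [lo, hi] into contiguous pieces.

    At most n_blocks pieces are returned; sizes differ by at most one.
    """
    n = hi - lo + 1
    n_blocks = max(1, min(n_blocks, n))
    base, extra = divmod(n, n_blocks)
    pieces = []
    start = lo
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)
        pieces.append((start, start + size - 1))
        start += size
    return pieces


def partition_space_time(extent, n_x, n_y, n_t):
    """Partition a hypothetical (x0, x1, y0, y1, t0, t1) extent into blocks."""
    x0, x1, y0, y1, t0, t1 = extent
    return [
        (xa, xb, ya, yb, ta, tb)
        for (ta, tb) in split_range(t0, t1, n_t)
        for (ya, yb) in split_range(y0, y1, n_y)
        for (xa, xb) in split_range(x0, x1, n_x)
    ]


if __name__ == "__main__":
    # 10 x 5 grid with 7 time steps, split 2 ways in x and 3 ways in time
    for block in partition_space_time((0, 9, 0, 4, 0, 6), 2, 1, 3):
        print(block)
```

Setting n_x and n_y to 1 in this sketch gives a purely temporal decomposition, while setting n_t to 1 gives a purely spatial one.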
Execution Model Improvements
e134264 add spatial executive
c6f9bc6 cf_writer add partitioning contraints
4d53de6 add space_time_executive
97efc35 add cf_space_time_time_step_mappper
98bbb97 adds cf_spatial_time_step_mapper
3d915ee cf_space_time_time_step_mapper add partitioning contraints
a8fa4d8 cf_spatial_time_step_mapper add partitioning contraints
19f5e22 coordinate_util partition add contraints
4d2a8f1 index_reduce execution controls
c765bd5 cf_writer command line parsing of spatial parallel properties
7c0c8a3 spatial_executive constrain partitioning
792b7f9 space_time_executive constrain partitioning
f572c81 metadata_probe report number of intervals
b295896 mesh wrap temporal bounds and extent
daa684d index_request_key update
25bd3b4 index_executive clean up verbose report
a61ec63 test cf_reader temporal extent handling
6e3323d dataset_diff handle temporal extents
d5dad5e test temporal reduction spatial parallelism
019dc83 cf_writer spatial parallelism
f3c14a0 cf_layout_manager spatial parallelism
ba50dd8 cf_time_step_mapper layout manager API
9aa17f1 interval time step mapper refactor
e3a25a8 block time step mapper refactor
1cfcc08 coordinate util spatial partitioning
e341f63 cf_reader reads temporal extents
423a8da data model updates for multiple time steps per mesh
Data Model Improvements
03939e1 add and apply simplified dispatch macros
69e88df hamr update to latest
422f383 hamr fully asynchronous by default
2cd9c8e hamr enforce const for read only data access
95de593 hamr update to latest master
2927a95 HAMR update to latest master
adf5603 variant_array_util add host synchronization helper
6976084 variant_array add synchronization method
c7b1b2d add teca_variant_array_util
29897a4 variant_array better dispatch
1ce73a7 variant_array better dispatch
a70cdfe variant_array make test for accessibility virtual
1422ea3 variant_array provide direct access to internal memory
5911134 variant_array python construct from numpy scalar
c3562a7 cartesian_mesh fall back to mesh extents
03143cb cartesian_mesh_source spatial parallelism
b8615ed cartesian_mesh_regrid per array dimensions
ca4fcbb cartesian_mesh per array extent and shape const
42446f2 cartesian_mesh_source generate data on the assigned GPU
d3082de cartesian_mesh_source include bounds metadata in output mesh
acf3fe2 cartesian_mesh overload array shape to return a tuple
6a9f3ac cartesian_mesh_regrid pass array attributes from the source
e5e8e4a cartesian_mesh array extent time dim and add shape
73b58eb cartesian_mesh fix Python bindings for array shape/extent
86ef561 cartesian_mesh_source fix calendaring metadata in output
New Algorithms
f730aa8 add teca_surface_integral alg
f79c2c8 add teca_regional_moisture_flux
dc66e32 add teca_table_join
f2af4c4 add spectral filter
e439275 add teca_vtk_util::partition_writer to help debug space-time paritioning
0fe459e add temporal_percentile temporal reduction
140008c wrote temporal_index_select and tests
New Applications
acfcaff add regional_moisture_flux app
cfd6ce8 Add the spectral filter app
GPUization
a64839b bayesian_ar_detect add CUDA implementation
cf74102 2d_component_area thrust use stream per thread stream
42d16f7 2d_component_area set cuda device before doing any work
e54e33b component_area_filter set cuda device before doing any work
c3efa90 connected_components set cuda device before doing any work
45a87f1 bayeseian_ar_detect set cuda device before doing any work
3791b67 latitude_damper set cuda device before doing any work
8993ed6 unpack_data set cuda device before doing any work
640ee57 index_executive explicitly assign device ids
79445b3 binary_segmentation use streams for sorting and data movement
2334735 cuda_util add a 1d domain decomposition
9644b34 latitude_damper add CUDA implementation
a243206 component_area_filter add CUDA implementation
5a2f660 2d_component_area use restrict on kernels
ad65931 2d_component_area GPU-ize the area calculation
96c5966 cf_reader don't use page locked memory for cuda
7549e88 cuda_util simplify device assignment
1b14777 connected_components use 8 connetivity
52be362 ha4 test code use 8 connectivity
2f4047f index_executive environment variable override CUDA device assignment
0919c78 connected_components inetgrate CUDA ha4 implementation
7788426 shape_file_mask add CUDA implementation
c44aded cuda_util implement a container for cuda streams
edf6c58 geometry_util GPUize point in poly
693a7b2 thread_util threads per device behavior
ac2f59f cuda warning cleanup
3f2ba7f spatial_executive load balance across GPUs
5c08259 space_time_executive load balance across GPUs
Threading Improvements
6241065 bayesian_ar_detect fix thread safety issues
fa1c209 thread_util warn about too few threads wo MPI
1d5f415 thread_util clamp the number of threads
c970444 thread_util report num threads when not binding
af1592a threaded_algorithm propagate_device_assignment
81d4e2d threaded_algorithm expose ranks_per_device in API
Optimizations
60c9e71 cf_restripe app add collective buffer mode
3dbc0e2 Added C++ version of the temporal reduction algorithm and application
9735209 cf_reader open file in collective mode
5558ff6 spectral_filter app command line options for collective buffering
c0efea8 cf/multi_cf_reader option to use collective buffering
f304f27 cf_writer use collective buffering
Documentation
d5eb0fc cf_reader fix copy paste error in documentation
e5306fa component_area_filter fix indent add comments
30adda5 algorithm fix a documentation typo
bb73083 shape_file_mask improve documentation
d8fcade table_reduce improve documentation
b166667 integrated_water_vapor improve documentation
ef2cd48 integrated_vapor_transport improve documentation
f362380 threaded_algorithm improve documentation
e5a26ff doc doxygen style comments for programmable_algorithm
dc36772 doc doxygen style comments for teca_table
de5e8d6 doc data access developer tutorial
1d25525 interval_iterator subclasses fix units doxygen doc strings
dd5f1fe doc update temporal_reduction user guide
c71e905 cf_writer fix typo in docs
53effc0 doc update m1517 install locations for perlmutter
1b71d8e coordinate_util improve documentation
ff383a0 rtd add section explaining execution model
ae237bd rtd docs fix doxygen install location
c51132b rtd pin sphinx version as latest is incompatible with rtddocs
5ea6e10 rtd doc array access tutorial spell check
af9d2e6 doc rtd improve array access tutorial
b528ec9 rtd fix a rst warning
9a6e888 rtd updates to the install for mac os
1a7dc38 doc rtd exclude variant_array_oeprator from doxygen
Testing
bf97e95 test disable periodic bc in bard app test
238db9f test bayesian ar detect sort by label area
49e83a9 deeplab_ar_detect remove tests
b7d14f1 testing update linux distributions
c38337f testing cleanup use of %e% in tests
d40d800 temporal_reduction: added tests
80a0159 test add test for cpp_temporal_reduciton w. io
3b277b3 test temporal reduction steps_per_request command line argument
9e614ea add test_temporal_reduciton
3b338bf ha4 test code update ctests command
5dd84cb connected_components test ignore component labels
6569a79 ha4 test code improvements
a1012ed ha4 test code handles periodic BC in x-direction
a380f62 ha4 test code works on images not divisible by 32
e6216c3 add ha4 connected component label test code
a769ff7 test_streaming_reduce_threads: specifying netcdf file name to avoid conflict with temporal reduction all iterator test
6e02fa6 test temporal_reduction app python and C++
d79206a testing temporal_reduction tests specify number of threads
709f685 temporal_reduction C++ impl improvements and regression test
5120006 update the DOI badge to point to the latest release
18533f8 Changed teca data revision from 149 to 151
General Improvements
2414209 bayesian_ar_detect_parameters add properties to select specific start row
be087dc bayesian_ar_detect instrument the BARD app
37f4237 bayesian_ar_detect app control writer thread pool size
176c1f6 connected_components cleanup a warning
10eaf19 connected_components minor improvements
ee8cbf2 temporal_reduction: set steps_per_request in python app; included definition in cpp app
27f3ef3 temporal_reduction: standardized n_threads command line
b371bea temporal_reduction construct output at end and others
494a3b4 temporal_reduction: caching the intermediate result
07a119a temporal_reduction: any number of time steps per request is allowed
bd32184 descriptive_statistics remove debuging code
18768fd index_executive fix a compiler warning
ff551dc cpp_temporal_reduction algorithm errors are fatal
95bd6a8 temporal_reduction: set_thread_pool_size [cf_writer] changed from -1 to 1 to fix intermittent bugs
7953cbb temporal_reduction: change the 1 time step per request to a run time specified number of steps
1bab425 dataset_diff ignore specified arrays
03fc0bc table_sort sort either ascending or descending
b29c4fd coordinate_util wrap bounds to extent overload
d0ac7a9...
TECA 5.0.0
Major features
The TECA data model now supports memory management on CPUs as well as CUDA,
OpenMP device offload, and HIP capable GPUs and accelerators.
TECA's execution model was extended to support CUDA capable GPUs. This includes
automated load balancing across multi-GPU accelerated compute nodes on
supercomputing systems as well as CUDA kernel launching and load balancing
infrastructure.
Support for zero-copy interoperability with Cupy and Numba on CUDA capable
GPUs was added.
GPUized algorithms
teca_binary_segmentation
teca_l2_norm
teca_valid_value_mask
teca_unpack_data
teca_integrated_vapor_transport
teca_temporal_reduction
teca_lapse_rate
teca_cf_reader
teca_cf_writer
New algorithms and apps
teca_lapse_rate
teca_tc_potential_intensity
teca_time_axis_convolution
teca_shapefile_mask
teca_tempest_remap
teca_cartesian_mesh_coordinate_transform
teca_array_collection_reader
teca_array_collection_writer
Improvements
Make the teca_array_collection a data set
Add user defined intervals and operators to the teca_temporal_reduction
teca_temporal_reduction handle integer data in the averaging reduction
teca_temporal_reduction use the valid value mask
add a summation reduction to the teca_temporal_reduction
improved threading support on MacOS
users can provide callbacks at runtime for custom error handling
Documentation
Numerous improvements to the user guide and Doxygen documentation including
documentation of new applications and install on GPU enabled systems
Updated examples illustrating how to use Cupy in Python applications
New Perlmutter specific examples were added to TECA_Examples
TECA 4.1.0
4.1.0 is a feature release with a number of new and exciting features and a number of critical bug fixes.
- new mask below surface algorithm that creates a pointwise binary (0,1) mask identifying mesh points that are below the land surface based on an externally provided DEM.
- integrated the mask below surface stage into the BARD, IWV, and IVT apps
- new unpack NetCDF packed data stage
- add coordinate normalization stage transform for longitude from -180 to 180 to 0 to 360
- new IWV algorithm
- new IWV command line application
- new time based file layouts (daily, monthly, yearly, seasonal)
- BARD app can now generate output fields weighted by AR probabilities
- new rename variables stage
- improvements to cartesian_mesh_source for remeshing
- cf_reader correctly detects centering and per field dimensionality
- multi_cf_reader MCF file format improvements. Add support for reader properties, globally and per reader.
- cf_reader option to produce 2D field when the 3rd dimension is length 1
- Cartesian meshes can now contain both 2D and 3D arrays, metadata annotations are used to differentiate at run time
- metadata probe improvements to report per-field centering
- new remeshing capability deployed in cf_restripe and apps that utilize elevation mask
- improvements to the user guide
- refactored source code documentation to be compatible with Doxygen
- published Doxygen on the rtd site: https://teca.readthedocs.io/en/integrating_breathe/doxygen/index.html
- new capabilities in the cf_restripe command line application for remeshing
- 25+ bug fixes
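The longitude transform mentioned in the list above (from -180 to 180 to 0 to 360) can be sketched in a few lines. This is an illustration of the general technique, not the code of TECA's coordinate normalization stage; the function name and the list based data layout are assumptions.

```python
def normalize_longitude(lons, values):
    """Map longitudes in [-180, 180] to [0, 360) and reorder the data to match.

    lons   : longitudes in degrees
    values : one data value per longitude, in the same order as lons
    Returns the shifted longitudes in ascending order and the values
    rearranged so that each value stays attached to its longitude.
    """
    shifted = [lon % 360.0 for lon in lons]
    order = sorted(range(len(shifted)), key=shifted.__getitem__)
    return [shifted[i] for i in order], [values[i] for i in order]


if __name__ == "__main__":
    new_lons, new_values = normalize_longitude(
        [-180.0, -90.0, 0.0, 90.0], ["a", "b", "c", "d"])
    print(new_lons)
    print(new_values)
```

Note that in this sketch an input containing both -180 and 180 would yield a duplicated coordinate at 180, which a real implementation would need to handle.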
TECA 4.0.0
Documentation
- A major overhaul of the command line application section of the user guide including the addition of examples.
- Publish batch scripts illustrating running TECA at scale in the new TECA_examples repo.
- Giving tutorials and publishing the materials in the new TECA_tutorials repo
- Updates to the installation section of the [TECA User's Guide](https://teca.readthedocs.io/en/latest/installation.html)
Data Model Improvements
- Added support for Arakawa C Grids in teca_arakawa_c_grid.
- Added support for logically Cartesian, so-called curvilinear, grids in teca_curvilinear_mesh.
- Refactored the mesh related class hierarchy so that common code such as array access and I/O lives in teca_mesh.
- Added support for face and edge centered mesh based data.
I/O Capabilities
- Added a reader for WRF simulations, teca_wrf_reader.
- Added support for writing logically Cartesian curvilinear meshes in teca_cartesian_mesh_writer.
- Added a new NetCDF based output format for tabular data to the teca_table_writer.
- Added support for reading tabular CSV files to the teca_table_reader. This enables the tabular outputs such as TC tracks etc saved from TECA apps to be stored in a format ingestible by other tools such as Python and Excel without the need to convert from TECA's internal binary format.
- Added versioning and error checking to TECA's internal binary serialization format across all datasets. This enables us to catch version differences and handle bad or corrupted files gracefully.
- Use of NetCDF parallel 4 (i.e. MPI collective I/O) for writing results. This enables the use of any number of files with any number of ranks.
Execution Patterns
- Implemented a new streaming mode reduction where data is incrementally reduced as it becomes available. This parallelizes the reduction step and reduces the memory overhead.
- Introduced a new MPI parallel approach to scan the time axis. This has substantial benefit when there are a large number of files.
- Exposed MPI aware thread load balancing to Python. This was used in the teca_pytorch_algorithm to automatically load balance the OpenMP backend of PyTorch.
- Implemented a GPU load balancing strategy in the teca_pytorch_algorithm.
- Enabled process groups to be excluded from execution. This lets a pipeline run on a subset of MPI_COMM_WORLD.
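The streaming mode reduction described above can be illustrated with a small sketch of an incremental mean. This is not TECA's implementation; the class and method names are hypothetical. The point is that only a running total and a count are held in memory, so each incoming field can be discarded as soon as it has been folded in.

```python
class StreamingMean:
    """Incrementally average equally sized fields as they arrive."""

    def __init__(self):
        self.count = 0
        self.total = None

    def update(self, field):
        """Fold one field (a list of numbers) into the running total."""
        if self.total is None:
            self.total = list(field)
        else:
            if len(field) != len(self.total):
                raise ValueError("all fields must have the same length")
            self.total = [a + b for a, b in zip(self.total, field)]
        self.count += 1

    def finalize(self):
        """Return the mean of all fields seen so far."""
        if self.count == 0:
            raise ValueError("no data was provided")
        return [t / self.count for t in self.total]


if __name__ == "__main__":
    reducer = StreamingMean()
    for step in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
        reducer.update(step)  # data is reduced as it becomes available
    print(reducer.finalize())
```

Because addition is associative, partial totals and counts from several such reducers could be combined afterwards, which is one way the reduction step itself can be run in parallel.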
Algorithmic Capabilities
- Added teca_pytorch_algorithm, a base class that handles tasks common to interfacing to PyTorch when developing Machine Learning based detectors.
- Added teca_deeplab_ar_detect, a new PyTorch based Machine Learning AR detector.
- Added teca_valid_value_mask, an algorithm that generates a mask identifying the presence of NetCDF _FillValue values in arrays. Downstream algorithms use the mask to handle _FillValue values in an algorithm appropriate manner.
- Added teca_temporal_reduction, an algorithm that implements transformations from one time resolution to another. The implementation includes min, max, and average operators and supports daily, monthly, and seasonal intervals.
- Added teca_vertical_reduction, an algorithm that converts 3D data to 2D by applying a reduction in the vertical spatial dimension. This is a base class that contains code common to vertical reductions.
- Added teca_integrated_vapor_transport, a vertical reduction that computes IVT from horizontal wind vector and specific humidity.
- An improved floating point differencing algorithm was developed and a number of codes were updated to use it.
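How a valid value mask can feed a downstream reduction is sketched below. The function names, the list based layout, and the mask polarity (1 for valid, 0 for _FillValue) are assumptions made for this illustration and are not taken from TECA's source.

```python
def valid_value_mask(values, fill_value):
    """Return 1 where a value is valid and 0 where it equals the fill value."""
    return [0 if v == fill_value else 1 for v in values]


def masked_average(steps, fill_value):
    """Average over time steps, skipping fill values point by point.

    steps : list of equally sized lists, one list per time step
    Points that are never valid keep the fill value in the output.
    """
    n = len(steps[0])
    sums = [0.0] * n
    counts = [0] * n
    for step in steps:
        mask = valid_value_mask(step, fill_value)
        for i, (value, valid) in enumerate(zip(step, mask)):
            if valid:
                sums[i] += value
                counts[i] += 1
    return [s / c if c else fill_value for s, c in zip(sums, counts)]


if __name__ == "__main__":
    fill = -999.0
    steps = [[1.0, fill, 3.0],
             [3.0, fill, fill]]
    print(masked_average(steps, fill))
```

A fill value of NaN would need a different test than equality, since NaN does not compare equal to itself.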
Command Line Applications
- Added teca_integrated_vapor_transport command line application for computing IVT.
- Added teca_restripe command line application for re-organizing NetCDF datasets.
- Added teca_deeplab_ar_detector command line application for detecting ARs using machine learning.
- Integrated IVT calculations into the teca_bayesian_ar_detector.
- Normalized names and meaning of command line options across command line applications
Python Capabilities
- A polymorphic redesign of the teca_python_algorithm makes it easier to use.
- Handle numpy scalar types
- Expose more features such as MPI aware thread load balancing, calendaring, profiling, and file manipulation utilities.
Testing
- Added testing infrastructure and tests for command line applications
- Deployed testing on Ubuntu 18.04, Fedora 31, Fedora 32, and Mac OS with xcode 12.2.
Bug fixes
More than 50 bug fixes were reported.
TECA 3.0.0
This is a major release in support of:
T.A. O'Brien et al, "Detection of Atmospheric Rivers with Inline
Uncertainty Quantification: TECA-BARD v1.0", Geoscientific Model
Development, submitted winter 2020
The pipeline internals were refactored to be more general. The assumption that
time is the dimension across which the reduction is applied was removed, and
changes were made that enable nested map-reduce.
The TECA User Guide was ported to "Read the Docs". https://teca.readthedocs.io
Our Travis CI test infrastructure was updated to use Docker, and two new OS
images, Fedora 28 and Ubuntu 18.04, were deployed.
More than 40 bug fixes
New algorithms included in this release:
Type | Name | Description |
---|---|---|
general purpose | teca_2d_component_area | Computes the areas of regions identified by the connected components filter. |
general purpose | teca_bayesian_ar_detect | Detects atmospheric rivers using a Bayesian method. |
general purpose | teca_bayesian_ar_detect_parameters | Parameters used by Bayesian AR detector. |
general purpose | teca_cartesian_mesh_source | Used to create Cartesian meshes in memory and inject them into a pipeline. |
general purpose | teca_component_area_filter | Masks regions with area outside a user specified range |
general purpose | teca_component_statistics | Gathers information about connected component regions into a tabular format |
general purpose | teca_latitude_damper | Multiplies a field by an inverted Gaussian (user specified mean and HWHM) |
general purpose | teca_normalize_coordinates | Transforms Cartesian meshes such that coordinates are always in ascending order |
general purpose | teca_python_algorithm | Base class for TECA algorithms written in Python. Handles internal plumbing |
core infrastructure | teca_memory_profiler | Supporting class that samples memory consumption during application execution |
core infrastructure | teca_profiler | Supporting class that logs start, stop, and duration of developer defined events |
I/O | teca_cartesian_mesh_reader | Reads TECA Cartesian meshes in TECA's internal binary format |
I/O | teca_cartesian_mesh_writer | Writes TECA Cartesian meshes in TECA's internal binary format |
I/O | teca_cf_writer | Writes TECA Cartesian meshes in NetCDF CF2 conventions |
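
The latitude damper entry in the table above describes multiplying a field by an inverted Gaussian parameterized by a mean and a half width at half maximum (HWHM). A minimal sketch follows, assuming that "inverted" means one minus a unit height Gaussian; the function names and data layout are hypothetical and not taken from TECA. The conversion from HWHM to standard deviation, sigma = HWHM / sqrt(2 ln 2), is the standard relation for a Gaussian.

```python
import math


def inverted_gaussian(lat, center, hwhm):
    """Weight that is 0 at the center latitude and approaches 1 far from it.

    Assumes the form 1 - exp(-(lat - center)^2 / (2 sigma^2)) with
    sigma derived from the half width at half maximum.
    """
    sigma = hwhm / math.sqrt(2.0 * math.log(2.0))
    return 1.0 - math.exp(-((lat - center) ** 2) / (2.0 * sigma ** 2))


def damp_field(field, lats, center, hwhm):
    """Scale each row of a 2D field (one row per latitude) by its weight."""
    return [
        [value * inverted_gaussian(lat, center, hwhm) for value in row]
        for row, lat in zip(field, lats)
    ]


if __name__ == "__main__":
    lats = [-30.0, -15.0, 0.0, 15.0, 30.0]
    field = [[1.0, 1.0] for _ in lats]
    for lat, row in zip(lats, damp_field(field, lats, 0.0, 15.0)):
        print(lat, row)
```

With this form the weight is exactly one half at a distance of one HWHM from the center, which is what defines the half width at half maximum.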
New applications included in this release:
Name | Description |
---|---|
teca_bayesian_ar_detect | Command line application that can be used to detect ARs on HPC systems |
teca_profile_explorer | Interactive tool for exploring run time profiling data |