TECA 6.0.0

@burlen burlen released this 15 Nov 06:10
· 13 commits to develop since this release
9e5abae

TECA 6.0.0 Release Highlights

This is a major release that contains numerous improvements and fixes. TECA BARD is
fully GPUized. Temporal reductions have been ported to C++ and optimized. The data
and execution models have been extended for batching (processing multiple steps per
request). New spatial parallel and space-time parallel execution patterns allow the full
space-time extent of high resolution data to be processed in memory. The new spatial parallelism is
used in the low, high, and band pass filters as well as the temporal percentile calculation.
Numerous I/O optimizations have been introduced, including the use of MPI collective
buffering for spatial parallel execution.
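
The batching and spatial partitioning ideas described above can be illustrated with a minimal sketch. This is not TECA's API: the function names `batch_steps` and `partition_extent` are hypothetical, and the even block decomposition shown here is only an assumption about how such a split might be done.

```python
# Illustrative sketch only; these helpers are hypothetical and are not part of TECA.

def batch_steps(n_steps, steps_per_request):
    """Group time step indices 0..n_steps-1 into inclusive (first, last)
    ranges holding at most steps_per_request steps each."""
    if n_steps < 0 or steps_per_request < 1:
        raise ValueError("need n_steps >= 0 and steps_per_request >= 1")
    return [(first, min(first + steps_per_request, n_steps) - 1)
            for first in range(0, n_steps, steps_per_request)]


def partition_extent(extent, n_blocks):
    """Split an inclusive 1D index range (lo, hi) into up to n_blocks
    contiguous, nearly equal pieces. Larger pieces come first."""
    lo, hi = extent
    n = hi - lo + 1
    if n < 1 or n_blocks < 1:
        raise ValueError("need a non-empty extent and n_blocks >= 1")
    n_blocks = min(n_blocks, n)       # never create empty blocks
    base, rem = divmod(n, n_blocks)
    blocks = []
    start = lo
    for i in range(n_blocks):
        size = base + (1 if i < rem else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


if __name__ == "__main__":
    # 10 time steps requested 4 at a time
    print(batch_steps(10, 4))
    # 10 grid points along one axis shared by 3 ranks
    print(partition_extent((0, 9), 3))
```

In this sketch each rank would be handed one spatial block and one range of steps, so that no single rank needs to hold the full space-time extent.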

Execution Model Improvements

e134264 add spatial executive
c6f9bc6 cf_writer add partitioning constraints
4d53de6 add space_time_executive
97efc35 add cf_space_time_time_step_mapper
98bbb97 adds cf_spatial_time_step_mapper
3d915ee cf_space_time_time_step_mapper add partitioning constraints
a8fa4d8 cf_spatial_time_step_mapper add partitioning constraints
19f5e22 coordinate_util partition add constraints
4d2a8f1 index_reduce execution controls
c765bd5 cf_writer command line parsing of spatial parallel properties
7c0c8a3 spatial_executive constrain partitioning
792b7f9 space_time_executive constrain partitioning
f572c81 metadata_probe report number of intervals
b295896 mesh wrap temporal bounds and extent
daa684d index_request_key update
25bd3b4 index_executive clean up verbose report
a61ec63 test cf_reader temporal extent handling
6e3323d dataset_diff handle temporal extents
d5dad5e test temporal reduction spatial parallelism
019dc83 cf_writer spatial parallelism
f3c14a0 cf_layout_manager spatial parallelism
ba50dd8 cf_time_step_mapper layout manager API
9aa17f1 interval time step mapper refactor
e3a25a8 block time step mapper refactor
1cfcc08 coordinate util spatial partitioning
e341f63 cf_reader reads temporal extents
423a8da data model updates for multiple time steps per mesh

Data Model Improvements

03939e1 add and apply simplified dispatch macros
69e88df hamr update to latest
422f383 hamr fully asynchronous by default
2cd9c8e hamr enforce const for read only data access
95de593 hamr update to latest master
2927a95 HAMR update to latest master
adf5603 variant_array_util add host synchronization helper
6976084 variant_array add synchronization method
c7b1b2d add teca_variant_array_util
29897a4 variant_array better dispatch
1ce73a7 variant_array better dispatch
a70cdfe variant_array make test for accessibility virtual
1422ea3 variant_array provide direct access to internal memory
5911134 variant_array python construct from numpy scalar
c3562a7 cartesian_mesh fall back to mesh extents
03143cb cartesian_mesh_source spatial parallelism
b8615ed cartesian_mesh_regrid per array dimensions
ca4fcbb cartesian_mesh per array extent and shape const
42446f2 cartesian_mesh_source generate data on the assigned GPU
d3082de cartesian_mesh_source include bounds metadata in output mesh
acf3fe2 cartesian_mesh overload array shape to return a tuple
6a9f3ac cartesian_mesh_regrid pass array attributes from the source
e5e8e4a cartesian_mesh array extent time dim and add shape
73b58eb cartesian_mesh fix Python bindings for array shape/extent
86ef561 cartesian_mesh_source fix calendaring metadata in output

New Algorithms

f730aa8 add teca_surface_integral alg
f79c2c8 add teca_regional_moisture_flux
dc66e32 add teca_table_join
f2af4c4 add spectral filter
e439275 add teca_vtk_util::partition_writer to help debug space-time partitioning
0fe459e add temporal_percentile temporal reduction
140008c wrote temporal_index_select and tests

New Applications

acfcaff add regional_moisture_flux app
cfd6ce8 Add the spectral filter app

GPUization

a64839b bayesian_ar_detect add CUDA implementation
cf74102 2d_component_area thrust use stream per thread stream
42d16f7 2d_component_area set cuda device before doing any work
e54e33b component_area_filter set cuda device before doing any work
c3efa90 connected_components set cuda device before doing any work
45a87f1 bayesian_ar_detect set cuda device before doing any work
3791b67 latitude_damper set cuda device before doing any work
8993ed6 unpack_data set cuda device before doing any work
640ee57 index_executive explicitly assign device ids
79445b3 binary_segmentation use streams for sorting and data movement
2334735 cuda_util add a 1d domain decomposition
9644b34 latitude_damper add CUDA implementation
a243206 component_area_filter add CUDA implementation
5a2f660 2d_component_area use restrict on kernels
ad65931 2d_component_area GPU-ize the area calculation
96c5966 cf_reader don't use page locked memory for cuda
7549e88 cuda_util simplify device assignment
1b14777 connected_components use 8 connectivity
52be362 ha4 test code use 8 connectivity
2f4047f index_executive environment variable override CUDA device assignment
0919c78 connected_components integrate CUDA ha4 implementation
7788426 shape_file_mask add CUDA implementation
c44aded cuda_util implement a container for cuda streams
edf6c58 geometry_util GPUize point in poly
693a7b2 thread_util threads per device behavior
ac2f59f cuda warning cleanup
3f2ba7f spatial_executive load balance across GPUs
5c08259 space_time_executive load balance across GPUs

Threading Improvements

6241065 bayesian_ar_detect fix thread safety issues
fa1c209 thread_util warn about too few threads w/o MPI
1d5f415 thread_util clamp the number of threads
c970444 thread_util report num threads when not binding
af1592a threaded_algorithm propagate_device_assignment
81d4e2d threaded_algorithm expose ranks_per_device in API

Optimizations

60c9e71 cf_restripe app add collective buffer mode
3dbc0e2 Added C++ version of the temporal reduction algorithm and application
9735209 cf_reader open file in collective mode
5558ff6 spectral_filter app command line options for collective buffering
c0efea8 cf/multi_cf_reader option to use collective buffering
f304f27 cf_writer use collective buffering

Documentation

d5eb0fc cf_reader fix copy paste error in documentation
e5306fa component_area_filter fix indent add comments
30adda5 algorithm fix a documentation typo
bb73083 shape_file_mask improve documentation
d8fcade table_reduce improve documentation
b166667 integrated_water_vapor improve documentation
ef2cd48 integrated_vapor_transport improve documentation
f362380 threaded_algorithm improve documentation
e5a26ff doc doxygen style comments for programmable_algorithm
dc36772 doc doxygen style comments for teca_table
de5e8d6 doc data access developer tutorial
1d25525 interval_iterator subclasses fix units doxygen doc strings
dd5f1fe doc update temporal_reduction user guide
c71e905 cf_writer fix typo in docs
53effc0 doc update m1517 install locations for perlmutter
1b71d8e coordinate_util improve documentation
ff383a0 rtd add section explaining execution model
ae237bd rtd docs fix doxygen install location
c51132b rtd pin sphinx version as latest is incompatible with rtddocs
5ea6e10 rtd doc array access tutorial spell check
af9d2e6 doc rtd improve array access tutorial
b528ec9 rtd fix a rst warning
9a6e888 rtd updates to the install for mac os
1a7dc38 doc rtd exclude variant_array_operator from doxygen

Testing

bf97e95 test disable periodic bc in bard app test
238db9f test bayesian ar detect sort by label area
49e83a9 deeplab_ar_detect remove tests
b7d14f1 testing update linux distributions
c38337f testing cleanup use of %e% in tests
d40d800 temporal_reduction: added tests
80a0159 test add test for cpp_temporal_reduction w. io
3b277b3 test temporal reduction steps_per_request command line argument
9e614ea add test_temporal_reduction
3b338bf ha4 test code update ctests command
5dd84cb connected_components test ignore component labels
6569a79 ha4 test code improvements
a1012ed ha4 test code handles periodic BC in x-direction
a380f62 ha4 test code works on images not divisible by 32
e6216c3 add ha4 connected component label test code
a769ff7 test_streaming_reduce_threads: specifying netcdf file name to avoid conflict with temporal reduction all iterator test
6e02fa6 test temporal_reduction app python and C++
d79206a testing temporal_reduction tests specify number of threads
709f685 temporal_reduction C++ impl improvements and regression test
5120006 update the DOI badge to point to the latest release
18533f8 Changed teca data revision from 149 to 151

General Improvements

2414209 bayesian_ar_detect_parameters add properties to select specific start row
be087dc bayesian_ar_detect instrument the BARD app
37f4237 bayesian_ar_detect app control writer thread pool size
176c1f6 connected_components cleanup a warning
10eaf19 connected_components minor improvements
ee8cbf2 temporal_reduction: set steps_per_request in python app; included definition in cpp app
27f3ef3 temporal_reduction: standardized n_threads command line
b371bea temporal_reduction construct output at end and others
494a3b4 temporal_reduction: caching the intermediate result
07a119a temporal_reduction: any number of time steps per request is allowed
bd32184 descriptive_statistics remove debugging code
18768fd index_executive fix a compiler warning
ff551dc cpp_temporal_reduction algorithm errors are fatal
95bd6a8 temporal_reduction: set_thread_pool_size [cf_writer] changed from -1 to 1 to fix intermittent bugs
7953cbb temporal_reduction: change the 1 time step per request to a run time specified number of steps
1bab425 dataset_diff ignore specified arrays
03fc0bc table_sort sort either ascending or descending
b29c4fd coordinate_util wrap bounds to extent overload
d0ac7a9 integrated_vapor_transport handle ascending coordinates in the first order method
b593e57 integrated_vapor_transport app enable automatic z-axis unit conversion
78675ec integrated_vapor_transport warn if vertical axis units are incorrect
63087d2 normalize_coordinates check z-axis units
df9378e integrated_vapor_transport layer thickness
eb4853a evaluate_expression netcdf attributes for the cf_writer
4cccc26 table include dataset property for array attributes
98e0a89 table_join pass array attributes for NetCDF I/O
3b81582 integrated_water_vapor reformat units string
f6eabe0 algorithm add a single value setter for vector properties
657ba21 index_reduce use std::vector instead of std::array
abac3f2 indexed_dataset_cache override request index
eb86345 integrated_vapor_transport change format of units
b0a4390 dataset_source report variables from tables and meshes
31b748a dataset_source move to alg to access typed datasets
290db6b coordinate_util improvements
20c5ad6 table_reduce report and request use default implementations
b878f89 program_options support std::array in algorithm properties
06b52b2 shape_file_mask improvements
9a506d7 dataset typed accessors
40ea89b derived quantity improvements
ebb1286 array_attributes include mesh_dim_active
254f9e7 temporal_reduction app/alg cpp/python catch user errors
a1eb0f0 cf_writer improve error message
589f70c cf_writer improve collective buffering error message
d01da07 cf_restripe app runs in CPU only mode by default
19334ed cf_reader improve collective buffering error
c4de242 spectral_filter per-rank timing output in verbose mode
51a2670 spectral_filter add ideal butterworth frequency response
dadb491 spectral_filter fixes issue found when processing real data
2a8816f spectral_filter refactor regression tests
976482d spectral filter fix high pass kernel generation
57e3f31 teca_temporal_reduction: added all iterator average test
9924d67 teca_temporal_reduction: added all iterator
fbb866e teca_calendar_util: added new class all_iterator
c6704cd temporal_reduction: added flag to spatial parallelism
4b0d251 teca_calendar_util: added the new class n_steps_iterator
ec98d67 added index selection to the temporal reduction
6f1ae9d metadata add support for std::array
6e99362 vorticity better identifiers in dispatch macro
4407b53 cuda_util remove redundant error check
d11f480 valid_value_mask export mask type
296ec4a temporal_reduction app command line option controlling threadpool size
48e3213 temporal_reduction: rename the C++ implementation
bd6718a temporal_reduction: handle the case where the number of inputs < 2.
19ead29 temporal_reduction: renamed the original python implementation
fbd2235 temporal_reduction: resolving a warning
9ba6d35 temporal_reduction clean up warnings with nvcc
3bcfcf4 temporal_reduction app integrate multi threading
ec71e98 Renamed python version of temporal reduction; python bindings
cd28e3e teca_threaded_programmable_algorithm: increased the size of the class_name variable from 64 to 96
ccfdba3 potential_intensity user provided masking threshold
e7c53c0 potential_intensity units checks and conversions
c674b79 potential_intensity app reduce verbosity
8f6a1ed teca_potential_intensity clean up runtime warnings
df50b49 python functions returning typed scalars
1181451 potential_intensity app use spatial partitioning
e18f4d4 potential_intensity app land mask from mcf file
839ef1c app_util error out with positional options

Bug Fixes

6cb6ccc component_area_filter fix indentation
ee9a4d8 connected_components fix 8-way connectivity across periodic boundary
e85aa72 system_interface fix double free in stack trace generation
d19e727 testing fix the component area filter test
10e9459 temporal_reduction: fix data access
d6b22e6 teca_profiler: fixed conversion of hexstring to int
220587a cpu_thread_pool fix bind argument position
bf99eff cpu/cuda_thread_pool fix streaming bug
3d5d4db cf_writer fix let threaded_algorithm process command line
80820ef threaded_algorithm fix set algorithm props from command line
59c42b5 threaded_algorithm fix threads_per_device parameter name
bada5a6 cpp_temporal_reduction fix thread safety issues
7de9224 cpp_temporal_reduction fix a typo in documentation
59eb4c7 ha4 test code fix race condition
2eaa71b connected_components fix race condition
ddaf758 connected_components fix compile w/o cuda
4c8032c connected_components 8-way connectivity bug fixes
55b0908 ha4 test code 8-way connectivity bug fixes
18e0c92 rename_variables fix set variables in the output attributes
e739682 fixes for cuda 12 and warning cleanup for gcc 12
9d13cd4 temporal_reduction fix missing virtual destructor in base class
76ab59d array_collection fix double move
f462ac8 normalize_coordinates fix a bug in the output extents
e8dcfca tests fix regex that picks up new file
e3dc08f cpp_temporal_reduction cleanup, fixes, and improvements
00ba242 temporal_reduction: included flag to choose python or c++ implementation; fixing the n_steps interval
bc43a36 temporal_reduction: rename the python implementation; fixing name of two python tests
244f58e temporal_reduction: fixing the parameter order in a test
79b3673 temporal_reduction: added a new finalize function to fix a bug
942aa11 temporal_reduction fix a warning and set stream size
7120ecb cpu/cuda_thread_pool fix thread safety issues
d251940 threaded_algorithm fix indentation
26fba6d potential_intensity units checks and conversion fix
45dbd2b Fixed n_steps_iterator class of python version of temporal_reduction
86e6ea7 calendaring fix buffer overflow warnings
5ffc2d4 Fixing issue
98f04ee temporal_percentile fixes

Python

42ca1d8 python support wrapping API with fixed length C-arrays
61e9f34 remove numpy deprecated types

Build System

a76c7cf build cleanup cmake code
8f96503 added CMAKE_INSTALL_RPATH to CMakeLists.txt
3e43838 build define NDEBUG in CUDA release build
08b95f0 build always update the version descriptor
944a3f2 build system don't relink unless necessary