BUG: h5amep format improvements - no zero-padding in h5amep file #41

Open
8 tasks
kay-ro opened this issue Jul 3, 2024 · 0 comments
Labels
bug Something isn't working module: base module: load module: reader release: major Issues that need or may be better addressed in a major release status: to do Issues that someone needs to work on


kay-ro commented Jul 3, 2024

Description:

At the moment, the LAMMPS reader reads dump files and stores the data in the h5amep file. Vector data such as coordinates or velocities are stored as 3d data, and missing components (such as z in 2d simulations) are replaced by zeros. This is not always correct: if a component is missing from the simulation data by accident, it is silently replaced by zeros as well.

This also applies to the current version of the AMEP HDF5 data format. It combines all vector quantities into a 3d dataset in the h5amep file, e.g., coordinates are stored as a (N,3) dataset named 'coords'. If, for example, only x and y are given for a 2d system, the z component is automatically set to zero. Additionally, the h5amep file contains datasets for all standard vector quantities, initialized with zeros by default. Thus, even if, for example, forces are not given in the dump files, a dataset called "forces" exists that contains only zeros. It would be better if this dataset did not exist. If the user then tries to access this data, an error should be raised saying that the requested data is not available.
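The access behaviour requested above could look roughly like the following. This is a minimal sketch, not AMEP's actual implementation: `get_dataset` and `MissingDataError` are hypothetical names, and a plain dict stands in for the HDF5 group of a single frame.

```python
class MissingDataError(KeyError):
    """Raised when a requested quantity was never stored (hypothetical name)."""


def get_dataset(frame_group, name):
    """Return the dataset called `name`, or raise if it was never written.

    `frame_group` is any mapping of dataset names to data; a plain dict
    stands in for the HDF5 group of a single frame here.
    """
    if name not in frame_group:
        raise MissingDataError(
            f"The requested data '{name}' is not available in this frame."
        )
    return frame_group[name]


# A frame whose dump file contained no forces: no 'forces' entry exists at all.
frame = {"coords": [[0.0, 1.0, 0.0], [2.0, 3.0, 0.0]]}
coords = get_dataset(frame, "coords")
```

With this behaviour, asking for `"forces"` on the frame above fails loudly instead of returning an array of zeros that looks like real data.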

In conclusion, we should not initialize the HDF5 file with arrays of zeros. Instead, each quantity (i.e., each column of a dump file) should be stored in a separate dataset, as is already done for the scalar quantities and any user-defined quantities. Thus, instead of 'coords', the HDF5 file will have 3 datasets called 'x', 'y', and 'z' (if all of them are given in the dump file). If, for example, z is not given, there will be no dataset called 'z'.
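A rough sketch of the per-column idea, assuming the column names are taken from the `ITEM: ATOMS ...` header line of a LAMMPS dump file. The function name `split_dump_columns` is hypothetical and the result is a plain dict of NumPy arrays; in a reader, each entry could then be written as its own dataset (for instance with h5py's `Group.create_dataset(name, data=array)`, if h5py is the backend in use).

```python
import numpy as np


def split_dump_columns(header_line, rows):
    """Map each column named in an 'ITEM: ATOMS ...' header to a 1D array.

    Only columns that actually appear in the header are returned, so a 2d
    dump without 'z' yields no 'z' entry and nothing is zero-padded.
    """
    names = header_line.split()[2:]        # drop the 'ITEM:' and 'ATOMS' tokens
    data = np.asarray(rows, dtype=float)   # expected shape: (N, number of columns)
    if data.ndim != 2 or data.shape[1] != len(names):
        raise ValueError("number of columns does not match the header")
    return {name: data[:, i] for i, name in enumerate(names)}


# Hypothetical 2d dump with two particles and no z column.
header = "ITEM: ATOMS id type x y"
rows = [[1, 1, 0.5, 1.5], [2, 1, 2.5, 3.5]]
datasets = split_dump_columns(header, rows)
print(sorted(datasets))
```

Because the set of keys mirrors the dump header, the absence of `'z'` in the file records that z was never provided.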

If the user wants to access the data, e.g., to get the coordinate array in the shape (N,3), and z does not exist, AMEP should fill the last column of the array with zeros and print a warning (the same applies to other vector quantities such as velocities and forces).
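The zero-filling on access could be sketched as follows. The function name `assemble_vector` is hypothetical, a dict again stands in for the HDF5 group of one frame, and raising a `KeyError` when no component exists at all is an assumption based on the error behaviour requested earlier in this issue.

```python
import warnings

import numpy as np


def assemble_vector(frame_group, components=("x", "y", "z")):
    """Build an (N, len(components)) array from per-component datasets.

    Components that are not stored are filled with zeros and a warning is
    issued. If no component is stored at all, a KeyError is raised.
    """
    present = [c for c in components if c in frame_group]
    if not present:
        raise KeyError(f"none of the components {components} are available")
    n = len(frame_group[present[0]])
    out = np.zeros((n, len(components)))
    for i, c in enumerate(components):
        if c in frame_group:
            out[:, i] = np.asarray(frame_group[c])
    missing = [c for c in components if c not in frame_group]
    if missing:
        warnings.warn(
            f"components {missing} are not stored and were filled with zeros",
            UserWarning,
            stacklevel=2,
        )
    return out


# 2d frame: only x and y are stored, so z is filled with zeros and a warning is issued.
frame_2d = {"x": np.array([0.5, 2.5]), "y": np.array([1.5, 3.5])}
coords = assemble_vector(frame_2d)
```

The same helper would cover velocities or forces by passing other component names, e.g. `("vx", "vy", "vz")`; those dataset names are assumptions and would depend on the column names in the dump file.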

Backwards compatibility can be ensured by modifying the __read_data method of the BaseFrame class: an additional if condition is needed so that there are two branches, one for the current format and one for the new format.
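One way to tell the two formats apart is to check for the combined 'coords' dataset. The following is only a sketch of that branching, not the actual `BaseFrame.__read_data` code: `detect_layout` and `stored_coordinate_components` are hypothetical helpers, and dicts stand in for the HDF5 groups of single frames.

```python
LEGACY_COORDS_KEY = "coords"   # combined (N,3) dataset of the current format
COLUMN_KEYS = ("x", "y", "z")  # per-column datasets of the proposed format


def detect_layout(frame_group):
    """Return 'legacy' if the combined 'coords' dataset exists, else 'per-column'."""
    return "legacy" if LEGACY_COORDS_KEY in frame_group else "per-column"


def stored_coordinate_components(frame_group):
    """List the coordinate components a frame provides under either layout."""
    if detect_layout(frame_group) == "legacy":
        # The legacy layout always holds three columns; whether z is real data
        # or zero-padding cannot be told from the file alone.
        return list(COLUMN_KEYS)
    return [key for key in COLUMN_KEYS if key in frame_group]


old_frame = {"coords": [[0.0, 1.0, 0.0]]}
new_frame = {"x": [0.0], "y": [1.0]}
print(detect_layout(old_frame), stored_coordinate_components(old_frame))
print(detect_layout(new_frame), stored_coordinate_components(new_frame))
```

Checking for a dataset name keeps old files readable without a format version flag, at the cost that zero-padded data in old files stays indistinguishable from real zeros.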

Code for reproduction:

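import amep

# Load a 2d LAMMPS simulation whose dump files contain no z column.
traj = amep.load.traj("2d_data", mode="lammps")
coords = traj[-1].coords()
print(coords[:,2])  # z column of the last frame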

Error message:

Output should not be 0s. At least a warning is expected.

Python and AMEP versions:

Any Python version, AMEP 1.0.1

Additional information:

ToDo:

  • adapt all particle-based reader classes to generate HDF5 files with the new format
    • check which data is stored incorrectly (coordinates, velocities, ...)
    • store only imported data
    • implement warning/... if missing data is replaced with 0s on import
  • adapt BaseFrame.__read_data and BaseFrame.data to handle both formats for backwards compatibility
  • test the new format (i.e., generate a new h5amep file and check with an HDF5 viewer if the format is correct)
  • test all AMEP methods that allow to access the data (i.e., all methods of BaseFrame)
  • test implementation for backwards-compatibility

How did you install AMEP?

None

@kay-ro kay-ro added bug Something isn't working release: major Issues that need or may be better addressed in a major release status: to do Issues that someone needs to work on module: base module: load module: reader labels Jul 3, 2024
@kay-ro kay-ro changed the title BUG: loading data, zero-padding missing data BUG: h5amep format improvements - no zero-padding in h5amep file Jul 4, 2024