Output to a standardized data format #83

Open
talmo opened this issue Nov 25, 2024 · 0 comments

Currently, stac-mjx writes its results to pickle files by default:

stac-mjx/stac_mjx/io.py

Lines 192 to 211 in f3980e4

def save(fit_data, save_path: Text):
    """Save data.
    Save data as .p or .h5 file.
    Args:
        fit_data (numpy array): Data to write out.
        save_path (Text): Path to save data. Defaults to None.
    """
    if os.path.dirname(save_path) != "":
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
    _, file_extension = os.path.splitext(save_path)
    if file_extension == ".p":
        with open(save_path, "wb") as output_file:
            pickle.dump(fit_data, output_file, protocol=2)
    elif file_extension == ".h5":
        ioh5.save(save_path, fit_data)
    else:
        with open(save_path + ".p", "wb") as output_file:
            pickle.dump(fit_data, output_file, protocol=2)

Saving to HDF5 is also supported, but through a very general-purpose recursive method that trades a documented file format for ease of implementation.

What we would like is to explicitly list the main fields and associated metadata that we need to serialize. This would also document the specifics of the file format (shapes, dtypes, names), which makes it more straightforward to establish a contract with downstream applications that consume the data this tool produces.

For example, an organization of the HDF5 file could look like:

/config: str [vlen]
/mjcf_xml: str [vlen]
/qpos: float32 [n_frames, ?]
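
A minimal sketch of writing this layout with h5py, assuming the config and the MJCF model are already available as strings and qpos as an array. The helper name `write_flat`, the placeholder strings, and the in-memory file in the usage lines are illustrative only and not part of stac-mjx:

```python
import h5py
import numpy as np


def write_flat(f, config_text, mjcf_xml, qpos):
    """Write the three top-level datasets into an open h5py file or group.

    `write_flat` is a hypothetical helper, not an existing stac-mjx function.
    """
    str_dtype = h5py.string_dtype(encoding="utf-8")  # variable-length UTF-8
    f.create_dataset("config", data=config_text, dtype=str_dtype)
    f.create_dataset("mjcf_xml", data=mjcf_xml, dtype=str_dtype)
    # The second dimension is whatever width the model's qpos has.
    f.create_dataset("qpos", data=np.asarray(qpos, dtype=np.float32))


# Usage with an in-memory file (nothing is written to disk).
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    write_flat(f, "key: value", "<mujoco/>", np.zeros((10, 9)))
    qpos_shape = f["qpos"].shape
    qpos_dtype = f["qpos"].dtype
    config_back = f["config"].asstr()[()]  # read the scalar string back as str
```

Naming each dataset and its dtype explicitly, instead of recursing over an arbitrary dict, is what makes the shapes and names something a downstream reader can rely on.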

It would probably be more portable and self-describing to break up qpos into its constituent elements, e.g.:

/root_xyz: float32 [n_frames, 3]
/root_quaternion: float32 [n_frames, 4]
/joint_angles: group
/joint_angles/spine1: float32 [n_frames, 3]  # 3 DOF joint
/joint_angles/elbowL: float32 [n_frames, 1]   # 1 DOF joint
...
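
One way the split could be done is sketched below. It assumes a free-joint root whose first seven qpos columns are position (3) followed by quaternion (4), and that the remaining columns follow the order of a hypothetical `joint_dofs` mapping from joint name to DOF count. Neither `split_qpos` nor `joint_dofs` exists in stac-mjx:

```python
import numpy as np


def split_qpos(qpos, joint_dofs):
    """Split flat qpos [n_frames, 7 + sum(dofs)] into named arrays.

    Assumes a free-joint root: columns 0:3 are position, 3:7 the quaternion.
    `joint_dofs` is a hypothetical ordered mapping of joint name -> DOF count.
    """
    qpos = np.asarray(qpos, dtype=np.float32)
    expected = 7 + sum(joint_dofs.values())
    if qpos.ndim != 2 or qpos.shape[1] != expected:
        raise ValueError(f"expected shape [n_frames, {expected}], got {qpos.shape}")
    out = {"root_xyz": qpos[:, 0:3], "root_quaternion": qpos[:, 3:7]}
    start = 7
    for name, dof in joint_dofs.items():
        # Keys are HDF5-style paths, so they can double as dataset names.
        out[f"joint_angles/{name}"] = qpos[:, start:start + dof]
        start += dof
    return out


# Usage: 7 root columns + 3 (spine1) + 1 (elbowL) = 11 columns.
parts = split_qpos(np.zeros((5, 11)), {"spine1": 3, "elbowL": 1})
```

Because the mapping is driven by the joint list, the flat array can be rebuilt by concatenating the parts in the same order, so both representations could be offered side by side.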

However, this layout gives up generality in exchange for being more self-descriptive.

Whether we keep qpos in its flattened representation (useful for pipelining) or break it up into better-described sub-elements (useful for portability and use outside of our pipelines) is a key decision point, though the two options are not mutually exclusive.

No matter what, we should also have a version key in the HDF5 file that can be used to route logic if this format evolves.


As a separate concern, we should also consider embedding the more useful derived values that we currently compute on the fly via forward kinematics after loading downstream. These are captured in this data structure (see this module):

@struct.dataclass
class ReferenceClip:
    """This dataclass is used to store the trajectory in the env."""

    # qpos
    position: jp.ndarray = None
    quaternion: jp.ndarray = None
    joints: jp.ndarray = None

    # xpos
    body_positions: jp.ndarray = None

    # velocity (inferred)
    velocity: jp.ndarray = None
    joints_velocity: jp.ndarray = None
    angular_velocity: jp.ndarray = None

    # xquat
    body_quaternions: jp.ndarray = None
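
If these values were embedded, one possible approach is to map each populated field to its own dataset. The sketch below uses a plain-NumPy stand-in for `ReferenceClip` with only a few of its fields, and a hypothetical `derived` group name; it does not reflect a decided layout:

```python
import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass
class Clip:
    """Stand-in for ReferenceClip using plain NumPy arrays (subset of fields)."""

    position: Optional[np.ndarray] = None
    quaternion: Optional[np.ndarray] = None
    body_positions: Optional[np.ndarray] = None


def clip_to_datasets(clip, group="derived"):
    """Map each populated field to an HDF5-style path and a float32 array.

    The `derived` group name is a hypothetical choice, not a decided layout.
    """
    out = {}
    for field in dataclasses.fields(clip):
        value = getattr(clip, field.name)
        if value is not None:  # unset fields are simply not written
            out[f"{group}/{field.name}"] = np.asarray(value, dtype=np.float32)
    return out


# Usage: only `position` is set, so only one dataset path is produced.
datasets = clip_to_datasets(Clip(position=np.zeros((5, 3))))
```

Each resulting path and array pair could then be passed to a dataset-creation call, keeping the derived values clearly separated from the primary fit outputs.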