
Custom TypeHandler and "No per-process OCDBT checkpoint subdirs" warning #1326

Open
PhilipVinc opened this issue Nov 13, 2024 · 2 comments


@PhilipVinc

Hello,

I've recently created a custom TypeHandler.
When I use it on a single process, I see the following warning:

WARNING:absl:[process=0][thread=async_save_18] Skipping merge of OCDBT checkpoints: No per-process OCDBT checkpoint subdirs found in /tmp/ckp3/115.orbax-checkpoint-tmp-136/callbacks.orbax-checkpoint-tmp-139, 

The custom type handler I wrote serialises a custom type containing some numpy arrays. If I were to run this across multiple processes, I'd like only the master process to serialise the data (which is essentially replicated).

How can I silence this warning? Did I forget to define something?

@cpgaffney1
Collaborator

The merge exists so that ArrayHandler can write data to per-process subdirectories, which are then merged to form a "global view" that is used for restoration. In your custom handler the master process is responsible for serializing everything, so you already have a global view.

You could silence the warning by using your own PyTreeCheckpointHandler that just skips the finalize implementation.

Alternatively, your custom TypeHandler could write its data to ocdbt.process_X on the master process. The merge would then be performed on that single subdirectory, which makes it essentially a no-op since there is only one process subdirectory to merge.
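To make the two suggestions concrete, here is a toy, stdlib-only model of the merge step. It is not Orbax code: `finalize_merge`, `write_custom_leaf`, the file name `custom_leaf.bin` and the warning text are hypothetical, and only the `ocdbt.process_X` directory naming comes from the comment above. The model looks for per-process subdirectories and warns when it finds none; writing the replicated data from process 0 into `ocdbt.process_0` gives it exactly one directory to merge, while `skip_merge=True` stands in for a handler whose finalize step does nothing.

```python
import logging
import pathlib
import tempfile

logger = logging.getLogger("toy_merge")

# Hypothetical prefix mirroring the ocdbt.process_X naming from the thread.
PROCESS_PREFIX = "ocdbt.process_"


def finalize_merge(directory: pathlib.Path, skip_merge: bool = False) -> list[str]:
    """Toy stand-in for the finalize/merge step; returns the subdirs it would merge."""
    if skip_merge:
        # Option 1: a handler whose finalize does nothing never looks for subdirs.
        return []
    subdirs = sorted(
        p.name
        for p in directory.iterdir()
        if p.is_dir() and p.name.startswith(PROCESS_PREFIX)
    )
    if not subdirs:
        # Paraphrase of the warning in this issue, not Orbax's exact message.
        logger.warning("Skipping merge: no per-process subdirs found in %s", directory)
        return []
    # A real merge would combine the per-process data into a global view here.
    return subdirs


def write_custom_leaf(
    directory: pathlib.Path, process_index: int, payload: bytes, use_process_subdir: bool
) -> None:
    """Hypothetical custom writer: only process 0 writes the replicated data."""
    if process_index != 0:
        return  # Other processes skip writing; the data is replicated anyway.
    # Option 2: put the data under ocdbt.process_0 so the merge finds one subdir.
    target = directory / f"{PROCESS_PREFIX}0" if use_process_subdir else directory
    target.mkdir(parents=True, exist_ok=True)
    (target / "custom_leaf.bin").write_bytes(payload)


if __name__ == "__main__":
    logging.basicConfig()
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        write_custom_leaf(root, 0, b"replicated", use_process_subdir=False)
        print(finalize_merge(root))  # logs the warning, then prints []
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        write_custom_leaf(root, 0, b"replicated", use_process_subdir=True)
        print(finalize_merge(root))  # prints ['ocdbt.process_0']
```

In this model the second option keeps the merge step intact and simply gives it one subdirectory to find, whereas the first option removes the step altogether; which is preferable in the real library depends on whether anything else in the tree relies on the merge.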

@PhilipVinc
Author

Thank you, @cpgaffney1.

I think I figured out what the problem was.

If I use a PyTreeSave containing only types that are handled by a custom TypeHandler (one that does not create an ocdbt.process_X folder), this warning is emitted.

This is because PyTreeSave assumes that at least one type handler that creates an OCDBT directory is used to serialise the tree, but this is not guaranteed.
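The diagnosis above can be restated as a small model: each leaf of the saved PyTree is dispatched to a handler, and only some handlers create per-process OCDBT subdirectories. Everything below (`Handler`, `ARRAY_HANDLER`, `CUSTOM_HANDLER`, `MyCustomType`, `pick_handler`, `merge_will_warn`) is hypothetical and not the Orbax API; the sketch only illustrates that the merge step can find a subdirectory if at least one leaf went to a handler that makes one.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Handler:
    name: str
    creates_process_subdir: bool  # True for an ArrayHandler-like writer


# Hypothetical handlers: an array writer that uses per-process OCDBT subdirs,
# and a custom writer (like the one in this issue) that does not.
ARRAY_HANDLER = Handler("array", creates_process_subdir=True)
CUSTOM_HANDLER = Handler("custom", creates_process_subdir=False)


class MyCustomType:
    """Stand-in for the user's custom type holding numpy arrays."""

    def __init__(self, payload: Any):
        self.payload = payload


def pick_handler(leaf: Any) -> Handler:
    # Hypothetical registry lookup: custom type -> custom handler, else array.
    return CUSTOM_HANDLER if isinstance(leaf, MyCustomType) else ARRAY_HANDLER


def merge_will_warn(leaves: list[Any]) -> bool:
    """True when no leaf produces a per-process subdir, so the merge finds none."""
    return not any(pick_handler(leaf).creates_process_subdir for leaf in leaves)
```

In this model, adding a single array-like leaf flips the result, which is consistent with the observation that the warning appears when every leaf goes through the custom handler.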
