
fix(save_file): add force_contiguous(default=True) to save_file #530

Closed
wants to merge 1 commit

Conversation

dbyoung18 commented Sep 25, 2024

What does this PR do?

Following the interface of save_model, add force_contiguous (default=True) to force the state_dict to be saved as contiguous tensors, and then serialized into raw bytes in safetensors format.

The fix covers the scenario where a workload runs with channels_last format in PyTorch but ultimately needs to be saved to .safetensors via transformers' safe_save_file. It also aligns the semantics of save_model and save_file.

A related issue hit this scenario: #308
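
A minimal sketch of the proposed behavior (hypothetical, since this PR was never merged; the force_contiguous parameter mirrors the one save_model already exposes):

```python
import torch
from safetensors.torch import save_file

# A channels_last tensor is not contiguous in the default row-major sense.
w = torch.randn(8, 3, 3, 3).to(memory_format=torch.channels_last)

# Proposed in this PR (hypothetical, never merged):
#   save_file({"w": w}, "model.safetensors", force_contiguous=True)
# which would be equivalent to doing the cast explicitly today:
save_file({"w": w.contiguous()}, "model.safetensors")
```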

… dictionary of tensors into raw bytes in safetensors format.

Signed-off-by: dbyoung18 <[email protected]>

dbyoung18 commented Sep 25, 2024

@Narsil @ArthurZucker @osanseviero Could you help review the PR?

@yao-matrix

@ArthurZucker, hi, could you help take a look at this issue? Thanks very much.
The thing is, channels_last is a supported memory format in PyTorch and is adopted by Intel XPU for better performance. So when we run training/fine-tuning and save the checkpoint, transformers calls safe_save_file, which ends up in safetensors' save_file, and that does not support force_contiguous yet (save_model already does). This does not change the prior behavior; it just adds handling for the non-contiguous format.

Thanks.
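
A minimal repro of the constraint described above (a sketch using safetensors' PyTorch API; the exact error text may vary by version):

```python
import torch
from safetensors.torch import save_file

x = torch.randn(1, 3, 4, 4).to(memory_format=torch.channels_last)
print(x.is_contiguous())  # False: channels_last strides are not row-major

# save_file rejects non-contiguous tensors, so this raises an error:
try:
    save_file({"x": x}, "x.safetensors")
except ValueError as e:
    print(e)
```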


Narsil commented Nov 6, 2024

No, this option will not be added.
safetensors is very explicit in not allowing this. Users should do this on their own as making tensors contiguous should be an express intent by the user. save_file is extremely strict on purpose.

The reason is that there are valid use cases where the tensors must not be contiguous in the model, and therefore they must not be cast to contiguous automatically during save (even with a choice on the user's behalf). Doing so would break those models during reload (or their performance, or something else).
The typical use case is a channels-last memory layout. safetensors doesn't hold semantic layout information beyond shapes, so the original layout is impossible to recover after being saved.
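
To illustrate the irrecoverability (a sketch; the file name is arbitrary): a channels_last tensor forced to contiguous round-trips as a plain row-major tensor, with no trace of the original layout.

```python
import torch
from safetensors.torch import save_file, load_file

x = torch.randn(2, 3, 4, 4).to(memory_format=torch.channels_last)
save_file({"x": x.contiguous()}, "x.safetensors")  # casts away channels_last

y = load_file("x.safetensors")["x"]
# Only shape, dtype, and raw bytes were stored; the layout intent is gone.
print(y.is_contiguous())                                   # True (row-major)
print(y.is_contiguous(memory_format=torch.channels_last))  # False
```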

save_model/load_model follow different semantics because the layout and shape semantics are contained in the model, so tensors can be uncast/recast through those interfaces. Both functions exist purely because not everyone uses transformers.

If users want to force contiguous tensors, they must explicitly do so after getting their model's state.
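
Concretely, that explicit step looks like this (a sketch assuming a PyTorch model; the dict comprehension mirrors what save_model's force_contiguous option does):

```python
import torch
from safetensors.torch import save_file

model = torch.nn.Conv2d(3, 8, 3).to(memory_format=torch.channels_last)

# Make the intent explicit at the call site instead of inside save_file:
state = {k: v.contiguous() for k, v in model.state_dict().items()}
save_file(state, "model.safetensors")
```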

It's up to transformers to handle this in save_pretrained/from_pretrained, following patterns similar to save_model/load_model if need be, but not this function.

@dbyoung18

> No, this option will not be added. […]

Makes sense to me. I will close the PR and take your suggestion to fix this case elsewhere. Thanks for the detailed explanation :)

@dbyoung18 dbyoung18 closed this Nov 8, 2024