Gguf cleanup #230

Merged
merged 9 commits into from
Apr 17, 2024

Conversation

metascroy
Contributor

Clean up the GGUF loading code: construct the model on the meta device and share more code with the non-GGUF path.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 17, 2024
Contributor

@mikekgfb mikekgfb left a comment

Please ensure at least as much green as before landing!

Contributor

@mergennachin mergennachin left a comment

I am looking at the CI test cases. They are still very basic.

Can we add the following e2e test:

  • Download gguf fp version.
  • Load gguf and convert to nn.Module's state_dict and save the checkpoint.
  • Run generate.py on the new checkpoint.
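
The middle step of the proposed test, saving a `state_dict` as a checkpoint and loading it back, can be sketched roughly as follows. This is an illustration under assumptions, not the requested CI test: a small `nn.Linear` stands in for the model built from the GGUF file, `save_and_reload` is a hypothetical helper, and the GGUF download and the `generate.py` invocation are left out because their exact commands are not given in this thread.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_and_reload(model: nn.Module, path: str) -> dict:
    # Persist only the state_dict, then read it back the way a later
    # checkpoint-loading step would.
    torch.save(model.state_dict(), path)
    return torch.load(path, map_location="cpu", weights_only=True)


source = nn.Linear(4, 2)  # stand-in for the model converted from GGUF

with tempfile.TemporaryDirectory() as tmp:
    checkpoint_path = os.path.join(tmp, "model.pth")
    reloaded = save_and_reload(source, checkpoint_path)

# A freshly constructed module loaded from the checkpoint should behave
# identically to the source model.
restored = nn.Linear(4, 2)
restored.load_state_dict(reloaded)

x = torch.ones(1, 4)
same_output = torch.equal(source(x), restored(x))
```

An end-to-end test along these lines would compare generated output before and after the round trip rather than a single forward pass.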

build/builder.py Outdated
    if builder_args.gguf_path:
        model = _load_model_gguf(builder_args)
    else:
        model = _load_model_not_gguf(builder_args)
Contributor

Can we rename this function to _load_model_default()?

@metascroy
Contributor Author

@mergennachin I'll add the CI test you outline in a follow-up PR. We do have an existing CI test that calls generate on a quantized GGUF file here: https://github.com/pytorch/torchchat/blob/main/.github/workflows/compile-gguf.yml.

@metascroy metascroy merged commit e3516e4 into main Apr 17, 2024
18 of 23 checks passed
@metascroy metascroy deleted the gguf-cleanup branch April 17, 2024 16:39
malfet pushed a commit that referenced this pull request Jul 17, 2024
* clean up gguf loading.  Move model loading to meta.

* remove cpu

* Fix CI and validation scripts (#154)

* missing device (#232)

* Use generator args to group all arguments to generator (#231)

* prompt

* chat_mode, num_samples

* Move more generator args to use dataclass (#233)

* prompt

* chat_mode, num_samples

* move more args

* more gen args

* update

* args

* undo some changes

* typos

* Minor lint fixes (#236)

* remove redundancy & remove int4 linear test from ET tests (#237)

* remove redundancy

* no int4 linear on ET

* small changes

---------

Co-authored-by: Guang Yang <[email protected]>
Co-authored-by: Michael Gschwind <[email protected]>
Co-authored-by: Mergen Nachin <[email protected]>
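
The commit message above mentions grouping generator arguments into a dataclass (#231, #233). A minimal sketch of that idea follows. Only the field names `prompt`, `chat_mode`, and `num_samples` come from the commit message. The class name `GeneratorArgs`, the field types and defaults, and the `describe` helper are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneratorArgs:
    # Field names come from the commit message; types and defaults are guesses.
    prompt: str = ""
    chat_mode: bool = False
    num_samples: int = 1


def describe(args: GeneratorArgs) -> str:
    # Functions take one argument object instead of a long parameter list.
    mode = "chat" if args.chat_mode else "completion"
    return f"{args.num_samples} {mode} sample(s) for prompt {args.prompt!r}"


summary = describe(GeneratorArgs(prompt="Hello", num_samples=2))
```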
5 participants