
SmolLM2, Request for consideration #2060

Open
insop opened this issue Nov 24, 2024 · 2 comments
insop commented Nov 24, 2024

The Hugging Face team recently released SmolLM2, and it looks very promising.

Since it is new, it makes sense to review it before bringing the model into core, as with #2058, but I wanted to open this issue to ask whether there are any quick steps for those wanting to try the model with torchtune.

Thank you.

CC: @ebsmothers

ebsmothers (Contributor) commented Nov 24, 2024

Hi @insop, thanks for creating this issue. Similar to the case of #2058, I think the model itself is quite easy to support as a Llama-style arch. E.g. for the 1.7B model:

from torchtune.models.llama3._component_builders import llama3

# SmolLM2-1.7B hyperparameters mapped onto the Llama-style builder
smollm2_1_7b = llama3(
    vocab_size=49152,       # SmolLM2 tokenizer vocabulary size
    num_layers=24,
    num_heads=32,
    num_kv_heads=32,        # num_kv_heads == num_heads, i.e. plain MHA (no GQA)
    embed_dim=2048,
    max_seq_len=8192,
    rope_base=130000.0,
    intermediate_dim=8192,  # MLP hidden dimension
)
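As a rough sanity check (my own back-of-the-envelope arithmetic, not from the thread), these hyperparameters land at about 1.7B parameters if the input and output embeddings are tied; norm weights are ignored for brevity:

```python
# Approximate parameter count for the config above
# (ignores norm weights; assumes tied input/output embeddings).
vocab_size, num_layers, embed_dim, intermediate_dim = 49152, 24, 2048, 8192

embedding = vocab_size * embed_dim      # token embedding table
attention = 4 * embed_dim * embed_dim   # q, k, v, o projections (MHA, no GQA)
mlp = 3 * embed_dim * intermediate_dim  # gate, up, down projections
total = embedding + num_layers * (attention + mlp)

print(f"{total / 1e9:.2f}B")  # → 1.71B
```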

In this case the tokenizer is similar to GPT-2's. I believe it should be possible to implement a version of it by closely following our Qwen2Tokenizer, which uses the same underlying byte-level BPE algorithm; you will just need to modify the special tokens.
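To make the "modify the special tokens" step concrete, here is a minimal sketch. The token strings and ids below are how I recall SmolLM2's published tokenizer config and should be verified against the model's tokenizer_config.json; `SMOLLM2_SPECIAL_TOKENS` and `is_special` are hypothetical names for illustration, not a torchtune API:

```python
# Hypothetical special-token map for a SmolLM2 tokenizer, intended to be
# passed to a Qwen2Tokenizer-style BPE tokenizer. Verify the ids against
# the model's tokenizer_config.json before relying on them.
SMOLLM2_SPECIAL_TOKENS = {
    "<|endoftext|>": 0,  # also serves as BOS/EOS in the base model
    "<|im_start|>": 1,   # ChatML-style turn delimiters (instruct variants)
    "<|im_end|>": 2,
}

def is_special(token: str) -> bool:
    """Return True if `token` is one of the reserved special tokens."""
    return token in SMOLLM2_SPECIAL_TOKENS
```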

insop (Author) commented Nov 25, 2024

Thank you @ebsmothers !
