
SmolLM2, Request for consideration #2060

Open
insop opened this issue Nov 24, 2024 · 2 comments
insop commented Nov 24, 2024

The Hugging Face team recently released SmolLM2, and it looks very promising.

Since it is new, it makes sense to review it before bringing the model into core, as with #2058, but I wanted to open this issue to ask whether there are any quick steps for those wanting to try the model with torchtune.

Thank you.

CC: @ebsmothers

ebsmothers (Contributor) commented Nov 24, 2024

Hi @insop, thanks for creating this issue. Similar to the case of #2058, I think the model itself is quite easy to support as a Llama-style arch. E.g. for the 1.7B model:

from torchtune.models.llama3._component_builders import llama3

# SmolLM2-1.7B hyperparameters mapped onto the Llama-style builder
smollm2_1_7b = llama3(
    vocab_size=49152,       # SmolLM2 tokenizer vocabulary size
    num_layers=24,
    num_heads=32,
    num_kv_heads=32,        # num_kv_heads == num_heads, i.e. plain MHA (no GQA)
    embed_dim=2048,
    max_seq_len=8192,
    rope_base=130000.0,
    intermediate_dim=8192,  # MLP hidden dimension
)
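As a rough sanity check (my own back-of-the-envelope arithmetic, not from the thread), these hyperparameters land at about 1.7B parameters if the input and output embeddings are tied; norm weights are ignored for brevity:

```python
# Approximate parameter count for the config above
# (ignores norm weights; assumes tied input/output embeddings).
vocab_size, num_layers, embed_dim, intermediate_dim = 49152, 24, 2048, 8192

embedding = vocab_size * embed_dim      # token embedding table
attention = 4 * embed_dim * embed_dim   # q, k, v, o projections (MHA, no GQA)
mlp = 3 * embed_dim * intermediate_dim  # gate, up, down projections
total = embedding + num_layers * (attention + mlp)

print(f"{total / 1e9:.2f}B")  # → 1.71B
```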

In this case the tokenizer is similar to GPT-2's. I believe it should be possible to implement a version of it by closely following our Qwen2Tokenizer, which uses the same underlying byte-level BPE algorithm; you will just need to modify the special tokens.
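To make the "modify the special tokens" step concrete, here is a minimal sketch. The token strings and ids below are how I recall SmolLM2's published tokenizer config and should be verified against the model's tokenizer_config.json; `SMOLLM2_SPECIAL_TOKENS` and `is_special` are hypothetical names for illustration, not a torchtune API:

```python
# Hypothetical special-token map for a SmolLM2 tokenizer, intended to be
# passed to a Qwen2Tokenizer-style BPE tokenizer. Verify the ids against
# the model's tokenizer_config.json before relying on them.
SMOLLM2_SPECIAL_TOKENS = {
    "<|endoftext|>": 0,  # also serves as BOS/EOS in the base model
    "<|im_start|>": 1,   # ChatML-style turn delimiters (instruct variants)
    "<|im_end|>": 2,
}

def is_special(token: str) -> bool:
    """Return True if `token` is one of the reserved special tokens."""
    return token in SMOLLM2_SPECIAL_TOKENS
```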

insop (Author) commented Nov 25, 2024

Thank you @ebsmothers !
