Please bring code features from MPT-7b back to MPT-1b so that MPT-1b can be used with SFTTrainer. #439

Open
OlegJakushkin opened this issue Nov 26, 2023 · 0 comments

What I want to do:

from trl import SFTTrainer

# MosaicGPT is the model class defined in the checkpoint's remote code
model = MosaicGPT.from_pretrained(
    "mosaicml/mpt-1b-redpajama-200b",
    trust_remote_code=True,
    attn_impl='torch'
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_train_data["train"],
    eval_dataset=tokenized_val_data["validation"],
    dataset_text_field="text",
    args=training_args,
    neftune_noise_alpha=5  # the only feature I actually need
)
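
If neftune_noise_alpha is the only blocker, one possible workaround is to apply the NEFTune noise manually with a forward hook on the token-embedding module, bypassing SFTTrainer's integration entirely. Below is a minimal sketch, assuming the MPT-1b remote code keeps its token embedding at model.transformer.wte (an assumption about the module layout) and using the noise magnitude alpha / sqrt(seq_len * hidden_dim) from the NEFTune paper:

import math
import torch

def neftune_hook(module, inputs, output, alpha=5.0):
    # During training only, add uniform noise in [-mag, +mag] to the
    # embedding output, with mag = alpha / sqrt(seq_len * hidden_dim).
    if module.training:
        mag = alpha / math.sqrt(output.size(1) * output.size(2))
        output = output + torch.zeros_like(output).uniform_(-mag, mag)
    return output

# model.transformer.wte is an assumed path to the token embedding;
# adjust it if the remote code stores the embedding elsewhere.
handle = model.transformer.wte.register_forward_hook(
    lambda mod, inp, out: neftune_hook(mod, inp, out, alpha=5.0)
)
# Call handle.remove() after training so inference runs without noise.

With the hook attached, the SFTTrainer above can be constructed without neftune_noise_alpha.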

Yet it fails because of various features that are missing from the MPT-1b implementation, and potentially others.

Please help the community use MPT-1b by either:
a) retraining MPT-7b at the 1B-parameter size on the MPT-7b code base, or
b) updating the MPT-1b code base (which diverges slightly from MPT-7b in architecture), e.g. with the shims sketched below.
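
For option (b), the most immediate gap for SFTTrainer's NEFTune path appears to be the get_input_embeddings / set_input_embeddings accessors from the standard transformers PreTrainedModel API, which trl uses to attach its noise hook. A minimal monkey-patch sketch, again assuming the embedding lives at self.transformer.wte (hypothetical path):

def mpt1b_get_input_embeddings(self):
    # Standard transformers accessor expected by trl and other tooling.
    return self.transformer.wte

def mpt1b_set_input_embeddings(self, new_embeddings):
    self.transformer.wte = new_embeddings

MosaicGPT.get_input_embeddings = mpt1b_get_input_embeddings
MosaicGPT.set_input_embeddings = mpt1b_set_input_embeddings

Whether this alone is enough depends on which other PreTrainedModel methods SFTTrainer touches, which is why updating the MPT-1b code base itself would be the cleaner fix.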
