Hi,

we could successfully pretrain various MosaicBERT models, and evaluations with composer-based fine-tuning look really good :)
However, when using the conversion script llm-foundry/scripts/inference/convert_composer_to_hf.py, the converted HF model seems to be initialized randomly and the MLM predictions look super random.
I used the conversion script from the llm-foundry repository like this:
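Roughly like the following — the checkpoint and output paths here are placeholders, the flags are those of llm-foundry's convert_composer_to_hf.py:

```bash
# Assumed invocation; checkpoint and output paths are placeholders.
python llm-foundry/scripts/inference/convert_composer_to_hf.py \
  --composer_path ./checkpoints/mosaic-bert/latest-rank0.pt \
  --hf_output_path ./converted-3 \
  --output_precision fp32
```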
It then shows that various weights are not correctly initialized:
HF checkpoint folder successfully created at ./converted-3.
Loading model from ./converted-3
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Some weights of BertLMHeadModel were not initialized from the model checkpoint at ./converted-3 and are newly initialized: ['bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.9.output.LayerNorm.bias',
'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.0.output.LayerNorm.weight'
[...]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
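For illustration, the randomness is easy to see with a quick fill-mask sanity check on the converted folder — this is just a minimal sketch, and the prompt is only an example:

```python
from transformers import pipeline

# Fill-mask sanity check against the converted checkpoint folder.
# Any masked sentence shows the same behavior.
fill = pipeline("fill-mask", model="./converted-3", tokenizer="./converted-3")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 4))
# With the randomly initialized weights listed above, the top tokens are arbitrary.
```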
Are there any special conversion scripts or hints for converting a MosaicBERT composer checkpoint? 🤔
Any help is highly appreciated!
Hi, the conversion script in LLM Foundry is not intended for MosaicBERT, which still lives here in the examples repo. To export it properly, you'll need to manually move the MosaicBERT code files alongside the converted checkpoint. See my other answer as well: #401 (comment)
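As a rough sketch of what that looks like — assuming you copy the MosaicBERT code files (e.g. bert_layers.py and configuration_bert.py from examples/benchmarks/bert/src) into the exported folder and add an auto_map entry to config.json pointing at them, following the layout of the mosaicml/mosaic-bert-base Hub repo (verify the names against your checkout):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumes the MosaicBERT modeling files sit next to the exported weights and
# config.json contains something like:
#   "auto_map": {"AutoConfig": "configuration_bert.BertConfig",
#                "AutoModelForMaskedLM": "bert_layers.BertForMaskedLM"}
# trust_remote_code=True then loads the MosaicBERT classes instead of the stock
# BertLMHeadModel, whose parameter names don't match the fused MosaicBERT
# layers (hence the "newly initialized" warnings above).
tokenizer = AutoTokenizer.from_pretrained("./converted-3")
model = AutoModelForMaskedLM.from_pretrained("./converted-3", trust_remote_code=True)
```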