espnet-style attn_output_weight scaling and extra after-norm layer #204
The scaling is just so that, assuming the input variance is about 1, the variance going into the softmax is about 1. |
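The variance argument above can be checked numerically. The following is only a sketch, assuming query and key entries are independent with zero mean and unit variance; `dot_product_variance` and its parameters are hypothetical names for illustration, not snowfall or espnet code. Under that assumption the raw dot product over `dim` entries has variance of about `dim`, and dividing by `sqrt(dim)` brings it back to about 1.

```python
import math
import random
import statistics


def dot_product_variance(dim, num_samples=2000, scale=True, seed=0):
    """Estimate the variance of q.k for random unit-variance vectors.

    Assumption of this sketch: entries of q and k are i.i.d. N(0, 1).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(num_samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        k = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = sum(a * b for a, b in zip(q, k))
        if scale:
            # Divide by sqrt(dim) so the value fed to the softmax
            # has variance of about 1 instead of about dim.
            score /= math.sqrt(dim)
        scores.append(score)
    return statistics.pvariance(scores)


if __name__ == "__main__":
    print("unscaled:", dot_product_variance(64, scale=False))
    print("scaled:  ", dot_product_variance(64, scale=True))
```

Without the scaling, the softmax input grows with the dimension, which tends to push the softmax towards near one-hot outputs with small gradients.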
Maybe there will be more WER difference at worse WERs, e.g. before LM rescoring. |
.. don't you have the test-other results? |
Results before rescoring and on test-other will be posted soon (they are being re-tested). |
The relative WER decrease shows no significant difference before and after LM rescoring.
|
still better though.. good..
…On Wednesday, June 2, 2021, LIyong.Guo ***@***.***> wrote:
Maybe there will be more WER difference at worse WERs, e.g. before LM rescoring.
Relative wer decrease seems no significant difference before and after LM rescoring.

| avg epoch 16-20 | test-clean (no rescore) | test-other (no rescore) | test-clean (4-gram lattice rescore) | test-other (4-gram lattice rescore) |
|---|---|---|---|---|
| before | 4.33 | 8.96 | 3.87 | 8.08 |
| current | 4.26 | 8.61 | 3.77 | 7.86 |
| relative decrease | 1.62% | 3.91% | 2.58% | 2.72% |

| avg epoch 26-30 | test-clean (no rescore) | test-other (no rescore) | test-clean (4-gram lattice rescore) | test-other (4-gram lattice rescore) |
|---|---|---|---|---|
| before | 4.31 | 8.98 | 3.86 | 8.07 |
| current | 4.14 | 8.41 | 3.69 | 7.68 |
| relative decrease | 3.94% | 6.35% | 4.40% | 4.83% |
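For reference, the "relative decrease" rows follow from the before/current WERs as (before - current) / before. The sketch below recomputes them for the epoch 26-30 average; the function name and the dictionary layout are assumptions made for illustration, and the numbers are copied from the results above.

```python
def relative_decrease(before, current):
    """Relative WER decrease in percent: (before - current) / before."""
    return (before - current) / before * 100.0


# WERs for the epoch 26-30 average, copied from the results above.
results = {
    "test-clean, no rescore": (4.31, 4.14),
    "test-other, no rescore": (8.98, 8.41),
    "test-clean, 4-gram lattice rescore": (3.86, 3.69),
    "test-other, 4-gram lattice rescore": (8.07, 7.68),
}

if __name__ == "__main__":
    for name, (before, current) in results.items():
        print(f"{name}: {relative_decrease(before, current):.2f}%")
```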
|
Can you make this an option passed in from the user code, like in your other branch, so that we can |
..I'm just concerned it might be disruptive to make this change as-is. |
To stay compatible with previously trained models, we could add an optional config flag, e.g. is_espnet_structure (or a more suitable name), that defaults to false.
|
Yes. |
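A minimal sketch of how such a default-off flag keeps old checkpoints loadable. `EncoderOptions` and `layer_names` are hypothetical and only stand in for the real model constructor; the single assumption taken from this PR is that the flag enables an extra after-norm layer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EncoderOptions:
    """Hypothetical options container; not snowfall's real signature."""
    num_layers: int = 2
    # Defaults to False so models trained before this change keep
    # exactly the structure (and parameter names) they were saved with.
    is_espnet_structure: bool = False


def layer_names(opts: EncoderOptions) -> List[str]:
    """List the sub-module names the encoder would own."""
    names = [f"encoder_layer_{i}" for i in range(opts.num_layers)]
    if opts.is_espnet_structure:
        # The extra after-norm layer only exists when the flag is set.
        names.append("after_norm")
    return names


if __name__ == "__main__":
    print(layer_names(EncoderOptions()))
    print(layer_names(EncoderOptions(is_espnet_structure=True)))
```

With the default, no new parameters appear, so a checkpoint saved before the change still matches the model. New experiments opt in explicitly.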
@@ -285,7 +285,8 @@ def main():
         num_classes=len(phone_ids) + 1,  # +1 for the blank symbol
         subsampling_factor=4,
         num_decoder_layers=num_decoder_layers,
-        vgg_frontend=True)
+        vgg_frontend=True,
+        is_espnet_structure=True)
Should have this in training script too
.. and it's better if you change the directory name, when changing the model structure.
you can remove a couple of older components of the filename, to stop it getting too long.
Should have this in training script too
added.
.. and it's better if you change the directory name, when changing the model structure.
you can remove a couple of older components of the filename, to stop it getting too long.
- -noam-mmi-att-musan-sa-vgg
+ -mmi-att-sa-vgg-normlayer
Thanks a lot! |
The Conformer structure differences were identified by loading an espnet-trained model into snowfall (#201).
With these two modifications and 30 epochs of training, the final result is slightly better than without them (WER 3.69 vs. 3.86 as reported in #154).
Could you help verify their effectiveness (the gain may just be training variance)? @zhu-han @pzelasko
BTW, is there any mathematical background that explains when to apply scaling during the attn_output_weights computation? I read several papers but failed to find a clue about this.
Rescoring WITH 4-gram lm lattice rescore
with modifications of this pr
results of 4-gram lattice rescore from #154
Rescoring WITHOUT 4-gram lm lattice rescore
with modifications of this pr
results from #154