n-best rescore with transformer lm #201
base: master
Conversation
> Great!!
> Yes, the modeling units are 5000 tokens including "<blank>".

Thanks!
You may run into memory problems. Fangjun recently committed a code change that can be used to work around that, though. We need to make sure our recipes can run for those kinds of sizes anyway.
On Tue, May 25, 2021 at 10:21 AM LIyong.Guo wrote:
Great!!
I assume the modeling units are BPE pieces? I think a good step towards resolving the difference would be to train (i) a CTC model and (ii) a LF-MMI model using those same BPE pieces.

Yes, the modeling units are 5000 tokens including "<blank>".
I will do the suggested experiments.
    b_to_a_map=b_to_a_map,
    sorted_match_a=True)
lm_path_lats = k2.top_sort(k2.connect(lm_path_lats.to('cpu'))).to(device)
lm_scores = lm_path_lats.get_tot_scores(True, True)
The 2nd arg to get_tot_scores() here, representing log_semiring, should be False, because ARPA-type language models are constructed in such a way that the backoff prob is included in the direct arc. I.e. we would be double-counting if we were to sum the probabilities of the non-backoff and backoff arcs.
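A minimal sketch of the suggested call, assuming the first positional argument of get_tot_scores() is use_double_scores as in the k2 Python API:

# Tropical semiring: do not log-sum backoff and non-backoff arcs, since the
# ARPA LM already folds the backoff probability into the direct arc.
lm_scores = lm_path_lats.get_tot_scores(use_double_scores=True,
                                        log_semiring=False)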
Please add more documentation to your code.
x -= self.mean

if norm_vars:
    x /= self.std
norm_means uses a guard requires_grad to choose whether to perform an in-place update. Is there a reason not to do the same here? The original implementation (https://github.com/espnet/espnet/blob/08feae5bb93fa8f6dcba66760c8617a4b5e39d70/espnet/nets/pytorch_backend/frontends/feature_transform.py#L135) uses self.scale to do a multiplication, which is more efficient than dividing by self.std.
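A rough sketch of the pattern being suggested, assuming self.scale has been precomputed as 1.0 / self.std and that the guard mirrors the one already used for norm_means:

if norm_vars:
    if self.requires_grad:
        # out-of-place multiply so the op stays in the autograd graph
        x = x * self.scale
    else:
        # in-place multiply, avoids an extra allocation
        x *= self.scale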
def encode(
        self, speech: torch.Tensor,
        speech_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
Would you mind adding doc describing the shape of various tensors?
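For example, a docstring-only sketch; the concrete shapes are assumptions based on common espnet conventions, not taken from this PR:

from typing import Tuple
import torch

def encode(
        self, speech: torch.Tensor,
        speech_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Run the encoder on a padded batch of utterances.

    Args:
      speech: A float tensor of shape (batch, num_frames, feature_dim).
      speech_lengths: An int tensor of shape (batch,) giving the number of
        valid frames in each utterance before padding.

    Returns:
      A tuple (encoder_out, encoder_out_lens), where encoder_out has shape
      (batch, num_subsampled_frames, encoder_dim) and encoder_out_lens has
      shape (batch,).
    """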
return nnet_output


@classmethod
def build_model(cls, asr_train_config, asr_model_file, device):
cls is never used. I would suggest changing @classmethod to @staticmethod and removing cls.
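A sketch of the suggested change, with the signature copied from the diff and the body elided:

@staticmethod
def build_model(asr_train_config, asr_model_file, device):
    # same body as before, just without the unused cls parameter
    ...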
""" | ||
model = TransformerLM(**config) | ||
|
||
assert model_file is not None, f"model file doesn't exist" |
f"{model_file} doesn't exist"
if model_type == 'espnet':
    return load_espnet_model(config, model_file)
elif model_type == 'snowfall':
    raise NotImplementedError(f'Snowfall model to be suppported')
No need to use an f-string here.
self.unk_idx = self.token2idx['<unk>']


@dataclass
Do we really need to use dataclass here? Also, could you remove the class NumericalizerMixin? The extra level of inheritance makes the code hard to read.
# The original link of these models is:
# https://zenodo.org/record/4604066#.YKtNrqgzZPY
# which is accessible by espnet utils
# The are ported to following link for users who don't have espnet dependencies.
Nit: The -> They
# The are ported to following link for users who don't have espnet dependencies.
if [ ! -d snowfall_model_zoo ]; then
  echo "About to download pretrained models."
  git clone https://huggingface.co/GuoLiyong/snowfall_model_zoo
I would suggest using git clone --depth 1. It improves the clone speed.
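For example, applied to the clone above:

git clone --depth 1 https://huggingface.co/GuoLiyong/snowfall_model_zoo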
blank_bias = -1.0
nnet_output[:, :, 0] += blank_bias

supervision_segments = torch.tensor([[0, 0, nnet_output.shape[1]]],
Is the batch size always 1? A larger batch size can improve decoding speed.
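For reference, a hedged sketch of how supervision_segments is often built for a whole batch in snowfall-style decoding; the field names under batch['supervisions'] and the handling of subsampling are assumptions that may need adapting here:

supervisions = batch['supervisions']
# One row per utterance: (sequence_idx, start_frame, num_frames).
supervision_segments = torch.stack(
    (supervisions['sequence_idx'],
     supervisions['start_frame'],
     supervisions['num_frames']), dim=1).to(torch.int32)
# The frame counts may still need dividing by the model's subsampling factor.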
ref = batch['supervisions']['text']
for i in range(len(ref)):
    hyp_words = text.split(' ')
What's the format of text? Does text depend on i? If not, you can split it outside of the for loop.
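If it does not depend on i, a sketch of the hoisted version:

hyp_words = text.split(' ')  # split once, outside the loop
ref = batch['supervisions']['text']
for i in range(len(ref)):
    ...  # use hyp_words here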
WER results of this PR (obtained with models loaded from the espnet model zoo):
This PR implements the following procedure with models from the espnet model zoo:
Added benefit of loading an espnet-trained conformer encoder model into the equivalent snowfall model definition:
Also, the loaded espnet transformer LM could be used as a baseline for snowfall LM training tasks.