This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Can't reproduce the result of the paper #12

Open
zz12375 opened this issue Jul 30, 2020 · 5 comments

Comments


zz12375 commented Jul 30, 2020

Hi, team.
I am very grateful that you provide the code and data splits for your CPC audio paper (https://arxiv.org/abs/2002.02848).

First, I pretrained Mod. CPC on libri-100 and froze the features for the common voice 1-hour ASR task. I got an average PER of 45.2% over 5 languages (es, fr, it, ru, tt), versus the 43.9% reported in your paper (Table 3). My result is close (-1.3%) to yours, which seemed reasonable.

But when I tested the pre-trained features on the 5-hour common voice ASR tasks (es, fr, it, ru, tt), I only got an average PER of 42.5% with frozen features, a big gap (-3.7%) from the reported PER (38.8%, Table 5 in the paper). When finetuning the features, the gap was even bigger: my average PER was 37.2%, while the paper reports 31.0%.
Unfortunately, the 5-hour common voice ASR experiments also perform badly when training from scratch: an average PER of 43.2%, far behind the 38.3% reported in your paper.

I would be very thankful if you could kindly provide more detailed hyper-parameters to help me reproduce your results.
In particular, I noticed there is an optional argument --LSTM in ./eval/common_voices_eval.py that adds an LSTM layer before the linear softmax layer. I think it would significantly increase the model capacity and may lead to better performance; did you use it?
Thank you very much!

For now, I used the default hyper-parameters for the common voice ASR transfer experiments:
--batchSize 8
--lr 2e-4
--nEpoch 30
--kernelSize 8
......
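For reference, the --LSTM option discussed above can be pictured as a small head on top of the frozen features. The sketch below is hypothetical (class and argument names are mine, not the repo's; the actual flag lives in ./eval/common_voices_eval.py), but it shows the structural difference the flag would make: one LSTM layer inserted before the linear softmax layer, with an extra output unit reserved for the CTC blank.

```python
import torch


class PhoneClassifierHead(torch.nn.Module):
    """Hypothetical sketch of the phone classifier head, with an
    optional LSTM layer before the linear softmax layer (as the
    --LSTM flag in ./eval/common_voices_eval.py is described)."""

    def __init__(self, dim_encoder, n_phones, use_lstm=False):
        super().__init__()
        self.use_lstm = use_lstm
        if use_lstm:
            # Extra recurrent layer: significantly more capacity
            # than the linear-only head.
            self.lstm = torch.nn.LSTM(dim_encoder, dim_encoder,
                                      batch_first=True)
        # +1 output unit for the CTC blank symbol
        self.linear = torch.nn.Linear(dim_encoder, n_phones + 1)

    def forward(self, features):
        # features: (batch, time, dim_encoder)
        if self.use_lstm:
            features, _ = self.lstm(features)
        return self.linear(features)
```

With use_lstm=False this reduces to the plain linear softmax classifier; with use_lstm=True the per-frame logits are conditioned on temporal context, which is why it could plausibly change the reported PERs.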

@sameerkhurana10

Hi @zz12375,

I am also not able to reproduce the results.

Could you please let me know what PER you are getting on fr alone?


zz12375 commented Jan 4, 2021

Hi @sameerkhurana10,

Evaluated with the 300-epoch pretrained model, the PER tuned on 5-hour fr is 46.49% (frozen features) and 40.05% (finetuned features).


sameerkhurana10 commented Jan 4, 2021

Thanks.

Here are some PERs on the French test set:

This model:

  • Classifier Training Data: 1 hour French
  • Feature Extractor Frozen: Yes
  • Model Training Data: Librispeech 960 hours
  • PER: 46. %

Wav2Vec [Paper]:

ConvDMM [Paper]:

Note: I did not use this toolkit to get the PER, but I wanted to be consistent with the CTC phone classifier that this repo is using, so I copied the following class into my codebase:

class CTCphone_criterion(torch.nn.Module):
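The body of that class is not quoted above, so for readers following along, here is a minimal sketch of what a CTC phone criterion of this shape typically looks like. This is an assumption-labeled reconstruction, not the repo's actual CTCphone_criterion implementation: a linear projection over the encoder features followed by torch.nn.CTCLoss, with the blank symbol mapped to the last output index.

```python
import torch


class CTCPhoneCriterion(torch.nn.Module):
    """Minimal CTC phone criterion sketch (hypothetical; not the
    repo's actual CTCphone_criterion). A linear layer maps encoder
    features to phone logits, and CTC loss aligns them to the
    unsegmented phone labels."""

    def __init__(self, dim_encoder, n_phones):
        super().__init__()
        # +1 output for the CTC blank symbol (index n_phones)
        self.linear = torch.nn.Linear(dim_encoder, n_phones + 1)
        self.loss = torch.nn.CTCLoss(blank=n_phones, zero_infinity=True)

    def forward(self, features, labels, label_lengths):
        # features: (batch, time, dim_encoder)
        # labels: (batch, max_label_len), label_lengths: (batch,)
        logits = self.linear(features)
        log_probs = torch.nn.functional.log_softmax(logits, dim=2)
        # CTCLoss expects (time, batch, classes)
        log_probs = log_probs.permute(1, 0, 2)
        input_lengths = torch.full((features.size(0),),
                                   features.size(1), dtype=torch.long)
        return self.loss(log_probs, labels, input_lengths, label_lengths)
```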

In your case, maybe try increasing the number of epochs.


zz12375 commented Jan 4, 2021

Thank you very much for sharing your results.

Do you mean increasing the number of epochs during the downstream finetuning stage (CTC loss), or during the pretraining stage?

@sameerkhurana10

I meant during downstream CTC phone classifier training.
