Bugfix arxiv (#5)

facebookresearch · Feb 10, 2020 · b98a1bd
1 parent e338437

Showing 2 changed files with 17 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # CPC_audio
 
-This code implements the Contrast Predictive Coding algorithm on audio data, as described in the paper [Unsupervised Pretraining Transfers well Across Languages](FILLME). This is an unsupervised method to train audio features directly from the raw waveform.
+This code implements the Contrast Predictive Coding algorithm on audio data, as described in the paper [Unsupervised Pretraining Transfers well Across Languages](https://arxiv.org/abs/2002.02848). This is an unsupervised method to train audio features directly from the raw waveform.
 
 Moreover, this code also implements all the evaluation metrics used in the paper:
 - [ABX discriminability](https://zerospeech.com/2017/track_1.html)
@@ -211,6 +211,19 @@ python cpc/eval/common_voices_eval.py per $OUTPUT_DIR --pathVal $PATH_COMMON_VOI
 
 This model is also available via [torch.hub](https://pytorch.org/docs/stable/hub.html). For more details, have a look at hubconf.py.
 
+## Citations
+Please consider citing this project in your publications if it helps your research.
+
+```
+@misc{rivire2020unsupervised,
+    title={Unsupervised pretraining transfers well across languages},
+    author={Morgane Rivière and Armand Joulin and Pierre-Emmanuel Mazaré and Emmanuel Dupoux},
+    year={2020},
+    eprint={2002.02848},
+    archivePrefix={arXiv},
+    primaryClass={eess.AS}
+}
+```
 
 ## License
 

diff --git a/cpc/dataset.py b/cpc/dataset.py
@@ -111,6 +111,7 @@ def prepare(self):
         print(f"Done, elapsed: {time.time() - start_time:.3f} seconds")
         print(f'Scanned {len(self.seqNames)} sequences '
               f'in {time.time() - start_time:.2f} seconds')
+        print(f"{len(self.packageIndex)} chunks computed")
         self.currentPack = -1
         self.nextPack = 0
 
@@ -130,6 +131,8 @@ def loadNextPack(self, first=False):
             del self.nextData
         self.nextPack = (self.currentPack + 1) % len(self.packageIndex)
         seqStart, seqEnd = self.packageIndex[self.nextPack]
+        if self.nextPack == 0 and len(self.packageIndex) > 1:
+            self.prepare()
         self.r = self.reload_pool.map_async(loadFile,
                                             self.seqNames[seqStart:seqEnd])
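
Note on the `loadNextPack` hunk: the added check fires when the pack index wraps back to 0 and there is more than one pack, at which point `prepare()` is called again. The condition can be sketched in isolation as below. `next_pack_index` is a hypothetical helper written for illustration, not a function in `cpc/dataset.py`; it only mirrors the modular arithmetic and the guard visible in the diff, and it assumes nothing about what `prepare()` does beyond what the hunk shows.

```python
from typing import Tuple


def next_pack_index(current_pack: int, n_packs: int) -> Tuple[int, bool]:
    """Hypothetical helper mirroring the index logic in the diff.

    Returns the index of the next pack and whether the wrap-around
    guard from the commit (nextPack == 0 and more than one pack)
    would trigger a call to prepare().
    """
    if n_packs <= 0:
        raise ValueError("n_packs must be positive")
    # Same arithmetic as: (self.currentPack + 1) % len(self.packageIndex)
    next_pack = (current_pack + 1) % n_packs
    # Same guard as: self.nextPack == 0 and len(self.packageIndex) > 1
    needs_prepare = next_pack == 0 and n_packs > 1
    return next_pack, needs_prepare


if __name__ == "__main__":
    # Walk through one full cycle of a three-pack dataset, starting
    # from the currentPack = -1 state that prepare() sets in the diff.
    current = -1
    for _ in range(4):
        current, needs_prepare = next_pack_index(current, 3)
        print(current, needs_prepare)
```

With a single pack the guard never fires, so a dataset that fits in one pack is not re-prepared on every wrap. Starting from `currentPack = -1`, the guard also holds on the first transition to pack 0 when there are several packs.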