Neural machine translation

Team project for Natural Language Processing with Representation Learning (DS-GA 1011)

Data

Vietnamese-English and Chinese-English parallel corpus provided by the instructors.

Pre-trained word embeddings: fastText word vectors.
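
For orientation, a fastText text-format .vec file starts with a header line giving the number of words and the vector dimension, followed by one word and its vector per line. The loader below is a minimal sketch of reading that format; the function name load_vectors and the max_words parameter are illustrative assumptions, not part of this repository.

```python
def load_vectors(path, max_words=None):
    """Read a fastText text-format .vec file.

    Returns (vectors, dim), where vectors maps each word to a list of floats.
    Hypothetical helper for illustration; not the repository's own loader.
    """
    vectors = {}
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        # Header line: "<number of words> <dimension>"
        _, dim = map(int, f.readline().split())
        for i, line in enumerate(f):
            if max_words is not None and i >= max_words:
                break
            tokens = line.rstrip().split(" ")
            # The last `dim` tokens are the vector; everything before is the word.
            word = " ".join(tokens[:-dim])
            vectors[word] = [float(x) for x in tokens[-dim:]]
    return vectors, dim
```

Capping the vocabulary with max_words keeps memory manageable, since the full files contain a very large number of entries.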

Please have your data ready in the following structure:

<DATA_PATH>
    |- iwslt-vi-en
        |- train.tok.vi
        |- ...
    |- iwslt-zh-en
        |- train.tok.zh
        |- ...
    |- word_vectors
        |- cc.en.300.vec
        |- cc.vi.300.vec
        |- cc.zh.300.vec
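
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch that checks only the files named explicitly in the tree (the elided "..." entries are not checked); the helper name missing_data_files is an assumption for illustration.

```python
from pathlib import Path

# Only the files spelled out in the tree above; elided entries are not listed.
EXPECTED_FILES = [
    "iwslt-vi-en/train.tok.vi",
    "iwslt-zh-en/train.tok.zh",
    "word_vectors/cc.en.300.vec",
    "word_vectors/cc.vi.300.vec",
    "word_vectors/cc.zh.300.vec",
]


def missing_data_files(data_path):
    """Return the expected files (relative paths) that are absent under data_path."""
    root = Path(data_path)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_data_files("data") returns an empty list when every named file is present.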

Installation

Follow these steps if you are going to experiment with the code:

$ git clone https://github.com/ds1011teamproject/translation.git
$ mkdir data
$ mkdir model_saves

Note: If you use different folders for data and models, update the file paths in config/basic_conf.py.

Releasing updates:

Please do the following when pushing a change out:

Increment the version for libs.
Add change notes to changelogs/README.md.

Run

Running on HPC

$ module load anaconda3/5.3.0  # HPC only
$ module load cuda/9.0.176 cudnn/9.0v7.0.5  # HPC only
$ conda create -n mt python=3.6
$ conda activate mt
$ conda install pytorch pandas numpy tqdm  # the conda package for PyTorch is named pytorch

See this guide for detailed instructions on how to run on HPC.

On HPC, you might need to add the following line to your ~/.bashrc:

. /share/apps/anaconda3/5.3.0/etc/profile.d/conda.sh
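
Once the environment is active, a quick import check can confirm that the required packages are visible to Python. This is a minimal sketch; the helper name missing_modules is an assumption, and the default names are the import names of the packages installed above.

```python
import importlib.util


def missing_modules(names=("torch", "pandas", "numpy", "tqdm")):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", missing if missing else "none")
```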

Running locally

This will execute the version that is installed in site-packages:

$ python -m main

Running in a Jupyter notebook

See main_nb.ipynb.

RNN encoder-decoder

A PyTorch implementation of the recurrent neural network (RNN) encoder-decoder architecture for statistical machine translation; cf. "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation" (Cho et al., 2014).
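
To illustrate the general idea, here is a minimal sketch of a GRU-based encoder-decoder trained with teacher forcing. It is not the code in libs: the class names, function name, and sizes are assumptions for illustration. It is also simplified in that the encoder's final hidden state only initializes the decoder, whereas Cho et al. additionally condition every decoder step on it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Embeds source tokens and summarizes them into a final hidden state."""

    def __init__(self, src_vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(src_vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) -> context: (1, batch, hidden_dim)
        _, context = self.gru(self.embedding(src))
        return context


class Decoder(nn.Module):
    """Predicts target tokens, starting from the encoder's context vector."""

    def __init__(self, tgt_vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(tgt_vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, tgt_in, hidden):
        # tgt_in: (batch, tgt_len) -> logits: (batch, tgt_len, tgt_vocab_size)
        output, hidden = self.gru(self.embedding(tgt_in), hidden)
        return self.out(output), hidden


def translation_loss(encoder, decoder, src, tgt):
    """Teacher forcing: feed tgt[:, :-1] and predict tgt[:, 1:]."""
    context = encoder(src)
    logits, _ = decoder(tgt[:, :-1], context)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1)
    )
```

In practice the target sequences would begin with a start-of-sentence token and end with an end-of-sentence token, and padding positions would be excluded from the loss.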

Further references

pytorch/fairseq/models/LSTM
