Our source code for the EACL 2021 workshop task: Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks of this task, respectively! 🥳
Updated: The source code has been released! 🤩
├── README.md
├── ckpt # store model weights during training
│   └── README.md
├── data # store the data
│   └── README.md
├── gen_data.py # generate the Dataset
├── install_cli.sh # install required packages
├── loss.py # loss function
├── main_xlm_bert.py # train multilingual-BERT
├── main_xlm_roberta.py # train XLM-RoBERTa
├── model.py # model implementation
├── pred_data
│   └── README.md
├── preprocessing.py # preprocess the data
├── pretrained_weights # store the pretrained weights
│   └── README.md
└── train.py # define the training and validation loop
Use the following command to install all of the required packages:
sh install_cli.sh
The first step is to preprocess the data. Just use the following command:
python3 -u preprocessing.py
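The actual steps live in preprocessing.py; purely as an illustration of the kind of cleaning such a pipeline usually performs, here is a minimal sketch in which every file name and label is hypothetical:

```python
# Hypothetical preprocessing sketch; see preprocessing.py for the real logic.
import re
import pandas as pd

# Assumed label set and file names; adjust to the actual task data.
LABELS = ["Not_offensive", "Offensive_Targeted_Insult_Individual"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

def clean_text(text: str) -> str:
    """Lowercase, drop URLs, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

def preprocess(in_path: str, out_path: str) -> None:
    df = pd.read_csv(in_path, sep="\t", names=["text", "label"])
    df["text"] = df["text"].astype(str).map(clean_text)
    df["label_id"] = df["label"].map(LABEL2ID)
    df.to_csv(out_path, sep="\t", index=False)

if __name__ == "__main__":
    preprocess("data/train.tsv", "data/train_clean.tsv")  # hypothetical paths
```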
The second step is to train our models. In our solution, we trained two models that use multilingual-BERT and XLM-RoBERTa as the encoder, respectively; a minimal sketch of this encoder-plus-classifier setup is shown below.
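As a rough illustration only (model.py contains the actual implementation), an encoder with a classification head can be sketched as follows, assuming the Hugging Face transformers library; the class name, pooling choice, and head size here are assumptions, not necessarily what the repository does:

```python
# Sketch of an encoder + classification head (assumption, not model.py verbatim).
import torch.nn as nn
from transformers import AutoModel

class OffensiveClassifier(nn.Module):
    def __init__(self, encoder_name: str, num_labels: int):
        super().__init__()
        # encoder_name is e.g. "bert-base-multilingual-cased" or "xlm-roberta-base"
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token ([CLS] / <s>) representation as the sentence embedding.
        cls = out.last_hidden_state[:, 0]
        return self.classifier(cls)
```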
If you want to train the model that uses multilingual-BERT as the encoder, use the following command:
nohup python3 -u main_xlm_bert.py \
--base_path <your base path> \
--batch_size 8 \
--epochs 50 \
> train_xlm_bert_log.log 2>&1 &
If you want to train the model that uses XLM-RoBERTa as the encoder, use the following command:
nohup python3 -u main_xlm_roberta.py \
--base_path <your base path> \
--batch_size 8 \
--epochs 50 \
> train_xlm_roberta_log.log 2>&1 &
After training, the final step is inference. Use the following command:
nohup python3 -u inference.py > inference.log 2>&1 &
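For reference, inference with such a model typically loads the saved weights and tokenizer, encodes the test sentences, and takes the argmax over the logits. The sketch below is a minimal illustration under those assumptions; the checkpoint path and label count are hypothetical, and it reuses the OffensiveClassifier sketch above (inference.py holds the real pipeline):

```python
# Hypothetical inference sketch; see inference.py for the actual pipeline.
import torch
from transformers import AutoTokenizer

# Assumes the OffensiveClassifier sketch above and a checkpoint saved during training.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = OffensiveClassifier("xlm-roberta-base", num_labels=6)  # hypothetical label count
model.load_state_dict(torch.load("ckpt/xlm_roberta_best.pt", map_location="cpu"))
model.eval()

texts = ["example sentence to classify"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])
pred_ids = logits.argmax(dim=-1).tolist()  # indices into the label list
print(pred_ids)
```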
Congratulations! You now have the final results! 🤩
If you use our code, please cite the source.