This is the official PyTorch repo for Multi-EmoBERT, a learning framework for multi-emotion recognition.
**Multi-emotion Recognition Using Multi-EmoBERT and Emotion Analysis in Fake News**
Jinfen Li, Lu Xiao
WebSci 2023
If Multi-EmoBERT is helpful for your research, please consider citing our paper:
```bibtex
@inproceedings{li2023multi,
  title={Multi-emotion Recognition Using Multi-EmoBERT and Emotion Analysis in Fake News},
  author={Li, Jinfen and Xiao, Lu},
  booktitle={Proceedings of the 15th ACM Web Science Conference 2023},
  pages={128--135},
  year={2023}
}
```
## Install the pip package

```bash
pip install multi-emotion==0.1.12
```
## Use the pip package

```python
from multi_emotion import multi_emotion

multi_emotion.predict(["I am so happy today"])
```
### Preview of the result

```python
[{'text': 'i am so happy today', 'pred_label': 'joy,love,optimism', 'probability': '[{"anger": 0.00022063202050048858}, {"anticipation": 0.007108359131962061}, {"disgust": 0.0006860275752842426}, {"fear": 0.00044393239659257233}, {"joy": 0.9998739957809448}, {"love": 0.8244059085845947}, {"optimism": 0.931083083152771}, {"pessimism": 0.0002464792341925204}, {"sadness": 0.007342423778027296}, {"surprise": 0.001668739365413785}, {"trust": 0.009098367765545845}]'}]
```
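Note that the `probability` field is itself a JSON-encoded string. A minimal sketch for turning it into a per-emotion score dict, assuming only the output format shown above:

```python
import json

from multi_emotion import multi_emotion

results = multi_emotion.predict(["I am so happy today"])
for result in results:
    # `probability` holds a JSON list of single-key {emotion: score} dicts.
    scores = {emotion: score
              for entry in json.loads(result["probability"])
              for emotion, score in entry.items()}
    print(result["text"], "->", result["pred_label"])
    # Top-3 emotions by probability.
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3])
```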
## Usage via Source Code
Create a folder named `resources` and put the following resources in it:
- Stanford CoreNLP
- NRC Emotion Lexicon v0.2: we use `NRC-Emotion-Lexicon-Wordlevel-v0.2.txt` and rename it to `NRC-Emotion-Lexicon.txt`
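For reference, here is one way the word-level lexicon could be read, assuming the common `word<TAB>emotion<TAB>association` layout of the NRC word-level files (check this against your copy of v0.2); the function below is an illustration, not part of this repo:

```python
from collections import defaultdict

def load_nrc_lexicon(path="resources/NRC-Emotion-Lexicon.txt"):
    """Map each word to the set of emotions it is associated with."""
    word_emotions = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip header or malformed lines
            word, emotion, flag = parts
            if flag == "1":
                word_emotions[word].add(emotion)
    return word_emotions
```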
Create a virtual environment:

```bash
conda create -n emo_env python=3.9.16
```
Install packages via conda first and then via pip:

```bash
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
conda install -c anaconda cudnn
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
conda install openjdk=8
pip install -r requirements.txt
```
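Optionally, a quick sanity check that PyTorch was installed with working CUDA support:

```python
import torch

# Should print the installed version and True if the GPU is visible.
print(torch.__version__, torch.cuda.is_available())
```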
Rename `.env.example` to `.env` and change the variable values in the file.
Do a grid search over different configs:

```bash
python main.py -m \
    dataset=se_english \
    seed=0,1,2,3,4,5
```
This command evaluates a checkpoint on the train, dev, and test sets:

```bash
python main.py \
    training=evaluate \
    training.ckpt_path=/path/to/ckpt \
    training.eval_splits=train,dev,test
```
To evaluate on the default split(s), omit `training.eval_splits`:

```bash
python main.py \
    training=evaluate \
    training.ckpt_path=/path/to/ckpt
```
In offline mode, results are not logged to Neptune:

```bash
python main.py logger.offline=True
```
In debug mode, results are not logged to Neptune, and we only train/evaluate for a limited number of batches and/or epochs:

```bash
python main.py debug=True
```
Hydra will change the working directory to the path specified in `configs/hydra/default.yaml`. Therefore, if you save a file to the path `./file.txt`, it will actually be saved somewhere like `logs/runs/xxxx/file.txt`. This is helpful when you want to version control your saved files, but not when you want to save to a global directory. There are two methods to get the "actual" working directory (see the sketch after this list):

- Use the `hydra.utils.get_original_cwd` function call.
- Use `cfg.work_dir`. To use this in a config, you can write something like `"${data_dir}/${.dataset}/${model.arch}/"`.
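A minimal sketch of the first method, assuming the run-dir-changing behavior described above; the `config_path`/`config_name` here are hypothetical and should be adjusted to this repo's actual entry point:

```python
import os

import hydra
from hydra.utils import get_original_cwd
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="main")
def my_app(cfg: DictConfig) -> None:
    # At this point Hydra has already chdir'd into something like logs/runs/xxxx/.
    print("Hydra run dir:", os.getcwd())
    # Method 1: recover the directory the script was launched from.
    print("Original cwd:", get_original_cwd())
    # A relative path like './file.txt' lands in the run dir,
    # which is what makes per-run versioning of outputs easy.
    with open("file.txt", "w") as f:
        f.write("saved inside the Hydra run dir\n")

if __name__ == "__main__":
    my_app()
```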
The main directory variables are:

- `work_dir`: the current working directory (where `src/` is)
- `data_dir`: where the data folder is
- `log_dir`: where the log folder is (`runs` & `multirun`)
- `root_dir`: where the saved ckpt & hydra config are
Here, we assume the following:

- The `data_dir` is `data`, which means `data_dir=${work_dir}/../data`.
- The dataset is SemEval 2018 Task 1 (English).
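To make the interpolation pattern concrete, here is a toy OmegaConf example; the keys and paths are hypothetical, chosen only to mirror the `"${data_dir}/${.dataset}/${model.arch}/"` pattern above:

```python
from omegaconf import OmegaConf

# Hypothetical config mirroring the interpolation pattern above.
cfg = OmegaConf.create({
    "work_dir": "/home/user/project/src",
    "data_dir": "${work_dir}/../data",
    "model": {"arch": "bert-base-uncased"},
    "data": {
        "dataset": "se_english",
        # `${.dataset}` is a relative interpolation: it resolves to the
        # sibling key `dataset` inside this `data` node.
        "pkl_dir": "${data_dir}/${.dataset}/${model.arch}/",
    },
})

print(cfg.data.pkl_dir)
# -> /home/user/project/src/../data/se_english/bert-base-uncased/
```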
The commands below build the pre-processed datasets, saved as pickle files. The model architecture is specified so that the correct tokenizer is used for pre-processing. Remember to put an `xxx.yaml` file in the `configs/dataset` folder for the dataset you want to build.
```bash
python scripts/build_dataset.py --data_dir data \
    --dataset se_english --arch bert-base-uncased --split train
python scripts/build_dataset.py --data_dir data \
    --dataset se_english --arch bert-base-uncased --split dev
python scripts/build_dataset.py --data_dir data \
    --dataset se_english --arch bert-base-uncased --split test
```
If the dataset is very large, you can subsample part of it for smaller-scale experiments. For example, the command below builds a train set with only 1000 examples (sampled with seed 0):
```bash
python scripts/build_dataset.py --data_dir data \
    --dataset se_english --arch bert-base-uncased --split train \
    --num_samples 1000 --seed 0
```
The command below is the most basic way to run `main.py`:
```bash
python main.py -m \
    data=se_english \
    model=lm \
    model.optimizer.lr=2e-5 \
    setup.train_batch_size=32 \
    setup.accumulate_grad_batches=1 \
    setup.eff_train_batch_size=32 \
    setup.eval_batch_size=32 \
    setup.num_workers=3 \
    seed=0,1,2
```
This repo implements a number of different methods for training the Task LM. Below are commands for running each method.
### Task LM + Hashtag Encoding
```bash
python main.py -m \
    data=se_english \
    model=lm \
    model.use_hashtag=True \
    model.hashtag_emb_dim=80 \
    model.optimizer.lr=2e-5 \
    setup.train_batch_size=32 \
    setup.accumulate_grad_batches=1 \
    setup.eff_train_batch_size=32 \
    setup.eval_batch_size=32 \
    setup.num_workers=3 \
    seed=0,1,2
```
### Task LM + Sentiment Composition
```bash
python main.py -m \
    data=se_english \
    model=lm \
    model.use_senti_tree=True \
    model.phrase_emb_dim=80 \
    model.optimizer.lr=2e-5 \
    setup.train_batch_size=32 \
    setup.accumulate_grad_batches=1 \
    setup.eff_train_batch_size=32 \
    setup.eval_batch_size=32 \
    setup.num_workers=3 \
    seed=0,1,2
```
### Task LM + Emotion Correlation
```bash
python main.py -m \
    data=se_english \
    model=lm \
    model.use_emo_cor=True \
    model.optimizer.lr=2e-5 \
    setup.train_batch_size=32 \
    setup.accumulate_grad_batches=1 \
    setup.eff_train_batch_size=32 \
    setup.eval_batch_size=32 \
    setup.num_workers=3 \
    seed=0,1,2
```
### Evaluate a checkpoint

`exp_id` is the folder name under your `save_dir` (e.g., `se_english_bert-base-uncased_use-hashtag-True_use-senti-tree-True_xxx`), and `ckpt_path` is a checkpoint under the `checkpoints` folder inside that `exp_id` folder. The results will be saved in the `model_outputs` folder inside the `exp_id` folder.
```bash
python main.py -m \
    data=se_english \
    training=evaluate \
    ckpt_path=xxx \
    exp_id=xxx \
    setup.train_batch_size=32 \
    setup.accumulate_grad_batches=1 \
    setup.eff_train_batch_size=32 \
    setup.eval_batch_size=32 \
    setup.num_workers=3 \
    seed=0,1,2
```