ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy

xSLUE

Data and code for our ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy. Please see our project page (http://xslue.com/), which includes the dataset, examples, classifiers, and a leaderboard. If you have any questions, please contact Dongyeop Kang ([email protected]).

We provide an online platform for cross-style language understanding and evaluation. The Cross-Style Language Understanding and Evaluation (xSLUE) benchmark contains 15 different styles and 23 classification tasks. For each task, we also provide a fine-tuned BERT classifier for further analysis. Our analysis shows that some styles are highly dependent on each other (e.g., impoliteness and offense), and that some domains (e.g., tweets, political debates) are stylistically more diverse than others (e.g., academic manuscripts).

Citation

@inproceedings{kang2021xslue,
    title = "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding",
    author = "Kang, Dongyeop  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}    

Notes

  • The download links are currently broken. Please use this Google Drive link instead for now. I plan to host the files on S3 or a dedicated server again later.
  • Please contact Dongyeop ([email protected]) if you would like to add your cross-style system to the leaderboard or evaluate your system on the diagnostic cross-set.
  • Because of licensing restrictions, we do not include GYAFC in the benchmark; we include only the fine-tuned classifier. You can contact the GYAFC authors directly to obtain the data, and then use our pre-processing script.

Download xSLUE data and fine-tuned classifiers

Before running any xSLUE tasks, download the xSLUE data and the fine-tuned BERT classifiers, either with the download scripts (data_download) or simply by running these commands:

 ./download_xslue_data.sh
 ./download_xslue_model.sh

We also provide links to the individual dataset and model files in the table at the bottom of this page.

run_xslue.sh: Fine-tuning on xSLUE tasks for style classification

Unpack the downloaded data into a directory of your choice, referred to below as $XSLUE_DIR. An example Python script for loading each dataset is provided here.

cd code/style_classify/
./run_xslue.sh

or

# Directories for the unpacked data and for the fine-tuned models
XSLUE_DIR=$HOME/data/xslue
XSLUE_MODEL_DIR=$HOME/data/xslue_model

# Tasks to fine-tune on (GYAFC data must be obtained separately; see Notes)
TASK_NAMES=("SentiTreeBank" "EmoBank_v" "EmoBank_a" "EmoBank_d" "SARC" "SARC_pol" "StanfordPoliteness" "GYAFC" "DailyDialog" "SarcasmGhosh" "ShortRomance" "CrowdFlower" "VUA" "TroFi" "ShortHumor" "ShortJokeKaggle" "HateOffensive" "PASTEL_politics" "PASTEL_country" "PASTEL_tod" "PASTEL_age" "PASTEL_education" "PASTEL_ethnic" "PASTEL_gender")

MODEL=bert-base-uncased

# Fine-tune and evaluate one classifier per task
for TASK_NAME in "${TASK_NAMES[@]}"
do
    echo "Running ... ${TASK_NAME}"
    CUDA_VISIBLE_DEVICES=0 \
    python classify_bert.py \
        --model_type bert \
        --model_name_or_path "${MODEL}" \
        --task_name "${TASK_NAME}" \
        --do_eval --do_train \
        --do_lower_case \
        --data_dir "${XSLUE_DIR}/${TASK_NAME}" \
        --max_seq_length 128 \
        --per_gpu_eval_batch_size=8 \
        --per_gpu_train_batch_size=8 \
        --learning_rate 2e-5 \
        --num_train_epochs 3 \
        --output_dir "${XSLUE_MODEL_DIR}/${TASK_NAME}/${MODEL}/" \
        --overwrite_output_dir --overwrite_cache
done
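
Before launching the full loop, it can help to check which task directories are actually present and to preview the commands that would run. The following is a minimal sketch and is not part of the repository: the helper names `build_command` and `missing_task_dirs` are hypothetical, and it assumes the `${XSLUE_DIR}/${TASK_NAME}` layout and the flags shown in the script above.

```python
import os
from pathlib import Path

# Task names copied from the shell loop above.
TASK_NAMES = [
    "SentiTreeBank", "EmoBank_v", "EmoBank_a", "EmoBank_d", "SARC", "SARC_pol",
    "StanfordPoliteness", "GYAFC", "DailyDialog", "SarcasmGhosh", "ShortRomance",
    "CrowdFlower", "VUA", "TroFi", "ShortHumor", "ShortJokeKaggle",
    "HateOffensive", "PASTEL_politics", "PASTEL_country", "PASTEL_tod",
    "PASTEL_age", "PASTEL_education", "PASTEL_ethnic", "PASTEL_gender",
]


def build_command(task, data_root, model_root, model="bert-base-uncased"):
    """Return the classify_bert.py argument list for one task (hypothetical helper)."""
    return [
        "python", "classify_bert.py",
        "--model_type", "bert",
        "--model_name_or_path", model,
        "--task_name", task,
        "--do_eval", "--do_train",
        "--do_lower_case",
        "--data_dir", str(Path(data_root) / task),
        "--max_seq_length", "128",
        "--per_gpu_eval_batch_size=8",
        "--per_gpu_train_batch_size=8",
        "--learning_rate", "2e-5",
        "--num_train_epochs", "3",
        "--output_dir", str(Path(model_root) / task / model),
        "--overwrite_output_dir", "--overwrite_cache",
    ]


def missing_task_dirs(data_root, tasks=TASK_NAMES):
    """List the tasks whose data directory does not exist under data_root."""
    return [t for t in tasks if not (Path(data_root) / t).is_dir()]


if __name__ == "__main__":
    home = Path.home()
    data_root = Path(os.environ.get("XSLUE_DIR", home / "data" / "xslue"))
    model_root = Path(os.environ.get("XSLUE_MODEL_DIR", home / "data" / "xslue_model"))
    for task in missing_task_dirs(data_root):
        print(f"missing data directory: {data_root / task}")
    # Dry run: print the commands instead of executing them.
    for task in TASK_NAMES:
        print(" ".join(build_command(task, data_root, model_root)))
```

Printing the commands rather than executing them keeps the check cheap. Each list returned by `build_command` could be passed to `subprocess.run` from within `code/style_classify/` once the directories are in place.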

Dependencies

We used Python 3.7. You should also install the additional packages required by the examples:

pip install -r ./requirements.txt

xSLUE Data and Classifiers

For more details, see xslue.com/task. NOTE: the download links are currently broken. Please use this Google Drive link instead for now. I plan to host the files on S3 or a dedicated server again later.

| Style | Name | Dataset | Classifier | Original |
|---|---|---|---|---|
| Formality | GYAFC | Not public | download | link |
| Politeness | StanfordPoliteness | download | download | link |
| Humor | ShortHumor | download | download | link |
| Humor | ShortJokeKaggle | download | download | link |
| Sarcasm | SarcasmGhosh | download | download | link |
| Sarcasm | SARC | download | download | link |
| Metaphor | VUA | download | download | link |
| Metaphor | TroFi | download | download | link |
| Emotion | EmoBank | download | download | link |
| Emotion | CrowdFlower | download | download | link |
| Emotion | DailyDialog | download | download | link |
| Offense | HateOffensive | download | download | link |
| Romance | ShortRomance | download | download | link |
| Sentiment | SentiTreeBank | download | download | link |
| Persona | PASTEL | download | download | link |
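
The table lists datasets, while the fine-tuning loop iterates over task names, and some datasets appear there as several tasks (for example `EmoBank_v`, `EmoBank_a`, and `EmoBank_d`). The sketch below groups the task names under the style and dataset rows of the table. The matching rule (a task either equals the dataset name or starts with the dataset name followed by an underscore) and the helper names `tasks_for_dataset` and `tasks_by_style` are assumptions made for illustration, not part of the repository.

```python
# (style, dataset) rows copied from the table above.
STYLE_DATASETS = [
    ("Formality", "GYAFC"),
    ("Politeness", "StanfordPoliteness"),
    ("Humor", "ShortHumor"),
    ("Humor", "ShortJokeKaggle"),
    ("Sarcasm", "SarcasmGhosh"),
    ("Sarcasm", "SARC"),
    ("Metaphor", "VUA"),
    ("Metaphor", "TroFi"),
    ("Emotion", "EmoBank"),
    ("Emotion", "CrowdFlower"),
    ("Emotion", "DailyDialog"),
    ("Offense", "HateOffensive"),
    ("Romance", "ShortRomance"),
    ("Sentiment", "SentiTreeBank"),
    ("Persona", "PASTEL"),
]

# Task names copied from the fine-tuning script.
TASK_NAMES = [
    "SentiTreeBank", "EmoBank_v", "EmoBank_a", "EmoBank_d", "SARC", "SARC_pol",
    "StanfordPoliteness", "GYAFC", "DailyDialog", "SarcasmGhosh", "ShortRomance",
    "CrowdFlower", "VUA", "TroFi", "ShortHumor", "ShortJokeKaggle",
    "HateOffensive", "PASTEL_politics", "PASTEL_country", "PASTEL_tod",
    "PASTEL_age", "PASTEL_education", "PASTEL_ethnic", "PASTEL_gender",
]


def tasks_for_dataset(dataset, tasks=TASK_NAMES):
    """Return tasks named exactly `dataset` or of the form `<dataset>_<suffix>`."""
    return [t for t in tasks if t == dataset or t.startswith(dataset + "_")]


def tasks_by_style(rows=STYLE_DATASETS, tasks=TASK_NAMES):
    """Group task names under the style of the dataset they belong to."""
    grouped = {}
    for style, dataset in rows:
        grouped.setdefault(style, []).extend(tasks_for_dataset(dataset, tasks))
    return grouped


if __name__ == "__main__":
    for style, tasks in tasks_by_style().items():
        print(f"{style}: {', '.join(tasks)}")
```

The match is case-sensitive and requires the underscore, so `SarcasmGhosh` is not mistaken for a `SARC` sub-task.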

Acknowledgements
