xSLUE

Data and code for our ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy. Please see our project page (http://xslue.com/), which includes the dataset, examples, classifiers, and leaderboard. If you have any questions, please contact Dongyeop Kang ([email protected]).

We provide an online platform for cross-style language understanding and evaluation. The Cross-Style Language Understanding and Evaluation (xSLUE) benchmark contains 15 different styles and 23 classification tasks. For each task, we also provide a fine-tuned BERT classifier for further analysis. Our analysis shows that some styles are highly dependent on each other (e.g., impoliteness and offense), and that some domains (e.g., tweets, political debates) are stylistically more diverse than others (e.g., academic manuscripts).

Citation

@inproceedings{kang2021xslue,
    title = "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding",
    author = "Kang, Dongyeop  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

Notes

The download links are currently broken. Please use this Google Drive link instead for now. I will later host the files on S3 or a dedicated server again.
Please contact Dongyeop ([email protected]) if you would like to add your cross-style system to the leaderboard or to evaluate your system on the diagnostic cross-set.
Due to licensing restrictions, we do not include the GYAFC dataset in the benchmark; we include only its fine-tuned classifier. You can contact the GYAFC authors directly to obtain the data and then use our pre-processing script.

Download xSLUE data and fine-tuned classifiers

Before running any xSLUE task, download the xSLUE data and/or the fine-tuned BERT classifiers with the provided download scripts:

 ./run_download_data.sh
 ./run_download_model.sh

Links for downloading individual dataset and model files are also provided in the table at the bottom of this page.

`run_xslue.sh`: Fine-tuning on xSLUE tasks for style classification

Unpack the downloaded data into a directory, referred to below as $XSLUE_DIR. An example Python script for loading each dataset is provided here. To fine-tune on all tasks, run:

cd code/style_classify/
./run_xslue.sh

or run the loop directly:

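# Directory where the xSLUE data was unpacked
XSLUE_DIR=$HOME/data/xslue
# Directory where the fine-tuned classifiers will be written
XSLUE_MODEL_DIR=$HOME/data/xslue_model

# GYAFC is not distributed with xSLUE (see Notes); remove it from this list
# unless you have obtained and pre-processed the data yourself.
TASK_NAMES=("SentiTreeBank" "EmoBank_v" "EmoBank_a" "EmoBank_d" "SARC" "SARC_pol" "StanfordPoliteness" "GYAFC" "DailyDialog" "SarcasmGhosh" "ShortRomance" "CrowdFlower" "VUA" "TroFi" "ShortHumor" "ShortJokeKaggle" "HateOffensive" "PASTEL_politics" "PASTEL_country" "PASTEL_tod" "PASTEL_age" "PASTEL_education" "PASTEL_ethnic" "PASTEL_gender")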

MODEL=bert-base-uncased

for TASK_NAME in "${TASK_NAMES[@]}"
do
    echo "Running ... ${TASK_NAME}"
    CUDA_VISIBLE_DEVICES=0 \
    python classify_bert.py \
        --model_type bert \
        --model_name_or_path ${MODEL} \
        --task_name ${TASK_NAME} \
        --do_eval --do_train \
        --do_lower_case \
        --data_dir ${XSLUE_DIR}/${TASK_NAME} \
        --max_seq_length 128 \
        --per_gpu_eval_batch_size=8 \
        --per_gpu_train_batch_size=8 \
        --learning_rate 2e-5 \
        --num_train_epochs 3 \
        --output_dir ${XSLUE_MODEL_DIR}/${TASK_NAME}/${MODEL}/ \
        --overwrite_output_dir --overwrite_cache
done
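To evaluate a downloaded fine-tuned classifier without re-training, the same script can presumably be run with --do_eval only. The sketch below is not a documented xSLUE command: it assumes the classifier for a task has been unpacked into ${XSLUE_MODEL_DIR}/<task>/${MODEL}/ (the layout the training loop writes to), and that classify_bert.py, like the GLUE example it is based on, loads the model for evaluation from --output_dir. The helper name eval_task is hypothetical.

```shell
# Hypothetical helper: evaluate one fine-tuned xSLUE classifier without training.
# ASSUMPTION: the classifier for a task lives in
#   ${XSLUE_MODEL_DIR}/<task>/${MODEL}/  (same layout as the training loop's --output_dir)
XSLUE_DIR=${XSLUE_DIR:-$HOME/data/xslue}
XSLUE_MODEL_DIR=${XSLUE_MODEL_DIR:-$HOME/data/xslue_model}
MODEL=${MODEL:-bert-base-uncased}

eval_task() {
    local task_name=$1
    local model_dir="${XSLUE_MODEL_DIR}/${task_name}/${MODEL}"

    # Fail early with a clear message if the classifier directory is missing.
    if [ ! -d "${model_dir}" ]; then
        echo "missing classifier: ${model_dir}" >&2
        return 1
    fi

    # Same flags as the training loop, minus --do_train and the training-only options.
    python classify_bert.py \
        --model_type bert \
        --model_name_or_path "${model_dir}" \
        --task_name "${task_name}" \
        --do_eval \
        --do_lower_case \
        --data_dir "${XSLUE_DIR}/${task_name}" \
        --max_seq_length 128 \
        --per_gpu_eval_batch_size=8 \
        --output_dir "${model_dir}/"
}
```

Usage, from code/style_classify/ after defining the function: `export CUDA_VISIBLE_DEVICES=0` followed by `eval_task StanfordPoliteness`. Pointing --output_dir at the classifier directory means any evaluation results are written next to the model.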

Dependencies

We used Python 3.7. Install the additional packages required by the code:

pip install -r ./requirement.txt
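Because the code was developed with Python 3.7, it may help to confirm the interpreter version before installing the requirements, ideally in a fresh virtual environment. This is an illustrative sketch: the function name is_python37 and the environment name .venv-xslue are hypothetical, and the setup lines assume a python3.7 binary is available on PATH.

```shell
# Hypothetical helper: succeed only if the given interpreter reports Python 3.7.x.
is_python37() {
    local interp=${1:-python}
    local version
    # Ask the interpreter for its "major.minor" version; fail if it cannot run.
    version=$("$interp" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
    [ "${version}" = "3.7" ]
}

# Example setup (assumes python3.7 is installed; .venv-xslue is an arbitrary name):
#   python3.7 -m venv .venv-xslue
#   source .venv-xslue/bin/activate
#   is_python37 python || echo "warning: not running Python 3.7" >&2
```

After activating the environment, install the requirements with the pip command above.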

xSLUE Data and Classifiers

See xslue.com/task for more details. NOTE: the download links in this table are currently broken; please use the Google Drive link given in the Notes section instead.

Style	Name	Dataset	Classifier	Original
Formality	GYAFC	Not public	download	link
Politeness	StanfordPoliteness	download	download	link
Humor	ShortHumor	download	download	link
Humor	ShortJokeKaggle	download	download	link
Sarcasm	SarcasmGhosh	download	download	link
Sarcasm	SARC	download	download	link
Metaphor	VUA	download	download	link
Metaphor	TroFi	download	download	link
Emotion	EmoBank	download	download	link
Emotion	CrowdFlower	download	download	link
Emotion	DailyDialog	download	download	link
Offense	HateOffensive	download	download	link
Romance	ShortRomance	download	download	link
Sentiment	SentiTreeBank	download	download	link
Persona	PASTEL	download	download	link
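The fine-tuning loop reads each task from ${XSLUE_DIR}/${TASK_NAME}, so a task whose data is absent (notably GYAFC, which is not distributed with xSLUE) will make that iteration fail. A small pre-flight check like the following sketch can list the missing task directories before a long run. The function name missing_tasks is hypothetical, and it only checks that each directory exists, not its contents.

```shell
# Hypothetical pre-flight check: print every task whose data directory is missing.
# Each task is expected under <xslue_dir>/<task_name>, matching the loop's --data_dir.
# Returns 0 if all directories exist, 1 otherwise.
missing_tasks() {
    local xslue_dir=$1
    shift
    local task status=0
    for task in "$@"; do
        if [ ! -d "${xslue_dir}/${task}" ]; then
            echo "${task}"
            status=1
        fi
    done
    return "${status}"
}

# Usage, reusing the variables from the fine-tuning loop:
#   missing_tasks "${XSLUE_DIR}" "${TASK_NAMES[@]}" || echo "some tasks are missing" >&2
```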

Acknowledgements

Our style classification code is based on Hugging Face's transformers examples for the GLUE tasks.
Our BiLSTM baseline code is based on Pytorch-RNN-text-classification.
