This is a dataset of word usage graphs (WUGs) in which the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. The labels are generated from scratch by fine-tuned encoder-decoder language models. The resulting enriched datasets can be helpful for explainable semantic change modeling.
- definitions_for_all_usages: mapping from usage identifiers to generated definitions
- code/: various scripts we used in preparing the datasets
- human_evaluation/: everything related to our evaluation efforts
- wug_labels/: the cluster labels themselves, the main part.
We provide cluster labels (sense definitions) for the following WUGs:
- Diachronic WUGs for English - English definitions
- Diachronic WUGs for German - German and English definitions
- NorDiaChange: Diachronic semantic change dataset for Norwegian (two subsets) - Norwegian and English definitions
- RuDSI: Word sense induction dataset for Russian - Russian and English definitions
Every WUG dataset in the wug_labels/ directory contains target word subdirectories, according to the original DWUG format.
Within each target word directory, we provide one file named cluster_gloss.tsv. It is a tab-separated dataframe with two columns:
- cluster: the numerical identifier of the cluster from the original WUG
- gloss: the definition generated for this cluster
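As a rough illustration of the format described above, the following sketch reads a cluster_gloss.tsv stream into a dictionary. It assumes the file starts with a header row naming the two columns (cluster and gloss) and that cluster identifiers are integers; the helper name and the sample rows are made up for this example and do not come from the dataset.

```python
import csv
import io


def read_cluster_glosses(handle):
    """Read a cluster_gloss.tsv stream into a {cluster_id: gloss} dict.

    Assumes a header row with the column names "cluster" and "gloss".
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {int(row["cluster"]): row["gloss"] for row in reader}


# Made-up sample content standing in for a real cluster_gloss.tsv file.
SAMPLE = (
    "cluster\tgloss\n"
    "0\tA financial institution\n"
    "1\tThe land alongside a river\n"
)

glosses = read_cluster_glosses(io.StringIO(SAMPLE))
for cluster_id, gloss in sorted(glosses.items()):
    print(cluster_id, gloss, sep="\t")
```

For a real file, the same function could be called on a handle opened with open(path, encoding="utf-8", newline="").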
The cluster labels should be used together with the original word usage graphs for the corresponding languages.
As a rule, one can find clusters assigned to every specific WUG usage (sentence) in the clusters/ directory.
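Combining the two sources could look roughly like the sketch below, which attaches a generated definition to every usage. It assumes that the cluster assignment files are tab-separated with a header row and the columns identifier and cluster; that layout, the function names, and the sample rows are assumptions made for illustration, so check the original WUG files before relying on them.

```python
import csv
import io


def read_tsv(handle):
    """Parse a tab-separated stream with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def gloss_per_usage(cluster_rows, gloss_rows):
    """Map each usage identifier to the gloss of its cluster (None if missing)."""
    gloss_by_cluster = {row["cluster"]: row["gloss"] for row in gloss_rows}
    return {
        row["identifier"]: gloss_by_cluster.get(row["cluster"])
        for row in cluster_rows
    }


# Made-up sample data; the column names "identifier" and "cluster" are assumed.
CLUSTERS = "identifier\tcluster\nusage-1\t0\nusage-2\t1\nusage-3\t0\n"
GLOSSES = (
    "cluster\tgloss\n"
    "0\tA financial institution\n"
    "1\tThe land alongside a river\n"
)

usage_glosses = gloss_per_usage(
    read_tsv(io.StringIO(CLUSTERS)), read_tsv(io.StringIO(GLOSSES))
)
for identifier, gloss in usage_glosses.items():
    print(identifier, gloss, sep="\t")
```

Cluster identifiers are compared as strings here, which avoids any assumption about their numeric range.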
NB: some clusters are too small (fewer than 3 usages) to generate a meaningful definition. In these cases, the definition is set to "Too few examples to generate a proper definition!".
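When processing the labels, such placeholder entries can be separated from real definitions. The sketch below assumes the placeholder appears exactly as quoted above, which may not hold for every language; the function name and the sample glosses are made up for illustration.

```python
# Placeholder text as quoted in this README; assumed to match the files exactly.
PLACEHOLDER = "Too few examples to generate a proper definition!"


def split_by_definition(glosses):
    """Split a {cluster: gloss} dict into real definitions and skipped clusters."""
    defined = {c: g for c, g in glosses.items() if g.strip() != PLACEHOLDER}
    undefined = sorted(c for c in glosses if c not in defined)
    return defined, undefined


# Made-up sample glosses for three clusters of one target word.
sample = {
    0: "A financial institution",
    1: PLACEHOLDER,
    2: "The land alongside a river",
}

defined, undefined = split_by_definition(sample)
print("clusters with definitions:", sorted(defined))
print("clusters without definitions:", undefined)
```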
See details in the paper "Enriching Word Usage Graphs with Cluster Definitions" (LREC-COLING'2024) by Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.
Code for fine-tuning encoder-decoder models on definition datasets
- English: mT0-Definition-En XL
- Norwegian: mT0-Definition-No XL
- Russian: mT0-Definition-Ru XL