This repository contains code and data for training and inference of a new Russian-language coreference resolution model trained on the RuCoCo corpus (see https://github.com/vdobrovolskii/rucoco).
First, install the dependencies by running `pip install -r requirements.txt`.
Although the recommended PyTorch version for AllenNLP 2.2.0 is 1.8.1, it is better to additionally run `pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html` before training.
This ensures support for 48 GB and 80 GB GPUs, which is necessary for training with the command below. Coreference resolution is notorious for its extreme computational complexity.
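Before starting a long training run, it can help to confirm which PyTorch build is active and how much memory each visible GPU reports. The following is a minimal sketch, not part of this repository: the helper names `parse_torch_version` and `describe_environment` are hypothetical, and the script assumes only that `torch` may or may not be installed.

```python
import importlib.util
import re


def parse_torch_version(version):
    """Split a version string such as '1.9.0+cu111' into a numeric
    (major, minor, patch) tuple and the optional build tag after '+'."""
    base, _, build = str(version).partition("+")
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", base)
    numbers = tuple(int(group) for group in match.groups()) if match else ()
    return numbers, (build or None)


def describe_environment():
    """Return a short report on the installed torch build and visible GPUs."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the helper above works without torch

    numbers, build = parse_torch_version(torch.__version__)
    lines = [f"torch {torch.__version__} (version {numbers}, build tag {build})"]
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(index)
            memory_gib = props.total_memory / 1024 ** 3
            lines.append(f"GPU {index}: {props.name}, {memory_gib:.0f} GiB")
    else:
        lines.append("no CUDA device visible")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_environment())
```

If the pinned install above succeeded, the reported build tag should be `cu111`.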
To run inference with the model, download the weights from https://dl.dropbox.com/s/2m0c4o220pr1rfn/RucocoAncor_rubertb_a150_s20_sw04.tar.gz?dl=0
Then run:
`allennlp evaluate --include-package rucoref RucocoAncor_rubertb_a150_s20_sw04.tar.gz data\test.conll --output-file metrics_on_test.json --predictions-output-file predictions.json`
The path `data\test.conll` uses a Windows separator; in a Linux or macOS shell, write it as `data/test.conll`.
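Once the evaluation has finished, the metrics file can be inspected from Python. The sketch below assumes that `metrics_on_test.json` is a flat JSON object mapping metric names to values; the exact metric names depend on the model and are not assumed here. The helper names `load_numeric_metrics` and `format_metrics` are hypothetical.

```python
import json
from pathlib import Path


def load_numeric_metrics(path):
    """Read a JSON object and keep only its numeric entries."""
    with open(path, encoding="utf-8") as handle:
        metrics = json.load(handle)
    return {
        name: value
        for name, value in metrics.items()
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def format_metrics(metrics):
    """Render metrics as 'name: value' lines, sorted by name."""
    return "\n".join(
        f"{name}: {value:.4f}" for name, value in sorted(metrics.items())
    )


if __name__ == "__main__":
    metrics_path = Path("metrics_on_test.json")
    if metrics_path.exists():
        print(format_metrics(load_numeric_metrics(metrics_path)))
    else:
        print(f"{metrics_path} not found; run the evaluate command first")
```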
To train the model from scratch, run:
`allennlp train --include-package rucoref coref_bertbase_lstm.jsonnet -s output_models`
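`allennlp train` refuses to start when the serialization directory given with `-s` already exists and is not empty, unless `--recover` or `--force` is passed. A small pre-flight check along these lines can catch that before launching a long job. This is only a sketch; the helper name `serialization_dir_status` is hypothetical.

```python
from pathlib import Path


def serialization_dir_status(path):
    """Classify a path as 'missing', 'not a directory', 'empty' or 'non-empty'."""
    directory = Path(path)
    if not directory.exists():
        return "missing"
    if not directory.is_dir():
        return "not a directory"
    return "non-empty" if any(directory.iterdir()) else "empty"


if __name__ == "__main__":
    status = serialization_dir_status("output_models")
    if status in ("missing", "empty"):
        print("output_models is free to use for a fresh training run")
    else:
        print(f"output_models is {status}; choose another directory, "
              "or pass --recover or --force to allennlp train")
```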
| Set | Precision | Recall | F1-score |
|---|---|---|---|
| train | 96.1 | 89.3 | 92.5 |
| development | 77.8 | 72.8 | 75.2 |
| test | 81.1 | 78.2 | 79.6 |