search-oriented-conversational-ai/scai-eval24-dataset-conversion-trec-cast


SCAI Eval 2024 Dataset Conversion: TREC CAsT

This repository contains code for converting TREC CAsT datasets into the SCAI Eval 2024 dataset submission format.

Generic Setup

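Clone the treccastweb repository, which provides the TREC CAsT topic files used by the commands below:

git clone git@github.com:johanneskiesel/treccastweb.git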
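The conversion commands read topic files from the cloned treccastweb directory through relative paths, so they presumably need to be run from the directory that contains the clone. A minimal pre-flight check, assuming exactly that layout (missing_inputs is an illustrative name, not part of this repository), could look like this:

```python
from pathlib import Path
from typing import Iterable, List

# Topic files referenced by the conversion commands in this README,
# relative to the directory the commands are run from (an assumption).
REQUIRED_INPUTS = [
    Path("treccastweb/2022/2022_evaluation_topics_flattened_duplicated_v1.0.json"),
    Path("treccastweb/2021/2021_manual_evaluation_topics_v1.0.json"),
]


def missing_inputs(paths: Iterable[Path], root: Path = Path(".")) -> List[Path]:
    """Return the paths that do not exist as regular files under root."""
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    for path in missing_inputs(REQUIRED_INPUTS):
        print(f"missing input: {path}")
```

An empty result means both topic files are where the commands expect them.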

Convert Data

2022

Adds the text, title, and URL from MSMARCO doc v2 and KILT, as well as the topics, to the dataset. No data is added for the Washington Post collection, as its license prevents open sharing.

./src/bash/parse-treccast.sh \
  treccastweb/2022/2022_evaluation_topics_flattened_duplicated_v1.0.json 2022 \
  > data/trec-cast-2022.ndjson
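The parse step writes NDJSON, that is, one JSON value per line. A quick way to sanity-check such output (this file or the 2021 one) is to parse it line by line. The sketch below is illustrative and not part of the repository's tooling; it skips blank lines and reports the line numbers that fail to parse:

```python
import json
from typing import Iterable, List, Tuple


def validate_ndjson(lines: Iterable[str]) -> Tuple[int, List[int]]:
    """Parse every non-blank line as JSON.

    Returns (number of valid records, 1-based numbers of invalid lines).
    """
    valid = 0
    bad: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a stray empty last line
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
        else:
            valid += 1
    return valid, bad
```

Passing an open text file such as data/trec-cast-2022.ndjson works directly, because iterating a file object yields its lines; a well-formed file gives an empty list of bad lines.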

python3 src/python/add-by-id.py data/2022-topic-per-* data/provenance-* data/trec-cast-2022.ndjson \
  | sponge data/trec-cast-2022.ndjson
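The second command reads data/trec-cast-2022.ndjson and writes its result back to the same file. That is why the output goes through sponge (from the moreutils package) rather than a plain > redirect: redirecting onto the input file would truncate it before (or while) the script reads it, whereas sponge soaks up all of its input before opening the output file. A rough Python equivalent of that behavior, shown only to illustrate the idea (sponge_to is a hypothetical name), is:

```python
import sys
from pathlib import Path
from typing import TextIO


def sponge_to(path: str, stream: TextIO) -> None:
    """Read all of stream into memory first, then overwrite path.

    The output file is only opened after the input is exhausted, so the
    input may safely be the same file as the output.
    """
    data = stream.read()
    Path(path).write_text(data, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: <command> | python3 sponge_to.py OUTPUT_FILE
    sponge_to(sys.argv[1], sys.stdin)
```

The trade-off is the same as with sponge itself: the whole output is held in memory before anything is written.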

Converted dataset: data/trec-cast-2022.ndjson
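The README does not describe how src/python/add-by-id.py matches records from the topic and provenance files to the dataset. Purely as a hypothetical illustration of an ID-based join over NDJSON, the sketch below assumes every line is a JSON object with an "id" field and that fields from a matching lookup record are copied onto the dataset record. The field name, the merge rule, and both functions are assumptions, not the script's actual behavior.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical sketch: the "id" key and the merge rule below are
# assumptions, not the behavior of src/python/add-by-id.py.


def load_lookup(lines: Iterable[str], key: str = "id") -> Dict[str, dict]:
    """Index NDJSON records by the field named key (later lines win)."""
    table: Dict[str, dict] = {}
    for line in lines:
        if line.strip():
            record = json.loads(line)
            table[record[key]] = record
    return table


def add_by_id(records: Iterable[dict], lookup: Dict[str, dict],
              key: str = "id") -> Iterator[dict]:
    """Yield each record merged with its lookup entry, if one exists.

    On a field-name clash the lookup value wins; unmatched records
    are yielded unchanged.
    """
    for record in records:
        extra: Optional[dict] = lookup.get(record.get(key))
        yield record if extra is None else {**record, **extra}
```

Building the dictionary from the lookup files first keeps the join to a single pass over the dataset, at the cost of holding all lookup records in memory.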

2021

Provenance data is currently missing.

./src/bash/parse-treccast.sh \
  treccastweb/2021/2021_manual_evaluation_topics_v1.0.json 2021 \
  > data/trec-cast-2021.ndjson

Converted dataset: data/trec-cast-2021.ndjson

Resources

Topics for 2022

Created for, and described as information needs in, the following paper:

Paul Owoicho, Ivan Sekulic, Mohammad Aliannejadi, Jeffrey Dalton and Fabio Crestani. Exploiting Simulated User Feedback for Conversational Search: Ranking, Rewriting, and Beyond. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), pages 632-642, 2023. ACM. DOI: 10.1145/3539618.3591683

Provided by the authors.

KILT

MSMARCO doc v1

MSMARCO doc v2
