This repository contains the code for the paper "Diversify Question Generation with Retrieval-Augmented Style Transfer".
- We provide our processed_data at data_link.
- We also provide our model checkpoint at checkpoint_link.
- If you find this code useful in your research, please consider citing our paper:
@misc{gou2023diversify,
title={Diversify Question Generation with Retrieval-Augmented Style Transfer},
author={Qi Gou and Zehua Xia and Bowen Yu and Haiyang Yu and Fei Huang and Yongbin Li and Nguyen Cam-Tu},
year={2023},
eprint={2310.14503},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
- SQuAD 1.1, Zhou split
  - This split follows "Neural Question Generation from Text: A Preliminary Study".
  - The train/dev/test sets contain 86,635/8,965/8,964 examples, respectively.
- SQuAD 1.1, Du split
  - This split follows "Learning to Ask: Neural Question Generation for Reading Comprehension".
  - The train/dev/test sets contain 70,484/10,570/11,877 examples, respectively.
- NewsQA
  - This dataset is introduced in "NewsQA: A Machine Comprehension Dataset".
  - The train/dev/test sets contain 92,549/5,166/5,126 examples, respectively.
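After processing, it can help to confirm that the split sizes match the numbers listed above. The following is a minimal sketch of such a check. The dictionary keys and the `check_split_sizes` helper are hypothetical names chosen for illustration and are not part of this repository; how the actual counts are obtained depends on the format of the processed data.

```python
# Expected train/dev/test sizes, copied from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPECTED_SIZES = {
    "squad1.1_zhou": (86635, 8965, 8964),
    "squad1.1_du": (70484, 10570, 11877),
    "newsqa": (92549, 5166, 5126),
}


def check_split_sizes(name, train, dev, test):
    """Return a list of mismatch messages; an empty list means all sizes match."""
    expected = EXPECTED_SIZES[name]
    actual = (train, dev, test)
    problems = []
    for split, want, got in zip(("train", "dev", "test"), expected, actual):
        if want != got:
            problems.append(f"{name}/{split}: expected {want}, got {got}")
    return problems
```

Calling `check_split_sizes("newsqa", 92549, 5166, 5126)` returns an empty list, while a wrong count yields one message per mismatched split.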
- Process the original data:
python data/process_data.py
See data/readme.md for details.
- Convert the corpus into vectors and store them with FAISS:
python rast/rag/prepare_dataset.py
See rast/rag/prepare_dataset.py for details.
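Conceptually, a vector store like the one built in this step supports nearest-neighbor lookup: given a query embedding, it returns the most similar stored corpus embeddings. The sketch below illustrates that lookup with a brute-force cosine-similarity search in plain Python. It is a stand-in for illustration only: it does not use FAISS and does not reproduce `rast/rag/prepare_dataset.py`, and the function names and toy vectors are assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search(corpus_vectors, query, k):
    """Return the indices of the k corpus vectors most similar to the query."""
    q = normalize(query)
    scores = []
    for idx, vec in enumerate(corpus_vectors):
        v = normalize(vec)
        scores.append((sum(a * b for a, b in zip(q, v)), idx))
    # Highest similarity first; ties are broken by the lower index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:k]]


# Toy two-dimensional "embeddings" standing in for encoded corpus entries.
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top_two = search(toy_corpus, [1.0, 0.1], k=2)
```

A dedicated library such as FAISS serves the same purpose at scale, where a brute-force loop over every stored vector in Python would be too slow.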
refer to rast/qg/readme.md
refer to rast/qg/readme.md
refer to rast/reward_mdoel/T5_QA/readme.md
refer to rast/rag/readme_v100.md