We present a tool that automatically generates questions in Portuguese with controlled difficulty. Our methodology covers three kinds of questions: (A) Grammar, (B) Factoid (wh-questions), and (C) Pronoun reference. For the first (A), we follow a rule-based approach, with rules aligned with Portuguese grammar. For reading comprehension (B), we generate factoid (wh-type) questions and, for that, we tested five different methods. The first performs a syntax-based analysis using the information obtained from Part-of-Speech tagging and Named Entity Recognition. The second carries out a semantic analysis of the sentences through Semantic Role Labeling. The third extracts the dependencies within sentences using Dependency Parsing. The fourth takes advantage of the relative pronouns and adverbs found in the sentences. The fifth explores the usefulness and practicality of discourse connectors. Finally, for the last approach (C), we create pronoun reference questions, for which we generate not only the questions but also the text excerpts they are based on. We define heuristic functions that assign a difficulty value to each question.
We present a tool that generates questions in Portuguese automatically and with difficulty control. Our methodology generates questions on (A) Grammar, (B) Reading Comprehension, and (C) Pronoun Reference. The first approach (A) generates grammar questions. For this, we follow a rule-based technique, with well-defined rules established in accordance with Portuguese grammar. The second approach (B) aims to generate (factoid) reading comprehension questions and, to that end, five different methods were tested. The first performs a syntax-based analysis, using information obtained from the morphological analyzer and from named entity recognition. The second performs a semantic analysis of the sentences through semantic role labeling. The third extracts the dependency relations inherent in the sentences through Dependency Parsing. The fourth takes advantage of the relative pronouns and adverbs found in the sentences. The fifth explores the usefulness of discourse connectors. Finally, for the last approach (C), pronoun reference questions were created, in which, in addition to the questions, the text passages they arise from are also generated. Heuristic functions were established that assign a difficulty value to each question.
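To make the syntax-based idea behind method (B) more concrete, here is a toy sketch. It is not the project's implementation: it assumes the sentence is already tokenized and tagged with named entity labels, and both the `PESSOA` label and the `make_who_question` helper are hypothetical names chosen for illustration. It simply replaces a sentence-initial person entity with the interrogative pronoun "Quem".

```python
from typing import List, Optional, Tuple

# (word, NER label); "O" marks tokens outside any entity.
# The "PESSOA" label is an assumption made for this illustration.
Token = Tuple[str, str]


def make_who_question(tokens: List[Token]) -> Optional[Tuple[str, str]]:
    """Return (question, answer) if the sentence starts with a person entity."""
    # Collect the leading run of tokens labelled as a person.
    end = 0
    while end < len(tokens) and tokens[end][1] == "PESSOA":
        end += 1
    if end == 0 or end == len(tokens):
        return None
    answer = " ".join(word for word, _ in tokens[:end])
    rest = [word for word, _ in tokens[end:]]
    # Drop the final punctuation mark so the question can end with "?".
    if rest and rest[-1] in {".", "!"}:
        rest = rest[:-1]
    if not rest:
        return None
    question = "Quem " + " ".join(rest) + "?"
    return question, answer


sentence = [("Fernando", "PESSOA"), ("Pessoa", "PESSOA"),
            ("escreveu", "O"), ("Mensagem", "O"), (".", "O")]
result = make_who_question(sentence)
print(result)
```

A real system would also need verb agreement checks and entity types other than persons; this sketch only shows the basic substitution step.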
- Factoid Question Generation
- Grammar Question Generation
- Pronoun Reference Question Generation
- Controllable Difficulty
Python 3
Java
- Install nlpnet (as indicated by the author):
git clone https://github.com/erickrf/nlpnet
cd nlpnet
cython network.pyx
python setup.py install
Note: We do not recommend installing nlpnet via pip due to possible incompatibility issues.
- Clone our project:
git clone https://github.com/bernardoleite/question-generation-portuguese
- Install the Python packages from requirements.txt. If you are using a virtual environment for Python package management, you can install all required Python packages with the following bash commands:
cd question-generation-portuguese/
pip install -r requirements.txt
- If you are using Windows, you may need to specify the Java path. Go to question-generation-portuguese/gen_module/utils/config.py and specify your Java path:
JAVA_PATH = "....../bin/java.exe"
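If you are unsure where Java is installed, the following sketch may help you find a value for JAVA_PATH. It is only an illustration: `find_java` is a hypothetical helper that is not part of this project, and it relies solely on the standard `JAVA_HOME` environment variable and on the system PATH.

```python
import os
import shutil
from typing import Optional


def find_java() -> Optional[str]:
    """Return a path to the java executable, or None if it cannot be found."""
    # Prefer JAVA_HOME when it is set and points to an existing executable.
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    # Otherwise fall back to searching the system PATH.
    return shutil.which("java")


java_path = find_java()
print(java_path if java_path else "Java not found; set JAVA_PATH manually.")
```

The printed path can then be copied into config.py by hand.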
You can use this software via the web application or via code with the demo.py script.
- After navigating to the question-generation-portuguese folder, type the following command:
python web_app.py
The first time you run the program the following question will appear:
Using the default treebank "pt_bosque" for language "pt". Would you like to download the models for: pt_bosque now? (Y/n)
Please type Y and press Enter to download the models.
- In the end, the following message will appear:
Debugger PIN: 123-123-123
Running on http://127.0.0.1:XXXX/ (Press CTRL+C to quit)
Open a browser at the given link. You should see the homepage.
- Go to demo.py inside the question-generation-portuguese folder. You can change some options:
# Example text
text_example = """Type here your text."""

# Number of questions (per question type).
# Note: The system will attempt to generate up to the number of questions requested. It may not always be possible, depending on the text.
NR_QUESTIONS_PER_TYPE = 10

# ->DESIRED<- difficulty degree. You can request different difficulty degrees per question type.
# Note: The system will try to maximize, minimize or randomize the difficulty. It does not mean that it is possible in all cases, depending on the text.
DIFFICULTY_DEGREE = 'DIFF'  # Possible options: 'DIFF', 'EASY' or 'RANDOM'

# List of possible requests
list_requests = [
    {'type': 'grammar',
     'questions_requests': [['g_sequence', NR_QUESTIONS_PER_TYPE, DIFFICULTY_DEGREE],
                            ['g_adverbstype', NR_QUESTIONS_PER_TYPE, DIFFICULTY_DEGREE],
                            ['g_dettype', NR_QUESTIONS_PER_TYPE, DIFFICULTY_DEGREE],
                            ........
                            ........
                            ........
]
- Run demo.py:
python demo.py
The first time you run the program the following question will appear:
Using the default treebank "pt_bosque" for language "pt". Would you like to download the models for: pt_bosque now? (Y/n)
Please type Y and press Enter to download the models.
- If all goes well, you should see the generated questions printed and the following message at the end (Portuguese for "Success! A new questionnaire was created! See the results folder."):
Sucesso! Foi criado um novo questionário! Ver pasta -- results.
- (Optionally) Go to the newly created results folder to check the generated questions in text files.
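The request structure edited in demo.py can also be assembled programmatically. The sketch below is an illustration under assumptions: `build_request` is a hypothetical helper that is not part of the project, and only the dictionary layout, the three grammar question names, and the difficulty options are taken from the demo.py excerpt above.

```python
from typing import Dict, List

# Difficulty options listed in the demo.py comments.
VALID_DIFFICULTIES = {"DIFF", "EASY", "RANDOM"}


def build_request(req_type: str, question_names: List[str],
                  nr_questions: int, difficulty: str) -> Dict[str, object]:
    """Build one entry for list_requests (hypothetical helper)."""
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(f"Unknown difficulty: {difficulty!r}")
    if nr_questions < 1:
        raise ValueError("nr_questions must be at least 1")
    return {
        "type": req_type,
        # One [name, count, difficulty] triple per question type.
        "questions_requests": [[name, nr_questions, difficulty]
                               for name in question_names],
    }


grammar_request = build_request(
    "grammar", ["g_sequence", "g_adverbstype", "g_dettype"], 10, "EASY")
print(grammar_request)
```

Validating the difficulty value up front catches typos before the generation step runs; whether demo.py itself performs such a check is not stated here.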
TODO
Important note: This is an experimental system, resulting from ongoing research. The generated questions may contain grammatical errors (although we do our best to avoid them 😉).
To ask questions, report issues or request features, please use the GitHub Issue Tracker.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks in advance!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is released under the General Public License Version 3.0 (or later). For details, please see the file LICENSE in the root directory.
Additionally, this project includes third-party software components: stanfordnlp, stanfordner, nlpnet, and a Portuguese NER model from a master's thesis. Each of these components has its own license. Please see the licenses of stanfordnlp, stanfordner, nlpnet, and the Portuguese NER model, respectively.
A commercial license may also be available for use in industrial projects, collaborations or distributors of proprietary software that do not wish to use the GPL v3 (or later). Please contact the author if you are interested.
If you use this software in your work, please kindly cite our research:
@inproceedings{leite2023_rules,
author={Bernardo Leite and Henrique Cardoso},
title={Do Rules Still Rule? Comprehensive Evaluation of a Rule-Based Question Generation System},
booktitle={Proceedings of the 15th International Conference on Computer Supported Education - Volume 2: CSEDU},
year={2023},
pages={27-38},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011852100003470},
isbn={978-989-758-641-5},
issn={2184-5026},
}
@mastersthesis{leite2020_ms,
author = {Bernardo Leite},
booktitle = {Dissertation for obtaining the Master Degree in Informatics and Computing Engineering},
title = {Automatic Question Generation for the Portuguese Language},
school = {Faculty of Engineering, University of Porto},
url = {https://hdl.handle.net/10216/128541},
day = {20},
month = {07},
year = {2020}
}
Please also consider citing the third-party software components; see their respective pages for details.
- Bernardo Leite, [email protected]
- Henrique Lopes Cardoso, [email protected]
- Luís Paulo Reis, [email protected]