The seed dataset is contained here. Labelled datasets for both NER and relation extraction tasks are displayed there.
This file contains the unsupervised methods developed at the beginning of the project to extract labelled data from raw texts. The notebook makes use of these methods and also displays the number of manually labelled relations.
The NER folder contains all files related to named-entity recognition (NER) tasks. Two pre-processing notebooks were implemented. The one called "merged" differs from the other in that it combines product and market entities, which are often ambiguous, into a single class. Fine-tuning of BERT for token classification can be found here. It also contains hyperparameter tuning, but unfortunately does not display the optimal parameters.
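The label merge performed by the "merged" notebook can be illustrated with a minimal sketch. It assumes BIO-style tags, and the tag names (`PRODUCT`, `MARKET`, `PRODUCT_MARKET`, `ORG`) are hypothetical. The actual label set and merging code live in the notebook and may differ.

```python
# Hypothetical tag names: the real label set used in the notebooks is not shown here.
MERGE_MAP = {"PRODUCT": "PRODUCT_MARKET", "MARKET": "PRODUCT_MARKET"}


def merge_tags(tags, merge_map=MERGE_MAP):
    """Rewrite BIO tags so that entity types listed in merge_map share one label."""
    merged = []
    for tag in tags:
        # "O" (outside) tags and anything without a BIO prefix are left untouched.
        if tag == "O" or "-" not in tag:
            merged.append(tag)
            continue
        prefix, entity_type = tag.split("-", 1)
        merged.append(f"{prefix}-{merge_map.get(entity_type, entity_type)}")
    return merged


example = ["B-ORG", "O", "B-PRODUCT", "I-PRODUCT", "O", "B-MARKET"]
merged_example = merge_tags(example)
```

Keeping the mapping in a dictionary makes it easy to produce both the merged and the unmerged dataset from the same raw annotations.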
This file contains the fine-tuning of the BERT model, and this notebook describes the preprocessing procedure.
This notebook implements a k-nearest-neighbor classifier, which serves as the baseline for new relation retrieval.
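As a rough illustration of how a k-nearest-neighbor baseline classifies a candidate relation, here is a minimal pure-Python sketch. The feature vectors, the relation labels (`supplier`, `competitor`), the cosine similarity measure, and the value of `k` are all assumptions made for the example. The notebook may use different features, a different metric, or a library implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, examples, k=3):
    """Return the majority label among the k examples most similar to query.

    examples is a list of (vector, label) pairs.
    """
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 2-dimensional "embeddings" with hypothetical relation labels.
train = [
    ([1.0, 0.0], "supplier"),
    ([0.9, 0.1], "supplier"),
    ([0.0, 1.0], "competitor"),
    ([0.1, 0.9], "competitor"),
]
prediction = knn_predict([0.8, 0.2], train, k=3)
```

With these toy vectors, two of the three nearest neighbors of the query carry the `supplier` label, so that label wins the vote.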
Finally, this notebook implements the whole pipeline (NER+classification) to predict new relations from text inputs.