nikifori/Apella-plus-thesis

Researchers Relevance Estimation to Scientific Disciplines for Recommendation System using Natural Language Processing Techniques (APELLA+)

This repository contains the source code and test datasets of the Diploma Thesis "Researchers Relevance Estimation to Scientific Disciplines for Recommendation System using Natural Language Processing Techniques", developed by Vasileios Moschopoulos and Konstantinos Nikiforidis, undergraduate students of the Department of Electrical and Computer Engineering at Aristotle University of Thessaloniki.

The project consists of two main parts. The first is an automated web scraper that collects data on researchers' scientific publications (title, abstract, year, etc.) from popular scientific search engines (Google Scholar, Semantic Scholar, ResearchGate). The second is a pipeline that, for an open academic position, extracts a relevance-ranked list of university professors from a register pool, based on text embedding similarity comparisons.

The pretrained models used to extract the sentence embeddings are SciBERT and SPECTER, both based on the BERT architecture and trained on scientific text corpora. These models were further fine-tuned with the SimCSE framework, which produced superior results on the test datasets.
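The ranking step can be pictured with the minimal sketch below. It assumes each researcher is represented by a list of publication embeddings (e.g. produced by SciBERT or SPECTER) and that the open position is represented by a single embedding. Aggregating per-publication scores with the mean is an assumption made for illustration, not necessarily the aggregation used in the thesis, and `rank_researchers` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_researchers(position_embedding: Vector,
                     publications: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank researchers by the mean cosine similarity
    between the position embedding and each of their publication embeddings.
    (Mean aggregation is an illustrative assumption.)"""
    scores = []
    for name, embeddings in publications.items():
        if not embeddings:
            continue  # skip researchers with no scraped publications
        sims = [cosine_similarity(position_embedding, e) for e in embeddings]
        scores.append((name, sum(sims) / len(sims)))
    # Highest similarity first; ties broken by name for a deterministic order
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real sentence embeddings are 768-dimensional
    position = [1.0, 0.0, 0.0]
    pool = {
        "researcher_a": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.0]],
        "researcher_b": [[0.0, 1.0, 0.0]],
    }
    for name, score in rank_researchers(position, pool):
        print(f"{name}: {score:.3f}")
```

Swapping the mean for the maximum would instead reward a researcher with a single highly relevant paper; which choice is preferable depends on the evaluation datasets.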

The best-performing fine-tuned models, SimCSE_smallD and SimCSE_largeD (batch_size: 40, max_sequence_length: 300), which are based on contrastive learning, can be found here (with a PDF report) in the standard Hugging Face model format. The datasets used for model training/fine-tuning are in the same folder.
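For readers unfamiliar with SimCSE, the sketch below computes its in-batch contrastive (InfoNCE) objective on plain Python lists. `anchors[i]` and `positives[i]` form a positive pair (two dropout-perturbed encodings of the same sentence in unsupervised SimCSE, or a paired sentence in the supervised variant), and every other `positives[j]` in the batch serves as a negative. The temperature of 0.05 is the default from the SimCSE paper and is an assumption here, as is the helper name `simcse_loss`; the actual fine-tuning presumably ran on a deep learning framework rather than this pure-Python version.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def simcse_loss(anchors: List[Vector], positives: List[Vector],
                temperature: float = 0.05) -> float:
    """Hypothetical helper: mean in-batch contrastive loss in the style of SimCSE.

    For each anchor i the loss is
        -log( exp(sim(h_i, h_i+)/t) / sum_j exp(sim(h_i, h_j+)/t) ).
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and equal length")
    total = 0.0
    for i, h in enumerate(anchors):
        logits = [cosine_similarity(h, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]     # each positive matches its anchor
    misaligned = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped
    print(simcse_loss(anchors, aligned))     # close to 0
    print(simcse_loss(anchors, misaligned))  # close to 20 (= 1 / 0.05)
```

Training minimizes this value, pulling each pair together while pushing apart the other sentences in the batch, so the batch size (40 for the models above) also sets the number of in-batch negatives per anchor (39).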

The data in the /csv_files folder on professors' personal information (name, rank, APELLA ID, email, etc.) is already publicly available as raw PDF/XLSX files on the official website of the School of Informatics, AUTh (https://www.csd.auth.gr/).

[Figure: results comparison]
