Resume Categorization

This script categorizes resumes into 24 classes.

The classes are: HR, DESIGNER, INFORMATION-TECHNOLOGY, TEACHER, ADVOCATE, BUSINESS-DEVELOPMENT, HEALTHCARE, FITNESS, AGRICULTURE, BPO, SALES, CONSULTANT, DIGITAL-MEDIA, AUTOMOBILE, CHEF, FINANCE, APPAREL, ENGINEERING, ACCOUNTANT, CONSTRUCTION, PUBLIC-RELATIONS, BANKING, ARTS, AVIATION.

The output is saved to a separate directory named "prediction". Inside the "prediction" folder, each resume is placed in a subfolder named after its predicted category.

Dataset used : https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset

Example output folder structure:

prediction/
│
├── HR/
│ ├── hr_resume1.pdf
│ ├── hr_resume2.pdf
│ └── ...
│
├── ACCOUNTANT/
│ ├── accountant_resume1.pdf
│ ├── accountant_resume2.pdf
│ └── ...
│
├── TEACHER/
│ ├── teacher_resume1.pdf
│ ├── teacher_resume2.pdf
│ └── ...
│
└── ...
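
The layout above can be produced with a few lines of standard-library Python. This is only a minimal sketch, not the code from script.py: the function name `save_to_category` and its parameters are hypothetical, and it assumes the predicted category is already known as a string.

```python
import shutil
from pathlib import Path


def save_to_category(pdf_path, category, output_root="prediction"):
    """Copy one resume PDF into output_root/<category>/ and return the new path.

    Hypothetical helper for illustration; script.py may do this differently.
    """
    target_dir = Path(output_root) / category
    # Create prediction/<CATEGORY>/ on first use; no error if it already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    target_path = target_dir / Path(pdf_path).name
    # copy2 keeps the original file in place and preserves its metadata.
    shutil.copy2(pdf_path, target_path)
    return target_path
```

For example, `save_to_category("resumes/hr_resume1.pdf", "HR")` would copy the file to prediction/HR/hr_resume1.pdf.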

Steps to Run:

Create a virtual environment in Python.
Inside the virtual environment, clone the GitHub repo.
Use requirements.txt to install the dependencies.
Download the model (required): https://mega.nz/file/Eq0jATbJ#LEmoVJzASIgJ_T88UjRAO9q9H1QK7DzxhPYYYwkWtWA
Put the model in the same directory as script.py (make sure the model file is named "bert_model.h5").
Run script.py from the command line: python script.py "directory". Make sure you are in the same directory as script.py.
resume-categorization (2).ipynb contains the model training and documentation guide.

Creating Virtual Environment:

Go to the directory where you want to clone the repo and open a command line there.

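# venv ships with Python 3, so it does not need to be installed with pip
python -m venv "name of virtual environment"
# Activate the environment before installing anything:
#   Linux/macOS: source "name of virtual environment"/bin/activate
#   Windows: "name of virtual environment"\Scripts\activate
git clone https://github.com/abdullahmoosa/resume-categorization-final.git
cd resume-categorization-final
pip install -r requirements.txt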

Running the script:

After installing the dependencies from requirements.txt and placing bert_model.h5 in the same directory as script.py, run:

python script.py path_to_directory_containing_the_resume_pdfs

Replace 'path_to_directory_containing_the_resume_pdfs' with the path to the directory that contains the PDFs.
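
For readers curious how a script can accept that directory argument, here is a rough sketch using only the standard library. It is an assumption about the general shape, not a copy of script.py: the names `parse_args` and `find_pdfs` are hypothetical, and the sketch only looks at files directly inside the given directory.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the single positional argument: the directory holding the PDFs."""
    parser = argparse.ArgumentParser(description="Categorize resume PDFs.")
    parser.add_argument("directory", type=Path,
                        help="directory containing the resume PDFs")
    return parser.parse_args(argv)


def find_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `parse_args()` with no arguments reads the command line, so `python script.py my_resumes` would give `args.directory == Path("my_resumes")`; `find_pdfs(args.directory)` then lists the files to classify.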

Model Creation and Steps:

Preprocess the text: remove punctuation, stopwords, etc.
Tokenize the inputs.
Generate word vectors.
Train several models (CNN, LSTM, BERT) on the input data and evaluate their accuracy.
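
The first two steps can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's code: the stopword list is a tiny hand-picked sample, the function names are hypothetical, and the integer ids produced by `encode` are only a stand-in for the input to an embedding layer (BERT in particular would normally use its own tokenizer instead).

```python
import string

# Tiny illustrative stopword list (assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "at", "is"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords."""
    cleaned = text.lower().translate(_PUNCT_TO_SPACE)
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id starting at 1; 0 means 'unknown'."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(tokens, vocab):
    """Turn tokens into integer ids, using 0 for tokens missing from the vocabulary."""
    return [vocab.get(tok, 0) for tok in tokens]
```

A CNN or LSTM would typically pad these id sequences to a fixed length and feed them through an embedding layer to obtain the word vectors.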

Important Findings:

BERT performs the best.
The dataset is imbalanced, so accuracy is poor for some classes.
For further details, see "resume-categorization (2).ipynb".
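
One way to see the effect of imbalance is to compute accuracy separately for each class rather than overall. The sketch below is a generic illustration: the function name `per_class_accuracy` is hypothetical, and the labels are toy values invented for the example, not results from the model.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels invented for illustration only; NOT output from the resume model.
y_true = ["CLASS_A", "CLASS_A", "CLASS_A", "CLASS_A", "CLASS_B", "CLASS_B"]
y_pred = ["CLASS_A", "CLASS_A", "CLASS_A", "CLASS_B", "CLASS_A", "CLASS_B"]

scores = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy data the overall accuracy hides the fact that the smaller class is predicted correctly less often than the larger one, which is the pattern an imbalanced dataset tends to produce.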

Correct predictions of the model per class:
