This project demonstrates how to fine-tune a pre-trained model on a multilingual corpus and evaluate its performance across multiple languages, including low-resource languages and languages that were not part of the fine-tuning process.
In this notebook-based project, we aim to:
- Fine-tune a language model on a specific language within a multilingual corpus.
- Explore how well this fine-tuning transfers to other languages.
- Measure how well the model generalizes across languages that were not explicitly used during fine-tuning.
The code and implementation are entirely contained within this Jupyter Notebook, making it easy to follow along, understand, and replicate the steps.
We fine-tune the model on one or more languages from the PAN-X subset of the XTREME benchmark and evaluate its performance on the others.
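As a rough illustration of the data setup described above, the sketch below loads one PAN-X split per language with the Hugging Face `datasets` library. The helper names `panx_config` and `load_panx` and the language codes are hypothetical choices for this example, not names taken from the notebook. The dataset identifier `"xtreme"` is an assumption that may need adjusting depending on your `datasets` version.

```python
def panx_config(lang: str) -> str:
    """Build the XTREME configuration name for a PAN-X language code."""
    return f"PAN-X.{lang}"


def load_panx(langs):
    """Load one PAN-X DatasetDict per language code (needs network access)."""
    # Imported lazily so panx_config works even without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the dataset is reachable under the identifier "xtreme".
    return {lang: load_dataset("xtreme", name=panx_config(lang)) for lang in langs}


if __name__ == "__main__":
    # Hypothetical language selection, for illustration only.
    corpora = load_panx(["de", "fr", "it", "en"])
    print({lang: ds["train"].num_rows for lang, ds in corpora.items()})
```

Keeping one dataset per language makes it straightforward to fine-tune on one entry and evaluate on each of the others.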
To run the project:
- Open the notebook in Jupyter or any notebook-compatible environment.
- Follow along with the code, executing the cells step-by-step.
- The notebook will guide you through data preparation, model loading, fine-tuning, and evaluation.
At the end of the notebook, the evaluation section provides insight into how well fine-tuning on a specific language benefits other languages. Detailed results and performance metrics are provided for each evaluated language.
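To make the per-language evaluation more concrete, here is a minimal sketch of an entity-level F1 score over BIO-tagged sequences, written in plain Python. It is not necessarily the metric implementation used in the notebook (a library such as `seqeval` is a common choice). The function names and the toy tag sequences are assumptions made for illustration, and stray `I-` tags without a matching `B-` tag are simply ignored here.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in a BIO tag sequence."""
    spans, start, etype = set(), None, None
    # A trailing "O" acts as a sentinel so the last open span is closed.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues_span = tag.startswith("I-") and etype == tag[2:]
        if not continues_span:
            if etype is not None:
                spans.add((etype, start, i))
            # Only a B- tag opens a new span in this simplified sketch.
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def entity_f1(gold_sentences, pred_sentences):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy tags for illustration only; these are not results from the notebook.
    gold = {"fr": [["B-PER", "I-PER", "O", "B-LOC"]]}
    pred = {"fr": [["B-PER", "I-PER", "O", "O"]]}
    for lang in gold:
        print(lang, round(entity_f1(gold[lang], pred[lang]), 3))
```

Computing the score separately for each language, rather than pooling all sentences, is what makes the cross-lingual transfer gap visible.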
- Natural Language Processing with Transformers by Lewis Tunstall, Leandro von Werra, and Thomas Wolf: the book that guided me through this project.