Lingo Blend

This repository contains Python implementations for language classification and English translation tasks, specifically designed for Bengali, Hindi, Punjabi, Tamil, and Telugu languages.

Overview

Framework used: PyTorch, HuggingFace Transformers
Languages: Bengali, Hindi, Punjabi, Tamil, Telugu

Language Classification

Our language classification process utilizes the MuRIL (Multilingual Representations for Indian Languages) model. MuRIL is a state-of-the-art multilingual representation model developed specifically for Indian languages. The text is processed through the MuRIL to obtain embeddings which are passed through a dense layer for probability prediction. MuRIL is designed to handle code-mixed text effectively, making it ideal for processing Romanized code-mixed sentences.

English Translation

Our process involves identifying the language, transliterating the text to the native script, and then translating it into English. We employ the MuRIL to identify the language of the given text as described in the language classification task. Once the language is identified, we transliterate the text into its native script using the Python-based transliteration tool IndicXlit, a multilingual transliteration model developed by AI4Bharat. After transliteration, we translate the text into English using a distilled version of the NLLB-200 (No Language Left Behind) model, primarily intended for research in machine translation, especially for low-resource languages.

Acknowledgments

The MuRIL model: MuRIL model card
IndicXlit: IndicXlit GitHub Repository
The NLLB-200 model: NLLB-200 model card

Feel free to contribute to the development and improvement of this project! Your contributions are valuable in advancing multilingual text processing techniques.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
README.md		README.md
language_identification.ipynb		language_identification.ipynb
translation.ipynb		translation.ipynb
transliteration.ipynb		transliteration.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lingo Blend

Overview

Language Classification

English Translation

Acknowledgments

About

Releases

Packages

Languages

License

Atul-AI08/Lingo-Blend

Folders and files

Latest commit

History

Repository files navigation

Lingo Blend

Overview

Language Classification

English Translation

Acknowledgments

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages