manishiitg/IndicLMJudge

Inspired by the paper "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena": https://arxiv.org/abs/2306.05685

This project uses Qwen-1.5-72B (served through vLLM) as an LLM judge to evaluate models on Indic-language tasks in Hindi, English, and Hinglish.
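The LLM-as-judge loop can be sketched roughly as follows. This is a minimal illustration and not this repository's actual code: the prompt wording, the `[[rating]]` output convention (borrowed from MT-Bench in the paper above), and the names `build_judge_prompt`, `parse_rating`, and `judge` are all assumptions. The `generate` argument is a hypothetical stand-in for whatever call sends a prompt to the vLLM-served judge model and returns its text.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template; the repository's real prompt may differ.
JUDGE_TEMPLATE = (
    "Please act as an impartial judge and rate the quality of the response "
    "to the user question below on a scale of 1 to 10.\n"
    "End your answer with the rating in the form [[rating]], e.g. [[7]].\n\n"
    "[Question]\n{question}\n\n[Response]\n{answer}\n"
)

# Matches ratings such as [[8]] or [[7.5]] (assumed output convention).
RATING_RE = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the template with one question and one model answer."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_rating(judge_output: str) -> Optional[float]:
    """Return the last [[x]] rating if it lies in 1-10, otherwise None."""
    matches = RATING_RE.findall(judge_output)
    if not matches:
        return None
    rating = float(matches[-1])
    return rating if 1.0 <= rating <= 10.0 else None


def judge(question: str, answer: str,
          generate: Callable[[str], str]) -> Optional[float]:
    """Score one answer; `generate` maps a prompt to the judge's reply."""
    return parse_rating(generate(build_judge_prompt(question, answer)))
```

Passing `generate` as a callable keeps the sketch independent of any particular serving setup, so the parsing logic can be checked with a stub before a real judge model is attached.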

Evaluation dataset: https://huggingface.co/datasets/manishiitg/human_eval

Final results are published at https://huggingface.co/datasets/manishiitg/llm_judge

To evaluate your own model, add it to https://github.com/manishiitg/IndicLMJudge/blob/main/scripts/indic_eval/common_vars.sh and then run bash scripts/lmjudge.sh

LLM Judge Language: hi

| Model | Language | Score | No. of Questions |
| --- | --- | --- | --- |
| Qwen/Qwen1.5-72B-Chat-AWQ | hi | 8.3722 | 562 |
| Qwen/Qwen1.5-14B-Chat | hi | 8.2561 | 561 |
| google/gemma-7b-it | hi | 7.8930 | 561 |
| Qwen/Qwen1.5-7B-Chat | hi | 7.8518 | 562 |
| manishiitg/open-aditi-hi-v3 | hi | 7.7464 | 562 |
| manishiitg/open-aditi-hi-v4 | hi | 7.5537 | 562 |
| manishiitg/open-aditi-hi-v2 | hi | 7.2536 | 562 |
| teknium/OpenHermes-2.5-Mistral-7B | hi | 7.2240 | 562 |
| ai4bharat/Airavata | hi | 6.9355 | 550 |
| 01-ai/Yi-34B-Chat | hi | 6.5692 | 562 |
| manishiitg/open-aditi-hi-v1 | hi | 4.6521 | 562 |
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | hi | 4.2417 | 606 |
| Qwen/Qwen1.5-4B-Chat | hi | 4.0970 | 562 |

LLM Judge Language: en

| Model | Language | Score | No. of Questions |
| --- | --- | --- | --- |
| Qwen/Qwen1.5-14B-Chat | en | 9.1956 | 362 |
| Qwen/Qwen1.5-72B-Chat-AWQ | en | 9.1577 | 362 |
| Qwen/Qwen1.5-7B-Chat | en | 9.1503 | 362 |
| 01-ai/Yi-34B-Chat | en | 9.1373 | 362 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | en | 9.1340 | 362 |
| teknium/OpenHermes-2.5-Mistral-7B | en | 9.0006 | 362 |
| manishiitg/open-aditi-hi-v3 | en | 8.9069 | 362 |
| manishiitg/open-aditi-hi-v4 | en | 8.9064 | 362 |
| google/gemma-7b-it | en | 8.7945 | 362 |
| Qwen/Qwen1.5-4B-Chat | en | 8.7224 | 362 |
| manishiitg/open-aditi-hi-v2 | en | 8.4343 | 362 |
| ai4bharat/Airavata | en | 7.3923 | 362 |
| manishiitg/open-aditi-hi-v1 | en | 6.6413 | 361 |
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | en | 5.9009 | 318 |
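
For readers who want to build a similar leaderboard from their own per-question ratings, the aggregation might look like the sketch below. It assumes that the Score column is the mean of per-question judge ratings and that the question count covers only the rated answers; the README does not state either point. The function name `summarize` and the input layout are hypothetical, and the sample rows in the usage note are made-up toy values, not results from this repository.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, str, Optional[float]]        # (model, language, rating)
Summary = Tuple[str, str, float, int]         # (model, language, mean, count)


def summarize(rows: Iterable[Row]) -> List[Summary]:
    """Average ratings per (model, language), skipping unparsed ratings.

    Assumption: a rating of None means the judge output could not be
    parsed, so that question is left out of both the mean and the count.
    """
    totals = defaultdict(lambda: [0.0, 0])    # key -> [sum, count]
    for model, lang, rating in rows:
        if rating is None:
            continue
        entry = totals[(model, lang)]
        entry[0] += rating
        entry[1] += 1
    table = [(m, l, round(s / n, 4), n) for (m, l), (s, n) in totals.items()]
    # Group by language, then list the best-scoring model first.
    return sorted(table, key=lambda r: (r[1], -r[2]))
```

Usage with toy data: calling `summarize([("model-a", "hi", 8.0), ("model-a", "hi", 9.0), ("model-b", "hi", None)])` yields one row for `model-a` with a mean of 8.5 over 2 questions, and no row for `model-b` because its only rating was missing.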
