OpenCompass v0.2.5
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!
🌟 Highlights
- Simplify the huggingface / vllm / lmdeploy model wrappers: `meta_template` no longer needs to be hand-crafted in model configs (see the sketch below).
- Introduce evaluation results READMEs in ~20 dataset config folders.
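With the simplified wrappers, a chat-model config reduces to roughly the following. This is a minimal sketch assuming the `HuggingFacewithChatTemplate` wrapper referenced in #1098 and #1163; the abbreviation, batch size, and output length are illustrative values, and the prompt format is taken from the tokenizer's chat template instead of a hand-written `meta_template`.

```python
# Minimal model config sketch using the new wrapper (illustrative values).
# No meta_template is required: the chat format comes from the tokenizer's
# built-in chat template.
from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='llama-3-8b-instruct-hf',               # illustrative abbreviation
        path='meta-llama/Meta-Llama-3-8B-Instruct',  # HF model path
        max_out_len=1024,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

The same config can also be accelerated from the command line via the new `--accelerator` option (#1163), e.g. using vLLM or LMDeploy as the inference backend, without further changes to the config file.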
🚀 New Features
- #1065 Add LLaMA-3 Series Configs
- #1048 Add TheoremQA with 5-shot
- #1094 Support Math evaluation via a judge model
- #1080 Add the GPQA prompt from OpenAI's simple_evals
- #1074 Add the MMLU prompt from OpenAI's simple_evals
- #1123 Add Qwen1.5 MoE 7B and Mixtral 8x22B model configs
📖 Documentation
- #1053 Update readme
- #1102 Update NeedleInAHaystack Docs
- #1110 Update README.md
- #1205 Remove --no-batch-padding and use --hf-num-gpus
🐛 Bug Fixes
- #1036 Update setup.py install_requires
- #1051 Fix the issue caused by repeated loading of the VLLM model
- #1043 Fix multiround
- #1070 Fix sequential runner
- #1079 Fix Llama-3 meta template
⚙ Enhancements and Refactors
- #1163 Enable HuggingFacewithChatTemplate with --accelerator via the CLI
- #1104 Fix prompt template
- #1109 Update performance of common benchmarks
🎉 Welcome New Contributors
- @liuwei130, @IcyFeather233, @VVVenus1212, @binary-husky, @dmitrysarov, @eltociear, @acylam, @lfy79001, @JuhaoLiang1997, @yaoyingyy, and @jxd0712 made their first contributions. Welcome to the OpenCompass community!
🔗 Full Change Logs
- [Fix] Update setup.py install_requires by @Leymore in #1036
- add ChemBench by @liuwei130 in #1032
- [Fix] logger.error -> logger.debug in OpenAI by @Leymore in #1050
- [Sync] Bump version to 0.2.4 by @Leymore in #1052
- [Doc] Update readme by @tonysy in #1053
- [fix]Fixed the issue caused by the repeated loading of VLLM model dur… by @IcyFeather233 in #1051
- [Sync] Sync with internal code 2024.04.19 by @Leymore in #1064
- [Fix] fix multiround by @bittersweet1999 in #1043
- [Feature] Add LLaMA-3 Series Configs by @Leymore in #1065
- [Feature] Add TheoremQA with 5-shot by @Leymore in #1048
- [Fix] Fix sequential runner by @Leymore in #1070
- Add lmdeploy tis python backend model by @ispobock in #1014
- Fix Llama-3 meta template by @liushz in #1079
- Add humaneval prompt from simple_evals, openai by @jingmingzhuo in #1076
- [Feature] Support Math evaluation via judgemodel by @bittersweet1999 in #1094
- [Feature] support arenahard evaluation by @bittersweet1999 in #1096
- Update CIBench by @kleinzcy in #1089
- [Feature] Add gpqa prompt from simple_evals, openai by @Francis-llgg in #1080
- [Deprecate] Remove multi-modal related stuff by @kennymckormick in #1072
- add vllm get_ppl by @VVVenus1212 in #1003
- fix: python path bug by @binary-husky in #1063
- fix output typing, change mutable list to immutable tuple by @dmitrysarov in #989
- [Doc] Update NeedleInAHaystack Docs by @DseidLi in #1102
- [Feature] add support for Flames datasets by @Yggdrasill7D6 in #1093
- adapt to lmdeploy v0.4.0 by @lvhan028 in #1073
- [Fix] fix prompt template by @bittersweet1999 in #1104
- [Fix] Fix Math Evaluation with Judge Model Evaluator & Add README by @liushz in #1103
- [Update] Update performance of common benchmarks by @tonysy in #1109
- [Fix] fix cmb dataset by @bittersweet1999 in #1106
- [Docs] Update README.md by @eltociear in #1110
- [Feature] Adding support for LLM Compression Evaluation by @acylam in #1108
- [Fix] remove redundant pre-commit check by @Leymore in #891
- fix LightllmApi workers bug by @helloyongyang in #1113
- [Feature] Add mmlu prompt from simple_evals, openai by @Leymore in #1074
- [Feature] update drop dataset from openai simple eval by @kleinzcy in #1092
- add mgsm datasets by @Yggdrasill7D6 in #1081
- [Fix] Fix AGIEval chinese sets by @xu-song in #972
- S3Eval Dataset by @lfy79001 in #916
- [Feature] Add AceGPT-MMLUArabic benchmark by @JuhaoLiang1997 in #1099
- [Fix] fix links by @bittersweet1999 in #1120
- [Fix] Fix NeedleBench Summarizer Typo by @DseidLi in #1125
- [Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs by @acylam in #1123
- [Sync] Update accelerator by @Leymore in #1122
- [Fix] fix alpacaeval while add caching path by @bittersweet1999 in #1139
- [Fix] fix multiround by @bittersweet1999 in #1146
- [Fix] Fix Needlebench Summarizer by @DseidLi in #1143
- [Feature] Add huggingface apply_chat_template by @Leymore in #1098
- [Feat] Support dataset_suffix check for mixed configs by @xu-song in #973
- [Format] Add some config lints by @Leymore in #892
- [Sync] Sync with internal codes 2024.05.14 by @Leymore in #1156
- [Fix] fix arenahard summarizer by @bittersweet1999 in #1154
- [Fix] use ProcessPoolExecutor during mbpp eval by @Leymore in #1159
- [Fix] Update stop_words in huggingface_above_v4_33 by @Leymore in #1160
- Update accelerator by @liushz in #1152
- [Feat] enable HuggingFacewithChatTemplate with --accelerator via cli by @Leymore in #1163
- update test workflow by @zhulinJulia24 in #1167
- [Sync] Sync with internal codes 2024.05.17 by @Leymore in #1171
- add dependency in daily test workflow by @zhulinJulia24 in #1173
- [Sync] Sync with internal codes 2024.05.21.1 by @Leymore in #1175
- Update MathBench by @liushz in #1176
- [Fix] fix template by @bittersweet1999 in #1178
- Fix a bug in drop_gen.py by @kleinzcy in #1191
- [Fix] temporary files using tempfile by @yaoyingyy in #1186
- [Fix] add support for lmdeploy api judge by @bittersweet1999 in #1193
- [Fix] fix length by @bittersweet1999 in #1180
- support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks by @jxd0712 in #1190
- [Feat] Update charm summary by @Leymore in #1194
- Update accelerator by @liushz in #1195
- [Sync] Sync with internal codes 2024.05.28 by @Leymore in #1204
- Fix VLLM argument error by @xu-song in #1207
- [Docs] Remove --no-batch-padding and Use --hf-num-gpus by @Leymore in #1205
- [Fix] Rollback opt model configs by @Leymore in #1213
- Update running command readme by @Leymore in #1206
- [Sync] Sync with internal code 2024.05.30 by @Leymore in #1214
Full Changelog: 0.2.4...0.2.5