OpenCompass v0.2.5
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!
🌟 Highlights
- Simplify the huggingface / vllm / lmdeploy model wrappers: `meta_template` no longer needs to be hand-crafted in model configs (see the sketch below).
- Introduce evaluation results READMEs in ~20 dataset config folders.
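With the simplified wrappers, a chat-model config reduces to roughly the following. This is a minimal sketch assuming the `HuggingFacewithChatTemplate` wrapper referenced in #1098 and #1163; the abbreviation, batch size, and output length are illustrative values, and the prompt format is taken from the tokenizer's chat template instead of a hand-written `meta_template`.

```python
# Minimal model config sketch using the new wrapper (illustrative values).
# No meta_template is required: the chat format comes from the tokenizer's
# built-in chat template.
from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='llama-3-8b-instruct-hf',               # illustrative abbreviation
        path='meta-llama/Meta-Llama-3-8B-Instruct',  # HF model path
        max_out_len=1024,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

The same config can also be accelerated from the command line via the new `--accelerator` option (#1163), e.g. using vLLM or LMDeploy as the inference backend, without further changes to the config file.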
🚀 New Features
- #1065 Add LLaMA-3 Series Configs
- #1048 Add TheoremQA with 5-shot
- #1094 Support Math evaluation via a judge model
- #1080 Add the GPQA prompt from OpenAI's simple_evals
- #1074 Add the MMLU prompt from OpenAI's simple_evals
- #1123 Add Qwen1.5 MoE 7B and Mixtral 8x22B model configs
📖 Documentation
- #1053 Update readme
- #1102 Update NeedleInAHaystack Docs
- #1110 Update README.md
- #1205 Remove --no-batch-padding and use --hf-num-gpus
🐛 Bug Fixes
- #1036 Update setup.py install_requires
- #1051 Fix the issue caused by repeated loading of the VLLM model
- #1043 Fix multiround
- #1070 Fix sequential runner
- #1079 Fix Llama-3 meta template
⚙ Enhancements and Refactors
- #1163 Enable HuggingFacewithChatTemplate with --accelerator via the CLI
- #1104 Fix prompt template
- #1109 Update performance of common benchmarks
🎉 Welcome New Contributors
- @liuwei130, @IcyFeather233, @VVVenus1212, @binary-husky, @dmitrysarov, @eltociear, @acylam, @lfy79001, @JuhaoLiang1997, @yaoyingyy, and @jxd0712 made their first contributions. Welcome to the OpenCompass community!
🔗 Full Change Logs
- [Fix] Update setup.py install_requires by @Leymore in #1036
- add ChemBench by @liuwei130 in #1032
- [Fix] logger.error -> logger.debug in OpenAI by @Leymore in #1050
- [Sync] Bump version to 0.2.4 by @Leymore in #1052
- [Doc] Update readme by @tonysy in #1053
- [fix]Fixed the issue caused by the repeated loading of VLLM model dur… by @IcyFeather233 in #1051
- [Sync] Sync with internal code 2024.04.19 by @Leymore in #1064
- [Fix] fix multiround by @bittersweet1999 in #1043
- [Feature] Add LLaMA-3 Series Configs by @Leymore in #1065
- [Feature] Add TheoremQA with 5-shot by @Leymore in #1048
- [Fix] Fix sequential runner by @Leymore in #1070
- Add lmdeploy tis python backend model by @ispobock in #1014
- Fix Llama-3 meta template by @liushz in #1079
- Add humaneval prompt from simple_evals, openai by @jingmingzhuo in #1076
- [Feature] Support Math evaluation via judgemodel by @bittersweet1999 in #1094
- [Feature] support arenahard evaluation by @bittersweet1999 in #1096
- Update CIBench by @kleinzcy in #1089
- [Feature] Add gpqa prompt from simple_evals, openai by @Francis-llgg in #1080
- [Deprecate] Remove multi-modal related stuff by @kennymckormick in #1072
- add vllm get_ppl by @VVVenus1212 in #1003
- fix: python path bug by @binary-husky in #1063
- fix output typing, change mutable list to immutable tuple by @dmitrysarov in #989
- [Doc] Update NeedleInAHaystack Docs by @DseidLi in #1102
- [Feature] add support for Flames datasets by @Yggdrasill7D6 in #1093
- adapt to lmdeploy v0.4.0 by @lvhan028 in #1073
- [Fix] fix prompt template by @bittersweet1999 in #1104
- [Fix] Fix Math Evaluation with Judge Model Evaluator & Add README by @liushz in #1103
- [Update] Update performance of common benchmarks by @tonysy in #1109
- [Fix] fix cmb dataset by @bittersweet1999 in #1106
- [Docs] Update README.md by @eltociear in #1110
- [Feature] Adding support for LLM Compression Evaluation by @acylam in #1108
- [Fix] remove redundant pre-commit check by @Leymore in #891
- fix LightllmApi workers bug by @helloyongyang in #1113
- [Feature] Add mmlu prompt from simple_evals, openai by @Leymore in #1074
- [Feature] update drop dataset from openai simple eval by @kleinzcy in #1092
- add mgsm datasets by @Yggdrasill7D6 in #1081
- [Fix] Fix AGIEval chinese sets by @xu-song in #972
- S3Eval Dataset by @lfy79001 in #916
- [Feature] Add AceGPT-MMLUArabic benchmark by @JuhaoLiang1997 in #1099
- [Fix] fix links by @bittersweet1999 in #1120
- [Fix] Fix NeedleBench Summarizer Typo by @DseidLi in #1125
- [Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs by @acylam in #1123
- [Sync] Update accelerator by @Leymore in #1122
- [Fix] fix alpacaeval while add caching path by @bittersweet1999 in #1139
- [Fix] fix multiround by @bittersweet1999 in #1146
- [Fix] Fix Needlebench Summarizer by @DseidLi in #1143
- [Feature] Add huggingface apply_chat_template by @Leymore in #1098
- [Feat] Support dataset_suffix check for mixed configs by @xu-song in #973
- [Format] Add some config lints by @Leymore in #892
- [Sync] Sync with internal codes 2024.05.14 by @Leymore in #1156
- [Fix] fix arenahard summarizer by @bittersweet1999 in #1154
- [Fix] use ProcessPoolExecutor during mbpp eval by @Leymore in #1159
- [Fix] Update stop_words in huggingface_above_v4_33 by @Leymore in #1160
- Update accelerator by @liushz in #1152
- [Feat] enable HuggingFacewithChatTemplate with --accelerator via cli by @Leymore in #1163
- update test workflow by @zhulinJulia24 in #1167
- [Sync] Sync with internal codes 2024.05.17 by @Leymore in #1171
- add dependency in daily test workflow by @zhulinJulia24 in #1173
- [Sync] Sync with internal codes 2024.05.21.1 by @Leymore in #1175
- Update MathBench by @liushz in #1176
- [Fix] fix template by @bittersweet1999 in #1178
- Fix a bug in drop_gen.py by @kleinzcy in #1191
- [Fix] temporary files using tempfile by @yaoyingyy in #1186
- [Fix] add support for lmdeploy api judge by @bittersweet1999 in #1193
- [Fix] fix length by @bittersweet1999 in #1180
- support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks by @jxd0712 in #1190
- [Feat] Update charm summary by @Leymore in #1194
- Update accelerator by @liushz in #1195
- [Sync] Sync with internal codes 2024.05.28 by @Leymore in #1204
- Fix VLLM argument error by @xu-song in #1207
- [Docs] Remove --no-batch-padding and Use --hf-num-gpus by @Leymore in #1205
- [Fix] Rollback opt model configs by @Leymore in #1213
- Update running command readme by @Leymore in #1206
- [Sync] Sync with internal code 2024.05.30 by @Leymore in #1214
Full Changelog: 0.2.4...0.2.5