OpenCompass v0.2.2
Welcome to OpenCompass v0.2.2, a release brimming with new features, essential fixes, and significant improvements across the board. With a focus on enhancing functionality and expanding dataset support, this update underscores our commitment to providing a robust platform for our users.
🌟 Highlights:
- Broadened Dataset Support: Introduction of diverse datasets such as T-Eval, CIBench, IFEval, NPHardEval, and more, broadening the horizons for research and evaluation.
- API Integrations and Updates: New support for APIs like Nanbeige and updates to existing ones such as Zhipu and Sensetime, enhancing model interaction capabilities.
- Dataset Collection Release: An integrated dataset collection is available in 0.2.2.rc1. Datasets used in the OpenCompass 2.0 leaderboard are NOT included in this collection.
Dive into what's new and improved:
🌟 New Features:
- 📦 Datasets Expansion:
- 🛠 API and Model Enhancements:
- 📖 Documentation and CI Enhancements:
🐛 Bug Fixes:
- Various fixes have been applied to address issues across datasets, evaluators, and configurations, ensuring a smoother experience for all users (#787, #788, #789).
🎉 Welcome New Contributors:
- We're excited to welcome our new contributors: @notoschord, @zhulinJulia24, @QipengGuo, @RangiLyu, @del-zhenwu, and @hailsham. Thank you for your valuable contributions!
🔗 Full Changelog
- Dev by @xmshi-trio in #779
- [Fix] add temperature in alles by @bittersweet1999 in #787
- [Feature] Add support of Nanbeige API by @notoschord in #786
- [Fix] Update gsm8k agent prompt by @tonysy in #788
- [Fix] hot fix for requirements by @yingfhu in #789
- [Feature] Add configs for creationbench by @bittersweet1999 in #791
- Add test runner, one case, daily and pr trigger by @zhulinJulia24 in #751
- [Fix] reorganize subject files by @bittersweet1999 in #801
- Update evaluate turbomind by @RunningLeon in #804
- Added support for multi-needle testing in needle-in-a-haystack test by @DseidLi in #802
- [Sync] Add InternLM2 Keyset Evaluation Demo by @Leymore in #807
- [Doc] Update news by @Leymore in #810
- Fix turbomind and update docs by @RunningLeon in #808
- fix configs template for yi_6b_200k model by @DseidLi in #815
- Test runner update - split step, change schedule time and disable hf cache by @zhulinJulia24 in #814
- Add LightllmApi KeyError log & Update doc by @helloyongyang in #816
- Update cdme config and evaluator by @QipengGuo in #812
- Update hf_internlm2_chat template by @RangiLyu in #823
- [Feature] add Compass arena by @bittersweet1999 in #828
- [Fix] fix strings by @bittersweet1999 in #833
- [Feature] Add IFEval by @jingmingzhuo in #813
- [Feature] add mtbench by @bittersweet1999 in #829
- [Feature] Update API implementation by @tonysy in #834
- [Doc] Update FAQ & Contribution Guide by @Leymore in #830
- add fail notify by @zhulinJulia24 in #836
- [Sync] Updata dataset cfg for InternMath by @Leymore in #837
- [Fix] fix corev2 by @bittersweet1999 in #838
- [Feat] minor update agent related by @yingfhu in #839
- [Update] Update Sensetime API by @tonysy in #844
- [Fix] Update MedBench by @xmshi-trio in #845
- [Fix] Fix acc of IFEval by @jingmingzhuo in #849
- [Fix] Update Zhipu API and Fix issue min_out_len issue of API models by @tonysy in #847
- Create link-check.yml by @del-zhenwu in #853
- Update runtime.txt to fix rouge_chinese bugs. by @QipengGuo in #803
- [Fix] fix compass arena by @bittersweet1999 in #854
- add end_str for turbomind by @RunningLeon in #859
- add daily test case by @zhulinJulia24 in #864
- [Feature] support alpacaeval by @bittersweet1999 in #809
- [Fix] Fix error in gsm8k evaluator by @yanyc428 in #782
- [CI] Update github workflow image by @Leymore in #874
- Update daily test by @zhulinJulia24 in #871
- support NPHardEval by @Skyfall-xzz in #835
- [Fix] add do sample demo for subjective dataset by @bittersweet1999 in #873
- [Sync] Sync with internal codes 2024.02.05 by @Leymore in #876
- [Fix] hotfix for mtbench by @bittersweet1999 in #877
- fix lawbench 2-1 f0.5 score calculation bug by @Yggdrasill7D6 in #795
- [feat] support multipl-e by @Connor-Shen in #846
- fix bug of gsm8k_postprocess by @hailsham in #863
- [Feature] add global retriever config by @hailsham in #842
For a full list of updates, visit our Full Changelog.
Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉
Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.