OpenCompass v0.2.4

@liushz released this 09 Apr 10:06
· 288 commits to main since this release
b39f501

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!

🌟 Highlights

  • Added support for several datasets, including QuALITY, APPS, and TACO.
  • Multi-model judging for subjective evaluation.
  • Bug fixes and improvements in configurations and documentation.

🚀 New Features

🌐 General

  1. Feature #963 - Add support for the APPS dataset.
  2. Feature #966 - Add support for the TACO dataset.
  3. Feature #976 - Add the implementation of the QuALITY dataset.
  4. Feature #984 - Add support for setting prediction paths.
  5. Feature #1006 - Support alpacaeval_v2.
  6. Feature #1016 - Add multi-model judge.
  7. Feature #1019 - Add ATC Choice Version.

📖 Documentation

  1. Updates docs #1015 - General documentation updates and improvements.

🐛 Bug Fixes

  1. Fix #964 - Fix the config's name of deepseek-coder.
  2. Fix #890 - Update links and link checkers.
  3. Fix #977 - Fix a bug in internlm2 series configs.
  4. Fix #975 - Fix documentation issues.
  5. Fix #992 - Fix running issues in turbomind_tis.
  6. Fix #994 - Change status to list in base.py.
  7. Fix #995, Fix #1020 - Quick fixes and refactors for configs.

⚙ Enhancements and Refactors

  1. Modify requirements/runtime.txt #983 - Relax the numpy requirement from numpy==1.23.4 to numpy>=1.23.4.
  2. Update Needlebench and configs #986 - Enhancements in Needlebench configurations.
  3. Simplify Needlebench summarizer #1024 - Streamline the Needlebench summarizer for better efficiency.
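
The numpy change in #983 replaces an exact pin (numpy==1.23.4) with a lower bound (numpy>=1.23.4). The sketch below shows the practical difference under a simplifying assumption: versions are plain dotted numbers. The `parse` and `satisfies` helpers are hypothetical and written only for this illustration; they are not part of OpenCompass or pip, and real installers apply the full PEP 440 rules.

```python
# Illustrative only: a simplified comparison of the two requirement styles.
# Handles plain dotted numeric versions such as "1.23.4", nothing more.

def parse(version: str) -> tuple:
    """Turn '1.23.4' into (1, 23, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, operator: str, required: str) -> bool:
    """Check an installed version against a '==' or '>=' requirement."""
    if operator == "==":
        return parse(installed) == parse(required)
    if operator == ">=":
        return parse(installed) >= parse(required)
    raise ValueError(f"unsupported operator: {operator}")


# The old pin accepts exactly one version; the new bound also accepts newer ones.
old_ok = satisfies("1.26.0", "==", "1.23.4")
new_ok = satisfies("1.26.0", ">=", "1.23.4")
print(old_ok, new_ok)
```

With the lower bound, an environment that already has a newer numpy installed no longer conflicts with this requirement, while versions older than 1.23.4 are still rejected.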

🎉 Welcome New Contributors

🔗 Full Change Logs

[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in #964
[Fix] Update links and link checkers by @Leymore in #890
[Feat] support apps by @Connor-Shen in #963
fix doc problem by @seanzhang-zhichen in #975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in #977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in #976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in #983
[Feature] add support for set prediction path by @bittersweet1999 in #984
[Feat] Support TACO by @Connor-Shen in #966
[Feature] update apps by @Connor-Shen in #985
[Fix] update apps/taco by @Connor-Shen in #988
[Feature] add one script for subjective by @bittersweet1999 in #993
Fix running issues in turbomind_tis by @ispobock in #992
[Fix] base.py change status into list by @Chaseldot in #994
[Fix] quick fix for configs by @bittersweet1999 in #995
[Feature] update needlebench and configs by @DseidLi in #986
[Feature] support alpacaeval_v2 by @bittersweet1999 in #1006
updates docs by @Y0oMu in #1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in #1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in #1020
[Feature] Add ATC Choice Version by @DseidLi in #1019
[Fix] Simplify needlebench summarizer by @DseidLi in #1024

For a detailed overview of all changes, check out our Full Changelog.