Release v0.5.0 · stanford-crfm/helm

The --run-specs flag was renamed to --run-entries (#2404)
The run_specs*.conf files were renamed to run_entries*.conf (#2430)
The model_metadata field was removed from schema*.yaml files (#2195)
The helm.proxy.clients package was moved to helm.clients (#2413)
The helm.proxy.tokenizers package was moved to helm.tokenizers (#2403)
The frontend only supports JSON output produced by helm-summarize at version 0.3.0 or newer (#2455)
The Sequence class was renamed to GeneratedOutput (#2551)
The black linter was upgraded from 22.10.0 to 24.3.0, which produces different output - run pip install --upgrade black==24.3.0 to upgrade this dependency (#2545)
The anthropic dependency was upgraded from anthropic~=0.2.5 to anthropic~=0.17 - run pip install --upgrade anthropic~=0.17 to upgrade this dependency (#2432)
The openai dependency was upgraded from openai~=0.27.8 to openai~=1.0 - run pip install --upgrade openai~=1.0 to upgrade this dependency (#2384)
- The SQLite cache is not compatible across this dependency upgrade - if you encounter a ModuleNotFoundError: No module named 'openai.openai_object' error after upgrading openai, you will have to delete your old OpenAI SQLite cache (e.g. by running rm prod_env/cache/openai.sqlite)
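
The file and package renames above can be applied to a local checkout with a small script. The following is a minimal sketch, not an official HELM migration tool: the function names rewrite_imports and rename_conf_files are hypothetical, and it assumes your own code refers to the old package paths literally. It does not handle the Sequence to GeneratedOutput rename, since the bare name Sequence is too easily confused with typing.Sequence.

```python
import re
from pathlib import Path
from typing import List

# Old -> new package prefixes, taken from the breaking changes listed above.
MODULE_RENAMES = {
    "helm.proxy.clients": "helm.clients",
    "helm.proxy.tokenizers": "helm.tokenizers",
}


def rewrite_imports(source: str) -> str:
    """Return source text with the old package prefixes replaced (hypothetical helper)."""
    for old, new in MODULE_RENAMES.items():
        # Word boundaries avoid touching names that merely contain the old prefix.
        source = re.sub(r"\b" + re.escape(old) + r"\b", new, source)
    return source


def rename_conf_files(directory: Path) -> List[Path]:
    """Rename run_specs*.conf to run_entries*.conf in one directory (hypothetical helper)."""
    renamed = []
    for path in sorted(directory.glob("run_specs*.conf")):
        target = path.with_name(path.name.replace("run_specs", "run_entries", 1))
        if target.exists():
            # Never overwrite a file that already has the new name.
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

Run rewrite_imports over the text of each of your Python files and rename_conf_files over each directory that holds run_specs*.conf files, then review the resulting diff before committing.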

Added OpenAI gpt-3.5-turbo-1106, gpt-3.5-turbo-0125, gpt-4-vision-preview, gpt-4-0125-preview, and gpt-3.5-turbo-instruct (#2189, #2295, #2376, #2400)
Added Google Gemini 1.0, Gemini 1.5, and Gemini Vision (#2186, #2189, #2561)
Improved handling of content blocking in the Vertex AI client (#2546, #2313)
Added Claude 3 (#2432, #2440, #2536)
Added Mistral Small, Medium and Large (#2307, #2333, #2399)
Added Mixtral 8x7B Instruct and 8x22B (#2416, #2562)
Added Luminous Multimodal (#2189)
Added LLaVA and BakLLaVA (#2234)
Added Phi-2 (#2338)
Added Qwen1.5 (#2338, #2369)
Added Qwen VL and VL Chat (#2428)
Added Amazon Titan (#2165)
Added Google Gemma (#2397)
Added OpenFlamingo (#2237)
Removed logprobs from models hosted on Together (#2325)
Added support for vLLM (#2402)
Added DeepSeek LLM 67B Chat (#2563)
Added Llama 3 (#2579)
Added DBRX Instruct (#2585)

Added support for text-to-image models (#1939)
Refactored the Metric class structure (#2170, #2171, #2218)
Fixed bug in computing general metrics (#2172)
Added a --disable-cache flag to disable caching in helm-run (#2143)
Added a --schema-path flag to support user-provided schema.yaml files in helm-summarize (#2520)

Switched to the new React frontend for local development by default (#2251)
Added support for displaying images (#2371)
Made various improvements to project and version dropdown menus (#2272, #2401, #2458)
Made row and column headers sticky in leaderboard tables (#2273, #2275)

Lite v1.1.0
- Added results for Phi-2 and Mistral Medium
Lite v1.2.0
- Added results for Llama 3, Mixtral 8x22B, OLMo, Qwen1.5, and Gemma
HEIM v1.1.0
- Added results for Adobe GigaGAN and DeepFloyd IF
Instruct v1.0.0
- Initial release with results for OpenAI GPT-4, OpenAI GPT-3.5 Turbo, Anthropic Claude v1.3, Cohere Command beta
MMLU v1.0.0
- Initial release with 22 models
MMLU v1.1.0
- Added results for Llama 3, Mixtral 8x22B, OLMo, and Qwen1.5 (32B)

Thank you to the following contributors for your work on this HELM release!
