
Name	Modified	Size
0.5.0 source code.tar.gz	2025-09-01	1.8 MB
0.5.0 source code.zip	2025-09-01	3.9 MB
README.md	2025-09-01	4.2 kB
Totals: 3 Items		5.8 MB

OpenCompass v0.5.0 Release Notes

🌟 Highlights

✨ Comprehensive Scientific Benchmarks: Integrated 10+ specialized datasets (MedXpertQA, ClimaQA, SmolInstruct, etc.) covering scientific fields such as chemistry, physics, biology, and earth sciences.
✨ Cascade Evaluator: Added support for cascading evaluation, from rule-based checks to LLM judgments.
✨ New Runner: Support for the Rjob Runner is now complete.
✨ OpenAISDK Streaming: Provided a more stable way to call the OpenAI API.
✨ New Evaluation Examples: Published the real-time evaluation config for the CompassAcademic Leaderboard and the benchmark evaluation config related to Intern-S1.
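The cascade idea in the Cascade Evaluator highlight can be illustrated with a small, self-contained sketch: a cheap rule-based check runs first, and only the answers the rule rejects are escalated to an LLM judge. This is not OpenCompass's CascadeEvaluator API. The names `rule_check`, `cascade_evaluate`, `EvalResult`, and `stub_judge` are hypothetical, and the judge is a stub standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalResult:
    correct: bool
    stage: str  # which stage produced the verdict: "rule" or "llm"


def rule_check(prediction: str, reference: str) -> bool:
    """Hypothetical rule stage: case-insensitive exact match after trimming."""
    return prediction.strip().lower() == reference.strip().lower()


def cascade_evaluate(
    predictions: List[str],
    references: List[str],
    llm_judge: Callable[[str, str], bool],
) -> List[EvalResult]:
    """Run the cheap rule first; escalate only rule failures to the judge."""
    results = []
    for pred, ref in zip(predictions, references):
        if rule_check(pred, ref):
            results.append(EvalResult(correct=True, stage="rule"))
        else:
            # Assumption: the judge may accept answers the rule rejects,
            # such as paraphrases or answers wrapped in extra text.
            results.append(EvalResult(correct=llm_judge(pred, ref), stage="llm"))
    return results


def stub_judge(prediction: str, reference: str) -> bool:
    """Stand-in for an LLM judge: accepts predictions containing the reference."""
    return reference.strip().lower() in prediction.lower()


if __name__ == "__main__":
    preds = ["Paris", "The capital is Paris.", "Lyon"]
    refs = ["paris", "Paris", "Paris"]
    for result in cascade_evaluate(preds, refs, stub_judge):
        print(result)
```

The design point is cost: the rule stage settles the easy cases for free, so the (slow, paid) judge is only consulted for the remainder.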

🚀 New Features

🔧 Cascade Evaluator (#1992)
🔧 Rjob Runner (#2144)
🔧 OpenAISDK Streaming (#2208)
🔧 Evaluation Example for the CompassAcademic Leaderboard (#2202)
🔧 Evaluation Example for Intern-S1 and Scientific Benchmarks (#2220)
🔧 So Many New Scientific Datasets!

MedXpertQA for expert-level medical knowledge evaluation (#2002)
ClimaQA for climate question evaluation (#2017)
HealthBench for better measuring the health-related capabilities of AI systems (#2099)
ProteinLMBench for protein-related tasks (#2064) ...

📖 Documentation

📝 Fixed 404 links between Chinese/English docs (#2001)
📝 Added CompassAcademic Leaderboard task tutorial (#2202)
📝 Added Intern-S1 evaluation task tutorial (#2220)
📝 Fixed format problems on the dataset statistics page (#2170)
📝 Aligned the NIAH CLI command guide with the actual CLI argument parser (#2194)
📝 Set correct paths for the examples (#2198)

🐛 Bug Fixes

🔧 Fixed a comparison error in base_evaluator (#2010)
🔧 Fixed the OpenICL Math Evaluator config (#2007)
🔧 Added an error case for the content filter (#2167)
🔧 Fixed the OpenAI SDK to support gpt-5 (#2236)
🔧 Fixed dataset repeat by concatenating (#2039)
🔧 Concatenated OpenAISDK reasoning content (#2041)
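Streaming responses arrive as many small deltas that the client has to stitch back together, and the reasoning-content concatenation fix (#2041) concerns keeping a model's reasoning text as well as its final answer. The minimal sketch below shows that accumulation using plain dicts shaped loosely like OpenAI chat-completion stream chunks (`choices[0]["delta"]["content"]`). The `reasoning_content` delta field is an assumption (some OpenAI-compatible servers emit it; the official OpenAI API may not), and `concat_stream` is a hypothetical helper, not OpenCompass's implementation.

```python
from typing import Dict, Iterable, Tuple


def concat_stream(chunks: Iterable[Dict]) -> Tuple[str, str]:
    """Accumulate streamed deltas into (reasoning, answer) strings.

    Each chunk is assumed to look like
    {"choices": [{"delta": {"content": ..., "reasoning_content": ...}}]};
    either delta field may be missing or None.
    """
    reasoning_parts = []
    answer_parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing usage-only chunk with no choices
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


if __name__ == "__main__":
    fake_stream = [
        {"choices": [{"delta": {"reasoning_content": "2 + 2 "}}]},
        {"choices": [{"delta": {"reasoning_content": "is 4."}}]},
        {"choices": [{"delta": {"content": "The answer"}}]},
        {"choices": [{"delta": {"content": " is 4."}}]},
        {"choices": []},
    ]
    reasoning, answer = concat_stream(fake_stream)
    print(repr(reasoning))  # '2 + 2 is 4.'
    print(repr(answer))  # 'The answer is 4.'
```

Keeping the two strings separate lets a caller score only the answer while still logging the reasoning.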

⚙ Enhancements and Refactors

⚙ Infrastructure Refactors:

Made dump-eval-details the default behavior (#1999)
Refactored the OpenICL eval task (#1990)
Added openai_extra_kwargs for API customization (#2210)

⚙ CI/CD Improvements:

Fixed baseline score (#2000)
Updated baseline for kernel changes in vllm and lmdeploy (#2011)
Updated baseline and fixed lmdeploy version (#2098)
Added check rule (#2101)
Updated test cases' baselines (#2184) ...

🎉 Welcome New Contributors

A warm welcome to our newest contributors:

@Yejin0111 for MedXpertQA clinical dataset (#2002)
@smgjch for matbench development (#2021)
@taolinzhang for rewardbench dataset (#2029)
@xiexinch for fixing lawbench evaluation (#2037)
@xuxuxuxuxuxjh for ClinicBench, PubMedQA and ScienceQA datasets (#2061)
@mar-cry for NEJM AI benchmark (#2063)
@bio-mlhui for CARDBiomedBench dataset (#2071)
@tchenglv520 for Lifescience subset support for MMLU & SciEval (#2059)
@Flaick for MMLU Pro Biomedical version support (#2081)
@yuehua-s for o4-mini model (#2083)
@kkscilife for adding CI check rule (#2101)
@yusun-nlp for SmolInstruct dataset (#2127)
@soki123 for SRbench dataset (#2105)
@suencgo for PHYBench dataset (#2125)
@Zhouzone for updating Earth Silver benchmark (#2140)
@uyzhang for R-Bench dataset (ICML 2025) (#2091)
@f14-bertolotti for stabilizing MBPP evaluation (#2111)
@fly2tomato for debugging Rjob runner (#2171)
@debuggingworld for fixing Qwen3 model config field error (#2152)
@blueternalness for aligning NIAH CLI command guide (#2194)
@KADCA21 for BlueLM-2.5 API (#2193)
@dbinthesky for KCLE feature (#2224)
@FarongWen for EESE dataset and configs (#2223)
@mazihan880 for CodeCompass dataset and configs (#2214)

Full Changelog: https://github.com/open-compass/opencompass/compare/0.4.2...0.5.0

Thank you for using OpenCompass! These updates empower deeper insights and more reliable evaluations. Keep exploring, and stay tuned for future innovations! 🌟

Source: README.md, updated 2025-09-01

OpenCompass Files

OpenCompass is an LLM evaluation platform