| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| 0.5.0 source code.tar.gz | 2025-09-01 | 1.8 MB | |
| 0.5.0 source code.zip | 2025-09-01 | 3.9 MB | |
| README.md | 2025-09-01 | 4.2 kB | |
| Totals: 3 Items | | 5.8 MB | 0 |
OpenCompass v0.5.0 Release Notes
Highlights
- Comprehensive Scientific Benchmarks: Integrated 10+ specialized datasets (MedXpertQA, ClimaQA, SmolInstruct, etc.) covering scientific fields such as chemistry, physics, biology, and earth sciences.
- Cascade Evaluator: Added support for cascading evaluation methods, from rules to LLM judgments.
- New Runner: Support for the Rjob Runner is now complete.
- OpenAISDK Streaming: Provided a more stable OpenAI API method.
- New Evaluation Examples: Published the real-time evaluation config of the CompassAcademic Leaderboard and the Intern-S1 related benchmark evaluation config.
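To make the "rules to LLM judgments" idea concrete, here is a minimal, library-independent sketch of a cascade: a cheap rule runs first, and only answers the rule rejects are passed to a judge. The names `rule_check`, `cascade_evaluate`, and `fake_judge` are hypothetical and are not OpenCompass APIs. The judge is a plain function standing in for an LLM call, so this illustrates the concept only and may differ from how the Cascade Evaluator in #1992 actually behaves.

```python
from typing import Callable, Dict, Union


def rule_check(prediction: str, reference: str) -> bool:
    """Cheap rule stage: case-insensitive exact match after trimming."""
    return prediction.strip().lower() == reference.strip().lower()


def cascade_evaluate(
    prediction: str,
    reference: str,
    llm_judge: Callable[[str, str], bool],
) -> Dict[str, Union[bool, str]]:
    """Run the rule first; fall back to the judge only when the rule fails."""
    if rule_check(prediction, reference):
        return {"correct": True, "stage": "rule"}
    # The rule rejected the answer, so escalate to the (more expensive) judge.
    return {"correct": llm_judge(prediction, reference), "stage": "llm_judge"}


def fake_judge(prediction: str, reference: str) -> bool:
    """Hypothetical stand-in for an LLM call: accept if the reference
    appears anywhere in the prediction."""
    return reference.strip().lower() in prediction.lower()


results = [
    cascade_evaluate("Paris", "paris", fake_judge),
    cascade_evaluate("The capital is Paris.", "Paris", fake_judge),
    cascade_evaluate("Lyon", "Paris", fake_judge),
]
for r in results:
    print(r)
```

In this sketch the first answer is settled by the rule, while the second and third are escalated to the judge. The point of the ordering is that the costly judge is only consulted for answers the rule could not confirm.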
New Features
- Cascade Evaluator (#1992)
- Rjob Runner (#2144)
- OpenAISDK Streaming (#2208)
- Evaluation Example for CompassAcademic Leaderboard (#2202)
- Evaluation Example for Intern-S1 and Scientific Benchmarks (#2220)
- Many new scientific datasets:
  - MedXpertQA for expert-level medical knowledge evaluation (#2002)
  - ClimaQA for climate question evaluation (#2017)
  - HealthBench for better measuring capabilities of AI systems for health (#2099)
  - ProteinLMBench for protein-related tasks (#2064) ...
Documentation
- Fixed 404 links between Chinese/English docs (#2001)
- Added CompassAcademic Leaderboard task tutorial (#2202)
- Added Intern-S1 evaluation task tutorial (#2220)
- Fixed formatting problems on the dataset statistics page (#2170)
- Aligned the NIAH CLI command guide with the actual CLI argument parser (#2194)
- Set correct paths for the examples (#2198)
Bug Fixes
- Fixed a comparison error in base_evaluator (#2010)
- Fixed OpenICL Math Evaluator Config (#2007)
- Added Error Case for content filter (#2167)
- Fixed the OpenAI SDK to adapt to gpt-5 (#2236)
- Fixed dataset repetition by concatenating (#2039)
- Concatenated OpenAISDK reasoning content (#2041)
Enhancements and Refactors
Infrastructure Refactors:
- Set dump-eval-details as default behavior (#1999)
- Refactored the openicl eval task (#1990)
- Added openai_extra_kwargs for API customization (#2210)
CI/CD Improvements:
- Fixed baseline score (#2000)
- Updated baseline for kernel change of vllm and lmdeploy (#2011)
- Updated baseline and fixed lmdeploy version (#2098)
- Added check rule (#2101)
- Updated testcases' baseline (#2184) ...
Welcome New Contributors
A warm welcome to our newest contributors:
- @Yejin0111 for MedXpertQA clinical dataset (#2002)
- @smgjch for matbench development (#2021)
- @taolinzhang for rewardbench dataset (#2029)
- @xiexinch for fixing lawbench evaluation (#2037)
- @xuxuxuxuxuxjh for ClinicBench, PubMedQA and ScienceQA datasets (#2061)
- @mar-cry for NEJM AI benchmark (#2063)
- @bio-mlhui for CARDBiomedBench dataset (#2071)
- @tchenglv520 for Lifescience subset support for MMLU & SciEval (#2059)
- @Flaick for MMLU Pro Biomedical version support (#2081)
- @yuehua-s for o4-mini model (#2083)
- @kkscilife for adding CI check rule (#2101)
- @yusun-nlp for SmolInstruct dataset (#2127)
- @soki123 for SRbench dataset (#2105)
- @suencgo for PHYBench dataset (#2125)
- @Zhouzone for updating Earth Silver benchmark (#2140)
- @uyzhang for R-Bench dataset (ICML 2025) (#2091)
- @f14-bertolotti for stabilizing MBPP evaluation (#2111)
- @fly2tomato for debugging Rjob runner (#2171)
- @debuggingworld for fixing Qwen3 model config field error (#2152)
- @blueternalness for aligning NIAH CLI command guide (#2194)
- @KADCA21 for BlueLM-2.5 API (#2193)
- @dbinthesky for KCLE feature (#2224)
- @FarongWen for EESE dataset and configs (#2223)
- @mazihan880 for CodeCompass dataset and configs (#2214)
Full Changelog: https://github.com/open-compass/opencompass/compare/0.4.2...0.5.0
Thank you for using OpenCompass! These updates empower deeper insights and more reliable evaluations. Keep exploring, and stay tuned for future innovations!