openbench is an open-source, provider-agnostic evaluation infrastructure for running standardized, reproducible benchmarks on large language models (LLMs), so that models from different providers can be compared fairly. It bundles more than 30 evaluation suites covering knowledge, reasoning, math, code, science, reading comprehension, long-context recall, graph reasoning, and more, so users don't need to assemble disparate datasets themselves. From the command line (e.g. bench eval <benchmark> --model <model-id>), you can evaluate any model served by Groq or another supported provider (OpenAI, Anthropic, HuggingFace, local models, etc.). openbench also supports private, local evaluations: you can plug in your own benchmarks or data (e.g. internal test suites, domain-specific tasks) and evaluate models while keeping that data private.
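The command form quoted above, bench eval <benchmark> --model <model-id>, can also be driven from a script. The following is a minimal sketch, assuming the bench executable is installed and on PATH. The helper names build_eval_command and run_eval are hypothetical, and the benchmark name and model ID are placeholders rather than values taken from this page.

```python
import shlex
import shutil
import subprocess


def build_eval_command(benchmark: str, model_id: str) -> list[str]:
    """Return the argv for `bench eval <benchmark> --model <model-id>`."""
    if not benchmark or not model_id:
        raise ValueError("benchmark and model_id must be non-empty")
    return ["bench", "eval", benchmark, "--model", model_id]


def run_eval(benchmark: str, model_id: str) -> int:
    """Run the evaluation and return the process exit code.

    Assumes the `bench` CLI is installed; raises if it is not on PATH.
    """
    if shutil.which("bench") is None:
        raise FileNotFoundError("the `bench` CLI is not installed or not on PATH")
    completed = subprocess.run(build_eval_command(benchmark, model_id), check=False)
    return completed.returncode


if __name__ == "__main__":
    # "mmlu" and "groq/example-model" are placeholder values for illustration.
    print(shlex.join(build_eval_command("mmlu", "groq/example-model")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a model ID contains characters such as slashes or colons.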

Features

  • 30+ built-in benchmark suites spanning knowledge, math, reasoning, code, science, graph tasks and more
  • Provider-agnostic: works with Groq, OpenAI, Anthropic, HuggingFace, local models, and other LLM providers
  • Simple CLI commands for listing, describing, and evaluating benchmarks (bench list, bench describe, bench eval, etc.)
  • Support for custom/local benchmarks so users can evaluate domain-specific tasks privately
  • Consistent scoring and result logging for reproducible, comparable evaluation outcomes
  • Extensible architecture that simplifies adding new benchmarks or evaluation metrics
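
To make the custom-benchmark and consistent-scoring features concrete, here is a rough sketch of what a small private test suite with one fixed scoring rule could look like. It does not use openbench's own API. The suite contents, the stub model, and the function names are all hypothetical, and exact-match accuracy is only one possible metric.

```python
from typing import Callable

# Hypothetical in-house test cases; a real suite would be loaded from a private file.
LOCAL_SUITE = [
    {"prompt": "2 + 2 =", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
    {"prompt": "Opposite of hot?", "expected": "cold"},
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not affect scoring."""
    return " ".join(text.strip().lower().split())


def exact_match_accuracy(model: Callable[[str], str], suite: list[dict]) -> float:
    """Score a model with the same rule for every case: normalized exact match."""
    if not suite:
        raise ValueError("suite must contain at least one case")
    correct = sum(
        normalize(model(case["prompt"])) == normalize(case["expected"])
        for case in suite
    )
    return correct / len(suite)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; answers two of the three cases correctly."""
    answers = {
        "2 + 2 =": "4",
        "Capital of France?": " paris ",
        "Opposite of hot?": "warm",
    }
    return answers.get(prompt, "")


if __name__ == "__main__":
    print(f"accuracy: {exact_match_accuracy(stub_model, LOCAL_SUITE):.3f}")
```

Because every model is scored by the same function on the same cases, results from different providers remain directly comparable, which is the property the feature list describes.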

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

2025-12-04