Evaluation harnesses, benchmark suites, and quality measurement tools.
Ranking metric: GitHub stars among selected evaluation and benchmark alternatives.
Ranked alternatives
Popularity-ranked entries linked back to graph nodes and deterministic sources.
| Rank | Component | Kind | Openness | Popularity metric | Tasks | Source |
|---|---|---|---|---|---|---|
| 1 | OpenAI Evals (`software:openai-evals`) | software | open source | 18,645 GitHub stars as of 2026-06-10 | evaluation | source |
| 2 | DeepEval (`software:deepeval`) | software | open source | 16,056 GitHub stars as of 2026-06-10 | evaluation | source |
| 3 | Ragas (`software:ragas`) | software | open source | 14,313 GitHub stars as of 2026-06-10 | evaluation, rag | source |
| 4 | LM Evaluation Harness (`software:lm-evaluation-harness`) | software | open source | 12,897 GitHub stars as of 2026-06-10 | evaluation | source |
| 5 | Prompt flow (`software:promptflow`) | software | open source | 11,144 GitHub stars as of 2026-06-10 | evaluation | source |
| 6 | OpenCompass (`software:opencompass`) | software | open source | 7,075 GitHub stars as of 2026-06-10 | evaluation | source |
| 7 | SWE-bench (`benchmark:swe-bench`) | benchmark | source available | 5,120 GitHub stars as of 2026-06-10 | agentic-coding, software-agents | source |
| 8 | MTEB (`benchmark:mteb`) | benchmark | open source | 3,295 GitHub stars as of 2026-06-10 | embeddings, semantic-search, rag | source |
| 9 | HELM (`software:helm`) | software | open source | 2,819 GitHub stars as of 2026-06-10 | evaluation | source |
| 10 | Lighteval (`software:lighteval`) | software | open source | 2,439 GitHub stars as of 2026-06-10 | evaluation | source |