SWE-bench Verified

Category: coding
Score unit: %
Higher is better: yes

Measures the percentage of real-world GitHub issues that a model resolves.

Leaderboard

#	Model	Provider	Score (%)	Evaluated	Source
1	Claude Opus 4.7	Anthropic	74.5%	—
2	Claude 3.5 Sonnet	Anthropic	49.0%	—

Wikibench: community-edited AI benchmark data. Content licensed CC BY-SA 4.0.