SWE-bench Verified

Category: coding
Score unit: %
Higher is better: yes

Measures the percentage of real-world GitHub issues a model resolves.

Leaderboard

#  Model              Provider   %      Evaluated  Source
1  Claude Opus 4.7    Anthropic  74.5%             —
2  Claude 3.5 Sonnet  Anthropic  49.0%             —

Wikibench: community-edited AI benchmark data. Content licensed CC BY-SA 4.0.