SWE-bench Verified
Real-world GitHub issue resolution.
Leaderboard
| # | Model | Provider | % Resolved | Evaluated | Source |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 74.5% | — | |
| 2 | Claude 3.5 Sonnet | Anthropic | 49.0% | — | |