Leaderboards
Compare models · 8 leaderboards · 5 categories · 27 ranked entries
coding
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 95.0% |
| 2 | o1 | 92.4% |
| 3 | Claude 3.5 Sonnet | 92.0% |
| 4 | Mistral Large 2 | 92.0% |
| 5 | GPT-4o | 90.2% |
| 6 | Llama 3.3 70B | 88.4% |

| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 74.5% |
| 2 | Claude 3.5 Sonnet | 49.0% |
general
| # | Model | Score |
|---|---|---|
| 1 | Gemini 2.0 Flash | 1356 |
| 2 | GPT-4o | 1287 |
| 3 | Claude 3.5 Sonnet | 1271 |
math
| # | Model | Score |
|---|---|---|
| 1 | o1 | 83.3% |

| # | Model | Score |
|---|---|---|
| 1 | o1 | 94.8% |
| 2 | Llama 3.3 70B | 77.0% |
| 3 | Mistral Large 2 | 76.9% |
| 4 | GPT-4o | 76.6% |
multimodal
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 76.1% |
| 2 | Gemini 2.0 Flash | 71.7% |
| 3 | Claude 3.5 Sonnet | 70.4% |
| 4 | GPT-4o | 69.1% |
reasoning
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 83.3% |
| 2 | o1 | 78.0% |

| # | Model | Score |
|---|---|---|
| 1 | GPT-4o | 88.7% |
| 2 | Claude 3.5 Sonnet | 88.7% |
| 3 | Gemini 2.0 Flash | 87.0% |
| 4 | Llama 3.3 70B | 86.0% |
| 5 | Mistral Large 2 | 84.0% |