MMLU
Massive Multitask Language Understanding: a multiple-choice benchmark spanning 57 academic subjects.
Leaderboard
| # | Model | Provider | Accuracy | Evaluated | Source |
|---|---|---|---|---|---|
| 1 | GPT-4o | OpenAI | 88.7% | — | |
| 2 | Claude 3.5 Sonnet | Anthropic | 88.7% | — | |
| 3 | Gemini 2.0 Flash | Google DeepMind | 87.0% | — | |
| 4 | Llama 3.3 70B | Meta | 86.0% | — | |
| 5 | Mistral Large 2 | Mistral AI | 84.0% | — | |