MMLU

Massive Multitask Language Understanding: multiple-choice questions across 57 academic subjects.

Leaderboard

#	Model	Provider	Accuracy	Evaluated
1	GPT-4o	OpenAI	88.7%	—
2	Claude 3.5 Sonnet	Anthropic	88.7%	—
3	Gemini 2.0 Flash	Google DeepMind	87.0%	—
4	Llama 3.3 70B	Meta	86.0%	—
5	Mistral Large 2	Mistral AI	84.0%	—