
MMLU

Category: reasoning
Score unit: %
Higher is better: yes
Website: official

Massive Multitask Language Understanding (MMLU) is a multiple-choice benchmark covering 57 academic subjects.

Leaderboard

#  Model              Provider         Score (%)  Evaluated  Source
1  GPT-4o             OpenAI           88.7       —
2  Claude 3.5 Sonnet  Anthropic        88.7       —
3  Gemini 2.0 Flash   Google DeepMind  87.0       —
4  Llama 3.3 70B      Meta             86.0       —
5  Mistral Large 2    Mistral AI       84.0       —

Wikibench: community-edited AI benchmark data. Content licensed CC BY-SA 4.0.