MATH

Category: math
Score unit: %
Higher is better: yes

MATH is a benchmark of competition mathematics problems.

Leaderboard

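| # | Model | Provider | Score (%) | Evaluated | Source |
|---|-------|----------|-----------|-----------|--------|
| 1 | o1 | OpenAI | 94.8 | — | — |
| 2 | Llama 3.3 70B | Meta | 77.0 | — | — |
| 3 | Mistral Large 2 | Mistral AI | 76.9 | — | — |
| 4 | GPT-4o | OpenAI | 76.6 | — | — |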

Wikibench: community-edited AI benchmark data. Content licensed CC BY-SA 4.0.