MATH

Category: math
Score unit: %
Higher is better: yes

Competition-level mathematics problems, scored as the percentage of problems answered correctly.

Leaderboard

#	Model	Provider	Score (%)	Evaluated	Source
1	o1	OpenAI	94.8%	—
2	Llama 3.3 70B	Meta	77.0%	—
3	Mistral Large 2	Mistral AI	76.9%	—
4	GPT-4o	OpenAI	76.6%	—
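
The ordering in the table above follows from the "Higher is better: yes" setting. As a minimal sketch (the `ROWS` list and the `rank` helper are hypothetical names introduced for illustration, not part of Wikibench), the ranking could be reproduced like this:

```python
# Leaderboard rows copied from the table above: (model, provider, score in %).
# Deliberately listed out of order so the sort has something to do.
ROWS = [
    ("GPT-4o", "OpenAI", 76.6),
    ("o1", "OpenAI", 94.8),
    ("Mistral Large 2", "Mistral AI", 76.9),
    ("Llama 3.3 70B", "Meta", 77.0),
]


def rank(rows, higher_is_better=True):
    """Return (rank, model, provider, score) tuples, best score first.

    `higher_is_better` mirrors the benchmark's metadata field; this helper
    is an illustrative assumption, not an actual Wikibench function.
    """
    ordered = sorted(rows, key=lambda row: row[2], reverse=higher_is_better)
    return [(position, *row) for position, row in enumerate(ordered, start=1)]


ranked = rank(ROWS)
for position, model, provider, score in ranked:
    # Tab-separated output, matching the layout of the table above.
    print(f"{position}\t{model}\t{provider}\t{score:.1f}%")
```

For a benchmark where lower scores are better, passing `higher_is_better=False` would sort ascending instead.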

Wikibench: community-edited AI benchmark data. Content licensed CC BY-SA 4.0.