GPQA Diamond

Category: reasoning
Score unit: %
Higher is better: yes

Graduate-level science questions curated by domain experts.

Leaderboard

#	Model	Provider	%	Evaluated	Source
1	Claude Opus 4.7	Anthropic	83.3%	—
2	o1	OpenAI	78.0%	—

Wikibench: community-edited AI benchmark data. Content licensed under CC BY-SA 4.0.