GPQA Diamond

Category: reasoning
Score unit: %
Higher is better: yes

Graduate-level science questions, expert-curated.

Leaderboard

#  Model            Provider   %      Evaluated  Source
1  Claude Opus 4.7  Anthropic  83.3%  —
2  o1               OpenAI     78.0%  —

Wikibench: community-edited AI benchmark data. Content licensed CC BY-SA 4.0.