Wikibench

A community-edited reference of AI model benchmark results. Browse models, benchmarks, and providers. Compare any two models side by side, or browse the leaderboards.

Benchmark coverage

Recorded results by benchmark category.

  • coding: 8
  • reasoning: 7
  • math: 5
  • multimodal: 4
  • general: 3

Models by provider

Catalogued models for the most represented providers.

  • Anthropic: 2
  • OpenAI: 2
  • Google DeepMind: 1
  • Meta: 1
  • Mistral AI: 1