Claude 3.5 Sonnet

Mid-tier Anthropic model.

Benchmark results

Benchmark           Category    Score   Verified  Source
Chatbot Arena       general     1271    yes
HumanEval           coding      92.0%   yes
MMLU                reasoning   88.7%   yes
MMMU                multimodal  70.4%   yes
SWE-bench Verified  coding      49.0%   yes
