o1
Reasoning-tuned model.
Benchmark results
| Benchmark | Category | Score | Verified | Source |
|---|---|---|---|---|
| AIME 2024 | math | 83.3% | yes | |
| GPQA Diamond | reasoning | 78.0% | yes | |
| HumanEval | coding | 92.4% | yes | |
| MATH | math | 94.8% | yes | |