Mathematical Reasoning
OpenAI o3: Breakthrough in mathematical reasoning and code generation
OpenAI's o3 model achieves breakthrough results on mathematical reasoning benchmarks, including 96.7% on AIME 2024 and 25.2% on FrontierMath.
Abstract
We introduce OpenAI o3, a new reasoning model that achieves breakthrough performance on mathematical reasoning, coding, and scientific problem-solving tasks.
Key Results
- AIME 2024: 96.7% (near-perfect)
- GPQA Diamond: 87.7% (PhD-level science)
- FrontierMath: 25.2% (unpublished, research-level problems)
- ARC-AGI: 75.7% in a low-compute setting, 87.5% in a high-compute setting (adaptive reasoning)
Capabilities
- Extended thinking: Longer inference-time computation
- Self-correction: Identifies and fixes reasoning errors
- Tool use: Integrated Python and search
- Multimodal: Image and text understanding
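OpenAI does not describe how o3 spends its extra inference-time computation, so the following is not o3's method. It is a generic illustration of why more computation at inference time can raise accuracy. The minimal Python sketch below implements self-consistency, which is plurality voting over independently sampled final answers. The solver is hypothetical, as are its 40% success rate, the "correct" answer "204", and its pool of 100 distinct wrong answers.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Sample n_samples final answers and return the most frequent one.

    Ties go to the answer seen first, because Counter.most_common
    orders equal counts by first insertion.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    counts = Counter(sample_answer() for _ in range(n_samples))
    return counts.most_common(1)[0][0]


def make_noisy_solver(p_correct: float, correct: str, rng: random.Random) -> Callable[[], str]:
    """Hypothetical stand-in for a model: returns `correct` with probability
    p_correct, otherwise one of 100 distinct wrong answers."""
    def solve() -> str:
        if rng.random() < p_correct:
            return correct
        return f"wrong-{rng.randrange(100)}"
    return solve


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is the hypothetical correct one."""
    rng = random.Random(seed)
    solver = make_noisy_solver(0.4, "204", rng)
    hits = sum(majority_vote(solver, n_samples) == "204" for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Accuracy should rise with the number of samples spent per question.
    for n in (1, 5, 15):
        print(f"samples={n:2d}  accuracy={accuracy(n):.3f}")
```

In this setup the correct answer appears in only 40% of samples, yet voting still helps because the wrong answers are scattered across many values. When a model's errors cluster on one wrong answer, voting gains much less.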
Safety
- Deliberative alignment
- Adversarial robustness testing
- Chain-of-thought monitoring
Availability
Research preview for safety testing. General availability pending.