Mathematical Reasoning

OpenAI o3: Breakthrough in mathematical reasoning and code generation

OpenAI's o3 model achieves breakthrough performance on mathematical reasoning benchmarks, including 96.7% on AIME 2024 and 25.2% on FrontierMath.

Abstract

We introduce OpenAI o3, a new reasoning model that achieves breakthrough performance on mathematical reasoning, coding, and scientific problem-solving tasks.

Key Results

AIME 2024: 96.7% (near-perfect)
GPQA Diamond: 87.7% (PhD-level science)
FrontierMath: 25.2% (research-level math problems)
ARC-AGI: 75.7% (low compute) to 87.5% (high compute) (adaptive reasoning)
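
To put the AIME percentage in concrete terms, the short Python sketch below converts it to a problem count. It assumes the score is taken over the 30 problems of AIME I and AIME II combined (15 each); the denominator is not stated in this summary, so treat `TOTAL_PROBLEMS` as an assumption rather than a reported figure.

```python
# Assumption: the benchmark covers AIME I and AIME II (15 problems each).
TOTAL_PROBLEMS = 30


def solved_from_accuracy(accuracy_percent: float, total: int) -> int:
    """Return the whole number of problems closest to the reported accuracy."""
    return round(accuracy_percent / 100 * total)


solved = solved_from_accuracy(96.7, TOTAL_PROBLEMS)
recomputed = round(100 * solved / TOTAL_PROBLEMS, 1)

print(f"{solved} of {TOTAL_PROBLEMS} problems, which is {recomputed}%")
```

Under that assumption, 96.7% corresponds to 29 of 30 problems, which is why the result is described as near-perfect.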

Capabilities

Extended thinking: Longer inference-time computation
Self-correction: Identifies and fixes reasoning errors
Tool use: Integrated Python and search
Multimodal: Image and text understanding
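
The self-correction and tool-use items can be pictured as a propose-and-check loop: a draft answer is tested by running code, and rejected drafts are replaced by the next attempt. The Python sketch below is a toy illustration only, not a description of how o3 works internally; `solve_with_verification`, `check`, and the list of drafts are all hypothetical names and data chosen for the example.

```python
from typing import Callable, Iterable, Optional


def solve_with_verification(
    candidates: Iterable[int], verify: Callable[[int], bool]
) -> Optional[int]:
    """Return the first candidate that passes the check, or None if none do."""
    for answer in candidates:
        if verify(answer):
            return answer
    return None


def check(n: int) -> bool:
    """Toy problem: find the positive integer n with n * (n + 1) == 156."""
    return n > 0 and n * (n + 1) == 156


# Hypothetical draft answers; the first two are deliberately wrong.
drafts = [13, 11, 12]
result = solve_with_verification(drafts, check)
print(result)
```

The point of the sketch is the separation of roles: proposing an answer is cheap and fallible, while the executable check decides which draft survives.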

Safety

Deliberative alignment
Adversarial robustness testing
Chain-of-thought monitoring

Availability

Available as a research preview for safety testing. General availability is pending.