Mathematical Reasoning

OpenAI o3: Breakthrough in mathematical reasoning and code generation

OpenAI's o3 model achieves breakthrough performance on mathematical reasoning benchmarks, including 96.7% on AIME 2024 and 25.2% on FrontierMath, a benchmark of research-level mathematics problems.

Abstract

We introduce OpenAI o3, a new reasoning model that achieves breakthrough performance on mathematical reasoning, coding, and scientific problem-solving tasks.

Key Results

  • AIME 2024: 96.7% (near-perfect)
  • GPQA Diamond: 87.7% (PhD-level science)
  • FrontierMath: 25.2% (novel, unpublished research-level problems)
  • ARC-AGI: 75.7% at low compute to 87.5% at high compute (adaptive reasoning on novel tasks)
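
To make the percentages above more concrete, the following minimal sketch converts an accuracy figure into an approximate problem count and computes the gap between the two ARC-AGI figures. The problem count of 30 for AIME 2024 is an assumption (AIME I and AIME II, 15 problems each); the source does not state how many problems were scored, and the helper name `solved_count` is hypothetical.

```python
def solved_count(accuracy_pct: float, n_problems: int) -> int:
    """Return the nearest whole number of problems implied by an accuracy percentage."""
    return round(accuracy_pct / 100 * n_problems)


# Assumption: AIME 2024 is scored over 30 problems (AIME I + AIME II).
AIME_2024_PROBLEMS = 30
aime_solved = solved_count(96.7, AIME_2024_PROBLEMS)

# Gap between the two reported ARC-AGI scores, in percentage points.
arc_gap_points = round(87.5 - 75.7, 1)

print(f"AIME 2024: about {aime_solved} of {AIME_2024_PROBLEMS} problems")
print(f"ARC-AGI gap: {arc_gap_points} percentage points")
```

Under that assumption, 96.7% corresponds to roughly one missed problem, which is why a single additional error would move the headline number by more than three points.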

Capabilities

  • Extended thinking: Longer inference-time computation
  • Self-correction: Identifies and fixes reasoning errors
  • Tool use: Integrated Python and search
  • Multimodal: Image and text understanding
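
As a rough illustration of what spending more inference-time computation can look like, the sketch below uses majority voting over repeated samples. This is a generic, well-known technique chosen for illustration; the source does not say that o3 works this way. The names `majority_vote` and `sample_answer` are hypothetical, and the sampler here is a fixed stub rather than a real model call.

```python
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers and return the most frequent one."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer() for _ in range(n_samples)]
    # most_common(1) returns a list holding the single (answer, count) pair
    # with the highest count.
    return Counter(answers).most_common(1)[0][0]


# Stub sampler standing in for a model: replays a fixed list of answers.
_fixed_answers = iter(["42", "41", "42", "42", "17"])
consensus = majority_vote(lambda: next(_fixed_answers), n_samples=5)
print(consensus)
```

More samples cost more compute per question but make the final answer less sensitive to any single faulty reasoning path.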

Safety

  • Deliberative alignment
  • Adversarial robustness testing
  • Chain-of-thought monitoring

Availability

Available as a research preview for safety testing. General availability is pending.