
Explained the FrontierMath benchmark for measuring AI mathematical reasoning capabilities and discussed AI's advancement in solving PhD-level math problems.
How media typically covers Greg Burnham
Greg Burnham as author
Benchmark scores across AI models are dominated by a single 'General Capability' dimension, with a secondary 'Claudiness' dimension that captures Claude's unique performance profile across agentic tasks, vision, and math.
“Author of "Benchmark Scores = General Capability + Claudiness" at Epoch AI”
Referenced in coverage
State-of-the-art AI models now solve over 40% of FrontierMath benchmark problems, up from 2% at the benchmark's launch, while Google DeepMind's Aletheia autonomously produced publishable PhD-level mathematics, creating the need for new, harder benchmarks.