Researcher at Unknown
Author of the research paper on measuring long-horizon execution in LLMs
How media typically covers Arvindh Goel
Arvindh Goel as author
Short-task benchmarks create an illusion of diminishing returns in LLM scaling. Marginal gains in single-step accuracy compound into exponential improvements in long-horizon task execution, and larger models execute extended task sequences significantly more reliably.
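The compounding effect can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes, for illustration only, that every step succeeds independently with the same probability, so a task of H steps succeeds with probability p^H. The function name `horizon_length` and the 50% success threshold are hypothetical choices.

```python
import math


def horizon_length(step_accuracy: float, success_rate: float = 0.5) -> float:
    """Number of steps H at which the whole-task success probability equals success_rate.

    Assumption (illustrative, not from the source): each step succeeds
    independently with probability step_accuracy, so P(task) = step_accuracy ** H.
    Solving step_accuracy ** H = success_rate gives
    H = ln(success_rate) / ln(step_accuracy).
    """
    if not 0.0 < step_accuracy < 1.0:
        raise ValueError("step_accuracy must be strictly between 0 and 1")
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    return math.log(success_rate) / math.log(step_accuracy)


if __name__ == "__main__":
    # Small gains in per-step accuracy produce large gains in achievable task length.
    for p in (0.90, 0.99, 0.999):
        print(f"step accuracy {p:.3f} -> about {horizon_length(p):.1f} steps at 50% success")
```

Under this simplified model, raising single-step accuracy from 0.90 to 0.99 lengthens the achievable task roughly tenfold, which is why a benchmark that measures only single steps can make progress look flat.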