Researcher at Cambridge
Lead author of a research paper on long-horizon execution in large language models
How media typically covers Akshit Sinha
Akshit Sinha as author
Short-task benchmarks create an illusion of diminishing returns in LLM scaling: marginal gains in single-step accuracy compound into exponential improvements in long-horizon task execution. Larger models demonstrate significantly better execution capability across extended task sequences.
“Author of the research paper on measuring long horizon execution in LLMs”
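The compounding claim in the summary above can be illustrated with a small calculation. This is a minimal sketch, not the paper's own code: it assumes every step succeeds independently with the same probability, which is a simplification, and the function name `horizon_length` and the 50% target success rate are choices made here for illustration. Under that assumption a task of n steps succeeds with probability p^n, so the longest task completed at a target success rate s has length ln(s) / ln(p).

```python
import math


def horizon_length(step_accuracy: float, target_success: float = 0.5) -> float:
    """Longest task length (in steps) finished with probability >= target_success.

    Illustrative assumption: each step succeeds independently with the same
    probability, so an n-step task succeeds with probability step_accuracy ** n.
    """
    if not (0.0 < step_accuracy < 1.0 and 0.0 < target_success < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # Solve step_accuracy ** n = target_success for n.
    return math.log(target_success) / math.log(step_accuracy)


# Small gains in per-step accuracy translate into much longer feasible tasks.
for p in (0.90, 0.99, 0.999):
    print(f"step accuracy {p:.3f} -> about {horizon_length(p):.0f} steps")
```

Because the horizon depends on 1 / ln(p), each move of step accuracy closer to 1 lengthens the feasible task far more than the accuracy change itself suggests, which is why short-task scores alone can make progress look flat.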
Small language models appear to succeed on short tasks but fail rapidly on extended multi-step tasks because of execution errors and self-conditioning degradation. Scaling and sequential test-time compute significantly improve long-horizon task completion.
“Lead author of research paper on long horizon execution in large language models”