Achieved the highest score on ARC-AGI v1 (79.6%) and v2 (29.4%) using multi-agent collaboration with evolutionary test-time compute.
How media typically covers Jeremy Berman
Referenced in coverage
Jeremy Berman set a new state of the art on ARC-AGI, scoring 79.6% on v1 and 29.4% on v2. His evolutionary test-time compute architecture uses English instructions instead of Python and is 25× more efficient than OpenAI's o3.