Researcher at Google DeepMind
Co-author of the SimpleQA Verified benchmark paper
How media typically covers Lukas Haas
Lukas Haas as author
On SimpleQA Verified, an improved factuality benchmark for LLMs that measures parametric knowledge and hallucination mitigation, Gemini 2.5 Pro achieves a state-of-the-art F1 score of 55.6, outperforming GPT-5.