Journalist at Unknown
Pointed out flaws in the 'Workslop' study via a LinkedIn post regarding AI research methodology
How this journalist typically writes
Directly quoted in these articles
OpenAI's GDPval benchmark shows Claude Opus 4.1 achieving expert-level performance on 47.6% of real-world professional tasks across 44 professions. Performance varies significantly across models, with GPT-4o performing worst at only 10%.
“Pointed out flaws in the 'Workslop' study via a LinkedIn post regarding AI research methodology”