Friday, May 1, 2026
OpenAI banned goblins (seriously)
OpenAI's GPT-5.5 system prompt explicitly bans talking about goblins to suppress a bizarre emergent quirk (wild), while Apple dropped LaDiR research showing latent diffusion can push LLM reasoning past autoregressive chain-of-thought on math and planning tasks. Meanwhile, Hugging Face is sounding alarms that agent evals now cost $40K+ per run, effectively pricing academics out of holding frontier models accountable (yikes). Should we worry when only Big Tech can afford to test AI safety?
Top Stories
Ars Technica
OpenAI's GPT-5.5 system prompt contains explicit instructions to avoid mentioning goblins and similar creatures, pointing to an unusual emergent behavior in which the latest model brings up these creatures in unrelated conversations.
Apple Machine Learning Research
Apple researchers propose LaDiR, combining latent diffusion models with LLMs to enable iterative refinement and parallel exploration of reasoning paths, outperforming traditional autoregressive chain-of-thought methods on mathematical and planning tasks. This approach addresses limitations of sequential token generation by allowing models to revise reasoning holistically.
PyTorch
PyTorch's AutoSP is a compiler-based tool that automates sequence parallelism for long-context LLM training (100k+ tokens), eliminating complex manual code changes while maintaining performance parity with hand-written implementations and integrating seamlessly with DeepSpeed.
João Moura highlights AI agents capable of building themselves, indicating progress in autonomous, self-improving AI systems that could reduce human oversight in AI development.
Hugging Face
AI evaluation has become prohibitively expensive: agent benchmarks cost $40,000+ per run and reliability testing multiplies costs 8×, effectively excluding academic researchers and independent auditors from evaluating frontier systems. Unlike static benchmarks, which could be compressed 100×, agent and training-in-the-loop evaluations resist cost reduction, upending the traditional assumption that training dominates compute spend.
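For a sense of scale, here is a back-of-the-envelope sketch in Python using the figures above. It assumes the 8× reliability multiplier applies directly to the $40,000 per-run figure, which the item does not state outright, and it treats $40,000 as a floor. The function and constant names are ours, purely for illustration.

```python
# Figures quoted in the item above, treated as rough lower bounds.
COST_PER_RUN_USD = 40_000      # "$40,000+ per run"
RELIABILITY_MULTIPLIER = 8     # "reliability testing" multiplier (8x)


def reliability_cost(cost_per_run: int, multiplier: int) -> int:
    """Total cost if reliability testing multiplies a single run's cost.

    Assumption: the multiplier scales the per-run cost linearly.
    """
    return cost_per_run * multiplier


if __name__ == "__main__":
    total = reliability_cost(COST_PER_RUN_USD, RELIABILITY_MULTIPLIER)
    print(f"Single run:       ${COST_PER_RUN_USD:,}+")
    print(f"With reliability: ${total:,}+")
```

Under that assumption, one benchmark with reliability testing comes to at least $320,000.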