Tuesday, May 26, 2026

Pentagon shopping for Anthropic alternatives

Anthropic's having quite a week—their new Claude Mythos Preview is crushing exploit development benchmarks (yikes), they're closing a $30B+ funding round next week, and the Pentagon is actively testing their replacements. Meanwhile, an AI agent just autonomously discovered a reasoning algorithm that cuts inference compute by 70%, which is wild. Should we be more worried about the exploits or excited about the efficiency?

Anthropic

Anthropic's Claude Mythos Preview demonstrates unprecedented exploit development capabilities, successfully creating end-to-end exploits for real-world vulnerabilities across multiple benchmarks while vastly outperforming other frontier models. This capability advancement prompted Anthropic to deploy the model through a restricted program due to cybersecurity concerns, as such capabilities are expected to become widely available within 6-12 months.

anthropicclaudecybersecurityllm

Pentagon tests rival AI models in race to replace Anthropic

Bloomberg

The U.S. Pentagon is testing competing AI models to potentially replace Anthropic, signaling a shift in the Department of Defense's AI vendor strategy and the critical role of AI in military operations.

anthropicpentagondefensegovernment

Evaluating Multi-Agent Systems at Scale

OpenAI

OpenAI demonstrates a scalable evaluation framework for multi-agent systems that combines agent-level evals with macro-analysis using clustering and trace diagnosis. The approach identifies recurring failure patterns across thousands of runs and pinpoints which agents, handoffs, or tools require inspection, enabling teams to prioritize fixes by impact rather than reviewing traces individually.

agentsopenaievaluationmulti-agent

Anthropic to close over 30 billion round as soon as next week

Bloomberg

Anthropic is set to close a funding round valuing the company at over $30 billion next week, marking one of the largest AI investments to date and highlighting fierce competition among foundation model developers.

anthropicfundingllmfoundation-models

Claude Code autonomously discovers a reasoning algorithm cutting inference compute 70%

arXiv

AutoTTS enables AI agents to automatically discover optimal test-time scaling strategies for LLMs, achieving 70% inference compute reduction while improving accuracy. The framework shifts research focus from manual heuristic design to creating environments where AI can autonomously optimize its own reasoning patterns.

llmtest-time-computeagentsinference-optimization