Wednesday, April 8, 2026

Top AI News - April 08, 2026

XpertBench benchmark reveals that state-of-the-art LLMs achieve only ~55-66% success on expert-level professional tasks across domains like healthcare, finance, and legal, exposing a significant 'expert-gap' in current AI capabilities. The benchmark uses 1,346 expert-curated tasks with detailed rubrics to assess genuine professional competency beyond conventional benchmarks.

llm benchmarks evaluation expert-systems

Meta Pauses Work With Mercor After Data Breach Puts AI Industry Secrets at Risk

Wired

Meta halted work with AI data contractor Mercor after a security breach potentially exposed proprietary training data from major AI labs including OpenAI and Anthropic, prompting industrywide reassessment of vendor relationships and highlighting vulnerabilities in the AI supply chain.

security data-breach meta openai

Hugging Face Ultra-Scale Playbook

Hugging Face

Hugging Face published an in-depth educational playbook on training LLMs at scale, covering various parallelism strategies backed by 4,000 experiments across 512 GPUs. The resource stands out for its comprehensive treatment of distributed training techniques with interactive visualizations, exemplifying quality open-source AI education.

hugging-face llm open-source gpu

Claude Code v2.1.92 introduces Ultraplan

Reddit - ClaudeAI

The article on Claude Code v2.1.92 and its 'Ultraplan' feature was inaccessible due to network blocking, so the details of the update could not be analyzed.

claude anthropic coding-assistant

Per-Layer Embeddings: A simple explanation of the magic behind the small Gemma 4 models

Reddit - LocalLlama

Google's Gemma 4 introduces Per-Layer Embeddings (PLE) in small 2B/4B models for efficient edge deployment, alongside 31B dense and 26B MoE variants. This architecture maximizes parameter efficiency by giving each decoder layer its own token embeddings, trading static memory for on-device performance.

gemma google llm edge-ai
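To make the Per-Layer Embeddings idea concrete, here is a toy sketch of what "each decoder layer gets its own token embeddings" could look like. It is not Gemma 4's implementation: the sizes, the function names (`build_model`, `forward`, `embedding_parameter_count`), and the choice to simply add the per-layer vector to the hidden state are all assumptions made for illustration, and the attention and MLP steps are left out.

```python
import random

# Hypothetical toy sizes, NOT Gemma 4's real configuration.
VOCAB_SIZE = 8
HIDDEN_DIM = 4
NUM_LAYERS = 3


def make_table(rows, cols, rng):
    """Build a rows x cols lookup table of small random floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def build_model(seed=0):
    """Create one shared input table plus one extra table per decoder layer."""
    rng = random.Random(seed)
    return {
        "input_embedding": make_table(VOCAB_SIZE, HIDDEN_DIM, rng),
        "per_layer_embeddings": [
            make_table(VOCAB_SIZE, HIDDEN_DIM, rng) for _ in range(NUM_LAYERS)
        ],
    }


def forward(model, token_id):
    """Run one token through the toy stack, injecting a per-layer embedding."""
    hidden = list(model["input_embedding"][token_id])
    for table in model["per_layer_embeddings"]:
        layer_vec = table[token_id]  # plain row lookup by token id
        # Assumed combination rule: element-wise add to the hidden state.
        hidden = [h + e for h, e in zip(hidden, layer_vec)]
        # A real decoder layer would apply attention and an MLP here (omitted).
    return hidden


def embedding_parameter_count():
    """Count embedding parameters: the shared table plus one table per layer."""
    return VOCAB_SIZE * HIDDEN_DIM * (1 + NUM_LAYERS)


if __name__ == "__main__":
    model = build_model()
    print(forward(model, token_id=3))
    print(embedding_parameter_count())
```

In this sketch the per-layer tables multiply the embedding parameter count by `1 + NUM_LAYERS`, but each table is only indexed by token id, which is the memory-for-compute trade the summary describes.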

Keep Reading

• Sam Altman and OpenAI governance investigation (The New Yorker)
• Data Agent Benchmark
• Anthropic compute scale announcement (Anthropic)
• I Still Prefer MCP Over Skills
• Meta-Harness: End-to-End Optimization of Model Harnesses (arXiv)

Enjoyed this issue?

Get daily AI intel delivered to your inbox. No fluff, just the stories that matter.
