Sunday, March 15, 2026
OpenAI shows how agents get shell access
OpenAI is shipping a security agent (Codex Security) and interactive visual modules for math and science concepts in ChatGPT. Most notably, it just pulled back the curtain on how its Responses API actually works: containerized environments that give agents shell access (wild). Meanwhile, Anthropic is answering with its own Claude Marketplace for enterprises. Would you trust AI agents running with shell access in containers?
Top Stories
OpenAI's Responses API now gives agents a controlled computer environment with shell access, containerized execution, network controls, and automatic context compaction, enabling complex multi-step workflows that go beyond simple model prompting. The infrastructure addresses key practical challenges in production agent deployment, including security, state management, and workflow reusability.
OpenAI launched interactive visual modules in ChatGPT for 70+ math and science concepts, allowing users to manipulate variables and see real-time effects on formulas and graphs. This feature targets the 140 million weekly users who already use ChatGPT for STEM learning, addressing widespread math anxiety with research-backed interactive pedagogy.
OpenAI
OpenAI launched Codex Security, an AI agent that identifies code vulnerabilities with high precision by building project-specific threat models and validating its findings, reducing false positives by over 50%. The tool is now available to enterprise customers and is being offered free to open-source maintainers to strengthen software security across the ecosystem.
Google Research
Google Research trained LLMs to perform Bayesian probabilistic reasoning by having them mimic optimal Bayesian models rather than just learning correct answers, achieving 80% agreement with mathematical ideals and enabling cross-domain generalization. This demonstrates LLMs can learn abstract reasoning skills through distillation and transfer them to new tasks.
Anthropic
Anthropic introduced Claude Marketplace, which lets enterprises apply their existing Anthropic commitments toward Claude-powered partner solutions from companies like GitLab, Harvey, and Snowflake. This consolidates AI procurement and extends customer investment across multiple enterprise-ready tools.
Industry Voices
Sjoerd van Steenkiste
Research Scientist at Google Research
Pushes the boundaries of object-centric learning and compositional world models for AI systems that can decompose and reason about scenes.
Tal Linzen
Research Scientist at Google Research
Bridges linguistics and deep learning to probe what language models actually understand about syntax and meaning.
Sammy Azdoufal
Software Engineer
Shares practical insights on building and scaling production ML systems at major tech companies.