Sunday, March 15, 2026

OpenAI shows how agents get shell access

OpenAI is shipping a security agent (Codex Security) and interactive visual modules for math and science formulas in ChatGPT. Most notably, it just pulled back the curtain on how its Responses API actually works: containerized environments with shell access for agents (wild). Meanwhile, Anthropic is answering with its own Claude Marketplace for enterprises. Running AI agents with shell access in containers: would you trust it?

OpenAI's Responses API now provides agents with a controlled computer environment featuring shell access, containerized execution, network controls, and automatic context compaction, enabling complex multi-step workflows beyond simple model prompting. This infrastructure addresses key practical challenges in production agent deployment, including security, state management, and workflow reusability.
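To make "automatic context compaction" concrete, here is a minimal sketch of the general idea: once a conversation exceeds a token budget, older turns are collapsed into a single summary turn and the most recent turns are kept verbatim. This is not OpenAI's implementation or API. The names `Turn`, `rough_tokens`, and `compact` are hypothetical, the word-count token estimate is a crude stand-in for a real tokenizer, and the "summary" is a simple truncation where a real system would presumably ask a model to summarize.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def compact(history: list[Turn], budget: int, keep_recent: int = 2) -> list[Turn]:
    """Collapse older turns into one summary turn when history exceeds budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(rough_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # nothing to compact
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Placeholder summary: the first 8 words of each older turn.
    summary = " | ".join(" ".join(t.text.split()[:8]) for t in old)
    header = Turn("system", f"Summary of {len(old)} earlier turns: {summary}")
    return [header] + recent


# Hypothetical usage: five 20-word turns against a 50-word budget.
history = [Turn("user", " ".join(["word"] * 20)) for _ in range(5)]
compacted = compact(history, budget=50)
print(len(history), "->", len(compacted))
```

The trade-off in any scheme like this is between how much recent detail stays verbatim (`keep_recent`) and how much is lost in the summary.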

openai agents api infrastructure

OpenAI adds interactive visual modules to ChatGPT, enabling real-time exploration of math and science formulas

OpenAI launched interactive visual modules in ChatGPT for 70+ math and science concepts, allowing users to manipulate variables and see real-time effects on formulas and graphs. This feature targets the 140 million weekly users who already use ChatGPT for STEM learning, addressing widespread math anxiety with research-backed interactive pedagogy.

openai chatgpt education product-launch

OpenAI reveals Codex Security, an AI-powered application security agent that identifies code vulnerabilities

OpenAI

OpenAI launched Codex Security, an AI agent that identifies code vulnerabilities with high precision by building project-specific threat models and validating its findings, which reduces false positives by over 50%. The tool is now available to enterprise customers and is offered free to open-source maintainers to strengthen software security across the ecosystem.

openai agents security codex

Teaching LLMs to reason like Bayesians

Google Research

Google Research trained LLMs to perform Bayesian probabilistic reasoning by having them mimic optimal Bayesian models rather than just learning correct answers, achieving 80% agreement with mathematical ideals and enabling cross-domain generalization. This demonstrates LLMs can learn abstract reasoning skills through distillation and transfer them to new tasks.
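For readers who want the "optimal Bayesian model" idea spelled out, here is a small worked sketch. It shows a single Bayes-rule update over discrete hypotheses and one plausible way to score agreement between a model's choices and the Bayes-optimal choices. The coin example, the function names, and the agreement metric are illustrative assumptions, not the setup or evaluation used by Google Research.

```python
def bayes_update(prior: list[float], likelihoods: list[float]) -> list[float]:
    """Posterior over hypotheses: prior times likelihood, renormalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


def agreement(model_choices: list[int], bayes_choices: list[int]) -> float:
    """Fraction of cases where the model picks what the Bayesian ideal picks."""
    matches = sum(m == b for m, b in zip(model_choices, bayes_choices))
    return matches / len(bayes_choices)


# Two hypotheses about a coin: fair (P(heads)=0.5) or biased (P(heads)=0.8).
prior = [0.5, 0.5]
# Observe one head: likelihood of heads under each hypothesis.
posterior = bayes_update(prior, [0.5, 0.8])
print(posterior)  # belief shifts toward the biased hypothesis

# Hypothetical choices on five cases; they match on four of them.
print(agreement([1, 1, 0, 1, 0], [1, 1, 0, 1, 1]))
```

In this framing, training a model to mimic the Bayesian ideal means rewarding it for matching the posterior-driven choice at every step, not only for landing on the right final answer.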

llm reasoning bayesian-inference distillation

Anthropic launches Claude Marketplace, plus two new Claude Code features

Anthropic

Anthropic introduces Claude Marketplace, enabling enterprises to apply their existing Anthropic commitments toward Claude-powered partner solutions from companies like GitLab, Harvey, and Snowflake. This consolidates AI procurement and extends customer investment across multiple enterprise-ready tools.

anthropic claude enterprise-ai-adoption marketplace