Sunday, April 5, 2026

Claude Code can control your whole computer now

Anthropic just dropped computer use into Claude Code, meaning you can now run full UI testing and debugging straight from the CLI (wild). Meanwhile, economists are making the case that comparative advantage could keep us employed even when AI outperforms humans at literally everything—though LLMs still can't recognize themselves in a mirror, so maybe we've got time. Should we trust AI that can't pass a self-awareness test to build our products?

Top Stories

1
Anthropic adds computer use to Claude Code, enabling full UI testing and debugging directly from the CLI

Anthropic's Claude Code now supports computer use from the CLI, letting the AI control native apps, run end-to-end UI tests, and debug visual issues by interacting directly with macOS GUIs. This reduces the need for traditional test harnesses, with safety controls such as per-app permissions and session locks.

anthropic · claude · agents · ui-testing
2
AI Applications and Vertical Integration

AI application companies are vertically integrating in two directions: downward into proprietary domain-specific models (using usage data as training fuel), or upward into full-service delivery. Either way, they are transforming from pure software plays into full-stack businesses that control more of the value chain and can better differentiate in competitive markets.

vertical-integration · agents · llm · enterprise-ai-adoption
3
Plentiful, high-paying jobs in the age of AI

Noahpinion

Contrary to widespread belief in the AI industry, humans may retain well-paid jobs even when AI becomes better at everything: limited compute will force AI to be allocated to the highest-value tasks, leaving humans with a comparative advantage everywhere else. The real dangers are wealth inequality, difficult transitions, and potential resource conflicts, not mass unemployment.

labor-market · comparative-advantage · agi · economic-impact
4
What Pretext Reinforced About AI Loops

Nibzard

Pretext demonstrates that successful AI coding agents require rigorous engineering loops with hard constraints and empirical validation rather than deference to model authority. The project shows AI should accelerate throughput within a disciplined framework where most suggestions are rejected, rather than being trusted as the primary decision-maker.

agents · ai-coding · software-engineering · llm
5
A Mirror Test For LLMs

LessWrong

Researchers created a mirror test for LLMs in which models must identify their own outputs among unlabeled tokens. Top models like Claude Opus 4.6 succeeded by recognizing their own writing style rather than through any genuinely self-aware communication strategy, and ultimately failed under adversarial conditions, suggesting current LLMs lack true self-awareness despite superficially impressive performance.

llm · self-awareness · interpretability · anthropic

Enjoyed this issue?

Get daily AI intel delivered to your inbox. No fluff, just the stories that matter.