PostTrainBench: Teaching AI to train itself


Monday, January 5, 2026


PostTrainBench just dropped a new way to measure whether AI agents can actually fine-tune language models on their own (spoiler: this matters way more than it sounds). Meanwhile, Plaud is going hard at CES with an AI pin and meeting notetaker, while a much thornier problem is brewing in the background: AI completely broke SaaS pricing models, and startups are scrambling to figure out how to charge for anything anymore (yikes). On the research side, Meta's KernelEvolve is automating kernel optimization across AI accelerators, and scientists discovered that foundation models are converging on eerily similar ways of representing matter. So here's the real question: if AI agents can now fine-tune themselves, who needs us anymore?

Top Stories

1. PostTrainBench (GitHub)

PostTrainBench introduces a benchmark for measuring autonomous AI agents' capability to improve small language models through post-training under resource constraints, directly evaluating AI systems' ability to conduct independent R&D tasks.

agents · llm · benchmark · fine-tuning
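The core idea, scoring an agent by how much it improves a small model without blowing its resource budget, can be sketched roughly as below. Everything here (the function name, the GPU-hour budget, accuracy as the metric) is an illustrative assumption, not PostTrainBench's actual API or scoring rule.

```python
# Hypothetical sketch of a PostTrainBench-style scoring rule.
# All names and numbers are assumptions for illustration only.

def score_run(baseline_acc: float, posttrain_acc: float,
              gpu_hours_used: float, gpu_hour_budget: float = 8.0) -> float:
    """Score an agent's post-training run: improvement over the
    baseline model, zeroed out if the resource budget is exceeded."""
    if gpu_hours_used > gpu_hour_budget:
        return 0.0  # resource constraint violated
    return max(0.0, posttrain_acc - baseline_acc)

# An agent that lifts a small model from 41% to 48% within budget
# earns roughly a 0.07 improvement; going over budget earns nothing.
print(score_run(0.41, 0.48, gpu_hours_used=6.5))
print(score_run(0.41, 0.48, gpu_hours_used=9.0))
```

The budget check is the interesting part: it is what makes this an R&D-under-constraints task rather than a plain fine-tuning exercise.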
2. How AI Made Pricing Hard Again (Every)

AI's per-token costs have broken traditional SaaS economics, forcing startups to rethink pricing models with cross-functional rigor or face bankruptcy like MoviePass. Success requires aligning pricing strategy with actual cost structure and customer value perception rather than copying subscription-based playbooks.

pricing · ai-economics · saas · llm
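The broken unit economics are easy to see with a back-of-the-envelope calculation: under a flat subscription, per-token costs scale with usage while revenue doesn't. All prices and usage figures below are made up for the example.

```python
# Toy illustration of why per-token costs break flat SaaS pricing.
# Every number here is hypothetical.

def gross_margin(price_per_seat: float, tokens_used: int,
                 cost_per_1k_tokens: float) -> float:
    """Monthly gross margin per seat under a flat subscription."""
    llm_cost = tokens_used / 1000 * cost_per_1k_tokens
    return price_per_seat - llm_cost

# A light user on a $20/month plan is profitable...
print(gross_margin(20.0, 200_000, 0.03))    # 20 - 6  = 14.0
# ...but a power user on the same flat plan loses money on every seat.
print(gross_margin(20.0, 1_500_000, 0.03))  # 20 - 45 = -25.0
```

This is the MoviePass dynamic in miniature: marginal cost per customer is no longer near zero, so the heaviest users of an all-you-can-eat plan are the least profitable ones.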
3. Plaud launches a new AI pin and a desktop meeting notetaker (TechCrunch)

Plaud released a new AI pin notetaker and desktop meeting app to compete across both physical and digital note-taking markets, leveraging its 1.5M-unit installed base to challenge specialized meeting transcription services.

hardware · ai-notetaker · transcription · meeting-notes
4. Universally Converging Representations of Matter Across Scientific Foundation Models (arXiv)

Scientific foundation models across diverse architectures and modalities learn remarkably similar internal representations of matter, but only for familiar training domains—revealing both convergence toward physical reality and fundamental limitations in generalization beyond training data.

foundation-models · representation-learning · scientific-ml · molecules
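Claims of "similar internal representations" are typically made with a representational-similarity metric; linear CKA is a common choice (whether it is the one this paper uses is an assumption on our part). Here is a minimal stdlib-only sketch on toy data:

```python
# Minimal linear CKA: a standard way to compare how two models
# represent the same inputs. Toy data; illustrative only.

def gram(X):
    # K[i][j] = <x_i, x_j> for rows (one row per input example)
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def center(K):
    n = len(K)
    row = [sum(r) / n for r in K]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + tot for j in range(n)]
            for i in range(n)]

def frob_inner(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    Kc, Lc = center(gram(X)), center(gram(Y))
    return frob_inner(Kc, Lc) / (
        frob_inner(Kc, Kc) ** 0.5 * frob_inner(Lc, Lc) ** 0.5)

# Identical representations score 1, and so does a rotated copy:
# CKA is invariant to orthogonal transforms, which is why it can
# detect convergence across models with different architectures.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]  # X rotated 90 degrees
print(round(linear_cka(X, X), 3), round(linear_cka(X, Y), 3))
```

The invariance to rotation is the key property: two models can encode the same physics in superficially different coordinates and still register as converged.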
5. KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta (arXiv)

Meta's KernelEvolve automates kernel optimization for recommendation models across heterogeneous AI accelerators, reducing development time from weeks to hours while achieving 100% correctness across diverse hardware platforms.

agents · kernel-optimization · deep-learning · hardware-accelerators
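The agentic pattern behind systems like this is a generate-and-verify loop: candidates must match a reference implementation exactly before speed is even considered, which is how "100% correctness" stays non-negotiable. The sketch below uses plain Python functions standing in for GPU kernels; it is a toy in the spirit of the approach, not Meta's system.

```python
# Toy generate-and-verify loop: keep only candidates that match the
# reference on every test input, then pick the fastest survivor.
# Illustrative only; plain functions stand in for real kernels.
import timeit

def reference(xs):           # ground-truth "kernel": prefix sums
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out

def candidate_a(xs):         # correct rewrite
    from itertools import accumulate
    return list(accumulate(xs))

def candidate_b(xs):         # buggy rewrite: off-by-one
    return [sum(xs[:i]) for i in range(len(xs))]

def search(candidates, test_inputs):
    # correctness gate first -- buggy candidates never reach the timer
    correct = [f for f in candidates
               if all(f(t) == reference(t) for t in test_inputs)]
    # among correct candidates, fastest on a fixed workload wins
    work = list(range(500))
    return min(correct,
               key=lambda f: timeit.timeit(lambda: f(work), number=50))

best = search([candidate_a, candidate_b], [[1, 2, 3], [], [5]])
print(best.__name__)  # candidate_a (candidate_b fails correctness)
```

The real system's contribution is doing this loop at scale across heterogeneous accelerators, with an agent writing the candidates; the correctness-before-speed ordering is the part that carries over.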


Enjoyed this issue?

Get daily AI intel delivered to your inbox. No fluff, just the stories that matter.