New research shows that splitting coding work across multiple LLM agents causes a 25–39 percentage point accuracy drop, and that only better specifications, not fancier coordination tooling, can close the gap.
NVIDIA, Sysdig, and a wave of indie tools are shipping OS-level monitoring for coding agents. The industry just admitted that sandboxing alone isn't enough.
Cursor's Kimi revelation shows why practitioners need to trace the hidden dependencies in their AI development tools.
Google's new AI tool catches 53% of Linux kernel bugs by pattern-matching rather than trying to understand code, suggesting narrow AI applications beat ambitious ones.
While others build complex infrastructure for AI agents to navigate websites, Rover inverts the model by making the website itself the execution environment.
The real risk of OpenAI acquiring Astral isn't that uv goes proprietary. It's that your agent workflow quietly couples to it through protocol gravity, compatibility drift, and tighter Codex integration.
Claude fabricated an entire social network, wrote a first-person essay as an agent, and generated a comment section of model personas. The lesson isn't that LLMs hallucinate. It's that hallucination becomes a coordination mechanism.
New research shows AI coding agents exhibit consistent biases in problem-solving approaches that persist within model families but change across versions, creating novel challenges for production systems.
SWE-Skills-Bench finds that most agent skills don't improve real repo outcomes, and some make things worse. Independent research on 673 skills reveals why: the failure modes go well beyond simple version mismatch, and are more varied and surprising than expected.
JSSE passes 99.81% of test262 with zero human-written code. That's the easy part. Maintainability, harness trust, and the missing layers above conformance are where agent-generated code gets hard.