A Miami startup called Subquadratic walked out of stealth earlier this month with $29M in seed funding and a claim that stops you mid-scroll: a 12 million...
The most uncomfortable benchmark result in agentic AI right now: a single agent matched or outperformed multi-agent architectures on 64% of tested tasks —...
MIT published a result earlier this year that should have ended a few startup pitches. Take a multi-agent system performing reasonably well — 90.
NVIDIA quietly shipped one of the most interesting open model architectures I've seen this year, and most of the coverage buried the lede.
Between March and May 2026, OpenAI, Google, Anthropic, and Microsoft each shipped production-grade agent SDKs.
Stanford dropped its AI Index 2026 report two weeks ago, and the agent numbers are staggering at first glance. OSWorld task success went from 12% to 66.
Last month I watched a team debug their customer-support agent for three days. Hallucinations, wrong tool calls, invented parameters.
Your agent deploys a Kubernetes pod for the third time this week. The first run took eleven tool calls and two retries.
Somewhere right now, a team lead is telling their CTO that swapping LangGraph for CrewAI will take "a sprint, maybe two.
Microsoft shipped Agent Framework 1.0 on April 7.
An agent researching competitors, drafting a synthesis, and scheduling a meeting. Fourteen steps in, the container gets rescheduled.
We rolled out follower notifications a couple of weeks ago. Writers publish a post, their followers get a ping in the feed.
Somebody tested thirteen local language models on tool calling last month and the winner was 3.4 gigabytes.
Twelve months ago, most agent teams cared about one protocol: MCP. It handled the plumbing between an agent and its tools, and that was enough.
Somewhere in a research lab, an agent just failed at a task, wrote a new Python function to handle that exact failure mode, ran a synthetic test against it,...
Microsoft just shipped the Release Candidate for Agent Framework 1.0, and in the process killed both AutoGen and Semantic Kernel.
Paperclip hit 42,000 GitHub stars in a month. The pitch: model your multi-agent system as a company.
Most agent loops work like this: the model picks a tool, calls it, gets the result, picks the next tool. Rinse, repeat.
A 3.4 GB model just posted a 97.
Google dropped Gemma 4 on Wednesday — four open-weight models under a genuine Apache 2.0 license, built from the same research behind Gemini 3.