Posts tagged with benchmarks

Neural Dispatch · 5 min read

Meta Spent $14 Billion on Alexandr Wang. Muse Spark Is What They Got.

Meta poured $14.

meta · muse-spark · alexandr-wang

Data Eng Daily · 5 min read

SQLMesh Benchmarks Look Too Good. I Went Digging.

Tobiko Data published a Databricks benchmark claiming SQLMesh runs production promotions 134x faster and 123x cheaper than dbt Core.

sqlmesh · dbt · analytics-engineering

Neural Dispatch · 4 min read

Z.ai's GLM-5.1 Topped SWE-Bench Pro Without a Single NVIDIA Chip

An open-source model just claimed the top spot on SWE-Bench Pro — the benchmark that's become the de facto measuring stick for agentic software engineering.

glm-5.1 · z-ai · open-source

Synthetic Media · 5 min read

A Billion Videos and Not One of Them Is 1080p

xAI reported 1.245 billion Grok Imagine videos generated in a single 30-day window.

video-generation · grok-imagine · xai

Neural Dispatch · 5 min read

AI Agents Jumped From 12% to 66% in One Year. Most Still Can't Ship.

Stanford dropped its AI Index 2026 report two weeks ago, and the agent numbers are staggering at first glance. OSWorld task success went from 12% to 66.

stanford-ai-index · ai-agents · benchmarks

Neural Dispatch · 4 min read

ICLR's Best Paper Puts a Number on the Multi-Turn Performance Cliff

Every benchmark your LLM aced? Single-turn.

iclr-2026 · multi-turn · llm-evaluation

Open Weight Weekly · 4 min read

DeepSeek V4 Pro Costs 15x More Than V3.2. Nobody's Complaining.

DeepSeek dropped V4 Pro and V4 Flash on Wednesday, and the numbers shut up most of the skeptics before they could finish typing. V4 Pro — 1.

deepseek-v4 · mixture-of-experts · benchmarks

Open Weight Weekly · 4 min read

The 27B Model That Embarrassed 397 Billion Parameters

Alibaba shipped Qwen3.6-27B on April 22nd, and the benchmarks don't make sense.

qwen3.6 · dense-architecture · deltanet

Open Weight Weekly · 5 min read

Hugging Face's ml-intern Pushed a 1.7B Model Past Claude Code on GPQA

Yesterday Hugging Face open-sourced a tool that should make every ML engineer either slightly nervous or very excited.

ml-intern · hugging-face · smolagents

Open Weight Weekly · 4 min read

Blackwell Has 96GB of VRAM. vLLM and Ollama Disagree About What to Do With It.

NVIDIA's RTX PRO 6000 dropped a 96GB Blackwell card into the workstation market, and suddenly every open-weight model under 70B fits unquantized on a...

vllm · ollama · blackwell

Neural Dispatch · 5 min read

Meta's Muse Spark Ditches Open Weights and Bets Everything on Thought Compression

Meta just did the one thing nobody expected: it shipped a proprietary model.

meta · muse-spark · thought-compression

Data Eng Daily · 5 min read

The Part of Vector Search Nobody Benchmarks

Every vector database vendor publishes benchmarks showing sub-5ms latency on a million vectors. Unfiltered.

vector-database · metadata-filtering · benchmarks

Neural Dispatch · 3 min read

The Smartest Thing About NVIDIA's Quantum AI Models Isn't the Benchmarks

Quantum computing has a plumbing problem.

nvidia · ising · quantum-computing

Neural Dispatch · 4 min read

Claude Opus 4.7's Coding Gains Are Real. So Is Its Stealth Price Increase.

Anthropic dropped Claude Opus 4.7 yesterday, and the headline number is hard to ignore: 64.

anthropic · claude-opus · swe-bench-pro

Agent Patterns · 4 min read

The Leaderboard Is Hiding a 50x Price Tag

Your agent scored 82% on Terminal-Bench 2.0.

benchmarks · cost-optimization · evaluation

Open Weight Weekly · 5 min read

Llama 4 Scout Is the Most Downloaded Model of April. It's Also a Mess.

Llama 4 Scout hit 1.2 million downloads in its first two weeks on Hugging Face.

llama-4 · meta · mixture-of-experts

Neural Dispatch · 5 min read

Stanford's AI Index 2026: SWE-bench Nearly Hit 100% and Entry-Level Dev Hiring Fell 20%

Stanford dropped its annual AI Index today, all 277 pages of it, and honestly it reads like three different reports that someone stapled together.

stanford-ai-index · benchmarks · swe-bench

Agent Patterns · 5 min read

3.4 Billion Parameters and the Tool-Calling Paradox

Somebody tested thirteen local language models on tool calling last month and the winner was 3.4 gigabytes.

tool-calling · benchmarks · local-llm

Neural Dispatch · 4 min read

$14.3 Billion Later, Meta's AI Strategy Is Closed-Source and Fourth Place

Mark Zuckerberg spent three years convincing the developer world that Meta was the open-source AI company.

meta · muse-spark · open-source

Neural Dispatch · 5 min read

MiniMax M2.7's Self-Evolution Is Genuinely Interesting. Its Open-Source Label Is Not.

MiniMax just dropped the weights for M2.

minimax · m2.7 · self-evolving