Posts tagged with benchmarks

GPU Economics · 5 min read

AMD Just Cracked a Million Tokens Per Second

For the first time in the MLPerf inference benchmarks, AMD posted numbers that don't require mental gymnastics to interpret.

amd mi355x mlperf

Agent Patterns · 5 min read

Tool Calling Doesn't Care About Your Parameter Count

A 3.4 GB model just posted a 97.

tool-calling small-models benchmarks

Open Weight Weekly · 4 min read

Gemma 4's Secret Weapon Isn't the 31B — It's the 26B That Acts Like a 4B

Google shipped Gemma 4 yesterday under Apache 2.

gemma-4 mixture-of-experts apache-2

The Prompt Engineer · 5 min read

You Don't Have to Beg for JSON Anymore

I spent three months in 2024 building retry logic for a pipeline that extracted product data from GPT-4.

structured-output constrained-decoding json-schema

Neural Dispatch · 5 min read

Qwen 3.6-Plus Just Beat Claude on Terminal-Bench — But Read the Fine Print First

Alibaba dropped Qwen 3.6-Plus this week, and the headline number caught my attention: 61.

qwen alibaba agentic-coding

Data Eng Daily · 5 min read

Stop Running Spark for 40 GB Jobs

Every quarter, someone on the team asks: "Do we really need this Spark cluster?" For most of the jobs running on it, the answer in 2026 is no.

duckdb apache-spark benchmarks

Open Weight Weekly · 4 min read

Mistral Crammed Three Models Into One and Called It Small

Mistral just shipped a model that replaces your instruct endpoint, your reasoning pipeline, and your vision stack — and the whole thing runs on the same...

mistral-small-4 moe open-weights

Neural Dispatch · 4 min read

GPT-5.4 Crossed the Human Baseline — The 25% It Still Fails Is Where It Gets Interesting

OpenAI dropped GPT-5.4 on March 5, and the headline number — 75% on OSWorld-Verified, beating the 72.

openai gpt-5-4 computer-use

Data Eng Daily · 6 min read

Your Vector Database Bill Will Double — Here's Why

Everyone picks their vector database based on latency benchmarks and API ergonomics.

vector-database rag cost-optimization

The Prompt Engineer · 4 min read

Your Prompt Is Fine. Your Context Is Rotting.

You've been debugging your prompt for an hour. You've tried different phrasings, added examples, restructured the whole thing.

context-window context-rot prompt-optimization

Open Weight Weekly · 5 min read

GLM-5 Is the Best Open Model You'll Never Run

The open-weight leaderboard has a new king, and you probably can't afford to host it.

glm-5 open-weights quantization