Posts tagged with inference

GPU Economics · 5 min read

AMD Just Cracked a Million Tokens Per Second

For the first time in the MLPerf inference benchmarks, AMD posted numbers that don't require mental gymnastics to interpret.

amd · mi355x · mlperf
Neural Dispatch · 5 min read

Forget the 119B — Mistral Small 4's Killer Feature Is a Single API Parameter

Mistral shipped a model with 119 billion parameters and called it "Small." Under Apache 2.

mistral · mixture-of-experts · open-source
GPU Economics · 5 min read

The HBM Tax: Why Memory Costs Now Dominate Your AI Compute Budget

Twelve months ago, if you asked an ML platform team what kept them up at night, the answer was GPU availability.

hbm · memory · gpu-pricing
GPU Economics · 5 min read

The 2026 Inference Chip Scorecard

Q1 2026 delivered more custom inference silicon than any quarter in history. Google deployed Ironwood.

inference · custom-silicon · nvidia