Posts tagged with quantization

Neural Dispatch · 5 min read

Google's TurboQuant Just Made Your GPU Feel Twice as Big

Everyone obsesses over model weight quantization — Q4_K_M this, GPTQ that — while the actual memory hog during inference quietly eats your VRAM alive.

turboquant · google-research · kv-cache
Edge Deployed · 5 min read

87% Smaller, 2% Dumber: A Field Guide to INT4 Quantization

Four billion parameters, two gigabytes of RAM.

quantization · int4 · gptq
Open Weight Weekly · 6 min read

Someone Distilled Claude's Thinking Into Qwen3.5 — And It Actually Works

A Hugging Face user named Jackrong quietly uploaded a set of models last week that deserve far more attention than they're getting. The pitch: take Claude 4.

qwen3.5 · distillation · reasoning
Open Weight Weekly · 5 min read

GLM-5 Is the Best Open Model You'll Never Run

The open-weight leaderboard has a new king, and you probably can't afford to host it.

glm-5 · open-weights · quantization