
Posts tagged with local-inference

Neural Dispatch · 5 min read

Google's TurboQuant Just Made Your GPU Feel Twice as Big

Everyone obsesses over model weight quantization — Q4_K_M this, GPTQ that — while the actual memory hog during inference quietly eats your VRAM alive.

turboquant · google-research · kv-cache
Neural Dispatch · 4 min read

Gemma 4 Ships Under Apache 2.0 With an Architecture Nobody Expected

Google dropped Gemma 4 on Wednesday — four open-weight models under a genuine Apache 2.0 license, built from the same research behind Gemini 3.

gemma-4 · google · open-source
Open Weight Weekly · 4 min read

Gemma 4's Secret Weapon Isn't the 31B — It's the 26B That Acts Like a 4B

Google shipped Gemma 4 yesterday under Apache 2.0.

gemma-4 · mixture-of-experts · apache-2
Open Weight Weekly · 6 min read

Someone Distilled Claude's Thinking Into Qwen3.5 — And It Actually Works

A HuggingFace user named Jackrong quietly uploaded a set of models last week that deserve way more attention than they're getting. The pitch: take Claude 4.

qwen3.5 · distillation · reasoning