Everyone obsesses over model weight quantization — Q4_K_M this, GPTQ that — while the actual memory hog during inference quietly eats your VRAM alive.
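The hog, for anyone keeping score, is almost certainly the KV cache: one key and one value vector per layer, per KV head, per token, held for the life of the sequence. A back-of-envelope sketch makes the scale obvious; every architecture number below is an illustrative assumption for an 8B-class, Llama-style model, not a spec from any particular release.

```python
# Back-of-envelope KV-cache sizing for a Llama-style decoder with GQA.
# All numbers are illustrative assumptions, not specs of any real model.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    # 2x for keys and values; one head_dim-vector per layer,
    # per KV head, per token in the sequence.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                      seq_len=128_000)  # one 128k-token request, fp16 cache
print(f"{size / 2**30:.1f} GiB")  # ~15.6 GiB of cache, weights not included
```

At fp16, that single 128k-token request costs about 15.6 GiB before a single weight is loaded. The element size is the one lever weight-quant tunables never touch, which is why GQA and cache quantization matter as much as whichever Q4 variant you picked.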
Four billion parameters, two gigabytes of RAM.
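That pairing is plain quantization arithmetic: params times bits per weight, divided by eight. The 4-bit figure below is an assumed round number chosen to match the headline; a real Q4_K_M GGUF averages closer to ~4.8 bits per weight once group scales are counted.

```python
# Weights-only footprint of a quantized model: params * bits / 8.
# bits_per_weight = 4 is an assumed round number; real Q4-family
# quants carry scale metadata that pushes the effective rate higher.
params = 4_000_000_000
bits_per_weight = 4
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # 2.0 GB, before activations and KV cache
```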
A HuggingFace user named Jackrong uploaded a set of models last week with zero fanfare, and they deserve way more attention than they're getting. The pitch: take Claude 4.
The open-weight leaderboard has a new king, and you probably can't afford to host it.