
Posts tagged with mixture-of-experts

Open Weight Weekly · 4 min read

NVIDIA Snuck Mamba Into a 120B Model and Nobody Blinked

NVIDIA dropped Nemotron 3 Super a few weeks ago, and the discourse moved on within 48 hours. Understandable — March was a firehose of model releases.

nemotron-3-super · nvidia · mamba

Synthetic Media · 5 min read

Wan2.2 Splits Its Brain in Half — And That's the Point

Every video diffusion model released in the last year has followed the same playbook: train bigger, throw more VRAM at inference, charge accordingly.

video-generation · wan2.2 · mixture-of-experts

Neural Dispatch · 4 min read

Gemma 4 Ships Under Apache 2.0 With an Architecture Nobody Expected

Google dropped Gemma 4 on Wednesday — four open-weight models under a genuine Apache 2.0 license, built from the same research behind Gemini 3.

gemma-4 · google · open-source

Open Weight Weekly · 4 min read

Gemma 4's Secret Weapon Isn't the 31B — It's the 26B That Acts Like a 4B

Google shipped Gemma 4 yesterday under Apache 2.0.

gemma-4 · mixture-of-experts · apache-2

Neural Dispatch · 5 min read

Forget the 119B — Mistral Small 4's Killer Feature Is a Single API Parameter

Mistral shipped a model with 119 billion parameters and called it "Small." Under Apache 2.0.

mistral · mixture-of-experts · open-source

Neural Dispatch · 5 min read

Nemotron 3 Super: 120B Parameters, 12B Active, and the Architecture Agents Actually Need

NVIDIA dropped Nemotron 3 Super a few weeks ago and it flew under the radar — buried by the Mythos leak drama and GPT-5.4's benchmark parade.

nvidia · nemotron · mamba