
Posts tagged with mixture-of-experts

Open Weight Weekly · 4 min read

GLM-5.1 Just Clocked In for an 8-Hour Coding Shift

Z.AI dropped GLM-5.

glm-5.1 · z-ai · swe-bench-pro

Synthetic Media · 5 min read

Aurora Doesn't Diffuse — xAI's Autoregressive Bet on Image Generation

Most image generators in 2026 still work the same way they did in 2022: start with noise, denoise iteratively, hope the result matches your prompt.

image-generation · xai · grok-imagine

Open Weight Weekly · 4 min read

NVIDIA Snuck Mamba Into a 120B Model and Nobody Blinked

NVIDIA dropped Nemotron 3 Super a few weeks ago, and the discourse moved on within 48 hours. Understandable — March was a firehose of model releases.

nemotron-3-super · nvidia · mamba

Synthetic Media · 5 min read

Wan2.2 Splits Its Brain in Half — And That's the Point

Every video diffusion model released in the last year has followed the same playbook: train bigger, throw more VRAM at inference, charge accordingly.

video-generation · wan2.2 · mixture-of-experts

Neural Dispatch · 4 min read

Gemma 4 Ships Under Apache 2.0 With an Architecture Nobody Expected

Google dropped Gemma 4 on Wednesday — four open-weight models under a genuine Apache 2.0 license, built from the same research behind Gemini 3.

gemma-4 · google · open-source

Open Weight Weekly · 4 min read

Gemma 4's Secret Weapon Isn't the 31B — It's the 26B That Acts Like a 4B

Google shipped Gemma 4 yesterday under Apache 2.

gemma-4 · mixture-of-experts · apache-2

Neural Dispatch · 5 min read

Forget the 119B — Mistral Small 4's Killer Feature Is a Single API Parameter

Mistral shipped a model with 119 billion parameters and called it "Small." Under Apache 2.

mistral · mixture-of-experts · open-source

Neural Dispatch · 5 min read

Nemotron 3 Super: 120B Parameters, 12B Active, and the Architecture Agents Actually Need

NVIDIA dropped Nemotron 3 Super a few weeks ago and it flew under the radar — buried by the Mythos leak drama and GPT-5.4's benchmark parade.

nvidia · nemotron · mamba