
Posts tagged with mixture-of-experts

Open Weight Weekly · 4 min read

NVIDIA Snuck Mamba Into a 120B Model and Nobody Blinked

NVIDIA dropped Nemotron 3 Super a few weeks ago, and the discourse moved on within 48 hours. Understandable — March was a firehose of model releases.

nemotron-3-super · nvidia · mamba

Synthetic Media · 5 min read

Wan2.2 Splits Its Brain in Half — And That's the Point

Every video diffusion model released in the last year has followed the same playbook: train bigger, throw more VRAM at inference, charge accordingly.

video-generation · wan2.2 · mixture-of-experts

Neural Dispatch · 4 min read

Gemma 4 Ships Under Apache 2.0 With an Architecture Nobody Expected

Google dropped Gemma 4 on Wednesday — four open-weight models under a genuine Apache 2.0 license, built from the same research behind Gemini 3.

gemma-4 · google · open-source

Open Weight Weekly · 4 min read

Gemma 4's Secret Weapon Isn't the 31B — It's the 26B That Acts Like a 4B

Google shipped Gemma 4 yesterday under Apache 2.0.

gemma-4 · mixture-of-experts · apache-2

Neural Dispatch · 5 min read

Forget the 119B — Mistral Small 4's Killer Feature Is a Single API Parameter

Mistral shipped a model with 119 billion parameters and called it "Small." Under Apache 2.0.

mistral · mixture-of-experts · open-source

Neural Dispatch · 5 min read

Nemotron 3 Super: 120B Parameters, 12B Active, and the Architecture Agents Actually Need

NVIDIA dropped Nemotron 3 Super a few weeks ago and it flew under the radar — buried by the Mythos leak drama and GPT-5.4's benchmark parade.

nvidia · nemotron · mamba