Posts tagged with mixture-of-experts

Qwen3-Coder-Next Runs Like a 3B Model and Codes Like a 70B

Alibaba's Qwen team has a running habit of shipping models that make you question your assumptions about parameter counts.

qwen3-coder-next alibaba mixture-of-experts

Neural Dispatch · Jul 4 · 4 min read

The $4.40 Model That Codes Better Than GPT-5.5

Z.ai dropped GLM-5.

glm-5-2 z-ai open-weights

Neural Dispatch · Jul 3 · 5 min read

Meituan's LongCat-2.0 Was Secretly Winning OpenRouter — on Zero Nvidia Hardware

For two months, a model called "Owl Alpha" quietly climbed the OpenRouter charts. No launch event, no press tour, no Twitter hype cycle.

longcat-2 meituan open-source

Neural Dispatch · Jun 3 · 4 min read

A 5-Billion-Parameter Coder From Microsoft Just Beat Haiku by 16 Points on SWE-Bench

Two days ago at Build, Microsoft did something it's never done before: announced a full family of homegrown AI models that compete directly with the...

microsoft-build mai-thinking-1 mai-code-1-flash

Open Weight Weekly · Jun 3 · 5 min read

MiniMax M3 Claims to Beat GPT-5.5. The Weights Aren't Out Yet.

Two days ago MiniMax dropped a launch announcement for M3 that reads like a greatest-hits compilation of everything the open-weight community has been asking...

minimax-m3 sparse-attention mixture-of-experts

Neural Dispatch · Jun 2 · 5 min read

Project Polaris: GitHub Built Its Own Coding Model to Replace OpenAI Inside Copilot

The biggest announcement from Microsoft Build 2026 wasn't a new Azure service or another Windows agent feature. It was a quiet decoupling.

project-polaris github-copilot microsoft-build

Open Weight Weekly · May 30 · 5 min read

Poolside Spent $626M Training in Secret. Their Open Model Runs on a MacBook.

Poolside raised $626 million, hired a team of ex-DeepMind and ex-Meta researchers, and then went quiet for nearly three years.

laguna-xs2 poolside mixture-of-experts

Neural Dispatch · May 28 · 5 min read

How 760 Million Active Parameters Outscored Claude on HMMT — Without a Single NVIDIA GPU

Zyphra just dropped a model that makes me question everything I assumed about parameter efficiency. ZAYA1-8B is an 8.

open-source mixture-of-experts amd

Open Weight Weekly · May 27 · 5 min read

LLaDA 2.0 Doesn't Predict the Next Token. It's Faster Than Models That Do.

Every large language model you've touched — GPT, Claude, Llama, Qwen, Gemma — generates text the same fundamental way. Predict the next token.

llada-2 discrete-diffusion ant-group

Neural Dispatch · May 25 · 4 min read

DeepSeek V4 Pro Costs 13x Less Than GPT-5.5. NIST Measured the Gap.

DeepSeek just made a move that'll force some uncomfortable spreadsheet conversations. The 75% promotional discount on V4 Pro?

deepseek open-weights nist

Open Weight Weekly · May 25 · 5 min read

Zyphra Trained on AMD. The Benchmarks Don't Care.

A model with 760 million active parameters just scored 91.9% on AIME 2025 with test-time compute.

zaya1-8b zyphra amd

Open Weight Weekly · May 18 · 5 min read

Kimi K2.6 Tops Every Coding Benchmark. Good Luck Self-Hosting It.

Moonshot AI's Kimi K2.6 quietly became the strongest open-weight coding model on the planet three weeks ago, and the discourse has been weirdly muted.

kimi-k2.6 moonshot-ai mixture-of-experts

Neural Dispatch · May 17 · 4 min read

Nemotron 3 Super's Latent MoE Trick Gives You 120B Brains on a 12B Budget

NVIDIA quietly shipped one of the most interesting open model architectures I've seen this year, and most of the coverage buried the lede.

nvidia nemotron-3 mixture-of-experts

Open Weight Weekly · May 16 · 5 min read

744 Billion Parameters, Zero NVIDIA GPUs

Z.ai's GLM-5.

glm-5.1 zhipu-ai huawei-ascend

Open Weight Weekly · May 15 · 4 min read

DeepSeek Shipped Two Models. You Only Need the Smaller One.

DeepSeek V4-Pro grabbed headlines three weeks ago with an 80.

deepseek-v4 open-weights self-hosting

Neural Dispatch · May 13 · 5 min read

NVIDIA's Nemotron 3 Trades Transformer Purity for 5x Agent Throughput

Most open models fight for the same throne: highest MMLU score, best SWE-Bench pass rate, flashiest reasoning demo.

nvidia nemotron-3 open-weights

Neural Dispatch · May 9 · 5 min read

Kimi K2.6 Won a Live Coding Tournament Against GPT-5.5. The Catch? There Isn't One.

On May 3rd, Moonshot AI's Kimi K2.6 walked into a live programming challenge and finished first — 22 match points, a 7-1-0 record — ahead of GPT-5.

kimi moonshot-ai open-weights

Neural Dispatch · May 8 · 5 min read

NVIDIA Nemotron 3 Doesn't Want to Chat With You — It Wants to Be Your Agent's Brain

Most open model launches follow a predictable script: bigger parameters, higher benchmark scores, a claim about beating GPT-something on MMLU.

nvidia nemotron-3 agentic-ai

Neural Dispatch · May 4 · 4 min read

NVIDIA's Nemotron 3 Nano Omni Crams Vision, Audio, and Reasoning Into One 3B-Active Model

Every multimodal agent demo I've seen follows the same script: chain a vision model to a language model, bolt on Whisper for audio, pray the latency...

nvidia nemotron multimodal

Open Weight Weekly · Apr 29 · 4 min read

A Trillion Parameters Under Your Desk: Kimi K2.6 Goes GGUF

Nine days ago Moonshot AI dropped Kimi K2.6 — a trillion-parameter MoE model that beats Claude Opus 4.

kimi-k2.6 gguf quantization
