Benchmarks
First Apple M4 Max NPU benchmarks: TFLOPS-per-watt analysis
First inference benchmarks on the M4 Max's 38 TOPS NPU: ViT-L/16 throughput, INT8 quantization impact, and TFLOPS per watt vs the RTX 4090.
By Lukas Berg ·
Established 2023 · Updated continuously
ML Systems Review is an independent engineering publication covering production machine learning systems — architecture case studies, benchmarks, and long-form investigations into how AI products actually work. No sponsorship, no affiliate links, no marketing copy.
ML Ecosystem
Kernel generator, KV cache rearrangement, Metal/CUDA backend unification — a 2.1x throughput delta on 70B quantized models.
By Priya Ramachandran ·
Model Architecture
Reading notes on the DeepSeek-V3.5 release: MoE routing updates, efficiency gains, and which contributions hold up versus rebadged 3.1.
By Dr. Marcus Brennan ·
ML Ecosystem
Transformers 5.0, Spaces v2, revised Inference Endpoints pricing, Diffusers consolidation.
By Priya Ramachandran ·
Case Study
By Dr. Nadia Volkov ·
Distributed Systems
How Figma's multiplayer engine keeps hundreds of concurrent editors in sync with CRDTs, plus the operational-transform alternative they rejected.
By Priya Ramachandran ·
Distributed Systems
BEAM scaling walls, NIF interop, and the selective Rust migration pattern Discord used for their hot-path services.
By Priya Ramachandran ·
Architecture case studies, reproducible benchmarks, MLOps and reliability, and post-mortems of production ML failures. Topics where the engineering matters more than the model.
Five engineers and researchers with graduate degrees from Stanford, CMU, Berkeley, and Oxford, and more than a decade of combined production ML experience across startups, mid-sized tech companies, and consulting. Every article is reviewed for technical accuracy before publication.
MLSR is founder-funded. We take no sponsorships, affiliate commissions, or paid placements. See our editorial standards for the full policy.