1. DeepSeek V4 Paper Exposes FP4 Quantization Breakthrough for Trillion-Parameter MoE Architecture
DeepSeek has released the full technical paper for its V4 model, expanding on an earlier 58-page preview with substantially more technical detail. The document outlines how the team applied FP4 quantization-aware training (QAT) directly in the late stages of training, a departure from conventional approaches that typically ...
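
For orientation: quantization-aware training in general simulates low-precision rounding in the forward pass while the optimizer keeps higher-precision weights. The Python sketch below illustrates only that rounding step. It assumes the widely used E2M1 FP4 layout and a single absmax scale per block; neither choice is confirmed by the summary above, and `fake_quantize_fp4` is a hypothetical name, not taken from the paper.

```python
import math

# Magnitudes representable in the E2M1 FP4 layout
# (1 sign bit, 2 exponent bits, 1 mantissa bit). Assumed format for this sketch.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quantize_fp4(block):
    """Round each value in `block` to the nearest FP4 value, returned as floats.

    One absmax scale is shared by the whole block (an assumption; real
    schemes may use finer-grained or differently encoded scales).
    """
    absmax = max((abs(x) for x in block), default=0.0)
    if absmax == 0.0:
        return [0.0 for _ in block]

    # Map the largest magnitude in the block onto the largest FP4 value (6.0).
    scale = absmax / FP4_E2M1_GRID[-1]

    out = []
    for x in block:
        scaled = abs(x) / scale
        # Pick the closest representable magnitude, then restore sign and scale.
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - scaled))
        out.append(math.copysign(nearest * scale, x))
    return out


if __name__ == "__main__":
    weights = [6.0, 3.0, -1.0, 0.2]
    print(fake_quantize_fp4(weights))
```

In a QAT setup, a function like this would typically wrap the weights in the forward pass, with gradients passed through the rounding unchanged (the straight-through estimator). Whether V4 follows that exact recipe is not stated in the text above.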