Proposed Linux Kernel 'War Stories' Repository Would Document Critical Bugs, Performance Regressions, and CVE Case Studies
A new GitHub repository proposes compiling an archive of Linux kernel 'war stories': detailed narratives of catastrophic bugs, severe performance regressions, and critical CVE case studies. For each incident, the project aims to document the symptom, the investigation, the root cause, the fix, and the design lesson drawn from it, creating a historical record for systems engineers and kernel developers.
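To make that five-part structure concrete, here is a minimal sketch of how one entry might be modeled and rendered. The `WarStory` class, its field names, and the Markdown layout are hypothetical illustrations, not the repository's actual schema.

```python
from dataclasses import dataclass, fields


@dataclass
class WarStory:
    """Hypothetical record; fields mirror the five parts the project names."""
    title: str
    symptom: str
    investigation: str
    root_cause: str
    fix: str
    design_lesson: str

    def to_markdown(self) -> str:
        # Render the title, then one subsection per remaining field,
        # in declaration order (symptom first, design lesson last).
        lines = [f"## {self.title}", ""]
        for f in fields(self):
            if f.name == "title":
                continue
            heading = f.name.replace("_", " ").capitalize()
            lines += [f"### {heading}", "", getattr(self, f.name), ""]
        return "\n".join(lines)


if __name__ == "__main__":
    # Only the title and fix come from the article; the rest are placeholders.
    story = WarStory(
        title="zone_reclaim_mode NUMA reclaim",
        symptom="(to be written)",
        investigation="(to be written)",
        root_cause="(to be written)",
        fix="Disabled by default in v3.16.",
        design_lesson="(to be written)",
    )
    print(story.to_markdown())
```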
The repository outlines three core document types. The first, `war-stories-bugs.md`, would chronicle three major incidents: the `zone_reclaim_mode` NUMA reclaim catastrophe (disabled by default in v3.16), THP compound page locking overhead on hugetlbfs AIO (fixed in v3.12), and an RSS `percpu_counter` inaccuracy that caused the wrong OOM victims to be selected on machines with 100+ CPUs (introduced in v6.2, fixed in 2026). A second document, `war-stories-regressions.md`, would detail four performance regression case studies: THP defrag stalls, automatic NUMA balancing overhead from v3.13, khugepaged CPU storms, and swap readahead window mismatches.
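Several of these incidents correspond to runtime tunables that can be inspected on a live system. The sketch below reads a few of them from standard procfs and sysfs locations. The mapping of knobs to incidents is an illustrative assumption rather than something taken from the repository, and a path may be missing depending on kernel version and configuration, in which case the value is reported as `None`.

```python
from pathlib import Path

# Tunables loosely related to the incidents named above. The labels on the
# left are made up for this example; the paths are standard procfs/sysfs
# locations that may be absent on a given kernel or on non-Linux systems.
KNOBS = {
    "zone_reclaim_mode": "/proc/sys/vm/zone_reclaim_mode",
    "thp_defrag": "/sys/kernel/mm/transparent_hugepage/defrag",
    "numa_balancing": "/proc/sys/kernel/numa_balancing",
    "khugepaged_scan_sleep_ms": (
        "/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs"
    ),
    "swap_page_cluster": "/proc/sys/vm/page-cluster",
}


def read_knobs(knobs=KNOBS):
    """Return {label: current value as text, or None if unreadable}."""
    values = {}
    for name, path in knobs.items():
        try:
            values[name] = Path(path).read_text().strip()
        except OSError:
            # Missing file, permission problem, or not a Linux system.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_knobs().items():
        print(f"{name}: {value if value is not None else 'unavailable'}")
```

Reading these files is side-effect free; changing them requires root and is outside the scope of this sketch.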
The third and most critical document, `war-stories-cves.md`, would trace the full lifecycle of four major security vulnerabilities. It begins with the infamous Dirty COW (CVE-2016-5195), detailing its exploitation primitive, the fix to `FOLL_WRITE` handling in `get_user_pages()`, and the subsequent design changes that led to `FOLL_PIN` and `PG_anon_exclusive`. The narrative would also cover the more recent StackRot vulnerability (CVE-2023-3269), a use-after-free rooted in how the maple tree was modified during stack expansion, and its fix. The collection would serve as a stark reminder of the complex, high-stakes engineering required to maintain the world's most critical software infrastructure.