Perf Book
The book "Performance Analysis and Tuning on Modern CPUs"
...It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (insufficient warmup, turbo boost, frequency scaling), and interpret hardware performance counters. The book walks through vectorization, memory layout, data-oriented design, and algorithmic choices, illustrating when compiler flags, intrinsics, or hand-rolled assembly make sense. It also demonstrates tool-driven workflows, using profilers and PMU events, to locate true bottlenecks and validate that changes actually help. ...
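As a rough illustration of the benchmarking advice summarized above, here is a minimal sketch of a timing harness. It is not taken from the book. The names `benchmark` and `sum_squares` and the default `warmup` and `repeats` values are assumptions made for this example. It discards a few warmup runs, repeats the measurement, and reports the median rather than a single timing.

```python
import statistics
import time


def benchmark(fn, *, warmup=3, repeats=10):
    """Time fn() after warmup runs; return per-run timings in nanoseconds."""
    # Warmup runs are discarded. They are intended to populate caches and
    # give the CPU frequency a chance to settle before measuring.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)

    # The median is less sensitive to outliers (interrupts, frequency
    # changes) than the mean; min and max show the spread.
    return {
        "samples": samples,
        "min_ns": min(samples),
        "median_ns": statistics.median(samples),
        "max_ns": max(samples),
    }


def sum_squares(n=100_000):
    # Hypothetical workload used only for illustration.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result = benchmark(sum_squares, warmup=3, repeats=10)
    print(
        f"median: {result['median_ns'] / 1e6:.3f} ms "
        f"(min {result['min_ns'] / 1e6:.3f}, max {result['max_ns'] / 1e6:.3f})"
    )
```

A large gap between the minimum and the median suggests that the environment is noisy, for example because of frequency scaling or background load. In that case a measured speedup is worth confirming with more repetitions before trusting it.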