"This book will be primarily useful for software developers who work with performance-criticalapplications and do low-level optimizations. To name just a few areas: High-PerformanceComputing (HPC), Game Development, data-center applications (like Facebook, Google, etc.),High-Frequency Trading. But the scope of the book is not limited to the mentioned industries.This book will be useful for any developer who wants to understand the performance of theirapplication better and know how it can be diagnosed and improved. The author hopes that3 the material presented in this book will help readers develop new skills that can be applied intheir daily work."
in the interests of full disclosure, i have been planning to write a book along these lines for about ten years, and have about four chapters of it done. so there's possibly a dash of the curmudgeon in my review. also, let me preface this by saying the book's freely available under CC, so baller move there, Denis, and how much shit can you really give Free Documentation? so three huzzahs for the CC.
most of the information here is valid and useful, though at least half the text will be review for anyone who remembers their graduate computer architecture classes. there's a walk through the various components of a high-performance modern processor, and the attribution of stalls to components (and correlation with performance monitoring counters) is well done, and makes up a majority of the book's best content.
overall, it reads like a series of blog posts, rather than a coherent whole (and indeed, several chapters have clear intellectual origin on the blogs of Perf Twitter, which counts @dendibakh among its ranks). editing goes from bad to laughable as the book goes on; i found a "larger" that ought have been "shorter" on page 29 or so (i notified the author); by the late chapters, the text loses many of its articles (as in 'the', 'an', 'a'), and you might find yourself unable to refrain from reading chapters aloud in an exaggerated russian accent. with that said, the meaning is usually clear, the lapses merely jarring.
the author shows fine forearm form on the front cover, can't deduct any points there. several people, in fact, pointed at the book as i carried it around, asking "are you working out/dieting/acquiring larger, better-toned forearms", and i had to reply "no, i am running to stay in place, lest the next generation kill me and take my cash monies, but it is running only of the mind and, like, stresses; i am fat". this of course elicited at least one "no you're not fat", and thus the Dance begins. as they said in Gravity's Rainbow: "it has happened before, but there is nothing to compare it to now" but i digress.
the development of Roofline and TMA analysis is welcome, especially with the thorough concordance between Andi Kleen's TMA scripts and the standard linux perf tool. it's a solid expansion of Ahmad Yasin's 2014 paper, and provides by itself a decent hand-held tour of high-level perf usage.
beyond that, things go downhill quickly. a great many references are to various corporate sites and webfora, and will likely become unreachable within a few years (stackoverflow is *not a permanent record*, people!). they're furthermore not citations so much as "if you'd like to know more than the basic terminology of this complex subject, read this." many topics in the back half of the book are given only a paragraph or two's coverage; there's no historical or motivational material; there is nowhere *near* the density of mini case studies and examples and evidence that a serious book on the subject needs to have. no interesting algorithms are presented. discussion of parallelism as it factors into language design is restricted to some garbage intel toy language of use to precisely no one save drunken children and sober goldfish (I can't easily look up this trivia question of a language, due to the absence of an index) -- the amount of Intel pimpery is overall kinda tacky. if i wish to be lectured about closed source tools like VTune, i'll just read my Intel Software Development Manuals where VTune is held up like a messiah child with shimmering mandorla. how do you write a book about performance tuning on modern processors, where modern is since 1972 or so, without mentioning IPIs? you cannot explain MOESI meaningfully in two paperback pages, and i'm not sure there's any mention of memory fences (again, no index).
how do you write a book that's going to be read largely for linux perf work, and not mention...
so i might sound more negative here than i intended to be: 3 is my default rating, the peak of my Gaussian; this is not at all a bad book, and i enjoyed reading it. i can't say i personally learned much, but again, i'm not perhaps the intended readership (the stuff on compiler optimization failure diagnostics was largely new to me, there we go).
with about twice or three times as much content, this could have been the perf-chasing handbook so many await. in its current size, it's more a fine survey of topics. it brought back into mental cache a number of things i'd let flush to the backbrain.
This book is about 170 pages, but don't let the size fool you, because there's a lot of useful information packed in these pages.
**Warning**: This book is not for you if you are looking for performance *optimization*. This is about performance tuning and analysis, the stage that comes after optimization. *However, I think even if you are just optimizing, you can benefit a lot from the analysis techniques mentioned in this book.*
**Warning 2:** Not a complete beginner's guide. The author gives background on almost every topic he explores, but I think the reader is expected to have some knowledge of CPU architecture and the memory hierarchy. Some knowledge of assembly will also be useful.
The book taught me a lot more about processors and measuring performance. Knowing how to measure is essential if you want to optimize anything. Everything, and I mean every topic, is demonstrated by the author using various tools. I am definitely going to come back to it and use it as a reference.
This book is essential if you want to understand performance on modern CPUs. It is really beginner friendly and is accompanied by the author's blog, so you can consider it a stepping stone to more advanced material such as Agner Fog's manuals or the Intel performance manuals.
Taking a star off because the paper version of the book is not very well optimized for reading: there is a lot of color-coding (and the printed book is monochrome) and there are many links to internet resources, so you have to continuously interrupt your reading, unpack your laptop and find the article the author is talking about. Otherwise, it's a worthy read: short, concise, very well written. The examples provided are mostly about Intel and the Intel profiler, but the architectural advice can be extrapolated to other architectures, and there are many examples for Linux perf. The author goes through the basics of modern CPU architecture, profiling tools, common issues and their resolutions, points out how compilers analyze and optimize your code, and overall tunes you to look out for potential performance bottlenecks, thus establishing a mental framework for optimizing and tuning hot paths.
This book provides a good introduction to both the low-level hardware and the low-level software of modern machines. If you don't know what retiring is, or what uops are, you'll learn a lot from this book. Besides the theory, it also includes a lot of samples and tools (and how to use them) that will allow you to probe and investigate the performance of your application.
The book could be shorter and longer at the same time. Shorter in parts like vectorization with LLVM and C++, and longer where it dives into concepts like raw vectorization or multithreading. If you have already read What Every Programmer Should Know About Memory, this could be a good CPU counterpart to the memory book.
An excellent summary of CPU performance optimizations as of 2020. It explains, at a high level, how a modern CPU works, how to measure its performance, how to apply the measurement tools intelligently - introducing top-down microarchitecture analysis - and finally, how to address individual performance bottlenecks. There is also a section on optimizing multithreaded applications, which is very welcome as single-thread optimizations do not carry over to multiple threads straightforwardly.
Big-O analysis is outside its scope. Some topics are explored more in breadth than in depth, and that's fine, as numerous links to other resources are provided should you need a more in-depth treatment of a particular topic.
Nice quick read. Not too deep, not shallow. Well written. Just enough to get a taste of low-level programming and take a peek under the hood. It's very accessible even if you don't know much about CPUs and the hardware level (though some chapters might be easier than others). The chapter on "Performance Analysis Approaches" is definitely worth re-reading and taking notes on. The second part of the book is less coherent, and each chapter can be read independently on its own.
I reread it this year and, except for the glaring omission of an index, it has become my favorite book on perf analysis and tuning (not that there are many books at this level of detail in this genre!). There are many obscure low-level concepts mentioned in this book that make a lot of sense to me now, after working on a low-latency application at work for many years. This genre needs many more books like this - the few that I know of come from the camp of Brendan Gregg and Dick Sites, but this one is freely available and it is currently in revision for a second edition!!