diff --git a/index.rst b/index.rst
index 1b40e749..901912ad 100644
--- a/index.rst
+++ b/index.rst
@@ -26,6 +26,11 @@ by `Input Output Global `_.
 Release History
 ---------------
 
+* February, 2025
+
+  * :ref:`The Checklist ` first draft finished.
+  * :ref:`Triage ` first draft finished.
+
 * September, 2024
 
   * :ref:`Memory Footprints of Data Types ` first draft finished.
diff --git a/src/Measurement_Observation/Stg_RTS_Profiling/reduced_stack.rst b/src/Measurement_Observation/Stg_RTS_Profiling/reduced_stack.rst
index 7c531630..aa6c354c 100644
--- a/src/Measurement_Observation/Stg_RTS_Profiling/reduced_stack.rst
+++ b/src/Measurement_Observation/Stg_RTS_Profiling/reduced_stack.rst
@@ -1,4 +1,4 @@
-.. Reduced Stack
+.. _Reduced Stack:
 
 :lightgrey:`The Reduced Stack Method`
 =====================================
diff --git a/src/Optimizations/Code_Changes/unroll_stacks.rst b/src/Optimizations/Code_Changes/unroll_stacks.rst
index 410063f7..1072f7e2 100644
--- a/src/Optimizations/Code_Changes/unroll_stacks.rst
+++ b/src/Optimizations/Code_Changes/unroll_stacks.rst
@@ -1,4 +1,4 @@
-.. _Unroll Stacks:
+.. _Unroll Monad Transformers Chapter:
 
 :lightgrey:`Unroll Monad Transformer Stacks`
 ============================================
diff --git a/src/Preliminaries/how_to_use.rst b/src/Preliminaries/how_to_use.rst
index a8e09d81..e9926399 100644
--- a/src/Preliminaries/how_to_use.rst
+++ b/src/Preliminaries/how_to_use.rst
@@ -32,12 +32,12 @@ picking your favorite Haskell library and attempting to optimize that!
 The book assumes you are using GHC |ghcVersion| and a Linux distribution
 (kernel version ``5.8`` and higher).
 Should you be using an older compiler than some
-sections, such as :ref:`Using EventLog
+sections, such as :ref:`using EventLog
 `; which arrived in ``GHC 8.8``
-may still be useful, while others such as :ref:`Using Cachegrind
+may still be useful, while others such as :ref:`using Cachegrind
 `; which relies on :term:`DWARF` symbols (added in ``GHC 8.10.x``) may not be
 applicable.
-Similarly, some chapters, such as :ref:`Using perf
+Similarly, some chapters, such as :ref:`using perf
 ` will only be applicable for Linux and Linux based
 operating systems.
 
@@ -61,21 +61,25 @@ Where to Begin
 --------------
 
 The book is structured into discrete independent parts to better serve as a
-handbook. Thus, the book is not meant to be read in a linear order. Instead, one
-should pick and choose which chapter to read next based on their needs because
-*the book assumes you have a problem that needs solving*.
-
-There are two parts: Part 1, focuses on measurement, profiling and observation
-of Haskell programs. This part is ordered from the bottom-up; it begins with
-tools and probes that are language agnostic and close to the machine, such as
-:ref:`Perf ` and :ref:`Cachegrind `, then
-proceeds through each `intermediate representation
+handbook and is not meant to be read in a linear order. Instead, one should pick
+and choose which chapter to read next based on their needs because *the book
+assumes you have a problem that needs solving*.
+
+The best place to start is :ref:`triage `; it should help you
+narrow down your next steps. If you are short on time, or just have a problem to
+solve, then skip to the :ref:`checklist `.
+
+The book is roughly divided into two parts: Part 1 focuses on measurement,
+profiling and observation of Haskell programs.
This part is ordered from the +bottom-up; it begins with tools and probes that are language agnostic and close +to the machine, such as :ref:`perf ` and :ref:`cachegrind +`, then proceeds through each `intermediate representation `_ (IR) describing the tools, probes, and information available at each IR. Part 2, provides an ordered sequence of techniques to optimize code. It is ordered from the easiest methods, such as choosing the right libraries; to the -hardest and more invasive methods, such as exploiting :ref:`Backpack ` for fine-grained :term:`Unboxed` data types or exploiting :term:`Levity Polymorphism` to control the runtime representation of a data type. diff --git a/src/Preliminaries/index.rst b/src/Preliminaries/index.rst index 46db39af..46de35c2 100644 --- a/src/Preliminaries/index.rst +++ b/src/Preliminaries/index.rst @@ -6,6 +6,8 @@ Preliminaries :name: Preliminaries how_to_use + triage + the_checklist what_makes_fast_hs philosophies_of_optimization golden_rules diff --git a/src/Preliminaries/philosophies_of_optimization.rst b/src/Preliminaries/philosophies_of_optimization.rst index 66a5a791..d58bdf40 100644 --- a/src/Preliminaries/philosophies_of_optimization.rst +++ b/src/Preliminaries/philosophies_of_optimization.rst @@ -172,7 +172,7 @@ statements for they will waste your time and iteration cycles. References and Footnotes -======================== +------------------------ .. [#] See `this `__ series by Casey Muratori. We thank him for his labor. diff --git a/src/Preliminaries/the_checklist.rst b/src/Preliminaries/the_checklist.rst new file mode 100644 index 00000000..deb3fa60 --- /dev/null +++ b/src/Preliminaries/the_checklist.rst @@ -0,0 +1,49 @@ +.. _The Checklist: + +The Checklist +============= + +Here is a checklist of things you might try to improve the performance of your +program. + +- [ ] Are you compiling with ``-O2``, ``-Wall``? +- [ ] Have you checked for :ref:`memory leaks ` on the heap? 
+- [ ] Have you checked for :ref:`stack leaks `?
+- [ ] Have you :ref:`weighed ` your data structures?
+- [ ] Do you understand the :ref:`memory footprints ` of your data types?
+- [ ] Can you reduce the memory footprint of your data types?
+- [ ] Are you using data structures that have a :ref:`low impedance ` to your problem?
+- [ ] Have you set up benchmarks? Are they realistic? Do they exercise the full data space?
+- [ ] Are your data types strict? Can you unpack their strict fields?
+- [ ] Have you removed excessive polymorphism?
+- [ ] Are you using ``Text`` or ``ByteString`` instead of ``String``?
+- [ ] Can you inline and monomorphize critical functions, especially in hot loops?
+- [ ] Are you :ref:`accidentally allocating ` in a hot loop?
+- [ ] Are any functions in a hot loop taking more than five arguments? Can you reduce the number of arguments?
+- [ ] Can you :ref:`defunctionalize ` critical functions? Is GHC defunctionalizing for you?
+- [ ] Are you using a :ref:`left fold over a list `?
+- [ ] Are your data type definitions ordered so that the most common constructor comes first?
+- [ ] Are you using explicit export lists?
+- [ ] Have you checked for :userGuide:`missed specializations `?
+- [ ] Have you checked the ratio of known to unknown function calls?
+- [ ] Have you inspected the Core?
+- [ ] Have you inspected the STG?
+- [ ] Would your program benefit from compiling with the LLVM backend (``-fllvm``)?
+- [ ] Are you :ref:`shotgun parsing `? Can you lift information
+  into the type system to avoid subsequent checks over the same data?
+- [ ] Are you grouping things that need the same processing together?
+- [ ] Could your program benefit from concurrency or parallelism?
+- [ ] Could your program benefit from the :ref:`one-shot monad trick `?
+- [ ] Have you :ref:`unrolled ` your monad transformers?
+- [ ] Have you inspected the :ref:`cache behavior `?
+
+..
+   The grouping item should be about data-oriented design, using things like Zig's ArrayList.
+
+.. 
todo::
+   Each item should have a concomitant link.
+
+See also
+--------
+
+- `This older checklist `__.
diff --git a/src/Preliminaries/triage.rst b/src/Preliminaries/triage.rst
new file mode 100644
index 00000000..ca9b5482
--- /dev/null
+++ b/src/Preliminaries/triage.rst
@@ -0,0 +1,70 @@
+.. _Triage:
+
+========
+ Triage
+========
+
+This triage is the signpost that marks the start of your journey; it should
+give you enough direction to choose your next steps.
+
+Symptoms
+--------
+
+You do not have a problem, but want to learn performance-oriented Haskell
+  Begin with the :ref:`Philosophies of Optimization `. Then read :ref:`the programs of consistent
+  lethargy ` and some of the case studies. This
+  should give you enough to decide your next steps. If you decide to begin
+  optimizing, see the :ref:`checklist ` for more
+  ideas.
+
+You have a performance regression that you want to understand and fix
+  You need to diagnose the regression and begin thinking in terms of an
+  investigation. Read :ref:`how to debug ` to make sure
+  you know how to make progress. Since you have observed a regression, try to
+  find a commit or state in your project where you *do not* observe the
+  regression. This will let you bisect your project to narrow down the space
+  of changes that introduced it. You may also consider other forms of
+  profiling and observation, such as:
+
+  - Running a :ref:`ticky-ticky ` profile.
+  - :ref:`Checking the heap `.
+  - Inspecting the :ref:`Core `.
+  - Inspecting the :ref:`STG `.
+  - Observing the :ref:`cache behavior `.
+  - Observing the :ref:`CPU's Performance Counters `.
+
+You have a program that you want to begin optimizing
+  If you are short on time, begin with the :ref:`checklist `
+  and then check for :ref:`memory leaks `. If not, begin with
+  the easy changes:
+
+  - Use better data structures.
+  - Encode checks in the type system so that the program does not repeatedly
+    check the same predicates.
+  - Filter before you enter a hot loop.
+  - Remove niceties in the hot loops, such as logging.
+  - :ref:`Check the heap `. The :ref:`klister ` case study is a good example of this kind of optimization.
+
+  Then move on to more invasive changes, such as:
+
+  - :ref:`Unrolling ` your monad transformers.
+  - Using the :ref:`one-shot monad trick `.
+  - Selectively :ref:`defunctionalizing ` critical functions.
+  - Critically analyzing your architecture from a performance perspective.
+
+You have a program that you've optimized, but want to optimize more
+  If you have already harvested the low-hanging fruit, then you have likely
+  driven the program to a local maximum. Therefore, if you still need more
+  speed, then you must make more invasive changes, such as those listed
+  above. However, the best changes you can make will exploit properties of the
+  problem domain to reduce the work your program must do to arrive at a
+  result. Often these will be architectural changes.
+
+  .. todo::
+
+    In lieu of links for you to continue in this case, you can search for
+    data-oriented design to begin refactoring your system in this manner. I
+    highly recommend `this talk
+    `__ by Andrew Kelley.
diff --git a/src/Preliminaries/what_makes_fast_hs.rst b/src/Preliminaries/what_makes_fast_hs.rst
index 19b0a921..948a9e29 100644
--- a/src/Preliminaries/what_makes_fast_hs.rst
+++ b/src/Preliminaries/what_makes_fast_hs.rst
@@ -1,4 +1,4 @@
-.. _sec-lethargy:
+.. _What Makes Fast HS Chapter:
 
 The Programs of Consistent Lethargy
 ===================================
@@ -148,7 +148,6 @@ without thinking about their memory representation; and especially around
 laziness. As such, most of these instances are well known and have floated
 around the community for some time.
 
-
 How does Excessive Pointer Chasing Slow Down Runtime Performance?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -256,7 +255,7 @@ quality); second, in order to observe it, the programmer must track the memory allocation of their program across many functions, modules and packages, which is not a common experience when writing Haskell. For our purposes', we'll inspect examples that GHC should have no problem finding and optimizing. See the -:ref:`Impact of seq Removal on SBV's cache ` case study for an example of excessive memory allocation in a widely used library. +:ref:`Impact of seq Removal on SBV's cache ` case study for an example of excessive memory allocation in a widely used library. .. todo:: Not yet written, see `#18 `_ @@ -267,6 +266,7 @@ transformations is beneficial; it trains you to start thinking in terms of memory allocation when reading or writing Haskell code, and teaches you to perform these optimizations manually when GHC fails to optimize. +.. _excessive-closure-allocation: How does Excessive Closure Allocation Slow Down Runtime Performance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -442,6 +442,8 @@ expressing itself in the implementation. Need example as case study see `#20 `_ +.. _shotgun-parsing: + Problem Domain Invariants are Difficult to Express """"""""""""""""""""""""""""""""""""""""""""""""""