Benchmarking ways of doing low-level work in .NET.
- Don't do cheap work in parallel. For example, in some of these benchmarks I get a 2x speedup for 16x the CPU cost.
- For lookups, `Span` is fastest. Prefer `Array`/`List` for lookups in hot paths over `Dictionary`/`ConcurrentDictionary`.
- Avoid add/remove from `ConcurrentDictionary` in hot paths due to allocations.
- `Interlocked` is expensive to use in hot paths. Do per-thread sums, or write out to a separate results span for processing back on the main thread. Using a `ForRange()` parallel work function is best, for example: `/Helpers/Extras/ForRange.cs`
- As per Kozi on the C# Discord:
  > The more important lesson here is "don't write to the same memory region from multiple threads if possible". Writes within the same cache line will slow access on other threads. And THAT'S why doing it per-thread is better. And only summing at the end. You minimise the writes to a shared cache line.

  https://www.youtube.com/watch?v=WDIkqP4JbkE&t=247s
- LINQ and PLINQ are not that bad. Not super great, but not that bad.
- `MemoryOwner<T>` is your friend.
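
The advice above about per-thread sums can be sketched as follows. This is a minimal illustration, not code from this repository: the class and method names are hypothetical, and it uses the standard `Parallel.For` overload with task-local state rather than the repo's own `ForRange()` helper. The first method hits a shared location on every iteration; the second keeps a private subtotal per worker and writes to shared memory only once per worker, at the end.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names, for illustration only.
public static class PerThreadSum
{
    // Pattern to avoid: every iteration performs an atomic write
    // to the same shared variable (and therefore the same cache line).
    public static long SumWithInterlocked(int[] data)
    {
        long total = 0;
        Parallel.For(0, data.Length, i => Interlocked.Add(ref total, data[i]));
        return total;
    }

    // Preferred pattern: each worker accumulates into its own local
    // subtotal and merges into the shared total once, when it finishes.
    public static long SumPerThread(int[] data)
    {
        long total = 0;
        Parallel.For(
            0, data.Length,
            () => 0L,                                     // initial subtotal for each worker
            (i, _, local) => local + data[i],             // no shared writes in the loop body
            local => Interlocked.Add(ref total, local));  // one shared write per worker
        return total;
    }
}
```

Both methods return the same result; the difference is only in how often the shared `total` is written. How large the gap is depends on the machine and the workload, so measure it rather than assume.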

These are the benchmarks, contained in subfolders of `/Benchmarks/`. Look at each subfolder for a `ReadMe.md` with individual findings:
- `Collections_Threaded` checks speed/correctness of doing collection reads/writes from threads
- `Parallel_Work` checks doing work on `Span<T>` from threads
- `Parallel_Lookup` checks a real-world critical path scenario: random access lookup of 100,000 entities. The benchmark tests different backing storage collections, and Sequential vs Parallel.
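
To make the lookup comparison concrete, here is a rough sketch of the two storage styles being contrasted. It is not the benchmark code: the `Entity` type, its fields, and the method names are hypothetical, and it assumes entity IDs are dense integers in the range `0..N-1` so that an ID can double as an array index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity shape, for illustration only.
public struct Entity
{
    public int Id;
    public int Value;
}

public static class LookupSketch
{
    // Dense storage: the entity with ID n lives at index n,
    // so a lookup is a bounds check plus an offset.
    public static long SumBySpan(ReadOnlySpan<Entity> entities, ReadOnlySpan<int> ids)
    {
        long sum = 0;
        foreach (int id in ids)
            sum += entities[id].Value;
        return sum;
    }

    // Hashed storage: each lookup hashes the key and probes a bucket
    // before reaching the entity.
    public static long SumByDictionary(Dictionary<int, Entity> entities, ReadOnlySpan<int> ids)
    {
        long sum = 0;
        foreach (int id in ids)
        {
            if (entities.TryGetValue(id, out Entity e))
                sum += e.Value;
        }
        return sum;
    }
}
```

Arrays convert implicitly to `ReadOnlySpan<T>`, so both methods can be called with plain `Entity[]` and `int[]` arguments. The sketch only shows why the indexed path does less work per lookup; see the benchmark ReadMe files for measured numbers.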

To run the benchmarks:
- Open the solution in Visual Studio 2022
- Run the solution
- Pick a benchmark
- Wait a long time for the benchmarks to run

Files of note:
- `Program.cs` - entrypoint
- `Benchmarks/*/*.cs` - benchmark tests
- `Helpers` - helpers for the benchmarking, such as:
  - `DumbWork.cs` - helper containing input data and output verification logic
  - `Data.cs` - helper containing the structure of the test data worked on in benchmarks
  - `zz_Extensions.cs` - extension methods for `Span<T>` and `Array` to make parallel work easier.
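
For readers who have not seen this style of helper, the following is a sketch of what a range-based parallel extension over an array might look like. It is not the code in `zz_Extensions.cs` or `ForRange.cs`: the names `RangeAction<T>` and `ForRangeSketch` are hypothetical, and the partitioning is delegated to the standard `Partitioner.Create` and `Parallel.ForEach` APIs. Each worker receives a contiguous slice plus its starting offset, so it can work on its own region without touching anyone else's.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Custom delegate: on older C#/.NET versions Span<T> is a ref struct that
// cannot be used as a generic type argument, so Action<Span<T>, int> is not an option.
public delegate void RangeAction<T>(Span<T> slice, int offset);

// Hypothetical helper, for illustration only.
public static class ParallelRangeSketch
{
    public static void ForRangeSketch<T>(this T[] array, RangeAction<T> action)
    {
        // Partitioner.Create throws on an empty range, so bail out early.
        if (array.Length == 0) return;

        // Split [0, Length) into contiguous (start, end) index ranges.
        var ranges = Partitioner.Create(0, array.Length);

        Parallel.ForEach(ranges, range =>
        {
            int start = range.Item1;
            int length = range.Item2 - range.Item1;
            // The span is created inside the lambda, never captured by it.
            action(array.AsSpan(start, length), start);
        });
    }
}
```

A caller passes a lambda such as `(slice, offset) => { for (int i = 0; i < slice.Length; i++) slice[i] = offset + i; }`, which fills each element with its global index. Because every worker writes only to its own slice, this keeps most writes away from cache lines that other threads are using.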