## Summary
Kestrel currently doesn't use the normal memory pool. It uses a custom `byte[]`-backed pool and keeps expanding it - without ever shrinking.
This issue is about coming up with good logic for when and how the memory pool should shrink.
### People with more context
@halter73, @shirhatti, @davidfowl
## Motivation and goals
Today the server implementations in ASP.NET Core (Kestrel, IIS, and HTTP.sys) do not use the `ArrayPool`; they use a custom pool called the `SlabMemoryPool`. The buffers are pinned because they are used for IO (mostly P/Invoke layers). We rarely pin user-provided buffers and can generally avoid fragmentation by pinning up front for the lifetime of the application (at least that was the idea).
This pool allocates 128K slabs of memory on the POH and slices them into 32 aligned 4K blocks. If there are no free blocks, a new 128K slab is allocated (32 more blocks). Before the POH existed, the pool relied on the 128K allocation being large enough to put the `byte[]` on the LOH.
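For concreteness, here's a minimal C# sketch of that slab layout. The type and member names are hypothetical, and it skips details the real pool has to handle (4K-aligning the blocks within the slab, per-slab bookkeeping, disposal):

```csharp
using System;
using System.Collections.Concurrent;

internal sealed class SlabPoolSketch
{
    private const int BlockSize = 4096;    // 4K blocks
    private const int BlocksPerSlab = 32;  // 32 x 4K = 128K per slab

    private readonly ConcurrentQueue<Memory<byte>> _freeBlocks = new();

    public Memory<byte> Rent()
    {
        // If there are no free blocks, allocate a whole new slab (32 more blocks).
        if (!_freeBlocks.TryDequeue(out Memory<byte> block))
        {
            AllocateSlab();
            _freeBlocks.TryDequeue(out block);
        }
        return block;
    }

    public void Return(Memory<byte> block) => _freeBlocks.Enqueue(block);

    private void AllocateSlab()
    {
        // pinned: true places the array on the Pinned Object Heap (.NET 5+),
        // so blocks can be handed to native IO without per-call pinning.
        // Pre-POH, a 128K byte[] was large enough to land on the LOH instead.
        byte[] slab = GC.AllocateUninitializedArray<byte>(BlockSize * BlocksPerSlab, pinned: true);

        for (int i = 0; i < BlocksPerSlab; i++)
        {
            _freeBlocks.Enqueue(slab.AsMemory(i * BlockSize, BlockSize));
        }
    }
}
```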
Now for the big problem:
- The pool never shrinks, and this has been the case since the beginning of ASP.NET Core.
- We need it to shrink in two cases: when there's memory pressure, and when it would be "productive" to remove unused memory (the trickier case). A hypothetical sketch of both triggers follows this list.
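To make the two cases concrete, a trim loop might look something like the sketch below. Everything here is an assumption for discussion (the `PoolTrimmerSketch` name, the timer interval, the 90%-of-threshold pressure check, and the release-half-of-idle heuristic); the right heuristic for the "productive" case is exactly the open question:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

internal sealed class PoolTrimmerSketch : IDisposable
{
    private readonly ConcurrentQueue<Memory<byte>> _freeBlocks;
    private readonly Timer _timer;

    public PoolTrimmerSketch(ConcurrentQueue<Memory<byte>> freeBlocks)
    {
        _freeBlocks = freeBlocks;
        // Trim off the hot path, on a timer, so Rent/Return stay cheap.
        _timer = new Timer(_ => Trim(), null, TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(10));
    }

    private void Trim()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();

        // Case 1: memory pressure -- drop all free blocks.
        bool highPressure = info.MemoryLoadBytes >= info.HighMemoryLoadThresholdBytes * 0.9;

        // Case 2: "productive" shrinking -- crudely approximated here as
        // releasing half of whatever is idle on each pass.
        int toRelease = highPressure ? int.MaxValue : _freeBlocks.Count / 2;

        while (toRelease-- > 0 && _freeBlocks.TryDequeue(out _))
        {
            // Dropping the reference makes the block collectible.
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

Note that with 128K slabs, dropping individual block references only returns memory once every block of a given slab is unreferenced, so a real design likely needs slab-granularity bookkeeping rather than a flat free queue.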
ASP.NET Core already tries to avoid holding onto buffers from the pool for extended periods. It does this by delaying the allocation until there's data to be read from the underlying IO operation where possible (see the zero-byte read sketch after this list). This helps but doesn't solve the memory problem in a number of cases:
- If the client sends data slowly enough to not sever the connection, it can force us to allocate and do more reads (this is rare).
- Large payloads, both incoming and outgoing (this is more common): big JSON requests (megabytes) or big JSON responses. This has improved with System.Text.Json because it's a streaming serializer.
- gRPC scenarios (streaming, etc.): each message is fully buffered before being parsed by the protobuf library (the serializer is synchronous).
- Lots of concurrent WebSockets that send occasional data. This usually produces bursts of activity that result in a bunch of allocations.
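For reference, the delayed-allocation trick mentioned above boils down to a zero-byte read: wait for data to arrive without holding a buffer, then rent one. A minimal sketch, using `ArrayPool<byte>` as a stand-in for the server's pool; it relies on the stream supporting zero-byte reads (`NetworkStream` does, and `SslStream` does on recent .NET):

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

internal static class DelayedAllocationSketch
{
    public static async Task<int> ReadWithLateBufferAsync(Stream stream)
    {
        // A zero-byte read completes when data is available, so no pooled
        // buffer is held while the connection sits idle.
        await stream.ReadAsync(Memory<byte>.Empty);

        // Only now rent a block, read, and return it as soon as possible.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            return await stream.ReadAsync(buffer);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```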
Traffic spikes result in allocating a bunch of memory in the `MemoryPool` that never gets removed. This is beginning to show up more in constrained container scenarios where memory is limited.
The goal is to reduce memory consumption when not at peak load.
## Risks / unknowns
This is hard to get right and could become a configuration nightmare if we can't do enough automatically. It could also regress performance if we need to "collect memory" on the allocation path or any other hot path.