Allow server Memory Pool to shrink #27394

@mkArtakMSFT

Summary

Kestrel currently doesn't use the normal memory pool. It uses its own pool of byte arrays and keeps expanding it without ever shrinking it.
This issue is about coming up with good logic for when and how the memory pool should shrink.

People with more context

@halter73, @shirhatti, @davidfowl

Motivation and goals

Today the server implementations in ASP.NET Core (Kestrel, IIS, and HTTP.sys) do not use the ArrayPool; they use a custom pool called the SlabMemoryPool. The buffers are pinned because they are used for IO (mostly P/Invoke layers). We rarely pin user-provided buffers and can generally avoid fragmentation by pinning up front for the lifetime of the application (at least that was the idea).

This pool allocates 128K slabs of memory on the POH and slices each slab into 32 aligned 4K blocks. If there are no free blocks, a new 128K slab is allocated (32 more blocks). Before the POH existed, the 128K size was chosen so the large allocation landed the byte[] on the LOH.
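
For illustration, here is a minimal sketch of that slab-slicing scheme. The type and member names (SlabPoolSketch, Rent, Return) are hypothetical, and details of the real SlabMemoryPool, such as 4K-aligning the block start addresses and handing out owned memory handles, are omitted:

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SlabPoolSketch
{
    private const int BlockSize = 4 * 1024;       // 4K blocks
    private const int BlocksPerSlab = 32;         // 32 * 4K = one 128K slab
    private const int SlabSize = BlockSize * BlocksPerSlab;

    private readonly ConcurrentQueue<Memory<byte>> _freeBlocks = new();

    public Memory<byte> Rent()
    {
        Memory<byte> block;
        while (!_freeBlocks.TryDequeue(out block))
        {
            // No free blocks: grow by one whole 128K slab (32 more blocks).
            AllocateSlab();
        }
        return block;
    }

    public void Return(Memory<byte> block) => _freeBlocks.Enqueue(block);

    private void AllocateSlab()
    {
        // pinned: true places the array on the Pinned Object Heap, so blocks
        // can be handed to native IO without the GC moving them. Before the
        // POH, the 128K size alone was enough to land the byte[] on the LOH.
        byte[] slab = GC.AllocateUninitializedArray<byte>(SlabSize, pinned: true);
        for (int i = 0; i < BlocksPerSlab; i++)
        {
            _freeBlocks.Enqueue(slab.AsMemory(i * BlockSize, BlockSize));
        }
    }
}
```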

Now for the big problem:

  • The pool never shrinks, and this has been the case since the beginning of ASP.NET Core.
  • We need it to shrink in two cases: when there's memory pressure, and when it would be "productive" to remove unused memory (this second case is trickier). A sketch of both triggers follows this list.
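
The memory-pressure half can at least be detected without new configuration. Below is a minimal sketch of a background trim loop, assuming a hypothetical ITrimmablePool surface on the pool (none of this is existing SlabMemoryPool API). GC.GetGCMemoryInfo() is a real API (since .NET Core 3.0) and is container-aware, so the threshold respects cgroup memory limits:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical trimming surface on the pool; not real SlabMemoryPool API.
public interface ITrimmablePool
{
    // True when the pool has had unused blocks for long enough that
    // freeing whole slabs would be "productive".
    bool HasExcessIdleCapacity { get; }

    // Releases unused slabs; aggressive trims harder under pressure.
    void Trim(bool aggressive);
}

public static class PoolTrimLoop
{
    public static async Task RunAsync(ITrimmablePool pool, CancellationToken token)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(10));
        while (await timer.WaitForNextTickAsync(token))
        {
            GCMemoryInfo info = GC.GetGCMemoryInfo();

            // Case 1: the machine/container is under memory pressure.
            bool underPressure = info.MemoryLoadBytes >= info.HighMemoryLoadThresholdBytes;

            // Case 2 (the trickier one): no pressure, but freeing idle
            // slabs would be productive. Either way, trim off the hot path.
            if (underPressure || pool.HasExcessIdleCapacity)
            {
                pool.Trim(aggressive: underPressure);
            }
        }
    }
}
```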

ASP.NET Core does its best to avoid holding onto buffers from the pool for extended periods. It does this by delaying the allocation until there's data to be read from the underlying IO operation (where possible); a sketch of that pattern follows the list below. This helps but doesn't solve the memory problem in a bunch of cases:

  • If the client sends data slowly enough that we don't sever the connection, it can force us to allocate and do more reads (this is rare).
  • Large payloads, both incoming and outgoing (this is more common): big JSON requests (megabytes) or big JSON responses. This has improved with System.Text.Json because it's a streaming JSON serializer.
  • gRPC scenarios (streaming etc.): each message is fully buffered before being parsed by the protobuf library (the serializer is synchronous).
  • Lots of concurrent WebSockets that send occasional data. This usually results in bursts of activity that produce a bunch of allocations.
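
The delayed-allocation technique mentioned above is essentially a zero-byte read: wait for the socket to signal that data is available before renting a block. A standalone sketch of the general pattern (this is the well-known technique, not Kestrel's internal code; the method and parameter names are illustrative):

```csharp
using System;
using System.Buffers;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class ZeroByteReadExample
{
    public static async Task<int> ReadWithLateBufferAsync(Socket socket, MemoryPool<byte> pool)
    {
        // A zero-byte receive completes once data is available, without
        // consuming anything and, crucially, without a pooled buffer
        // sitting idle while we wait.
        await socket.ReceiveAsync(Memory<byte>.Empty, SocketFlags.None);

        // Only now rent a block and do the real read.
        using IMemoryOwner<byte> owner = pool.Rent(4096);
        int read = await socket.ReceiveAsync(owner.Memory, SocketFlags.None);
        // ... process owner.Memory.Slice(0, read) before the block is returned ...
        return read;
    }
}
```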

Traffic spikes result in allocating a bunch of memory in the MemoryPool that never gets released. This is beginning to show up more in container scenarios where memory is constrained.

The goal is to reduce memory consumption when not at peak load.

Risks / unknowns

This is hard to get right and could become a configuration nightmare if we can't do enough automatically. It could also regress performance if we need to "collect memory" on the allocation path or any hot path.
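
One way to mitigate the hot-path risk, sketched under assumptions (all names here are hypothetical): keep Rent/Return down to interlocked counter updates and do all actual freeing from a background trim loop, using the peak usage between trims to decide how much is safe to release.

```csharp
using System.Threading;

public sealed class PoolUsageTracker
{
    private int _rentedBlocks;
    private int _peakSinceLastTrim;

    // Called on the hot path: a couple of interlocked ops, no locks, no frees.
    public void OnRent()
    {
        int current = Interlocked.Increment(ref _rentedBlocks);
        InterlockedMax(ref _peakSinceLastTrim, current);
    }

    public void OnReturn() => Interlocked.Decrement(ref _rentedBlocks);

    // Called only from the background trim loop. The peak since the last trim
    // is how many blocks were actually needed recently; anything above that,
    // rounded up to whole slabs, is a candidate to free.
    public int TakePeakAndReset() =>
        Interlocked.Exchange(ref _peakSinceLastTrim, Volatile.Read(ref _rentedBlocks));

    // Lock-free "store max" via compare-and-swap.
    private static void InterlockedMax(ref int location, int value)
    {
        int observed;
        while ((observed = Volatile.Read(ref location)) < value &&
               Interlocked.CompareExchange(ref location, value, observed) != observed)
        {
        }
    }
}
```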


Labels

  • Theme: meeting developer expectations
  • area-networking: includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
  • design-proposal: this issue represents a design proposal for a different issue, linked in the description
