
fix(proxy-wasm) fix memory growth on low-isolation modes #420

Closed
wants to merge 3 commits

Conversation

hishamhm
Contributor

We were observing linear memory growth when running requests in low-isolation modes.

The root cause comes from the fact that small allocations had instance-long lifetimes (including, for example, the arrays for args and rets for each Wasm call, causing the observable linear growth when running wrk).

Turns out that ngx_pfree, which was being used in the code to free args and rets, does not free small allocations. It only frees large allocations; small allocations are only freed when the pool itself is destroyed.
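
For reference, this is visible in nginx's own ngx_pfree (paraphrased from src/core/ngx_palloc.c, debug logging omitted): it only walks the pool's list of large allocations, so small allocations stay around until the pool is destroyed.

ngx_int_t
ngx_pfree(ngx_pool_t *pool, void *p)
{
    ngx_pool_large_t  *l;

    /* only allocations on the pool's "large" list can be freed early */
    for (l = pool->large; l; l = l->next) {
        if (p == l->alloc) {
            ngx_free(l->alloc);
            l->alloc = NULL;
            return NGX_OK;
        }
    }

    /* small allocations are not tracked individually: nothing to free */
    return NGX_DECLINED;
}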

I observed an initial improvement by moving args and rets allocations to the stack. I added this as a perf() commit here since I think this is still a worthwhile change, given these are usually very small arrays.
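
The pattern is roughly the following sketch (illustrative only, not the exact diff; the threshold value and the wasmtime_val_t type are assumptions here):

/* small fixed-size array on the stack for the common case,
 * pool fallback only for unusually large Wasm signatures */
wasmtime_val_t   swargs[10];
wasmtime_val_t  *wargs;

if (nargs <= 10) {
    wargs = &swargs[0];

} else {
    wargs = ngx_pcalloc(instance->pool, nargs * sizeof(wasmtime_val_t));
    if (wargs == NULL) {
        return NGX_ERROR;
    }
}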

The general solution, though, is to use a request-lived pool for pwctx, so we don't hoard small request-specific allocations over the lifetime of an instance.

A third commit changes the setup of the pwexec pool so that it uses the pwctx pool under all isolation modes, like it does for logs. Wasm isolation mode should pertain only to the Wasm-accessible store, not to the temporary C-side data structures, which we now free on each request.

Given our new understanding of ngx_pfree, I think there are probably more opportunities in the codebase for replacing known-small data which is deterministically freed either with a lighter-weight ngx_alloc/ngx_free pair, or even with stack allocations, but I wanted to keep this PR straight-to-the-point and not go hunting for more changes.
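
As an illustration of the lighter-weight pairing (a generic sketch, not a specific call site in this codebase):

/* small, deterministically-freed temporary: bypass the pool entirely */
u_char  *buf;

buf = ngx_alloc(len, log);     /* thin malloc wrapper, not pool-backed */
if (buf == NULL) {
    return NGX_ERROR;
}

/* ... use buf ... */

ngx_free(buf);                 /* actually releases the memory right away */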

Before and after memory graphs:

[graph: memory usage before]
[graph: memory usage after]

@codecov

codecov bot commented Sep 22, 2023

Codecov Report

Merging #420 (b10014c) into main (f57914c) will decrease coverage by 0.11524%.
The diff coverage is 59.09091%.

Additional details and impacted files


@@                 Coverage Diff                 @@
##                main        #420         +/-   ##
===================================================
- Coverage   90.26802%   90.15278%   -0.11525%     
===================================================
  Files             46          46                 
  Lines           8395        8378         -17     
===================================================
- Hits            7578        7553         -25     
- Misses           817         825          +8     
Files                                     Coverage Δ
src/common/proxy_wasm/ngx_proxy_wasm.h    91.42857% <ø> (ø)
src/common/proxy_wasm/ngx_proxy_wasm.c    92.45283% <87.50000%> (-0.38425%) ⬇️
src/wasm/wrt/ngx_wrt_wasmtime.c           81.94444% <42.85714%> (-2.45272%) ⬇️

... and 1 file with indirect coverage changes

Member

@thibaultcha thibaultcha left a comment

Very nice! And also good updates overall.

> Turns out that ngx_pfree, which was being used in the code to free args and rets, does not free small allocations. It only frees large allocations; small allocations are only freed when the pool itself is destroyed.

Hmm... Welp, that much I knew (that's also why the nopool patch exists), so my first thought upon reading your summary was that request pools are also destroyed... 🤔 So far it's been the intention to attach everything to the request pool. But checking ngx_http_proxy_wasm.c:361 shows that for main requests we allocate the filter chain in r->connection->pool; I think that was a wrong choice, since those pools might not get destroyed, I'm pretty sure. Overall I don't think this ever changed since the prototype... When subrequest filter chains became tested I fixed it for them, but left main requests on the connection pool, probably not wanting to change something that, at the time, seemed to work.

Maybe using r->pool in all cases would have been a sufficient fix, but I do like better giving filter chains their own pools.

@@ -242,7 +242,8 @@ ngx_proxy_wasm_ctx_alloc(ngx_pool_t *pool)
         return NULL;
     }

-    pwctx->pool = pool;
+    pwctx->pool = ngx_create_pool(128, pool->log);

Member

The call can fail, let's catch the error; perhaps also make 128 a #define? It probably is a little low too, no?
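
For illustration, catching it could be as simple as this sketch (size value aside):

    pwctx->pool = ngx_create_pool(128, pool->log);
    if (pwctx->pool == NULL) {
        return NULL;   /* illustrative; actual cleanup depends on the caller */
    }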

Member

Another note too, I think we could first create the pool, then allocate pwctx inside of it? That way no need to keep track of the parent pool at all, and it could even disappear from this function's arguments.
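
A sketch of that alternative (error handling illustrative):

/* create the pool first, then allocate pwctx inside it, so the pool
 * fully owns the context and no parent pool needs to be tracked */
ngx_pool_t            *p;
ngx_proxy_wasm_ctx_t  *pwctx;

p = ngx_create_pool(512, log);
if (p == NULL) {
    return NULL;
}

pwctx = ngx_pcalloc(p, sizeof(ngx_proxy_wasm_ctx_t));
if (pwctx == NULL) {
    ngx_destroy_pool(p);
    return NULL;
}

pwctx->pool = p;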

Contributor Author

> The call can fail, let's catch the error

Will do!

> perhaps also make 128 a #define? It probably is a little low too, no?

It was a coin toss, as both 128 and 512 are used in other pools:

] g -r create_pool
common/lua/ngx_wasm_lua.c 128:    pool = ngx_create_pool(512, log);
common/ngx_wasm_socket_tcp.c 573:        c->pool = ngx_create_pool(128, sock->log);
http/proxy_wasm/ngx_http_proxy_wasm_dispatch.c 210:    call->pool = ngx_create_pool(512, r->connection->log);
http/ngx_http_wasm_util.c 626:        c->pool = ngx_create_pool(128, c->log);

OK, so I actually took the time now to understand how this size argument to ngx_create_pool works: it's the size of the memory block that holds the ngx_pool_s plus an allotment for small allocations; when that fills up, another block of the same size gets allocated. It's intended to be at most ngx_pagesize - 1, usually 4095. So yeah, 512 should be fine.
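
For reference, the relevant part of ngx_create_pool (paraphrased from nginx src/core/ngx_palloc.c):

p = ngx_memalign(NGX_POOL_ALIGNMENT, size, log);
if (p == NULL) {
    return NULL;
}
/* ... */
size = size - sizeof(ngx_pool_t);
p->max = (size < NGX_MAX_ALLOC_FROM_POOL) ? size : NGX_MAX_ALLOC_FROM_POOL;
/* NGX_MAX_ALLOC_FROM_POOL is ngx_pagesize - 1; anything larger than
 * p->max goes on the pool's "large" list instead */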

I didn't make it a #define because none of the others did, so I was following the existing patterns. Was there a criterion for picking 128 or 512, or should we #define them all to the same number?

Member

Yes, that is how it works. Maybe we can have NGX_WASM_POOL_SIZE_S and NGX_WASM_POOL_SIZE_M with 128 and 512? Fine to hard-code 512 as well given we already do it; we can #define later.
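
As a sketch (hypothetical names from this comment, not currently in the codebase):

#define NGX_WASM_POOL_SIZE_S  128
#define NGX_WASM_POOL_SIZE_M  512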

Member

@thibaultcha thibaultcha Sep 25, 2023

The reasoning for not giving pwctx its own pool was to avoid an additional memory allocation for each request (which nginx tries to avoid in general) -> everything goes into the request memory pool. For dispatch calls or fake connections, we give them their own pool since not all filters will use those operations. But all requests will have to allocate a filter chain, hence r->pool... All in all, I think we should keep the r->pool option open (hence the #if 0 block comment).

Contributor Author

> I think we could first create the pool, then allocate pwctx inside of it?

Can do, if you prefer this pattern. Personally, I always try to organize memory allocations as an ownership tree; that helps me track the lifetimes of allocations by looking at the structure definitions rather than by tracking the execution flow. Having the structure that owns the pool stored inside the pool itself makes the ownership relationship circular, which requires more attentive coding, as it gets harder and harder to track all flows in your head as things grow, so it's a pattern I've always avoided. (I believe it's the kind of pattern you cannot even represent (easily?) in Rust, for example — I guess I've been "thinking in Rust" before Rust existed 😅 )

I wanted to avoid refactors to keep the PR small and the review easier, but if it were up to me, I'd prefer instead to make ngx_proxy_wasm_ctx_alloc and ngx_proxy_wasm_ctx_destroy more symmetrical not by removing the parent pool argument, but by passing it to the destroy function as well, making them more like ngx_pcalloc and ngx_pfree (which would then remove the parent_pool field from the struct).
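
That is, hypothetical prototypes along these lines (not the current signatures):

/* symmetrical variant sketched above: the caller passes the parent pool
 * to both functions, mirroring ngx_pcalloc/ngx_pfree */
ngx_proxy_wasm_ctx_t  *ngx_proxy_wasm_ctx_alloc(ngx_pool_t *parent_pool);
void  ngx_proxy_wasm_ctx_destroy(ngx_pool_t *parent_pool,
    ngx_proxy_wasm_ctx_t *pwctx);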

That, of course, requires that the owner of the parent pool is reachable at destruction time. From a quick look at the code it seems that those alloc and destroy functions are called at pretty different places (one under http/, another under common/), so a change to make things more symmetrical would be more code surgery than I think should happen in this PR (though I still think that it could have potential benefits, at the very least to force a rethink/clarification of the pool lifetimes). I'm not a big fan of the parent_pool field either (I try to avoid "backlinks" to the owner structure whenever possible), but at least I like it better than the apparently-circular ownership. But I can go either way, no strong feelings here, just talking shop to share my thought process.

src/wasm/wrt/ngx_wrt_wasmtime.c (outdated, resolved)
        wargs = &swargs[0];

    } else {
        wargs = ngx_pcalloc(instance->pool,

Member

It would be great to keep both of these cases covered via the ngx_wasm_debug module with a dummy host function interface perhaps.

Contributor Author

@hishamhm hishamhm Sep 25, 2023

I took a look but I wasn't able to find my way right away... I'm assuming the idea is to add a new ngx_command_t from which to call ngx_wavm_instance_call_func (or funcref?) on a Wasm function with 11 args and 11 rets.

  1. How do I get a ngx_wavm_instance_t * from the debug module?
  2. And how do I get an ngx_wavm_func_t * (or an ngx_wavm_funcref_t *)? For this, I think I can do ngx_wavm_module_func_lookup by name if I have a ngx_wavm_module_t; and to get the module I can do ngx_wavm_module_lookup. But then the question becomes: how do I get a ngx_wavm_t * from the debug module?

I'm assuming I'll need to add a .wat file in the test that declares the 11-arg and 11-ret function (since neither Rust nor Go does multi-ret functions); but that part I think I can figure out.

Member

Both the instance creation and the dummy host function interface will have to be coded into the debug module.

Contributor Author

I managed to declare the host function interface and write the test case, but I tried to access the wcf->vm from the ngx_wasm_core module to link the module (to avoid having to recreate the whole VM setup), and I think I ended up lost in module initialization ordering shenanigans. I might have ended up sinking more time trying to get this test case running than fixing the memory issue. If the lack of coverage is a blocker, I'm inclined to drop the perf(wasmtime) commit from this PR to get the fix(proxy-wasm) merged.

Member

We will do it later then.

@thibaultcha
Member

thibaultcha commented Sep 25, 2023

I think the 2 fixes are really the same one; they don't make a lot of sense individually, since in and of themselves neither fixes the memory growth issue.

We were observing linear memory growth when running requests in low-isolation
modes.

The root cause comes from the fact that small allocations had instance-long
lifetimes (including, for example, the arrays for `args` and `rets` for each
Wasm call, causing the observable linear growth when running wrk).

Turns out that `ngx_pfree`, which was being used in the code to free `args` and
`rets`, does not free small allocations. It only frees large allocations; small
allocations are only freed when the pool itself is destroyed.

The solution is to use a request-lived pool for `pwctx`, so we don't hoard
small request-specific allocations over the lifetime of an instance.

We also change the setup of the `pwexec` pool so that it uses the `pwctx`
pool under all isolation modes, like it does for logs. Wasm isolation mode
should pertain only to the Wasm-accessible store, not to the temporary C-side
data structures, which we now free on each request.
Perform adjustment of arguments and return values on the stack
for the common case of functions with fewer than 10 arguments
and/or 10 returns.
@hishamhm
Contributor Author

This Valgrind report looks legitimate:

==44506== 48 bytes in 1 blocks are definitely lost in loss record 5 of 33
==44506==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==44506==    by 0x178092: ngx_alloc (ngx_alloc.c:22)
==44506==    by 0x1526A1: ngx_create_pool (ngx_palloc.c:21)
==44506==    by 0x2AAD1C: ngx_proxy_wasm_ctx_alloc (ngx_proxy_wasm.c:245)
==44506==    by 0x2BCA46: ngx_http_proxy_wasm_ctx (ngx_http_proxy_wasm.c:361)
==44506==    by 0x2B69E6: ngx_http_wasm_ffi_get_property (ngx_wasm_lua_ffi.c:218)
==44506==    by 0x4E94FE1: lj_vm_ffi_call (in /home/runner/work/ngx_wasm_module/ngx_wasm_module/work/buildroot/prefix/luajit/lib/libluajit-5.1.so.2.1.0)
==44506==    by 0x4EF44EF: lj_ccall_func (lj_ccall.c:1382)
==44506==    by 0x4F101C4: lj_cf_ffi_meta___call (lib_ffi.c:230)
==44506==    by 0x4E92B92: lj_BC_FUNCC (in /home/runner/work/ngx_wasm_module/ngx_wasm_module/work/buildroot/prefix/luajit/lib/libluajit-5.1.so.2.1.0)
==44506==    by 0x4EADD08: lua_pcall (lj_api.c:1145)
==44506==    by 0x238BA6: ngx_http_lua_do_call (ngx_http_lua_util.c:4159)
==44506==    by 0x250BCA: ngx_http_lua_init_worker_by_inline (ngx_http_lua_initworkerby.c:333)
==44506==    by 0x250A80: ngx_http_lua_init_worker (ngx_http_lua_initworkerby.c:298)
==44506==    by 0x17E385: ngx_single_process_cycle (ngx_process_cycle.c:299)
==44506==    by 0x15009C: main (nginx.c:383)

From what I could tell, it seems to happen because, when calling an FFI function from init_worker, a pwctx gets created "on-demand" by ngx_http_proxy_wasm_ctx, but apparently the corresponding ngx_proxy_wasm_ctx_destroy never gets called (it is only called in a DONE step), so pwctx->pool never gets freed (destroying the parent_pool is not sufficient because pools have internal allocations).

I think the lack of an ngx_proxy_wasm_ctx_destroy was harmless before, because all it did was call various ngx_pfree on structures that were ultimately also freed when the whole pool was destroyed; but this change might have uncovered an issue in the pwctx lifecycle management when using the FFI.
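
One idiomatic way to tie the two lifetimes together would be a pool cleanup handler on the parent pool, since nginx pools have no built-in parent/child relationship (a sketch only, not necessarily how #421 addresses it):

/* register a cleanup on the parent pool so destroying it also
 * destroys pwctx->pool; names are illustrative */
static void
ngx_proxy_wasm_ctx_pool_cleanup(void *data)
{
    ngx_destroy_pool((ngx_pool_t *) data);
}

/* at pwctx creation time: */
ngx_pool_cleanup_t  *cln;

cln = ngx_pool_cleanup_add(parent_pool, 0);
if (cln == NULL) {
    return NGX_ERROR;
}

cln->handler = ngx_proxy_wasm_ctx_pool_cleanup;
cln->data = pwctx->pool;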

@thibaultcha
Member

thibaultcha commented Sep 27, 2023

Patch with the fix is in #421. I will merge tomorrow, but if you have more to squeeze in, feel free to apply the patch here. It will be a little tricky rebasing on my proxy_wasm.c refactor, which also fixes some lingering memory issues with instances; most of the file has changed, but it is almost ready as well.

Such as the fake request passed to the FFI via init_worker_by_lua.
@hishamhm
Contributor Author

Awesome, thanks! I've cherry-picked your commit. No further changes on my part unless you have further comments, so if you think it's ready, feel free to merge!

@thibaultcha
Member

Merged!

@thibaultcha thibaultcha deleted the fix/memory-pool branch September 27, 2023 18:21