
fix(proxy-wasm) fix memory growth on low-isolation modes #420

Closed
wants to merge 3 commits

Conversation

hishamhm
Contributor

We were observing linear memory growth when running requests in low-isolation modes.

The root cause comes from the fact that small allocations had instance-long lifetimes (including, for example, the arrays for args and rets for each Wasm call, causing the observable linear growth when running wrk).

Turns out that ngx_pfree, which was being used in the code to free args and rets, does not free small allocations. It only frees large allocations; small allocations are only freed when the pool itself is destroyed.
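
For reference, this is visible in nginx's own ngx_pfree (paraphrased from src/core/ngx_palloc.c, debug logging omitted): it only walks the pool's list of large allocations, so small allocations stay around until the pool is destroyed.

ngx_int_t
ngx_pfree(ngx_pool_t *pool, void *p)
{
    ngx_pool_large_t  *l;

    /* only allocations on the pool's "large" list can be freed early */
    for (l = pool->large; l; l = l->next) {
        if (p == l->alloc) {
            ngx_free(l->alloc);
            l->alloc = NULL;
            return NGX_OK;
        }
    }

    /* small allocations are not tracked individually: nothing to free */
    return NGX_DECLINED;
}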

I observed an initial improvement by moving args and rets allocations to the stack. I added this as a perf() commit here since I think this is still a worthwhile change, given these are usually very small arrays.
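
The pattern is roughly the following sketch (illustrative only, not the exact diff; the threshold value and the wasmtime_val_t type are assumptions here):

/* small fixed-size array on the stack for the common case,
 * pool fallback only for unusually large Wasm signatures */
wasmtime_val_t   swargs[10];
wasmtime_val_t  *wargs;

if (nargs <= 10) {
    wargs = &swargs[0];

} else {
    wargs = ngx_pcalloc(instance->pool, nargs * sizeof(wasmtime_val_t));
    if (wargs == NULL) {
        return NGX_ERROR;
    }
}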

The general solution, though, is to use a request-lived pool for pwctx, so we don't hoard small request-specific allocations over the lifetime of an instance.

A third commit changes the setup of the pwexec pool so that it uses the pwctx pool under all isolation modes, like it does for logs. Wasm isolation mode should pertain only to the Wasm-accessible store, not to the temporary C-side data structures, which we now free on each request.

Given our new understanding of ngx_pfree, I think there are probably more opportunities in the codebase for replacing known-small data which is deterministically freed either with a lighter-weight ngx_alloc/ngx_free pair, or even with stack allocations, but I wanted to keep this PR straight-to-the-point and not go hunting for more changes.
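
As an illustration of the lighter-weight pairing (a generic sketch, not a specific call site in this codebase):

/* small, deterministically-freed temporary: bypass the pool entirely */
u_char  *buf;

buf = ngx_alloc(len, log);     /* thin malloc wrapper, not pool-backed */
if (buf == NULL) {
    return NGX_ERROR;
}

/* ... use buf ... */

ngx_free(buf);                 /* actually releases the memory right away */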

Before and after memory graphs:

[graph: memory usage before]
[graph: memory usage after]

@codecov

codecov bot commented Sep 22, 2023

Codecov Report

Merging #420 (b10014c) into main (f57914c) will decrease coverage by 0.11524%.
The diff coverage is 59.09091%.

Additional details and impacted files


@@                 Coverage Diff                 @@
##                main        #420         +/-   ##
===================================================
- Coverage   90.26802%   90.15278%   -0.11525%     
===================================================
  Files             46          46                 
  Lines           8395        8378         -17     
===================================================
- Hits            7578        7553         -25     
- Misses           817         825          +8     
Files                                     Coverage Δ
src/common/proxy_wasm/ngx_proxy_wasm.h    91.42857% <ø> (ø)
src/common/proxy_wasm/ngx_proxy_wasm.c    92.45283% <87.50000%> (-0.38425%) ⬇️
src/wasm/wrt/ngx_wrt_wasmtime.c           81.94444% <42.85714%> (-2.45272%) ⬇️

... and 1 file with indirect coverage changes

Member

@thibaultcha thibaultcha left a comment

Very nice! And also good updates overall.

> Turns out that ngx_pfree, which was being used in the code to free args and rets, does not free small allocations. It only frees large allocations; small allocations are only freed when the pool itself is destroyed.

Hmm... Welp, that much I knew (that's also why the nopool patch exists), so my first thought upon reading your summary was that request pools are also destroyed... 🤔 So far it's been the intention to attach everything to the request pool. But checking ngx_http_proxy_wasm.c:361 shows that for main requests we allocate the filter chain in r->connection->pool; I think that was a wrong choice, since those pools might not get destroyed, I'm pretty sure. Overall I don't think this ever changed since the prototype... When subrequest filter chains became tested I fixed it for them, but left main requests on the connection pool, probably not wanting to change something that, at the time, seemed to work.

Maybe using r->pool in all cases would have been a sufficient fix, but I do like better giving filter chains their own pools.

@@ -242,7 +242,8 @@ ngx_proxy_wasm_ctx_alloc(ngx_pool_t *pool)
         return NULL;
     }

-    pwctx->pool = pool;
+    pwctx->pool = ngx_create_pool(128, pool->log);

Member

The call can fail, let's catch the error; perhaps also make 128 a #define? It probably is a little low too, no?
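
For illustration, catching it could be as simple as this sketch (size value aside):

    pwctx->pool = ngx_create_pool(128, pool->log);
    if (pwctx->pool == NULL) {
        return NULL;   /* illustrative; actual cleanup depends on the caller */
    }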

Member

Another note too, I think we could first create the pool, then allocate pwctx inside of it? That way no need to keep track of the parent pool at all, and it could even disappear from this function's arguments.
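
A sketch of that alternative (error handling illustrative):

/* create the pool first, then allocate pwctx inside it, so the pool
 * fully owns the context and no parent pool needs to be tracked */
ngx_pool_t            *p;
ngx_proxy_wasm_ctx_t  *pwctx;

p = ngx_create_pool(512, log);
if (p == NULL) {
    return NULL;
}

pwctx = ngx_pcalloc(p, sizeof(ngx_proxy_wasm_ctx_t));
if (pwctx == NULL) {
    ngx_destroy_pool(p);
    return NULL;
}

pwctx->pool = p;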

Contributor Author

> The call can fail, let's catch the error

Will do!

> perhaps also make 128 a #define? It probably is a little low too, no?

It was a coin toss, as both 128 and 512 are used in other pools:

] g -r create_pool
common/lua/ngx_wasm_lua.c 128:    pool = ngx_create_pool(512, log);
common/ngx_wasm_socket_tcp.c 573:        c->pool = ngx_create_pool(128, sock->log);
http/proxy_wasm/ngx_http_proxy_wasm_dispatch.c 210:    call->pool = ngx_create_pool(512, r->connection->log);
http/ngx_http_wasm_util.c 626:        c->pool = ngx_create_pool(128, c->log);

OK, so I actually took the time now to understand how this size argument to ngx_create_pool works: it's the size of the memory block that holds the ngx_pool_s plus an allotment for small allocations; when that fills up, another block of the same size gets allocated. It's intended to be at most ngx_pagesize - 1, usually 4095. So yeah, 512 should be fine.
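
For reference, the relevant part of ngx_create_pool (paraphrased from nginx src/core/ngx_palloc.c):

p = ngx_memalign(NGX_POOL_ALIGNMENT, size, log);
if (p == NULL) {
    return NULL;
}
/* ... */
size = size - sizeof(ngx_pool_t);
p->max = (size < NGX_MAX_ALLOC_FROM_POOL) ? size : NGX_MAX_ALLOC_FROM_POOL;
/* NGX_MAX_ALLOC_FROM_POOL is ngx_pagesize - 1; anything larger than
 * p->max goes on the pool's "large" list instead */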

I didn't make it a #define because none of the others did, so I was following the existing patterns. Was there a criterion for picking 128 or 512, or should we #define them all to the same number?

Member

Yes, that is how it works. Maybe we can have NGX_WASM_POOL_SIZE_S and NGX_WASM_POOL_SIZE_M with 128 and 512? Fine to hard-code 512 as well given we already do it; we can #define later.
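
As a sketch (hypothetical names from this comment, not currently in the codebase):

#define NGX_WASM_POOL_SIZE_S  128
#define NGX_WASM_POOL_SIZE_M  512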

Member

@thibaultcha thibaultcha Sep 25, 2023

The reasoning for not giving pwctx its own pool was to avoid an additional memory allocation for each request (which nginx tries to avoid in general) -> everything goes into the request memory pool. For dispatch calls or fake connections, we give them their own pool since not all filters will use those operations. But all requests will have to allocate a filter chain, hence r->pool... All in all, I think we should keep the r->pool option open (hence the #if 0 block comment).

Contributor Author

> I think we could first create the pool, then allocate pwctx inside of it?

Can do, if you prefer this pattern. Personally, I always try to organize memory allocations as an ownership tree; that helps me track the lifetimes of allocations by looking at the structure definitions rather than by tracking the execution flow. Having the structure that owns the pool stored inside the pool itself makes the ownership relationship circular, which requires more attentive coding, as it gets harder and harder to track all flows in your head as things grow, so it's a pattern I've always avoided. (I believe it's the kind of pattern you cannot even represent (easily?) in Rust, for example — I guess I've been "thinking in Rust" before Rust existed 😅 )

I wanted to avoid refactors to keep the PR small and the review easier, but if it were up to me, I'd prefer instead to make ngx_proxy_wasm_ctx_alloc and ngx_proxy_wasm_ctx_destroy more symmetrical not by removing the parent pool argument, but by passing it to the destroy function as well, making them more like ngx_pcalloc and ngx_pfree (which would then remove the parent_pool field from the struct).
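
That is, hypothetical prototypes along these lines (not the current signatures):

/* symmetrical variant sketched above: the caller passes the parent pool
 * to both functions, mirroring ngx_pcalloc/ngx_pfree */
ngx_proxy_wasm_ctx_t  *ngx_proxy_wasm_ctx_alloc(ngx_pool_t *parent_pool);
void  ngx_proxy_wasm_ctx_destroy(ngx_pool_t *parent_pool,
    ngx_proxy_wasm_ctx_t *pwctx);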

That, of course, requires that the owner of the parent pool is reachable at destruction time. From a quick look at the code it seems that those alloc and destroy functions are called at pretty different places (one under http/, another under common/), so a change to make things more symmetrical would be more code surgery than I think should happen in this PR (though I still think that it could have potential benefits, at the very least to force a rethink/clarification of the pool lifetimes). I'm not a big fan of the parent_pool field either (I try to avoid "backlinks" to the owner structure whenever possible), but at least I like it better than the apparently-circular ownership. But I can go either way, no strong feelings here, just talking shop to share my thought process.

src/wasm/wrt/ngx_wrt_wasmtime.c (outdated, resolved)
        wargs = &swargs[0];

    } else {
        wargs = ngx_pcalloc(instance->pool,

Member

It would be great to keep both of these cases covered via the ngx_wasm_debug module with a dummy host function interface perhaps.

Contributor Author

@hishamhm hishamhm Sep 25, 2023

I took a look but I wasn't able to find my way right away... I'm assuming the idea is to add a new ngx_command_t from which to call ngx_wavm_instance_call_func (or funcref?) on a Wasm function with 11 args and 11 rets.

  1. How do I get a ngx_wavm_instance_t * from the debug module?
  2. And how do I get an ngx_wavm_func_t * (or an ngx_wavm_funcref_t *)? For this, I think I can do ngx_wavm_module_func_lookup by name if I have a ngx_wavm_module_t; and to get the module I can do ngx_wavm_module_lookup. But then the question becomes: how do I get a ngx_wavm_t * from the debug module?

I'm assuming I'll need to add a .wat file in the test that declares the 11-arg and 11-ret function (since neither Rust nor Go does multi-ret functions); but that part I think I can figure out.

Member

Both the instance creation and the dummy host function interface will have to be coded into the debug module.

Contributor Author

I managed to declare the host function interface and write the test case, but I tried to access the wcf->vm from the ngx_wasm_core module to link the module (to avoid having to recreate the whole VM setup), and I think I ended up lost in module initialization ordering shenanigans. I might have ended up sinking more time trying to get this test case running than fixing the memory issue. If the lack of coverage is a blocker, I'm inclined to drop the perf(wasmtime) commit from this PR to get the fix(proxy-wasm) merged.

Member

We will do it later then.

@thibaultcha
Member

thibaultcha commented Sep 25, 2023

I think the 2 fixes are really the same one; they don't make a lot of sense individually, since in and of themselves neither fixes the memory growth issue.

We were observing linear memory growth when running requests in low-isolation
modes.

The root cause comes from the fact that small allocations had instance-long
lifetimes (including, for example, the arrays for `args` and `rets` for each
Wasm call, causing the observable linear growth when running wrk).

Turns out that `ngx_pfree`, which was being used in the code to free `args` and
`rets`, does not free small allocations. It only frees large allocations; small
allocations are only freed when the pool itself is destroyed.

The solution is to use a request-lived pool for `pwctx`, so we don't hoard
small request-specific allocations over the lifetime of an instance.

We also change the setup of the `pwexec` pool so that it uses the `pwctx`
pool under all isolation modes, like it does for logs. Wasm isolation mode
should pertain only to the Wasm-accessible store, not to the temporary C-side
data structures, which we now free on each request.
Perform adjustment of arguments and return values on the stack
for the common case of functions with fewer than 10 arguments
and/or 10 returns.
@hishamhm
Contributor Author

This Valgrind report looks legitimate:

==44506== 48 bytes in 1 blocks are definitely lost in loss record 5 of 33
==44506==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==44506==    by 0x178092: ngx_alloc (ngx_alloc.c:22)
==44506==    by 0x1526A1: ngx_create_pool (ngx_palloc.c:21)
==44506==    by 0x2AAD1C: ngx_proxy_wasm_ctx_alloc (ngx_proxy_wasm.c:245)
==44506==    by 0x2BCA46: ngx_http_proxy_wasm_ctx (ngx_http_proxy_wasm.c:361)
==44506==    by 0x2B69E6: ngx_http_wasm_ffi_get_property (ngx_wasm_lua_ffi.c:218)
==44506==    by 0x4E94FE1: lj_vm_ffi_call (in /home/runner/work/ngx_wasm_module/ngx_wasm_module/work/buildroot/prefix/luajit/lib/libluajit-5.1.so.2.1.0)
==44506==    by 0x4EF44EF: lj_ccall_func (lj_ccall.c:1382)
==44506==    by 0x4F101C4: lj_cf_ffi_meta___call (lib_ffi.c:230)
==44506==    by 0x4E92B92: lj_BC_FUNCC (in /home/runner/work/ngx_wasm_module/ngx_wasm_module/work/buildroot/prefix/luajit/lib/libluajit-5.1.so.2.1.0)
==44506==    by 0x4EADD08: lua_pcall (lj_api.c:1145)
==44506==    by 0x238BA6: ngx_http_lua_do_call (ngx_http_lua_util.c:4159)
==44506==    by 0x250BCA: ngx_http_lua_init_worker_by_inline (ngx_http_lua_initworkerby.c:333)
==44506==    by 0x250A80: ngx_http_lua_init_worker (ngx_http_lua_initworkerby.c:298)
==44506==    by 0x17E385: ngx_single_process_cycle (ngx_process_cycle.c:299)
==44506==    by 0x15009C: main (nginx.c:383)

From what I could tell, it seems to happen because, when calling an FFI function from init_worker, a pwctx gets created "on-demand" by ngx_http_proxy_wasm_ctx, but apparently the corresponding ngx_proxy_wasm_ctx_destroy never gets called (it is only called in a DONE step), so pwctx->pool never gets freed (destroying the parent_pool is not sufficient because pools have internal allocations).

I think the lack of an ngx_proxy_wasm_ctx_destroy was harmless before, because all it did was call various ngx_pfree on structures that were ultimately also freed when the whole pool was destroyed; but this change might have uncovered an issue in the pwctx lifecycle management when using the FFI.
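
One idiomatic way to tie the two lifetimes together would be a pool cleanup handler on the parent pool, since nginx pools have no built-in parent/child relationship (a sketch only, not necessarily how #421 addresses it):

/* register a cleanup on the parent pool so destroying it also
 * destroys pwctx->pool; names are illustrative */
static void
ngx_proxy_wasm_ctx_pool_cleanup(void *data)
{
    ngx_destroy_pool((ngx_pool_t *) data);
}

/* at pwctx creation time: */
ngx_pool_cleanup_t  *cln;

cln = ngx_pool_cleanup_add(parent_pool, 0);
if (cln == NULL) {
    return NGX_ERROR;
}

cln->handler = ngx_proxy_wasm_ctx_pool_cleanup;
cln->data = pwctx->pool;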

@thibaultcha
Member

thibaultcha commented Sep 27, 2023

Patch with the fix is in #421. I will merge tomorrow, but if you have more to squeeze in, feel free to apply the patch here. It will be a little tricky rebasing on my proxy_wasm.c refactor, which also fixes some lingering memory issues with instances; most of the file has changed, but it is almost ready as well.

Such as the fake request passed to the FFI via init_worker_by_lua.
@hishamhm
Contributor Author

Awesome, thanks! I've cherry-picked your commit. No further changes on my part unless you have further comments, so if you think it's ready, feel free to merge!

@thibaultcha
Member

Merged!

@thibaultcha thibaultcha deleted the fix/memory-pool branch September 27, 2023 18:21