Skip to content
This repository has been archived by the owner on Jul 28, 2024. It is now read-only.

Performance degradation and crashes due to malloc/gc #450

Open
martijneken opened this issue Jul 24, 2024 · 6 comments
Open

Performance degradation and crashes due to malloc/gc #450

martijneken opened this issue Jul 24, 2024 · 6 comments

Comments

@martijneken
Copy link

Hi folks, I'm looking for advice. We are building a product around ProxyWasm and we'd love to support as many languages/SDKs as possible. We've been doing benchmarking and stress testing on various plugins, including those built with TinyGo / ProxyWasmGoSdk. In benchmarking TinyGo we are seeing high malloc performance costs and even crashes.

My core questions are:

  • what we can do to improve/tune malloc/gc in TinyGo?
  • how can we debug GC behavior in TinyGo Wasm?
  • how can we diagnose whether there are memory leaks / fragmentation?

This TinyGo documentation is worrisome:

AdXgb4EodZE5gbc

Describe the bug / error

We see significant costs in malloc(), presumably due to GC. Here are CPU profiles of 2 plugin versions:

  1. A plugin that compiles and executes Regexp in HTTP handlers. The full cost profile here is unclear but we expect there's a lot of GC happening because -print-allocs=. shows loads of heap allocations in the Regexp libs. If we run this plugin often enough (100 seconds, ~40k iterations), it crashes on malloc (stack below)... Crash stack:
ERROR from integration: Function: malloc failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
  0:  0x6f19 - runtime.runtimePanicAt
  1:  0xbc9 - runtime.alloc
  2:  0x7eb6 - runtime.hashmapSet
  3:  0x745c - runtime.hashmapBinarySet
  4:  0x741f - malloc
  5:  0x14d7a - github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm.GetHttpRequestHeader
  6:  0x138d1 - proxy_on_request_headers

7JFUyYkY5x2CaKJ

  1. The same plugin, but optimized so Regexp compilation is moved to plugin init (outside the profile). The CPU cost here is dominated by malloc during a GetHttpRequestHeader call. This should not be expensive (compared to executing plugin logic including Regexps) so we suspect a memory management issue (presumably GC). The longer we run this plugin, the higher the per-invocation costs... Fragmentation? Memory leak?

8hxmKEupq8HPa3x

What is your Envoy/Istio version?

N/A, benchmarks run with a shim ProxyWasm host implementation and v8 wasm engine.

What is the SDK version?

Go 1.22.5

What is your TinyGo version?

TinyGo 0.32.0

URL or snippet of your code including Envoy configuration

Customer proprietary, can't share exact code. I can create a min repro plugin if needed.

@mathetake
Copy link
Member

Unfortunately there's nothing we can do for this - TinyGo is fundamentally flawed with -scheduler=none required environment like Proxy-Wasm. In fact, there was a solution (workaround) for this called nottinygc but not maintained anymore: af31621

@mathetake
Copy link
Member

anyway this is a dup of #349 and in any case, this is not the issue of Go SDK, but more about TinyGo itself.

@martijneken
Copy link
Author

Thanks @mathetake for the quick replies! Out of curiosity how do you view the current/future utility of proxy-wasm-go-sdk in this context? Is it for low-traffic plugins? Is it waiting on full Golang wasm support?

I'm wondering if there are other, less complex use cases where we should still recommend proxy-wasm-go-sdk.

@mathetake
Copy link
Member

IMHO, I wouldn't recommend Proxy-Wasm to anyone regardless of languages, too fragile with little to zero benefit if you don't need sandbox. That's my personal take, not our company's though.

@anuraaga
Copy link
Contributor

anuraaga commented Jul 27, 2024

I happened to find this and thought I'd add some background. Maybe it's a useful brain dump if any motivated individual appears.

As you've found, the TinyGo GC implementation is not very performant, which the project honestly describes. nottinygc linked in bdwgc into TinyGo's garbage collector interface. It actually seemed to work quite well, before completely unusable performance became usable, in coraza-proxy-wasm we saw something like 300ms per request shrink to 30ms, with several hundred ms GC pauses vs 5-10s pauses. However, we found that some workloads were ok but many would run with unbounded memory usage. This isn't surprising - a 32-bit, non-randomized address space can cause conservative GC to perform poorly because pointers and normal math values overlap a lot. So I tried using bdwgc precise GC, which again brought reasonable performance to some failing workloads. But there were still many reports of memory leaks, and one thing I noticed is that the TinyGo compiler only populates information for precise GC in some cases, not all, which looked worrisome due to the fundamental issues with conservative GC. At that point, I decided the unpredictable stability is a cure worse than the disease and nottinygc is not viable. Though to go to the point of if there are any use cases where proxy-wasm-go-sdk could be recommended, maybe with nottinygc if it weren't archived, I would say they do seem to exist but it's not really possible to predict what will work until trying it in production, which IMO makes it not possible to recommend at all.

I think someone could bring bdwgc directly into TinyGo with a proper native integration and there has been interest from maintainers. It would take quite some effort from a dedicated person though (I moved on from compiling Go to Wasm).- one important point would be reworking TinyGo's LLVM to generate descriptors for precise GC in bdwgc's format (I had to fragily map them in nottinygc) and to do it for all allocations. If this were done, I think runtime performance for individual TinyGo wasm programs can be reasonable.

Unfortunately, that leaves another problem, where for example one Envoy plugin would have reasonable memory behavior, but if you had multiple plugins written in Go, they would all have independent GC heaps and be wasting a lot of memory. This can be solved for GC languages using the wasm-gc proposal, but as it doesn't support inner pointers, it cannot be used with Go - the proposal was designed mostly to get kotlin working in wasm on the browser. The proposal can be adjusted, but presumably from starting the conversation to actually having compiler and runtime support for a wasm-gc driven Go, it would take ~2 years which is a long time to invest.

So particularly because of the second point, I personally don't believe it's worth investing in compiling Go to Wasm for any real solutions and think currently Rust and C++ are the most suited - I suspect kotlin will support WASI at some point and will become viable as well. Others may have the resources to push above changes through though and could make Go to Wasm also viable.

PS: If you do still want to try Regexp's with TinyGo, you probably want to use https://github.com/wasilibs/go-re2.

@mathetake
Copy link
Member

mathetake commented Jul 27, 2024

Thanks Rag for the good explanation and 100% matches my stance. I think it’s worth reopening this and pin it, and meanwhile I am going to update README that we are no longer recommending this entire Proxy-Wasm project for any use case unless you need to execute untrusted binaries which almost no one cares if they are not doing Envoy based cloud service where it needs to take client binary like you @martijneken

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants