add benchmark tooling for rama-cli and profile rama #340
Current request throughput is pretty sloppy: some benchmarks that came in measure at <400 req/sec. That's embarrassingly slow.
For now we can just benchmark:

…

as these scenarios are probably going to be the best ones supported for v0.2.
I forgot that in the current http-backend we require a Mutex for the http client, e.g.:

```rust
#[derive(Debug)]
// TODO: once we have hyper as `rama_core` we can drop this mutex,
// as there is no inherent reason for `sender` to be mutable...
pub(super) enum SendRequest<Body> {
    Http1(Mutex<hyper::client::conn::http1::SendRequest<Body>>),
    Http2(Mutex<hyper::client::conn::http2::SendRequest<Body>>),
}
```

I bet this already explains a good part of why it is so slow. For sure there is other stuff that can be improved; I haven't profiled yet. But let's first get the fork+embed work of hyper going and done, so that we can start from a benchmark without this mutex still in place, as it will then no longer be required.
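To make the cost concrete, here is a minimal, self-contained sketch of that pattern (all names are hypothetical stand-ins, not rama's or hyper's actual types): a sender whose send method takes `&mut self`, as hyper's `SendRequest::send_request` does, shared behind a `&self` client API via a `Mutex`, so every request on a connection serializes on that lock:

```rust
use std::sync::Mutex;

// Hypothetical stand-in for hyper's h1/h2 `SendRequest`, whose
// `send_request` method requires `&mut self`.
struct Sender;

impl Sender {
    fn send_request(&mut self, _req: &str) {
        // pretend this drives the underlying connection
    }
}

// Exposing such a sender behind a shared `&self` client API forces
// interior mutability, so every request contends on the lock:
struct Client {
    sender: Mutex<Sender>,
}

impl Client {
    fn request(&self, req: &str) {
        // even multiplexed h2 traffic is serialized through this one Mutex
        self.sender.lock().unwrap().send_request(req);
    }
}

fn main() {
    let client = Client { sender: Mutex::new(Sender) };
    client.request("GET /");
}
```

Under concurrent load, h2 multiplexing in particular gains nothing here, since all in-flight requests funnel through the single lock.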
To test the theory I ran against a rama-based http server. And yeah, it works a lot faster... still not as fast as I would hope, but this is better. We can circle back to this issue after the hyper migration has happened.
Started doing some profiling. It seems to have less to do with the Mutex than expected (we no longer use one for h2, only for h1). Connection pooling is going to have to be done in 0.3 for sure, and done decently. After that is done we can also see what we can improve around the TLS usage.
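For illustration, a minimal sketch of the kind of connection pooling this implies (all names hypothetical, not rama's planned API): idle connections are checked out per request and checked back in afterwards, so the TCP+TLS handshake is not paid on every request:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical pool of idle connections of type `C`.
struct Pool<C> {
    idle: Mutex<VecDeque<C>>,
}

impl<C> Pool<C> {
    fn new() -> Self {
        Self { idle: Mutex::new(VecDeque::new()) }
    }

    // Reuse an idle connection if one exists, otherwise dial a new one.
    fn checkout(&self, dial: impl FnOnce() -> C) -> C {
        self.idle.lock().unwrap().pop_front().unwrap_or_else(dial)
    }

    // Return a still-healthy connection for the next request to reuse.
    fn checkin(&self, conn: C) {
        self.idle.lock().unwrap().push_back(conn);
    }
}

fn main() {
    let pool: Pool<u32> = Pool::new();
    let conn = pool.checkout(|| 1); // pool empty: "dials" a new connection
    pool.checkin(conn);
    let reused = pool.checkout(|| 2); // reuses the idle connection instead
    assert_eq!(reused, 1);
}
```

A real pool would also need per-host keying, idle timeouts, and liveness checks before reuse; this only shows the checkout/checkin shape.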
No special tooling required; our bench setup using … covers this. What can still be added are some benchmarks of a full rama stack (e.g. https traffic from a client, over a proxy, to a server). These benchmarks give a nice overview of the allocations as well as the performance. Once such full-picture benchmarks are added I think we can close this.
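As a sketch of what such a full-picture benchmark could look like (assuming a divan-based harness, since the tool name is elided above; the bench name and `roundtrip_over_proxy` are made-up placeholders):

```rust
// Hypothetical benches/full_stack.rs; assumes `divan` as a dev-dependency.

// divan's AllocProfiler reports allocation counts alongside timings,
// giving the "allocations as well as performance" overview.
#[global_allocator]
static ALLOC: divan::AllocProfiler = divan::AllocProfiler::system();

fn main() {
    divan::main();
}

#[divan::bench]
fn https_client_over_proxy_to_server() {
    // Placeholder: a real bench would start an in-process server and
    // proxy once, then drive one https request per iteration.
    roundtrip_over_proxy();
}

// Stub so the sketch compiles; the real body would do actual I/O.
fn roundtrip_over_proxy() {}
```

With divan this file would live under `benches/` and be registered in Cargo.toml with `harness = false`, after which `cargo bench` runs it.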