add benchmark tooling for rama-cli and profile rama #340

Open
GlenDC opened this issue Oct 21, 2024 · 5 comments

Comments

@GlenDC
Member

GlenDC commented Oct 21, 2024

  • document this approach and references in the rama book
  • mention the current results
  • make follow-up tasks for areas where we need to improve prior to releasing 0.2

Current request throughput is pretty poor...
Some of the benchmarks that came in measure below 400 req/sec... That's embarrassingly slow.

@GlenDC GlenDC added documentation Improvements or additions to documentation enhancement New feature or request report labels Oct 21, 2024
@GlenDC GlenDC added this to the v0.2 milestone Oct 21, 2024
@GlenDC GlenDC self-assigned this Oct 21, 2024
@GlenDC
Member Author

GlenDC commented Oct 21, 2024

For now we can just benchmark:

  • http/1.1+h2 MITM Proxy (with CA cert)
  • http/1.1+h2 Echo Server

as these scenarios are probably going to be the best ones supported for v0.2.

@GlenDC
Member Author

GlenDC commented Oct 22, 2024

I forgot that the current http backend requires a Mutex around the http client, e.g.:

#[derive(Debug)]
// TODO: once we have hyper as `rama_core` we can
// drop this mutex as there is no inherent reason for `sender` to be mutable...
pub(super) enum SendRequest<Body> {
    Http1(Mutex<hyper::client::conn::http1::SendRequest<Body>>),
    Http2(Mutex<hyper::client::conn::http2::SendRequest<Body>>),
}

I bet this already explains a big part of why it is so slow. There is surely other stuff that can be improved as well, but I haven't profiled yet. Let's first get the fork+embed work of hyper done, so that we can start from a benchmark without this mutex in place, as it will then no longer be required.
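To make the concern concrete, here is a minimal std-only sketch contrasting a single handle shared behind a Mutex with a handle that can simply be cloned per caller. It is only an analogy: `std::sync::mpsc::Sender` stands in for the request sender, and the function names `send_via_mutex` and `send_via_clone` are made up for illustration. It does not use hyper or rama types, and it makes no claim about how much time the real Mutex costs.

```rust
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Variant A: all workers share ONE sender behind a Mutex,
/// so every send first has to acquire the lock.
fn send_via_mutex(workers: usize, per_worker: usize) -> usize {
    let (tx, rx) = channel::<usize>();
    let shared = Arc::new(Mutex::new(tx));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The guard is held for the duration of this statement.
                    shared.lock().unwrap().send(id).unwrap();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Dropping the last Arc drops the sender, which ends `rx.iter()`.
    drop(shared);
    rx.iter().count()
}

/// Variant B: each worker gets its own clone of the sender,
/// so no lock sits on the send path.
fn send_via_clone(workers: usize, per_worker: usize) -> usize {
    let (tx, rx) = channel::<usize>();
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                for _ in 0..per_worker {
                    tx.send(id).unwrap();
                }
            })
        })
        .collect();
    // Drop the original so only the workers' clones keep the channel open.
    drop(tx);
    for h in handles {
        h.join().unwrap();
    }
    rx.iter().count()
}

fn main() {
    println!("via mutex: {} messages", send_via_mutex(4, 1_000));
    println!("via clone: {} messages", send_via_clone(4, 1_000));
}
```

Both variants deliver the same messages; the difference is only whether callers queue up on a lock before sending. Whether that matters in practice is exactly what profiling has to show.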

@GlenDC
Member Author

GlenDC commented Oct 22, 2024

To test the theory I ran against a rama-based http server. And yeah, it works a lot faster... Still not as fast as I would hope, but this is better. We can circle back to this issue after the hyper migration has happened.

@GlenDC
Member Author

GlenDC commented Nov 8, 2024

Started doing some profiling. It seems to have less to do with the Mutex (which we no longer use for h2, only for h1).
It seems that a lot of time is spent because we use a connection for just 1 request. This is costly, as it means redoing the entire TLS setup every time...

Connection pooling is gonna have to be done in 0.3 for sure, and decently so. After that is done we can also see what we can improve around the TLS usage.
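As a rough illustration of why reuse matters, the sketch below counts how many connection setups happen with and without a pool. Everything in it is hypothetical: `Conn`, `Pool`, `checkout`, `checkin` and `run` are invented names, the "handshake" is just a counter standing in for the TLS setup, and this is not how pooling is or will be implemented in rama.

```rust
use std::collections::HashMap;

/// Hypothetical connection; creating one stands in for a full TLS setup.
struct Conn {
    host: String,
}

/// Hypothetical pool that keeps idle connections per host.
struct Pool {
    idle: HashMap<String, Vec<Conn>>,
    handshakes: usize,
}

impl Pool {
    fn new() -> Self {
        Pool { idle: HashMap::new(), handshakes: 0 }
    }

    /// Reuse an idle connection if there is one, otherwise "handshake" anew.
    fn checkout(&mut self, host: &str) -> Conn {
        if let Some(conn) = self.idle.get_mut(host).and_then(|conns| conns.pop()) {
            return conn;
        }
        self.handshakes += 1;
        Conn { host: host.to_owned() }
    }

    /// Return a connection so that a later request can reuse it.
    fn checkin(&mut self, conn: Conn) {
        self.idle.entry(conn.host.clone()).or_default().push(conn);
    }
}

/// Issue `requests` sequential requests and report how many handshakes it took.
fn run(requests: usize, reuse: bool) -> usize {
    let mut pool = Pool::new();
    for _ in 0..requests {
        let conn = pool.checkout("example.com");
        // ... a request would be sent over `conn` here ...
        if reuse {
            pool.checkin(conn);
        } else {
            drop(conn); // one connection per request, as described above
        }
    }
    pool.handshakes
}

fn main() {
    println!("handshakes without reuse: {}", run(100, false));
    println!("handshakes with reuse:    {}", run(100, true));
}
```

With sequential requests to a single host, the pooled run pays for the setup once, while the unpooled run pays for it on every request. A real pool also has to deal with concurrency, idle timeouts and broken connections, none of which is modelled here.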

@GlenDC
Member Author

GlenDC commented Dec 28, 2024

No special tooling is required; our bench setup using divan seems fine. We do not export it yet, however. Might need to do that somehow.

What can still be added are some benchmarks of a full rama stack (e.g. https traffic from a client over a proxy to a server). These benchmarks give a nice overview of the allocations as well as the performance. Once such full-picture benchmarks are added I think we can close this issue.
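For readers unfamiliar with how allocation numbers can be gathered at all, here is a minimal std-only sketch of a counting global allocator. It does not use divan and is not the project's bench setup; `CountingAlloc`, `ALLOCATIONS` and `allocations_during` are hypothetical names used only for this illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper that counts calls to `alloc`.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the actual work to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and report how many allocations were counted while it ran.
/// The count is process-wide, so other threads can inflate it.
fn allocations_during<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (out, after - before)
}

fn main() {
    let (body, count) = allocations_during(|| {
        // Stand-in for the work done while handling one request.
        std::hint::black_box(vec![1u8; 1024])
    });
    println!("built {} bytes using {} allocation(s)", body.len(), count);
}
```

Wrapping a full client-proxy-server round trip in something like `allocations_during` would give a coarse per-request allocation figure, which is the kind of overview the full-stack benchmarks are meant to provide.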
