
Add support for dynamic gRPC tables #1530

Merged: 4 commits, Jan 15, 2025

@grcevski (Contributor) commented Jan 15, 2025

Our current kprobes-based gRPC support lacked support for HPACK dynamic tables. I had postponed implementing it, but without it we can't find the path of a request when a connection is reused.

Essentially, the gRPC implementation internally uses one HPACK Decoder/Encoder instance per connection, which is also what we do in http2grpc_transform.go. The first time a key-value pair of the gRPC headers is sent, the client creates an internal index to represent that pair, and from then on it sends only the index whenever the pair is repeated. The server likewise creates an index on the decoder side the first time it receives the pair. Since header blocks are processed one at a time, even when there are multiple streams, the two tables always stay in sync.

This is a problem for our decoding of the :path field. The :path is paired with the RPC method being called, e.g. [:path]:[/someGRPCMethod]. Once this pair is sent, the combination is stored in the dynamic table under an index, for example 234. The next time a request to /someGRPCMethod is made, the client sends only 234, and the server looks up its own entry 234 to decipher that it means [:path]:[/someGRPCMethod].

Until now, we simply reported the path as * (asterisk) whenever we couldn't find the index.
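To make the mechanism above concrete, here is a minimal, self-contained Go sketch of an HPACK-style dynamic table. It is not Beyla's code and does not use the hpack package; dynamicTable, resolvePath and the other names are hypothetical. It follows RFC 7541 indexing, where the 61 static-table entries come first and the newest dynamic entry gets index 62 (older entries shift up by one), so the 234 above is just an illustrative number. Size-based eviction is left out. The sketch shows a client and server table staying in sync, while an observer that missed the first request can only fall back to *.

```go
package main

import "fmt"

// headerField is a single name/value pair, e.g. {":path", "/someGRPCMethod"}.
type headerField struct {
	name, value string
}

// staticTableLen is the number of entries in the HPACK static table
// (RFC 7541, Appendix A). Dynamic table indices start right after it.
const staticTableLen = 61

// dynamicTable is a simplified HPACK dynamic table: the newest entry gets
// index 62 and older entries shift up by one. Size-based eviction is omitted.
type dynamicTable struct {
	entries []headerField // entries[0] is the newest
}

// add inserts a field as the newest entry, which both sides do when a
// "literal header field with incremental indexing" is exchanged.
func (t *dynamicTable) add(f headerField) {
	t.entries = append([]headerField{f}, t.entries...)
}

// indexOf returns the HPACK index of f, or 0 if f is not in the table.
func (t *dynamicTable) indexOf(f headerField) uint64 {
	for i, e := range t.entries {
		if e == f {
			return uint64(staticTableLen + 1 + i)
		}
	}
	return 0
}

// lookup resolves a dynamic-table index back to a field.
func (t *dynamicTable) lookup(idx uint64) (headerField, bool) {
	if idx <= staticTableLen {
		return headerField{}, false // static table not modelled in this sketch
	}
	i := idx - staticTableLen - 1
	if i >= uint64(len(t.entries)) {
		return headerField{}, false
	}
	return t.entries[i], true
}

// resolvePath mimics the old behaviour: an unresolvable index becomes "*".
func resolvePath(t *dynamicTable, idx uint64) string {
	if f, ok := t.lookup(idx); ok && f.name == ":path" {
		return f.value
	}
	return "*"
}

func main() {
	path := headerField{":path", "/someGRPCMethod"}
	var client, server, lateObserver dynamicTable

	// First request: the pair is sent as a literal and both sides index it.
	client.add(path)
	server.add(path)

	// Second request on the same connection: only the index is sent.
	idx := client.indexOf(path)
	fmt.Println(idx)                             // 62
	fmt.Println(resolvePath(&server, idx))       // /someGRPCMethod
	fmt.Println(resolvePath(&lateObserver, idx)) // *
}
```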

With this change, we start tracking the dynamic table indices. We already used per-connection decoders, so it was a matter of detecting when we can start tracking the indices and storing them in the LRU table in our userspace code.
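A per-connection decoder cache could look roughly like the following sketch. It assumes the LRU is keyed by connection and holds one decoder per connection; connKey, connDecoder and decoderLRU are hypothetical, and the sketch uses the standard library's container/list rather than whatever LRU implementation Beyla actually uses.

```go
package main

import (
	"container/list"
	"fmt"
)

// connKey is a hypothetical connection identifier (e.g. the TCP 4-tuple).
type connKey struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// connDecoder is a placeholder for per-connection HPACK decoder state.
type connDecoder struct {
	dynamic []string // stand-in for the connection's dynamic table
}

type lruEntry struct {
	key connKey
	dec *connDecoder
}

// decoderLRU keeps at most capacity decoders, evicting the least recently used.
type decoderLRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[connKey]*list.Element
}

func newDecoderLRU(capacity int) *decoderLRU {
	return &decoderLRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[connKey]*list.Element),
	}
}

// get returns the decoder for k, creating one if it is not cached yet.
func (c *decoderLRU) get(k connKey) *connDecoder {
	if el, ok := c.items[k]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*lruEntry).dec
	}
	dec := &connDecoder{}
	c.items[k] = c.order.PushFront(&lruEntry{key: k, dec: dec})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
	return dec
}

func main() {
	cache := newDecoderLRU(2)
	a := connKey{"10.0.0.1", "10.0.0.2", 50000, 443}
	b := connKey{"10.0.0.1", "10.0.0.2", 50001, 443}
	c := connKey{"10.0.0.1", "10.0.0.2", 50002, 443}

	decA := cache.get(a)
	fmt.Println(cache.get(a) == decA) // true: same connection, same decoder
	cache.get(b)
	cache.get(c) // evicts a, the least recently used
	_, stillThere := cache.items[a]
	fmt.Println(stillThere) // false
}
```

Bounding the cache keeps memory in check when there are many short-lived connections; the trade-off in this sketch is that an evicted connection's dynamic table state is simply lost.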

Essentially the logic boils down to this:

  1. If we see the HTTP/2 connection preface PRI * HTTP/2.0..., we know that a new connection has been established. We can therefore track the indices in the same way the client and the server do. We only need one HPACK decoder per connection, since both the client and the server store the same information.
  2. If we detect a gRPC connection without seeing its start (this is done through our TCP packet detection), we don't use a dynamic table. This is typical for connections that were established before Beyla started instrumenting. When we pick up a connection in the middle, we can't tell how many indices have already been stored, even if we see a path we want to store; we simply don't know where to count from.
  3. Very rarely, Beyla may miss a uprobe and, with it, an index bump. In this case we detect that we've seen an index we can't decipher and mark the decoder as broken. From then on we don't allow new dynamic table indices to be added, and we only return deciphered data for indices that were verified before.
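The three cases can be sketched as a small state machine on a per-connection decoder. This is an illustration under assumptions, not Beyla's implementation: trackingDecoder, addIndexed and lookup are hypothetical names, the static table is not modelled, and the preface check assumes the connection's first bytes arrive in a single buffer. In line with the review discussion below, a broken decoder only stops adding new entries; lookups of entries it already recorded still succeed.

```go
package main

import (
	"bytes"
	"fmt"
)

// http2Preface is the HTTP/2 client connection preface (RFC 7540, Section 3.5).
var http2Preface = []byte("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")

type headerField struct{ name, value string }

const staticTableLen = 61

// trackingDecoder is a hypothetical per-connection decoder for the three cases.
type trackingDecoder struct {
	dynamicEnabled bool          // case 1: we saw the connection start
	broken         bool          // case 3: we hit an index we couldn't resolve
	entries        []headerField // newest first, HPACK-style
}

// newTrackingDecoder enables the dynamic table only if the first bytes seen on
// the connection are the preface (case 1). Otherwise we joined mid-connection
// (case 2) and can't know how many entries already exist.
func newTrackingDecoder(firstBytes []byte) *trackingDecoder {
	return &trackingDecoder{dynamicEnabled: bytes.HasPrefix(firstBytes, http2Preface)}
}

// addIndexed records a newly indexed field, unless the table is disabled or
// known to be out of sync.
func (d *trackingDecoder) addIndexed(f headerField) {
	if !d.dynamicEnabled || d.broken {
		return
	}
	d.entries = append([]headerField{f}, d.entries...)
}

// lookup resolves a dynamic-table index. An unresolvable index on an enabled
// table means we missed an update, so the decoder is marked broken.
func (d *trackingDecoder) lookup(idx uint64) (headerField, bool) {
	if idx <= staticTableLen {
		return headerField{}, false // static table not modelled here
	}
	if i := idx - staticTableLen - 1; i < uint64(len(d.entries)) {
		return d.entries[i], true
	}
	if d.dynamicEnabled {
		d.broken = true // we must have missed an index bump
	}
	return headerField{}, false
}

func main() {
	path := headerField{":path", "/someGRPCMethod"}

	fresh := newTrackingDecoder(http2Preface) // case 1
	fresh.addIndexed(path)
	f, ok := fresh.lookup(62)
	fmt.Println(f.value, ok) // /someGRPCMethod true

	late := newTrackingDecoder([]byte{0x00, 0x00, 0x12}) // case 2: joined mid-stream
	late.addIndexed(path)
	_, ok = late.lookup(62)
	fmt.Println(ok) // false

	fresh.lookup(63) // case 3: unknown index, decoder becomes broken
	fresh.addIndexed(headerField{":path", "/other"})
	fmt.Println(fresh.broken, len(fresh.entries)) // true 1
}
```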

TODO:

  • Unit tests for case 3
  • Integration tests

@grcevski requested a review from a team as a code owner January 15, 2025 00:23

codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.39%. Comparing base (da5252f) to head (dcc641b).
Report is 3 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (da5252f) and HEAD (dcc641b).

HEAD has 1 upload less than BASE

| Flag | BASE (da5252f) | HEAD (dcc641b) |
|---|---|---|
| unittests | 1 | 0 |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1530      +/-   ##
==========================================
- Coverage   72.19%   64.39%   -7.80%     
==========================================
  Files         194      194              
  Lines       19239    19412     +173     
==========================================
- Hits        13889    12501    -1388     
- Misses       4679     6116    +1437     
- Partials      671      795     +124     
```

| Flag | Coverage Δ |
|---|---|
| integration-test | 53.69% <100.00%> (-0.46%) ⬇️ |
| k8s-integration-test | 54.66% <13.33%> (-0.40%) ⬇️ |
| oats-test | 32.72% <66.66%> (+1.47%) ⬆️ |
| unittests | ? |

Flags with carried forward coverage won't be shown.

```diff
-if !ok {
+// If we've failed once to find an index, don't allow us to find
+// a value for index that's greater than the last successful one
+if !ok || (d.failedToIndex && idx > d.lastGoodIndex) {
```
Contributor

Did you mean !d.failedToIndex here?

Contributor Author

It should be d.failedToIndex, but I caught a bug with that. I have a new version that changes how this is done: now all I block is adding new entries to the dynamic table. I'll push an update soon, once I have integration tests.

@grcevski merged commit 5a281d0 into grafana:main on Jan 15, 2025
15 checks passed
@grcevski deleted the better_grpc_kprobes branch on January 15, 2025 23:30

3 participants