Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tcp long connections metrics #1249

Open
wants to merge 54 commits into
base: main
Choose a base branch
from

Conversation

yp969803
Copy link
Contributor

@yp969803 yp969803 commented Feb 24, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:
The pr introduces new feature of tcp_long_conn metrics
Which issue(s) this PR fixes:

Fixes #1211

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
Yes

Tcp long connections metrics

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nlgwcy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yp969803
Copy link
Contributor Author

yp969803 commented Feb 24, 2025

@nlgwcy @LemmyHuang @xiangxinyong can you review the pr, have i writen correct ebpf code, i have done the changes which has been told in previous comments in the proposal.

Copy link
Collaborator

@LiZhenCheng9527 LiZhenCheng9527 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please present the results of the long connection metrics
  2. The original destination address problem is still unhandle. If you plan to optimize later, you can create a sub-issue under the existing lfx issue to track the task.

@@ -55,6 +55,7 @@ type MetricController struct {
EnableAccesslog atomic.Bool
EnableMonitoring atomic.Bool
EnableWorkloadMetric atomic.Bool
EnableLongTCPMetric atomic.Bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to distinguish between metrics for long connections and short connections.
As long as metrics is turned on, we should handle all types of connections.

#define kmesh_perf_map km_perf_map
#define kmesh_perf_info km_perf_info
#define map_of_long_tcp_conns km_longtcpconns_map
#define long_tcp_conns_events km_longtcpconns_events
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BPF_OBJ_NAME_LEN = 16
So I think, this name of map is too long.

Signed-off-by: Yash Patel <[email protected]>
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from b79a168 to a698cac Compare February 27, 2025 08:53
Signed-off-by: Yash Patel <[email protected]>

rfac: bpf2go.go

Signed-off-by: Yash Patel <[email protected]>
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from a3aa24e to 37fa818 Compare March 19, 2025 03:45
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from 99d0e84 to dfea824 Compare March 20, 2025 06:43
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from dfea824 to 7fc5113 Compare March 20, 2025 06:53
Signed-off-by: Yash Patel <[email protected]>

rfac: rmoved workload/tc.go

Signed-off-by: Yash Patel <[email protected]>

chore: generated bpf2go

Signed-off-by: Yash Patel <[email protected]>

rfac: tc generated files

Signed-off-by: Yash Patel <[email protected]>
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from 76a940e to e4fb65d Compare March 20, 2025 16:37
Signed-off-by: Yash Patel <[email protected]>

rfac: removed is_managed_by_kmesh_skb func

Signed-off-by: Yash Patel <[email protected]>
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch 5 times, most recently from c274149 to 3df32ec Compare March 21, 2025 06:39
Signed-off-by: Yash Patel <[email protected]>

rfac: sendmsg.c

Signed-off-by: Yash Patel <[email protected]>

chore: generated bpf2go

Signed-off-by: Yash Patel <[email protected]>

rfac: added is_managed_by_kmesh to sendmsg.c

Signed-off-by: Yash Patel <[email protected]>
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from 3df32ec to e26073e Compare March 21, 2025 06:44
@yp969803
Copy link
Contributor Author

    bpf_map.go:60: bpf init failed: bpf Load failed: load program: permission denied:
        	; int sockops_active_prog(struct bpf_sock_ops *skops)
        	0: (bf) r6 = r1
        	; __u64 sock_cookie = bpf_get_socket_cookie(skops);
        	1: (85) call bpf_get_socket_cookie#46
        	2: (bf) r7 = r0
        	; if (skops->family != AF_INET && skops->family != AF_INET6)
        	3: (61) r1 = *(u32 *)(r6 +20)
        	; if (skops->family != AF_INET && skops->family != AF_INET6)
        	4: (bf) r2 = r1
        	5: (47) r2 |= 8
        	6: (15) if r2 == 0xa goto pc+1

@LiZhenCheng9527 @nlgwcy can you look at the above error during bpf_compaitliblity test, why is this happening, i have also shorten the program length so that number of instruction sets not exceeds

@yp969803
Copy link
Contributor Author

yp969803 commented Mar 22, 2025

so the problem arrises in the line of code, with the error

	303: (18) r1 = 0xffff8ff1b4577000     ; R1_w=map_ptr(map=km_orig_dst,ks=8,vs=36)
        	305: (85) call bpf_map_lookup_elem#1
        	R2 type=sock_or_null expected=fp, pkt, pkt_meta, map_key, map_value, mem, ringbuf_mem, buf, trusted_ptr_
        	processed 390 insns (limit 1000000) max_states_per_insn 1 total_states 33 peak_states 33 mark_read 5

when i try to comment these lines and run unit tests, verifier accept it. these lines are written previously in the sockops_prog

     __u64 *current_sk = (__u64 *)skops->sk;
        struct bpf_sock_tuple *dst = bpf_map_lookup_elem(&map_of_orig_dst, current_sk);

in the program

SEC("sockops_active")
int sockops_active_prog(struct bpf_sock_ops *skops)
{
    __u64 sock_cookie = bpf_get_socket_cookie(skops);

    if (skops->family != AF_INET && skops->family != AF_INET6)
        return 0;

    switch (skops->op) {
    case BPF_SOCK_OPS_TCP_CONNECT_CB:
        skops_handle_kmesh_managed_process(skops);
        break;

    case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        if (!is_managed_by_kmesh(skops))
            break;
        observe_on_connect_established(skops->sk, sock_cookie, OUTBOUND);
        if (bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG) != 0
            || bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RETRANS_CB_FLAG) != 0
            || bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RTT_CB_FLAG) != 0) {
            BPF_LOG(ERR, SOCKOPS, "set sockops cb failed!\n");
        }
        __u64 *current_sk = (__u64 *)skops->sk;
        struct bpf_sock_tuple *dst = bpf_map_lookup_elem(&map_of_orig_dst, current_sk);
        if (dst != NULL)
            enable_encoding_metadata(skops);
        break;

    default:
        break;
    }
    return 0;
}

@nlgwcy @LiZhenCheng9527

@yp969803 yp969803 marked this pull request as ready for review March 24, 2025 08:51
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from d240f27 to 9a01a5a Compare March 24, 2025 09:00
@yp969803 yp969803 force-pushed the tcp_long_connection_code branch from 9a01a5a to 1ebbcb3 Compare March 24, 2025 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[lfx-mentorship-2025-Mar-May] Metrics for TCP Long Connection
4 participants