
SNI filter sni.lua can't filter chrome browser with reassembled TCP segments? #193

Open · vincentmli opened this issue Sep 19, 2024 · 56 comments

@vincentmli (Contributor)

When using the Chrome browser, the SSL ClientHello is sent in two TCP segments and then reassembled; it appears sni.lua can't handle the SNI in a TCP-reassembled ClientHello?

[Screenshot 2024-09-19 at 7:38:38 AM]

@vincentmli (Contributor, Author)

The first TCP segment contains the handshake and ClientHello up to the key_share extension, 1263 bytes in total; the server_name extension is in the second TCP segment. Removing the handshake and ClientHello checks in sni.lua doesn't seem to be enough to get server_name; the offset could still be incorrect.
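(For context, a minimal sketch of the offsets an SNI parser has to walk before it reaches server_name, assuming a contiguous buffer; find_sni_offset is a hypothetical helper illustrating the TLS wire format, not code from sni.lua. With only the first segment available, the walk runs out of data while skipping earlier extensions such as key_share, so server_name is never reached.)

/* Hedged sketch: walk a TLS ClientHello the way an SNI filter must,
 * bailing out when the buffer ends early (as with Chrome's first segment).
 * Wire-format offsets only; this is not code from sni.lua. */
#include <stddef.h>
#include <stdint.h>

static int find_sni_offset(const uint8_t *p, size_t len)
{
        size_t off = 5 + 4;                      /* TLS record header + handshake header */
        off += 2 + 32;                           /* client_version + random */
        if (off + 1 > len) return -1;
        off += 1 + p[off];                       /* session_id */
        if (off + 2 > len) return -1;
        off += 2 + ((p[off] << 8) | p[off + 1]); /* cipher_suites */
        if (off + 1 > len) return -1;
        off += 1 + p[off];                       /* compression_methods */
        if (off + 2 > len) return -1;
        off += 2;                                /* extensions total length */
        while (off + 4 <= len) {                 /* each extension: type(2) + length(2) */
                uint16_t type = (p[off] << 8) | p[off + 1];
                uint16_t elen = (p[off + 2] << 8) | p[off + 3];
                if (type == 0)                   /* 0 == server_name */
                        return (int)(off + 4);
                off += 4 + elen;                 /* skip, e.g., key_share */
        }
        return -1;                               /* ran out of data before server_name */
}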

@lneto (Contributor) commented Sep 20, 2024

wrapping up our discussion on Matrix.. it might be resolved on kernel 6.9 by this:

https://elixir.bootlin.com/linux/v6.9/source/net/core/dev.c#L4914

see also: https://lore.kernel.org/bpf/[email protected]/

@vincentmli (Contributor, Author)

@lneto I tried the upstream stable kernel 6.10.11 release; same problem.

[root@bpfire-3 ~]# uname -a
Linux bpfire-3.localdomain 6.10.11-ipfire #1 SMP PREEMPT_DYNAMIC Fri Sep 20 15:26:24 GMT 2024 x86_64 Intel(R) Celeron(R) CPU G1820 @ 2.70GHz GenuineIntel GNU/Linux

@lneto (Contributor) commented Sep 20, 2024

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

@vincentmli (Contributor, Author)

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

@lneto (Contributor) commented Sep 20, 2024

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

the issue is probably with my understanding of what should actually happen ;-).. I was expecting the following flow..

do_xdp_generic
netif_receive_generic_xdp
netif_skb_check_for_xdp
skb_linearize

however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)

@vincentmli (Contributor, Author)

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

the issue is probably with my understanding of what should actual happen ;-).. I was expecting the following flow..

do_xdp_generic netif_receive_generic_xdp netif_skb_check_for_xdp skb_linearize

however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)

I am learning and would like to understand it better too :). If skb_linearize() is triggered as you expect, then when XDP passes the packet to Lua, will it be one whole ClientHello packet that Lua can parse correctly, rather than two individual segments as before?

@lneto (Contributor) commented Sep 20, 2024

perhaps this is the guy we want to use.. https://ebpf-docs.dylanreimerink.nl/linux/helper-function/bpf_xdp_get_buff_len/

also, page 15 from https://lpc.events/event/11/contributions/939/attachments/771/1551/xdp-multi-buff.pdf

I still need to digest.. but I think it should be the path to follow..
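(For reference, a minimal sketch of that helper in use, assuming a kernel and libbpf recent enough to expose bpf_xdp_get_buff_len(); the program and section names are illustrative, not from this repo.)

// Hedged sketch: compare the linear region with the total buffer length
// reported by bpf_xdp_get_buff_len(); they differ only when the frame
// carries XDP fragments (multi-buff).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int probe_buff_len(struct xdp_md *ctx)
{
        __u64 total = bpf_xdp_get_buff_len(ctx);      /* linear + fragments */
        __u64 linear = ctx->data_end - ctx->data;     /* linear part only */

        if (total > linear)
                bpf_printk("multi-buff frame: total=%llu linear=%llu", total, linear);
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";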

@lneto (Contributor) commented Sep 20, 2024

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

the issue is probably with my understanding of what should actual happen ;-).. I was expecting the following flow..
do_xdp_generic netif_receive_generic_xdp netif_skb_check_for_xdp skb_linearize
however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)

I am learning and I like to understand it better too :). if the skb_linearize is triggered as you thought, when XDP pass the packet to Lua, it will be one whole clienthello packet so Lua can parse it correctly? not like two previously individual packet segments ?

precisely.. we have had the same issue with luaxtable.. you might want to take a look..

#107 (comment)
a147522

@lneto (Contributor) commented Sep 20, 2024

perhaps this is the guy we want to use.. https://ebpf-docs.dylanreimerink.nl/linux/helper-function/bpf_xdp_get_buff_len/

also, page 15 from https://lpc.events/event/11/contributions/939/attachments/771/1551/xdp-multi-buff.pdf

I still need to digest.. but I think it should be the path to follow..

these helpers don't seem to be merged in the tree..

UPDATE: sorry, only adjust_data isn't.. we'll keep digging later..

@lneto (Contributor) commented Sep 20, 2024

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6

can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?

@lneto (Contributor) commented Sep 20, 2024

a silly question.. you are using -m skb on xdp-loader, right? e.g.,

sudo xdp-loader load -m skb <ifname> https.o

@vincentmli (Contributor, Author)

a silly question.. you are using -skb on xdp-loader, right? e.g.,

sudo xdp-loader load -m skb <ifname> https.o

yes

@vincentmli (Contributor, Author)

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6

can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?

I added bpf_xdp_adjust_tail(ctx, 0); same problem: curl is blocked, Chrome passes. The selftest seems to check the total length first, then adjusts the tail according to that length; what does the offset 0 you suggest mean here? Can we artificially pass an offset big enough to hold the Chrome ClientHello and try that?

@vincentmli (Contributor, Author)

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6
can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?

I added bpf_xdp_adjust_tail(ctx, 0);, same problem. curl blocked, chrome passed. the selftest seems to check the total length first, then adjust the tail according to the length, here you suggest offset 0 meaning ? can we artificially add an random offset number that is big enough to hold the chrome clienthello to try?

I added bpf_xdp_adjust_tail(ctx, 2048); no effect.
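(Side note on the helper's semantics: the second argument of bpf_xdp_adjust_tail() is a signed delta, not an absolute offset, so 0 is a no-op and a fixed +2048 only asks for more tail room. A minimal sketch of the selftest-style pattern follows; the target length is illustrative, and the pointer reload is needed because the helper invalidates data/data_end.)

// Hedged sketch: grow the packet tail by a computed delta, then reload
// the data pointers. Growing succeeds only if the driver left tail room;
// it does not by itself pull fragment data into the linear area.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp")
int adjust_tail_demo(struct xdp_md *ctx)
{
        long cur = (long)ctx->data_end - (long)ctx->data;
        int want = 1800;                    /* illustrative target length */
        int delta = want - (int)cur;        /* how the selftest derives it */

        if (delta > 0 && bpf_xdp_adjust_tail(ctx, delta) < 0)
                return XDP_PASS;            /* no tail room available */

        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        bpf_printk("len now %ld", (long)(data_end - data));
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";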

@vincentmli (Contributor, Author)

I think we could call bpf_xdp_get_buff_len(ctx) and print the length in https.o so we know how much data there is, right? I guess the end goal is for https.o to see a data length equal to the total packet length the Chrome browser sent. If we don't get that large data length, the problem is on the kernel side; if we do get it in https.o, the problem is on the LuaXDP side?

@lneto (Contributor) commented Sep 21, 2024

I think we could call bpf_xdp_get_buff_len(ctx) and print out the length in https.o so we know how many data length there is, right? I guess the end goal is we want to get the data length in https.o that is equal to the total packet length chrome browser sent. if we don't get that large data length, problem is in kernel side, if we get large data length in https.o, problem is in LuaXDP side?

yup, I would print it as well..

@lneto (Contributor) commented Sep 21, 2024

perhaps we need this flag.. can you try this out? https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#xdp-fragments

@lneto (Contributor) commented Sep 21, 2024

can you try s/SEC("xdp")/SEC("xdp.frags")/?

more info here.. https://lore.kernel.org/netdev/[email protected]/

perhaps we need to use bpf_xdp_load_bytes as well..
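(A minimal sketch of what that could look like: with SEC("xdp.frags"), payload that sits in fragments is not reachable through ctx->data/data_end, but bpf_xdp_load_bytes() can copy it out by offset. The fixed header offsets and 64-byte copy are illustrative only; this is not the https.c from the repo.)

// Hedged sketch: copy the start of the TCP payload out of the (possibly
// fragmented) buffer instead of dereferencing data/data_end directly.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int peek_payload(struct xdp_md *ctx)
{
        __u8 buf[64] = {};
        /* Illustrative offset: Ethernet (14) + IPv4 without options (20) +
         * TCP without options (20). A real program would parse the headers. */
        __u32 payload_off = 14 + 20 + 20;

        if (bpf_xdp_load_bytes(ctx, payload_off, buf, sizeof(buf)) < 0)
                return XDP_PASS;                /* packet shorter than the copy */

        if (buf[0] == 0x16 && buf[5] == 0x01)   /* TLS handshake, ClientHello */
                bpf_printk("ClientHello seen, total len %llu",
                           bpf_xdp_get_buff_len(ctx));
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";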

@lneto (Contributor) commented Sep 21, 2024

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Can you check your MTU config as well?

@vincentmli (Contributor, Author)

I added bpf_printk to print the length and switched to xdp.frags. I can see the printed lengths, but none of them exceeded the MTU (1460); all are small. I need to adjust https.c so I can clearly tell which printed length comes from Chrome; right now there is other noisy output that can confuse me.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

extern int bpf_luaxdp_run(char *key, size_t key__sz, struct xdp_md *xdp_ctx, void *arg, size_t arg__sz) __ksym;

static char runtime[] = "examples/filter/sni";

struct bpf_luaxdp_arg {
        __u16 offset;
} __attribute__((packed));

SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
        int data_len = bpf_xdp_get_buff_len(ctx);
        struct bpf_luaxdp_arg arg;
        //bpf_xdp_adjust_tail(ctx, 2048);
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct iphdr *ip = data + sizeof(struct ethhdr);

        if (ip + 1 > (struct iphdr *)data_end)
                goto pass;

        if (ip->protocol != IPPROTO_TCP)
                goto pass;

        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if (tcp + 1 > (struct tcphdr *)data_end)
                goto pass;

        if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
                goto pass;

        bpf_printk(" data len: %d from %pI4\n", data_len, &ip->saddr);
        void *payload = (void *)tcp + (tcp->doff * 4);
        if (payload > data_end)
                goto pass;

        arg.offset = bpf_htons((__u16)(payload - data));

        int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
        return action < 0 ? XDP_PASS : action;
pass:
        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";


@vincentmli (Contributor, Author)

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 


02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
	Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
		Product Name: Broadcom NetXtreme Gigabit Ethernet

I can switch to a mini PC which has an Intel I211 Gigabit NIC using the igb driver; is that OK? Since we are using XDP generic skb mode, does it matter which NIC?

@lneto (Contributor) commented Sep 21, 2024

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Can you check your MTU config as well?

Can you also try to run it on veth? So we can rule out interference from your NIC driver..

@vincentmli (Contributor, Author)

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2
Can you check your MTU config as well?

Can you also try to run it on veth? So we can discard the interference of your NIC driver..

Since the test involves the Chrome browser, maybe there is some way to use a curl client to simulate Chrome; I imagine veth would be part of some kind of container setup on one machine.

@lneto (Contributor) commented Sep 21, 2024

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 


02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
	Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
		Product Name: Broadcom NetXtreme Gigabit Ethernet

I can switch to a mini PC which has Intel I211 Gigabit and it uses igb driver, is that ok?

Yes, please..

since we are using XDP generic skb mode, does it matter which NIC?

I assumed it doesn't as well.. but I'm not sure regarding the multi-buf support

@lneto (Contributor) commented Sep 21, 2024

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2
Can you check your MTU config as well?

Can you also try to run it on veth? So we can discard the interference of your NIC driver..

since the test involves using chrome browser, maybe there is someway to use curl client to simulate chrome browser, I imagine veth would in some kind of container setup in one machine

I think you can just use nc to send the client-hello you captured from chrome..

@vincentmli (Contributor, Author) commented Sep 21, 2024

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

It might be more realistic for me to try virtio (torvalds/linux@22174f7) in my home lab, since I can run BPFire as a KVM virtual machine. I am wondering, though: if the underlying hardware NIC does not support XDP frags, would the virtio driver in the guest still support multi-buffer? It will take me some time to get the lab up and test...

@lneto (Contributor) commented Sep 21, 2024

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

it might be realistic for me to try virtio torvalds/linux@22174f7 in my home lab since I could run BPFire as KVM virtual machine, I am wondering though if the underlying hardware NIC does not support XDP frag, would virtio driver in guest still support multi buffer? it will take me some time to get the lab up and test...

That's a good question.. I'm not sure..

@lneto (Contributor) commented Sep 21, 2024

[quoting the previous comment with the modified https.c and its description]

BTW, is bpf_xdp_get_buff_len(ctx) == data_end - data?

@vincentmli (Contributor, Author)

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

it might be realistic for me to try virtio torvalds/linux@22174f7 in my home lab since I could run BPFire as KVM virtual machine, I am wondering though if the underlying hardware NIC does not support XDP frag, would virtio driver in guest still support multi buffer? it will take me some time to get the lab up and test...

That's a good question.. I'm not sure..

Now I have BPFire running in KVM and the green0 virtual interface is virtio_net; still no success.

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:62:45:3e brd ff:ff:ff:ff:ff:ff
    prog/xdp id 9 name xdp_dispatcher tag 90f686eb86991928 jited 

[root@bpfire-8 lua]# lspci -vvv | grep -i eth
01:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)
07:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)

[root@bpfire-8 lua]# dmesg | grep green0
[    7.821719] 8021q: adding VLAN 0 to HW filter on device green0
[  276.029437] virtio_net virtio6 green0: XDP request 5 queues but max is 1. XDP_TX and XDP_REDIRECT will operate in a slower locked tx mode.

@vincentmli (Contributor, Author)

BTW, is bpf_xdp_get_buf_len(ctx) == data_end - data?

There is noisy data-length output and I can't tell which printout refers to Chrome :). Do you have a better idea, maybe to condition the bpf_printk so it only prints for Chrome?

@lneto (Contributor) commented Sep 22, 2024

BTW, is bpf_xdp_get_buf_len(ctx) == data_end - data?

there is noisy data length print and I could not tell which data length printout refer to chrome :), do you have better idea to condition the bpf_printk to maybe only print out for chrome ?

You can pass them to Lua by adding two new __u16 fields to the bpf_luaxdp_arg struct.. then you could use Lua to add more info.. perhaps other metadata from the TLS handshake.. the cipher, perhaps?
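(A minimal sketch of that idea, slotting into the filter_https() shown earlier in this thread; linear_len and total_len are hypothetical field names, and sni.lua would have to unpack the two extra big-endian fields on the Lua side.)

// Hedged sketch: extend the argument struct passed to bpf_luaxdp_run()
// with the two measured lengths so the Lua side can log or filter on them.
struct bpf_luaxdp_arg {
        __u16 offset;       /* TCP payload offset, as before */
        __u16 linear_len;   /* data_end - data (hypothetical field) */
        __u16 total_len;    /* bpf_xdp_get_buff_len() (hypothetical field) */
} __attribute__((packed));

/* ... inside filter_https(), just before calling bpf_luaxdp_run(): */
arg.offset     = bpf_htons((__u16)(payload - data));
arg.linear_len = bpf_htons((__u16)(data_end - data));
arg.total_len  = bpf_htons((__u16)bpf_xdp_get_buff_len(ctx));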

@lneto (Contributor) commented Sep 22, 2024

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 


02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
	Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
		Product Name: Broadcom NetXtreme Gigabit Ethernet

1500 seems too little, right? TCP length in your example is 1781 bytes, right? Can you try to increase it? let's say..

ip link set dev green0 mtu 3500

@vincentmli (Contributor, Author)

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 


02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
	Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
		Product Name: Broadcom NetXtreme Gigabit Ethernet

1500 seems too little, right? TCP length in your example is 1781 bytes, right? Can you try to increase it? let's say..

ip link set dev green0 mtu 3500

Yes, I already tried a larger MTU; no success.

@vincentmli (Contributor, Author)

BTW, is bpf_xdp_get_buf_len(ctx) == data_end - data?

there is noisy data length print and I could not tell which data length printout refer to chrome :), do you have better idea to condition the bpf_printk to maybe only print out for chrome ?

You can pass them to Lua adding two new __u16 fields in the bpf_luaxdp_arg struct.. them you could use Lua to add more info.. perhaps other metadata from the TLS handshake.. cipher perhaps?

I think I need to understand and read more about the xdp frags so I know what I am trying :)

@lneto (Contributor) commented Sep 22, 2024

@vincentmli my understanding is that you need

  1. MTU bigger than or equal to the ETH frame that will carry the TCP frags;
  2. load the eBPF program with SEC("xdp.frags");
  3. make your TCP frag request using chrome;
  4. now, bpf_xdp_get_buff_len() should return the total TCP segment length, which should be greater than data_end - data. This means that part of the frame is linearized and part is not.
  5. then, if I got it correctly, you can call bpf_xdp_adjust_tail(ctx, 0) to "linearize" it.
  6. after that, data_end - data should have the length of the whole TCP segment.

@vincentmli (Contributor, Author)

@vincentmli my understanding is that you need

  1. MTU bigger or equal than the ETH frame that will carry the TCP frags;
  2. load eBPF program in SEC("XDP.frags");
  3. make your TCP frag request using chrome;
  4. now, bpf_xdp_get_buf_len() should return the total TCP segment length, which should be greater than data_end - data. This means that part of of the frame is linearized, part is not.
  5. then, if I got correctly, you can call bpf_xdp_adjust_tail(ctx, 0) to "linearize" it.
  6. after that, data_end - data should have the length of the whole TCP segment.

Thanks! I will try it. I am following the initial XDP multi-buffer support work and reading all the discussion involved; I haven't finished yet. It takes time, but I am getting a better understanding :) vincentmli/BPFire#44

@vincentmli (Contributor, Author)

I followed your suggestion to set the MTU to 4096 and adjusted the code below to print true_len, old_data_len, and new_data_len, then sent a Chrome request, but they are always the same; still no success. I am expecting true_len > old_data_len, but it never occurs.

MTU

[root@bpfire-5 ~]# ip l list dev green0
5: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 00:0e:c4:cf:4c:8e brd ff:ff:ff:ff:ff:ff
    prog/xdp id 149 name xdp_dispatcher tag 90f686eb86991928 jited 

code

SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
        struct bpf_luaxdp_arg arg;

        void *old_data_end = (void *)(long)ctx->data_end;
        void *old_data = (void *)(long)ctx->data;

        int true_len = bpf_xdp_get_buff_len(ctx);

        int old_data_len = old_data_end - old_data;

        bpf_xdp_adjust_tail(ctx, 0);

        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;

        int new_data_len = data_end - data;

        struct iphdr *ip = data + sizeof(struct ethhdr);

        if (ip + 1 > (struct iphdr *)data_end)
                goto pass;

        if (ip->protocol != IPPROTO_TCP)
                goto pass;

        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if (tcp + 1 > (struct tcphdr *)data_end)
                goto pass;

        if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
                goto pass;

        bpf_printk(" true_len: %d from %pI4\n", true_len, &ip->saddr);
        bpf_printk(" old_data_len: %d from %pI4\n", old_data_len, &ip->saddr);
        bpf_printk(" new_data_len: %d from %pI4\n", new_data_len, &ip->saddr);

        void *payload = (void *)tcp + (tcp->doff * 4);
        if (payload > data_end)
                goto pass;

        arg.offset = bpf_htons((__u16)(payload - data));

        int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
        return action < 0 ? XDP_PASS : action;
pass:
        return XDP_PASS;
}

trace log

          <idle>-0       [001] ..s21  2038.226457: bpf_trace_printk:  true_len: 803 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.226464: bpf_trace_printk:  old_data_len: 803 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.226465: bpf_trace_printk:  new_data_len: 803 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268004: bpf_trace_printk:  true_len: 130 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268012: bpf_trace_printk:  old_data_len: 130 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268013: bpf_trace_printk:  new_data_len: 130 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268509: bpf_trace_printk:  true_len: 158 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268514: bpf_trace_printk:  old_data_len: 158 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268515: bpf_trace_printk:  new_data_len: 158 from 172.16.1.9

          <idle>-0       [001] ..s21  2038.268595: bpf_trace_printk:  true_len: 647 from 172.16.1.9

          <idle>-0       [001] .Ns21  2038.268605: bpf_trace_printk:  old_data_len: 647 from 172.16.1.9

          <idle>-0       [001] .Ns21  2038.268605: bpf_trace_printk:  new_data_len: 647 from 172.16.1.9

@vincentmli (Contributor, Author)

This is a tcpdump capture from the Chrome client, so the ClientHello is already segmented by the Chrome client due to the client's MTU. Even if BPFire's green0 MTU is 4096, the incoming packets are smaller than the standard 1500 MTU, so generic XDP never actually receives packets larger than 1500 bytes; is that the reason? I tried setting the Chrome client's MTU to 4096 too, but it still sends packets smaller than 1500 bytes. The Chrome client is an iMac, by the way.

17:58:15.505326 IP 172.16.1.9.55621 > 104.198.14.52.443: Flags [.], seq 1:1229, ack 1, win 2053, options [nop,nop,TS val 1207024357 ecr 2206901463], length 1228

17:58:15.505327 IP 172.16.1.9.55621 > 104.198.14.52.443: Flags [P.], seq 1229:1934, ack 1, win 2053, options [nop,nop,TS val 1207024357 ecr 2206901463], length 705

@vincentmli (Contributor, Author)

And even if I could force the Chrome client to send packets larger than MTU 1500, this would not work in practice; we can't force every user's computer to do that :)

@vincentmli (Contributor, Author)

I tried a new Linux client with the MTU set to 9000, and used hping3 to send > 1500 bytes:

 hping3 -c 1 -d 3000 -p 999 172.16.1.5
HPING 172.16.1.5 (enp2s0f1 172.16.1.5): NO FLAGS are set, 40 headers + 3000 data bytes

--- 172.16.1.5 hping statistic ---
1 packets transmitted, 0 packets received, 100% packet loss

bpfire trace log:

         <idle>-0       [000] ..s21  9697.400542: bpf_trace_printk:  true_len: 3054 from 172.16.1.11

          <idle>-0       [000] ..s21  9697.400551: bpf_trace_printk:  old_data_len: 3054 from 172.16.1.11

          <idle>-0       [000] ..s21  9697.400552: bpf_trace_printk:  new_data_len: 3054 from 172.16.1.11

It looks to me like generic XDP already hands over a linearized skb, since true_len equals old_data_len at 3054 bytes, right? So my understanding is that if the Chrome (iMac) client does not segment the ClientHello, https.o should get a linearized skb with the true length of the ClientHello, and the SNI filter should work; no need to call bpf_xdp_adjust_tail(ctx, 0). Just curious, where did you get that calling bpf_xdp_adjust_tail(ctx, 0) with offset 0 will linearize the skb?

@lneto (Contributor) commented Sep 23, 2024

[quoting the previous comment with the MTU setting, the modified code, and the trace log]

so, none exceeded the "default" MTU size.. my understanding is that wouldn't be non-linear anyway..

@lneto (Contributor) commented Sep 23, 2024

and even if I can force chrome client send large than > mtu 1500 packet, this would not work in practice, can't force each user computer to do that :)

my understanding (from the issue description) was that the chrome client was already sending TCP frags in a large ETH frame.. if that's not the case, I don't think XDP will be aware of the frags, because it runs before the network stack.. in this case, I see two paths:

  1. handle frags in our eBPF/Lua program;
  2. let the frags go through the network stack and handle this in luanetfilter (which is already linearizing the skb).

@lneto (Contributor) commented Sep 23, 2024

I tried a new Linux client with MTU set to 9000, and use hping3 to send > 1500 bytes

 hping3 -c 1 -d 3000 -p 999 172.16.1.5
HPING 172.16.1.5 (enp2s0f1 172.16.1.5): NO FLAGS are set, 40 headers + 3000 data bytes

--- 172.16.1.5 hping statistic ---
1 packets transmitted, 0 packets received, 100% packet loss

bpfire trace log:

         <idle>-0       [000] ..s21  9697.400542: bpf_trace_printk:  true_len: 3054 from 172.16.1.11

          <idle>-0       [000] ..s21  9697.400551: bpf_trace_printk:  old_data_len: 3054 from 172.16.1.11

          <idle>-0       [000] ..s21  9697.400552: bpf_trace_printk:  new_data_len: 3054 from 172.16.1.11

it looks to me generic xdp driver already handles linearized skb since the true_len equals old_data_len to 3054 bytes, right?
so my understanding is that if chrome (iMac) client does not segment the clienthello, https.o should get linearized skb with true length of clienthello and the sni filter should work. no need to call bpf_xdp_adjust_tail(ctx, 0);,

yup, indeed I've the same understanding.. it looks like in the case of generic XDP it's resolved..

just curious, where you get calling bpf_xdp_adjust_tail(ctx, 0); with offset 0 will linearize skb?

I think I got it wrong.. I thought the offset referred to the beginning of the portion of the buffer you wanted to "linearize".. I guess it's actually the end of it, and we should use true_len here to linearize everything.. however, the frags would need to fit in the tail room.. I'm not sure it would be a solution anyway..

the case where I was expecting this to take effect was when true_len is actually greater than old_data_len, but if I got it correctly, that never happens on XDP generic, right? at least in your tests..

I will try to run some tests on my end as well.. but it sounds to me that we should fall back to luanetfilter in the case where we don't have TCP frags in XDP.. we will add the network stack overhead, but I'm not sure it will be so different from handling frags in our parser.. it would need to be measured.. if you can run some benchmark on it, it would be very valuable.. =)

@lneto (Contributor) commented Sep 23, 2024

@vincentmli btw, in the case of hping, do you see the TCP segments in Wireshark?

@vincentmli (Contributor, Author)

@vincentmli btw, in the case of hping you see the TCP segments in Wireshark?

I used hping to send only one large TCP payload; it is a single TCP segment in tcpdump, with no TCP segmentation. Wireshark should show the same as tcpdump: one TCP segment.

@vincentmli (Contributor, Author) commented Sep 23, 2024

I will try to run some tests on my end as well.. but it sounds to me that we should fallback to luanetfilter in the case we don't have TCP frags in XDP.. we will add the network stack overhead, but I'm not sure it will be so different than handling frags on our parser.. it would need to be measured.. if you can run some benchmark on it, it would be very valuable.. =)

XDP multi-buffer/jumbo support would be unrealistic for a consumer network environment, because consumer computers do not generate jumbo frames; it would be beneficial in an enterprise-grade data center. I think moving SNI parsing to luanetfilter is probably OK for my BPFire home firewall use case, regardless of the performance :)

@lneto (Contributor) commented Sep 23, 2024

@vincentmli btw, in the case of hping you see the TCP segments in Wireshark?

I used hping to only send one large tcp payload, it is one single tcp segment in tcpdump, no multiple tcp segmentation, wireshark should be same as tcpdump to show one tcp segment.

wondering if you have actually reached the condition of TCP frags, then..

@lneto (Contributor) commented Sep 23, 2024

I will try to run some tests on my end as well.. but it sounds to me that we should fallback to luanetfilter in the case we don't have TCP frags in XDP.. we will add the network stack overhead, but I'm not sure it will be so different than handling frags on our parser.. it would need to be measured.. if you can run some benchmark on it, it would be very valuable.. =)

XDP multi buffer /jumbo support would be unrealistic for consumer network environment because consumer computer does not generate jumbo frames.

indeed.. in my particular case, I'm concerned about running this in network operators' edge..

it would be beneficial in enterprise grade data center. I think probably move SNI parsing to lunanetfilter is ok for my bpfire home firewall use case regardless of the performance :)

I'm thinking of combining both.. handling non-frags on XDP and frags on luanetfilter =).. so we speed up whatever we can.. otherwise, we let it go through the network stack and handle it in netfilter..

btw, @jperon has already ported the SNI filter to luanetfilter.. but he's using moonscript.. hehe.. you can take a look at https://github.com/luainkernel/snihook

@vincentmli (Contributor, Author)

btw, @jperon has already ported the SNI filter to luanetfilter.. but he's using moonscript.. hehe.. you can take a look at https://github.com/luainkernel/snihook

Could snihook be integrated into lunatik and included in the lunatik source, so I don't have to build a separate snihook addon for BPFire?

@lneto (Contributor) commented Sep 23, 2024

btw, @jperon has already ported the SNI filter to luanetfilter.. but he's using moonscript.. hehe.. you can take a look at https://github.com/luainkernel/snihook

could snihook be integrated to be part of lunatik and included in lunatik source so I don't have to build separate snihook addon for bpfire?

I'm thinking of extending the filter example to run on both XDP and netfilter.. I referred to snihook because we can borrow the offset adjustments from there.. not sure if @jperon would be interested in porting it back to the filter example himself.. you are welcome to do so as well.. I just don't have the time to do this promptly.. but it should be quite straightforward.. perhaps @sheharyaar can help with that too..

@jperon (Contributor) commented Sep 23, 2024

I’ll be glad to help if I understand well what I should port back!

@vincentmli (Contributor, Author)

I like the idea of extending the filter with both XDP and netfilter too; I prefer fewer software dependencies, since snihook requires additional luarocks and moonscript.

@lneto (Contributor) commented Sep 23, 2024

I’ll be glad to help if I understand well what I should port back!

Thank you! It's basically adding a netfilter hook to filter/sni.lua.. so we can have handlers for both XDP and Netfilter calling the same parser.. I can help on Matrix if you need =)
