From fcb6e19d89eaec471663cfe15532aeb45c86e106 Mon Sep 17 00:00:00 2001
From: kangmingfa <1640528278@qq.com>
Date: Wed, 29 May 2024 14:34:41 +0800
Subject: [PATCH] add consistent hash proposal

Signed-off-by: kangmingfa <1640528278@qq.com>
---
 docs/proposal/consistent_hash.md | 158 +++++++++++++++++++
 docs/proposal/pics/kmesh_consistent_hash.svg | 1 +
 2 files changed, 159 insertions(+)
 create mode 100644 docs/proposal/consistent_hash.md
 create mode 100644 docs/proposal/pics/kmesh_consistent_hash.svg

diff --git a/docs/proposal/consistent_hash.md b/docs/proposal/consistent_hash.md
new file mode 100644
index 000000000..3d6cd4727
--- /dev/null
+++ b/docs/proposal/consistent_hash.md
@@ -0,0 +1,158 @@
+---
+title: Consistent hash LB
+authors:
+- "@bfforever" # Authors' GitHub accounts here.
+reviewers:
+- "@supercharge-xsy"
+- "@hzxuzhonghu"
+- "@nlwcy"
+- TBD
+approvers:
+- "@robot"
+- TBD
+
+creation-date: 2024-05-29
+
+---
+
+## Your short, descriptive title
+
+Add Consistent hash based LB in ebpf prog
+
+### Summary
+
+1. Maglev consistent hash algorithm
+2. IP tuple info based hash
+3. L7 HTTP header based hash
+
+#### Goals
+
+1. Network load balancing
+2. Guarantee that a client connection always sends its requests to the same backend
+3. Minimize remapping
+
+### Proposal
+
+#### maglev
+The main purpose of the algorithm is to map all backends into a fixed-size lookup table. The mapping must be even, and it should change as little as possible when the backend set changes (a backend is added or removed), so that a global remapping is avoided.
+
+
+#### eBPF implementation points
+1. Cgroup level
+In the cgroup socket hooks, implement L4-based and L7-header-based consistent hashing.
+2. Container veth tc level
+At the tc level, implement L4-based and L7-based consistent hashing.
+
+
+### Design Details
+
+![consistent_hash](pics/kmesh_consistent_hash.svg)
+#### maglev
+Main steps:
+
+1. Map the endpoints of a Cluster to a fixed table
+Generate an integer table from the Cluster config received from istiod; each table value is an endpoint index.
+
+
+2. Compute a hash value from the L4 or L7 info
+Use the hash value to get a table index: index = hash % len(table).
+3. Use the index to access the table
+ep = table[index]
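The three steps above can be sketched in user-space C as follows. This is a minimal illustration under stated assumptions, not the Kmesh implementation: the names `maglev_populate` and `maglev_lookup`, the table size of 251, the endpoint limit, and the use of seeded FNV-1a for the per-endpoint `offset` and `skip` values are all hypothetical choices made for the example. The population loop itself follows the published Maglev scheme, in which each endpoint walks its own permutation of the table slots and claims the next free one in round-robin order.

```c
#include <assert.h>
#include <stdint.h>

#define MAGLEV_TABLE_SIZE 251u      /* must be prime; value chosen for illustration */
#define MAX_ENDPOINTS 16u           /* hypothetical limit for this sketch */
#define SLOT_EMPTY UINT32_MAX

/* Seeded FNV-1a over a string; a stand-in for whatever hash the real code uses. */
static uint32_t fnv1a(const char *s, uint32_t seed)
{
    uint32_t h = seed;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Step 1: fill table[] with endpoint indices in Maglev round-robin order. */
static void maglev_populate(const char *names[], uint32_t n,
                            uint32_t table[MAGLEV_TABLE_SIZE])
{
    uint32_t offset[MAX_ENDPOINTS], skip[MAX_ENDPOINTS], next[MAX_ENDPOINTS];
    uint32_t filled = 0, i;

    assert(n > 0 && n <= MAX_ENDPOINTS);
    for (i = 0; i < n; i++) {
        offset[i] = fnv1a(names[i], 2166136261u) % MAGLEV_TABLE_SIZE;
        /* skip is in [1, M-1], so it is coprime with the prime table size */
        skip[i] = fnv1a(names[i], 0x9e3779b9u) % (MAGLEV_TABLE_SIZE - 1) + 1;
        next[i] = 0;
    }
    for (i = 0; i < MAGLEV_TABLE_SIZE; i++)
        table[i] = SLOT_EMPTY;

    while (filled < MAGLEV_TABLE_SIZE) {
        for (i = 0; i < n && filled < MAGLEV_TABLE_SIZE; i++) {
            uint32_t slot;
            /* advance along endpoint i's permutation until a free slot is found */
            do {
                slot = (uint32_t)((offset[i] + (uint64_t)next[i] * skip[i])
                                  % MAGLEV_TABLE_SIZE);
                next[i]++;
            } while (table[slot] != SLOT_EMPTY);
            table[slot] = i;
            filled++;
        }
    }
}

/* Steps 2 and 3: index = hash % len(table), ep = table[index]. */
static uint32_t maglev_lookup(const uint32_t table[MAGLEV_TABLE_SIZE], uint32_t hash)
{
    return table[hash % MAGLEV_TABLE_SIZE];
}
```

Because every endpoint claims exactly one slot per round, the slot counts of any two endpoints differ by at most one, which is the "even mapping" requirement stated in the proposal.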
+
+
+#### L4 implement
+1. (saddr, daddr, sport, dport, protocol) --> hash
+2. Then continue from the second Maglev step above (index = hash % len(table)).
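A hedged sketch of the L4 path: hash the five tuple fields, then reduce the hash to a table index. The struct `l4_tuple`, the helper names, and the choice of FNV-1a as the mixing function are assumptions made for illustration; the proposal does not specify which hash the eBPF program uses. Fields are mixed one by one rather than hashing the raw struct, so padding bytes never influence the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IPv4 5-tuple layout used only for this example. */
struct l4_tuple {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  protocol;
};

/* Fold the low nbytes bytes of value into an FNV-1a state. */
static uint32_t mix_bytes(uint32_t h, uint32_t value, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        h ^= (value >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* (saddr, daddr, sport, dport, protocol) --> hash */
static uint32_t l4_tuple_hash(const struct l4_tuple *t)
{
    uint32_t h = 2166136261u;   /* FNV-1a offset basis */
    h = mix_bytes(h, t->saddr, 4);
    h = mix_bytes(h, t->daddr, 4);
    h = mix_bytes(h, t->sport, 2);
    h = mix_bytes(h, t->dport, 2);
    h = mix_bytes(h, t->protocol, 1);
    return h;
}

/* index = hash % len(table) */
static uint32_t l4_table_index(const struct l4_tuple *t, uint32_t table_len)
{
    assert(table_len > 0);
    return l4_tuple_hash(t) % table_len;
}
```

The same tuple always yields the same index, so all packets of one connection reach the same table slot for as long as the table is unchanged.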
+
+#### L7 implement
+```
+apiVersion: networking.istio.io/v1alpha3
+kind: DestinationRule
+metadata:
+  name: helloworld
+spec:
+  host: helloworld
+  trafficPolicy:
+    loadBalancer:
+      consistentHash:
+        httpHeaderName: testHeader
+        maglev:
+          tableSize:
+```
+1. Specify which L7 header is used for hashing
+2. Extract the header value from the HTTP headers
+3. Compute a hash value from the header value
+4. Then continue from the second Maglev step above (index = hash % len(table))
+
+##### Risk in computing a hash from the L7 header value
+eBPF programs lack a built-in string-based hash function.
+
+Below is a simple string-based hash method. We will probably need a more effective hash method.
+```
+static inline __u32 lb_hash_based_header(unsigned char *header_name) {
+    __u32 hash = 0;
+    struct bpf_mem_ptr *msg_header = NULL;
+    char *msg_header_v = NULL;
+    __u32 msg_header_len = 0;
+    __u32 k;
+    __u32 c;
+    char msg_header_cp[KMESH_PER_HASH_POLICY_MSG_HEADER_LEN] = {'\0'};
+
+    if (!header_name)
+        return hash;
+    // when header name is not null, compute a hash value.
+    BPF_LOG(INFO, ROUTER_CONFIG, "Got a header name:%s\n", header_name);
+    msg_header = (struct bpf_mem_ptr *)bpf_get_msg_header_element(header_name);
+    if (!msg_header)
+        return hash;
+
+    msg_header_v = _(msg_header->ptr);
+    msg_header_len = _(msg_header->size);
+    if (!msg_header_len)
+        return hash;
+
+    BPF_LOG(INFO, ROUTER_CONFIG, "Got a header value len:%u\n", msg_header_len);
+    // copy the header value into a local buffer before hashing it
+    if (!bpf_strncpy(msg_header_cp, msg_header_len, msg_header_v))
+        return hash;
+    BPF_LOG(INFO, ROUTER_CONFIG, "Got a header value:%s\n", msg_header_cp);
+    // Currently a simple hash method: hash = hash * 33 + c for every byte
+    // (djb2-style; note that the classic djb2 seed is 5381)
+    hash = 5318;
+#pragma unroll
+    for (k = 0; k < KMESH_PER_HASH_POLICY_MSG_HEADER_LEN; k++) {
+        // stop at the end of the header value
+        if (!msg_header_v || k >= msg_header_len)
+            break;
+        c = *((unsigned char *)msg_header_cp + k);
+        hash = ((hash << 5) + hash) + c;
+    }
+
+    return hash;
+}
+```
+
+
+#### Cgroup level
+Change the daddr and dport of the socket directly in the LB logic.
+
+#### Tc level
+Run the LB algorithm on the network packet or message and change the daddr and dport of the packet.
+
+This approach has access to the full IP address info for LB.
+
+#### Test Plan
+
+1. The table mapping is evenly distributed
+2. Adding or removing an endpoint affects only a few connections
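Both test-plan items can be measured directly on generated tables. The sketch below is one possible way to do that, under stated assumptions: it restates a compact Maglev-style population loop so that the block stands alone, and the names `build`, `spread`, `changed_slots`, the table size of 1021, and the seeded FNV-1a hash are hypothetical. `spread` reports how uneven a table is, and `changed_slots` counts how many slots point to a different endpoint after the endpoint set changes.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 1021u    /* prime, chosen for illustration */
#define MAX_EP 16u          /* hypothetical endpoint limit */

/* Seeded FNV-1a; a stand-in hash for this sketch. */
static uint32_t str_hash(const char *s, uint32_t h)
{
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Compact Maglev-style population: endpoints claim free slots round-robin. */
static void build(const char *names[], uint32_t n, uint32_t table[TABLE_SIZE])
{
    uint32_t off[MAX_EP], skip[MAX_EP], next[MAX_EP], filled = 0, i;

    assert(n > 0 && n <= MAX_EP);
    for (i = 0; i < n; i++) {
        off[i] = str_hash(names[i], 2166136261u) % TABLE_SIZE;
        skip[i] = str_hash(names[i], 0x9e3779b9u) % (TABLE_SIZE - 1) + 1;
        next[i] = 0;
    }
    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = UINT32_MAX;              /* mark every slot as empty */
    while (filled < TABLE_SIZE) {
        for (i = 0; i < n && filled < TABLE_SIZE; i++) {
            uint32_t slot;
            do {
                slot = (uint32_t)((off[i] + (uint64_t)next[i]++ * skip[i]) % TABLE_SIZE);
            } while (table[slot] != UINT32_MAX);
            table[slot] = i;
            filled++;
        }
    }
}

/* Test item 1: largest minus smallest number of slots owned by any endpoint. */
static uint32_t spread(const uint32_t table[TABLE_SIZE], uint32_t n)
{
    uint32_t counts[MAX_EP] = {0}, lo = TABLE_SIZE, hi = 0, i;

    for (i = 0; i < TABLE_SIZE; i++)
        counts[table[i]]++;
    for (i = 0; i < n; i++) {
        if (counts[i] < lo) lo = counts[i];
        if (counts[i] > hi) hi = counts[i];
    }
    return hi - lo;
}

/* Test item 2: number of slots whose endpoint differs between two tables. */
static uint32_t changed_slots(const uint32_t a[TABLE_SIZE], const uint32_t b[TABLE_SIZE])
{
    uint32_t changed = 0, i;

    for (i = 0; i < TABLE_SIZE; i++)
        if (a[i] != b[i])
            changed++;
    return changed;
}
```

A test could build one table with all endpoints and a second with the last endpoint removed, so that the remaining indices keep their meaning, and then compare `changed_slots` against the number of slots the removed endpoint owned. That number is the unavoidable minimum; how far above it the result may go is a threshold the test plan would still need to define.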
+
diff --git a/docs/proposal/pics/kmesh_consistent_hash.svg b/docs/proposal/pics/kmesh_consistent_hash.svg
new file mode 100644
index 000000000..a40d4b250
--- /dev/null
+++ b/docs/proposal/pics/kmesh_consistent_hash.svg
@@ -0,0 +1 @@
+[SVG figure content omitted. Text labels in the figure: istiod; Kmesh Daemon; endpoints; table generated from endpoints with the Maglev algorithm; outer map; cluster name (192B); inner map fd; inner map; ep_ids; table; eBPF prog LB logic; hash % len(table); sock_addr]
\ No newline at end of file