
No Connectivity yet, logs attached #24

Open

laxmanvallandas opened this issue Sep 30, 2021 · 11 comments

laxmanvallandas commented Sep 30, 2021

Config used:
{
  "local": {
    "name": "local"
  },
  "remotes": [
    {
      "name": "remot",
      "kubeConfigPath": "/config",  # kubeconfig of the remot cluster
      "podSubnet": "1.1.128.0/17",
      "remoteSATokenPath": "/etc/semaphore-wireguard/tokens/remote1/token",
      "wgListenPort": 51824
    }
  ]
}
Logs of the service:
[INFO] semaphore-wireguard: No key found, generating a new private key: path=/var/lib/semaphore-wireguard/wireguard.remot.key
semaphore-wireguard: Configuring wireguard: device=wireguard.remot port=51824 pubKey=<an encrypted key>
1 shared_informer.go:240] Waiting for caches to sync for nodeWatcher
[INFO] semaphore-wireguard: starting node watcher
[WARN] semaphore-wireguard: Cannot sync peers while canSync flag is not set
1 shared_informer.go:247] Caches are synced for nodeWatcher

Route after the service is up:
[~]$ route -n
1.1.128.0 0.0.0.0 255.255.128.0 U 0 0 0 wireguard.remot
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0

Nodes on the remot cluster have a route similar to the above, but with the interface being wireguard.local and the destination 2.2.128.0 (pod network of local).

--- Connectivity check from one of the nodes (10.10.12.125) hosting pods:
[~]$ ping 1.1.133.154
PING 1.1.133.154 (1.1.133.154) 56(84) bytes of data.
From 10.10.12.125 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available

---- route get from node(10.10.12.125)
[~]$ ip route get 1.1.133.154
1.1.133.154 dev wireguard.remot src 20.20.133.5
cache

---- Remote key present
[ ~]$ ls /var/lib/semaphore-wireguard/wireguard.remot.key
/var/lib/semaphore-wireguard/wireguard.remot.key

----- tcpdump on the remote node(20.20.133.5):
[ ~]$ sudo tcpdump -n -i eth0 'src or dst 10.10.12.125'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:27:23.649518 IP 20.20.133.5.51824 > 10.10.12.125.51824: UDP, length 32
09:27:48.666056 IP 10.10.12.125.51824 > 20.20.133.5.51824: UDP, length 32

---- Communication between nodes in the two clusters exists

Note: pods were up on both clusters before semaphore-wireguard was launched. I hope this is not an issue.

@george-angel @ffilippopoulos Can I get some help? Maybe I am missing something?

@ffilippopoulos (Member) commented:

Hey, can you share the output of sudo wg show on nodes in both clusters? The private keys should be hidden, so it should be safe to copy-paste.
Also, can you please share the output of kubectl --context <context> describe node <node> | grep wireguard.semaphore.uw.io, to check your annotations in both clusters?

@ffilippopoulos (Member) commented Sep 30, 2021

One thing to note here is that Calico will not accept any traffic from/to IP subnets it does not know about. You need to define an IPPool on each cluster for each remote pod subnet. In your case it would look like:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: remot-pods
spec:
  cidr: 1.1.128.0/17
  ipipMode: CrossSubnet
  disabled: true

for the "local" cluster, and the respective one on the remot cluster containing the local pod IP range. Make sure that this pool is disabled, as in the example above.
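For reference, the mirrored pool on the remot side could look like the sketch below. The pool name and the 2.2.128.0/17 CIDR are assumptions based on the local pod network mentioned earlier; adjust them to your actual local pod range:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: local-pods
spec:
  cidr: 2.2.128.0/17
  ipipMode: CrossSubnet
  disabled: true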

Regarding your ping, you should be trying that experiment from inside a pod, following the above logic. Also, make sure you do not have any network policies deployed which block the remote traffic.
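A quick way to check for policies that might interfere (a sketch with plain kubectl; run it against both clusters):

kubectl --context <context> get networkpolicies --all-namespaces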

I appreciate that even though this is calico related config, it should be documented under our instructions here as well.

@laxmanvallandas (Author) commented:

  • Old annotations and the wg listen address are in general not released after the daemonset is deleted.
    I made sure they were newly added.
    On Local:
    remot.wireguard.semaphore.uw.io/endpoint: 10.10.12.125:51825
    remot.wireguard.semaphore.uw.io/pubKey:
    On Remote:
    local.wireguard.semaphore.uw.io/endpoint: 20.20.133.137:51825
    local.wireguard.semaphore.uw.io/pubKey:

  • Ensured the IPPool exists as suggested.
    Output of sudo wg show on the node:
    [10.10.12.125 ~]$ sudo wg show
    interface: wireguard.remot
    public key:
    private key: (hidden)
    listening port: 51825 # Changed the port on both clusters, as the previously used port was not cleaned up on the host.

peer:
endpoint: 20.20.133.5:51825
allowed ips: 1.1.203.0/24
latest handshake: 22 seconds ago
transfer: 2.25 KiB received, 1.41 KiB sent
persistent keepalive: every 25 seconds

peer:
endpoint: 20.20.133.137:51825
allowed ips: 1.1.158.0/24
latest handshake: 26 seconds ago
transfer: 1.59 KiB received, 2.36 KiB sent
persistent keepalive: every 25 seconds

peer:
endpoint: 20.20.133.177:51825
allowed ips: 1.1.142.0/24
latest handshake: 1 minute, 40 seconds ago
transfer: 1.62 KiB received, 2.30 KiB sent
persistent keepalive: every 25 seconds

peer:
endpoint: 20.20.133.153:51825
allowed ips: 1.1.207.0/24
latest handshake: 2 minutes, 6 seconds ago
transfer: 1.56 KiB received, 2.39 KiB sent
persistent keepalive: every 25 seconds

--
From inside a pod on node 10.10.12.125:
/ # curl 1.1.133.154:80 => IP OF THE NEW POD CREATED ON REMOTE CLUSTER
curl: (7) Failed to connect to 1.1.133.154 port 80 after 1002 ms: Host is unreachable

--
From the host on which the above pod is running:
[10.10.12.125 ~]$ sudo tcpdump -n -i wireguard.remot
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wireguard.remot, link-type RAW (Raw IP), capture size 262144 bytes
12:43:33.726299 IP 2.2.183.182.60010 > 1.1.207.11.http: Flags [S], seq 953866, win 28000, options [mss 1400,nop,nop,sackOK,nop,wscale 9], length 0

From the output of wg show, shouldn't at least one peer have an allowed-ips entry containing a subnet to which 1.1.133.154 belongs?
(I haven't used WireGuard before.)

Network policies were all deleted at the time of the testing above.

Also, note that my clusters have route reflectors running within them; there is no peering between the route reflectors of the two clusters, though.

@ffilippopoulos (Member) commented:

Old annotations and the wg listen address are in general not released after the daemonset is deleted.

Yes, the daemonset pods should be able to update them while running. They should be harmless and you shouldn't try to delete them; they may actually allow wireguard devices to be successfully paired even at times when semaphore-wireguard is not running (for example, during a rollout).

Ensured the IPPool exists as suggested.

OK, that is vital for Calico to allow the traffic here.

From the output of wg show, shouldn't at least one peer have an allowed-ips entry containing a subnet to which 1.1.133.154 belongs?

Yes, one of the peers should be allowing the 1.1.133.0/24 subnet. This value is derived from the nodes' PodCIDR field. You can check it like:
kubectl --context <context> describe node <node> | grep PodCIDR
What IPAM are you configuring Calico to use? I think for that value to be consistent we need to use the host-local one:

            "ipam": {                                                             
              "type": "host-local",                                               
              "subnet": "usePodCidr"                                              
            },  

Apart from that, it looks like the wireguard peers are able to communicate fine, so traffic should be able to flow through for the allowed subnets there?

@laxmanvallandas (Author) commented:

Hmm.
The node config shows the Node.Spec.PodCIDR field set on all nodes, and exactly these fields are populated to the remote wireguard peers.
As per the below, I assume that if this field is set, the node uses host-local IPAM.
[image] I am not sure there is any explicit way to find out the plugin in use. 😑
What confuses me is that the IPs assigned to pods (1.1.133.0/24) are not in the range of any node's PodCIDR, while my IPPool has the broader range 1.1.128.0/17. I am wondering if there is a rule (somewhere) like the IPPool taking priority over the nodes' PodCIDR when assigning IPs. 🤔
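For reference, the nodes' PodCIDRs and the live pod IPs can be compared with plain kubectl (a sketch; the jsonpath follows the Node spec field discussed above):

kubectl --context <context> get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
kubectl --context <context> get pods --all-namespaces -o wide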

@ffilippopoulos (Member) commented:

I am not sure there is any explicit way to find out the plugin in use.

@laxmanvallandas Calico config, which includes the ipam, is usually passed using a ConfigMap:
kubectl --namespace kube-system get configmaps calico-config -o yaml

@laxmanvallandas (Author) commented:

We are not using host-local. Since PodCIDR is what is propagated to the remote cluster through wireguard, it looks like using host-local is a prerequisite for this utility to give the expected results; other IPs allocated from the broader range might not be reachable.

@ffilippopoulos (Member) commented:

@laxmanvallandas I see that Kubernetes Nodes have a PodCIDRs field which could potentially be used for that. We follow the same approach as Calico does for host-local ipam, which you posted above, and pass this field directly as the allowed IPs for our wireguard peers: https://github.com/utilitywarehouse/semaphore-wireguard/blob/main/runner.go#L199. If PodCIDRs is sufficient for both cases, this could be an easy and quick fix. What IPAM are you using? It looks like everything else is working for you as expected.
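To dump that field directly (a sketch; the jsonpath matches the Node spec field named above):

kubectl --context <context> get node <node> -o jsonpath='{.spec.podCIDRs}'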

@laxmanvallandas (Author) commented:

@ffilippopoulos I am using calico-ipam. I noticed that you have added a limitations section now; thanks for that, and I really appreciate your prompt responses.
Based on the routes propagated in wireguard, I can say that what this service promised is working, and it is an easy solution compared to the other options I went through in the last few days. I am interested to know if you have any plans to enhance it or provide a workaround for non-host-local IPAMs?

@ffilippopoulos (Member) commented:

@laxmanvallandas I am sorry this is not working with your setup as is; I do not know if switching to host-local ipam is an option for you. It is not ideal for us to support many different IPAMs, as we are developing against our own environment. This is why I asked if you could provide the value of PodCIDRs on one of your nodes: to see if there is a cheap win and we can switch to using that field (from what I've seen so far, this would continue to work with host-local ipam). Apart from that, I'll have a look to see if I can find any good documentation on calico-ipam and evaluate our options.

@ffilippopoulos (Member) commented:

I looked briefly at calico-ipam, and it looks like we'd have to start watching ipamblocks.crd.projectcalico.org resources in order to figure out the allowed IPs per wg peer, and that would involve many more updates to peer configs, since the list of IPs would be dynamic. So supporting calico-ipam could be implemented, but it would not be trivial and would require some design for the added complexity.
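For anyone following along, those allocation blocks are visible as cluster-scoped custom resources, e.g. (a sketch; the cidr field lives under spec in the IPAMBlock schema):

kubectl get ipamblocks.crd.projectcalico.org
kubectl get ipamblocks.crd.projectcalico.org -o yaml | grep cidr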
