Commit c20afa6

Will Daly committed:
feat: add documentation for retina-shell
Signed-off-by: Will Daly <[email protected]>
1 parent 381f4eb commit c20afa6

File tree: 1 file changed (+186 −0)

docs/06-Troubleshooting/shell.md

# Shell TSG

**EXPERIMENTAL: `retina shell` is an experimental feature, so the flags and behavior may change in future versions.**

The `retina shell` command allows you to start an interactive shell on a Kubernetes node or pod. This runs a container image with many common networking tools installed (`ping`, `curl`, etc.).

## Testing connectivity

Start a shell on a node or inside a pod:

```bash
# To start a shell on a node (root network namespace):
kubectl retina shell aks-nodepool1-15232018-vmss000001

# To start a shell inside a pod (pod network namespace):
kubectl retina shell -n kube-system pods/coredns-d459997b4-7cpzx
```

Check connectivity using `ping`:

```text
root [ / ]# ping 10.224.0.4
PING 10.224.0.4 (10.224.0.4) 56(84) bytes of data.
64 bytes from 10.224.0.4: icmp_seq=1 ttl=64 time=0.964 ms
64 bytes from 10.224.0.4: icmp_seq=2 ttl=64 time=1.13 ms
64 bytes from 10.224.0.4: icmp_seq=3 ttl=64 time=0.908 ms
64 bytes from 10.224.0.4: icmp_seq=4 ttl=64 time=1.07 ms
64 bytes from 10.224.0.4: icmp_seq=5 ttl=64 time=1.01 ms

--- 10.224.0.4 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4022ms
rtt min/avg/max/mdev = 0.908/1.015/1.128/0.077 ms
```

Check DNS resolution using `dig`:

```text
root [ / ]# dig example.com +short
93.184.215.14
```

The tools `nslookup` and `drill` are also available if you prefer those.
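
For example (illustrative commands; output abbreviated, but each should return the same record as the `dig` query above):

```text
root [ / ]# nslookup example.com
...
root [ / ]# drill example.com
...
```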

Check connectivity to the API server using `nc` and `curl`:

```text
root [ / ]# nc -zv 10.0.0.1 443
Ncat: Version 7.95 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.0.1:443.
Ncat: 0 bytes sent, 0 bytes received in 0.06 seconds.

root [ / ]# curl -k https://10.0.0.1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
```

### nftables and iptables

Accessing nftables and iptables rules requires `NET_RAW` and `NET_ADMIN` capabilities.

```bash
kubectl retina shell aks-nodepool1-15232018-vmss000002 --capabilities NET_ADMIN,NET_RAW
```

Then you can run `iptables` and `nft`:

```text
root [ / ]# iptables -nvL | head -n 2
Chain INPUT (policy ACCEPT 1191K packets, 346M bytes)
 pkts bytes target     prot opt in     out     source               destination
root [ / ]# nft list ruleset | head -n 2
# Warning: table ip filter is managed by iptables-nft, do not touch!
table ip filter {
```

**If you see the error "Operation not permitted (you must be root)", check that your `kubectl retina shell` command sets `--capabilities NET_RAW,NET_ADMIN`.**

`iptables` in the shell image uses `iptables-legacy`, which may or may not match the configuration on the node. For example, Ubuntu maps `iptables` to `iptables-nft`. To use the exact same `iptables` binary as installed on the node, you will need to `chroot` into the host filesystem (see below).
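
As a quick check, `iptables --version` reports which backend a given binary uses. A hypothetical comparison (assuming the host filesystem is mounted and `SYS_CHROOT` is granted, as described in the sections below; version numbers are illustrative):

```text
root [ / ]# iptables --version
iptables v1.8.x (legacy)
root [ / ]# chroot /host iptables --version
iptables v1.8.x (nf_tables)
```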

## Accessing the host filesystem

On nodes, you can mount the host filesystem to `/host`:

```bash
kubectl retina shell aks-nodepool1-15232018-vmss000002 --mount-host-filesystem
```

This mounts the host filesystem (`/`) to `/host` in the debug pod:

```text
root [ / ]# ls /host
NOTICE.txt  bin  boot  dev  etc  home  lib  lib64  libx32  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
```

The host filesystem is mounted read-only by default. If you need write access, use the `--allow-host-filesystem-write` flag.
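
A minimal sketch of the write-access variant, assuming the two flags can be combined on a single command:

```bash
kubectl retina shell aks-nodepool1-15232018-vmss000002 --mount-host-filesystem --allow-host-filesystem-write
```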

Symlinks between files on the host filesystem may not resolve correctly. If you see "No such file or directory" errors for symlinks, try following the instructions below to `chroot` to the host filesystem.
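
For instance, a host-absolute symlink such as `/etc/resolv.conf -> /run/systemd/resolve/resolv.conf` is resolved against the debug container's own root rather than `/host`, so reading it through the mount can fail (illustrative output):

```text
root [ / ]# cat /host/etc/resolv.conf
cat: /host/etc/resolv.conf: No such file or directory
```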

## Chroot to the host filesystem

`chroot` requires the `SYS_CHROOT` capability:

```bash
kubectl retina shell aks-nodepool1-15232018-vmss000002 --mount-host-filesystem --capabilities SYS_CHROOT
```

Then you can use `chroot` to start a shell inside the host filesystem:

```text
root [ / ]# chroot /host bash
root@aks-nodepool1-15232018-vmss000002:/# cat /etc/resolv.conf | tail -n 2
nameserver 168.63.129.16
search shncgv2kgepuhm1ls1dwgholsd.cx.internal.cloudapp.net
```

`chroot` allows you to:

* Execute binaries installed on the node.
* Resolve symlinks that point to files in the host filesystem (such as `/etc/resolv.conf` -> `/run/systemd/resolve/resolv.conf`).
* Use `sysctl` to view or modify kernel parameters (see the example after this list).
* Use `journalctl` to view systemd unit and kernel logs.
* Use `ip netns` to view network namespaces. (However, `ip netns exec` does not work.)
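
A short illustrative session covering the last three items (values and output will differ per node; abbreviated here):

```text
root [ / ]# chroot /host sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
root [ / ]# chroot /host journalctl -k -n 1 --no-pager
...
root [ / ]# chroot /host ip netns list
...
```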
130+
131+
## Systemctl
132+
133+
`systemctl` commands require both `chroot` to the host filesystem and host PID:
134+
135+
```bash
136+
kubectl retina shell aks-nodepool1-15232018-vmss000002 --mount-host-filesystem --capabilities SYS_CHROOT --host-pid
137+
```
138+
139+
Then `chroot` to the host filesystem and run `systemctl status`:
140+
141+
```text
142+
root [ / ]# chroot /host systemctl status | head -n 2
143+
● aks-nodepool1-15232018-vmss000002
144+
State: running
145+
```
146+
147+
**If `systemctl` shows an error "Failed to connect to bus: No data available", check that the `retina shell` command has `--host-pid` set and that you have chroot'd to /host.**

## Troubleshooting

### Timeouts

If `kubectl retina shell` fails with a timeout error, then:

1. Increase the timeout by setting the `--timeout` flag.
2. Check the pod using `kubectl describe pod` to determine why the retina shell pod is failing to start.

Example:

```bash
kubectl retina shell --timeout 10m node001 # increase timeout to 10 minutes
```

### Firewalls and ImagePullBackoff

Some clusters are behind a firewall that blocks pulling the retina-shell image. To work around this:

1. Replicate the retina-shell images to a container registry accessible from within the cluster.
2. Override the image used by the Retina CLI with the environment variable `RETINA_SHELL_IMAGE_REPO`.

Example:

```bash
export RETINA_SHELL_IMAGE_REPO="example.azurecr.io/retina/retina-shell"
export RETINA_SHELL_IMAGE_VERSION=v0.0.1 # optional, defaults to the Retina CLI version if not set.
kubectl retina shell node0001 # this will use the image "example.azurecr.io/retina/retina-shell:v0.0.1"
```

## Limitations

* Windows nodes and pods are not yet supported.
* `bpftool` and `bpftrace` are not supported.
* The shell image links `iptables` commands to `iptables-legacy`, even if the node itself links to `iptables-nft`.
* `nsenter` is not supported.
* `ip netns` will not work without `chroot` to the host filesystem.
