Mount NFS41/42: protocol not supported #1226

Closed
nschad opened this issue Nov 3, 2023 · 10 comments
Labels
area/kernel Issues related to kernel kind/bug Something isn't working

Comments

@nschad

nschad commented Nov 3, 2023

Description

Mounting a volume with NFS 4.1 or NFS 4.2 on Flatcar stable 3602.2.1 fails with the error
"Protocol not supported". This did not happen on the older Flatcar version 3510.2.7.

Our problem seems to be exactly the same as the one described in #711.

NFS4.0 mounting works as expected.

Impact

Unable to mount share with nfs version 4.1 or 4.2

Environment and steps to reproduce

  1. Set-up:
  • Flatcar 3602.2.1 as a node within the Kubernetes cluster
  • See the following files to reproduce our test setup:
Create the nfs-ganesha server
---
# Source: nfs-server-provisioner/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: nfs-server-provisioner
    chart: nfs-server-provisioner-1.8.0
    heritage: Helm
    release: my-release
  name: my-release-nfs-server-provisioner
---
# Source: nfs-server-provisioner/templates/storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs
  labels:
    app: nfs-server-provisioner
    chart: nfs-server-provisioner-1.8.0
    heritage: Helm
    release: my-release
provisioner: cluster.local/my-release-nfs-server-provisioner
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
- hard
- retrans=3
- proto=tcp
- nfsvers=4.2
- rsize=4096
- wsize=4096
- noatime
- nodiratime
---
# Source: nfs-server-provisioner/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-release-nfs-server-provisioner
  labels:
    app: nfs-server-provisioner
    chart: nfs-server-provisioner-1.8.0
    heritage: Helm
    release: my-release
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get"]
  - apiGroups: ["extensions"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["nfs-provisioner"]
    verbs: ["use"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["privileged"]
    verbs: ["use"]
---
# Source: nfs-server-provisioner/templates/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: nfs-server-provisioner
    chart: nfs-server-provisioner-1.8.0
    heritage: Helm
    release: my-release
  name: my-release-nfs-server-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-release-nfs-server-provisioner
subjects:
  - kind: ServiceAccount
    name: my-release-nfs-server-provisioner
    namespace: default
---
# Source: nfs-server-provisioner/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-release-nfs-server-provisioner
  labels:
    app: nfs-server-provisioner
    chart: nfs-server-provisioner-1.8.0
    heritage: Helm
    release: my-release
spec:
  type: ClusterIP
  ports:
    - port: 2049
      targetPort: nfs
      protocol: TCP
      name: nfs
    - port: 2049
      targetPort: nfs-udp
      protocol: UDP
      name: nfs-udp
    - port: 32803
      targetPort: nlockmgr
      protocol: TCP
      name: nlockmgr
    - port: 32803
      targetPort: nlockmgr-udp
      protocol: UDP
      name: nlockmgr-udp
    - port: 20048
      targetPort: mountd
      protocol: TCP
      name: mountd
    - port: 20048
      targetPort: mountd-udp
      protocol: UDP
      name: mountd-udp
    - port: 875
      targetPort: rquotad
      protocol: TCP
      name: rquotad
    - port: 875
      targetPort: rquotad-udp
      protocol: UDP
      name: rquotad-udp
    - port: 111
      targetPort: rpcbind
      protocol: TCP
      name: rpcbind
    - port: 111
      targetPort: rpcbind-udp
      protocol: UDP
      name: rpcbind-udp
    - port: 662
      targetPort: statd
      protocol: TCP
      name: statd
    - port: 662
      targetPort: statd-udp
      protocol: UDP
      name: statd-udp
  selector:
    app: nfs-server-provisioner
    release: my-release
---
# Source: nfs-server-provisioner/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-release-nfs-server-provisioner
  labels:
    app: nfs-server-provisioner
    chart: nfs-server-provisioner-1.8.0
    heritage: Helm
    release: my-release
spec:
  # TODO: Investigate how/if nfs-provisioner can be scaled out beyond 1 replica
  replicas: 1
  selector:
    matchLabels:
      app: nfs-server-provisioner
      release: my-release
  serviceName: my-release-nfs-server-provisioner
  template:
    metadata:
      labels:
        app: nfs-server-provisioner
        chart: nfs-server-provisioner-1.8.0
        heritage: Helm
        release: my-release
    spec:
      # NOTE: This is 10 seconds longer than the default nfs-provisioner --grace-period value of 90sec
      terminationGracePeriodSeconds: 100
      serviceAccountName: my-release-nfs-server-provisioner
      containers:
        - name: nfs-server-provisioner
          image: "registry.k8s.io/sig-storage/nfs-provisioner:v4.0.8"
          imagePullPolicy: IfNotPresent
          ports:
            - name: nfs
              containerPort: 2049
              protocol: TCP
            - name: nfs-udp
              containerPort: 2049
              protocol: UDP
            - name: nlockmgr
              containerPort: 32803
              protocol: TCP
            - name: nlockmgr-udp
              containerPort: 32803
              protocol: UDP
            - name: mountd
              containerPort: 20048
              protocol: TCP
            - name: mountd-udp
              containerPort: 20048
              protocol: UDP
            - name: rquotad
              containerPort: 875
              protocol: TCP
            - name: rquotad-udp
              containerPort: 875
              protocol: UDP
            - name: rpcbind
              containerPort: 111
              protocol: TCP
            - name: rpcbind-udp
              containerPort: 111
              protocol: UDP
            - name: statd
              containerPort: 662
              protocol: TCP
            - name: statd-udp
              containerPort: 662
              protocol: UDP
          securityContext:
            capabilities:
              add:
              - DAC_READ_SEARCH
              - SYS_RESOURCE
          args:
            - "-provisioner=cluster.local/my-release-nfs-server-provisioner"
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: SERVICE_NAME
              value: my-release-nfs-server-provisioner
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: data
              mountPath: /export
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: "1Gi"
Create 2 Pods
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-1
spec:
  containers:
    - name: test
      image: nginx
      volumeMounts:
        - name: config
          mountPath: /test
  volumes:
    - name: config
      persistentVolumeClaim:
        claimName: test-dynamic-volume-claim
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-2
spec:
  containers:
    - name: test
      image: nginx
      volumeMounts:
        - name: config
          mountPath: /test
  volumes:
    - name: config
      persistentVolumeClaim:
        claimName: test-dynamic-volume-claim
Create the PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-dynamic-volume-claim
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
  2. Action(s):
  • Create a Pod with the volume and let Kubernetes mount the NFS volume, OR
  • Mount the volume manually, without Kubernetes (this also fails)
  3. Error:
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=4.2,noatime,nodiratime,proto=tcp,retrans=3,rsize=4096,wsize=4096 100.66.22.242:/export/pvc-073d5904-3423-4206-8062-fbfd1b56bcba /var/lib/kubelet/pods/80ef75b6-a18f-4260-a14a-ef6d557fb1e8/volumes/kubernetes.io~nfs/pvc-073d5904-3423-4206-8062-fbfd1b56bcba
Output: mount.nfs: Protocol not supported
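
For the manual reproduction path, a minimal Python sketch along the following lines could try each NFS minor version in turn. It is an illustration under stated assumptions, not the reporter's actual procedure: the helper names `mount_command` and `try_versions` are hypothetical, the option list is reduced to the ones relevant here, and the server, export, and mount point must be supplied by the caller. Running the mounts requires root.

```python
import subprocess


def mount_command(server, export, mountpoint, nfsvers):
    """Build a mount(8) argument list for one NFS version (hypothetical helper)."""
    options = f"hard,nfsvers={nfsvers},proto=tcp"
    return ["mount", "-t", "nfs", "-o", options, f"{server}:{export}", mountpoint]


def try_versions(server, export, mountpoint, versions=("4.0", "4.1", "4.2")):
    """Attempt a mount per version and record (exit code, stderr). Needs root."""
    results = {}
    for version in versions:
        proc = subprocess.run(
            mount_command(server, export, mountpoint, version),
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            # Unmount again so the next version starts from a clean mount point.
            subprocess.run(["umount", mountpoint], check=False)
        results[version] = (proc.returncode, proc.stderr.strip())
    return results


# Example (as root, with placeholder values):
#   try_versions("100.66.22.242", "/export", "/mnt")
```

On an affected node, the report above suggests the 4.0 attempt succeeds while the 4.1 and 4.2 attempts fail with "Protocol not supported".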

Expected behavior

Successful mount.

Additional information

/

@jepio
Member

jepio commented Nov 3, 2023

Thanks for the report. Checking this out, and since this is not the first time, we'll need to add a test case to our test suite (greatly appreciate you posting your whole setup code).

@jepio
Member

jepio commented Nov 3, 2023

This seems to have broken between 3602.2.0 and 3602.2.1, so it's most definitely a kernel regression between 5.15.133 and 5.15.136.
I can reproduce it with nfs-ganesha 4.3, but not with the kernel nfsd.

@jepio
Member

jepio commented Nov 3, 2023

https://elixir.bootlin.com/linux/v5.15.136/source/fs/nfs/nfs4client.c#L1070:

static int nfs4_server_common_setup(struct nfs_server *server,
		struct nfs_fh *mntfh, bool auth_probe)
{
	struct nfs_fattr *fattr;
	int error;

	/* data servers support only a subset of NFSv4.1 */
	if (is_ds_only_client(server->nfs_client))
		return -EPROTONOSUPPORT;

	fattr = nfs_alloc_fattr();
	if (fattr == NULL)
		return -ENOMEM;

	/* We must ensure the session is initialised first */
	error = nfs4_init_session(server->nfs_client);
	if (error < 0)
		goto out;

Placing the following kprobes at the entry and return of nfs4_server_common_setup and nfs4_init_session:

p:kprobes/p_nfs4_server_common_setup_0 nfs4_server_common_setup
r:kprobes/r_nfs4_server_common_setup_0 nfs4_server_common_setup ret=$retval:s32
p:kprobes/p_nfs4_init_session_0 nfs4_init_session
r:kprobes/r_nfs4_init_session_0 nfs4_init_session ret=$retval:s32

shows that for vers=4.1 we hit that first return:

# nfs vers=4.0
mount.nfs-1485    [003] .....   493.559783: p_nfs4_server_common_setup_0: (nfs4_server_common_setup+0x0/0x180 [nfsv4])
mount.nfs-1485    [003] .....   493.559818: p_nfs4_init_session_0: (nfs4_init_session+0x0/0x70 [nfsv4])
mount.nfs-1485    [003] .....   493.559820: r_nfs4_init_session_0: (nfs4_server_common_setup+0x33/0x180 [nfsv4] <- nfs4_init_session) ret=0
mount.nfs-1485    [003] .....   493.594372: r_nfs4_server_common_setup_0: (nfs4_create_server+0x1ca/0x360 [nfsv4] <- nfs4_server_common_setup) ret=0
# nfs vers=4.1
mount.nfs-1505    [006] .....   511.516355: p_nfs4_server_common_setup_0: (nfs4_server_common_setup+0x0/0x180 [nfsv4])
mount.nfs-1505    [006] .....   511.516400: r_nfs4_server_common_setup_0: (nfs4_create_server+0x1ca/0x360 [nfsv4] <- nfs4_server_common_setup) ret=-93
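
The probe definitions and the trace output above can be worked with programmatically. The sketch below is illustrative only: `kprobe_definitions` and `parse_returns` are hypothetical helper names, and the errno table hard-codes the Linux values (93 is `EPROTONOSUPPORT` in the generic Linux errno header) so that the example does not depend on the host platform.

```python
import re

# Linux errno values, hard-coded as an assumption so the example behaves the
# same on non-Linux hosts. Only the two codes relevant to the snippet above.
LINUX_ERRNO_NAMES = {12: "ENOMEM", 93: "EPROTONOSUPPORT"}

# Matches return-probe lines such as "r_nfs4_init_session_0: (...) ret=0".
RETURN_RE = re.compile(r"\br_(?P<fn>\w+)_0: .*\bret=(?P<ret>-?\d+)")


def kprobe_definitions(functions):
    """Build one entry (p:) and one return (r:) probe definition per function."""
    lines = []
    for fn in functions:
        lines.append(f"p:kprobes/p_{fn}_0 {fn}")
        lines.append(f"r:kprobes/r_{fn}_0 {fn} ret=$retval:s32")
    return lines


def parse_returns(trace_text):
    """Extract (function, return value, errno name) from return-probe lines."""
    results = []
    for line in trace_text.splitlines():
        match = RETURN_RE.search(line)
        if match is None:
            continue  # entry probes and unrelated lines carry no ret= field
        ret = int(match.group("ret"))
        name = "OK" if ret >= 0 else LINUX_ERRNO_NAMES.get(-ret, "?")
        results.append((match.group("fn"), ret, name))
    return results


SAMPLE = (
    "mount.nfs-1505    [006] .....   511.516400: r_nfs4_server_common_setup_0: "
    "(nfs4_create_server+0x1ca/0x360 [nfsv4] <- nfs4_server_common_setup) ret=-93"
)
for definition in kprobe_definitions(["nfs4_server_common_setup", "nfs4_init_session"]):
    print(definition)
print(parse_returns(SAMPLE))
```

Definitions of this form are normally appended to the tracefs `kprobe_events` file (commonly `/sys/kernel/tracing/kprobe_events`) as root. In the vers=4.1 trace, `nfs4_init_session` is never entered and the setup function returns -93, which matches the early `-EPROTONOSUPPORT` return in the source excerpt.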

@jepio
Member

jepio commented Nov 3, 2023

$ git log --oneline  v5.15.136..v5.15.137 fs/nfs*
431a5010bce2 NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server
5762e72ef1b0 pNFS: Fix a hang in nfs4_evict_inode()

gregkh/linux@431a501 seems most likely to be related. 5.15.137 is already part of nightlies for the flatcar-3602 branch, I just tested with https://bincache.flatcar-linux.net/images/amd64/3602.2.1+nightly-20231102-2100/ and this bug is fixed there.

Mystery solved.
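
To check a node against the kernel versions named in this thread, a small sketch like the one below may help. It encodes only what was reported here (5.15.133 working, 5.15.136 broken, 5.15.137 containing the suspected fix commit 431a5010bce2) and makes no claim about other kernels; the function names and the status strings are hypothetical.

```python
import re

# Data points taken from this thread only; everything else is "not covered".
REPORTED_WORKING = (5, 15, 133)  # Flatcar 3602.2.0
REPORTED_BROKEN = (5, 15, 136)   # Flatcar 3602.2.1
FIX_RELEASE = (5, 15, 137)       # first 5.15.y tag listing 431a5010bce2


def parse_release(release):
    """Turn a `uname -r` style string into a (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def nfs41_mount_status(release):
    """Classify a kernel release using only the versions named in the thread."""
    version = parse_release(release)
    if version[:2] != (5, 15):
        return "not covered by this thread"
    if version >= FIX_RELEASE:
        return "contains the suspected fix"
    if version == REPORTED_BROKEN:
        return "reported broken"
    if version == REPORTED_WORKING:
        return "reported working"
    return "not covered by this thread"


# "5.15.136-flatcar" is an example release string, not captured from a node.
print(nfs41_mount_status("5.15.136-flatcar"))
print(nfs41_mount_status("5.15.137-flatcar"))
```

On a live system the release string could be taken from `platform.release()` in the Python standard library.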

@jepio
Member

jepio commented Nov 3, 2023

@tormath1 we could use a mantle test case for nfs-ganesha, either with the CSI integration above or more simply with this config:

EXPORT
{
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 1;

    # Exported path (mandatory)
    Path = /tmp;

    # Pseudo Path (required for NFS v4)
    Pseudo = /tmp;

    # Access control options
    Access_Type = RW;
    Squash = No_Root_Squash;

    # NFS protocol options
    Transports = TCP;
    Protocols = 4;

    # Exporting FSAL
    FSAL {
        Name = VFS;
    }
}

@mnbro

mnbro commented Nov 11, 2023

@jepio Do you know, or can you estimate, when this fix will be available in the Stable channel?

jepio moved this from 📝 Needs Triage to Implemented in Flatcar tactical, release planning, and roadmap on Nov 13, 2023

@jepio
Member

jepio commented Nov 13, 2023

With the next patch release in the stable channel, which according to our release tracking board is scheduled for next week.
https://github.com/orgs/flatcar/projects/7/views/8

@mnbro

mnbro commented Nov 13, 2023

Thank you!

@t-lo
Member

t-lo commented Nov 13, 2023

We've also added an nfs-ganesha server smoke test to our release test suite, so we'll be able to catch issues like this one before releasing updates in the future.

@nschad
Author

nschad commented Dec 15, 2023

Issue resolved with 3602.2.2
