discovery can't be ready. failed to read cpufreq directory #1116

Open
loprx opened this issue Nov 17, 2024 · 2 comments

loprx commented Nov 17, 2024

Env

  1. K8s version: 1.23.6-0.x86_64
  2. Docker version: 20.10.0-3.el7.x86_64
  3. OS version: CentOS 7.9
  4. CPU: Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
  5. gpu-operator: gpu-operator-v24.9.0

Desc

The node-feature-discovery pods keep running, but none of them becomes ready (READY stays at 0/1):

pod/gpu-operator-1731802759-node-feature-discovery-master-5bcbfw57w   0/1     Running     0          18m
pod/gpu-operator-1731802759-node-feature-discovery-worker-cb9n5       0/1     Running     0          14m
pod/gpu-operator-1731802759-node-feature-discovery-worker-ht7xc       0/1     Running     0          18m
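To pick out pods that are Running but not Ready from a listing like the one above, filtering on the READY column is enough. A minimal Python sketch, assuming the whitespace-separated layout shown (NAME READY STATUS RESTARTS AGE, no header line); `unready_pods` is a hypothetical helper, not part of kubectl.

```python
import re

# READY column looks like "<ready>/<total>", e.g. "0/1".
READY_RE = re.compile(r"(\d+)/(\d+)")

def unready_pods(lines):
    """Return (name, ready, total, status) for each pod whose READY count is below total.

    Assumes the layout shown above: NAME READY STATUS RESTARTS AGE,
    whitespace-separated, with no header line.
    """
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        name, ready_col, status = fields[0], fields[1], fields[2]
        m = READY_RE.fullmatch(ready_col)
        if m is None:
            continue  # second column is not a READY value (e.g. a header row)
        ready, total = int(m.group(1)), int(m.group(2))
        if ready < total:
            result.append((name, ready, total, status))
    return result
```

Fed the three lines above, it returns all three pods with 0 of 1 containers ready and status Running.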

The logs show discovery failing with "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory", but on my OS this directory is at /sys/devices/system/cpu/cpu0/cpufreq.
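The report contrasts two sysfs locations: the global devices/system/cpu/cpufreq directory the worker tries to open (under /host-sys, which, judging only from the error path, appears to be where the host's /sys is mounted inside the pod) and the per-CPU cpu0/cpufreq directory that does exist on the host. A small diagnostic sketch in Python that checks both under a given root; `find_cpufreq_dirs` and its `sysfs_root` parameter are hypothetical, and this is not how NFD itself looks up the directory.

```python
import os

def find_cpufreq_dirs(sysfs_root="/sys"):
    """Report which cpufreq directories exist under a sysfs root.

    Returns a dict with:
      "global":  <root>/devices/system/cpu/cpufreq if it is a directory, else None
      "per_cpu": existing <root>/devices/system/cpu/cpuN/cpufreq paths, sorted by N
    sysfs_root is an assumption: "/sys" on the host, or a mount such as
    "/host-sys" inside a container.
    """
    cpu_dir = os.path.join(sysfs_root, "devices", "system", "cpu")
    global_dir = os.path.join(cpu_dir, "cpufreq")
    per_cpu = []
    if os.path.isdir(cpu_dir):
        for entry in os.listdir(cpu_dir):
            # Match only cpu0, cpu1, ... and skip cpufreq, cpuidle, etc.
            if entry.startswith("cpu") and entry[3:].isdigit():
                path = os.path.join(cpu_dir, entry, "cpufreq")
                # isdir() follows symlinks, so a dangling link counts as missing.
                if os.path.isdir(path):
                    per_cpu.append(path)
    # Sort numerically by CPU index (cpu2 before cpu10).
    per_cpu.sort(key=lambda p: int(os.path.basename(os.path.dirname(p))[3:]))
    return {
        "global": global_dir if os.path.isdir(global_dir) else None,
        "per_cpu": per_cpu,
    }
```

Running it once with the default root on the host and once with "/host-sys" inside the worker pod would show whether the two views of sysfs differ.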

Logs from the node-feature-discovery worker pod:

[root@master k8s]# kubectl logs pod/gpu-operator-1731802759-node-feature-discovery-worker-cb9n5  -n gpu-operator
I1117 00:24:45.367644       1 nfd-worker.go:293] "Node Feature Discovery Worker" version="v0.16.6" nodeName="node" namespace="gpu-operator"
I1117 00:24:45.369355       1 nfd-worker.go:622] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
I1117 00:24:45.369745       1 nfd-worker.go:654] "configuration successfully updated" configuration={"Core":{"Klog":{},"LabelWhiteList":"","NoPublish":false,"FeatureSources":["all"],"Sources":null,"LabelSources":["all"],"SleepInterval":{"Duration":60000000000}},"Sources":{"cpu":{"cpuid":{"attributeBlacklist":["AVX10","BMI1","BMI2","CLMUL","CMOV","CX16","ERMS","F16C","HTT","LZCNT","MMX","MMXEXT","NX","POPCNT","RDRAND","RDSEED","RDTSCP","SGX","SGXLC","SSE","SSE2","SSE3","SSE4","SSE42","SSSE3","TDX_GUEST"]}},"custom":[],"fake":{"labels":{"fakefeature1":"true","fakefeature2":"true","fakefeature3":"true"},"flagFeatures":["flag_1","flag_2","flag_3"],"attributeFeatures":{"attr_1":"true","attr_2":"false","attr_3":"10"},"instanceFeatures":[{"attr_1":"true","attr_2":"false","attr_3":"10","attr_4":"foobar","name":"instance_1"},{"attr_1":"true","attr_2":"true","attr_3":"100","name":"instance_2"},{"name":"instance_3"}]},"kernel":{"KconfigFile":"","configOpts":["NO_HZ","NO_HZ_IDLE","NO_HZ_FULL","PREEMPT"]},"local":{},"pci":{"deviceClassWhitelist":["02","0200","0207","0300","0302"],"deviceLabelFields":["vendor"]},"usb":{"deviceClassWhitelist":["0e","ef","fe","ff"],"deviceLabelFields":["class","vendor","device"]}}}
I1117 00:24:45.390700       1 metrics.go:44] "metrics server starting" port=":8081"
E1117 00:24:45.391236       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:24:45.495558       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:24:45.496304       1 nfd-worker.go:677] "feature discovery completed"
I1117 00:24:45.499327       1 nfd-worker.go:805] "creating NodeFeature object" nodefeature="node"
I1117 00:24:45.671650       1 component.go:36] [core][Server #1]Server created
I1117 00:24:45.671720       1 nfd-worker.go:247] "gRPC health server serving" port=8082
I1117 00:24:45.671854       1 component.go:36] [core][Server #1 ListenSocket #2]ListenSocket created
E1117 00:25:45.399456       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:25:45.496530       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:25:45.497350       1 nfd-worker.go:677] "feature discovery completed"
I1117 00:25:45.601673       1 nfd-worker.go:826] "updating NodeFeature object" nodefeature="gpu-operator/node"
E1117 00:26:45.399680       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:26:45.495297       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:26:45.495969       1 nfd-worker.go:677] "feature discovery completed"
E1117 00:27:45.399216       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:27:45.494992       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:27:45.495830       1 nfd-worker.go:677] "feature discovery completed"
E1117 00:28:45.374658       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:28:45.475723       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:28:45.476515       1 nfd-worker.go:677] "feature discovery completed"
E1117 00:29:45.384286       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:29:45.480716       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:29:45.481588       1 nfd-worker.go:677] "feature discovery completed"
E1117 00:30:45.384435       1 pstate_amd64.go:75] "failed to read cpufreq directory" err="open /host-sys/devices/system/cpu/cpufreq: no such file or directory"
I1117 00:30:45.480722       1 nfd-worker.go:664] "starting feature discovery..."
I1117 00:30:45.481500       1 nfd-worker.go:677] "feature discovery completed"
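The configuration line above sets SleepInterval to {"Duration":60000000000}. Assuming that field is a Go time.Duration, which counts nanoseconds, this is 60000000000 ns = 60 s, matching the one-minute spacing of the repeated cpufreq errors. A Python sketch that parses klog-style lines (severity letter, MMDD, wall-clock time, thread id, file:line]) and measures the gaps between those errors; the regex is fitted to the lines shown here rather than a full klog grammar, and because klog omits the year, a placeholder year is used.

```python
import re
from datetime import datetime

# Fitted to the lines above: severity, MMDD, HH:MM:SS.ffffff, thread id, file:line], message.
KLOG_RE = re.compile(
    r"^([IWEF])(\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+(\S+)\] (.*)$"
)

def parse_klog(line):
    """Return (severity, timestamp, source, message), or None if the line does not match."""
    m = KLOG_RE.match(line.strip())
    if m is None:
        return None
    sev, month, day, clock, source, msg = m.groups()
    # klog omits the year; 2000 is a placeholder since only time differences are used.
    ts = datetime.strptime(f"2000-{month}-{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")
    return sev, ts, source, msg

def error_intervals(lines, needle="failed to read cpufreq directory"):
    """Seconds between consecutive E-level lines whose message contains `needle`."""
    times = [p[1] for p in map(parse_klog, lines)
             if p is not None and p[0] == "E" and needle in p[3]]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Applied to the seven error lines in the log above, every gap comes out within 0.1 s of 60 s, and the shell prompt line is simply skipped.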

Thank you.

loprx changed the title from "failed to read cpufreq directory" to "discovery-worker can't be ready. failed to read cpufreq directory" on Nov 17, 2024

loprx commented Nov 17, 2024

Supplement:
This is the cpupower output:

cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.20 GHz - 3.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1.20 GHz and 3.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: 1.80 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: yes
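The fields cpupower prints for CPU 0 (driver, hardware limits, available governors) correspond to files in the per-CPU cpufreq directory. A Python sketch that reads them directly, assuming the standard cpufreq sysfs attribute names (scaling_driver, scaling_governor, scaling_available_governors, cpuinfo_min_freq, cpuinfo_max_freq, with frequencies in kHz); `read_cpufreq_info` is a hypothetical helper, and missing attributes come back as None rather than raising.

```python
import os

def read_cpufreq_info(cpu=0, sysfs_root="/sys"):
    """Read a few per-CPU cpufreq attributes, similar to `cpupower frequency-info`.

    Frequencies in sysfs are in kHz and are converted to GHz here.
    Returns None if <root>/devices/system/cpu/cpuN/cpufreq does not exist.
    """
    base = os.path.join(sysfs_root, "devices", "system", "cpu", f"cpu{cpu}", "cpufreq")
    if not os.path.isdir(base):
        return None

    def read(name):
        try:
            with open(os.path.join(base, name)) as f:
                return f.read().strip()
        except OSError:
            return None  # attribute missing or unreadable

    def khz_to_ghz(value):
        # 1 GHz = 1,000,000 kHz
        return None if value is None else int(value) / 1_000_000

    return {
        "driver": read("scaling_driver"),
        "governor": read("scaling_governor"),
        "available_governors": (read("scaling_available_governors") or "").split(),
        "min_ghz": khz_to_ghz(read("cpuinfo_min_freq")),
        "max_ghz": khz_to_ghz(read("cpuinfo_max_freq")),
    }
```

Assuming sysfs holds 1200000 and 3000000 kHz for CPU 0, min_ghz and max_ghz come out as 1.2 and 3.0, matching the 1.20 GHz to 3.00 GHz hardware limits reported above.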

loprx changed the title from "discovery-worker can't be ready. failed to read cpufreq directory" to "discovery can't be ready. failed to read cpufreq directory" on Nov 27, 2024

loprx commented Nov 29, 2024
