Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed to wait for hpa-observer-controller caches to sync:(Support k8s version >= 1.26) #828

Open
maxiaoyin opened this issue Jun 25, 2023 · 5 comments
Labels
kind/bug Something isn't working

Comments

@maxiaoyin
Copy link

maxiaoyin commented Jun 25, 2023

No description provided.

@maxiaoyin maxiaoyin added the kind/bug Something isn't working label Jun 25, 2023
@maxiaoyin
Copy link
Author

maxiaoyin commented Jun 25, 2023

craned: 0.10

kube-service version:
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.5", GitCommit:"890a139214b4de1f01543d15003b5bda71aae9c7", GitTreeState:"clean", BuildDate:"2023-05-17T14:08:49Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

craned exception
precondition environment:

image

full log:

I0625 15:06:09.201423 1 nodelocal.go:25] Registering node local metrics collector cpu
I0625 15:06:09.201485 1 nodelocal.go:25] Registering node local metrics collector cpuLoad
I0625 15:06:09.201493 1 nodelocal.go:25] Registering node local metrics collector diskio
I0625 15:06:09.201496 1 nodelocal.go:25] Registering node local metrics collector memory
I0625 15:06:09.201499 1 nodelocal.go:25] Registering node local metrics collector netio
I0625 15:06:09.238990 1 predictor.go:121] predictors map[dsp:0xc00074b860 percentile:0xc0001b86c0]
I0625 15:06:09.247019 1 webhook.go:103] Succeed to setup autoscaling webhook
I0625 15:06:09.247433 1 config.go:59] Load recommendation framework configuration set successfully.
I0625 15:06:09.247453 1 manager.go:142] Recommendation Config updated.
I0625 15:06:09.247648 1 manager.go:98] Start watching /tmp/recommendation-config/config.yaml for update.
I0625 15:06:09.251416 1 predictor.go:141] predictor manager started, all predictors started
I0625 15:06:09.251443 1 prediction.go:151] predictor Periodic started
I0625 15:06:09.251493 1 leaderelection.go:248] attempting to acquire leader lease crane-system/craned...
I0625 15:06:09.251727 1 prediction.go:302] predictor Percentile started
I0625 15:06:09.252360 1 reflector.go:219] Starting reflector *v1.Pod (9h47m21.860077463s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:09.252372 1 reflector.go:255] Listing and watching *v1.Pod from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:09.255728 1 server.go:94] install crane api server middleware: log
I0625 15:06:09.255741 1 server.go:94] install crane api server middleware: cors
I0625 15:06:09.255748 1 server.go:94] install crane api server middleware: recovery
I0625 15:06:09.255967 1 server.go:149] Start to listening on http address: 0.0.0.0:8082
I0625 15:06:09.259736 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired
I0625 15:06:09.260490 1 leaderelection.go:253] failed to acquire lease crane-system/craned
I0625 15:06:09.352812 1 shared_informer.go:270] caches populated
I0625 15:06:12.141707 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired
I0625 15:06:12.141722 1 leaderelection.go:253] failed to acquire lease crane-system/craned
I0625 15:06:15.283816 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired
I0625 15:06:15.283836 1 leaderelection.go:253] failed to acquire lease crane-system/craned
I0625 15:06:18.609192 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired
I0625 15:06:18.609212 1 leaderelection.go:253] failed to acquire lease crane-system/craned
I0625 15:06:20.949090 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired
I0625 15:06:20.949107 1 leaderelection.go:253] failed to acquire lease crane-system/craned
I0625 15:06:23.294204 1 reflector.go:219] Starting reflector *v1alpha1.EffectiveHorizontalPodAutoscaler (9h25m32.854325822s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:23.294379 1 reflector.go:255] Listing and watching *v1alpha1.EffectiveHorizontalPodAutoscaler from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:23.394964 1 shared_informer.go:270] caches populated
I0625 15:06:24.968867 1 leaderelection.go:258] successfully acquired lease crane-system/craned
I0625 15:06:24.969457 1 reflector.go:219] Starting reflector *v1alpha1.Recommendation (10h15m41.897892678s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.969470 1 reflector.go:255] Listing and watching *v1alpha1.Recommendation from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.969719 1 reflector.go:219] Starting reflector *v1alpha1.TimeSeriesPrediction (9h11m14.838150609s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.969728 1 reflector.go:255] Listing and watching *v1alpha1.TimeSeriesPrediction from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.969883 1 reflector.go:219] Starting reflector *v1alpha1.RecommendationRule (10h32m7.567628063s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.969894 1 reflector.go:255] Listing and watching *v1alpha1.RecommendationRule from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.970076 1 reflector.go:219] Starting reflector *v1alpha1.Substitute (10h5m54.19539413s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.970152 1 reflector.go:255] Listing and watching *v1alpha1.Substitute from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.997040 1 shared_informer.go:270] caches populated
I0625 15:06:24.997294 1 reflector.go:219] Starting reflector *v1alpha1.EffectiveVerticalPodAutoscaler (9h48m20.736954602s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.997337 1 reflector.go:255] Listing and watching v1alpha1.EffectiveVerticalPodAutoscaler from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.997628 1 reflector.go:219] Starting reflector v1alpha1.Analytics (9h44m11.990737419s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:24.997683 1 reflector.go:255] Listing and watching v1alpha1.Analytics from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:06:25.069290 1 shared_informer.go:270] caches populated
I0625 15:06:25.069344 1 shared_informer.go:270] caches populated
I0625 15:06:25.069354 1 shared_informer.go:270] caches populated
I0625 15:06:25.069366 1 shared_informer.go:270] caches populated
I0625 15:06:25.069389 1 shared_informer.go:270] caches populated
I0625 15:06:25.069427 1 shared_informer.go:270] caches populated
I0625 15:06:25.069525 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-replicas-4ckp5
I0625 15:06:25.070209 1 shared_informer.go:270] caches populated
I0625 15:06:25.070231 1 shared_informer.go:270] caches populated
I0625 15:06:25.070251 1 shared_informer.go:270] caches populated
I0625 15:06:25.070326 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/workloads-rule"
I0625 15:06:25.070375 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/services-rule"
I0625 15:06:25.070406 1 shared_informer.go:270] caches populated
I0625 15:06:25.070426 1 shared_informer.go:270] caches populated
I0625 15:06:25.070503 1 shared_informer.go:270] caches populated
I0625 15:06:25.070544 1 shared_informer.go:270] caches populated
I0625 15:06:25.073513 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-resource-cw4n6
I0625 15:06:25.097859 1 shared_informer.go:270] caches populated
I0625 15:06:25.097859 1 shared_informer.go:270] caches populated
I0625 15:06:25.098317 1 shared_informer.go:270] caches populated
I0625 15:06:25.098357 1 shared_informer.go:270] caches populated
I0625 15:07:36.000394 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/services-rule"
I0625 15:07:36.008390 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008390 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008402 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008411 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008422 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008430 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008496 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service
I0625 15:07:36.008577 1 recommendation_rule_controller.go:317] Updating RecommendationRule services-rule status
I0625 15:07:36.019167 1 recommendation_rule_controller.go:341] Update RecommendationRule status successful, RecommendationRule services-rule
I0625 15:07:36.019185 1 recommendation_rule_controller.go:100] "Will re-sync" after="5m0s"
I0625 15:07:36.019208 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/workloads-rule"
I0625 15:07:36.035067 1 manager.go:147] RecommendationRule(workloads-rule) Target(default/nginx-deployment): start to run recommender "Resource".
I0625 15:07:36.035219 1 manager.go:147] RecommendationRule(workloads-rule) Target(default/nginx-deployment): start to run recommender "Replicas".
I0625 15:07:36.041595 1 reflector.go:219] Starting reflector v1.ConfigMap (9h18m32.280147413s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:07:36.041737 1 reflector.go:255] Listing and watching v1.ConfigMap from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:07:36.041979 1 prepare.go:38] RecommendationRule(workloads-rule) Target(default/nginx-deployment): Replicas CpuQuery ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx
I0625 15:07:36.042249 1 ctx.go:54] "Prom query directly" query="sum(irate(container_cpu_usage_seconds_total{namespace="default",pod=~"^nginx-deployment-.
",container!=""}[3m]))"
I0625 15:07:36.085756 1 prepare.go:51] RecommendationRule(workloads-rule) Target(default/nginx-deployment): Replicas MemoryQuery ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_memory_Deployment_apps/v1_default_nginx-deployment_app=nginx
I0625 15:07:36.086067 1 ctx.go:54] "Prom query directly" query="sum(container_memory_working_set_bytes{namespace="default",pod=~"^nginx-deployment-.
",container!=""})"
I0625 15:07:36.117458 1 generic.go:80] "Put tuple{query,caller,config} into with channel." query="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" caller="Replicas%!(EXTRA klog.ObjectRef=workloads-rule, types.UID=feefa461-68a3-400b-bbc8-adb5bf57951a)"
I0625 15:07:36.117806 1 ctx.go:54] "Prom query directly" query="sum(irate(container_cpu_usage_seconds_total{namespace="default",pod=~"^nginx-deployment-.
",container!=""}[3m]))"
I0625 15:07:36.141539 1 shared_informer.go:270] caches populated
I0625 15:07:36.141594 1 recommend.go:112] RecommendationRule(workloads-rule) Target(default/nginx-deployment): CPU query for resource request recommendation: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_cpu_default_nginx-deployment_nginx_
I0625 15:07:36.141651 1 config.go:159] "Made an internal config." internalConfig="{aggregated: true, historyLength: 168h0m0s, sampleInterval: 1m0s, histogramDecayHalfLife: 24h0m0s, minSampleWeight: 1e-05, marginFraction: 0.15, percentile: 0.99, targetUtilization: 1}"
I0625 15:07:36.141776 1 prediction.go:166] process analyzing metric namer: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_cpu_default_nginx-deployment_nginx_, config: {aggregated:true historyLength:604800000000000 sampleInterval:60000000000 histogramOptions:0xc00082c3f0 histogramDecayHalfLife:86400000000000 minSampleWeight:1e-05 marginFraction:0.15 percentile:0.99 targetUtilization:1 initMode:history}
I0625 15:07:36.142083 1 ctx.go:54] "Prom query directly" query="irate(container_cpu_usage_seconds_total{container!="POD",namespace="default",pod=~"^nginx-deployment-.
",container="nginx"}[3m])"
I0625 15:07:36.167625 1 recommend.go:123] RecommendationRule(workloads-rule) Target(default/nginx-deployment): container nginx recommended cpu 114m
I0625 15:07:36.167673 1 recommend.go:127] RecommendationRule(workloads-rule) Target(default/nginx-deployment) Memory query for resource request recommendation: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_memory_default_nginx-deployment_nginx_
I0625 15:07:36.167981 1 config.go:159] "Made an internal config." internalConfig="{aggregated: true, historyLength: 168h0m0s, sampleInterval: 1m0s, histogramDecayHalfLife: 48h0m0s, minSampleWeight: 1e-05, marginFraction: 0.15, percentile: 0.99, targetUtilization: 1}"
I0625 15:07:36.168002 1 prediction.go:166] process analyzing metric namer: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_memory_default_nginx-deployment_nginx_, config: {aggregated:true historyLength:604800000000000 sampleInterval:60000000000 histogramOptions:0xc00082c5a0 histogramDecayHalfLife:172800000000000 minSampleWeight:1e-05 marginFraction:0.15 percentile:0.99 targetUtilization:1 initMode:history}
I0625 15:07:36.168491 1 ctx.go:54] "Prom query directly" query="container_memory_working_set_bytes{container!="POD",namespace="default",pod=~"^nginx-deployment-.
",container="nginx"}"
I0625 15:07:36.195154 1 recommend.go:141] RecommendationRule(workloads-rule) Target(default/nginx-deployment): container nginx recommended memory 120586239
I0625 15:07:36.195671 1 manager.go:205] RecommendationRule(workloads-rule) Target(default/nginx-deployment): finish to run recommender "Resource".
I0625 15:07:36.195746 1 recommendation_rule_controller.go:461] Update recommendation default/workloads-rule-resource-cw4n6
I0625 15:07:36.204426 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-resource-cw4n6
I0625 15:07:36.205839 1 recommendation_rule_controller.go:467] Successfully to update Recommendation default/workloads-rule-resource-cw4n6
I0625 15:07:36.205918 1 recommendation_rule_controller.go:416] Mission message: Success
I0625 15:07:36.562205 1 prediction.go:230] "This is not a periodic time series." queryExpr="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" labels=[]
W0625 15:07:36.618335 1 recommend.go:61] RecommendationRule(workloads-rule) Target(default/nginx-deployment): prediction metrics data is unexpected, List length is 0
I0625 15:07:36.618557 1 prediction.go:137] "Unregister a query expression from prediction." queryExpr="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" caller="Replicas%!(EXTRA klog.ObjectRef=workloads-rule, types.UID=feefa461-68a3-400b-bbc8-adb5bf57951a)"
I0625 15:07:36.618698 1 prediction.go:123] "Prediction routine stopped." queryExpr="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx"
I0625 15:07:36.619219 1 recommend.go:159] RecommendationRule(workloads-rule) Target(default/nginx-deployment): WorkloadCpuUsage Percentile 0.001967 PodCpuRequest 150 CPUTargetUtilization 0.500000
I0625 15:07:36.619752 1 recommend.go:181] RecommendationRule(workloads-rule) Target(default/nginx-deployment): WorkloadMemoryUsage Percentile 57456640.000000 PodMemoryRequest 524288000.000000 MemTargetUtilization 0.500000
I0625 15:07:36.619916 1 manager.go:205] RecommendationRule(workloads-rule) Target(default/nginx-deployment): finish to run recommender "Replicas".
I0625 15:07:36.619928 1 recommendation_rule_controller.go:461] Update recommendation default/workloads-rule-replicas-4ckp5
I0625 15:07:36.626356 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-replicas-4ckp5
I0625 15:07:36.627378 1 recommendation_rule_controller.go:467] Successfully to update Recommendation default/workloads-rule-replicas-4ckp5
I0625 15:07:36.627395 1 recommendation_rule_controller.go:416] Mission message: Success
I0625 15:07:36.627417 1 recommendation_rule_controller.go:317] Updating RecommendationRule workloads-rule status
I0625 15:07:36.638573 1 recommendation_rule_controller.go:341] Update RecommendationRule status successful, RecommendationRule workloads-rule
I0625 15:07:36.638595 1 recommendation_rule_controller.go:100] "Will re-sync" after="5m0s"
I0625 15:07:36.638625 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/services-rule"
I0625 15:07:36.638723 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/workloads-rule"
I0625 15:08:24.970904 1 reflector.go:225] Stopping reflector *v1alpha1.EffectiveHorizontalPodAutoscaler (9h25m32.854325822s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.970968 1 reflector.go:225] Stopping reflector *v1.ConfigMap (9h18m32.280147413s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971038 1 reflector.go:225] Stopping reflector *v1.Pod (9h47m21.860077463s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971076 1 reflector.go:225] Stopping reflector *v1alpha1.RecommendationRule (10h32m7.567628063s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971112 1 reflector.go:225] Stopping reflector *v1alpha1.Recommendation (10h15m41.897892678s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971150 1 reflector.go:225] Stopping reflector *v1alpha1.TimeSeriesPrediction (9h11m14.838150609s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971169 1 reflector.go:225] Stopping reflector *v1alpha1.Analytics (9h44m11.990737419s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971200 1 reflector.go:225] Stopping reflector *v1alpha1.EffectiveVerticalPodAutoscaler (9h48m20.736954602s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
I0625 15:08:24.971223 1 reflector.go:225] Stopping reflector *v1alpha1.Substitute (10h5m54.19539413s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167
E0625 15:08:24.972469 1 manager.go:430] "problem running crane manager" err="failed to wait for hpa-observer-controller caches to sync: timed out waiting for cache to be synced"
F0625 15:08:24.972482 1 manager.go:431] failed to wait for hpa-observer-controller caches to sync: timed out waiting for cache to be synced

@maxiaoyin
Copy link
Author

I am so worry, I need to know what happen, and how can i to fix it , thank you very mush

@qmhu
Copy link
Member

qmhu commented Jun 26, 2023

in 1.26+ version, you should disable autoscaling when starting craned.

@Leynakindle
Copy link

Leynakindle commented Jul 6, 2023

in 1.26+ version, you should disable autoscaling when starting craned.

Thank you for the reminder. Additionally, I encountered some issues while attempting to disable autoscaling. I'm using the RKE2 cluster (v1.27.2+rke2r1) for configuration, and I would like to seek your advice on this matter. Thank you very much. Reference links: https://docs.rke2.io/install/configuration, https://docs.rke2.io/reference/server_config.

Some screenshots below:
craned log
截屏2023-07-06 18 30 08

cluster info
WechatIMG400

config file for rke2-server
WechatIMG427

@qmhu
Copy link
Member

qmhu commented Jul 24, 2023

we are maintaining versions that supporting v1.26+ in branch https://github.com/gocrane/crane/tree/support-1-26. You can build image from that branch.

@qmhu qmhu changed the title failed to wait for hpa-observer-controller caches to sync: failed to wait for hpa-observer-controller caches to sync:(Support k8s version >= 1.26) Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants