This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

executor pod scheduling stuck despite enough resources #152

Open
askeySnip opened this issue Oct 28, 2020 · 2 comments

Comments

@askeySnip

When I submit a batch of Spark jobs, they don't run as expected.
Some executor pods get stuck even though every node has enough resources to run them.
This puzzles me, and I wonder if there is something I haven't considered.
P.S. I run these Spark jobs like the example, and everything works fine when running a single job.
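
For context, here is a minimal sketch of the kind of per-job resource configuration assumed in a report like this; the app name, executor count, and sizes below are hypothetical, and in practice these settings are usually passed as `--conf` flags to `spark-submit` rather than set in code.

```python
from pyspark.sql import SparkSession

# Hypothetical per-job resource requests for one job in the batch.
# Whether the whole batch fits the cluster depends on the sum of these
# requests across all jobs submitted at the same time.
spark = (
    SparkSession.builder
    .appName("batch-job-1")
    .config("spark.executor.instances", "5")                  # executors requested by this job
    .config("spark.executor.memory", "4g")                    # memory request per executor
    .config("spark.kubernetes.executor.request.cores", "2")   # CPU request per executor pod
    .getOrCreate()
)
```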

@onursatici
Contributor

@askeySnip can you describe the stuck driver pods and share the scheduling errors?
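
One way to gather that information is sketched below, using the official `kubernetes` Python client as the programmatic equivalent of `kubectl describe pod`; the pod name and namespace are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Placeholder name/namespace for one of the stuck driver or executor pods.
pod_name, namespace = "spark-app-driver", "default"

pod = v1.read_namespaced_pod(pod_name, namespace)
print("phase:", pod.status.phase)
print("conditions:", pod.status.conditions)

# Scheduling errors (e.g. FailedScheduling) show up as Events attached to the pod.
events = v1.list_namespaced_event(
    namespace, field_selector=f"involvedObject.name={pod_name}"
)
for e in events.items:
    print(e.type, e.reason, e.message)
```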

@chia7712
Contributor

I encountered a similar issue.

There are two nodes in my cluster, and two Spark jobs whose total resource requirements exceed the cluster's capacity (i.e. the k8s cluster can't run both jobs concurrently). If I submit the second job after all executors of the first job are running, everything works well. However, some pods hang (see the following screenshots) if I submit both jobs at the same time. I traced the logs, and it seems the scheduler runs its node predicates (i.e. assigns resources) for both jobs at the same time, so some pods can't get enough resources.

[Screenshot 1: 2021-07-10 7:57 PM]

[Screenshot 2: 2021-07-10 8:00 PM]

Is this the expected behavior? Can the scheduler be configured to run predicates for the second job only after the first job has been scheduled successfully? Or should we simply not submit jobs at the same time?
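
To make the over-commitment concrete, here is a minimal sketch of the arithmetic described above; all node sizes, executor counts, and core requests are hypothetical and not taken from the screenshots.

```python
# Hypothetical cluster: 2 nodes x 16 cores = 32 schedulable cores.
cluster_cores = 2 * 16

# Two jobs, each wanting 10 executors x 2 cores = 20 cores (40 total > 32).
executors_wanted = {"job-a": 10, "job-b": 10}
cores_per_executor = 2

# If the scheduler places executors from both jobs at the same time, the
# capacity is split between them and neither job gets all of its executors.
free = cluster_cores
placed = {name: 0 for name in executors_wanted}
for _ in range(max(executors_wanted.values())):
    for name, wanted in executors_wanted.items():
        if placed[name] < wanted and free >= cores_per_executor:
            placed[name] += 1
            free -= cores_per_executor

print(placed, "free cores:", free)  # {'job-a': 8, 'job-b': 8} free cores: 0
# Each job is left waiting for 2 more executor pods that cannot be scheduled
# until the other job releases resources, which matches the hanging pods above.
```

Submitting the second job only after the first is fully scheduled avoids this interleaving, which matches the sequential behavior that works well above.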
