This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

configuration options from the install.yml does not seem to work #128

Open
gouthamreddykotapalle opened this issue Jun 24, 2020 · 8 comments

Comments

@gouthamreddykotapalle
Contributor

gouthamreddykotapalle commented Jun 24, 2020

Hello team,

I added the binpack: tightly-pack config to the install.yml and rebuilt the image. The newly built image still does not seem to honour the binpacking option. The same happens with fifo: true

I currently have a 4-node cluster, with each node containing 4 allocatable cores and 2g of allocatable memory. I submitted my Spark job with the following configurations:

spark.driver.memory=1g
spark.executor.memory=512m

spark.driver.cores=1
spark.executor.cores=1

spark.executor.instances=2

Ideally, with binpack: tightly-pack, all the executors should be scheduled onto the same node, which does not seem to happen.
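
For reference, the relevant part of my install.yml looks roughly like this (a sketch; I am assuming both keys sit at the top level of the file, alongside whatever else the stock docker/var/conf/install.yml contains):

# placement at the top level is my assumption; remaining keys omitted
binpack: tightly-pack
fifo: true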

@gouthamreddykotapalle changed the title from "binpack configuration option does not seem to work" to "configuration options from the install.yml does not seem to work" on Jun 25, 2020
@onursatici
Contributor

Hello @Gouthamkreddy1234, how are you mounting the config file?
By default the built container will use the config located at /opt/palantir/services/spark-scheduler/var/conf/install.yml.
Check out the example podspec here:
https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/extender.yml#L127
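
The shape of that mount is roughly the following; the volume and configMap names below are placeholders, the linked extender.yml is the source of truth:

# illustrative sketch of the container/volume wiring in the extender podspec
containers:
  - name: spark-scheduler-extender
    volumeMounts:
      - name: extender-config
        mountPath: /opt/palantir/services/spark-scheduler/var/conf
volumes:
  - name: extender-config
    configMap:
      name: spark-scheduler-extender-config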

@gouthamreddykotapalle
Contributor Author

Hi @onursatici, I am modifying the file k8s-spark-scheduler/docker/var/conf/install.yml in my local repo as well as the configMap https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/extender.yml#L66, and both seem to point to the same mount path - /opt/palantir/services/spark-scheduler/var/conf/install.yml.

I think the configMap data is what is ultimately used in my case, since I build with the Dockerfile present in the repo and then apply this manifest file - https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/extender.yml - but neither the FIFO nor the binpacking feature seems to work for me.

Am I missing anything else here?

@onursatici
Contributor

thanks for the info @Gouthamkreddy1234. Strange, altering the configMap should work. Can you verify that /opt/palantir/services/spark-scheduler/var/conf/install.yml has your changes after you run kubectl apply -f examples/extender.yml, by ssh'ing into the created pod?
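
Something along these lines should print the mounted file (the pod name and namespace below are placeholders):

# substitute the extender pod's name and namespace
kubectl exec -n <namespace> <spark-scheduler-extender-pod> -- \
  cat /opt/palantir/services/spark-scheduler/var/conf/install.yml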

how are you validating that the config changes are not taking effect? For FIFO, one way to test this is to submit a large application that won't be able to fit in your cluster, followed by a smaller one; if FIFO is enabled, the smaller application should be stuck in Pending until you remove the larger application.

One thing to note for selecting the binpack algorithm: anything other than the accepted values will default to distribute-evenly.

Ideally we would warn in these scenarios: #131

@gouthamreddykotapalle
Contributor Author

Yes, I ssh'd into the container and could see the configMap mounted there with fifo enabled. I also re-created the FIFO scenario you mentioned, and I can see the smaller pod being scheduled while the bigger one is still in the Pending state (which is not the expected behaviour). I am not sure what I am missing. I am running the latest version of the scheduler-extender image.

Am I missing anything here?

P.S: I am working with the master branch.

@onursatici
Contributor

got it - finally, can you check whether the pods you are creating have spark-scheduler as the schedulerName in their spec?
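
i.e. both the driver and the executor pod specs should carry something like this (minimal sketch, other fields omitted); without it the default scheduler places the pods and the extender never sees them:

spec:
  schedulerName: spark-scheduler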

you can also use this script to simulate a spark application launch: https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/submit-test-spark-app.sh#L31

@gouthamreddykotapalle
Contributor Author

gouthamreddykotapalle commented Jul 9, 2020

Yes, the pods are being scheduled by spark-scheduler.

I am not sure what else I might be missing here. Any thoughts?

@onursatici
Contributor

Hey @Gouthamkreddy1234, had a look at this. Within the extender, it is assumed that nodes have a configurable label, and the value of this label dictates which group a node is in. FIFO order is preserved for applications waiting to be scheduled for the same group. I have updated the examples to include that label here: #134.
You would also need to add this label (instance-group for the example by default) to the nodes that you are planning to schedule spark applications on, and set the nodeSelector for the spark pods to match it.
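
Concretely, something like the following; the label key matches the example, while the group value and node name are placeholders:

# label the nodes that should host spark pods
kubectl label node <node-name> instance-group=<group-name>

# and in the driver/executor pod specs:
spec:
  schedulerName: spark-scheduler
  nodeSelector:
    instance-group: <group-name>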

@onursatici
Contributor

I can confirm that with that change, if I submit a large application with:

# 10^10 cpu requests for the driver, this will be stuck Pending
./submit-test-spark-app.sh 1 2 100G 100 100m 200

then submit a smaller application

# smaller app with 2 executors that can fit
./submit-test-spark-app.sh 2 2 100m 100 100m 200

the smaller application will be blocked in Pending until I kill the larger application.
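
You can watch this from another terminal while submitting the two applications (the namespace depends on where the script creates the pods):

# the smaller app's driver stays Pending until the larger app's pods are deleted
kubectl get pods --watch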
