
Compare runtime/cost between high-cpu and standard cluster #46

Open
tomcordruw opened this issue Aug 27, 2024 · 11 comments

Run argo workflow processing significant chunk of a dataset (~3 million events) and compare the results and runtime for a standard cluster and one with higher vCPU count.

@tomcordruw tomcordruw self-assigned this Aug 27, 2024

tomcordruw commented Sep 16, 2024

Edit:
Took into account the time for the plotting step in the GCS workflows, which is missing in the NFS version.
The time shown is now the duration without the plotting step; the total with the plotting step included is shown in brackets.

Configuration:

  • events: 3000000
  • jobs: 48
  • recid: 30544
  • region: "europe-north1-b"
  • nodes: 12
  • disk type: pd-standard

Results:
NFS:

  • e2-standard-4: 4 hours 37 minutes
     Cost: 7.81 CHF

  • e2-highcpu-16: 3 hours 16 minutes
     Cost: 16.38 CHF

GCS Bucket:

argo_bucket_run.yaml:

  • e2-standard-4: 4 hours 33 minutes (4 hours 57 minutes)
     Cost: 9.01 CHF
  • e2-highcpu-16: 3 hours 24 minutes (3 hours 46 minutes)
     Cost: 18.72 CHF

argo_bucket_upload.yaml:

  • e2-standard-4: 4 hours 55 minutes (5 hours 17 minutes)
     Cost: 10.67 CHF
  • e2-highcpu-16: 3 hours 36 minutes (3 hours 59 minutes)
     Cost: 18.81 CHF
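The runtimes and costs above can also be compared as throughput per cost. A minimal sketch using the NFS numbers (timings without the plotting step, figures copied from the results):

```python
# Sketch: compare throughput (events/hour) and cost efficiency (events/CHF)
# for the NFS runs reported above.
EVENTS = 3_000_000

runs = {
    "e2-standard-4 (NFS)": {"hours": 4 + 37 / 60, "cost_chf": 7.81},
    "e2-highcpu-16 (NFS)": {"hours": 3 + 16 / 60, "cost_chf": 16.38},
}

for name, run in runs.items():
    events_per_hour = EVENTS / run["hours"]
    events_per_chf = EVENTS / run["cost_chf"]
    print(f"{name}: {events_per_hour:,.0f} events/h, "
          f"{events_per_chf:,.0f} events/CHF")
```

By this measure the high-cpu cluster finishes faster but delivers fewer events per CHF.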


katilp commented Sep 19, 2024

@tomcordruw Is the time of the bucket workflows without the final plotting step? If not, can you see from the outputs, how long did it take?

@tomcordruw

> @tomcordruw Is the time of the bucket workflows without the final plotting step? If not, can you see from the outputs, how long did it take?

Oh, that would explain it, I didn't realise that step was missing in the NFS workflow.
The plotting step is included in the total runtime here; in the tests it took 20–25 minutes, which pretty much accounts for the difference.


katilp commented Sep 20, 2024

@tomcordruw Did these jobs run with the image on the node already or does the time include the image pull?
We need to have the time without the image pull for a scalable comparison. Currently, the image pull is more than 30 mins and may vary so it can distort the comparison.

@tomcordruw

> @tomcordruw Did these jobs run with the image on the node already or does the time include the image pull? We need to have the time without the image pull for a scalable comparison. Currently, the image pull is more than 30 mins and may vary so it can distort the comparison.

Unfortunately the time includes the image pull, but I am currently testing the script after modifying it to first run the start job and pull the images.
From what I can tell, image pulling/pod initialisation takes 31–32 minutes in these configurations, which is in line with the difference between the workflows I have run with and without previously pulled images.

Of course, errors and other issues can prolong the image-pull step, so it will be accounted for from now on.


katilp commented Sep 20, 2024

@tomcordruw Is this a fair comparison?

e2-standard-4: 4 vCPUs, 16 GB mem
e2-highcpu-16: 16 vCPUs, 16 GB mem.

With 48 jobs, a 12-node e2-highcpu-16 cluster is mostly idle.

CPU-wise each node could have run 12 jobs (0.8 * 16, because we requested 800m CPU); memory-wise, 6.
Now it most likely ran only 4 (if the 48 jobs were evenly distributed across the nodes), or many nodes were idle.
And the cost scales with the time, not with the occupancy.

A fair comparison would be how many events / hour we can get with the maximum occupancy.

For memory requests, as seen in #49 (comment), we could most likely set them lower than 2.3 GB; e.g. 1.5 GB would allow ~10 jobs/node.
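The packing arithmetic here can be sketched as follows; `cpu_alloc_frac` (the fraction of a node's vCPUs actually allocatable to pods) is an assumption, since the exact GKE system reservation varies by node configuration:

```python
import math

def jobs_per_node(vcpus, mem_gb, cpu_request, mem_request_gb,
                  cpu_alloc_frac=0.8):
    """Max schedulable jobs per node, limited by CPU or memory.

    Assumes roughly cpu_alloc_frac of the vCPUs are allocatable to
    pods; the exact figure depends on the GKE node configuration.
    """
    by_cpu = math.floor(vcpus * cpu_alloc_frac / cpu_request)
    by_mem = math.floor(mem_gb / mem_request_gb)
    return min(by_cpu, by_mem)

# e2-highcpu-16 (16 vCPUs, 16 GB); jobs request 800m CPU, 2.3 GB memory
print(jobs_per_node(16, 16, 0.8, 2.3))  # -> 6, memory-bound
# Lowering the memory request to 1.5 GB:
print(jobs_per_node(16, 16, 0.8, 1.5))  # -> 10
```

In both cases memory is the binding constraint on a highcpu node, which is why lowering the memory request raises the packing from 6 to ~10 jobs/node.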


tomcordruw commented Sep 20, 2024

@katilp
Indeed, what I'm seeing supports that.
And yes, the cost is based on time, not resource usage, so I will try lowering the resource requests and see how well the high-CPU clusters can be utilised that way.

The resource usage I'm getting so far indicates a 1:2 ratio of vCPU (800m) to memory (~1.6 GB) for each job.
While no e2 machine type matches that ratio (standard is 1:4 and highcpu is 1:1), it can be achieved with custom machine types, which would reduce the amount of unused resources.
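A rough sketch of sizing a custom node shape for that 1:2 job footprint. The headroom defaults and the even-vCPU rounding are assumptions (GCE custom machine types generally require an even vCPU count above 1, and GKE reserves some resources for system daemons):

```python
import math

def custom_machine_shape(jobs, cpu_request=0.8, mem_request_gb=1.6,
                         cpu_headroom=1.0, mem_headroom_gb=2.0):
    """Suggest a node shape (vCPUs, GB) to pack `jobs` jobs.

    The headroom values leave room for system daemons; the exact
    reservation GKE makes varies, so these defaults are illustrative.
    """
    vcpus = math.ceil(jobs * cpu_request + cpu_headroom)
    if vcpus % 2:
        vcpus += 1  # custom machine types generally need an even vCPU count
    mem_gb = math.ceil(jobs * mem_request_gb + mem_headroom_gb)
    return vcpus, mem_gb

# A node intended to run 10 jobs of 800m CPU / 1.6 GB each:
print(custom_machine_shape(10))  # -> (10, 18), close to the 1:2 ratio
```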


katilp commented Sep 20, 2024

Right, but the first thing is to have a large enough number of jobs that it really fills the cluster. The number of jobs might need to differ between the two cluster types, or the high-CPU cluster could have fewer nodes. What matters is the total number of CPUs.


tomcordruw commented Sep 20, 2024

Right, so e.g. for 12 e2-highcpu-16 nodes, after adjusting resource requests, it should allow 10 jobs per node, meaning 120 jobs total, or alternatively 5 nodes for 48 jobs to have a fair comparison?


katilp commented Sep 20, 2024

> Right, so e.g. for 12 e2-highcpu-16 nodes, after adjusting resource requests, it should allow 10 jobs per node, meaning 120 jobs total, or alternatively 5 nodes for 48 jobs to have a fair comparison?

Yes, something like this. It probably requires some manual inspection. Best to start a workflow and see how the jobs go. If there are "left-overs", i.e. jobs that do not fit running in parallel, then decrease the number of jobs so that all pfnano steps go at the same time.
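The left-over check amounts to comparing the job count against the cluster's parallel capacity; a minimal sketch:

```python
def leftover_jobs(jobs, nodes, jobs_per_node):
    """Jobs that do not fit in the first parallel 'wave' and would
    leave some pfnano steps waiting behind the others."""
    capacity = nodes * jobs_per_node
    return max(0, jobs - capacity)

print(leftover_jobs(120, 12, 10))  # -> 0: all steps run at the same time
print(leftover_jobs(48, 5, 10))    # -> 0: the 5-node alternative also fits
print(leftover_jobs(130, 12, 10))  # -> 10: decrease the job count
```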

@tomcordruw

Okay, seems clear to me now!
I will do some runs and inspect how things behave and update the comparisons accordingly.
