Compare runtime/cost between high-cpu and standard cluster #46
Run an Argo workflow processing a significant chunk of a dataset (~3 million events) and compare the results and runtime between a standard cluster and one with a higher vCPU count.

Comments
Edit:
Configuration:
Results:
GCS bucket workflows: argo_bucket_run.yaml, argo_bucket_upload.yaml
@tomcordruw Is the time of the bucket workflows without the final plotting step? If not, can you see from the outputs how long it took?
Oh, that would explain it, I didn't realise that step was missing in the NFS workflow.
@tomcordruw Did these jobs run with the image already on the node, or does the time include the image pull?
The time unfortunately includes the image pull, but I am currently testing the script after some modifications so that it first runs the start job and pulls the images. Of course, errors and other things can prolong the image-pulling step, so it will be accounted for from now on.
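(For reference, a common way to take image pulls out of the measured time is a pre-pull DaemonSet; a minimal sketch, where the image name is a placeholder rather than the actual processing image, and which assumes that image contains a shell:)

```yaml
# Hedged sketch: pre-pull the processing image on every node with a DaemonSet,
# so workflow timing no longer includes the pull. <processing-image> is a
# placeholder, not the actual image used in these workflows.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepull
spec:
  selector:
    matchLabels:
      app: image-prepull
  template:
    metadata:
      labels:
        app: image-prepull
    spec:
      initContainers:
        - name: prepull
          image: <processing-image>        # placeholder: the image the jobs run
          command: ["sh", "-c", "exit 0"]  # assumes a shell; exits once the pull is done
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9 # tiny no-op container to keep the pod alive
```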
@tomcordruw Is this a fair comparison?

- e2-standard-4: 4 vCPUs, 16 GB memory.
- If N jobs is 48, a 12-node e2-highcpu-16 cluster is mostly idle. CPU-wise it could have had 12 jobs on each node (0.8 * 16, because we requested 800m CPU).
- A fair comparison would be how many events/hour we can get at maximum occupancy.
- For the memory requests, as seen in #49 (comment), we could most likely set them lower than 2.3 GB; e.g. 1.5 GB would allow ~10 jobs/node (see the sketch below).
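(For concreteness, a hedged sketch of where these requests would sit in an Argo Workflow container template; the template and image names are illustrative, not taken from the actual argo_bucket_run.yaml:)

```yaml
# Sketch of the resource requests discussed above, as they would appear in an
# Argo Workflow template. Names are illustrative, values are from this thread.
templates:
  - name: pfnano
    container:
      image: <processing-image>  # placeholder
      resources:
        requests:
          cpu: 800m       # 0.8 vCPU per job, as used in these runs
          memory: 1.5Gi   # lowered from 2.3Gi; ~10 jobs fit on a 16 GB node
```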
@katilp The resource usage I'm getting so far indicates a 1:2 ratio of vCPU (800m) to memory (~1.6 GB) for each job.
Right, but the first thing is to have a big enough number of jobs so that it really fills the cluster. The number of jobs might need to be different to compare the two types of clusters, or there could be fewer nodes in the high-CPU cluster. What matters is the total number of CPUs.
Right, so e.g. for 12 e2-highcpu-16 nodes, after adjusting the resource requests, that should allow 10 jobs per node, meaning 120 jobs in total, or alternatively 5 nodes for 48 jobs, to have a fair comparison?
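(Spelling out the capacity arithmetic behind those numbers; this assumes the full 16 vCPU / 16 GB of an e2-highcpu-16 node is schedulable, whereas in practice the kubelet reserves a slice of both:)

```
jobs per node = min(CPU-limited, memory-limited)
  CPU:    16 vCPU / 0.8 vCPU per job = 20 jobs
  memory: 16 GB   / 1.5 GB  per job ≈ 10 jobs   <- memory binds first
12 nodes * 10 jobs/node = 120 jobs, or 5 nodes * 10 jobs/node = 50 slots >= 48 jobs
```

The actual allocatable CPU and memory per node can be checked with `kubectl describe nodes`.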
Yes, something like this. It probably requires some manual inspection. Best to start a workflow and see how they go. If there are "left-overs", i.e. jobs that do not fit running in parallel, then decrease the number of jobs so that all pfnano steps run at the same time.
Okay, seems clear to me now!