Updates to the automation scripts #49
For the record, some values:
This indicates that the requested memory value of 2.3 GB is too high and could be lowered. This was a short 1000-event job. However, I believe that in most of the tested cases the number of jobs running in parallel was limited by the CPU request. Also, for longer jobs, the memory consumption rises but then stays constant at ~1.4 GB:
Confirming with a longer job that the memory consumption stays within these numbers, after 40k events:
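For reference, the per-pod memory usage can be checked while the workflow runs with `kubectl top` (a sketch; this assumes the metrics server is available on the cluster, and the `argo` namespace is taken from the commands further below):

```bash
# Show current CPU and memory usage of the runpfnano pods
# (requires the Kubernetes metrics server to be installed)
kubectl top pods -n argo | grep runpfnano
```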
1. Start workflow
As discussed, add a function to run the start workflow. It must be run to pull the images onto the nodes before the actual run, to avoid several simultaneous image pulls on the same node.
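A minimal sketch of what such a function could look like, assuming the start workflow is submitted with `argo submit`; the file name `argo-start.yaml` and the namespace `argo` are placeholders to be adapted:

```bash
# Submit the start workflow and wait for it to finish, so that the
# container images are pulled onto the nodes before the actual run.
# The workflow file name and namespace are placeholders.
start_workflow() {
  argo submit argo-start.yaml -n argo --wait
}
```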
Add a monitor_start function that samples the resource usage values of the nodes and of all runpfnano pods. For nodes it is:

`kubectl get nodes`

For pods, you could do something like:

`kubectl get pods -n argo | grep runpfnano`
That will allow us (or users) to understand the unconstrained CPU and memory needs of the jobs.
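A minimal sketch of such a monitoring loop, using the commands above; the function name, log file, and sampling interval are only illustrative:

```bash
# Periodically sample node and runpfnano pod status while the workflow runs.
# Runs until interrupted; log file name and sampling interval are illustrative.
monitor_start() {
  local logfile="monitor_$(date +%Y%m%d_%H%M%S).log"
  while true; do
    {
      date
      kubectl get nodes
      kubectl get pods -n argo | grep runpfnano
    } >> "$logfile"
    sleep 60
  done
}
```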
2. Command-line inputs to argo submit and terraform apply to avoid sed
As discussed, `sed` is a bit brutal. Better to use arguments when possible.

2.1 Argo submit
Argo submit can take the global workflow parameters with the `-p, --parameter stringArray` flag, where stringArray would be e.g. nJobs="6".
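For example, a full submission could look like the following sketch; the workflow file name, namespace, and any parameter names other than nJobs are assumptions:

```bash
# Pass global workflow parameters on the command line instead of editing
# the workflow file with sed. Names other than nJobs are assumed.
argo submit argo-workflow.yaml -n argo \
  -p nJobs="6" \
  -p nEvents="1000"
```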
Careful with quotes in the script, note that:
Edit: However, this does not matter: `-p nJobs=3` works as well.

2.2 Terraform apply
You can pass the variables to terraform with the `-var` flag, e.g. as sketched below, and correspondingly in the script.
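A sketch of what this could look like; the variable names machine_type and num_nodes are assumptions and must match the names declared in the Terraform configuration:

```bash
# Pass variables on the command line instead of editing the tfvars file.
# The variable names below are assumed and must match the Terraform config.
terraform apply \
  -var="machine_type=e2-standard-4" \
  -var="num_nodes=6"
```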
It remains to be confirmed that this works properly both for strings (machine type) and for numerical values (number of nodes).
This will avoid modifying the tfvars file.