Instances with requested resources not detected #79

Open
olgabot opened this issue Oct 1, 2018 · 2 comments
@olgabot
Contributor

olgabot commented Oct 1, 2018

Hello!
I'm running a batch job that requires a lot of memory, so I added x1e.2xlarge (244 GiB memory, 8 CPUs) to my config.yaml:

[Screenshot: 2018-10-01, 10:28 AM]

 Mon  1 Oct - 10:22  ~/code/tick-genome/pre_assembly_qc/full_workflow   origin ☊ master ✔ 2☀ 
  grep x1e.2xlarge ~/.reflow/config.yaml
  - x1e.2xlarge

But when I run the batch job, it claims that there aren't any available instances of that type! And there's no "allowable instances exceeded" error, either, so I don't think it's because someone else in the org is using x1e.2xlarge.

 Mon  1 Oct - 10:22  ~/code/tick-genome/pre_assembly_qc/full_workflow   origin ☊ master ✔ 2☀ 
  reflow runbatch -retry     
reflow: batch program ../../reflow/pre-assembly.rf runsfile samples.csv
retrying run Undetermined_S0
retrying run tick_1_S1
retrying run tick_2_S2
reflow: run tick_2_S2: error: resources exhausted: requested resources {mem:240.0GiB cpu:8 disk:1.0GiB} not satisfiable by any available instance type
reflow: run Undetermined_S0: error: resources exhausted: requested resources {mem:240.0GiB cpu:8 disk:1.0GiB} not satisfiable by any available instance type
reflow: run tick_1_S1: error: resources exhausted: requested resources {mem:240.0GiB cpu:8 disk:1.0GiB} not satisfiable by any available instance type

EDIT: If I add x1e.4xlarge (488 GiB RAM, 16 CPUs) to my list of instances, the job goes through, but I'm wondering why it needs more resources when the x1e.2xlarge instance should be sufficient.

Do you know what may be happening?
Warmest,
Olga
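
One possible explanation, not confirmed anywhere in this thread, is that the scheduler compares the request against an instance's usable memory rather than its nominal memory, holding some back for the OS and the worker process. The sketch below is only an illustration of that arithmetic: `RESERVED_FRACTION` (5%) and the 100 GiB disk size are hypothetical values, and `Resources`, `usable`, and `satisfies` are made-up names, not Reflow's API. Under that assumption, 244 GiB nominal leaves about 231.8 GiB usable, which is below the 240 GiB requested, while the x1e.4xlarge still fits.

```python
from dataclasses import dataclass

# Hypothetical: fraction of nominal memory held back for the OS and worker.
# This value is an assumption for illustration, not taken from Reflow.
RESERVED_FRACTION = 0.05


@dataclass(frozen=True)
class Resources:
    mem_gib: float
    cpus: int
    disk_gib: float


def usable(nominal: Resources, reserved_fraction: float = RESERVED_FRACTION) -> Resources:
    """Return the resources left after reserving a fraction of memory."""
    return Resources(
        mem_gib=nominal.mem_gib * (1 - reserved_fraction),
        cpus=nominal.cpus,
        disk_gib=nominal.disk_gib,
    )


def satisfies(available: Resources, requested: Resources) -> bool:
    """True if every requested dimension fits within what is available."""
    return (
        available.mem_gib >= requested.mem_gib
        and available.cpus >= requested.cpus
        and available.disk_gib >= requested.disk_gib
    )


# Request taken from the error message above.
request = Resources(mem_gib=240.0, cpus=8, disk_gib=1.0)

# Nominal memory and CPU counts as quoted in this issue; disk size is hypothetical.
x1e_2xlarge = Resources(mem_gib=244.0, cpus=8, disk_gib=100.0)
x1e_4xlarge = Resources(mem_gib=488.0, cpus=16, disk_gib=100.0)

for name, inst in [("x1e.2xlarge", x1e_2xlarge), ("x1e.4xlarge", x1e_4xlarge)]:
    print(name, "nominal fit:", satisfies(inst, request),
          "usable fit:", satisfies(usable(inst), request))
```

If something like this is what happens, lowering the requested memory slightly below the nominal 244 GiB might let the x1e.2xlarge qualify, but that is untested here.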

@olgabot
Contributor Author

olgabot commented Oct 5, 2018

The other question I have: isn't Reflow supposed to launch separate instances for the separate steps of the pipeline? Why is it now trying to launch the most expensive instance and run everything on it?

@mariusae
Collaborator

Reflow generally tries to optimize for cost, so if it's cheaper to run one large instance that can fit everything, it will do that.
However, @swami-m is looking at ways to improve how instances are allocated.
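
To make the cost trade-off concrete, here is a rough sketch of the comparison described above: one shared instance large enough for every step versus the cheapest fitting instance per step. It is not Reflow's allocator. The instance types `small` and `large`, their prices, and the function `cheapest_plan` are all hypothetical, and the sketch assumes the steps run concurrently for the same length of time.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class InstanceType:
    name: str
    mem_gib: float
    cpus: int
    price_per_hour: float  # hypothetical price, for illustration only


def cheapest_fit(mem_gib, cpus, types):
    """Cheapest type that fits the request, or None if nothing fits."""
    fits = [t for t in types if t.mem_gib >= mem_gib and t.cpus >= cpus]
    return min(fits, key=lambda t: t.price_per_hour) if fits else None


def cheapest_plan(steps, types):
    """Compare one shared instance against one instance per step.

    steps is a list of (mem_gib, cpus) tuples. Returns (plan_name, instances).
    """
    shared = cheapest_fit(sum(m for m, _ in steps), sum(c for _, c in steps), types)
    separate = [cheapest_fit(m, c, types) for m, c in steps]

    shared_cost = shared.price_per_hour if shared else inf
    separate_cost = (
        sum(t.price_per_hour for t in separate) if all(separate) else inf
    )
    if shared_cost == inf and separate_cost == inf:
        raise ValueError("no instance type satisfies the requested resources")
    if shared_cost <= separate_cost:
        return "shared", [shared]
    return "separate", separate


# Hypothetical instance catalogue.
TYPES = [
    InstanceType("small", mem_gib=16, cpus=4, price_per_hour=0.40),
    InstanceType("large", mem_gib=64, cpus=16, price_per_hour=1.00),
]

plan, instances = cheapest_plan([(16, 4), (16, 4), (16, 4)], TYPES)
print(plan, [t.name for t in instances])
```

With these made-up prices, three concurrent steps are cheaper on one `large` instance than on three `small` ones, while two steps are cheaper on separate `small` instances, so a single large instance is not always the outcome.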
