dxCompiler possibly incorrect localizes input file expressions #417
Inputs to WDL pipelines on the DNAnexus platform are not meant to be local paths (the workflow is compiled to and run on the platform, using objects from the platform project). To provide acceptable inputs, the dx URI syntax is required. See examples in the following documentation:
Hi @sclan, apologies for the confusion, that is what I'm doing. I always set script_dir to So I think my description of the problem above still stands. Can you take another look at this? (The reason I'm not hardcoding script_dir and instead taking it as an input is that I'd like this WDL to work both locally with Cromwell and my downloaded SNP-array genotype files and also on the cloud with dxCompiler. And the base directory for those two locations is not the same).
Thanks for the additional information. All inputs are processed at the same time, meaning that dxCompiler would not have the value of "~{script_dir}" at the time when the File inputs "script" or "python_array_utils" were parsed. Have you run your use case through Cromwell? I used a simplified version below with Cromwell v83, and execution failed.
input.json
But when I spelled out the File path ("./file.py"), the result.txt file showed that file.py had been localized correctly. This behavior is not unique to dxCompiler / dxExecutor. The alternative is to write a submission script that holds the input string "script_dir" as a variable, so that when the workflow is submitted, all workflow inputs have already been filled with literal values rather than variables that need parsing.
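A minimal sketch of such a submission script, assuming the inputs are named "script_dir", "script" and "python_array_utils" as in this thread. The workflow name "main", the file name python_array_utils.py and the dx URI at the bottom are placeholders I made up for illustration, not values taken from the actual pipeline.

```python
import json


def build_inputs(script_dir: str, workflow: str = "main") -> dict:
    """Expand script_dir into literal File paths before submission.

    The workflow name and the python_array_utils.py file name are
    hypothetical; adjust them to match the real WDL.
    """
    # Plain string joining is used on purpose: path libraries would
    # collapse the double slash in a "dx://" URI.
    base = script_dir.rstrip("/")
    return {
        f"{workflow}.script_dir": base,
        f"{workflow}.script": f"{base}/load_shared_covars.py",
        f"{workflow}.python_array_utils": f"{base}/python_array_utils.py",
    }


if __name__ == "__main__":
    # Placeholder location; use a local directory for Cromwell
    # or a dx URI for the platform.
    print(json.dumps(build_inputs("dx://project-xxxx:/scripts"), indent=2))
```

Redirecting the printed JSON to input.json gives an inputs file in which every File value is already a literal, so nothing like "~{script_dir}" is left to be evaluated when the inputs are parsed. Only the script_dir argument changes between the local and the platform run.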
@sclan, thank you for helping with this. My pipeline routinely succeeds on Cromwell for me - that's where I started my development. It seems there's an issue with doing this while trying to run a task directly which is exposed by your example. But if you wrap the task in a workflow then it runs just fine in Cromwell. So I think there is precedent for handling this case according to the WDL spec. If I've got that right, is this something that dxCompiler/executor could support? test.wdl
input.json
The workflow compiled with dxCompiler from the example you provided above also worked. The JSON input (input.json) used was
The compile command: The dx style input json (after the conversion during compilation) was
@sclan Thank you for taking the time to walk through this with me. It seems you're right, dxCompiler is doing this appropriately. I ran the following test to confirm: test2.wdl
compiled and ran it with the same commands (substituting result.txt
Since this example was working, I went back to recompile and rerun my original workflow to reproduce the error I was getting before and see what else might be causing it. But the Again, thanks for the help.
Never mind, I've just run into this issue with a different task in my pipeline, so I'm reopening the issue. I can reproduce the issue with this script. No clue why our test scripts above couldn't reproduce the issue. test.wdl
hello_world.txt
This succeeds in Cromwell. But the same input and dxCompiler/dx run commands as in previous posts produce the error
I'm not sure of your background, but if you're a DNAnexus dev and have sudo permissions, you can see the analysis I ran with the ID analysis-GPK36J0Jv7BKPVfbFk81XY38. Any idea why this is happening?
If "text_to_read" is not mentioned / used in the command <<<>>> block, the file will not be downloaded to the worker (lazy-load behavior). If the file (hello_world.txt) is needed, you need to specify it in the command block, not through a script.
AFAIK, that contradicts the WDL specification, which makes no reference to the command section when determining what inputs should be localized. If that understanding is correct, can we change the dxCompiler/dxExecutor functionality here to make it work as intended? How difficult would it be to make such a change? Just thinking about it from a logical standpoint, why would a user specify a file as an input if they didn't want it localized, and thus would have no means of accessing it? When I think about lazy localization paradigms, I think about streaming and dxFuse. This to me feels instead like a design that contradicts the code author's intent.
Thanks for the feedback. I will create an internal feature request for adding a compilation switch to turn the lazy file loading behavior on / off. Since this is something that the WDL spec does not specify (it specifies the WDL syntax rather than the execution environment behavior), and a workaround exists (spell out the file target in the command section before the actual processing takes place), the request will start out low on our priority list. If more users request the same feature, its priority will increase over time until it gets onto the developers' todo list.
For internal tracking: PMUX-1520
I'm compiling the following WDL
Inside the load_shared_covars.py script I have the line "import python_array_utils". However, this fails with the error "ModuleNotFoundError: No module named 'python_array_utils'". I'm guessing this is because the python_array_utils input is being mislocalized.
WDL guarantees here that files that originate in the same input directory should be localized into the same runtime directory. So I should be able to rely on being able to import a script that resides in the same input directory. But I'm guessing that dxCompiler isn't properly respecting that for inputs that are assigned to expressions, as here its docs suggest that it does not consider those inputs.
Even if dxCompiler does not consider inputs with default expression values as inputs (which is fine for my use case), can dxCompiler still guarantee that these files get localized to the location that WDL requires?
Happy to provide more details or clarification (or an example run). Also happy to hear if I've misdiagnosed what's going on.
Thanks!