Morgan Taschuk edited this page Mar 5, 2018 · 3 revisions

Welcome to the one-workflow-many-ways wiki!

As the README says, the point of this project is to get an idea of how easy or hard it is for a beginner to implement a basic workflow in different workflow systems. With the exception of bash, I am a beginner at all of these.

bash

I whipped up this script in approximately 10 minutes and then spent another 30 minutes making it nice. I consider this the 'baseline' by which other scripts are measured.

WDL

This was the first new workflow language I tried.

Thoughts:

  • The documentation is very good and all in one place. I had a practical working example very quickly
  • The WDL file is almost like bash.
    • some escaping problems: basically nothing appreciates samtools flagstat $BAM 2>&1 | perl -pe 's|(\d+ \+ \d+)\s+(.*)\R|"$2": "$1",|g' | sed 's/.$//'
    • was finicky about colliding names (can't have a global bamqc variable and a bamqc task)
    • had it working pretty quickly
  • Cromwell is a little bit verbose (but this is tunable)
  • WOMtools lets you autogenerate the inputs.json file.
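
For readers who don't speak perl, here is a minimal Python sketch of what the flagstat one-liner above appears to do: grab the "N + M" counts from each line and emit them keyed by the rest of the line. The function name and the sample numbers are made up for illustration, and unlike the perl/sed version this one emits a complete JSON object.

```python
import json
import re

# Same idea as the perl pattern: "<passed> + <failed>", whitespace, description.
FLAGSTAT_LINE = re.compile(r"(\d+ \+ \d+)\s+(.*)")


def flagstat_to_json(text):
    """Map each flagstat description to its "passed + failed" count string."""
    stats = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line)
        if match:
            counts, description = match.groups()
            stats[description] = counts
    return json.dumps(stats, indent=2)


# Two lines in the usual flagstat layout (made-up numbers, not real output).
sample = (
    "100 + 0 in total (QC-passed reads + QC-failed reads)\n"
    "98 + 0 mapped (98.00% : N/A)\n"
)
print(flagstat_to_json(sample))
```

Doing the quoting inside a script rather than inline also sidesteps the escaping problem, since the workflow file only has to call the script.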

CWL

This was a totally different experience.

Thoughts:

  • No pipes??? NO PIPES???
    • No, you can have pipes, but you need to have requirements: class: ShellCommandRequirement and then {valueFrom: " | ", shellQuote: false} (see bamqc.cwl)
    • ..... ok
  • Every step in a different cwl file, joined together with a workflow cwl
  • documentation is disorganized and the 'user guide' doesn't actually show you anything useful for a really long time http://www.commonwl.org/user_guide/
    • Creating workflows (with multiple steps... remember you can't pipe so each step is very small) is buried down in lesson 20
  • Specifying inputs and outputs is a bit bonkers
    • this is how you name output files:

        stdout: bamqc_result.json
        outputs:
          outjson:
            type: stdout

    • yes, that means the name is 'out of scope' of the actual outfile. For some reason.
  • once I had the separate steps (I re-wrote flagstat2json.sh so that it could take a file as well as stdin) then creating the workflow was very simple 👍 for reuse
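
The "file as well as stdin" change is what makes a step usable both in a pipe and as a standalone workflow step. The actual flagstat2json.sh is a shell script and isn't shown here; the following is only a rough Python sketch of the same pattern, with a hypothetical helper name and the common "-" convention for stdin assumed.

```python
import sys


def read_input(path=None):
    """Return the text of `path`, or of stdin when no path (or "-") is given."""
    if path is None or path == "-":
        return sys.stdin.read()
    with open(path) as handle:
        return handle.read()
```

A script built on this can be called as `script.py flagstat.txt` from a workflow step, or as `... | script.py` at the end of a pipe, by passing `sys.argv[1]` when it is present and `None` otherwise.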

Toil - WDL

Interestingly, Toil broke on the WDL file that worked on Cromwell (commit 7bac144)

task bamqc {
    String samtools
    File bamqc_pl
    File bamfile
    File bedfile
    String outjson
    String xtra_json

    command {
        eval '${samtools} view ${bamfile} | perl ${bamqc_pl} -r ${bedfile} -j "${xtra_json}" > ${outjson}'
    }
    output {
        File out = "${outjson}"
    }
}

It barfed with an overabundance of quotes:

  File "/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py", line 122
    eval ''''
            ^
SyntaxError: EOL while scanning string literal
Traceback (most recent call last):
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/bin/toil-wdl-runner", line 11, in <module>
    sys.exit(main())
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/local/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 2312, in main
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py']' returned non-zero exit status 1

In the Python script that Toil generates (toilwdl_compiled.py), the task turned into the following block (indentation preserved):

    command9 = '''
        eval ''''
    command10 = samtools
    command11 = ''' view '''
    command12 = bamfile_fs
    command13 = ''' | perl '''
    command14 = bamqc_pl_fs
    command15 = ''' -r '''
    command16 = bedfile_fs
    command17 = ''' -j "'''
    command18 = xtra_json
    command19 = '''" > '''
    command20 = outjson
    command21 = ''''
    '''

So it looks like there's a bug, which I will eventually figure out where to file. In the meantime I'm going to remove the eval statement and the 'single quotes' since that seems to be the issue.
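
My reading of the failure, as a small sketch: the generated code wraps each chunk of the command in ''' ... ''', so a chunk that ends in a single quote produces four quotes in a row. Python closes the literal at the first three and treats the fourth as the start of a new, unterminated string. The snippet below reproduces just the first two generated lines; the repr() variant is a hypothetical alternative for emitting the literal, not what Toil does.

```python
def compiles(source):
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# First two lines of the generated block: ''' opens the literal, the WDL
# text ends in a single quote, and ''' is meant to close it.
broken = "command9 = '''\n    eval ''''\n"

# Hypothetical alternative: let repr() choose the quoting for the same text.
wdl_text = "\n    eval '"
fixed = "command9 = " + repr(wdl_text) + "\n"

print(compiles(broken))  # False: the literal closes early, leaving a stray '
print(compiles(fixed))   # True
```

This is also why dropping the single quotes from the WDL command avoids the error: no chunk ends in a quote character any more.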

Edit: Looks like Toil also doesn't like constants in WDL files. I set the output filename to "flagstat.json" in the flagstat task and it complains about it too.

Traceback (most recent call last):
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py", line 203, in <module>
    job1 = Job.wrapJobFn(flagstat, samtools=SAMTOOLS, flagstat_to_json=flagstat_to_json, bamfile=BAMFILE, outfile=outfile)
NameError: name 'outfile' is not defined
Traceback (most recent call last):
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/bin/toil-wdl-runner", line 11, in <module>
    sys.exit(main())
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/local/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 2312, in main
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py']' returned non-zero exit status 1

I also note that I needed to clean up my local working directory before I could try again:

toil.jobStores.abstractJobStore.JobStoreExistsException: The job store '/media/mtaschuk/Data/git/one-workflow-many-ways/toilWorkflowRun' already exists. Use --restart to resume the workflow, or remove the job store with 'toil clean' to start the workflow from scratch.

Which, cool.

Giving up on Toil WDL for the moment because it requires too many changes to the WDL I made for Cromwell.
