Added custom/restoregffids #5002
Conversation
Have you checked if there are other ways to achieve this? Generally custom modules should be applicable to a wider audience to help reduce maintenance burden and increase findability. Can this be done, for example, with some existing tool and a combination of Groovy/Channel manipulation? |
That's a good point. I have searched this on Google and all the results led me to custom scripts. Yes, we can replace it with Groovy code. |
Just to be clear, I mean replace the whole module in the subworkflow, so we don't have to add a custom script to modules. |
It is possible, but the way I am doing it might be flimsy. Here is, for example, another part of the sub-workflow that I have implemented in Groovy. It resumes correctly. If you are happy with it, I'll also port this one. Specifically, I am not happy with the way I am extracting the id from the file name in the final map:

    ch_short_monoploid_seqs = ch_short_ids_tsv
    | join(
        ch_monoploid_seqs ?: Channel.empty()
    )
    | map { meta, short_ids_tsv, monoploid_seqs ->
        map_monoploid_seqs_to_new_ids(meta, short_ids_tsv, monoploid_seqs)
    }
    | collectFile(newLine:true)
    | map { seqs ->
        def id = seqs.name.split('.mapped.monoploid.seqs.txt')[0]
        [ [ id: id ], seqs ]
    }
    def map_monoploid_seqs_to_new_ids(meta, short_ids_tsv, monoploid_seqs) {
        def short_ids_head = short_ids_tsv.text.split('\n')[0]

        if (short_ids_head == "IDs have acceptable length and character. No change required.") {
            return [ "${meta.id}.mapped.monoploid.seqs.txt" ] + monoploid_seqs.text.split('\n')
        }

        def orig_to_new_ids = [:]
        short_ids_tsv.text.eachLine { line ->
            def (original_id, renamed_id) = line.split('\t')
            orig_to_new_ids[original_id] = renamed_id
        }

        def mapped_ids = []
        monoploid_seqs.text.eachLine { original_id ->
            if (!orig_to_new_ids[original_id]) {
                error "Failed to find $original_id in ${monoploid_seqs}. " +
                    "The monoploid_seqs file is malformed!"
            }
            mapped_ids.add(orig_to_new_ids[original_id])
        }

        return [ "${meta.id}.mapped.monoploid.seqs.txt" ] + mapped_ids
    } |
Also, Nextflow has some file operators that take the same args as the equivalent channel operators. This means you can call them directly on a file object, for example. |
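The specific call was not shown in the thread, but as a hedged illustration of that point: operators such as splitCsv exist both as channel operators and as methods on Nextflow file objects with the same arguments (the file name here is hypothetical):

```groovy
// As a channel operator:
ch_rows = Channel.fromPath('ids.tsv') | splitCsv(sep: '\t')

// The same splitting, called directly on a file object with the same args:
rows = file('ids.tsv').splitCsv(sep: '\t')
```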
Thank you @mahesh-panchal. How is nf-core/website#2242 different from what I have done above? In both cases, ideally, one would write:

    Channel.of( [ [ id:'test', etc:'etc' ], [ 'a', 'b', 'c', 'd' ] ] )
    | collectFile { meta, data -> [ "${meta.id}.txt", data.join("\n"), meta ] } // Closure returns n items: file name, data, any other items which are passed through as-is
    | map { file, meta -> [ meta, file ] } |
Apologies, I didn't notice the function was doing that. I'm used to seeing the renaming in a closure passed to collectFile, so one can quickly see that the file is being named from the meta. It was too hidden for me in the function. Please make an issue on Nextflow for the change in behaviour of collectFile, as it's a common enough use case. With records being supported soon, though, I wonder if there's already a solution. |
Thank you @mahesh-panchal. I have raised an issue on the Nextflow repo. In the meantime, I'll use your suggestions to convert this module into a Groovy function. |
I am looking at this module again after a break on some other work. After converting the Python code to Groovy, I realised that I can no longer publish the output files with publishDir. Subworkflow users must be able to control the output files through the familiar publishDir directive. Any suggestions? Thank you!
|
I've asked a question on #subworkflows. Another solution is to add a param to the workflow for where the output gets published, since one can set params from the config. |
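A minimal sketch of that idea (the param and process names below are hypothetical, not from this PR): expose a param for the publish location and reference it in the publishDir directive, so users can override it from the config or command line:

```groovy
// Hypothetical sketch: let pipeline users control the publish location via a param.
params.restoreids_outdir = 'results/restored_ids'

process RESTORE_IDS {
    publishDir params.restoreids_outdir, mode: 'copy'

    input:
    tuple val(meta), path(gff)

    output:
    tuple val(meta), path("*.restored.gff3")

    script:
    """
    cp $gff ${meta.id}.restored.gff3
    """
}
```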
Another way to resolve this issue is to use groovy functions for this module and the other custom module (#5001), emit the output files and let the pipeline users decide how and where to publish these files. |
Sorry, I need to be reminded what we were discussing. What were the issues again?
Can you remind me what the functions did then? There's not much difference, I guess, between a Nextflow function and a Nextflow native (exec) process, aside from caching. |
Yes, we want to avoid custom modules which may not have wider application across sub-workflows. We were discussing whether we can replace the custom modules with processes or functions local to the sub-workflow. You had also created a post on Slack. The functions run on the head node where the Nextflow parent job is executed. Processes may run on the head node or other nodes depending on resource requirements. |
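For illustration, the distinction above can be sketched as a native (exec) process, which runs Groovy code as a task and so participates in caching and -resume, unlike a plain function. This is a hedged sketch; the process name and transformation are hypothetical:

```groovy
// Hypothetical sketch: Groovy logic wrapped in a native (exec:) process
// so it is cached and resumable, unlike a plain Nextflow function.
process MAP_IDS {
    input:
    tuple val(meta), val(ids)

    output:
    tuple val(meta), val(mapped)

    exec:
    mapped = ids.collect { it.toUpperCase() }   // placeholder transformation
}
```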
Replaced with Groovy code, see #4984 |
PR checklist
- Closes #XXX
- versions.yml
- file.label
- nf-core modules test <MODULE> --profile docker
- nf-core modules test <MODULE> --profile singularity
- nf-core modules test <MODULE> --profile conda
- nf-core subworkflows test <SUBWORKFLOW> --profile docker
- nf-core subworkflows test <SUBWORKFLOW> --profile singularity
- nf-core subworkflows test <SUBWORKFLOW> --profile conda