When using --vsearch_cluster, if you have many thousands of clusters, AMPLISEQ:FILTER_CLUSTERS will fail with an Argument list too long error.
#696
Comments
Thanks for the report. Would you be able to open a PR (based off the dev branch)?
I would be happy to. I'm not really sure what an appropriate test would be, though. It only fails at large scale. How do we do a GitHub Actions test for something like that in a reasonable way?
I think there isn't really an appropriate GitHub Actions test for that. Some things are not reasonably tested on small datasets, unfortunately.
Same error here.
filt_clusters.py.zip Attached are the two files with the necessary corrections: ampliseq/modules/local/filter_clusters.nf and ampliseq/bin/filt_clusters.py (unzip them, of course). I know sharing files like this is not best practice. I will do a PR for this in the next few days unless someone beats me to it.
Hi there, are you still planning to do a PR? If not, maybe someone else can tackle the problem in the next few days?
I have opened a PR as linked above. Simply added your files for now.
OK, that's in the dev branch and will be in the next release. Thanks!
Description of the bug
When using --vsearch_cluster, if you have many thousands of clusters, AMPLISEQ:FILTER_CLUSTERS will fail with an Argument list too long error.

The reason is line 27 in ampliseq/modules/local/filter_clusters.nf:

filt_clusters.py -t ${asv} -p ${prefix} -c ${clusters}

We're passing the list of names of individual cluster files as one long, space-delimited string to the -c argument. When there are many thousands (in my case, ~6,500) of cluster file names, this breaks the script because the argument string is just too long.

My Nextflow and bash scripting-fu is a bit rusty, but I did come up with a simple fix, which is to pipe in the cluster list.
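
For context, here is a rough way to see how large such a command line gets. This is only a sketch, not code from the pipeline: the file names and the other argument values are made up, and which limit is actually hit in this report is an assumption. On Linux there is a limit on the total size of arguments plus environment (ARG_MAX) and, separately, a limit on any single argument string (32 pages, so 128 KiB with 4 KiB pages).

```python
import os


def argv_bytes(args):
    """Approximate bytes needed for an argument vector (each string plus its NUL terminator)."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def single_arg_bytes(names):
    """Bytes needed if all names are joined into ONE space-delimited argument."""
    return len(os.fsencode(" ".join(names))) + 1


def sysconf_or_none(name):
    """Look up a sysconf value, returning None where it is unavailable."""
    try:
        return os.sysconf(name)
    except (AttributeError, ValueError, OSError):
        return None


# Hypothetical file names, only to illustrate the scale (~6,500 clusters).
names = [f"cluster_{i:05d}.fasta" for i in range(6500)]
command = ["filt_clusters.py", "-t", "asv.tsv", "-p", "prefix", "-c", *names]

page_size = sysconf_or_none("SC_PAGE_SIZE")
print("total argv bytes:", argv_bytes(command))
print("bytes if passed as one string:", single_arg_bytes(names))
print("ARG_MAX:", sysconf_or_none("SC_ARG_MAX"))
# Assumption: Linux caps a single argument at 32 pages.
print("per-argument limit (Linux):", None if page_size is None else 32 * page_size)
```

Real cluster file names may be longer or shorter than the ones used here, so the numbers are only indicative.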
Change line 27 in ampliseq/modules/local/filter_clusters.nf to:

echo ${clusters} | filt_clusters.py -t ${asv} -p ${prefix} -c -
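
On the Python side, one possible way to accept -c - is sketched below. It is not the actual filt_clusters.py code: the function names, the --clusters long option, and the fallback of treating any other value as the space-delimited list itself are all assumptions made for illustration.

```python
import argparse
import sys


def read_cluster_names(source, stdin=None):
    """Return cluster file names.

    If source is "-", read the whitespace-delimited list from stdin;
    otherwise (assumption) treat source itself as the delimited list.
    """
    if source == "-":
        stream = sys.stdin if stdin is None else stdin
        text = stream.read()
    else:
        text = source
    # rstrip() drops the trailing newline that echo adds; split() handles any whitespace.
    return text.rstrip().split()


def parse_args(argv=None):
    """Hypothetical stand-in for the script's command-line interface."""
    parser = argparse.ArgumentParser(description="Sketch: read cluster list from a pipe")
    parser.add_argument(
        "-c", "--clusters", type=str, required=True,
        help='space-delimited cluster file names, or "-" to read them from stdin',
    )
    return parser.parse_args(argv)
```

With this, `echo a.fasta b.fasta | script.py -c -` would call `read_cluster_names(parse_args().clusters)` and get back `["a.fasta", "b.fasta"]`. Because echo is a shell builtin, the long list never has to pass through the argument vector of a new process.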
Then change line 33 in ampliseq/bin/filt_clusters.py from type=str, to:

This will read the cluster list from the pipe. Then, in that same file, set the count, prefix, and cluster_fastas variables directly:

Use these variables throughout the script as needed (lines 45, 50, 80, 110, and 111; 45 is already correct, but 44 should be changed to include .read().rstrip() as above; also deleted line 38).

There may be a more elegant solution, and setting the count and prefix variables directly may be a totally unnecessary change.
Command used and terminal output
Relevant files
No response
System information
nextflow version 23.04.2.5870
ampliseq version 2.8.0
singularity profile