Barrnap filtration #714

Closed
vaulot opened this issue Mar 18, 2024 · 4 comments
Labels
bug Something isn't working
Milestone

Comments

@vaulot
Contributor

vaulot commented Mar 18, 2024

Description of the bug

I am analysing PacBio eukaryotic metabarcodes that cover the 18S, ITS and 28S.

When filtering based on barrnap, I find that very few ASVs pass the filter, because the e-value recorded for euks (euk_eval) is weaker than the ones recorded for the other kingdoms. This happens because barrnap detects all three genes and only the e-value from the last line of its output seems to be retained.

Here is the output of barrnap for euks

##gff-version 3
0009676eb56185361df109c5aab42d3a	barrnap:0.9	rRNA	1	1218	1.2e-303	+	.	Name=18S_rRNA;product=18S ribosomal RNA (partial);note=aligned only 65 percent of the 18S ribosomal RNA
0009676eb56185361df109c5aab42d3a	barrnap:0.9	rRNA	1413	4701	0	+	.	Name=28S_rRNA;product=28S ribosomal RNA
0009676eb56185361df109c5aab42d3a	barrnap:0.9	rRNA	1419	1570	1.5e-27	+	.	Name=5_8S_rRNA;product=5.8S ribosomal RNA

Now the summary looks like

ASV_ID	arc_eval	bac_eval	euk_eval	mito_eval	eval_method
0009676eb56185361df109c5aab42d3a	4.8e-84	1.7e-124	1.5e-27	1e-17	barrnap:0.9

As a result, this ASV is filtered out as bacterial, because the euk e-value in this table (1.5e-27) is higher than the bacterial one (1.7e-124). However, it is the second line of the barrnap output (the minimum e-value, 0) that should be used in this case, and the ASV would then be correctly assigned as euk.
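To make the effect concrete, here is a minimal sketch of the comparison, assuming the ASV is assigned to whichever kingdom has the smallest e-value (which is what the behaviour described above suggests). The function name `assign_kingdom` and the dictionary layout are hypothetical and are not taken from the pipeline code; the numbers are copied from the summary table and the barrnap output above.

```python
def assign_kingdom(evalues):
    """Return the kingdom with the smallest e-value (assumed assignment rule)."""
    return min(evalues, key=evalues.get)


# E-values as currently written to the summary table for this ASV.
reported = {"arc": 4.8e-84, "bac": 1.7e-124, "euk": 1.5e-27, "mito": 1e-17}

# Same row, but using the minimum euk e-value from the barrnap output
# (the 28S hit with e-value 0) instead of the last line (5.8S, 1.5e-27).
corrected = dict(reported, euk=0.0)

print(assign_kingdom(reported))   # bac
print(assign_kingdom(corrected))  # euk
```

With the last-line value the ASV looks bacterial; with the minimum value it is assigned as euk.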

Cheers.

Command used and terminal output

No response

Relevant files

No response

System information

No response

vaulot added the bug label Mar 18, 2024
@d4straub
Collaborator

Thanks for the report.
As far as I remember, this was implemented with one target gene in mind, not several. With multiple genes, the barrnap filtering does not seem to work as intended. This should either be made clear in the documentation or (ideally) fixed for multiple gene regions. Would you be up for that?

d4straub added this to the 2.9.0 milestone Mar 19, 2024
@vaulot
Contributor Author

vaulot commented Mar 19, 2024

Hi Daniel

I think a simple fix would be to take the lowest e-value. This would be useful for PacBio sequences, since more and more of them cover the whole operon. If you point me to where the change needs to be made, I can try to fix it.
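Something along these lines is what I have in mind. This is only a rough sketch, not the pipeline's actual code: it assumes the e-value sits in the sixth tab-separated column of the barrnap GFF output (as in the example above), and the names `min_evalue_per_sequence` and `EXAMPLE_GFF` are made up for illustration.

```python
def min_evalue_per_sequence(gff_text):
    """Return {sequence_id: smallest e-value} from barrnap GFF3 text."""
    best = {}
    for line in gff_text.splitlines():
        # Skip blank lines and GFF header/comment lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            continue
        seq_id, evalue = fields[0], float(fields[5])
        # Keep the lowest e-value seen for this sequence across all rRNA hits.
        if seq_id not in best or evalue < best[seq_id]:
            best[seq_id] = evalue
    return best


# The three hits from the barrnap output above (attributes shortened).
EXAMPLE_GFF = (
    "##gff-version 3\n"
    "0009676eb56185361df109c5aab42d3a\tbarrnap:0.9\trRNA\t1\t1218\t1.2e-303\t+\t.\tName=18S_rRNA\n"
    "0009676eb56185361df109c5aab42d3a\tbarrnap:0.9\trRNA\t1413\t4701\t0\t+\t.\tName=28S_rRNA\n"
    "0009676eb56185361df109c5aab42d3a\tbarrnap:0.9\trRNA\t1419\t1570\t1.5e-27\t+\t.\tName=5_8S_rRNA\n"
)

print(min_evalue_per_sequence(EXAMPLE_GFF))
# {'0009676eb56185361df109c5aab42d3a': 0.0}
```

Taking the minimum per sequence means the result no longer depends on the order in which barrnap lists the hits.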

@d4straub
Collaborator

Was fixed in #722!
