add_taxonomy_columns() function only outputs the first 10 lines #1

Open
yamkela-mg opened this issue Feb 6, 2024 · 3 comments

@yamkela-mg

Hi there,

I am adding NCBI taxonomy classifications to my DIAMOND output file. I ran phyloR as follows:

library (phyloR)
library (readr)
library (taxize)
setwd("/home/ymgwatyu/lustre/000_GenomeData/01_MinION/phylor")
data <- read_tsv("/home/ymgwatyu/lustre/000_GenomeData/01_MinION/phylor/diamond_data.txt", show_col_types = FALSE)

add_taxonomy_columns(data, ncbi_accession_colname = "ncbi_accession", ncbi_acc_key = "98845081e276ecedd2e2b92d339fb7354108", taxonomy_level = "family", map_superkindom = FALSE, batch_size = 20)

The output file looks like this:
Done. Time taken 6.39
Rank search begins...
Done. Time taken 0.95

# A tibble: 6,079 × 4

Gene ncbi_accession taxid family

1 g2420.t1 XP_019440838.1 3871 Fabaceae
2 g20534.t1 XP_057737287.1 217475 Fabaceae
3 g37802.t1 XP_031279371.1 55513 Anacardiaceae
4 g13363.t1 QHN77035.1 3818 Fabaceae
5 g30858.t1 KAE9615640.1 3870 Fabaceae
6 g24702.t1 OIW14831.1 3871 Fabaceae
7 g17954.t1 KAE9590247.1 3870 Fabaceae
8 g20072.t1 XP_019420191.1 3871 Fabaceae
9 g12935.t1 WAX01758.1 649199 Fabaceae
10 g914.t1 XP_019444688.1 3871 Fabaceae

# ℹ 6,069 more rows

So it seems to have annotated only the first 10 accessions. How do I get it to process more than 10, or to print more than 10 lines in the output file?

@cparsania
Owner

Hi,
I cannot read some of your text. Could you please post the output in a readable format? If possible, upload the query IDs as well.

Chirag.

@yamkela-mg
Author

add_tax_final_outfile.txt

I managed to get it to print more than 10 lines in the output file by including the sink() function in my R script.
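For anyone hitting the same truncation: the tibble header above reports 6,079 rows, which suggests all rows were processed and only the printed preview stops at 10. A minimal sketch of two alternatives to sink(), assuming the result of add_taxonomy_columns() is assigned to a variable (here the hypothetical name `annotated`, filled with toy data so the snippet runs without phyloR or network access):

```r
library(tibble)
library(readr)

# Toy stand-in for the tibble returned by add_taxonomy_columns();
# the values are illustrative placeholders, not real results.
annotated <- tibble(
  Gene = paste0("g", 1:25, ".t1"),
  family = rep("Fabaceae", 25)
)

# Default tibble printing shows only the first 10 rows.
print(annotated)

# n = Inf prints every row to the console (or to a sink() target).
print(annotated, n = Inf)

# Writing the table to disk avoids print truncation altogether.
out_file <- tempfile(fileext = ".tsv")
write_tsv(annotated, out_file)
```

Writing with write_tsv() also keeps the columns tab-separated, which is usually easier to reuse downstream than captured console output.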

Another question: what do the NAs in my output file mean? I got a lot of them, and when I manually checked some of those accessions, they do exist in the NCBI protein database.
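To narrow this down, it may help to count the NA rows and list the affected accessions. A rough sketch in base R, assuming the annotated table has the `ncbi_accession`, `taxid`, and `family` columns shown in the output above; the first two rows are copied from that output, and the third is a made-up placeholder for a failed lookup:

```r
# Toy stand-in for the annotated table. The last row is a hypothetical
# example of an accession that came back without a taxonomy assignment.
annotated <- data.frame(
  Gene = c("g2420.t1", "g20534.t1", "g_example.t1"),
  ncbi_accession = c("XP_019440838.1", "XP_057737287.1", "HYPOTHETICAL_0001.1"),
  taxid = c(3871, 217475, NA),
  family = c("Fabaceae", "Fabaceae", NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where no family was assigned.
unresolved <- annotated[is.na(annotated$family), ]

# Number of failed lookups and the accessions to re-check.
n_unresolved <- nrow(unresolved)
unresolved_ids <- unresolved$ncbi_accession
print(n_unresolved)
print(unresolved_ids)
```

Checking whether `taxid` is also NA for those rows can hint at where the lookup failed: at the accession-to-taxid step or at the rank search.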

@cparsania
Owner

cparsania commented Feb 8, 2024

Internally, it performs the taxonomy search using the R packages taxizedb and taxize. Make sure these packages have the latest taxonomy databases downloaded in the form of SQL files.
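One possible way to refresh the local database is sketched below. It assumes the NCBI database managed by taxizedb is the one phyloR reads, which this thread does not confirm. The wrapper name `refresh_and_check` is hypothetical; `db_download_ncbi()` and `classification()` are taxizedb functions. The download needs network access and disk space, so the calls are wrapped in a function rather than run at load time.

```r
# Sketch only: refresh taxizedb's local NCBI taxonomy database, then look up
# one taxid from it. refresh_and_check is a hypothetical helper name.
refresh_and_check <- function(taxid = 3871) {
  if (!requireNamespace("taxizedb", quietly = TRUE)) {
    stop("Package 'taxizedb' is not installed.")
  }
  # Re-download the NCBI taxonomy database even if a local copy exists.
  taxizedb::db_download_ncbi(verbose = TRUE, overwrite = TRUE)
  # Return the full lineage for the taxid, resolved from the local database.
  taxizedb::classification(taxid, db = "ncbi")
}

# Usage (not run here): refresh_and_check(3871)
```

The default taxid 3871 is taken from the output posted above; if the lineage returned for it includes a family-level entry, the local database is at least usable for that lookup.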
