add_taxonomy_columns() function only outputs the first 10 lines #1

yamkela-mg · 2024-02-06T10:18:06Z

Hi there,

I am adding NCBI taxonomy classifications to my DIAMOND output file. I ran phyloR as follows:

library (phyloR)
library (readr)
library (taxize)
setwd("/home/ymgwatyu/lustre/000_GenomeData/01_MinION/phylor")
data <- read_tsv("/home/ymgwatyu/lustre/000_GenomeData/01_MinION/phylor/diamond_data.txt", show_col_types = FALSE)

add_taxonomy_columns(data, ncbi_accession_colname = "ncbi_accession", ncbi_acc_key = "98845081e276ecedd2e2b92d339fb7354108", taxonomy_level = "family", map_superkindom = FALSE, batch_size = 20)

The output file looks like this:
Done. Time taken 6.39
Rank search begins...
Done. Time taken 0.95

A tibble: 6,079 × 4

   Gene       ncbi_accession   taxid  family
 1 g2420.t1   XP_019440838.1    3871  Fabaceae
 2 g20534.t1  XP_057737287.1  217475  Fabaceae
 3 g37802.t1  XP_031279371.1   55513  Anacardiaceae
 4 g13363.t1  QHN77035.1        3818  Fabaceae
 5 g30858.t1  KAE9615640.1      3870  Fabaceae
 6 g24702.t1  OIW14831.1        3871  Fabaceae
 7 g17954.t1  KAE9590247.1      3870  Fabaceae
 8 g20072.t1  XP_019420191.1    3871  Fabaceae
 9 g12935.t1  WAX01758.1      649199  Fabaceae
10 g914.t1    XP_019444688.1    3871  Fabaceae

ℹ 6,069 more rows

So it seems to have annotated only the first 10 accessions. How do I get it to process more than 10, or to print more than 10 lines in the output file?

cparsania · 2024-02-07T01:31:18Z

Hi,
I cannot read some of your text. Could you please post the output in a readable format? If possible, upload the query IDs as well.

Chirag.

yamkela-mg · 2024-02-07T18:48:27Z

add_tax_final_outfile.txt

I managed to get it to print more than 10 lines in the output file by including the sink() function in my R script.
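
For reference, a tibble prints only its first 10 rows by default, and the header above already reports 6,079 rows, so the truncated display does not by itself mean that only 10 accessions were annotated. As an alternative to sink(), here is a minimal sketch that writes every row to a file. It assumes the result of add_taxonomy_columns() has been assigned to a variable; the name `annotated`, the output file name, and the small stand-in data frame are illustrative only.

```r
# Stand-in for: annotated <- add_taxonomy_columns(data, ...)
# (a small data frame is used here so the snippet runs on its own)
annotated <- data.frame(
  Gene = c("g2420.t1", "g20534.t1", "g37802.t1"),
  ncbi_accession = c("XP_019440838.1", "XP_057737287.1", "XP_031279371.1"),
  taxid = c(3871, 217475, 55513),
  family = c("Fabaceae", "Fabaceae", "Anacardiaceae"),
  stringsAsFactors = FALSE
)

# Write all rows to a tab-separated file (hypothetical file name).
out_file <- file.path(tempdir(), "annotated_all_rows.tsv")
write.table(annotated, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm that no rows were dropped.
written <- read.delim(out_file, stringsAsFactors = FALSE)

# For on-screen inspection of a real tibble, print(annotated, n = Inf)
# shows every row instead of the default 10.
```

Writing the table directly keeps the progress messages out of the output file, which sink() would otherwise capture along with the printed rows.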

Another question: what do the NAs in my output file mean? I got a lot of them, and when I manually checked some of those accessions, they do exist in the NCBI protein database.

cparsania · 2024-02-08T00:08:22Z

Internally, it performs the taxonomy search using the R packages taxizedb and taxize. Make sure that these packages have the latest taxonomy databases downloaded as SQL files.
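
To see how many accessions are affected and to refresh the local database, something like the following may help. This is a minimal sketch: `annotated`, the placeholder accessions, and `refresh_ncbi_db()` are hypothetical names, while `taxizedb::db_download_ncbi()` is the taxizedb function that downloads the NCBI taxonomy database. Whether a refresh resolves the NAs in this particular case is not confirmed in this thread.

```r
# Stand-in for the annotated result; accessions are placeholders, not real data.
annotated <- data.frame(
  ncbi_accession = c("ACC_A", "ACC_B", "ACC_C"),
  family = c("Fabaceae", NA, "Fabaceae"),
  stringsAsFactors = FALSE
)

# Collect the rows for which no family could be assigned.
missing <- annotated[is.na(annotated$family), ]
n_missing <- nrow(missing)

# Hypothetical helper: re-download the local NCBI taxonomy database.
# Defined here but not called, because the download is large.
refresh_ncbi_db <- function() {
  if (!requireNamespace("taxizedb", quietly = TRUE)) {
    stop("Package 'taxizedb' is not installed.")
  }
  taxizedb::db_download_ncbi(overwrite = TRUE)
}
```

After calling refresh_ncbi_db(), re-running add_taxonomy_columns() on the accessions in `missing` would show whether an outdated local database was the cause.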
