
VFDB contains few genes that are not part of any cluster #331

Open
PovilasMat opened this issue Feb 10, 2023 · 2 comments

PovilasMat commented Feb 10, 2023

Hi,

ARIBA ran into a weird issue while running on the VF database:
[E::hts_idx_push] Unsorted positions on sequence # 1: 109 followed by 11
OSError: building of index for /scratch/shadow/tmpr7wt7j_c/ariba_virulencefinder/ariba_virulencefinder/read_store.gz failed
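The `[E::hts_idx_push]` message comes from htslib while it builds an index: within each sequence, start positions must not decrease. "109 followed by 11" is also exactly the order a string (lexical) sort gives those two numbers. The minimal sketch below shows that check and the two sort orders. It assumes records reduced to hypothetical (name, position) pairs, not ARIBA's actual read-store layout, and `first_unsorted` is a made-up helper:

```python
# Minimal sketch of the ordering rule htslib enforces when building an index:
# within one sequence, start positions must never decrease.
# Records are simplified to hypothetical (sequence_name, position) pairs;
# this is NOT ARIBA's read-store format.


def first_unsorted(records):
    """Return (name, previous_pos, pos) for the first position that is
    smaller than the one before it on the same sequence, or None."""
    last_pos = {}
    for name, pos in records:
        if name in last_pos and pos < last_pos[name]:
            return (name, last_pos[name], pos)
        last_pos[name] = pos
    return None


positions = ["11", "109", "2"]
lexical = sorted(positions)           # string order: ['109', '11', '2']
numeric = sorted(positions, key=int)  # numeric order: ['2', '11', '109']

records_lexical = [("cluster_1", int(p)) for p in lexical]
records_numeric = [("cluster_1", int(p)) for p in numeric]

print(first_unsorted(records_lexical))  # ('cluster_1', 109, 11)
print(first_unsorted(records_numeric))  # None
```

This only illustrates the rule behind the error message. It does not show where ARIBA's own sort goes wrong.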

I figured out that this happens because read_store.gz is sorted incorrectly, since one of the genes doesn't have cluster information. I changed read_store.py to sort correctly even when the cluster information is missing, but then it failed at a later step:
_init_and_run_clusters    reference_names=self.cluster_ids[cluster_name],
KeyError: ''

Obviously, because cluster name was missing. :)

Then I started digging around and made this small test:

mkdir vftest
cd vftest
ariba getref virulencefinder out.virulencefinder
ariba prepareref -f out.virulencefinder.fa -m out.virulencefinder.tsv ./test
cd test
cat 02.cdhit.clusters.tsv | awk '{$1="";print}' | tr " " "\n" | sort | uniq > cluster_file
grep ">" 02.cdhit.all.fa | sed 's/>//g' | sort > all_file
wc -l all_file
wc -l cluster_file
diff cluster_file all_file

Output of the last three lines:

5558 all_file
5554 cluster_file //cluster_file contains one empty line at the beginning
1d0 //this is the empty line
< //this is the empty line
718a718
> csnA_4_KJ922517
973a974
> eltIIAB_c8_1_AASRQF010000005
4943a4945
> stx2_122_CP022279_122
5082a5085
> stx2b_O128_24196_97_95_AJ567995_95
5157a5161
> stx2h_O102_STEC299_122_CP022279_122
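The same comparison can be sketched in Python. A set difference avoids the empty-line artifact. The sketch assumes, as the awk command does, that the first column of 02.cdhit.clusters.tsv is the cluster name and the remaining whitespace-separated columns are member sequence names. The function names and the script name `find_unclustered.py` are hypothetical:

```python
import sys


def clustered_names(clusters_tsv):
    """Member sequence names from a CD-HIT clusters table.

    Assumes, like the awk command above, that column 1 is the cluster name
    and every remaining whitespace-separated column is a member sequence.
    """
    names = set()
    with open(clusters_tsv) as fh:
        for line in fh:
            names.update(line.split()[1:])
    return names


def fasta_names(fasta):
    """Sequence names from FASTA headers (first token after '>')."""
    names = set()
    with open(fasta) as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def unclustered(fasta, clusters_tsv):
    """Sorted names present in the FASTA but absent from every cluster."""
    return sorted(fasta_names(fasta) - clustered_names(clusters_tsv))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_unclustered.py 02.cdhit.all.fa 02.cdhit.clusters.tsv
    for name in unclustered(sys.argv[1], sys.argv[2]):
        print(name)
```

Under that column assumption, running it in the prepareref output directory should list the same names that `diff` marks with `>`.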

So the issue is that one or more of these 5 genes (in my case stx2h_O102_STEC299_122_CP022279_122) can be found in my sequencing reads, but they are not part of any cluster. When the read store is built, reads matching these genes get no cluster name, which makes the script fail.
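One possible, untested workaround (not proposed in this thread) is to drop the unclustered genes from the reference before running prepareref again. The sketch assumes that the sequence name is the first token of each FASTA header and the first tab-separated column of the metadata TSV. The helper names are hypothetical, and it is not confirmed that ARIBA then clusters every remaining gene:

```python
# Hypothetical helpers (not part of ARIBA): remove named sequences from a
# reference FASTA and its metadata TSV before re-running prepareref.

UNCLUSTERED = {
    "csnA_4_KJ922517",
    "eltIIAB_c8_1_AASRQF010000005",
    "stx2_122_CP022279_122",
    "stx2b_O128_24196_97_95_AJ567995_95",
    "stx2h_O102_STEC299_122_CP022279_122",
}


def filter_fasta(src, dst, exclude):
    """Copy FASTA records whose name (first header token) is not excluded."""
    keep = True
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                parts = line[1:].split()
                keep = not (parts and parts[0] in exclude)
            if keep:
                fout.write(line)


def filter_metadata(src, dst, exclude):
    """Copy metadata lines whose first tab-separated column is not excluded."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            name = line.split("\t", 1)[0].rstrip("\n")
            if name not in exclude:
                fout.write(line)


# Example (output file names are placeholders):
# filter_fasta("out.virulencefinder.fa", "filtered.fa", UNCLUSTERED)
# filter_metadata("out.virulencefinder.tsv", "filtered.tsv", UNCLUSTERED)
```

After that, `ariba prepareref -f filtered.fa -m filtered.tsv ./test_filtered` could be run and the test above repeated on the new output to see whether any sequence is still left without a cluster.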

ariba version
ARIBA version: 2.14.6
External dependencies:
bowtie2 2.2.5 /srv/data/tools/anaconda3/envs/env_cge_update/bin/bowtie2
cdhit 4.8.1 /srv/data/tools/anaconda3/envs/env_cge_update/bin/cd-hit-est
nucmer 3.1 /srv/data/tools/anaconda3/envs/env_cge_update/bin/nucmer
spades 3.15.5 /srv/data/tools/anaconda3/envs/env_cge_update/bin/spades.py
External dependencies OK: True
Python version:
3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:45:29)
[GCC 10.4.0]
Python packages:
ariba 2.14.6 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/ariba/__init__.py
bs4 4.11.1 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/bs4/__init__.py
dendropy 4.5.2 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/dendropy/__init__.py
pyfastaq 3.17.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pyfastaq/__init__.py
pymummer 0.11.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pymummer/__init__.py
pysam 0.18.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pysam/__init__.py
Python packages OK: True
Everything looks OK: True


etuduri commented Jul 13, 2023

Hi, I have the same issue. Please help!

ARIBA version: 2.14.6

External dependencies:
bowtie2 2.3.4.1 /usr/bin/bowtie2
cdhit 4.7 /usr/bin/cd-hit-est
nucmer 3.1 /usr/bin/nucmer
spades 3.13.0 /home/inei/SPAdes-3.13.0-Linux/bin/spades.py

External dependencies OK: True

Python version:
3.6.9 (default, Mar 10 2023, 16:46:00)
[GCC 8.4.0]

Python packages:
ariba 2.14.6 /usr/local/lib/python3.6/dist-packages/ariba/__init__.py
bs4 4.9.2 /home/inei/.local/lib/python3.6/site-packages/bs4/__init__.py
dendropy 4.4.0 /home/inei/.local/lib/python3.6/site-packages/dendropy/__init__.py
pyfastaq 3.17.0 /home/inei/.local/lib/python3.6/site-packages/pyfastaq/__init__.py
pymummer 0.10.3 /home/inei/.local/lib/python3.6/site-packages/pymummer/__init__.py
pysam 0.16.0.1 /home/inei/.local/lib/python3.6/site-packages/pysam/__init__.py

Python packages OK: True

Everything looks OK: True

Thanks in advance!

PovilasMat (issue author) commented

It doesn't seem like ARIBA will receive any further changes. I asked the database maintainers to fix it on their end, but that is still an ongoing process.
