Fail to deconstruct VCF from the PGGB GFA file #429

Open
isaamael opened this issue Nov 19, 2024 · 7 comments
Comments

@isaamael

isaamael commented Nov 19, 2024

Hi!
I am trying to generate a VCF from the GFA produced by the PGGB pipeline.
I attempted to use vg deconstruct, and everything works fine on smaller GFA files, such as when I include only two species. However, when I include the 11 species that I actually need, vg outputs only the VCF header and does not produce any error messages.

Could it be that the task takes too long to run and gets killed by SLURM, or have I made some hidden mistake?

The command I ran:

i=1
vg deconstruct --verbose --path-prefix solyc --all-snarls --path-traversals --threads 10 --untangle-travs --contig-only-ref \
$pggbdir/sly${i}.fa.out/sly${i}.fa.bf3285f.417fcdf.7fecd6e.smooth.final.gfa > $pggbdir/pan${i}.vcf

It seems that the command runs for a long time without producing any output and is eventually stopped by SLURM. The full log from the SLURM system:

Warning [vg deconstruct]: -e is deprecated as it's now on default
slurmstepd: error: *** JOB 17088635 ON comput52 CANCELLED AT 2024-11-14T21:40:23 DUE TO TIME LIMIT ***

Other vg commands, such as combine and convert, produce the same error.

I am willing to provide any necessary information and would appreciate your help!
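
A quick way to tell a header-only VCF from a complete one is to count its non-header lines. The sketch below is only an illustration: the function name `count_vcf_records` is made up for this example, and the `sacct` line in the comments assumes a standard SLURM installation where accounting is enabled.

```shell
# Hypothetical helper: count the records (non-'#' lines) in a VCF.
# A result of 0 means the file holds only header lines, which is what
# a job killed before vg deconstruct wrote any variant would leave.
count_vcf_records() {
    # grep -c exits with status 1 when the count is 0, so mask it.
    grep -vc '^#' "$1" || true
}

# To see how long the cancelled job ran against its limit (assumes
# SLURM accounting is available on the cluster):
#   sacct -j 17088635 --format=JobID,State,Elapsed,Timelimit
```

For example, `count_vcf_records "$pggbdir/pan1.vcf"` would print `0` for the header-only output described above.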

@isaamael changed the title from "Fail to deconstruct VCF from the PGGBGFA file" to "Fail to deconstruct VCF from the PGGB GFA file" on Nov 19, 2024
@AndreaGuarracino
Member

@isaamael, it looks like you're right, vg deconstruct doesn't finish on time and doesn't write anything! Can you try to request more time for the job?

@isaamael
Author

Hello sir,

Thanks for your suggestions.

In theory, there should be no time limit on the SLURM system I'm using, so I'm not sure why I'm encountering timeout issues. After manually setting the runtime, I obtained a low-confidence VCF, which I attribute to interference from a low-quality genome during GFA construction.

In any case, after filtering the original input genome, I think it would be best to rerun pggb and generate the VCF directly from the pipeline. However, I tried the following command and encountered an error:

pggb -i $pggbdir/sly12.fa \
     -o $pggbdir/sly12.fa.out \
     -p 5000 -l 25000 -p 90 -n 2 -K 19 -F 0.001 -t 20 \
     -k 19 -f 0 -B 10000000 \
     -j 0 -e 0 -G 700,900,1100 -P 1,19,39,3,81,1 -O 0.001 -d 100 -Q Consensus_ \
     --vcf-spec 'solyc#0#12'

[vg::deconstruct] making VCF with reference=solyc#0#12 and delim=# xxxxxxxxxxxxx solyc#0#12 ------------ 0
Error [vg deconstruct]: No specified reference path or prefix found in graph
Command exited with non-zero status 1

My reference is indeed solyc#0#12, and the path in the output GFA is the same. I'm unclear about the correct way to specify the --vcf-spec parameter and how to define LEN.

These are beginner questions, and I would appreciate your help :)
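
Since the error says that no reference path or prefix was found in the graph, it can help to list the path names the GFA actually contains and compare them with the value passed to --vcf-spec. The sketch below is an assumption-laden illustration: `gfa_paths_with_prefix` is a made-up helper name, and it reads only GFA `P` lines (paths stored as `W` lines would not be listed).

```shell
# Hypothetical helper: print the names of GFA paths (P lines) that
# start with a given prefix. The match is literal, so '#' in PanSN
# names such as sample#hap#contig needs no escaping.
# Usage: gfa_paths_with_prefix graph.gfa PREFIX
gfa_paths_with_prefix() {
    awk -F'\t' -v p="$2" '$1 == "P" && index($2, p) == 1 { print $2 }' "$1"
}
```

If `gfa_paths_with_prefix graph.gfa 'solyc#0#12'` prints nothing while `gfa_paths_with_prefix graph.gfa solyc` prints names, the reference string passed to the tool does not match a path name as written in the file.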

@AndreaGuarracino
Member

AndreaGuarracino commented Nov 28, 2024 via email

@isaamael
Author

I will try it!

@isaamael
Author

Hi sir, @AndreaGuarracino
I tried this command:

pggb -i $pggbdir/sly12.fa \
     -o $pggbdir/sly12.fa.out \
     -p 5000 -l 25000 -p 90 -n 10 -K 19 -F 0.001 -t 20 \
     -k 19 -f 0 -B 10000000 \
     -j 0 -e 0 -G 700,900,1100 -P 1,19,39,3,81,1 -O 0.001 -d 100 -Q Consensus_ \
     --vcf-spec solyc

and it reported an error:

[wfmash::align::computeAlignments] aligned 78.50% @ 2.94e+05 bp/s elapsed: 00:02:17:56 remain: 00:00:37:46
[E::fai_retrieve] Failed to retrieve block: unexpected end of file
Command terminated by signal 11

Did I make a mistake?
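
The `[E::fai_retrieve]` message is raised while reading sequence through a FASTA index, so one thing worth checking is whether the `.fai` index still agrees with the FASTA itself. The sketch below recomputes sequence lengths directly from an uncompressed FASTA; the helper name `fasta_lengths` is made up for this example, and the comparison in the comments assumes a `samtools faidx`-style index sitting next to the input.

```shell
# Hypothetical helper: print "name<TAB>length" for each record of an
# uncompressed FASTA. The name is the header up to the first space.
fasta_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Compare against the first two columns of the index (bash syntax):
#   diff <(fasta_lengths sly12.fa) <(cut -f1,2 sly12.fa.fai)
# Any difference suggests a stale index or a truncated FASTA.
```

This does not handle bgzip-compressed input or Windows line endings; it is meant as a first sanity check only.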

@AndreaGuarracino
Member

AndreaGuarracino commented Nov 30, 2024 via email

Oh no! It seems your FASTA file is corrupted. Can you check that your input FASTA is healthy?

@isaamael
Author

Yes, indeed!
I regenerated the FASTA de novo and reran pggb.
Now the GFA, the VCF, and I are all as good as expected :)
Sincere thanks for your help!

Additionally, do the numbers in the GT field of the VCF point to the paths of AT?
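
On the GT question, the thread has no reply, so the following is only a sketch under an assumption: that the AT field is declared in the VCF header with Number=R, meaning one traversal per allele with the reference allele first. If so, a GT value k would select the k-th AT entry counting from 0. Please confirm this against the `##INFO=<ID=AT` line in your own header. The helper name `vcf_allele_traversals` is made up for this example.

```shell
# Hypothetical helper: for each VCF record, print
#   CHROM <TAB> POS <TAB> allele index <TAB> AT traversal
# assuming AT lists one traversal per allele, REF first (index 0).
vcf_allele_traversals() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AT=/) {
                    m = split(substr(info[i], 4), at, ",")
                    for (j = 1; j <= m; j++)
                        print $1 "\t" $2 "\t" (j - 1) "\t" at[j]
                }
            }
        }
    ' "$1"
}
```

A sample whose GT is `1` at a site would then be read as following the traversal printed with allele index 1 for that site.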
