Proper running using sam input file? #59

Open
desmodus1984 opened this issue Jan 21, 2022 · 1 comment

Comments

@desmodus1984

Hi,

I read that elprep can take a .sam file as input, and since it does the coordinate sorting itself, I simply mapped my reads to the reference that I had converted to .elfasta and used the initial .sam file from the mapping as input.
Judging from the job log, I am a little concerned about whether the final VCF will be correct.
I used the following command:
elprep sfm AL91.sam AL91.output.bam --filter-unmapped-reads --nr-of-threads 28 --tmp-path $TMPDIR \
--mark-duplicates --mark-optical-duplicates AL91.metrics \
--sorting-order coordinate \
--bqsr AL91.recal \
--reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta \
--haplotypecaller AL91.vcf.gz

I am also confused by the log: I thought that for proper variant calling it would first convert/sort the .sam file and only then split it.
It has been about 16 hours, and the only output so far is AL91.recal; there is no AL91.metrics file.

Here is the log.
elprep version 5.1.1 compiled with go1.16.7 - see http://github.com/exascience/elprep for more information.

2022/01/20 20:44:07 Created log file at /users/PHS0338/jpac1984/logs/elprep/elprep-2022-01-20-20-44-07-250202704-EST.log
2022/01/20 20:44:07 Command line: [elprep sfm AL91.sam AL91.output.bam --filter-unmapped-reads --nr-of-threads 28 --tmp-path /tmp/slurmtmp.17532726 --mark-duplicates --mark-optical-duplicates AL91.metrics --sorting-order coordinate --bqsr AL91.recal --reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta --haplotypecaller AL91.vcf.gz]
2022/01/20 20:44:07 Executing command:
elprep sfm AL91.sam AL91.output.bam --filter-unmapped-reads --mark-duplicates --mark-optical-duplicates AL91.metrics --optical-duplicates-pixel-distance 100 --bqsr AL91.recal --reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta --quantize-levels 0 --max-cycle 500 --haplotypecaller AL91.vcf.gz --sorting-order coordinate --nr-of-threads 28 --tmp-path /tmp/slurmtmp.17532726 --intermediate-files-output-prefix AL91 --intermediate-files-output-type sam
2022/01/20 20:44:07 Splitting...
2022/01/20 21:01:22 Filtering (phase 1)...
2022/01/20 21:29:00 Filtering (phase 2) and variant calling...
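The timestamps in the log already give a rough idea of where the time goes. Below is a minimal Bash sketch that prints the time spent on each logged step; it assumes the "YYYY/MM/DD HH:MM:SS message" line format seen above and GNU date (for -d), and the function name phase_durations is hypothetical. For the lines above it reports 1035 s (about 17 minutes) for splitting and 1658 s (about 28 minutes) for phase 1 filtering, while the last step (phase 2 filtering and variant calling) has no end timestamp in the log.

```shell
#!/usr/bin/env bash
# Sketch: print how long each step in an elprep log took, based on the
# "YYYY/MM/DD HH:MM:SS message" lines shown in the log above.
# Assumes GNU date (-d). Timestamps are parsed as UTC only so that
# daylight-saving changes cannot distort the differences.
set -euo pipefail

phase_durations() {
  local re='^([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2}) (.*)$'
  local line day clock msg ts prev_ts="" prev_msg=""
  while IFS= read -r line; do
    # Skip lines without a leading timestamp (e.g. the echoed command line)
    [[ $line =~ $re ]] || continue
    day=${BASH_REMATCH[1]}
    clock=${BASH_REMATCH[2]}
    msg=${BASH_REMATCH[3]}
    # GNU date reliably parses ISO-style dates, so turn 2022/01/20 into 2022-01-20
    ts=$(TZ=UTC date -d "${day//\//-} ${clock}" +%s)
    if [[ -n $prev_ts ]]; then
      # Time between this line and the previous one belongs to the previous step
      printf '%6d s  %s\n' "$((ts - prev_ts))" "$prev_msg"
    fi
    prev_ts=$ts
    prev_msg=$msg
  done
  # The last step has no end timestamp yet, so it is reported as open
  if [[ -n $prev_ts ]]; then
    printf '%6s    %s\n' "open" "$prev_msg"
  fi
}
```

Usage, assuming the log file named in the output above contains the same timestamped lines: phase_durations < /users/PHS0338/jpac1984/logs/elprep/elprep-2022-01-20-20-44-07-250202704-EST.log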

I hope I am following the correct procedure and not wasting time.

Best regards,

Juan

@caherzee
Contributor

caherzee commented Feb 9, 2022

Hi,

I do not see anything wrong with the elprep command. I would perhaps try adding the following option: --intermediate-files-output-type bam. Currently, the intermediate files are SAM files, and if your input file is very large, this may slow down processing.

You may also want to add the --timed option to get more output on where the time is going.
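A minimal sketch of the command from the question with the two suggested options added, assuming the same file names and paths as in the question. The args array and the dry-run fallback (printing the command when elprep is not on the PATH) are illustrative only; typing the command directly with backslash line continuations works just as well.

```shell
#!/usr/bin/env bash
# Sketch: the elprep sfm command from the question plus the two suggested
# options. Paths and file names are taken from the question; adjust as needed.
set -euo pipefail

args=(sfm AL91.sam AL91.output.bam)
args+=(--filter-unmapped-reads)
args+=(--mark-duplicates --mark-optical-duplicates AL91.metrics)
args+=(--sorting-order coordinate)
args+=(--bqsr AL91.recal)
args+=(--reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta)
args+=(--haplotypecaller AL91.vcf.gz)
args+=(--nr-of-threads 28 --tmp-path "${TMPDIR:-/tmp}")
# Suggested: write intermediate files as BAM rather than SAM, which the log shows being used
args+=(--intermediate-files-output-type bam)
# Suggested: print timing information showing where the time is spent
args+=(--timed)

if command -v elprep >/dev/null 2>&1; then
  elprep "${args[@]}"
else
  # Dry run: elprep is not on the PATH, so only show what would be executed
  echo "would run: elprep ${args[*]}"
fi
```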

With regard to the missing metrics file:

  • is your data single-end? We made a bug fix in v5.1.2 so that metrics are output correctly for such data.
  • is it possible to check the local directory where the job runs?
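To answer the first question, one option is to look at FLAG bit 0x1 in the first alignment records; in the SAM specification this bit means the template has multiple segments, i.e. paired reads. Below is a minimal Bash/awk sketch; the function name check_paired and the default sample size of 100000 records are arbitrary, hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: report whether a SAM file looks paired-end or single-end by checking
# FLAG bit 0x1 in the first n alignment records (n defaults to an arbitrary 100000).
set -euo pipefail

check_paired() {
  local sam=$1 n=${2:-100000}
  awk -F'\t' -v n="$n" '
    BEGIN { total = 0; paired = 0 }
    /^@/ { next }                          # skip header lines (@HD, @SQ, @PG, ...)
    { total++; if ($2 % 2 == 1) paired++ } # FLAG is column 2; an odd value has 0x1 set
    total >= n { exit }                    # only sample the first n records
    END {
      if (total == 0) { print "no alignment records found"; exit }
      kind = (paired == total) ? "paired-end" : (paired == 0 ? "single-end" : "mixed")
      printf "%d of %d records flagged paired (0x1): %s\n", paired, total, kind
    }' "$sam"
}
```

Usage: check_paired AL91.sam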

It is unclear to me whether you obtained the above log from an elPrep job that had finished running or from one that was still running at the time.

Thanks,

Charlotte
