Error in checkGrep(grep(".A.txt", files)) when running maegatk #5
I got the same problem.
Hi, I am experiencing the same problem on the test data.
Also, the definition of -mr (minimum reads) was used incorrectly. The value passed to freebayes as -C is actually a minimum alternate-allele read count (--min-alternate-count), which is not the same thing as a minimum number of reads.
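To make the distinction above concrete, here is a minimal Python sketch of the two filters applied to the base counts at a single position. It is a simplification under stated assumptions: the function names are hypothetical, and the alternate-count check only mimics the idea behind freebayes `-C` (`--min-alternate-count`) for one sample; it is not freebayes' implementation.

```python
from typing import Mapping


def passes_min_total_reads(counts: Mapping[str, int], min_reads: int) -> bool:
    """Minimum-read filter: total depth at the position, regardless of allele."""
    return sum(counts.values()) >= min_reads


def passes_min_alt_count(counts: Mapping[str, int], ref: str, min_alt: int) -> bool:
    """Alternate-count filter (the idea behind freebayes -C): some
    non-reference allele must be supported by at least `min_alt` reads."""
    return any(n >= min_alt for base, n in counts.items() if base != ref)


# A well-covered position where only one read carries a G:
counts = {"A": 95, "G": 1}
print(passes_min_total_reads(counts, min_reads=10))      # True
print(passes_min_alt_count(counts, ref="A", min_alt=2))  # False
```

A position can therefore pass a minimum-read threshold easily while failing a minimum-alternate-count threshold, which is why the two parameters should not be conflated.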
Here we use -mr only in the context of the fgbio consensus demultiplexing (
Hi @uqjlu8, your error stems from an improperly specified BAM file header. This is the error in the scatter log (line 83):
ERROR::READ_GROUP_NOT_FOUND:Record 1, Read name NS500239:389:H3F5JBGXC:1:12210:7650:20169_GTCACAATCGGATGGA-2+CTGTCACATT, RG ID on SAMRecord not found in header: Gen22_LG_D22:0:1:H3F5JBGXC:1
This Biostars post discusses it a bit more: https://www.biostars.org/p/50338/ In short, this is an error upstream of the maegatk tool, and something that would require you to check how you processed your single-cell sequencing data. May I ask what pipeline you used to go from FASTQ to BAM files?
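A quick way to confirm this kind of header problem is to compare the read-group IDs declared in the header with the RG tags carried by the reads. Below is a minimal, stdlib-only Python sketch of that comparison; the function names are hypothetical, and it assumes you feed it the text of `samtools view -H input.bam` plus a handful of records from `samtools view input.bam | head`.

```python
from typing import Iterable, Set


def declared_read_groups(header_text: str) -> Set[str]:
    """Collect the ID of every @RG line in a SAM header (e.g. `samtools view -H` output)."""
    ids = set()
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.add(field[3:])
    return ids


def undeclared_read_groups(header_text: str, sam_records: Iterable[str]) -> Set[str]:
    """Return RG IDs referenced by alignment records but absent from the header.

    Records like these are what produce the READ_GROUP_NOT_FOUND error quoted above.
    """
    declared = declared_read_groups(header_text)
    missing = set()
    for record in sam_records:
        if not record or record.startswith("@"):
            continue  # skip blank lines and header lines
        # Optional tags start at the 12th tab-separated column of a SAM record.
        for tag in record.rstrip("\n").split("\t")[11:]:
            if tag.startswith("RG:Z:") and tag[5:] not in declared:
                missing.add(tag[5:])
    return missing
```

If the returned set is non-empty (for example, it contains `Gen22_LG_D22:0:1:H3F5JBGXC:1`), the header needs matching `@RG` lines, or the RG tags need to be rewritten, before maegatk can process the file.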
Hi Caleb
The original *.bam was generated using cellranger count to map the FASTQs.
Best regards
Jennifer
Just to clarify: did the test data work OK for you?
Hi Caleb, unfortunately not. I was getting the same checkGrep error; that's why I submitted the original ticket.
Regards
Jennifer
Hello @caleblareau, neither the test data nor my own data gets past the step that makes ~/final/SAMPLE_NAME.A.txt.gz. I hope you can provide a response.
Hmm, okay. I don't see an obvious solution here other than trying to manipulate the input BAM file header. What does samtools view -H show for the input BAM file?
Version 0.1.2 was for indel calling, so I don't think it will affect anything for you. You can try it, though. Just clone the repo and then install it from the local branch: ``
Hello, I am running into a similar issue. It might be related to the following: I ran maegatk with the --snake-stdout flag, and the error stems from an AttributeError in line 54 of ~/venv3/lib/python3.6/site-packages/maegatk/bin/snake/Snakefile.maegatk.Gather. Any idea how to work around this? Stef
Hi @si3, the README file is now slightly expanded and explains which intermediate output files you should expect. If there are no A files in Gather, it means that line 35 here fails and the output of Scatter is incomplete, so I suggest looking into the Scatter log. Scatter should have generated a set of files in the temp_bam, ready_bam, and sparse_matrices folders. If you are keeping intermediate files, you can check which files are missing; this should also be reflected in Scatter's log.
-> l temp_bam/ | head
total 215944
-rw-r--r-- 1 safina vangalenlab 1387286 Jan 6 15:57 CCCGTCGTGGTA-1.temp0.bam
-rw-r--r-- 1 safina vangalenlab 1365836 Jan 6 15:57 CCCGTCGTGGTA-1.temp1.bam
-rw-r--r-- 1 safina vangalenlab 13993678 Jan 6 15:57 CCCGTCGTGGTA-1.temp1.5.sam
-rw-r--r-- 1 safina vangalenlab 1320355 Jan 6 15:57 CCCGTCGTGGTA-1.temp1.5.bam
-rw-r--r-- 1 safina vangalenlab 489313 Jan 6 15:57 CCCGTCGTGGTA-1.temp2.bam
-rw-r--r-- 1 safina vangalenlab 1925330 Jan 6 15:57 CCCGTCGTGGTA-1.temp0.fastq
-rw-r--r-- 1 safina vangalenlab 1123027 Jan 6 15:57 GACCGTGCATTT-1.temp0.bam
-rw-r--r-- 1 safina vangalenlab 1122782 Jan 6 15:57 GACCGTGCATTT-1.temp1.bam
-rw-r--r-- 1 safina vangalenlab 11246101 Jan 6 15:57 GACCGTGCATTT-1.temp1.5.sam
(base) -> l ready_bam/ | head
total 6728
-rw-r--r-- 1 safina vangalenlab 216594 Jan 6 15:57 CCCGTCGTGGTA-1.qc.bam
-rw-r--r-- 1 safina vangalenlab 96 Jan 6 15:57 CCCGTCGTGGTA-1.qc.bam.bai
-rw-r--r-- 1 safina vangalenlab 219598 Jan 6 15:57 GACCGTGCATTT-1.qc.bam
-rw-r--r-- 1 safina vangalenlab 96 Jan 6 15:57 GACCGTGCATTT-1.qc.bam.bai
-rw-r--r-- 1 safina vangalenlab 189703 Jan 6 15:58 CCACAAAACATG-1.qc.bam
-rw-r--r-- 1 safina vangalenlab 96 Jan 6 15:58 CCACAAAACATG-1.qc.bam.bai
-rw-r--r-- 1 safina vangalenlab 261193 Jan 6 15:58 GGCGCTAATGAA-1.qc.bam
-rw-r--r-- 1 safina vangalenlab 96 Jan 6 15:58 GGCGCTAATGAA-1.qc.bam.bai
-rw-r--r-- 1 safina vangalenlab 219108 Jan 6 15:59 TTCCTACGCAAT-1.qc.bam
(base) -> l sparse_matrices/ | head
total 27904
-rw-r--r-- 1 safina vangalenlab 159767 Jan 6 15:57 CCCGTCGTGGTA-1.A.txt
-rw-r--r-- 1 safina vangalenlab 159806 Jan 6 15:57 CCCGTCGTGGTA-1.C.txt
-rw-r--r-- 1 safina vangalenlab 77507 Jan 6 15:57 CCCGTCGTGGTA-1.G.txt
-rw-r--r-- 1 safina vangalenlab 125697 Jan 6 15:57 CCCGTCGTGGTA-1.T.txt
-rw-r--r-- 1 safina vangalenlab 308142 Jan 6 15:57 CCCGTCGTGGTA-1.coverage.txt
-rw-r--r-- 1 safina vangalenlab 162181 Jan 6 15:57 GACCGTGCATTT-1.A.txt
-rw-r--r-- 1 safina vangalenlab 160649 Jan 6 15:57 GACCGTGCATTT-1.C.txt
-rw-r--r-- 1 safina vangalenlab 77111 Jan 6 15:57 GACCGTGCATTT-1.G.txt
-rw-r--r-- 1 safina vangalenlab 129431 Jan 6 15:57 GACCGTGCATTT-1.T.txt
Do you have all of these files?
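Checking this by hand gets tedious with many barcodes. A minimal Python sketch that reports, per barcode, which of the Scatter outputs shown in the listing are absent is below. The file-name patterns are an assumption inferred from that listing (sparse_matrices/BARCODE.{A,C,G,T,coverage}.txt and ready_bam/BARCODE.qc.bam plus its .bai), and `missing_scatter_outputs` is a hypothetical helper, not part of maegatk.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed per-barcode outputs, inferred from the directory listing in this thread.
SPARSE_SUFFIXES = ["A.txt", "C.txt", "G.txt", "T.txt", "coverage.txt"]
READY_SUFFIXES = ["qc.bam", "qc.bam.bai"]


def missing_scatter_outputs(outdir: str, barcodes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each barcode to the expected Scatter files that are absent under `outdir`."""
    root = Path(outdir)
    report = {}
    for bc in barcodes:
        expected = [root / "sparse_matrices" / f"{bc}.{s}" for s in SPARSE_SUFFIXES]
        expected += [root / "ready_bam" / f"{bc}.{s}" for s in READY_SUFFIXES]
        missing = [str(p.relative_to(root)) for p in expected if not p.exists()]
        if missing:
            report[bc] = missing  # only barcodes with gaps are reported
    return report
```

An empty result means every listed barcode has its full set of files; any barcode that does appear is a good starting point when searching Scatter's log for the corresponding error.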
Same error here. Did you have a chance to find a fix?
Hi @NBurnaevskiy, were you able to identify which part of the pipeline failed? Try going through the contents of the intermediate files as described in https://github.com/caleblareau/maegatk?tab=readme-ov-file#output-files
@noranekonobokkusu,
Yes, try looking into the snakemake-scatter text log file in the logs/ folder; it should contain some error messages. Also note that a recently added option, --skip-barcodesplit, allows you to skip the barcode-splitting step if it has already finished successfully. This shouldn't matter if you are running maegatk on a small test dataset.
I actually don't have that file.
@NBurnaevskiy, this is new. Are you running it on a test dataset? Are there any errors (or any messages) in the output of this run (e.g., sh.o or sh.e files, or just stdout/stderr)? I also noticed that yesterday you opened and then closed a yaml-related issue. Could it be that you introduced a fix that causes snakemake-scatter not to be executed?
If you are running a test file (which is recommended for debugging), can you delete the entire output directory and rerun it from scratch? Then see what is in logs/base.maegatk.log and whether you have /.internal/parseltongue/snake.scatter.yaml, which contains the instructions for Scatter.
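The checks suggested above can be scripted. Below is a minimal Python sketch that reports which of the two files mentioned in this comment are missing and pulls every line containing "error" (case-insensitive) out of the logs/*.log files. It assumes both paths are relative to the maegatk output directory (consistent with the .internal/parseltongue paths in the logs later in this thread); `check_run_bookkeeping` is a hypothetical helper, and the substring match is deliberately crude.

```python
from pathlib import Path
from typing import Dict, List

# Paths mentioned in this thread, assumed to be relative to the output directory.
EXPECTED = [
    "logs/base.maegatk.log",
    ".internal/parseltongue/snake.scatter.yaml",
]


def check_run_bookkeeping(outdir: str) -> Dict[str, List[str]]:
    """Report missing bookkeeping files and collect error-looking lines from logs/*.log."""
    root = Path(outdir)
    missing = [p for p in EXPECTED if not (root / p).exists()]
    errors = []
    logs = root / "logs"
    if logs.is_dir():
        for log in sorted(logs.glob("*.log")):
            for line in log.read_text(errors="replace").splitlines():
                if "error" in line.lower():  # naive match; may include false positives
                    errors.append(f"{log.name}: {line.strip()}")
    return {"missing": missing, "error_lines": errors}
```

If snake.scatter.yaml is reported missing, Scatter never received its instructions, which points back at the command-line wrapper rather than at the Snakemake rules.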
Yes, we had to introduce a fix into the yaml command. We followed the instructions from the error message and the yaml documentation: "AttributeError: yaml = YAML(typ='unsafe', pure=True) instead of file "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/maegatk/cli.py", line 300 yaml.dump(dict1, yaml_file, default_flow_style=False, Dumper=yaml.RoundTripDumper)" In the cli.py file we changed line 300 from to That allowed the script not to crash at that point. Do you think that could cause the current issue with temp files?
I suspect the recent yaml-related modifications @caleblareau introduced are not present in the current Python package on PyPI.
I just ran a script on the test file (with our current yaml).
I am waiting for our admin to change the yaml file.
You can try installing maegatk locally, which would allow you to modify the code yourself. The old yaml commands are outdated relative to the current version of yaml. I would try updating the maegatk code rather than downgrading the yaml version.
Just got a response from our admin. They did exactly what you suggested and replaced the old yaml file with the updated one. The test run failed with identical symptoms. Yes, we do have the file snake.scatter.yaml in the .internal/parseltongue folder.
Error log from maegatk.snakemake_gather.log:
Config file /home/nburnaevskiy/maegatk_test/output/.internal/parseltongue/snake.gather.yaml is extended by additional config specified via the command line.
all 1
Select jobs to execute...
[Thu Mar 21 10:13:43 2024]
[Thu Mar 21 10:13:43 2024]
Config file /home/nburnaevskiy/maegatk_test/output/.internal/parseltongue/snake.gather.yaml is extended by additional config specified via the command line.
Now the tool produced slightly more results but crashed anyway. Its contents:
all 1
Select jobs to execute...
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
[Thu Mar 21 12:11:51 2024]
Config file /home/nburnaevskiy/maegatk_test/output/.internal/parseltongue/snake.scatter.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Config file /home/nburnaevskiy/maegatk_test/output/.internal/parseltongue/snake.scatter.yaml is extended by additional config specified via the command line.
yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Using shell: /usr/bin/bash yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files. yaml = YAML(typ='rt') and register any classes that you use, or check the tag attribute on the loaded data,
Waiting at most 5 seconds for missing files.
Did they replace the yaml command or the Python file in the installation folder (e.g., ~/.local/lib/python3.9/site-packages/maegatk/)? snakemake_gather failed because it didn't have the required input files. Are you sure you are not getting any other error messages at any stage of running it, that you replaced cli.py entirely and didn't change anything else, and that you didn't have any existing (even empty) output directories like temp_bam and ready_bam before rerunning the command? I don't see a way to reproduce the absence of snakemake_scatter output.
@NBurnaevskiy, I just saw your new comment; let me take a look.
Ksenia, thank you for all your responses.
I see, so the yaml syntax in oneSample_maegatk.py is outdated, too. I will recreate it on my computer (I currently have an older yaml, so everything seems to work) and try fixing it.
We are iteratively working through the issues; the yaml update creates problems in a few places. We then updated the header to change this. Now we get an error:
I am also trying the same. They dropped the load function (https://yaml.readthedocs.io/en/latest/), but I am still figuring out the correct way to rewrite that.
Maybe it should be
It seems dropping the argument altogether allows the pipeline to proceed and start generating files in temp_bam/:
Ok, I can confirm that
I can confirm that the script produced final output files. but hopefully they are not critical. I will now try real data and let you know if it completes. |
Hi
I am trying to run maegatk on my dataset. I have installed all the modules required as stated in the tutorial.
Java, bwa, bedtools, freebayes, and R (4.1.2, with data.table, Matrix, GenomicRanges, SummarizedExperiment). I am running it on Python 3.7.
I have tried to run the program on both the test dataset and my own dataset using the command below:
maegatk bcall --input $bam -o $resul_out -c $ncores -b $barcodes -mr $minReads -z
I keep getting the same error in both instances:
Mon Mar 14 15:46:24 AEST 2022: maegatk v0.1.1
Mon Mar 14 15:46:24 AEST 2022: Found bam file: Data/test_maester.bam for genotyping.
Mon Mar 14 15:46:24 AEST 2022: Will determine barcodes with at least: 100 mitochondrial reads.
Mon Mar 14 15:46:24 AEST 2022: User specified mitochondrial genome matches .bam file
Mon Mar 14 15:46:30 AEST 2022: Finished determining/splitting barcodes for genotyping.
Mon Mar 14 15:46:31 AEST 2022: Genotyping samples with 24 threads
Error in checkGrep(grep(".A.txt", files)) :
Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted
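Judging by its message, the failing check expects exactly one matching file per pattern in the folder that importMito reads. The rough Python re-creation below can show which file is missing or duplicated in that folder; it is not maegatk's R code, and the pattern list is an assumption based on file names mentioned in this thread (for example SAMPLE_NAME.A.txt.gz). Like R's `grep`, it treats ".A.txt" as a regular expression, so "." matches any character.

```python
import re
from typing import Dict, Iterable, List

# Patterns mirroring grep(".A.txt", files) etc.; "." is a regex wildcard, as in R's grep.
PATTERNS = [".A.txt", ".C.txt", ".G.txt", ".T.txt", ".coverage.txt"]


def check_grep(files: Iterable[str]) -> Dict[str, List[str]]:
    """Return each pattern whose match count is not exactly one, with the matching files."""
    files = list(files)
    problems = {}
    for pattern in PATTERNS:
        hits = [f for f in files if re.search(pattern, f)]
        if len(hits) != 1:
            problems[pattern] = hits  # [] = file missing; 2+ entries = extra file present
    return problems
```

Running it on the file names of the final folder (for example via `os.listdir`) returns an empty dict when each pattern matches exactly once.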
I have attached a list of all the files generated (using ls -lRh $result_folder), the scatter log, and the gather log:
test_result_file_list.txt
maegatk.snakemake_scatter.log.txt
maegatk.snakemake_gather.log.txt
Any help would be greatly appreciated.
Thanks