Steps to implement checkm2 quality report to dRep #220

mpdoane2 · 2023-12-15T05:34:57Z

The CheckM2 quality report can be used with dRep. You will first need to convert the CheckM2 quality report to a .csv file using:

awk -F'\t' 'BEGIN {OFS=","} {print $1, $2, $3}' quality_report.tsv > new_file_name.csv

In the new file, change the headings to: genome,completeness,contamination
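The conversion and the header rename can probably be done in a single awk pass. This is a minimal sketch, assuming (as the command above implies) that the first three columns of quality_report.tsv hold the genome name, completeness, and contamination. The function name `checkm2_to_genomeinfo` is hypothetical and not part of dRep or CheckM2.

```shell
# Hypothetical helper: convert a CheckM2 quality report (TSV) into a
# genomeInfo-style CSV, replacing the header row in the same pass.
# Assumes columns 1-3 are genome name, completeness, contamination.
checkm2_to_genomeinfo() {
    local report="$1"
    awk -F'\t' 'BEGIN { OFS = "," }
        NR == 1 { print "genome", "completeness", "contamination"; next }
        { print $1, $2, $3 }' "$report"
}

# Usage:
#   checkm2_to_genomeinfo quality_report.tsv > new_file_name.csv
```

The output still carries the genome names exactly as they appear in the report, so they may need further adjustment to match the names dRep expects.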

The dRep command to use the checkm2 output instead of checkm_genome, which is currently the default, is:

dRep dereplicate output --genomeInfo new_file_name.csv -g bins/*.fna

Just thought I would write it out in case others were facing similar issues.

-Mike

MrOlm · 2023-12-15T17:38:47Z

Thanks for this, @mpdoane2. When I get a chance I'll add this to the documentation and cite you / this issue.

achenderson · 2024-05-08T08:25:08Z

Hi, thanks for the information for converting the checkm output to a csv!

I tried running drep with the --genomeInfo flag and the csv, but when I check the log for the job, it is still running checkm. Is there another flag I need to add? Thanks!

MrOlm · 2024-05-08T16:06:41Z

Hi @achenderson - my guess is that there is a mismatch between the "genome" names provided in the genomeInfo file and the "genome" names loaded by dRep. If you check the file Bdb.csv in the dRep run (even if the run isn't complete) you'll see the names that dRep wants you to use.

Best,
Matt
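One way to spot such a mismatch is to list the names in the genomeInfo file that do not appear in Bdb.csv. This is a rough sketch, assuming both files are CSVs with a header row and the genome name in the first column. The function name `missing_from_bdb` is hypothetical.

```shell
# Hypothetical helper: print genome names from the genomeInfo CSV that are
# not among the names dRep loaded (taken from Bdb.csv).
# Assumes both files have a header row and the genome name in column 1.
missing_from_bdb() {
    local genome_info="$1" bdb="$2"
    awk -F',' '
        FNR == 1 { next }                  # skip the header row of each file
        NR == FNR { known[$1] = 1; next }  # first file (Bdb.csv): record names
        !($1 in known) { print $1 }        # second file: report unmatched names
    ' "$bdb" "$genome_info"
}

# Usage:
#   missing_from_bdb new_file_name.csv path/to/Bdb.csv
```

If this prints nothing, every name in the genomeInfo file has a counterpart in Bdb.csv. Any name it does print is a candidate for the mismatch described above.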

achenderson · 2024-05-10T10:02:32Z

That's it! Thank you :)

CJREID · 2024-07-25T01:57:44Z

Hi there,

I think I've run into this issue, but as I have ~60k MAGs, CheckM is taking a long time and the Bdb.csv file is not present.

As an example, my genome files look like /scratch/usr/SBsP_T2_sr_metabat2_refined.002.fna and the corresponding cell in the .csv is /scratch/usr/SBsP_T2_sr_metabat2_refined.002 but it is still running checkM. Is it possibly due to the '.' before 002? Are the full paths unnecessary? Or is the .fna extension an issue?

Thanks,
Cam

MrOlm · 2024-07-25T03:35:53Z

Hi @CJREID - are you running checkM within dRep, or are you running checkM2 outside of dRep?

CJREID · 2024-07-25T03:56:45Z

Hi Matt,

I ran checkM2 outside of dRep and formatted it as described above for dRep. It worked once I added the .fna extension to the names in the genomeInfo file. I was confused because the help message says this file must contain "genome" (basename of .fasta file of that genome), so I assumed this was the name without the .fna extension. Perhaps the 'basename' bit could be changed in the help message?

Thanks,
Cam
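The fix described above (adding the .fna extension to each name) could be scripted along these lines. This is a sketch only, assuming a genomeInfo CSV with a header row and the genome name in the first column. The function name `add_fasta_extension` and its optional second argument are hypothetical.

```shell
# Hypothetical helper: append a FASTA extension to each genome name, e.g.
# "SBsP_T2_sr_metabat2_refined.002" -> "SBsP_T2_sr_metabat2_refined.002.fna".
# Assumes a CSV with a header row and the genome name in column 1.
add_fasta_extension() {
    local genome_info="$1" ext="${2:-fna}"
    awk -F',' -v ext="$ext" 'BEGIN { OFS = "," }
        NR == 1 { print; next }   # keep the header row unchanged
        {
            # Append the extension only if the name does not already end with it.
            if ($1 !~ ("\\." ext "$")) $1 = $1 "." ext
            print
        }' "$genome_info"
}

# Usage:
#   add_fasta_extension new_file_name.csv > genomeInfo_with_ext.csv
```

The sketch leaves any directory prefix in the names untouched, since the thread does not settle whether full paths are accepted.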

MrOlm · 2024-07-25T16:48:28Z

Hi @CJREID - thanks for the update and for the suggestion. I'll update it in the next version of dRep.

Best,
Matt

MrOlm added the documentation label Dec 15, 2023
