|
| 1 | +# Previous assembly releases of T2T-CHM13 |
| 2 | + |
| 3 | +## v1.1 |
| 4 | +[Complete T2T reconstruction of a human genome](https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v1.1.fasta.gz). |
| 5 | + Changes from v1.0 include filled rDNA gaps and improved polishing within telomeres. |
| 6 | + One rare heterozygous variant causing a premature stop codon was changed at chr9:134589924 to the more common allele. |
| 7 | + Also available at [NCBI GCA_009914755.3](https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.3). |
| 8 | + Changes made from v1.0 to v1.1 are available as a [VCF](https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/changes/v1.0_to_v1.1/v1.0_patch.vcf.gz). |
| 9 | + |
| 10 | +## v1.0 |
| 11 | +[Complete T2T reconstruction of a human genome](https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v1.0.fasta.gz), |
| 12 | + with the exception of 5 known gaps within the rDNA arrays. |
| 13 | + Polished assembly based on v0.9. Introduces 4 structural corrections and 993 small variant corrections, including a 4 kb telomere extension on chr18. |
| 14 | + Polishing was performed using a conservative custom pipeline based on DeepVariant calls and structural corrections were manually curated. |
| 15 | + Consensus quality exceeds Q60. Prior to a preprint being drafted, a brief summary can be found at this [blog post](https://genomeinformatics.github.io/CHM13v1/). |
| 16 | + Also available at [NCBI GCA_009914755.2](https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.2). |
| 17 | + Changes made from v0.9 to v1.0 are available as a [VCF](https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/changes/v0.9_to_v1.0/v0.9_patch.vcf.gz). |
| 18 | + |
| 19 | +## v0.9 |
| 20 | +[T2T reconstruction of all 23 chromosomes of CHM13](https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v0.9.fasta.gz) based on a custom assembly pipeline, briefly featuring: |
| 21 | + |
| 22 | +1. Homopolymer-compression and self-correction of Pacbio HiFi reads |
| 23 | +2. Rescoring of overlaps to account for recurrent Pacbio HiFi errors |
| 24 | +3. Construction and custom pruning of a string graph built over 100% identical overlaps |
| 25 | +4. Manual reconstruction on chromosomal paths through the graph, if necessary aided by ultra-long Nanopore reads |
| 26 | +5. Layout/consensus of original HiFi reads, corresponding to the resulting paths |
| 27 | +6. Patching of regions absent from HiFi data with v0.7 draft sequences |
| 28 | + |
| 29 | +Consensus quality exceeds Q60. Mitochondrial sequence DNA included. Centers of the 5 rDNA arrays are represented by N-gaps. |
| 30 | + |
| 31 | +## v0.7 |
| 32 | +[Assembly draft v0.7](https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v0.7.fasta.gz) was generated with [Canu v1.7.1](https://github.com/marbl/canu) |
| 33 | + including rel1 data up to 2018/11/15 and incorporating the previously released PacBio data. |
| 34 | + Two gaps on the X plus the centromere were manually resolved. Contigs with low coverage support were split and the assembly was scaffolded with BioNano. |
| 35 | + The assembly was polished with two rounds of [nanopolish](https://github.com/jts/nanopolish) and two rounds of [arrow](https://github.com/PacificBiosciences/GenomicConsensus). |
| 36 | + The X polishing was done using unique markers matched between the assembly and the raw read data, the rest of the genome used traditional polishing. |
| 37 | + Finally, the assembly was polished with 10X Genomics data. |
| 38 | + We [validated](https://github.com/skoren/bacValidation) the assembly using [independent BACs](https://www.ncbi.nlm.nih.gov/nuccore/?term=VMRC59). |
| 39 | + The overall QV is estimated to be Q37 (Q42 in unique regions) and the assembly resolves over 80% of available CHM13 BACs (280/341). |
| 40 | + The assembly is 2.94 Gbp in size with 359 scaffolds (448 contigs) and an NG50 of 83 Mbp (70 Mbp). |
| 41 | + Outside of Chr8 and ChrX, this should be considered a draft and likely has mis-assemblies. |
| 42 | + Older unpolished assemblies are available for benchmarking purposes, but are of lower quality and should not be used for analyses. |
| 43 | + Also available at [NCBI GCA_009914755.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.1). |
| 44 | + |
| 45 | +## Downloads |
| 46 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v1.1.fasta.gz">Assembly draft v1.1</a> (md5: 1cab2b2776005cdf339ec9f283ba2c70) |
| 47 | + - Annotation from <a href="https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit">CAT</a> and <a href="https://github.com/agshumate/Liftoff">Liftoff</a> |
| 48 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v1.1.gene_annotation.v4.gff3.gz">annotation gff3 file</a> (md5: 14865ece7fe6367b8e2b06776a3d522f) |
| 49 | + - Telomere identified by the <a href="https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere">VGP</a> pipeline |
| 50 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v1.1.telomere.bed.gz">telomere bed file</a> (md5: d6b148d16bf303e25552e381cddff9df) |
| 51 | + - Liftover from v1.0 to v1.1 |
| 52 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/changes/v1.0_to_v1.1/v1.0_to_v1.1_rdna_merged.chain">chain file</a> (md5: 804d2a81dbf79199fa637f6bbed9a1a8) |
| 53 | + - Liftover from v1.1 to v1.0 |
| 54 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/changes/v1.0_to_v1.1/v1.1_to_v1.0_rdna_merged.chain">chain file</a> (md5: 03180ca0210957e85affc72bb7083b2b) |
| 55 | + - Alignments (the index bai file is available under the same name as the bam with .bai appended (e.g. chm13.draft_v1.1.hifi_20k.wm_2.0.1.pri.bam has a chm13.draft_v1.1.hifi_20k.wm_2.0.1.pri.bam.bai) |
| 56 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.1.hifi_20k.wm_2.01.pri.bam">PacBio HiFi alignments (generated via Winnowmap v2.01 -x map-pb)</a> (md5: ab6b38cb00efa919f6d93bc89787a121) |
| 57 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.1.ont_guppy_3.6.0.wm_2.01.pri.bam">Oxford nanopore Guppy alignments (generated via Winnowmap v2.01 -x map-ont)</a> (md5: 5cb543ac85513995893015a3709806f4) |
| 58 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.1.pcrfree.bam">PCRFree Illumina alignments (generated via bwa mem v0.7.15)</a> (md5: bb41008d0f5de787d26896fb49027420) |
| 59 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v1.0.fasta.gz">Assembly draft v1.0</a> (md5: 6d827b6512562630137008830c46e1ac) |
| 60 | + - Annotation from <a href="https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit">CAT</a> and <a href="https://github.com/agshumate/Liftoff">Liftoff</a> |
| 61 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v1.0.gene_annotation.v4.gff3.gz">annotation gff3 file</a> (md5: a39f18f553d5a426eaef9cfd4f858bf6) |
| 62 | + - Telomere identified by the <a href="https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere">VGP</a> pipeline |
| 63 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v1.0.telomere.bed.gz">telomere bed file</a> (md5: 5cdca0c8b563b87f7a624d61ae0b5497) |
| 64 | + - Liftover from hg38 to v1.0 (all files from <a href="https://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.0/hg38Lastz/">UCSC Genome Browser</a>) |
| 65 | + - <a href="http://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.0/hg38Lastz/hg38.t2t-chm13-v1.0.over.chain.gz">chain file</a> (md5: ade08feeb01b75644cb1da383ebaa607) |
| 66 | + - Liftover from v1.0 to hg38 |
| 67 | + - <a href="http://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.0/hg38Lastz/t2t-chm13-v1.0.hg38.over.chain.gz">chain file</a> (md5: 9edff5e020cc3f170350ff78fbe01d5c) |
| 68 | + - Alignments (the index bai file is available under the same name as the bam with .bai appended (e.g. chm13.draft_v1.0.wm_2.01.hifi.pri.bam has a chm13.draft_v1.0.wm_2.01.hifi.pri.bam.bai) |
| 69 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.0.clr_p6c4.wm_2.01.pri.bam">PacBio CLR alignments (generated via Winnowmap v2.01 -x map-pb-clr)</a> (md5: 235e23c72676279714a091fb226f3b1a) |
| 70 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.0.hifi_20k.wm_2.01.pri.bam">PacBio HiFi alignments (generated via Winnowmap v2.01 -x map-pb)</a> (md5: 2380bee4c3544d179b51cf22846e33ab) |
| 71 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.0.ont_guppy_3.6.0.wm_2.01.pri.bam">Oxford nanopore Guppy alignments (generated via Winnowmap v2.01 -x map-ont)</a> (md5: 5a012ae791f48678b829da6770216f5d) |
| 72 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.0.ont_bonito_0.3.1.wm_2.01.pri.bam">Oxford nanopore Bonito alignments (generated via Winnowmap v2.01 -x map-ont)</a> (md5: 84b0b9d5935140ead1d032b0a1610c39) |
| 73 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v1.0.pcrfree.bam">PCRFree Illumina alignments (generated via bwa mem v0.7.15)</a> (md5: 9143c6d6dc3e8f537c49f43f9e6cbedd) |
| 74 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v0.9.fasta.gz">Assembly draft v0.9</a> (md5: 05fd40ffc5d68a9b6754773a56381db8) |
| 75 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v0.9.region.bed.gz">Regions patched by non-HiFi data & rDNA loci</a> (md5: a754f98d5e960b3d1e9029cba4414cf2) |
| 76 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/graph/chm13.draft_v0.9.simplified.compress.gfa.gz">v0.9 assembly graph in GFA format (built over homopolymer-compressed HiFi reads)</a> (md5: df2218db9ebbcd239d07d2544372cfa5) |
| 77 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/graph/chm13.draft_v0.9.simplified.nodes.fasta.gz">Consensus sequences for individual nodes of the v0.9 assembly graph (since the sequence is not homopolymer compressed, the lengths and overlap sizes will not match the GFA!)</a> (md5: 086d3d968b2c8cbc8c4be891e56ad177) |
| 78 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/graph/chm13.draft_v0.9.layouts.tgz">Genomic paths through the v0.9 graph (part of chr9 was reconstructed by a different assembly method excluded)</a> (md5: 913205d75f5f9c49e5269eb4363fbf16) |
| 79 | + - Alignments (the index bai file is available under the same name as the bam with .bai appended (e.g. chm13.draft_v0.9.clr.bam has a chm13.draft_v0.9.clr.bam.bai) |
| 80 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v0.9.clr.bam">PacBio CLR alignments (generated via Winnowmap v1.11 -x map-pb-clr)</a> (md5: 7cd9c812e4398db6ed318969fe7080f9) |
| 81 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v0.9.hifi.bam">PacBio HiFi alignments (generated via Winnowmap v1.11 -x map-pb)</a> (md5: 7527b44aba07d9acbed597fbc445b61a) |
| 82 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v0.9.ont.bam">Oxford nanopore alignments (generated via Winnowmap v1.11 -x map-ont)</a> (md5: 4a5bbf70193e65c35a287a70099bb99c) |
| 83 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.draft_v0.9.pcrfree.bam">PCRFree Illumina alignments (generated via bwa mem v0.7.15)</a> (md5: 7c13fd36ae404eb41697ec5d54ba608f) |
| 84 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.chrX_v0.7.fasta.gz">Chromosome X v0.7</a> (md5: 89b3dd61db66177dd830527b920956fa) |
| 85 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/nanopore/rel1/rel1_to_v0.7_chrX.filtered.bam">Chromosome X v0.7 Nanopore rel1 unique k-mer anchored mappings</a> (md5: ada12a00d4781f6b0101a09be19abe93) |
| 86 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.chrX_v0.7.pacbioHiFi.bam">Chromosome X v0.7 PacBio HiFi unique k-mer anchored mappings</a> (md5: bd22daaf6d4a2cd775f109a853a911a9) |
| 87 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/alignments/chm13.chrX_v0.7.pacbioCLR.bam">Chromosome X v0.7 PacBio CLR unique k-mer anchored mappings</a> (md5: 69be7bd105ee590bf57853c249e1f8d8) |
| 88 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.chr8_v9.fasta.gz">Chromosome 8 v9</a> (md5: cc33037728ab1f743d3e79f85e8c10ac) |
| 89 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/nanopore/rel5/rel5_to_chr8_v9.filtered.bam">Chromosome 8 v9 Nanopore rel5 unique k-mer anchored mappings</a> (md5: e953525b097c98d8485a3a7b152da897) |
| 90 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v0.7.fasta.gz">Assembly draft v0.7</a> (md5: b9777540aaa0251c7dbb4974fb0a69d6) |
| 91 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v0.6.fasta.gz">Assembly draft v0.6</a> (md5: c3e3318e82ba5dc64b74f458f4989b85) |
| 92 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v0.4.fasta.gz">Assembly draft v0.4</a> (md5: 7e3c2fff9479ba45f7916fa1eee1310b) |
| 93 | + - <a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/HG002.chrX_v0.7.fasta.gz">HG002 chrX draft v0.7 (not T2T, missing p-arm PAR region)</a> (md5: 1d79ac022424fc5671135e2ac362d91d) |
| 94 | + |
0 commit comments