SNP calling #11
Step 1: Produce a file of accession numbers for all of the sequences we have phenotypic data for: CornellPostdoc/misc_scripts.sh, line 11 in 9dfb027
Step 2: Download the sequences from NCBI using CornellPostdoc/get_SRR_data.sh, lines 4 to 9 in 9dfb027
Step 3: Create unmapped BAMs from FASTQs using lines 12 to 28 in 9dfb027
Step 4: Mark Illumina adapters: lines 36 to 53 in 41c55b7
Step 5: Validate SAM file: lines 61 to 72 in 41c55b7
Step 6: Convert from SAM to FASTQ: lines 78 to 94 in 41c55b7
Need a reference genome for the next step. Asking Stanhope/lab Slack for recommendations on which version/strain to use.
Stanhope recommends using the PANTHER database reference genome: Escherichia coli | E. coli | ECOLI | EnsemblGenome | Reference Proteome 2020_04 (https://www.ebi.ac.uk/reference_proteomes/). The E. coli reference: ftp://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/Bacteria/UP000000625_83333.fasta.gz
^ That is actually a proteome, so GATK didn't work. Need a GENOME:
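A quick check before handing a FASTA file to GATK can catch the proteome-versus-genome mix-up described above. The sketch below is only a heuristic, not part of the scripts in this repository: it assumes a file is nucleotide if at least 90% of its first residues are A, C, G, T, U, or N. The function names, the threshold, and the example sequences are all hypothetical.

```python
import gzip
import io

# Characters expected in a DNA/RNA sequence (assumption: other IUPAC
# ambiguity codes are rare enough to ignore for this heuristic).
NUCLEOTIDE_CODES = set("ACGTUN")


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def looks_like_nucleotide(handle, max_residues=10_000, threshold=0.9):
    """Return True if the first residues of a FASTA stream look like DNA/RNA."""
    counted = 0
    nucleotide = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        for ch in line.upper():
            counted += 1
            if ch in NUCLEOTIDE_CODES:
                nucleotide += 1
        if counted >= max_residues:
            break  # a sample of the file is enough
    if counted == 0:
        raise ValueError("no sequence lines found")
    return nucleotide / counted >= threshold


# Made-up records for illustration only.
protein_example = io.StringIO(">example_protein\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ\n")
genome_example = io.StringIO(">example_contig\nATGACCGGTTAACGN\n")

print(looks_like_nucleotide(protein_example))  # False
print(looks_like_nucleotide(genome_example))   # True
```

For a downloaded file, something like `looks_like_nucleotide(open_fasta("UP000000625_83333.fasta.gz"))` would be the equivalent call, assuming the file has been fetched locally under that name.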
How to pick the best reference genome? We want to find SNPs that are associated with particular resistance phenotypes. However, we could also build a consensus sequence and call SNPs in the samples against it.
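To make the consensus idea concrete, here is a toy sketch. It assumes the sequences are already aligned to the same length, takes the most common base in each column as the consensus, and then reports each sample's differences from it. This is not how GATK calls variants (there is no alignment, base quality, or genotype likelihood here), and the isolate names and sequences are invented for illustration.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*aligned.values())
    # most_common(1) returns the most frequent base in the column
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def snps_against(reference, aligned):
    """List (1-based position, reference base, sample base) per sample."""
    calls = {}
    for name, seq in aligned.items():
        calls[name] = [
            (pos + 1, ref, alt)
            for pos, (ref, alt) in enumerate(zip(reference, seq))
            if ref != alt
        ]
    return calls


# Hypothetical aligned fragments from three isolates.
samples = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "ACCTACGT",
}

cons = consensus(samples)
calls = snps_against(cons, samples)
print(cons)   # ACGTACGT
print(calls)  # {'isolate_A': [], 'isolate_B': [(8, 'T', 'A')], 'isolate_C': [(3, 'G', 'C')]}
```

One trade-off this makes visible: with a consensus, a SNP means "differs from the majority of our own samples", whereas with a published reference genome it means "differs from that strain", so the choice changes which sites show up as variants in the association analysis.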
WDL notes
Dockstore
Decided to choose a reference genome that is from a canine.
Genome chosen:
AMR genotypes:
Creating unmapped BAMs from FASTQ files
Using GATK Best Practices