Assn_7_help.txt
# Go to the watermelon_files folder and run the command below to create the BLAST database:
cat nt/*.fasta | makeblastdb -dbtype nucl -out mt_gene_db -title mt_gene_db
# Save the output of your script to a file, for example:
./parseGFF.py watermelon.gff watermelon.fsa > combined_exons.fsa
# Now BLAST the output of your parseGFF script against the database of known sequences:
blastn -query combined_exons.fsa -db mt_gene_db -outfmt "6 qseqid sseqid length pident qcovs" -perc_identity 90 -max_target_seqs 1 > testing_parseGFF.blastn
# cat the file to examine the output
cat testing_parseGFF.blastn
# Each query's best match should be the corresponding gene.
# Exception: sdh3-1 and sdh3-2 are identical, so a match to the other copy is no problem :)
# Otherwise every hit should be a perfect match (pident of 100) with perfect query cover (qcovs of 100).
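# A quick way to spot anything that is not a perfect match, as a minimal sketch.
# It assumes the tab-separated column order from the -outfmt string above
# (qseqid sseqid length pident qcovs). The function name imperfect_hits is
# made up for this example.

```shell
# Print every hit whose percent identity (column 4) or query cover
# (column 5) is below 100. No output means every hit is perfect.
imperfect_hits() {
    awk -F'\t' '$4 < 100 || $5 < 100' "$1"
}
# usage: imperfect_hits testing_parseGFF.blastn
```

# The sdh3-1 / sdh3-2 swap will not show up here, since both copies are identical.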
# Check the number of matches found:
wc -l testing_parseGFF.blastn
# If your script returns only "CDS" features from the GFF, the count should be 39.
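# If the line count comes out higher than expected, it may help to count
# distinct queries instead, because tabular BLAST output has one line per
# alignment and a single query can produce more than one. A minimal sketch;
# the function name count_queries is made up for this example.

```shell
# Count distinct query IDs (column 1) rather than raw lines.
count_queries() {
    cut -f1 "$1" | sort -u | wc -l
}
# usage: count_queries testing_parseGFF.blastn
```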
# NOTE: Are your results not turning out the way you want? I had that experience...
# To resolve it, I looked at the files in the nt folder.
# From the watermelon_files folder, run:
cat nt/*.fasta | grep ">"
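# To narrow the list down to the suspicious lines only, a sketch like the one
# below may help. It assumes the symptom is a header glued onto the end of a
# sequence line (which happens when a file has no final newline), and that a
# normal header never contains a second ">". The function name glued_headers
# is made up for this example.

```shell
# Show lines (with line numbers) where ">" appears anywhere other than
# the first column of the concatenated input.
glued_headers() {
    cat "$@" | grep -n '.>'
}
# usage: glued_headers nt/*.fasta
```

# No output means every header starts on its own line.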
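# This is where I had my problem:
# nad4, nad7, and nad9 need an empty line at the end of their file. Most likely they lack a final newline,
# so cat glues the next file's header onto their last sequence line.
# After adding it, rerun the commands above, starting with makeblastdb.
# This resulted in exactly what I wanted to see.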
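# Instead of editing each file by hand, the fix can be sketched as a small
# shell function. This assumes the problem is a missing final newline; the
# function name ensure_trailing_newline is made up for this example. It edits
# the files in place, so keep a copy of the nt folder if you are unsure.

```shell
# Append a newline to each file whose last byte is not already a newline.
ensure_trailing_newline() {
    for f in "$@"; do
        # $(...) strips a trailing newline, so the result is non-empty
        # only when the final byte is some other character.
        if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
            printf '\n' >> "$f"
        fi
    done
}
# usage: ensure_trailing_newline nt/*.fasta
```

# Files that already end in a newline are left untouched, so it is safe to run more than once.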
# If you are still having issues, the problem might be in your parseGFF.py file.