Providing example files #1
Hi Marcus,
Thank you for your question.
The 'sample_id.txt' file should contain the sample names exactly as they appear in
the vcf file. In the vcf file the sample names are AGG0030, AGG0031 and
AGG0032, but in the 'sample_id.txt' file the sample names are sample1,
sample2, and sample3.
novoCaller also needs unrelated control samples to be present, which the algorithm
uses to judge the quality of the calls. The example vcf file contains only
the three samples that make up the trio. Please try using a vcf file
with a larger number of samples.
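Both points can be checked before a run by comparing the sample ID file against the VCF header. The sketch below is not part of novoCaller: the function names are hypothetical, and it only assumes the standard VCF layout in which sample names start at the tenth column of the `#CHROM` header line.

```python
import gzip

def vcf_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF file."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed; samples start at column 10.
                return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found in " + vcf_path)

def check_sample_ids(vcf_samples, trio_ids):
    """Return (trio IDs missing from the VCF, VCF samples outside the trio).

    The second list is only meaningful once the first one is empty.
    """
    missing = [s for s in trio_ids if s not in vcf_samples]
    controls = [s for s in vcf_samples if s not in trio_ids]
    return missing, controls

# Names taken from this thread: the IDs in the text file do not match the VCF.
missing, controls = check_sample_ids(
    ["AGG0030", "AGG0031", "AGG0032"], ["sample1", "sample2", "sample3"]
)
print("IDs not found in the VCF:", missing)
```

If `missing` is empty but `controls` is also empty, the VCF contains only the trio and has no unrelated samples for the caller to use.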
Best Regards,
Anwoy
On Mon, Jan 7, 2019 at 5:23 PM cmarcuscy wrote:
Hi developers of novoCaller,
I have tried running the first layer of novoCaller with the following
command, but the program just keeps running for over 24 hours without
generating any output data. I am new to bioinformatics, so please correct me
if I have made any mistakes.
Command:
novoCaller -I input.vcf -O step_1_out.txt -T sample_id.txt -X 1 -P 0.005 -E 0.008
vcf:
example.vcf.gz
<https://github.com/bgm-cwg/novoCaller/files/2732605/example.vcf.gz>
sample ID file:
sample_id.txt
<https://github.com/bgm-cwg/novoCaller/files/2732610/sample_id.txt>
It would be very helpful if you can provide example files for the program.
Thanks a lot!
Marcus
Can you please send me the vcf file and the samples.txt file?
On Fri, Jan 11, 2019, 8:15 AM cmarcuscy wrote:
Hi Anwoy,
Thank you for your answers. Following your suggestions, I have included more
samples (261 samples) in the run and made sure the sample names and sample
IDs match, but the program still does not generate any data (after
running for 2 days), and no error message appears. Do you have any
suggestions on how I should troubleshoot?
Thanks a lot!
Regards,
Marcus
Thanks Marcus, I will get back to you soon.
On Sun, Jan 13, 2019 at 9:40 AM cmarcuscy wrote:
Dear Anwoy,
Please find the vcf (first 1000 lines) and samples.txt files below. Thanks!
novocaller_sample.vcf.gz
<https://github.com/bgm-cwg/novoCaller/files/2752795/novocaller_sample.vcf.gz>
novoCaller_samples.txt
<https://github.com/bgm-cwg/novoCaller/files/2752794/novoCaller_samples.txt>
Marcus
Hi Marcus,
The caller was made to read the output of VEP (Variant Effect Predictor),
which is present in the INFO field with the key 'CSQ'. Since VEP was not
run on the vcf file, the caller did not work. Thanks for finding this bug.
I will make it so that the caller gives an error when it doesn't find the
'CSQ' key. You can try running VEP on the file and running the caller again.
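Until the caller reports this itself, the VCF header can be checked for the CSQ annotation before starting a run. This is a minimal sketch, not novoCaller's own code: `csq_fields` is a hypothetical helper, and it relies on VEP declaring its annotation in a `##INFO=<ID=CSQ,...>` header line whose description lists the sub-fields after `Format:`. The header in the example is shortened for illustration.

```python
import re

def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the header, or None if absent."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # VEP lists the sub-fields as 'Format: A|B|C' inside the description.
            match = re.search(r'Format: ([^"]+)"', line)
            return match.group(1).split("|") if match else []
    return None

# Shortened, illustrative header; a real VEP header lists many more sub-fields.
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|ExAC_AF">',
]
fields = csq_fields(header)
if fields is None:
    print("No CSQ annotation found: run VEP on the file first.")
else:
    print("CSQ sub-fields:", fields)
```

A return value of `None` corresponds to the situation described above, where VEP has not been run on the file.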
Best Regards,
Anwoy
On Sun, Jan 13, 2019 at 6:32 PM anwoy mohanty wrote:
Thanks Marcus, I will get back to you soon.
On Sun, Jan 13, 2019 at 9:40 AM cmarcuscy ***@***.***>
wrote:
> Dear Anwoy,
>
> Please find the vcf (first 1000 lines) and samples.txt files below.
> Thanks!
>
> novocaller_sample.vcf.gz
> <https://github.com/bgm-cwg/novoCaller/files/2752795/novocaller_sample.vcf.gz>
>
> novoCaller_samples.txt
> <https://github.com/bgm-cwg/novoCaller/files/2752794/novoCaller_samples.txt>
>
> Marcus
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#1 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AJwCN8W-72wFVYTO--ffC3Bw6L7HBnpVks5vCrGZgaJpZM4ZzXhq>
> .
>
|
Hi Anwoy,
Thank you for your work to fix the bug. I will try running novoCaller after running VEP.
Regards,
Hi Marcus,
Sorry for the late reply. Yes, the caller was made for a Mendelian disease
research team that generally works on cases consisting of one trio when a
de-novo case is suspected, although the code can be modified to give output
for all the trios. The expected number of de-novo mutations per trio in the
coding region (which is where the software looks) is around 1 to 3.
So I would say that 1 call is within the expected number of calls.
If you are interested in running the caller for a large-scale de-novo
study, the code will have to be modified slightly.
Best Regards,
Anwoy
On Mon, Feb 11, 2019 at 9:03 AM cmarcuscy wrote:
Hi Anwoy,
I have tried annotating the vcf with VEP and I now successfully get the
program to run. Nonetheless, I encounter some unexpected results.
infilename=/home/ramsar1971/project/asd/Reannotation/vep/ASD_276.recaliecalls_kggseq_samprm_vep.vcf
trio_ID_filename=/home/ramsar1971/project/asd/Reannotation/ASD88_Trio_novocaller.txt
outfilename=/home/ramsar1971/project/asd/Reannotation/vep/ASD_276_step1_out.txt
X_choice=1
PP_thresh=0.005
ExAC_thresh=0.008
vcf_line_cols:
------------------------------
0 1 2 3 4 5 6 7 8
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
total_candidates=261
end_col=260
number of parents = 258
number of children = 3
parent_cols=
3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99
:100:
101:102:103:104:105:106:107:108:109:110:111:112:113:114:115:116:117:118:119:120:121:122:123:124:125:126:127:128:129:130:131:132:133:134:135:136:137:138:139:140:141:142:143:144:145:146:147:148:149:150:151:152:153:154:155:156:157:158:159:160:161:162:163:164:165:166:167:168:169:170:171:172:173:174:175:176:177:178:179:180:181:182:183:184:185:186:187:188:189:190:191:192:193:194:195:196:197:198:199:200:201:202:203:204:205:206:207:208:209:210:211:212:213:214:215:216:217:218:219:220:221:222:223:224:225:226:227:228:229:230:231:232:233:234:235:236:237:238:239:240:241:242:243:244:245:246:247:248:249:250:251:252:253:254:255:256:257:258:259:260:
trio_set=
1:2:0:
CSQ_ExAC_AF_col=32
It seems that the program only recognizes three sets of trios among the 88
trios included.
Another point to note is that the output only contains 1 candidate DN
mutation:
Do you have any idea? Thanks!
Input vcf:
1000_novocaller.vcf.gz
<https://github.com/bgm-cwg/novoCaller/files/2849782/1000_novocaller.vcf.gz>
Input txt file:
pedigree.txt
<https://github.com/bgm-cwg/novoCaller/files/2849783/pedigree.txt>
Output file:
novocaller_step1_out.txt
<https://github.com/bgm-cwg/novoCaller/files/2849784/novocaller_step1_out.txt>
Marcus
Hi Anwoy,
I have already run VEP on my VCF, so it contains the CSQ annotation. Here is my command to run novoCaller:
./novoCaller -I 11.vcf -O SSC02220.txt -T trio_ids.txt -X 1 -P 0.5 -E 0.008
The trio_ids.txt file looks like this:
SSC02220 SSC02219 SSC02217
The 11.vcf file is a quad VCF with 4 individuals in it. Can novoCaller work on quad VCFs, or is something wrong with my command line?
Sorry to ask you so many trivial questions.
Best Regards,
Aojie
Hi Liangdy,
the unrelated samples can be samples with normal phenotype, or samples with
other diseases.
Best Regards,
Anwoy
On Mon, Mar 18, 2019 at 3:39 PM liangdyGao wrote:
Hi Anwoy,
I am perplexed about unrelated control samples.
Are the unrelated samples those with a normal phenotype, those with other
diseases, or different samples that have the same phenotype?
I am new to bioinformatics. There's so much that I don't understand.
Sorry to ask you so many trivial questions
Thanks a lot!
Liangdy
The unrelated samples must also not be related to the proband (cousins,
uncles, aunts etc. of the proband are not preferred).
The AD information (allele depth) is needed in as many unrelated samples as
possible as that information is used to judge the quality of the de-novo
call.
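One way to see how much AD information survives a merge is to count, per VCF record, the samples whose AD entry is present. The sketch below is an illustration only: `samples_with_ad` is a hypothetical helper, and the FORMAT string `GT:AD:DP:GQ:PL` is an assumption, since the thread shows the sample columns but not the FORMAT column.

```python
def samples_with_ad(format_field, sample_fields):
    """Count the samples whose AD (allele depth) entry is present and not missing."""
    keys = format_field.split(":")
    if "AD" not in keys:
        return 0
    ad_index = keys.index("AD")
    count = 0
    for sample in sample_fields:
        values = sample.split(":")
        if ad_index >= len(values):
            continue  # trailing fields were dropped for this sample
        # Treat '.', '.,.' and empty strings as missing allele depths.
        if not all(part in (".", "") for part in values[ad_index].split(",")):
            count += 1
    return count

# Assumed FORMAT column; sample values are the ones quoted in this thread.
fmt = "GT:AD:DP:GQ:PL"
merged_vcfs = ["1/0:10,0:10:27:0,27,405", ".:.:.:.:.", ".:.:.:.:."]
joint_called = [
    "1/0:10,0:10:27:0,27,405",
    "0/0:10,0:10:27:0,27,405",
    "0/0:12,0:12:30:0,30,450",
]
print(samples_with_ad(fmt, merged_vcfs), "of", len(merged_vcfs), "samples carry AD")
print(samples_with_ad(fmt, joint_called), "of", len(joint_called), "samples carry AD")
```

Records where most unrelated samples lack AD give the caller little to judge the call against, which is the concern raised in the reply above.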
On Tue, Mar 19, 2019 at 7:38 AM liangdyGao wrote:
Hi Anwoy,
Thank you for your answers.
If we merge multiple vcf files with vcftools or bcftools, the unrelated
sample information in the merged file may look as follows:
#CHR POS ... AGG0002 AGG0003 AGG0001
Q X ... 1/0:10,0:10:27:0,27,405 .:.:.:.:. .:.:.:.:.
AGG0003 and AGG0001 lose information such as DP, PQ and so on.
When merging at the bam level with GATK, the information above is
preserved, but the computational cost is obviously higher.
#CHR POS ... AGG0002 AGG0003 AGG0001
Q X ... 1/0:10,0:10:27:0,27,405 0/0:10,0:10:27:0,27,405 0/0:12,0:12:30:0,30,450
Which approach is more suitable for DNM calling in order to maximize
accuracy and eliminate false negatives?
Or do these have almost no effect on the final result?
Sorry to ask you so many trivial questions just like before
Thanks a lot!
Liangdy
@anwoy Thank you for the tips! Can you provide an example of the runtime for an exome trio? For a full genome trio? Can it be scaled to run on a pvcf with 50K samples?