This repository has been archived by the owner on May 15, 2020. It is now read-only.

filter-bed parameter / option #87

Open
BenoitFiset opened this issue May 25, 2018 · 4 comments

Comments

@BenoitFiset

Hi Eric,

I have a question about --filter-bed for Tumor-normal-enrichment. I was using the filter13.bed file to filter out the centromeres, and all was fine and dandy: I got results.

I then created a BED file (0-based, as it should be) to filter out everything that is not an exon, centromeres included, so that only exon regions are kept. I expected the resulting CNV files to be smaller than when I filter only the centromeres.

Sample of the bed file: (total of 33491 lines in the whole file)

1	11868	31109
1	34553	36081
1	52472	53312
1	57597	64116
1	65418	71585
1	89294	134836
1	135140	135895
1	137681	137965
1	139789	140339
1	141473	173862
1	182695	184174
1	185216	195411
1	257863	522928
1	586070	859446
1	868070	877234
1	904833	915976
1	916864	921016
1	923927	959309
1	960586	965715
1	966496	982093
1	995965	998051

But no, there are many more CNVs called, lots more.

2465 lines in the CNV.vcf with filter13.bed (centromere filter file)
16944 lines in the CNV.vcf with the non-exon filter BED file

Another interesting thing: the EstimatedTumorPurity and OverallPloidy values are much better with the exon-only regions.

filter13.bed - (filter centromere) results:

##EstimatedTumorPurity=0.80
##PurityModelFit=0.0400
##InterModelDistance=0.7024
##LocalSDmetric=5.21
##Heterogeneity=0.00
##EstimatedChromosomeCount=49.86
##OverallPloidy=2.12

Non-exon filter BED file results:

##EstimatedTumorPurity=0.99
##PurityModelFit=0.0312
##InterModelDistance=0.0293
##LocalSDmetric=3.09
##Heterogeneity=0.00
##EstimatedChromosomeCount=76.14
##OverallPloidy=3.25
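
A raw line count of a VCF includes the header lines and any records that did not pass filters, so comparing the two runs by record count and by the `##` header metrics may be more informative. The following is a minimal Python sketch of that comparison; `summarize_vcf` and the toy `DEMO_VCF` text are illustrative names and data, not part of Canvas, and the records in the demo are made up.

```python
import io

# Toy VCF text for illustration only; the two records are invented.
DEMO_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "##EstimatedTumorPurity=0.80",
    "##OverallPloidy=2.12",
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tN\t<CNV>\t30\tPASS\tEND=2000",
    "1\t5000\t.\tN\t<CNV>\t5\tq10\tEND=6000",
]) + "\n"


def summarize_vcf(handle):
    """Return (header metrics, record count, PASS record count) for a VCF stream."""
    metrics = {}
    records = 0
    passed = 0
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Simple meta lines look like ##EstimatedTumorPurity=0.80;
            # structured ones such as ##INFO=<...> are skipped.
            key, sep, value = line[2:].partition("=")
            if sep and not value.startswith("<"):
                metrics[key] = value
        elif line.startswith("#") or not line:
            continue  # column header line or blank line
        else:
            records += 1
            fields = line.split("\t")
            if len(fields) > 6 and fields[6] == "PASS":
                passed += 1
    return metrics, records, passed


metrics, records, passed = summarize_vcf(io.StringIO(DEMO_VCF))
print(metrics["EstimatedTumorPurity"], records, passed)
```

Running the same function on both CNV.vcf files (opened with `open(path)`) gives numbers that can be compared side by side.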

Any thoughts on this? Why are more CNVs called when I filter out more regions?
Is my understanding of the --filter-bed option correct (the file lists the regions you want to exclude)?
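
If that reading is right, a filter file that keeps only exons has to list the complement of the exon intervals, so it is worth double-checking whether the BED file holds the exons themselves or the gaps between them. The sketch below computes such a complement in Python; `complement` is a hypothetical helper, the three intervals are the first lines of the sample above, and the chromosome 1 length is the GRCh38 value that appears later in this thread.

```python
def complement(intervals, chrom_lengths):
    """Yield (chrom, start, end) gaps not covered by any interval.

    Coordinates are 0-based and half-open, as in BED.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    for chrom, length in chrom_lengths.items():
        cursor = 0  # first position not yet covered
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                yield chrom, cursor, start
            cursor = max(cursor, end)  # also handles overlapping intervals
        if cursor < length:
            yield chrom, cursor, length


sample = [("1", 11868, 31109), ("1", 34553, 36081), ("1", 52472, 53312)]
for chrom, start, end in complement(sample, {"1": 248956422}):
    print(f"{chrom}\t{start}\t{end}")
```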

Thanks.

@eroller
Member

eroller commented May 25, 2018

Instead of using the filter bed file to exclude regions in your manifest file, can't you just remove them from the manifest file? I don't know if that would make a difference, but the filter bed file is intended to be used globally for all samples and not adjusted per sample.

The reason you end up with a poorer model fit may be because you are limiting the number of regions. More data will give more points for fitting the model and result in a better fit.

You could always filter the regions after the VCF has been produced. Is there a reason you want to filter the regions during CNV calling? As long as you have valid read data in those regions I would use them for CNV calling unless there is a strong reason not to (e.g. runtime constraint, poor read quality in those regions).
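
Filtering after the fact could look roughly like the Python sketch below, which drops VCF records that overlap any excluded BED interval. It assumes the CNV records carry their end coordinate as `END=` in the INFO column (the usual VCF convention for structural variants) and falls back to the REF length otherwise; `load_bed`, `record_span` and `drop_overlapping` are hypothetical helper names, not Canvas functions.

```python
import re


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def record_span(fields):
    """Return the 0-based half-open span of a VCF record."""
    pos = int(fields[1])  # VCF POS is 1-based
    match = re.search(r"(?:^|;)END=(\d+)", fields[7])
    # END is 1-based inclusive, which equals the 0-based exclusive end.
    end = int(match.group(1)) if match else pos + len(fields[3]) - 1
    return pos - 1, end


def drop_overlapping(vcf_lines, regions):
    """Yield header lines plus records that overlap none of the regions."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = record_span(fields)
        hits = regions.get(fields[0], [])
        if not any(start < r_end and r_start < end for r_start, r_end in hits):
            yield line
```

The linear scan over intervals keeps the sketch short; with tens of thousands of intervals, sorting them and using `bisect`, or a dedicated tool, would likely be faster.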

@BenoitFiset
Author

BenoitFiset commented Aug 24, 2018

Hi Eric,

To use Canvas Tumor-normal-enrichment on whole-genome data, what should I use as a manifest file, given that this option seems mandatory?

If I create my own manifest file covering all regions, will Canvas break?

[Header]
Manifest Version	1
ReferenceGenome	Homo_sapiens\Ensembl\GRCh38\Sequence\WholeGenomeFASTA

[Regions]
Name	Chromosome	Start	End	Upstream Probe Length	Downstream Probe Length
CEX-1-1-248956422	1	1	248956422	0	0
CEX-2-1-242193529	2	1	242193529	0	0
CEX-3-1-198295559	3	1	198295559	0	0
CEX-4-1-190214555	4	1	190214555	0	0
CEX-5-1-181538259	5	1	181538259	0	0
CEX-6-1-170805979	6	1	170805979	0	0
CEX-7-1-159345973	7	1	159345973	0	0
CEX-8-1-145138636	8	1	145138636	0	0
CEX-9-1-138394717	9	1	138394717	0	0
CEX-10-1-133797422	10	1	133797422	0	0
CEX-11-1-135086622	11	1	135086622	0	0
CEX-12-1-133275309	12	1	133275309	0	0
CEX-13-1-114364328	13	1	114364328	0	0
CEX-14-1-107043718	14	1	107043718	0	0
CEX-15-1-101991189	15	1	101991189	0	0
CEX-16-1-90338345	16	1	90338345	0	0
CEX-17-1-83257441	17	1	83257441	0	0
CEX-18-1-80373285	18	1	80373285	0	0
CEX-19-1-58617616	19	1	58617616	0	0
CEX-20-1-64444167	20	1	64444167	0	0
CEX-21-1-46709983	21	1	46709983	0	0
CEX-22-1-50818468	22	1	50818468	0	0
CEX-MT-1-16569	MT	1	16569	0	0
CEX-X-1-156040895	X	1	156040895	0	0
CEX-Y-2781480-56887902	Y	2781480	56887902	0	0
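
A manifest of this shape can also be generated rather than typed by hand. The Python sketch below is one way to do it, assuming the fields are tab-separated and that every region spans a whole chromosome; `build_manifest` is a hypothetical helper, and it does not reproduce the shortened Y region used in the last row above.

```python
def build_manifest(chrom_lengths, reference_genome):
    """Return manifest text with one whole-chromosome region per entry."""
    lines = [
        "[Header]",
        "Manifest Version\t1",
        f"ReferenceGenome\t{reference_genome}",
        "",
        "[Regions]",
        "Name\tChromosome\tStart\tEnd\tUpstream Probe Length\tDownstream Probe Length",
    ]
    for chrom, length in chrom_lengths.items():
        # Region names follow the CEX-<chrom>-<start>-<end> pattern used above.
        lines.append(f"CEX-{chrom}-1-{length}\t{chrom}\t1\t{length}\t0\t0")
    return "\n".join(lines) + "\n"


# Lengths copied from the rows above; extend the dict for the other chromosomes.
lengths = {"1": 248956422, "2": 242193529, "MT": 16569}
reference = r"Homo_sapiens\Ensembl\GRCh38\Sequence\WholeGenomeFASTA"
print(build_manifest(lengths, reference), end="")
```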

Thanks

@eroller
Member

eroller commented Aug 24, 2018

You should be able to take the manifest file and adjust the regions to match your sequencing data. Canvas should not break.

@BenoitFiset
Author

Thanks, I'll give it a go.
