Current goals and outstanding tasks for bcbio-nextgen development. These are roughly ordered by current priority, and we welcome contributors.
- Improved deployment experience using Docker containers to provide a fully isolated bcbio-nextgen installation. This requires reworking installation into a two-step process: download the Docker container, then add external biological data. It also requires adjusting the pipeline and distributed processing to start and use code isolated inside the Docker container. Work in progress is at bcbio-nextgen-vm.
- Enable processing on Amazon EC2 using spot instances and no shared filesystem. Store intermediate files in S3 object storage instead of a globally shared filesystem, and make use of high-speed local ephemeral storage.
- Integrated structural variant analysis, including CNV prediction. Current targets are lumpy, delly and cn.mops.
- Improved support for cancer tumor/normal paired callers. Suggested callers include SomaticSniper (#66, #109), LoFreq and others. A comprehensive discussion is at #112. FreeBayes supports tumor/normal calling: see this mailing list discussion for the suggested parameters. This requires an improved framework for evaluating callers, along with approaches for handling Ensemble calling with multiple inputs (#67).
- Improve analysis of coverage, especially in targeted sequencing experiments. Plan to integrate with chanjo. See #249 for more discussion.
- Support gVCF and an incremental joint discovery approach for calling variants. This switches batch approaches to calling each sample independently, then combining the results in a final step. Also integrate bcbio.variation.recall to handle non-GATK, non-gVCF scenarios.
- Explore options for accumulating and displaying summary information from multiple runs. Prioritize options that allow accumulation across multiple analysis machines and already handle querying and visualization.
- Once initial structural variant analysis and evaluation are in place, incorporate and evaluate additional CNV and structural variant callers. Current targets include the VarScan2 CNV caller and Control-FREEC.
- Document and expand Ensemble calling functionality, including work on speedups and parallelization. Integrate development work on bcbio.variation.recall for recalling with local realignment.
- Add methylation analysis approaches. See #618 for discussion.
- Handle inputs split across multiple sequencing lanes, merging multiple fastq/BAM inputs and correctly maintaining lane information in BAM read group headers.
- Test whether less strict quality trimming produces better RNA-seq differential expression (DE) results.
- Evaluate RNA-seq fusion callers and implement support for one if we can find a caller with reliable results (#210).