A Bioinformatic Pipeline Optimized for Processing and Assessing Variants at the ALK Gene Locus

ivadym/vALK

vALK

A bioinformatic pipeline optimized for the processing and assessment of variants at the ALK gene locus

This project is the result of a master's thesis entitled:

Development of a Bioinformatic Pipeline for the Detection of Somatic Mutations in Liquid Biopsy Samples from Non-Small Cell Lung Cancer Patients

Pipeline characterization

To detect variants at the ALK gene locus, specific selection conditions were established for each variant using a set of previously validated samples. The following flowchart shows the structure of the pipeline and its selection criteria, which are based on variables such as:

  • MOL_COUNT: molecular coverage
  • READ_COUNT: read coverage
  • MAPD: median of the absolute values of all pair-wise differences
  • FILTER: Ion Reporter™ internal filter (Oncomine™ Variants v5.12)
  • CLN_SIG: clinical significance
  • VAR_CLASS: Oncomine™ Variant Class
  • POS: variant position
  • AF: allele frequency
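
As a rough illustration of how selection criteria over these variables can be combined, the sketch below filters a tab-separated variant table in Python (the pipeline itself is written in R). The column names come from the list above; the thresholds, the `PASS` value of FILTER, the example rows, and the function name `passes_example_filter` are all hypothetical and do not reproduce the cut-offs shown in the flowchart.

```python
import csv
import io

# Hypothetical thresholds, chosen only for illustration.
# They are NOT the cut-offs used by the vALK pipeline.
MIN_MOL_COUNT = 1000
MIN_READ_COUNT = 5000
MIN_AF = 0.001


def passes_example_filter(row):
    """Return True if a variant row meets every illustrative condition."""
    return (
        int(row["MOL_COUNT"]) >= MIN_MOL_COUNT
        and int(row["READ_COUNT"]) >= MIN_READ_COUNT
        and float(row["AF"]) >= MIN_AF
        and row["FILTER"] == "PASS"  # assumed label for variants kept by the vendor filter
    )


# Invented rows standing in for a real variant table.
EXAMPLE_TSV = (
    "POS\tMOL_COUNT\tREAD_COUNT\tAF\tFILTER\n"
    "100\t2500\t12000\t0.0040\tPASS\n"
    "200\t300\t9000\t0.0100\tPASS\n"
    "300\t4000\t15000\t0.0002\tPASS\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
passing = [row for row in rows if passes_example_filter(row)]

for row in passing:
    print(row["POS"], row["AF"])
```

Keeping each condition as a named constant makes it easy to swap in validated thresholds once they are known.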

Graphical User Interface (GUI)

The implemented user interface was developed to facilitate data input and interpretation of results.

The user can select which filter or filters to apply to the non-filtered-oncomine.tsv file, as well as the gene to study; currently, only the ALK gene is supported. The source and output files are selected through a pop-up dialog.

Finally, to fully characterize the variants that pass a given filter, the results are displayed with the mutation type, the row number of each variant in the non-filtered-oncomine.tsv file, and the identifier of the alternate allele. At the same time, each identified variant is appended to the .csv output file along with its main characteristics.
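
The reporting step described above can be sketched in Python as follows (the actual tool is implemented in R with a Tcl/Tk interface). The function `report_and_append`, the field names `MUTATION_TYPE` and `ALT_ID`, the `ROW` column, and the example rows are hypothetical; row numbers are assumed to count data rows from 1.

```python
import csv
import os
import tempfile


def report_and_append(rows, keep, out_path, fields):
    """Print each kept variant with its 1-based row number and append it to out_path."""
    # Write the header only when the output file is new or empty.
    write_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    kept = []
    with open(out_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["ROW"] + fields)
        if write_header:
            writer.writeheader()
        for row_number, row in enumerate(rows, start=1):
            if keep(row):
                record = {"ROW": row_number}
                record.update({field: row[field] for field in fields})
                writer.writerow(record)
                kept.append(record)
                print(record)
    return kept


# Invented rows standing in for parsed lines of the input table.
example_rows = [
    {"MUTATION_TYPE": "SNV", "ALT_ID": "alt-1", "FILTER": "PASS"},
    {"MUTATION_TYPE": "Fusion", "ALT_ID": "alt-2", "FILTER": "FAIL"},
    {"MUTATION_TYPE": "SNV", "ALT_ID": "alt-3", "FILTER": "PASS"},
]

fd, output_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

kept = report_and_append(
    example_rows,
    keep=lambda row: row["FILTER"] == "PASS",
    out_path=output_path,
    fields=["MUTATION_TYPE", "ALT_ID"],
)
```

Opening the output in append mode means repeated runs accumulate variants in the same file, which matches the "appended" behaviour described above.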

Performance

The initial sequenced samples were used to study the performance of the developed algorithm regarding the ALK mutations confirmed by dPCR, which was considered the gold standard. To assess this, sensitivity, specificity, and positive and negative predictive values, shown in the following table, were calculated based on 30 patients.

| Statistic | Value | 95% CI |
| --- | --- | --- |
| Sensitivity | 87.50% | 47.35% to 99.68% |
| Specificity | 81.82% | 59.72% to 94.81% |
| Disease prevalence | 20.00% | - |
| Positive Predictive Value | 54.61% | 32.31% to 75.20% |
| Negative Predictive Value | 96.32% | 80.55% to 99.40% |
| Accuracy | 82.95% | 64.83% to 94.13% |

In this context, the algorithm successfully identified 7 of the 8 mutations confirmed by dPCR, reaching a sensitivity of 87.50%. However, 4 patients were incorrectly classified as carriers of ALK mutations, giving a specificity of 81.82%. Overall, the algorithm reported an accuracy of 82.95% and performed particularly well at ruling out samples without any mutation, with a negative predictive value of 96.32%. Notably, the Oncomine™ Variants v5.12 filter alone detected ALK mutations in only 3 patients.
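
The point estimates above can be reproduced with a short Python calculation. This is a sketch under stated assumptions: the count of 18 true negatives is inferred from 30 patients minus 8 dPCR-confirmed carriers and 4 false positives, and the predictive values and accuracy are assumed to be adjusted to the 20.00% prevalence listed in the table, since that interpretation matches the reported figures. The confidence intervals are not recomputed here.

```python
def diagnostic_metrics(tp, fn, fp, tn, prevalence):
    """Compute prevalence-adjusted diagnostic statistics from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Predictive values via Bayes' theorem, using the stated prevalence
    # rather than the proportion of carriers in the sample.
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
    )
    accuracy = sensitivity * prevalence + specificity * (1 - prevalence)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
    }


# 7 of 8 confirmed mutations detected, 4 false positives, 30 patients in total.
# tn=18 is inferred (30 - 8 - 4), not stated in the text.
metrics = diagnostic_metrics(tp=7, fn=1, fp=4, tn=18, prevalence=0.20)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because the predictive values depend on prevalence, changing the `prevalence` argument shows how the positive and negative predictive values would shift in a population with a different share of ALK-mutated patients.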

Technical requirements

The programming language used in this study was R v3.6.3, whose capabilities were extended through additional packages such as Tcl/Tk and Scales v1.1.1.
