No more duplicate meta domain annotation #25

Merged
merged 46 commits into from
Sep 6, 2018
46 commits
b55538f
fixed spelling error
laurensvdwiel Aug 27, 2018
a67fc8c
added representation of a single nuclotide variant and tests on initi…
laurensvdwiel Aug 29, 2018
7ba4a99
Merge branch 'master' of github.com:cmbi/metadome into no_more_duplic…
laurensvdwiel Aug 29, 2018
2d2f539
added relevant methods from codon computation to this class, for redu…
laurensvdwiel Aug 29, 2018
35a66ba
moved methods to SingleNucleotideVariant class as statis methods
laurensvdwiel Aug 29, 2018
c35c2be
added additional tests for the new way to initialize a SNV; added stu…
laurensvdwiel Aug 29, 2018
da5a7de
added alt_base_pair_representation
laurensvdwiel Aug 29, 2018
119b28f
removed interpret_SNV_type (moved to the new class), added alt_base_p…
laurensvdwiel Aug 29, 2018
2c81b1b
added alt_base_pair_representation, threeltetter representation of al…
laurensvdwiel Aug 29, 2018
5897548
added unit tests for the static methods
laurensvdwiel Aug 29, 2018
09a35ed
removed comment
laurensvdwiel Aug 29, 2018
f652eab
changed the unique_str_representation to another name so it does not …
laurensvdwiel Aug 30, 2018
babd3c4
removed unused import
laurensvdwiel Aug 30, 2018
b42f207
added SN annotation file name
laurensvdwiel Aug 30, 2018
128fc06
added variant source to tests
laurensvdwiel Aug 30, 2018
a80e18b
added meta_domain_annotation to the MetaDomain object and a way to bu…
laurensvdwiel Aug 30, 2018
c40d278
added variant source to the snv objects
laurensvdwiel Aug 30, 2018
970deaf
added interpretation service, this annotates codons
laurensvdwiel Aug 31, 2018
39fd841
renamed annotation interpretation to codon annotation
laurensvdwiel Aug 31, 2018
0ae6738
moved code to codon_annotation, effectively removing redundant code
laurensvdwiel Aug 31, 2018
75128e1
fixed typo
laurensvdwiel Aug 31, 2018
ea256ca
minor fixes and added method to retrieve SNVs for a single consensus …
laurensvdwiel Aug 31, 2018
ac79d20
added measurements for alignment depths
laurensvdwiel Aug 31, 2018
e34c0de
added additional tests for meta domain s
laurensvdwiel Aug 31, 2018
fd30247
annotation is now done when initializing from variant and not in init…
laurensvdwiel Aug 31, 2018
b9b927e
fixed unhandled exception catching; added support for singlenucleotid…
laurensvdwiel Sep 3, 2018
2602bb1
changed formatting of clinvar variants for user interface
laurensvdwiel Sep 3, 2018
78993d3
added toJson constructions for user interface representations and add…
laurensvdwiel Sep 3, 2018
de5efb1
deleted no longer needed mocking
laurensvdwiel Sep 3, 2018
3ef64b3
revised import order
laurensvdwiel Sep 4, 2018
83c0d73
added a tognomadjson function for displaying the variant correctly in…
laurensvdwiel Sep 4, 2018
b38bbc2
added a method to annotate a single position with meta domain variant…
laurensvdwiel Sep 4, 2018
8f3005b
removed the mocking tasks
laurensvdwiel Sep 4, 2018
965cd8a
added toCodonJson
laurensvdwiel Sep 5, 2018
059c5d9
added method to retrieve a codon for an aligned position
laurensvdwiel Sep 5, 2018
cf56fbf
fixed three_letter_amino_acid representation for stop codons and upda…
laurensvdwiel Sep 5, 2018
5c54353
reduced redundancy of the code further
laurensvdwiel Sep 5, 2018
1eb5637
added route for retrieving single position information
laurensvdwiel Sep 5, 2018
a38f3eb
finalized the method to retrieve the information on a single position
laurensvdwiel Sep 5, 2018
aeaf3ce
moved and updated code for filling the tables and overview for a sing…
laurensvdwiel Sep 5, 2018
79244c5
moved code to the dashboard.js updated called methods with proper var…
laurensvdwiel Sep 5, 2018
13febe3
moved these methods to the dashboard
laurensvdwiel Sep 5, 2018
2f30f3d
added methods from visualization that affect the tables on the dashboard
laurensvdwiel Sep 5, 2018
6a92045
added loading overlay; added fix for not-aligned positions
laurensvdwiel Sep 5, 2018
7331709
changed loading overlay text
laurensvdwiel Sep 5, 2018
20d34df
removed unused form
laurensvdwiel Sep 6, 2018
1 change: 1 addition & 0 deletions metadome/default_settings.py
@@ -72,6 +72,7 @@
METADOMAIN_ALIGNMENT_FILE_NAME = 'metadomain_alignments' # Alignments are saved as: METADOMAIN_DIR+<Pfam_id>+'/'+METADOMAIN_ALIGNMENT_FILE_NAME
METADOMAIN_MAPPING_FILE_NAME = 'metadomain_mappings' # Mappings are saved as: METADOMAIN_DIR+<Pfam_id>+'/'+METADOMAIN_MAPPING_FILE_NAME
METADOMAIN_DETAILS_FILE_NAME = 'metadomain_details.json' # Details are saved as: METADOMAIN_DIR+<Pfam_id>+'/'+METADOMAIN_DETAILS_FILE_NAME
METADOMAIN_SNV_ANNOTATION_FILE_NAME = 'metadomain_snv_annotation' # Annotations are saved as: METADOMAIN_DIR+<Pfam_id>+'/'+METADOMAIN_SNV_ANNOTATION_FILE_NAME

# Pre-build visualization files
PRE_BUILD_VISUALIZATION_DIR = DATA_DIR+"metadome_visualization/"
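The comment on the new setting describes how the annotation path is composed. A minimal sketch of that convention follows; the directory value and the helper name `snv_annotation_path` are assumptions for illustration and do not appear in the diff.

```python
# Hypothetical value: the real METADOMAIN_DIR is defined elsewhere in default_settings.py.
METADOMAIN_DIR = "/usr/data/metadomains/"
METADOMAIN_SNV_ANNOTATION_FILE_NAME = "metadomain_snv_annotation"


def snv_annotation_path(pfam_id):
    """Compose METADOMAIN_DIR + <Pfam_id> + '/' + METADOMAIN_SNV_ANNOTATION_FILE_NAME."""
    return METADOMAIN_DIR + pfam_id + "/" + METADOMAIN_SNV_ANNOTATION_FILE_NAME


print(snv_annotation_path("PF00001"))
```

The concatenation only yields a valid path when METADOMAIN_DIR ends with a path separator, which the sketch assumes.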
54 changes: 29 additions & 25 deletions metadome/domain/models/entities/codon.py
@@ -1,8 +1,6 @@
from metadome.domain.services.helper_functions import convertListOfIntegerToRanges, list_of_stringified_of_ranges
from metadome.domain.services.computation.codon_computations import interpret_alt_codon, residue_variant_type
from metadome.domain.models.gene import Strand
from Bio.Data.IUPACData import protein_letters_1to3
from Bio.Seq import translate

class MalformedCodonException(Exception):
pass
@@ -29,6 +27,17 @@ class Codon(object):
cDNA_position_two int the position corresponding to the second base pair in the cDNA
cDNA_position_three int the position corresponding to the third base pair in the cDNA
"""
@staticmethod
def one_to_three_letter_amino_acid_residue(amino_acid_residue):
"""Returns a three letter representation of the provided amino acid residue"""
# Check if this is a Pyrrolysine
if amino_acid_residue == 'O':
return 'Pyl'
# Check if this is a Selenocysteine
if amino_acid_residue == 'U':
return 'Sec'
# Return one of the 20 amino acid residues
return protein_letters_1to3[amino_acid_residue]

def unique_str_representation(self):
return str(self.chr)+":"+str(self.regions)+"::("+str(self.strand)+")"
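The static method added in this hunk can be tried in isolation. The sketch below is self-contained: `PROTEIN_LETTERS_1TO3` is a stand-in for Biopython's `protein_letters_1to3`, and passing the stop symbol `'*'` through unchanged is an assumption modelled on the removed `interpret_SNV_type`, which left nonsense residues as-is.

```python
# Stand-in for Bio.Data.IUPACData.protein_letters_1to3 (the 20 standard residues).
PROTEIN_LETTERS_1TO3 = {
    'A': 'Ala', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu', 'F': 'Phe',
    'G': 'Gly', 'H': 'His', 'I': 'Ile', 'K': 'Lys', 'L': 'Leu',
    'M': 'Met', 'N': 'Asn', 'P': 'Pro', 'Q': 'Gln', 'R': 'Arg',
    'S': 'Ser', 'T': 'Thr', 'V': 'Val', 'W': 'Trp', 'Y': 'Tyr',
}


def one_to_three(residue):
    """Return a three-letter code for a one-letter amino acid residue."""
    if residue == 'O':  # Pyrrolysine
        return 'Pyl'
    if residue == 'U':  # Selenocysteine
        return 'Sec'
    if residue == '*':  # Assumption: a stop symbol is returned unchanged
        return residue
    return PROTEIN_LETTERS_1TO3[residue]


print(one_to_three('M'), one_to_three('U'), one_to_three('*'))
```

Without the explicit stop-symbol branch, looking up `'*'` in the 20-residue table raises a `KeyError`.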
@@ -63,31 +72,9 @@ def retrieve_mappings_per_chromosome(self):

return mappings_per_chromosome

def interpret_SNV_type(self, position, var_nucleotide):
"""Interprets the new codon, residue and type of a SNV"""
codon_pos = self.retrieve_mappings_per_chromosome()[position]['codon_base_pair_position']

alt_codon = interpret_alt_codon(self.base_pair_representation, codon_pos, var_nucleotide)
alt_residue = translate(alt_codon)
var_type = residue_variant_type(self.amino_acid_residue, alt_residue)

if not var_type == 'nonsense':
alt_residue_triplet = protein_letters_1to3[alt_residue]
else:
alt_residue_triplet = alt_residue

return alt_codon, alt_residue, alt_residue_triplet, var_type

def three_letter_amino_acid_residue(self):
"""Returns a three letter representation of the amino acid residue for this codon"""
# Check if this is a Pyrrolysine
if self.amino_acid_residue == 'O':
return 'Pyl'
# Check if this is a Selenocysteine
if self.amino_acid_residue == 'U':
return 'Sec'
# Return one of the 20 amino acid residues
return protein_letters_1to3[self.amino_acid_residue];
return Codon.one_to_three_letter_amino_acid_residue(self.amino_acid_residue)

def pretty_print_cDNA_region(self):
return "c."+str(self.cDNA_position_one)+"-"+str(self.cDNA_position_three)
@@ -257,6 +244,23 @@ def toDict(self):

return _d

def toCodonJson(self):
json_entry = {}

# Add positional information
json_entry['strand'] = self.strand.value
json_entry['protein_pos'] = self.amino_acid_position
json_entry['cdna_pos'] = self.pretty_print_cDNA_region()
json_entry['chr'] = self.chr
json_entry['chr_positions'] = self.pretty_print_chr_region()

# Add residue and nucleotide information
json_entry['ref_aa'] = self.amino_acid_residue
json_entry['ref_aa_triplet'] = self.three_letter_amino_acid_residue()
json_entry['ref_codon'] = self.base_pair_representation

return json_entry

def __repr__(self):
return "<Codon(representation='%s', amino_acid_residue='%s', chr='%s', chr_positions='%s', strand='%s')>" % (
self.base_pair_representation, self.amino_acid_residue, self.chr, str(self.regions), self.strand )
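The `interpret_SNV_type` method removed from `Codon` in this file combined three steps: substitute the variant nucleotide into the codon, translate the result, and classify the change. A self-contained sketch of that flow is shown below. It does not reproduce the project's `interpret_alt_codon` or `residue_variant_type`; the function names, the 0-based `codon_pos`, and the labels `'missense'` and `'synonymous'` are assumptions (only `'nonsense'` appears in the diff).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def alt_codon_for(ref_codon, codon_pos, alt_nucleotide):
    """Substitute alt_nucleotide at the 0-based codon_pos (assumed indexing)."""
    return ref_codon[:codon_pos] + alt_nucleotide + ref_codon[codon_pos + 1:]


def variant_type(ref_residue, alt_residue):
    """Classify the residue change; labels other than 'nonsense' are assumed."""
    if alt_residue == ref_residue:
        return "synonymous"
    if alt_residue == "*":
        return "nonsense"
    return "missense"


def interpret_snv(ref_codon, codon_pos, alt_nucleotide):
    """Return (alt_codon, alt_residue, variant_type) for a single substitution."""
    alt_codon = alt_codon_for(ref_codon, codon_pos, alt_nucleotide)
    ref_residue = CODON_TABLE[ref_codon]
    alt_residue = CODON_TABLE[alt_codon]
    return alt_codon, alt_residue, variant_type(ref_residue, alt_residue)


print(interpret_snv("TAT", 2, "A"))
```

A nonsense result carries `'*'` as its residue, which is why the three-letter conversion needs a separate branch for stop codons.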
124 changes: 116 additions & 8 deletions metadome/domain/models/entities/meta_domain.py
@@ -1,8 +1,14 @@
from metadome.domain.data_generation.mapping.meta_domain_mapping import generate_pfam_aligned_codons
from metadome.domain.models.entities.codon import Codon, MalformedCodonException
from metadome.domain.services.annotation.codon_annotation import annotate_ClinVar_SNVs_for_codons,\
annotate_gnomAD_SNVs_for_codons
from metadome.domain.models.entities.single_nucleotide_variant import SingleNucleotideVariant
from metadome.domain.models.entities.codon import Codon
from metadome.default_settings import METADOMAIN_DIR,\
METADOMAIN_MAPPING_FILE_NAME, METADOMAIN_DETAILS_FILE_NAME
METADOMAIN_MAPPING_FILE_NAME, METADOMAIN_DETAILS_FILE_NAME,\
METADOMAIN_SNV_ANNOTATION_FILE_NAME

import pandas as pd
import numpy as np
import json
import os

@@ -13,10 +19,10 @@
class UnsupportedMetaDomainIdentifier(Exception):
pass

class NotEnoughOccurrencesForMetaDomain(Exception):
class ConsensusPositionOutOfBounds(Exception):
pass

class ConsensusPositionOutOfBounds(Exception):
class NotInMetaDomain(Exception):
pass

class MetaDomain(object):
@@ -32,7 +38,34 @@ class MetaDomain(object):
n_instances int number of unique instances containing this domain
n_transcripts int number of unique transcripts containing this domain
meta_domain_mapping pandas.DataFrame containing all codons annotated with corresponding consensus position
meta_domain_annotation pandas.DataFrame containing all SNVs with corresponding consensus position
"""

def get_annotated_SNVs_for_consensus_position(self, consensus_position):
"""Retrieves SNVs for this consensus position as:
{SingleNucleotideVariant.unique_var_str_representation(): [dict(), ...]}"""
snvs = dict()

if consensus_position < 0:
raise ConsensusPositionOutOfBounds("The provided consensus position ('"+str(consensus_position)+"') is below zero, this position does not exist")
if consensus_position >= self.consensus_length:
raise ConsensusPositionOutOfBounds("The provided consensus position ('"+str(consensus_position)+"') is above the maximum consensus length ('"+str(self.consensus_length)+"'), this position does not exist")

# Retrieve all SNVs annotated at the consensus position
aligned_to_position = self.meta_domain_annotation[self.meta_domain_annotation.consensus_pos == consensus_position].to_dict('records')

# first check if any SNVs are annotated at this consensus position
if len(aligned_to_position) >0:
for snv in aligned_to_position:
# aggregate SNVs that share the same unique string representation
if not snv['unique_snv_str_representation'] in snvs.keys():
snvs[snv['unique_snv_str_representation']] = []

# add the SNV to the dictionary
snvs[snv['unique_snv_str_representation']].append(snv)

# return the SNVs that correspond to this position
return snvs
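The grouping performed by `get_annotated_SNVs_for_consensus_position` can be sketched without pandas. The records below stand in for the output of `DataFrame.to_dict('records')`; their values and the helper name `group_snvs_by_key` are hypothetical.

```python
from collections import defaultdict


def group_snvs_by_key(records, consensus_position):
    """Group annotation records at one consensus position by their unique SNV key."""
    grouped = defaultdict(list)
    for record in records:
        if record["consensus_pos"] == consensus_position:
            grouped[record["unique_snv_str_representation"]].append(record)
    return dict(grouped)


# Hypothetical records; the real columns come from the annotation services.
records = [
    {"consensus_pos": 3, "unique_snv_str_representation": "chr1:100:A>G", "source": "gnomAD"},
    {"consensus_pos": 3, "unique_snv_str_representation": "chr1:100:A>G", "source": "ClinVar"},
    {"consensus_pos": 3, "unique_snv_str_representation": "chr2:200:C>T", "source": "gnomAD"},
    {"consensus_pos": 4, "unique_snv_str_representation": "chr3:300:G>A", "source": "gnomAD"},
]

snvs = group_snvs_by_key(records, 3)
print({key: len(values) for key, values in snvs.items()})
```

Each key maps to a list because the same variant can be reported more than once, for example by both sources or through several aligned codons.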

def get_consensus_positions_for_uniprot_position(self, uniprot_ac, uniprot_position):
"""Retrieves the consensus positions for this MetaDomain
@@ -58,10 +91,21 @@ def get_consensus_positions_for_uniprot_position(self, uniprot_ac, uniprot_posit

return consensus_positions

def get_codon_for_transcript_and_position(self, transcript_id, protein_position):
"""Construct the codon for a provided position"""
# Retrieve all codons for this transcript at the provided protein position
aligned_to_position = self.meta_domain_mapping[(self.meta_domain_mapping.gencode_transcription_id == transcript_id) & (self.meta_domain_mapping.amino_acid_position == protein_position)].to_dict('records')

if len(aligned_to_position) == 0:
raise NotInMetaDomain("No codons found to be aligned for metadomain '"+str(self.domain_id)+"' for transcript '"+str(transcript_id)+"' at position '"+str(protein_position)+"'")
else:
return Codon.initializeFromDict(aligned_to_position[0])


def get_codons_aligned_to_consensus_position(self, consensus_position):
"""Retrieves codons for this consensus position as:
{Codon.unique_str_representation(): Codon}"""
codons = {}
codons = dict()

if consensus_position < 0:
raise ConsensusPositionOutOfBounds("The provided consensus position ('"+str(consensus_position)+"') is below zero, this position does not exist")
@@ -73,7 +117,6 @@ def get_codons_aligned_to_consensus_position(self, consensus_position):

# first check if the consensus position is present in the mappings_per_consensus_pos
if len(aligned_to_position) >0:
codons = dict()
for codon_dict in aligned_to_position:
# initialize a codon from the dataframe row
codon = Codon.initializeFromDict(codon_dict)
@@ -88,11 +131,73 @@ def get_codons_aligned_to_consensus_position(self, consensus_position):
# return the codons that correspond to this position
return codons

def __init__(self, domain_id, consensus_length, n_instances, meta_domain_mapping):
def get_alignment_depth_for_consensus_position(self, consensus_position):
"""Retrieves the number of aligned codons for this consensus position"""
if consensus_position < 0:
raise ConsensusPositionOutOfBounds("The provided consensus position ('"+str(consensus_position)+"') is below zero, this position does not exist")
if consensus_position >= self.consensus_length:
raise ConsensusPositionOutOfBounds("The provided consensus position ('"+str(consensus_position)+"') is above the maximum consensus length ('"+str(self.consensus_length)+"'), this position does not exist")

# Retrieve all codons aligned to the consensus position
aligned_to_position = self.meta_domain_mapping[self.meta_domain_mapping.consensus_pos == consensus_position].to_dict('records')

unique_keys = [Codon.initializeFromDict(codon_dict).unique_str_representation() for codon_dict in aligned_to_position]
return len(np.unique(unique_keys))

def get_max_alignment_depth(self):
alignment_depths = [ self.get_alignment_depth_for_consensus_position(consensus_position) for consensus_position in range(self.consensus_length)]
return int(np.max(alignment_depths))
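Alignment depth here means the number of distinct codon keys aligned to a consensus position, and the maximum depth is the largest of those counts. The sketch below computes all depths in a single pass over the rows, which is an alternative to filtering the mapping once per position as the diff does. The row layout, the `key` field, and the function name are assumptions, and every `consensus_pos` is assumed to lie within `range(consensus_length)`.

```python
def alignment_depths(rows, consensus_length):
    """Count distinct codon keys per consensus position in one pass."""
    unique_keys = [set() for _ in range(consensus_length)]
    for row in rows:
        unique_keys[row["consensus_pos"]].add(row["key"])
    return [len(keys) for keys in unique_keys]


# Hypothetical rows; 'key' plays the role of Codon.unique_str_representation().
rows = [
    {"consensus_pos": 0, "key": "1:[100-102]::(+)"},
    {"consensus_pos": 0, "key": "1:[100-102]::(+)"},  # duplicate region, counted once
    {"consensus_pos": 0, "key": "2:[500-502]::(-)"},
    {"consensus_pos": 2, "key": "3:[900-902]::(+)"},
]

depths = alignment_depths(rows, consensus_length=3)
print(depths, max(depths))
```

Positions without any aligned codon get a depth of zero, so the maximum is well defined whenever the consensus length is at least one.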

def annotate_metadomain(self, reannotate=False):
"""Annotate this meta domain with gnomAD and ClinVar variants"""
# check if this Meta Domain has already been annotated
meta_domain_dir = METADOMAIN_DIR+self.domain_id
meta_domain_snv_annotation_file = meta_domain_dir+'/'+METADOMAIN_SNV_ANNOTATION_FILE_NAME

# initialize the meta_domain_annotation as a list
meta_domain_annotation = []

# Check if this meta domain has been annotated previously
if os.path.exists(meta_domain_snv_annotation_file) and not reannotate:
# The annotation exists, load it
_log.info('Loading previously annotated MetaDomain for domain id: '+str(self.domain_id))
# Read the files
_log.info("Reading '{}'".format(meta_domain_snv_annotation_file))
self.meta_domain_annotation = pd.read_csv(meta_domain_snv_annotation_file)
else:
# The annotation does not exist yet, or needs to be recreated/reannotated
_log.info('Start annotation of MetaDomain for domain id: '+str(self.domain_id))

# Retrieve all codons
for consensus_position in range(self.consensus_length):
meta_codons = self.get_codons_aligned_to_consensus_position(consensus_position)

# Annotate ClinVar and gnomAD SNVs
for unique_str_repr in meta_codons.keys():
for snv in annotate_ClinVar_SNVs_for_codons(meta_codons[unique_str_repr]):
snv['consensus_pos'] = consensus_position
meta_domain_annotation.append(snv)
for snv in annotate_gnomAD_SNVs_for_codons(meta_codons[unique_str_repr]):
snv['consensus_pos'] = consensus_position
meta_domain_annotation.append(snv)

# convert meta_domain_annotation to a pandas DataFrame
meta_domain_annotation = pd.DataFrame(meta_domain_annotation)

# save meta_domain_annotation to disk, without the row index so that a reload yields the same columns
meta_domain_annotation.to_csv(meta_domain_snv_annotation_file, index=False)

# set to variable
self.meta_domain_annotation = meta_domain_annotation

_log.info('Finished annotation of MetaDomain for domain id: '+str(self.domain_id))
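`annotate_metadomain` follows a load-or-compute pattern: reuse the annotation file when it exists, otherwise annotate and write it. A minimal standard-library sketch of that pattern is given below. It uses the `csv` module instead of pandas, and `load_or_annotate` and `fake_annotate` are hypothetical names, so it illustrates the control flow only and not the project's I/O.

```python
import csv
import os
import tempfile


def load_or_annotate(path, annotate, reannotate=False):
    """Return cached rows from path, or compute them with annotate() and cache them."""
    if os.path.exists(path) and not reannotate:
        with open(path, newline="") as handle:
            return list(csv.DictReader(handle))  # values come back as strings
    rows = annotate()
    fieldnames = list(rows[0].keys()) if rows else []
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


calls = []


def fake_annotate():
    """Hypothetical stand-in for the ClinVar and gnomAD annotation step."""
    calls.append(1)
    return [{"consensus_pos": 0, "unique_snv_str_representation": "chr1:100:A>G"}]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "metadomain_snv_annotation")
    first = load_or_annotate(path, fake_annotate)   # computes and writes the file
    second = load_or_annotate(path, fake_annotate)  # served from the file
    third = load_or_annotate(path, fake_annotate, reannotate=True)

print(len(calls), second[0]["consensus_pos"])
```

The freshly computed rows and the reloaded rows differ in type (`0` versus `'0'`), which is the same kind of fresh-versus-cached mismatch worth checking for in any CSV-backed cache.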

def __init__(self, domain_id, consensus_length, n_instances, meta_domain_mapping, meta_domain_annotation):
self.domain_id = domain_id
self.consensus_length = consensus_length
self.n_instances = n_instances
self.meta_domain_mapping = meta_domain_mapping
self.meta_domain_annotation = meta_domain_annotation

# derive from meta_domain_mapping
self.n_proteins = len(pd.unique(self.meta_domain_mapping.uniprot_ac))
@@ -165,7 +270,10 @@ def initializeFromDomainID(cls, domain_id, recreate=False):
raise UnsupportedMetaDomainIdentifier("Expected a Pfam domain, instead the identifier '"+str(domain_id)+"' was received")

# Attempt to create the object
meta_domain = cls(domain_id, consensus_length, n_instances, meta_domain_mapping)
meta_domain = cls(domain_id, consensus_length, n_instances, meta_domain_mapping, pd.DataFrame())

# Annotate this meta domain
meta_domain.annotate_metadomain()

# return the object
return meta_domain