Add VCF to JSON conversion #60

nvnieuwk · 2023-11-20T13:16:31Z

Description of feature

Add a VCF to JSON conversion, with certain filters, to use for the visualization of the SVs.

janvandenschilden · 2023-11-21T14:19:30Z

from typing import Optional

from pydantic import BaseModel, Field, root_validator


class Model(BaseModel):
    chr: str = Field(..., description="The chromosome where the structural variant (SV) is located")
    start_position: int = Field(..., description="The start position of the SV on the chromosome", ge=0)
    end_position: int = Field(..., description="The end position of the SV on the chromosome", ge=0)
    end_chr: Optional[str] = Field(None, description="The end chromosome where the SV is located, if different from the start chromosome")
    end_chr_start_position: Optional[int] = Field(None, description="The start position of the SV on the end chromosome, if different from the start chromosome", ge=0)
    end_chr_end_position: Optional[int] = Field(None, description="The end position of the SV on the end chromosome, if different from the start chromosome", ge=0)
    sv_type: str = Field(..., description="The type of the SV, such as deletion, duplication, inversion, translocation, etc.")
    size: int = Field(..., description="The size of the SV in base pairs", ge=0)
    caller: str = Field(..., description="The name of the tool or algorithm that detected the SV")
    qc: str = Field(..., description="The quality control status of the SV")
    genotype: str = Field(..., description="The genotype of the SV, such as 0/0, 0/1, 1/1, etc.")
    relevant_genes: Optional[list[str]] = Field(None, description="The list of genes that are affected by the SV")
    population_frequency: Optional[float] = Field(None, description="The frequency of the SV in the general population, if available", ge=0, le=1)
    repeat_content: Optional[bool] = Field(None, description="Whether the SV is located in a repeat region or not")

    # A field-level validator only sees the fields declared before the one being
    # validated, so this cross-field check runs on the whole model instead.
    # skip_on_failure=True skips the check when a field has already failed validation.
    @root_validator(skip_on_failure=True)
    def check_end_chr(cls, values):
        end_chr = values.get('end_chr')
        end_start = values.get('end_chr_start_position')
        end_end = values.get('end_chr_end_position')
        # If end_chr is not None, then end_chr_start_position and end_chr_end_position must also be not None
        if end_chr is not None and (end_start is None or end_end is None):
            raise ValueError('end_chr_start_position and end_chr_end_position must be specified if end_chr is not None')
        # If end_chr is None, then end_chr_start_position and end_chr_end_position must also be None
        if end_chr is None and (end_start is not None or end_end is not None):
            raise ValueError('end_chr_start_position and end_chr_end_position must be None if end_chr is None')
        return values

nvnieuwk · 2023-11-21T14:24:13Z

Can you also post an example of a JSON entry used to create this model?

janvandenschilden · 2023-11-21T14:33:40Z

{
  "chr": "chr1",
  "start_position": 123456,
  "end_position": 123789,
  "end_chr": null,
  "end_chr_start_position": null,
  "end_chr_end_position": null,
  "sv_type": "deletion",
  "size": 333,
  "caller": "Manta",
  "qc": "PASS",
  "genotype": "0/1",
  "relevant_genes": ["BRCA1"],
  "population_frequency": 0.001,
  "repeat_content": false
}
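
For illustration, here is a minimal sketch of how one VCF data line might be turned into an entry of this shape, using only the standard library. The helper names (`parse_info`, `vcf_line_to_entry`), the `SV_TYPE_NAMES` mapping and the example line are hypothetical and not part of this issue. The sketch assumes a single-sample VCF whose INFO column carries the `SVTYPE`, `END` and `SVLEN` keys with one value each, and it takes the caller name as a parameter. Annotation-derived fields are left as null.

```python
import json

# Hypothetical mapping from SVTYPE codes to the names used in the example entry.
SV_TYPE_NAMES = {"DEL": "deletion", "DUP": "duplication", "INV": "inversion", "INS": "insertion"}


def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    info = {}
    if info_field == ".":
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def vcf_line_to_entry(line, caller):
    """Convert one single-sample VCF data line into a flat entry (sketch)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, _alt, _qual, filt, info_field = fields[:8]
    info = parse_info(info_field)
    start = int(pos)
    end = int(info.get("END", start))
    sv_code = info.get("SVTYPE", "")

    # Look up GT through the FORMAT column (assumes exactly one sample column).
    genotype = None
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" in format_keys:
            genotype = sample_values[format_keys.index("GT")]

    # Assumption: SVLEN holds a single integer; fall back to END - POS otherwise.
    size = abs(int(info["SVLEN"])) if "SVLEN" in info else end - start

    return {
        "chr": chrom,
        "start_position": start,
        "end_position": end,
        "end_chr": None,
        "end_chr_start_position": None,
        "end_chr_end_position": None,
        "sv_type": SV_TYPE_NAMES.get(sv_code, sv_code),
        "size": size,
        "caller": caller,
        "qc": filt,
        "genotype": genotype,
        # These would come from annotation, which this sketch does not cover.
        "relevant_genes": None,
        "population_frequency": None,
        "repeat_content": None,
    }


# Made-up VCF line chosen to mirror the positions in the example entry.
EXAMPLE_LINE = "chr1\t123456\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=123789;SVLEN=-333\tGT\t0/1"
print(json.dumps(vcf_line_to_entry(EXAMPLE_LINE, caller="Manta"), indent=2))
```

Interchromosomal events would need extra handling to fill the `end_chr` fields, which is left out here.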

nvnieuwk · 2023-11-21T14:42:08Z

Some simple notes I already have after seeing this:

Maybe change caller to callers (because most variants will have more than one caller).
genotype will be in the FORMAT field for each sample; maybe you can reflect this in the JSON?
relevant_genes, caller, size, sv_type, population_frequency and repeat_content are all INFO fields. I think it would be better to group these together in one field called info (this could also make it easier for us to add more fields in the future).

These are just suggestions, please let me know what you think 😃
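
One possible reading of these notes, sketched as a plain-Python transformation of the flat example entry: `caller` becomes a `callers` list, the INFO-derived fields move under `info`, and the genotype is stored per sample. The function name `restructure`, the `samples` key and the sample name `sample1` are assumptions made for illustration only; the thread does not settle on a layout.

```python
import json

# Fields the notes above identify as INFO fields (with caller renamed to callers).
INFO_KEYS = ("relevant_genes", "callers", "size", "sv_type", "population_frequency", "repeat_content")


def restructure(entry, sample_name):
    """Return a nested copy of a flat entry; the input dict is left untouched."""
    nested = dict(entry)
    # caller -> callers, stored as a list so several callers can be recorded.
    nested["callers"] = [nested.pop("caller")]
    genotype = nested.pop("genotype")
    # Group the INFO-derived fields under one key; missing ones become None.
    nested["info"] = {key: nested.pop(key, None) for key in INFO_KEYS}
    # Hypothetical per-sample layout mirroring the VCF FORMAT/sample columns.
    nested["samples"] = {sample_name: {"genotype": genotype}}
    return nested


flat_entry = {
    "chr": "chr1",
    "start_position": 123456,
    "end_position": 123789,
    "end_chr": None,
    "end_chr_start_position": None,
    "end_chr_end_position": None,
    "sv_type": "deletion",
    "size": 333,
    "caller": "Manta",
    "qc": "PASS",
    "genotype": "0/1",
    "relevant_genes": ["BRCA1"],
    "population_frequency": 0.001,
    "repeat_content": False,
}

nested_entry = restructure(flat_entry, sample_name="sample1")
print(json.dumps(nested_entry, indent=2))
```

With this layout, a new INFO field would only need to be added under `info`, and further samples would be added as extra keys under `samples`.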

nvnieuwk added the enhancement New feature or request label Nov 20, 2023

nvnieuwk added this to the 1.0.0 milestone Nov 28, 2023
