HLA Imputation Accuracy Workflow

Introduction

Association studies, for instance genome-wide association studies (GWAS), traditionally use genotyping arrays to genotype a large set of individuals. SNPs that are significantly overrepresented in cases compared to controls are then identified as associated with the disease. Genotyping arrays are cheaper than sequencing but measure only a subset of tag SNPs out of the >300 million SNPs that are available. To increase the number of SNPs that can be used in association studies, genotype imputation is performed. Genotype imputation refers to the statistical inference of unobserved genotypes.

Some regions of the human genome, such as the major histocompatibility complex (MHC), known in humans as the human leukocyte antigen (HLA) region, are highly variable and may be difficult to impute. The HLA region has been associated with autoimmune diseases such as rheumatoid arthritis and infectious diseases such as HIV/AIDS. Accurate imputation of this region is key, as it would increase the chances of identifying the causal variants of some autoimmune and immune-mediated diseases.

Genotype imputation is a statistical process and thus needs to be assessed to ensure that the predicted genotypes are accurate.
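One common way to quantify this is allele-level concordance between imputed and laboratory-typed HLA calls. The sketch below is only an illustration of that idea, not the metric implemented by this pipeline; the function names, sample IDs and allele values are all hypothetical.

```python
from typing import Dict, Tuple

# Two alleles at one HLA locus, e.g. ("A*01:01", "A*02:01")
Genotype = Tuple[str, str]


def matching_alleles(typed: Genotype, imputed: Genotype) -> int:
    """Count how many of the two typed alleles are recovered, ignoring order."""
    remaining = list(imputed)
    matches = 0
    for allele in typed:
        if allele in remaining:
            remaining.remove(allele)  # each imputed allele may match only once
            matches += 1
    return matches


def allele_concordance(typed: Dict[str, Genotype],
                       imputed: Dict[str, Genotype]) -> float:
    """Fraction of typed alleles imputed correctly across shared samples."""
    shared = sorted(set(typed) & set(imputed))
    if not shared:
        raise ValueError("no samples in common between typed and imputed calls")
    correct = sum(matching_alleles(typed[s], imputed[s]) for s in shared)
    return correct / (2 * len(shared))


# Hypothetical calls at a single locus
typed_calls = {
    "sample1": ("A*01:01", "A*02:01"),
    "sample2": ("A*23:01", "A*23:01"),
}
imputed_calls = {
    "sample1": ("A*02:01", "A*01:01"),  # same alleles, different order
    "sample2": ("A*23:01", "A*30:01"),  # one of two alleles recovered
}
print(allele_concordance(typed_calls, imputed_calls))  # 3 of 4 alleles match
```

Counting matches per allele rather than per genotype gives partial credit when only one of the two alleles is imputed correctly.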

The project focused on assessing the accuracy of imputing HLA Class I alleles in selected African populations.

Imputation accuracy was assessed for the SNP2HLA and HIBAG imputation tools, using the 1kg-All, 1kg-Gwd, 1kg-Afr, H3Africa and prebuilt EUR reference panels, and the Illumina Omni 2.5 and H3Africa genotyping arrays.

Installation

Nextflow (the pipeline runs using Nextflow 21.10.6)
Docker
Singularity

N/B: You do not need to install any other tool, as the singularity profile will download the Singularity image from https://quay.io/nanjalaruth/impute-hla

Preparing Input files

Target Genotype file

The input file must be a VCF file. As the work focuses on the HLA region, only SNPs in chr6:29-34Mb should be used, so you can prepare an input file containing only the SNPs in that region. SNP2HLA uses hg18 data while HIBAG uses hg19 data. You are therefore required to provide both an hg18 and an hg19 version of the file as input.
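As a rough illustration of restricting a VCF to this region, the sketch below filters the lines of an uncompressed VCF in plain Python. It is not part of the pipeline; the function name, the window constants and the example records are assumptions made for this example. In practice a dedicated tool such as bcftools or PLINK would normally be used for this step.

```python
from typing import Iterable, Iterator

HLA_START = 29_000_000  # assumed window start (bp) for chr6:29-34Mb
HLA_END = 34_000_000    # assumed window end (bp)


def filter_hla_region(lines: Iterable[str],
                      start: int = HLA_START,
                      end: int = HLA_END) -> Iterator[str]:
    """Yield VCF header lines plus records on chromosome 6 within [start, end]."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.split("\t", 2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, pos = fields[0], fields[1]
        # Accept both "6" and "chr6" chromosome naming
        if chrom in ("6", "chr6") and start <= int(pos) <= end:
            yield line


# Hypothetical records; the IDs and positions are made up
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "6\t28999999\tsnp1\tA\tG\t.\tPASS\t.\tGT\t0/1\n",
    "6\t29000000\tsnp2\tC\tT\t.\tPASS\t.\tGT\t0/0\n",
    "chr6\t31500000\tsnp3\tG\tA\t.\tPASS\t.\tGT\t1/1\n",
    "6\t34000001\tsnp4\tT\tC\t.\tPASS\t.\tGT\t0/1\n",
    "1\t30000000\tsnp5\tA\tC\t.\tPASS\t.\tGT\t0/1\n",
]
kept = list(filter_hla_region(example_vcf))
records = [line for line in kept if not line.startswith("#")]
print(len(records))  # snp2 and snp3 fall inside the window
```

The filter would be run once per file, on the hg18 file and again on the hg19 file.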

Reference panel

The workflow runs the data against a prebuilt European reference panel and against reference panels that are built while the pipeline runs. Building a custom reference panel requires an HLA type file and a SNP genotype file, so the pipeline needs both as input.
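Because the two inputs must describe the same individuals, a quick sanity check before building a panel is to compare their sample IDs. The sketch below is a hypothetical helper, not a step of the pipeline. It reads sample names from the VCF column header and takes the HLA type file's sample IDs as a plain list, since that file's layout is not described here.

```python
from typing import Dict, Iterable, List


def vcf_sample_ids(header_line: str) -> List[str]:
    """Return sample names from a VCF '#CHROM' header line.

    In a VCF with genotypes, the first nine columns are fixed
    (CHROM through FORMAT) and every later column is a sample.
    """
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("expected the '#CHROM' column header line")
    return fields[9:]


def compare_samples(vcf_ids: Iterable[str],
                    hla_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Report which sample IDs are shared and which appear in only one input."""
    vcf_set, hla_set = set(vcf_ids), set(hla_ids)
    return {
        "shared": sorted(vcf_set & hla_set),
        "only_in_vcf": sorted(vcf_set - hla_set),
        "only_in_hla": sorted(hla_set - vcf_set),
    }


# Hypothetical header and HLA-typed sample IDs
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
report = compare_samples(vcf_sample_ids(header), ["S1", "S3"])
print(report)
```

Samples listed under only_in_vcf or only_in_hla are missing from one of the two inputs.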

I focused on 4 custom-made reference panels: 1kg-All, 1kg-Gwd, 1kg-Afr and H3Africa. The 1kg-Gwd and 1kg-Afr panels are subsets of 1kg-All. If you have a similar dataset, assign the path to the file with the subpopulation rsids to the subpop_ids flag. This bypasses the need to obtain a VCF file that matches the subsetted population.

If you are working with populations that are not linked, provide the paths to the SNP genotype VCF files as demonstrated by the genotype_files flag in the test.config file. If you have multiple files, you can assign them to the same flag by adding more lists, as has been done with the sample dataset. Also provide the path to the HLA type file as shown by the hlatype_files flag in the test.config file.

N/B: HLA typing was done using the OptiType tool.

Running the pipeline

There are 2 ways to run the pipeline:

Download it from GitHub

Steps to follow

Download the pipeline from GitHub

git clone git@github.com:nanjalaruth/MHC-Imputation-Accuracy.git

Move to that folder

cd MHC-Imputation-Accuracy

Edit conf/test.config so that the paths point to where your datasets are stored.
Run the command below

nextflow run main.nf -c conf/test.config -profile singularity

N/B: If you are using a job scheduler, be sure to include it in the profile. List the profiles comma-separated with no space, for example:

nextflow run main.nf -c conf/test.config -profile singularity,slurm
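Nextflow expects multiple profiles as a single comma-separated value with no spaces. If the pipeline is launched from a wrapper script, the command can be assembled along the lines of the sketch below. The helper name build_nextflow_command is hypothetical and not part of this repository; it only builds the argument list and does not run anything.

```python
import shlex
from typing import List, Sequence


def build_nextflow_command(config: str,
                           profiles: Sequence[str],
                           pipeline: str = "main.nf") -> List[str]:
    """Build the argument list for 'nextflow run' with one or more profiles."""
    # Strip stray whitespace so the profiles join into one space-free value
    cleaned = [p.strip() for p in profiles if p.strip()]
    if not cleaned:
        raise ValueError("at least one profile is required")
    return ["nextflow", "run", pipeline,
            "-c", config,
            "-profile", ",".join(cleaned)]


command = build_nextflow_command("conf/test.config", ["singularity", " slurm"])
print(shlex.join(command))
# nextflow run main.nf -c conf/test.config -profile singularity,slurm
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where Nextflow is installed.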

Run the pipeline directly from GitHub

Nextflow will automatically fetch the pipeline from GitHub, so you do not need to download it yourself.

Steps to follow

Create your own config file locally to suit the path to your datasets. Use conf/test.config as a template.
Run the command below

nextflow run nanjalaruth/MHC-Imputation-Accuracy -c <path to your config file> -profile singularity

In case of any queries, tag me on an issue or send me a message on Twitter (@Ruthnanje) or LinkedIn.
