giannkas/gwas-bionets

gwas-bionets

gwas-bionets is a repository to run different biological network methods, namely Heinz, HotNet2, and SigMod. The input is a typical fileset for GWAS analysis in PLINK 1.9 format (.bim, .bed and .fam). The general workflow for constructing a consensus network is as follows:

And for generating a stable consensus network (when the k parameter is greater than 1), the corresponding pipeline is:

For the latter image, we used k=5 as an illustration, but you can choose a larger value that makes sense for your experiments. Also, 1H, 2H, ... correspond to the solutions output by Heinz; 1N, 2N, ... to the solutions of HotNet2; and 1S, 2S, ... to the solutions of SigMod. We omitted the filtering step from the pipeline, so please ensure your data is filtered beforehand. We used BioGRID as the source for the PPI network, but you can choose another reference network, provided that its header contains two columns indicating the connection between two molecules (genes, proteins, ...), i.e., 'Official Symbol Interactor A' and 'Official Symbol Interactor B'.
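If you swap in your own reference network, a quick header check can catch format problems before running the pipeline. The following is a minimal sketch, not part of the repository: the function name check_ppi_header is hypothetical, and it assumes the PPI file is tab-separated with a single header line.

```shell
#!/usr/bin/env bash
# Check that a PPI file has the two header columns named in the text above.
# Assumption: the file is tab-separated and its first line is the header.
check_ppi_header() {
    local ppi_file="$1"
    head -n 1 "$ppi_file" | awk -F'\t' '
        {
            for (i = 1; i <= NF; i++) {
                sub(/\r$/, "", $i)   # tolerate Windows line endings
                if ($i == "Official Symbol Interactor A") a = 1
                if ($i == "Official Symbol Interactor B") b = 1
            }
        }
        END { exit !(a && b) }       # exit 0 only if both columns were found
    '
}
```

For example, check_ppi_header my_network.tsv && echo "header OK" prints the message only when both columns are present (my_network.tsv is a placeholder name).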

Software requirements

This code relies on software that is already outdated, so we encourage you to follow the installation process below; otherwise, it may not work. An update of the methods and requirements is planned but has not yet been carried out. Because you may lack administrative rights or want to avoid conflicts, most software needs to be installed locally or within an environment, and paths must point to these installations. We assume that you run these commands on a Unix-like machine.

Ideally, create a folder in your home directory to store all software. For example:

mkdir ~/bin

Install Java (required for installing nextflow)

Create a "java" folder in the software directory and navigate to it.

mkdir ~/bin/java
cd ~/bin/java

Download the x64 version of Java as a tar.gz file from the Oracle website to your machine and decompress it.

wget https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz
tar xzfv jdk-17.0.10_linux-x64_bin.tar.gz 
rm jdk-17.0.10_linux-x64_bin.tar.gz

Export the path to the bin directory of this folder in the system variable $PATH to make Java executable. Also, export the $JAVA_HOME variable pointing to the root directory. Ideally, add these lines to ~/.bashrc to avoid repeating the process on each server connection or reboot, e.g.:

export PATH=$PATH:/home/username/bin/java/jdk-17.0.10/bin
export JAVA_HOME=/home/username/bin/java/jdk-17.0.10

You may need to source the .bashrc file before checking the installation, so type:

source ~/.bashrc

Test the installation:

java -version

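You should see something like the following. The exact version string depends on which Java build is first on your path; the sample below comes from an OpenJDK 11 installation, whereas the JDK downloaded above should report version 17.0.10: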

openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu220.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu220.04, mixed mode, sharing)

Install Nextflow

Create a "nextflow" folder in the software directory and navigate to it.

mkdir ~/bin/nextflow
cd ~/bin/nextflow

Download Nextflow version 22.10.4 and decompress:

wget https://github.com/nextflow-io/nextflow/archive/refs/tags/v22.10.4.tar.gz
tar -xzvf v22.10.4.tar.gz
cd nextflow-22.10.4

Compile and install it:

make compile
make pack
make install

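Add the folder that contains the nextflow launcher to your path (adjust the folder below to wherever the launcher is located on your machine):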

export PATH=$PATH:/home/username/bin/nextflow

Test the installation:

nextflow -version

Install MAGMA

You can follow the installation instructions for MAGMA (version 1.10) on its website: Multi-marker Analysis of GenoMic Annotation. According to the documentation, MAGMA is a self-contained executable and does not need to be installed.

Install PLINK

Similarly, you can install PLINK (version 1.9) from its website: population linkage. PLINK is also a self-contained executable, so you can either add it to your path or reference the executable directly when using it.

Install R and some packages (required for the methods)

If you don't have R installed on your machine, proceed as follows (use your superuser credentials if they are needed to install software):

wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
apt-get -y install --no-install-recommends r-base r-base-dev

Set up the general CRAN repo.

echo 'local({
    r <- getOption("repos")
    r["CRAN"] = "https://cloud.r-project.org/"
    options(repos = r)
  })' >> /etc/R/Rprofile.site

Install Bioconductor, twilight and BioNet (the latter contains the files needed to use the Heinz method).

R -e "if (!requireNamespace('BiocManager', quietly = TRUE))
            install.packages('BiocManager')"
R -e "BiocManager::install('BioNet')"
R -e "BiocManager::install('twilight')"

Install the R packages tidyverse, cowplot, igraph and gprofiler2:

R -e "install.packages(c('tidyverse', 'cowplot', 'igraph', 'gprofiler2'))" 

Install Python2

HotNet2 uses Python 2 to run some of its scripts. Nowadays, installing Python 2.7 can be troublesome, so we suggest using a conda environment (although we did not follow this alternative ourselves).

apt-get -y install python2-dev python2 python-pip

Add some python2 libraries needed for HotNet2:

pip2 install numpy==1.12.1 scipy==0.19.0 networkx==1.11 h5py==2.7.0
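The conda alternative mentioned above was not the route followed for this repository, so the following is only an untested sketch of what it could look like. It assumes that conda is installed and on your path; the function name setup_hotnet2_env and the default environment name hotnet2-py27 are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create an isolated Python 2.7 environment for HotNet2 with conda.
# Assumptions: conda is on PATH; the environment name is a placeholder.
setup_hotnet2_env() {
    local env_name="${1:-hotnet2-py27}"
    # Create the environment with Python 2.7 and pip.
    conda create -y -n "$env_name" python=2.7 pip || return 1
    # Install the same pinned versions as the pip2 command above,
    # but inside the environment instead of system-wide.
    conda run -n "$env_name" pip install numpy==1.12.1 scipy==0.19.0 networkx==1.11 h5py==2.7.0
}
```

After running setup_hotnet2_env, you would activate the environment (conda activate hotnet2-py27) before launching anything that calls the HotNet2 scripts.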

Install SigMod

You can install SigMod (version 2) from this website: Strongly Interconnected Gene MODule. This is an R package, and it suffices to set the parameter sigmod_path when calling the bionets.nf script, e.g., --sigmod_path="~/bin/SigMod_v2".

Install HotNet2

You can install HotNet2 from this website: HotNet2. Save the code in a folder named hotnet2, whose location you can then reference in the parameter hotnet2_path when calling the bionets.nf script, e.g., --hotnet2_path="~/bin/hotnet2".
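Because the shell does not expand ~ inside double quotes, it can be safer to pass absolute paths built from $HOME. The helper below is a hypothetical sketch (bionets_path_args is not part of the repository) that checks both folders exist and prints the two path arguments; any other parameters that bionets.nf needs are left out.

```shell
#!/usr/bin/env bash
# Build the two path arguments for bionets.nf after checking the folders exist.
# Assumption: SigMod and HotNet2 were saved in the directories passed in.
bionets_path_args() {
    local sigmod_dir="$1" hotnet2_dir="$2"
    [ -d "$sigmod_dir" ] || { echo "SigMod folder not found: $sigmod_dir" >&2; return 1; }
    [ -d "$hotnet2_dir" ] || { echo "HotNet2 folder not found: $hotnet2_dir" >&2; return 1; }
    printf -- '--sigmod_path=%s --hotnet2_path=%s\n' "$sigmod_dir" "$hotnet2_dir"
}

# Usage sketch (works for paths without spaces):
# nextflow run bionets.nf $(bionets_path_args "$HOME/bin/SigMod_v2" "$HOME/bin/hotnet2")
```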

Install Heinz

You already installed it when installing the BioNet package from Bioconductor :-)

After that, we are all set!
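Before running anything, it may help to confirm that the command-line tools are reachable from your path. The following is a small sketch; the function name missing_tools is hypothetical, and the executable names in the example call are assumptions about how each tool is named on your system.

```shell
#!/usr/bin/env bash
# Print every given command that cannot be found on PATH, one per line.
# Returns non-zero if at least one command is missing.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call (executable names are assumptions):
# missing_tools java nextflow plink magma R python2
```

An empty output means that every listed tool was found. This only checks the executables; the R packages and Python libraries still need to be verified separately.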

Main Scripts

  1. This script works with the raw data and splits it according to the k parameter. The script contains the parameters that the user needs to fill in; you can also run each of the steps within the script separately.

bionets_construction_from_data.sh

  2. This script works with scores previously computed with software such as MAGMA for the gene P-values. Again, a k parameter greater than 1 generates k-fold solutions. As above, the script is meant to be edited to provide the parameters.

bionets_construction_from_scores.sh
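How the scripts split the data is not described here, so the following only illustrates one possible way to form k sample subsets from a PLINK .fam file. It is a hypothetical helper, not the scripts' actual procedure: samples are assigned to folds in round-robin order (not randomly), and each output file lists the FID and IID columns, which is the format PLINK expects for its --keep and --remove options.

```shell
#!/usr/bin/env bash
# Split the samples of a PLINK .fam file into k lists of "FID IID" lines.
# Hypothetical helper: fold assignment is round-robin by line order.
split_fam_into_folds() {
    local fam_file="$1" k="$2" out_prefix="$3"
    awk -v k="$k" -v prefix="$out_prefix" '
        {
            fold = (NR - 1) % k + 1
            print $1, $2 > (prefix "_fold" fold ".txt")
        }
    ' "$fam_file"
}

# Each list could then be passed to PLINK, for instance (file names are placeholders):
# plink --bfile mydata --remove mydata_fold1.txt --make-bed --out mydata_without_fold1
```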
