cansrmapp

CanSRMaPP is a modeling tool for identifying a minimal feature set describing the metagenome of a cancer cohort.

Dependencies

Compatibility

  • Python 3.11+
  • CUDA 12.1, only if using a GPU

Note

CUDA is only required for implementations using GPUs; feel free to ignore if not using GPU.

The root CanSRMaPP module automatically detects whether CUDA is set up; cmbuilder and in particular cmsolver will configure themselves to use the GPU if available.
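
The detection described above can be pictured with a minimal sketch. This is not CanSRMaPP's own code: the function name `pick_device` is hypothetical, and the only library call assumed is PyTorch's `torch.cuda.is_available()`. The sketch falls back to the CPU when PyTorch is absent or no CUDA device is visible.

```python
# Illustrative sketch of GPU auto-detection; not CanSRMaPP's actual implementation.
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA device, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(f"Computation would run on: {pick_device()}")
```

Running this inside the conda environment is a quick way to check which device a GPU-aware module would likely select.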

Installation

Anaconda environment

This tool depends on PyTorch, and the easiest way to get a clean installation is via Anaconda.

conda create -n cansrmapp python=3.11 -y
conda activate cansrmapp

# install pytorch
conda install pytorch torchvision -c pytorch

Building and installing cansrmapp package

git clone https://github.com/idekerlab/cansrmapp
cd cansrmapp
pip install -r requirements_dev.txt
make dist
pip install dist/cansrmapp*whl

Usage

Basic usage / code test

To fit CanSRMaPP models, two scripts are provided in demo/; the simplest invocation is:

cd demo
./build.sh
./test-solve.sh

build.sh creates the CanSRMaPP input matrices; test-solve.sh solves them. To keep runtime low and ease debugging, some parameters in test-solve.sh are set such that the solver may not converge on an optimal solution; those in full-solve.sh are set to produce an optimal solution.

Note
Anecdotally, you can expect a single cycle of cmsolver to take about 1 minute on a GPU and up to 20 minutes when parallelized over multiple CPUs. Parallelization is largely handled by the backends of numpy, scipy, and pytorch, so if you wish to limit it, follow their guidance on setting environment variables.
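
As a rough illustration of limiting parallelization, the sketch below sets the thread-count environment variables commonly read by OpenMP, MKL, and OpenBLAS. Which of these a given numpy, scipy, or pytorch build actually honors depends on how it was compiled, so treat the list as an assumption; the helper name `limit_threads` is hypothetical.

```python
import os

# Variables commonly read by OpenMP-, MKL-, and OpenBLAS-backed builds (assumed list).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_threads(n: int) -> dict:
    """Set the thread-count variables; call before importing numpy, scipy, or torch."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)
    # Return the values now in effect so the caller can log them.
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    print(limit_threads(4))
```

The same variables can be exported in the shell before running the demo scripts. PyTorch additionally offers `torch.set_num_threads` for adjusting its own CPU thread pool at runtime.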

Redistributed data sources

CanSRMaPP relies on a number of third-party files for reference and reconciling multiple data sources. This document describes the provenance of all such files, and hosts frozen copies since some may be updated in-place by the maintainers.

NCBI Files

Gene Info

Homo_sapiens.gene_info was downloaded from https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz on November 3, 2024. This file is unrestricted as described here.

Genbank Flat File

GCF_000001405.40_GRCh38.p14_genomic.gff.gz was downloaded from this FTP directory on November 12, 2024. This file is unrestricted according to these terms. The reduced file gff_reduced.gff.gz, derived from this one, is the result of running the command

gunzip -c GCF_000001405.40_GRCh38.p14_genomic.gff.gz | awk -F'\t' '$0 !~ /^#/ && $3 == "gene" && $9 ~ /GeneID/' | gzip -c > gff_reduced.gff.gz
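
For readers who prefer to reproduce the reduction without awk, the same filter can be sketched in Python, assuming the GFF columns are tab-separated. The function names are hypothetical; the logic keeps non-comment lines whose third column is exactly `gene` and whose ninth column contains `GeneID`.

```python
import gzip


def is_gene_with_id(line: str) -> bool:
    """Mirror the awk test: not a comment, column 3 == 'gene', column 9 contains 'GeneID'."""
    if line.startswith("#"):
        return False
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 9 and fields[2] == "gene" and "GeneID" in fields[8]


def reduce_gff(src: str, dst: str) -> int:
    """Copy matching lines from a gzipped GFF into a new gzipped file; return lines kept."""
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            if is_gene_with_id(line):
                fout.write(line)
                kept += 1
    return kept
```

Calling `reduce_gff("GCF_000001405.40_GRCh38.p14_genomic.gff.gz", "gff_reduced.gff.gz")` should select the same lines as the command above, though the compressed bytes may differ from the file produced by gzip.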

NeSTv0

"NeSTv0" is a precursor of the interaction map found in Zheng, Kelly, et al., 2021, prior to filtering for mutation-enriched systems. It is distributed here as nest.pickle with permission from the authors, and is subject to the license governing this repository. The file contains a dict object mapping each system to a set of member gene Entrez IDs. Because systems in this file are named Clusterx-y, an additional file, NeST_map_1.5_default_node_Nov20.csv, is incorporated to map these to their NEST IDs as published.

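A minimal sketch of working with a file shaped like nest.pickle follows. It assumes only what is stated above: a dict mapping each system name to a set of member gene Entrez IDs. The function names and toy values are hypothetical, and building `name_map` from NeST_map_1.5_default_node_Nov20.csv is left to the reader because that file's column layout is not described here.

```python
import pickle


def load_nest(path: str) -> dict:
    """Load the system -> set-of-Entrez-IDs dict. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def rename_systems(nest: dict, name_map: dict) -> dict:
    """Return a copy keyed by mapped IDs; systems missing from name_map keep their name."""
    return {name_map.get(system, system): set(genes) for system, genes in nest.items()}


if __name__ == "__main__":
    # Toy values for illustration only; not taken from nest.pickle or the CSV.
    toy_nest = {"Cluster1-1": {101, 102}, "Cluster2-3": {103}}
    toy_map = {"Cluster1-1": "EXAMPLE:1"}
    print(rename_systems(toy_nest, toy_map))
```

Keeping unmapped systems under their original name, rather than dropping them, is a design choice of this sketch and makes gaps in the mapping easy to spot.
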
Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
