cansrmapp

CanSRMaPP is a modeling tool for identifying a minimal feature set describing the metagenome of a cancer cohort.

Free software: BSD license
Source code: https://github.com/idekerlab/cansrmapp

Dependencies

PyTorch 2.5+ with torchaudio, torchvision (tested on 2.5.0)
tables
matplotlib
numpy
pandas
scikit-learn
scikit-image
scipy
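
As a quick sanity check, the list above can be compared against what is installed in the active environment. The following is a minimal sketch and not part of CanSRMaPP: the helper name `installed_versions` is hypothetical, and the names in `DEPENDENCIES` assume the usual PyPI distribution names (PyTorch is published as `torch`).

```python
from importlib import metadata

# Assumed PyPI distribution names for the dependencies listed above
# ("Pytorch" is published as "torch").
DEPENDENCIES = [
    "torch", "torchaudio", "torchvision", "tables", "matplotlib",
    "numpy", "pandas", "scikit-learn", "scikit-image", "scipy",
]


def installed_versions(names=DEPENDENCIES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15s} {version or 'MISSING'}")
```

Any entry reported as MISSING would need to be installed before building the package.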

Compatibility

Python 3.11+
CUDA 12.1 _only_ if using GPU

Note

CUDA is only required for implementations using GPUs; feel free to ignore if not using GPU.

The root CanSRMaPP module automatically detects whether CUDA is set up; cmbuilder and in particular cmsolver will configure themselves to use the GPU if available.
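
To preview which device is likely to be selected on a given machine, PyTorch's own availability check can be queried directly. This is a sketch, not CanSRMaPP's detection code: the function name `pick_device` is hypothetical, and the only library call used is `torch.cuda.is_available()`.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this reports `cpu` on a machine with a GPU, the CUDA toolkit or the PyTorch build is the first thing to check.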

Installation

Anaconda environment

This tool depends on PyTorch, and the easiest way to get a clean installation is via Anaconda:

conda create -n cansrmapp python=3.11 -y
conda activate cansrmapp

# install pytorch
conda install pytorch torchvision -c pytorch

Building and installing cansrmapp package

git clone https://github.com/idekerlab/cansrmapp
cd cansrmapp
pip install -r requirements_dev.txt
make dist
pip install dist/cansrmapp*whl

Usage

Basic usage / code test

To fit CanSRMaPP models, two scripts are provided in demo/; the simplest invocation is:

cd demo
./build.sh
./test-solve.sh

build.sh creates the CanSRMaPP input matrices; test-solve.sh solves them. To keep runtime low and ease debugging, some parameters in test-solve.sh are set such that the solver may not converge on an optimal solution; those in full-solve.sh are set to produce an optimal solution.

Note: Anecdotally, you can expect a single cycle of cmsolver to take about 1 minute on a GPU and up to 20 minutes when parallelized over multiple CPUs. Parallelization is largely handled by the backends of numpy, scipy, and pytorch, so if you wish to limit it, follow their guidance on setting environment variables.
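
One common way to cap CPU parallelism is to set the thread-count variables read by the numerical backends before those libraries are imported. The sketch below makes assumptions: `limit_threads` is a hypothetical helper, and which of these variables takes effect depends on how numpy, scipy, and pytorch were built (OpenMP, MKL, or OpenBLAS).

```python
import os

# Environment variables read by common threading backends
# (OpenMP, Intel MKL, OpenBLAS) when they are first loaded.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_threads(n):
    """Cap backend thread pools at n; call before importing numpy/scipy/torch."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    for var in THREAD_VARS:
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in THREAD_VARS}


if __name__ == "__main__":
    print(limit_threads(4))
    # PyTorch also offers a runtime control: torch.set_num_threads(4)
```

The same variables can instead be exported in the shell before running the demo scripts, which avoids any dependence on import order.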

Redistributed data sources

CanSRMaPP relies on a number of third-party files for reference and for reconciling multiple data sources. This document describes the provenance of all such files and hosts frozen copies, since some may be updated in place by their maintainers.

NCBI Files

Gene Info

Homo_sapiens.gene_info was downloaded from https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz on November 3, 2024. This file is unrestricted, as described here.

Genbank Flat File

GCF_000001405.40_GRCh38.p14_genomic.gff.gz was downloaded from this FTP directory on November 12, 2024. This file is unrestricted according to these terms. The reduced file gff_reduced.gff.gz, derived from this one, is the result of running the command:

gunzip -c GCF_000001405.40_GRCh38.p14_genomic.gff.gz | awk -F'\t' '$0 !~ /^#/ && $3 == "gene" && $9 ~ /GeneID/' | gzip -c > gff_reduced.gff.gz
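
For readers less familiar with awk, the same filter can be sketched in Python. This is an illustrative equivalent, not the procedure used to produce the distributed file, and it assumes tab-separated GFF columns; the function names `is_gene_record` and `reduce_gff` are hypothetical.

```python
import gzip


def is_gene_record(line):
    """True for non-comment GFF lines whose type column is "gene" and whose
    attributes column mentions GeneID (the same test as the awk filter)."""
    if line.startswith("#"):
        return False
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 9 and fields[2] == "gene" and "GeneID" in fields[8]


def reduce_gff(src_path, dst_path):
    """Copy matching lines from one gzipped GFF file to another."""
    kept = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in src:
            if is_gene_record(line):
                dst.write(line)
                kept += 1
    return kept
```

Calling `reduce_gff("GCF_000001405.40_GRCh38.p14_genomic.gff.gz", "gff_reduced.gff.gz")` would return the number of gene records kept.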

NeSTv0

"NeSTv0" is a precursor of the interaction map found in Zheng, Kelly, et al., 2021, prior to filtering for mutation-enriched systems. It is distributed here as nest.pickle with permission from the authors, and is subject to the license governing this repository. The file contains a dict object mapping each system to a set of member gene Entrez IDs. Because systems in this file are named Clusterx-y, an additional file, NeST_map_1.5_default_node_Nov20.csv, is incorporated to map these to their NEST IDs as published.
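
The structure described above (a dict from system name to a set of Entrez IDs) can be explored with a few lines of standard-library Python. This is a sketch under assumptions: the helper names are hypothetical, the toy system names and IDs are made up, and whether the real file stores IDs as integers or strings is not stated here. As with any pickle, only load files from a source you trust.

```python
import pickle


def load_systems(path):
    """Load a pickled dict mapping system name -> set of Entrez gene IDs."""
    with open(path, "rb") as handle:
        systems = pickle.load(handle)
    if not isinstance(systems, dict):
        raise TypeError(f"expected a dict, got {type(systems).__name__}")
    return systems


def systems_containing(systems, gene_id):
    """Return the sorted names of all systems that include gene_id."""
    return sorted(name for name, genes in systems.items() if gene_id in genes)


if __name__ == "__main__":
    # Toy stand-in for the real file; names and IDs are made up.
    toy = {"Cluster1-1": {101, 102}, "Cluster2-3": {102, 103}}
    print(systems_containing(toy, 102))
```

With the real data, `load_systems("nest.pickle")` would replace the toy dict, using whatever path the file has in your checkout.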

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
