This repository contains a collection of notebooks, scripts and config files used to reproduce the analyses, results and figures in the Piol, et al. manuscript.
To set up the necessary environment and install all required packages to reproduce the analyses found in this repository, we recommend using Conda and installing the packages with their specified versions from the .yaml files found in ./config/yamls/, as outlined in the steps below:
- Ensure you have Conda installed. You can download it from the official Conda website.
- Clone the repository to your local machine:
git clone https://github.com/sifrimlab/Piol_motor_neuron/
cd Piol_motor_neuron
- Install the necessary packages using the provided Conda environment file (example R environment shown here):
conda env create -f ./configs/yamls/R_env.yaml
- Activate the newly created environment:
conda activate R_env
This will set up the environment with all the dependencies required for the project. The time to install the Conda dependencies can vary, depending on your machine. If it takes too long, we suggest using Mamba to install these dependencies.
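If the Conda solver is slow, the same environment file can be passed to Mamba instead. The sketch below is an illustration, not part of the repository: pick_solver and create_env are hypothetical helper names, and it assumes that Mamba, when installed, is available on your PATH as mamba.

```shell
# Hypothetical helper: print "mamba" when it is available, otherwise "conda".
pick_solver() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  else
    echo conda
  fi
}

# Hypothetical helper: create an environment from a .yaml file with that tool.
create_env() {
  "$(pick_solver)" env create -f "$1"
}
```

For example, create_env ./configs/yamls/R_env.yaml would build the R environment with whichever tool was found.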
There are three .yaml files detailing the packages for the three different environments used for the Piol, et al. analysis. We recommend creating three separate Conda environments (in the manner outlined above) using the following .yaml files:
- 10X Visium software Python environment: .yaml file describing the dependencies needed to reproduce the 10X Visium portion of this study
- Nanostring and Seurat software R environment: .yaml file describing the dependencies needed to reproduce the Nanostring preprocessing, DESeq2 DEG analysis, GSEA, GEO data Seurat DEG analysis and correlations portions of this study
- Image processing Python environment: .yaml file describing the dependencies needed to reproduce the image processing portion of this study
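Creating the three environments can be scripted as a loop over the .yaml files. This is a minimal sketch under assumptions: create_all_envs is a hypothetical helper, the directory is passed in as an argument because only the R_env.yaml file name is given here, and each file is assumed to carry its own name field so that Conda can name the environment.

```shell
# Hypothetical helper: create one Conda environment per .yaml file in a directory.
create_all_envs() {
  yaml_dir="$1"
  for yaml in "$yaml_dir"/*.yaml; do
    # Skip the literal pattern when the directory contains no .yaml files.
    [ -e "$yaml" ] || continue
    echo "Creating environment from $yaml"
    # Stop at the first environment that fails to build.
    conda env create -f "$yaml" || return 1
  done
}
```

Calling create_all_envs with the directory that holds the .yaml files builds each environment in turn and stops at the first failure.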
Note: if you encounter issues installing all of the R dependencies via Conda for the R environment, the necessary R packages can also be installed in an R environment using the install_packages.R script found in ./config/R_env/.
This section contains code to reproduce the results for 10X Visium untargeted spatial transcriptomics from the Piol, et al. manuscript. The raw data used for this analysis can be found on GEO under accession GSE269377.
All code to reproduce this portion of the study is found in the 10XVisium_SC_marker_gene_enrich.ipynb Jupyter notebook.
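The notebook can be opened interactively, or executed from the command line. The sketch below assumes that jupyter and nbconvert are installed in the 10X Visium Python environment, which is not confirmed here; run_notebook is a hypothetical helper name.

```shell
# Hypothetical helper: execute a notebook from top to bottom.
# By default nbconvert writes the executed copy next to the original,
# with a .nbconvert.ipynb suffix, and leaves the original untouched.
run_notebook() {
  jupyter nbconvert --to notebook --execute "$1"
}
```

For example, run_notebook 10XVisium_SC_marker_gene_enrich.ipynb, run from the folder that contains the notebook.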
This section contains code to reproduce the count matrix, results and downstream GSEA for Nanostring GeoMx, as well as select pre-rendered reports displaying code and results. The raw data used for this analysis can be found on GEO under accession GSE269707.
The scripts for this workflow should be run in the following order:
- NanoString_Exploratory_Analysis_Final.R: converts .dcc and .pcc files into a long count matrix (optional)
- NS_make_raw_counts.R: re-generates the metadata and processed count matrix files. These files are also found in ./data (optional)
Note: you may need to change the input file path, based on the file architecture of your local machine.
Then run the DESeq2 DEG analysis workbooks:
NS_sciatic_nerve_Chatpos_ctrl_v_Chatneg_ctrl_condition_model.Rmd
NS_sciatic_nerve_Chatpos_ctrl_v_Chatneg_ctrl_segment_model.Rmd
NS_sciatic_nerve_Chatpos_ctrl_v_Chatpos_FUS.Rmd
NS_spinal_cord_Chatpos_ctrl_v_Chatneg_ctrl.Rmd
To perform GSEA on the DEG lists derived from the DESeq2 analyses listed above, run:
NS_FGSEA_on_all_DE_res.R
Assuming that you have already installed the R environment dependencies for these scripts, they can be run from the command line as follows, or run in RStudio:
- Rscripts, e.g.:
Rscript NS_make_raw_counts.R
- Rmarkdown workbooks, e.g.:
Rscript -e "rmarkdown::render('NS_spinal_cord_Chatpos_ctrl_v_Chatneg_ctrl.Rmd')"
Note: these scripts should take only about 10 minutes to run in total.
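The two invocation styles above can be wrapped so that a list of files runs in order. This is a minimal sketch, not part of the repository: run_step and run_workflow are hypothetical helper names, and it assumes the commands are run from the folder containing the scripts and that file names contain no single quotes.

```shell
# Hypothetical helper: run one workflow file based on its extension.
run_step() {
  case "$1" in
    *.Rmd) Rscript -e "rmarkdown::render('$1')" ;;
    *.R)   Rscript "$1" ;;
    *)     echo "Unsupported file type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: run the files in the order given,
# stopping at the first one that fails.
run_workflow() {
  for step in "$@"; do
    echo "Running $step"
    run_step "$step" || return 1
  done
}
```

For example, run_workflow NS_make_raw_counts.R NS_spinal_cord_Chatpos_ctrl_v_Chatneg_ctrl.Rmd NS_FGSEA_on_all_DE_res.R runs those three files in sequence.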
This section contains R scripts to perform correlations between publicly available motor neuron datasets from GEO and data from the Piol, et al. manuscript. The raw data for these datasets can be downloaded from GEO under the following accessions:
- Gautier, et al. 2023 - GSE228778
- Yadav, et al. 2023 - GSE190442
- Alkaslasi, et al. 2021 - GSE167597
- Blum, et al. 2021 - GSE161621
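Each accession above maps to a GEO landing page, from which the raw data can be downloaded. The sketch below prints those pages; geo_url is a hypothetical helper name, and the URL pattern is the standard GEO accession query rather than anything specific to this repository.

```shell
# Hypothetical helper: print the GEO landing page for an accession.
geo_url() {
  echo "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=$1"
}

# Print the landing page for each of the four public datasets.
for acc in GSE228778 GSE190442 GSE167597 GSE161621; do
  geo_url "$acc"
done
```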
Note:
- Author-annotated versions of the datasets listed above that were used in the Piol, et al. study can also be downloaded from Spinal Cord Atlas, or obtained by contacting the respective authors of these studies.
- You may need to change the input file path, based on the file architecture of your local machine.
- Seurat objects of these datasets can be made available upon request.
- The run time of the Seurat scripts varies. The scripts assume that 72 cores are used (change this based on your compute resources).
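Before running the Seurat scripts, it may help to check how many cores your machine has and pick a value no larger than that. This sketch is an illustration under assumptions: available_cores and usable_cores are hypothetical helper names, getconf _NPROCESSORS_ONLN is assumed to be supported on your system (it is on most Linux and macOS machines), and how each script sets its core count is not described here, so the value would still need to be changed in the scripts themselves.

```shell
# Hypothetical helper: print the number of online CPU cores.
available_cores() {
  getconf _NPROCESSORS_ONLN
}

# Hypothetical helper: print the smaller of the requested and available core counts.
usable_cores() {
  requested="$1"
  available="$(available_cores)"
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}
```

For example, usable_cores 72 prints 72 on a machine with at least 72 cores, and the machine's own core count otherwise.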
To reproduce the data from this section, download the raw data outlined above and then run the scripts found in the ./DEG_correlations folder, in the following order:
- GSE161621_Blum_et_al_DE.R: performs Seurat DE between Chat+ vs Chat- nuclei in the data of Blum, et al. 2021
- GSE167597_Alkaslasi_et_al_DE.R: performs Seurat DE between Chat+ vs Chat- nuclei in the data of Alkaslasi, et al. 2021
- GSE190442_Yadav_et_al_DE.R: performs Seurat DE between Chat+ vs Chat- nuclei in the data of Yadav, et al. 2023
- GSE228778_Gautier_et_al_DE.R: performs Seurat DE between Chat+ vs Chat- nuclei in the data of Gautier, et al. 2023
- NS_vs_GEO_DEG_list_corr.Rmd: takes the DEG lists from the four scripts listed above and correlates their LFC values with those of the matched DEGs derived from the Nanostring comparisons
This section contains raw data files for Nanostring, as well as processed DEG lists derived from Seurat analysis of the publicly available motor neuron datasets from GEO.
This section contains select visualizations that appear in the Piol, et al. manuscript
For any questions or additional information, please contact:
- Name: Theo Killian
- Email: [email protected]
- Affiliation: Da Cruz/Sifrim Lab Bioinformatics