- Empirical Bayes estimators for single-cell RNA-Seq analysis, accompanying the paper "Determining sequencing depth in a single-cell RNA-seq experiment". Previous bioRxiv version: "One read per cell per gene is optimal for single-cell RNA-Seq".
- Installation: pip install sceb
- See ./examples/example_pbmc_4k.ipynb for an example of estimating the Pearson correlation.
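To give a rough idea of why a dedicated estimator is needed for the Pearson correlation, the sketch below compares the plug-in correlation of raw counts with a moment-corrected version. It is an illustration only and does not use the sceb API: the function name `plugin_and_corrected_pearson` is hypothetical, and the code assumes that observed counts are Poisson draws around the true expression level with the same size factor for every cell.

```python
import numpy as np


def plugin_and_corrected_pearson(y1, y2):
    """Return (plug-in, Poisson-corrected) Pearson correlation of two count vectors.

    Assumption (not taken from sceb): each count is Poisson around the true
    expression, independently for the two genes given the true expression,
    and every cell has the same size factor.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)

    cov = np.mean(y1 * y2) - y1.mean() * y2.mean()
    var1, var2 = y1.var(), y2.var()
    plugin = cov / np.sqrt(var1 * var2)

    # Under the Poisson assumption, Var(Y) = Var(signal) + E[Y], so subtracting
    # the mean removes the sampling-noise part of the variance. The covariance
    # between the two genes is not inflated by independent Poisson noise.
    dvar1 = var1 - y1.mean()
    dvar2 = var2 - y2.mean()
    if dvar1 <= 0 or dvar2 <= 0:
        # The corrected variance is not usable (e.g. too few counts).
        return plugin, float("nan")
    return plugin, cov / np.sqrt(dvar1 * dvar2)


# Simulated check: two genes with identical true expression (correlation 1),
# observed through shallow Poisson sampling.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=0.5, size=20000)
y1 = rng.poisson(x)
y2 = rng.poisson(x)
plugin, corrected = plugin_and_corrected_pearson(y1, y2)
print(f"plug-in: {plugin:.3f}, corrected: {corrected:.3f}")
```

In this simulation the plug-in estimate is pulled well below the true correlation by sampling noise, while the corrected estimate is close to it. The corrected value is not guaranteed to lie in [-1, 1] on finite samples.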
The code that reproduces all figures in the paper.
Fig. 1b-c, Fig. 2b top, Supp. Figs. 1-3:
Fig. 2a: The simulation is done using
and the figures are generated using figure_tradeoff_curve.ipynb
Supp. Fig. 4: The simulation is done using
called by call_tradeoff_simu.sh. The figures are generated using figure_tradeoff_simu.ipynb
Fig. 2b: Supp. Figs. 5-6: The simulations are done using
(Supp. Fig. 5) and tradeoff_posthoc_guide_brain.py (Supp. Fig. 6). The figures are generated using figure_tradeoff_posthoc.ipynb
Fig. 3a top, Supp. Fig. 7:
Fig. 3a middle, Supp. Figs. 8-9:
Fig. 3a bottom, Supp. Fig. 10:
Fig. 3b:
Fig. 4a:
Fig. 4b-c: The network data is generated using
and analyzed using Gephi (external software). The examples are generated using figure_network_example.ipynb
Fig. 5a-b, Supp. Figs. 13-14: The comparison between Dropseq data and smFISH data (Figs. 5a-b, Supp. Fig. 14) was done in
. The comparison between CEL-seq data and smFISH data (Supp. Fig. 13) was done in figure_CELseq_smfish.ipynb.
Supp. Figs. 15-17:
Supp. Fig. 18:
for curating the data from "Ding et al. 2019" and figure_sensitivity_analysis.ipynb for the analysis.
The figures that appeared in the paper, as well as the simulated data used to generate them.
The datasets are publicly available, and their local paths are specified inside ./sceb/data_loader. See ./examples/PC_estimation_pbmc_4k.ipynb for an example of defining a data loader function.
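As a rough illustration of what a data loader function might look like, the sketch below reads a 10x Genomics v2 filtered gene/cell matrix directory. It is not the loader in ./sceb/data_loader: the function name `load_10x_v2_matrix` and its return format are hypothetical, and it assumes the directory holds the usual Cell Ranger v2 files `matrix.mtx`, `genes.tsv`, and `barcodes.tsv`.

```python
from pathlib import Path

import numpy as np
import pandas as pd
from scipy.io import mmread
from scipy.sparse import csr_matrix


def load_10x_v2_matrix(matrix_dir):
    """Load a 10x v2 filtered gene/cell matrix directory (hypothetical helper).

    Returns:
        counts: sparse CSR matrix of shape (n_cells, n_genes)
        genes: DataFrame with columns "gene_id" and "gene_name"
        barcodes: NumPy array of cell barcodes
    """
    matrix_dir = Path(matrix_dir)

    # The .mtx file stores genes in rows and cells in columns;
    # transpose so that each row is a cell.
    raw = mmread(str(matrix_dir / "matrix.mtx"))
    counts = csr_matrix(raw).T.tocsr()

    # genes.tsv: tab-separated gene ID and gene name, no header row.
    genes = pd.read_csv(
        matrix_dir / "genes.tsv",
        sep="\t",
        header=None,
        names=["gene_id", "gene_name"],
    )

    # barcodes.tsv: one cell barcode per line, no header row.
    barcodes = pd.read_csv(
        matrix_dir / "barcodes.tsv", sep="\t", header=None
    )[0].to_numpy()

    # Guard against a mismatch between the matrix and its annotation files.
    if counts.shape != (len(barcodes), len(genes)):
        raise ValueError("matrix shape does not match genes/barcodes files")
    return counts, genes, barcodes
```

The local directory passed to such a function is whatever path the downloaded and extracted archive lives at on your machine.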
The datasets that we use are from 10x Genomics v2 chemistry ("Zheng et al. 2017"). pbmc_4k and pbmc_8k contain peripheral blood mononuclear cells (PBMCs) from the same healthy donor. brain_1k, brain_2k, brain_9k, and brain_1.3m contain cells from a combined cortex, hippocampus, and subventricular zone of an E18 mouse. The pair 293T_1k/3T3_1k contains a 1:1 mixture of fresh frozen human (HEK293T) and mouse (NIH3T3) cells, as do the pairs 293T_6k/3T3_6k and 293T_12k/3T3_12k. The data were downloaded from the 10x website; the links are provided below. We used the filtered gene/cell matrix ("Gene / cell matrix (filtered)") from each link.
- pbmc_4k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k
- pbmc_8k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k
- brain_1k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neurons_900
- brain_2k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neurons_2000
- brain_9k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neuron_9k
- brain_1.3m: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons
- 293T_1k, 3T3_1k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/hgmm_1k
- 293T_6k, 3T3_6k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/hgmm_6k
- 293T_12k, 3T3_12k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/hgmm_12k
- Dropseq data and the corresponding smFISH data: from "Wang et al. 2018"
- CEL-seq data and the corresponding smFISH data: the CEL-seq data are available from "Grün et al. 2014". The smFISH data can be obtained by contacting the authors of the paper (e.g., Dr. Grün).
- The three ERCC datasets (Zheng, Klein, Svensson): from "Wang et al. 2018"
- The Klein dataset with the pure RNA controls: from "Svensson et al. 2017"
- The data for sensitivity analysis: from "Ding et al. 2019"