Merge pull request #847 from UTSouthwesternDSSR/UTSouthwesternDSSR/jwl
Submission table for cell type and tumor classification of ETP T-ALL (SCPCP000003)
jaclyn-taroni authored Nov 5, 2024
2 parents 4974a67 + b6244b8 commit 235d5f0
Showing 139 changed files with 842 additions and 57 deletions.
15 changes: 14 additions & 1 deletion .github/workflows/run_cell-type-ETP-ALL-03.yml
@@ -87,6 +87,19 @@ jobs:
 run: |
 cd ${MODULE_PATH}
 # run module script(s) here
+printf "\n\nRunning 00-01_processing_rds.R\n"
 Rscript scripts/00-01_processing_rds.R
+printf "\n\nRunning 02-03_annotation.R\n"
 Rscript scripts/02-03_annotation.R
-Rscript scripts/multipanel_plot.R
+printf "\n\nRunning 04_multipanel_plot.R\n"
+Rscript scripts/04_multipanel_plot.R
+printf "\n\nRunning 05_cluster_evaluation.R\n"
+Rscript scripts/05_cluster_evaluation.R
+printf "\n\nRunning 06_sctype_exploration.R\n"
+Rscript scripts/06_sctype_exploration.R
+printf "\n\nRunning 07_run_copykat.R\n"
+Rscript scripts/07_run_copykat.R
+printf "\n\nRunning markerGenes_submission.R\n"
+Rscript scripts/markerGenes_submission.R
+printf "\n\nRunning writeout_submission.R\n"
+Rscript scripts/writeout_submission.R
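The updated `run` step executes the module scripts strictly in sequence, and later steps (for example, re-running CopyKat with the fine-tuned B cells) depend on outputs of earlier ones. Below is a minimal Python sketch of the same fail-fast ordering, assuming `Rscript` is on `PATH` and that a failing script should stop the remaining steps, which is what GitHub Actions' default `bash -e` shell does when no `shell:` is set. `run_pipeline` and `rscript` are hypothetical helpers, not part of the module.

```python
import subprocess
from typing import Callable, Iterable, List

# Script order copied from the workflow's `run` step.
MODULE_SCRIPTS = [
    "scripts/00-01_processing_rds.R",
    "scripts/02-03_annotation.R",
    "scripts/04_multipanel_plot.R",
    "scripts/05_cluster_evaluation.R",
    "scripts/06_sctype_exploration.R",
    "scripts/07_run_copykat.R",
    "scripts/markerGenes_submission.R",
    "scripts/writeout_submission.R",
]


def rscript(script: str) -> None:
    """Run one R script; raises CalledProcessError on a non-zero exit."""
    subprocess.run(["Rscript", script], check=True)


def run_pipeline(
    scripts: Iterable[str], runner: Callable[[str], None] = rscript
) -> List[str]:
    """Run scripts in order; the first failure raises and skips the rest.

    Returns the list of scripts that were run (all of them on success).
    """
    completed: List[str] = []
    for script in scripts:
        print(f"\n\nRunning {script.split('/')[-1]}")
        runner(script)  # an exception here aborts the remaining steps
        completed.append(script)
    return completed
```

Passing a custom `runner` keeps the ordering logic testable on a machine without R installed.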
27 changes: 14 additions & 13 deletions analyses/cell-type-ETP-ALL-03/README.md
@@ -1,23 +1,29 @@
# ETP T-ALL Annotation (SCPCP000003)

This analysis module will include code to annotate cell types and tumor/normal status in ETP T-ALL samples from SCPCP000003 (n=30) available on the ScPCA portal.
This analysis module will include code to annotate cell types and tumor/normal status in ETP T-ALL samples from SCPCP000003 (n=31) available on the ScPCA portal.

## Description

We first aim to annotate the cell types in ETP T-ALL and then use the annotated B cells in each sample as the "normal" cells for identifying tumor cells, since T-ALL arises from the clonal proliferation of immature T cells [<https://www.nature.com/articles/s41375-018-0127-8>].

- We use the cell type markers (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4T, CD8T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on exploratory analysis, we believe that most of the cells in these samples do not express enough markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of the cells should be T cells. In addition, we include marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursors and cancer cells in the immune system [[ScType](https://sctype.app/database.php) database].

\*\*`Azimuth_BM_level1.csv` is converted to `submission_markerGenes.tsv`, which follows the final submission format.

- Since ScType annotates cell types at the cluster level, using marker genes provided by the user or taken from its built-in database, we employ the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, to better separate homogeneous cell types.

- After cell type annotation, we provide the B cells in the sample, if there are any, to [CopyKat](https://github.com/navinlabcode/copykat) as the normal cells for identifying tumor cells.
- After cell type annotation, we fine-tune the annotated B cells by applying a 99th percentile cutoff based on the non-B ScType scores to the cells in the "B cell clusters". We then use the new B cells (i.e., the cells that pass the cutoff) as the normal cells when running [CopyKat](https://github.com/navinlabcode/copykat) to identify tumor cells.

Here are the steps in the module:

1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`)

2. Annotating cell types using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`)

3. Fine-tuning the B cells (`scripts/06_sctype_exploration.R`)

4. Re-running CopyKat (`scripts/07_run_copykat.R`)

## Usage

Before running the R scripts in R or RStudio, we first need to prepare the input files as described in the next section and run the following commands in the terminal to install the required libraries:
@@ -27,6 +33,7 @@ Before running Rscripts in R or Rstudio, we first need to prepare the input file
sudo apt install libglpk40
sudo apt install libcurl4-openssl-dev #for Seurat
sudo apt-get install libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev libtiff5-dev #for devtools
sudo apt-get install r-cran-rjags #for InferCNV, if you wish to run it
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml
Rscript -e "renv::restore()"
@@ -44,21 +51,15 @@ The `scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment

For annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file `Azimuth_BM_level1.csv` as input for ScType. This CSV file lists the positive marker genes for each cell type as Ensembl IDs under `ensembl_id_positive_marker`; *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. Currently, no negative marker genes are provided under `ensembl_id_negative_marker`.

## Output files

Running `scripts/00-01_processing_rds.R` will generate two types of output:

- `rds` objects in `scratch/`

- UMAP plots showing Leiden clustering in `plots/`
## Important output files

Running `scripts/02-03_annotation.R` will generate several outputs:
- `rds` objects in `results/rds`

- updated `rds` objects in `scratch/`
- ScType results of top 10 possible cell types in a cluster (`results/_sctype_top10_celltypes_perCluster.txt`) and ScType score (`results/_sctype_scores.txt`)

- UMAP plots showing cell types and CopyKat predictions (if any), and dot plots showing the features added with `AddModuleScore()`, in `plots/`
- location of fine-tuned B cells in the UMAP (`plots/sctype_exploration/_newBcells.png`) and the cell type assignment with the fine-tuned B cells added (`results/_newB-normal-annotation.txt`)

- ScType results of the top 10 possible cell types in each cluster (`_sctype_top10_celltypes_perCluster.txt`) and a metadata file tabulating the Leiden cluster, cell type, low-confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`) in `results/`
- final submission table (`results/submission_table/_metadata.tsv`) and the UMAP plots showing `cell_type_assignment` from ScType and `tumor_cell_classification` from CopyKat using fine-tuned B cells (`results/submission_table/multipanels_.png`)

## Software requirements

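`scripts/02-03_annotation.R` reads positive markers from the `ensembl_id_positive_marker` column of `Azimuth_BM_level1.csv`, and markers that are not detected in the dataset (*TMEM56*, *CD235a*) were dropped. A minimal Python sketch of that filtering step follows, assuming each row stores its markers as one comma-separated string of Ensembl IDs and that the cell type label sits in a column named `celltype`; that column name and `load_positive_markers` are hypothetical, since the README only names the marker columns.

```python
import csv
import io
from typing import Dict, List, Set

# Hypothetical name of the cell type column; the README only documents
# `ensembl_id_positive_marker` and `ensembl_id_negative_marker`.
CELLTYPE_COL = "celltype"
POS_COL = "ensembl_id_positive_marker"


def load_positive_markers(csv_text: str, detected: Set[str]) -> Dict[str, List[str]]:
    """Map each cell type to its positive markers found in `detected`.

    Assumes markers are stored as a comma-separated string of Ensembl IDs
    (quoted in the CSV so the commas do not split columns).
    """
    markers: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids = [g.strip() for g in (row.get(POS_COL) or "").split(",") if g.strip()]
        # Keep only markers that were detected in the expression data
        markers[row[CELLTYPE_COL]] = [g for g in ids if g in detected]
    return markers
```

A cell type whose list ends up empty would carry no positive signal for ScType, so it may be worth checking for that after filtering.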
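The README describes fine-tuning B cells by applying a 99th percentile cutoff based on non-B ScType scores to the cells in the "B cell clusters", then passing the surviving cells to CopyKat as normal cells. The exact rule lives in `scripts/06_sctype_exploration.R` and is not spelled out here, so the Python sketch below encodes one plausible reading as an assumption: pool every non-B ScType score of the cells in the B cell clusters, take the 99th percentile as the cutoff, and keep a cell only if its B cell score exceeds it. The input layout (`scores` keyed by cell ID), the `"B"` label, and both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (the rule used by R's default quantile)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values to take a percentile of")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def finetune_b_cells(
    scores: Dict[str, Dict[str, float]], b_label: str = "B", q: float = 99.0
) -> List[str]:
    """Return IDs of B-cluster cells whose B score beats the non-B cutoff.

    `scores` maps cell ID -> {cell type -> ScType score} for the cells in
    the clusters that ScType labelled as B cells (assumed layout).
    """
    # Assumption: pool all non-B scores of these cells into one distribution
    non_b = [s for per_cell in scores.values()
             for cell_type, s in per_cell.items() if cell_type != b_label]
    cutoff = percentile(non_b, q)
    return [cell for cell, per_cell in scores.items() if per_cell[b_label] > cutoff]
```

Pooling across all non-B types makes the cutoff conservative; a per-cell alternative would compare each cell's B score with its own highest non-B score.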
204 changes: 204 additions & 0 deletions analyses/cell-type-ETP-ALL-03/renv.lock
@@ -76,6 +76,41 @@
],
"Hash": "3aec5928ca10897d7a0a1205aae64627"
},
"BiocNeighbors": {
"Package": "BiocNeighbors",
"Version": "1.22.0",
"Source": "Bioconductor",
"Repository": "Bioconductor 3.19",
"Requirements": [
"BiocParallel",
"Matrix",
"Rcpp",
"RcppHNSW",
"S4Vectors",
"methods",
"stats"
],
"Hash": "da9f332c88453734623406dcca13ee03"
},
"BiocParallel": {
"Package": "BiocParallel",
"Version": "1.38.0",
"Source": "Bioconductor",
"Repository": "Bioconductor 3.19",
"Requirements": [
"BH",
"R",
"codetools",
"cpp11",
"futile.logger",
"methods",
"parallel",
"snow",
"stats",
"utils"
],
"Hash": "7b6e79f86e3d1c23f62c5e2052e848d4"
},
"BiocVersion": {
"Package": "BiocVersion",
"Version": "3.19.1",
@@ -717,6 +752,25 @@
"Repository": "CRAN",
"Hash": "da69e6b6f8feebec0827205aad3fdbd8"
},
"bluster": {
"Package": "bluster",
"Version": "1.14.0",
"Source": "Bioconductor",
"Repository": "Bioconductor 3.19",
"Requirements": [
"BiocNeighbors",
"BiocParallel",
"Matrix",
"Rcpp",
"S4Vectors",
"cluster",
"igraph",
"methods",
"stats",
"utils"
],
"Hash": "ed9597168d850071aa9abbbef7be7204"
},
"bslib": {
"Package": "bslib",
"Version": "0.8.0",
@@ -1083,6 +1137,32 @@
],
"Hash": "c2efdd5f0bcd1ea861c2d4e2a883a67d"
},
"forcats": {
"Package": "forcats",
"Version": "1.0.0",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"cli",
"glue",
"lifecycle",
"magrittr",
"rlang",
"tibble"
],
"Hash": "1a0a9a3d5083d0d573c4214576f1e690"
},
"formatR": {
"Package": "formatR",
"Version": "1.14",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R"
],
"Hash": "63cb26d12517c7863f5abb006c5e0f25"
},
"fs": {
"Package": "fs",
"Version": "1.6.4",
@@ -1094,6 +1174,29 @@
],
"Hash": "15aeb8c27f5ea5161f9f6a641fafd93a"
},
"futile.logger": {
"Package": "futile.logger",
"Version": "1.4.3",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"futile.options",
"lambda.r",
"utils"
],
"Hash": "99f0ace8c05ec7d3683d27083c4f1e7e"
},
"futile.options": {
"Package": "futile.options",
"Version": "1.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R"
],
"Hash": "0d9bf02413ddc2bbe8da9ce369dcdd2b"
},
"future": {
"Package": "future",
"Version": "1.34.0",
@@ -1134,6 +1237,21 @@
],
"Hash": "15e9634c0fcd294799e9b2e929ed1b86"
},
"geometry": {
"Package": "geometry",
"Version": "0.5.0",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"Rcpp",
"RcppProgress",
"linprog",
"lpSolve",
"magic"
],
"Hash": "b052bd270aeddeca332c20feecfb039d"
},
"ggplot2": {
"Package": "ggplot2",
"Version": "3.5.1",
@@ -1475,6 +1593,17 @@
],
"Hash": "b64ec208ac5bc1852b285f665d6368b3"
},
"lambda.r": {
"Package": "lambda.r",
"Version": "1.2.4",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"formatR"
],
"Hash": "b1e925c4b9ffeb901bacf812cbe9a6ad"
},
"later": {
"Package": "later",
"Version": "1.3.2",
@@ -1537,6 +1666,17 @@
],
"Hash": "b8552d117e1b808b09a832f589b79035"
},
"linprog": {
"Package": "linprog",
"Version": "0.9-4",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"lpSolve"
],
"Hash": "66e9d4ebd71ddcd6f86a2a9a34f5cdc5"
},
"listenv": {
"Package": "listenv",
"Version": "0.9.1",
@@ -1560,6 +1700,24 @@
],
"Hash": "c6fafa6cccb1e1dfe7f7d122efd6e6a7"
},
"lpSolve": {
"Package": "lpSolve",
"Version": "5.6.21",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "730a90bdc519fb0caff03df11218ddd8"
},
"magic": {
"Package": "magic",
"Version": "1.6-1",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"abind"
],
"Hash": "1da6217cea8a3ef496258819b80770e1"
},
"magrittr": {
"Package": "magrittr",
"Version": "2.0.3",
@@ -1748,6 +1906,17 @@
],
"Hash": "68a2d681e10cf72f0afa1d84d45380e5"
},
"pdfCluster": {
"Package": "pdfCluster",
"Version": "1.0-4",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"geometry",
"methods"
],
"Hash": "51e3a7a4af0b863e5d380575cbd33cda"
},
"pillar": {
"Package": "pillar",
"Version": "1.9.0",
@@ -1899,6 +2068,30 @@
],
"Hash": "017561f17632c065388b7062da030952"
},
"rOpenScPCA": {
"Package": "rOpenScPCA",
"Version": "0.1.0",
"Source": "GitHub",
"RemoteType": "github",
"RemoteHost": "api.github.com",
"RemoteUsername": "AlexsLemonade",
"RemoteRepo": "OpenScPCA-analysis",
"RemoteSubdir": "packages/rOpenScPCA",
"RemoteRef": "main",
"RemoteSha": "d446cf35158d53e500e8bcacb08d9f2de4688b5a",
"Requirements": [
"BiocParallel",
"SingleCellExperiment",
"bluster",
"dplyr",
"methods",
"pdfCluster",
"purrr",
"tibble",
"tidyr"
],
"Hash": "5c214b8e7ab3d7fd01fa32daeb51c5f8"
},
"rappdirs": {
"Package": "rappdirs",
"Version": "0.3.3",
@@ -2165,6 +2358,17 @@
],
"Hash": "c956d93f6768a9789edbc13072b70c78"
},
"snow": {
"Package": "snow",
"Version": "0.4-4",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"utils"
],
"Hash": "40b74690debd20c57d93d8c246b305d4"
},
"sourcetools": {
"Package": "sourcetools",
"Version": "0.1.7-1",
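Most of the new `renv.lock` records trace back to one new requirement: `rOpenScPCA`, installed from GitHub, lists `BiocParallel`, `bluster`, and `pdfCluster` among its `Requirements`, and those pull in `BiocNeighbors`, `snow`, the `futile.*` packages, `geometry`, `lpSolve`, and the rest. A minimal Python sketch that makes this tracing explicit by walking `Requirements` transitively inside the lockfile's top-level `Packages` object follows; requirements without a record of their own (such as `R`, `methods`, or `stats`) are skipped. `requirement_closure` and `load_lock` are hypothetical helpers, not part of renv.

```python
import json
from typing import Dict, Set


def requirement_closure(lock: dict, root: str) -> Set[str]:
    """Transitive `Requirements` of `root` that have a record in the lockfile.

    Names with no entry under `Packages` (e.g. "R" or base packages such
    as "methods") are skipped rather than followed.
    """
    packages: Dict[str, dict] = lock["Packages"]
    seen: Set[str] = set()
    stack = list(packages[root].get("Requirements", []))
    while stack:
        name = stack.pop()
        if name in seen or name not in packages:
            continue
        seen.add(name)
        stack.extend(packages[name].get("Requirements", []))
    return seen


def load_lock(path: str) -> dict:
    """Read an renv.lock file, which is plain JSON."""
    with open(path) as fh:
        return json.load(fh)
```

For example, `requirement_closure(load_lock("analyses/cell-type-ETP-ALL-03/renv.lock"), "rOpenScPCA")` would list every recorded package that `rOpenScPCA` depends on, directly or indirectly.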
16 changes: 16 additions & 0 deletions analyses/cell-type-ETP-ALL-03/results/README.md
@@ -13,4 +13,20 @@ These are the generated outputs for each sample in the S3 bucket:

- metadata and ScType results: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/`

- CopyKat results: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/copykat_output`

- InferCNV results: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/infercnv_output`

- evaluating cluster separation, stability, and purity: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/evalClus`

- UMAP and dot plots: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots`

- violin and stacked bar plots for exploring the results of CopyKat prediction: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots/copykat_exploration`

- ridge plots showing the ScType score for each cell type in annotated B cells from ScType, SingleR, and CellAssign, as well as the scatter plots showing the relationship between B cell ScType score and cluster purity of these cells: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots/sctype_exploration`

- ridge plots showing the ScType score for each cell type in fine-tuned B cells and feature plots showing the distribution of B cell ScType score: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots/sctype_exploration`

- final submission `tsv` files and `png` for cell type and/or tumor cell classification: `s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/submission_table`

\*\*All plots are also available in the repository under `plots/`.
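Each final submission table (`results/submission_table/_metadata.tsv`) carries a `cell_type_assignment` column from ScType and a `tumor_cell_classification` column from CopyKat. A quick Python sketch for cross-tabulating the two follows, assuming a tab-separated file with a header row that contains both column names; other columns are ignored, and `crosstab` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def crosstab(tsv_text: str) -> Counter:
    """Count cells per (cell_type_assignment, tumor_cell_classification) pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(
        (row["cell_type_assignment"], row["tumor_cell_classification"])
        for row in reader
    )
```

Reading a real table is then `crosstab(open(path).read())`, where `path` points at one sample's submission `tsv`; the resulting counts make it easy to spot, for instance, how many cells of each annotated type were called tumor.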