boxes for cellular-structure + new (unique) anchor logic
Luis committed Feb 19, 2025
1 parent 076e4f7 commit d59ff1f
Showing 3 changed files with 100 additions and 70 deletions.
45 changes: 35 additions & 10 deletions jupyter-book/cellular_structure/annotation.ipynb
@@ -4,9 +4,39 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"``````{admonition} How do I set up an environment with the yml file used in this chapter?\n",
":class: dropdown\n",
"# Annotation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{dropdown} <i class=\"fas fa-brain\"></i>&nbsp;&nbsp;&nbsp;Key takeaways\n",
"\n",
":::{card}\n",
":link: cellular-structure-annotation-key-takeaway-1\n",
":link-type: ref\n",
"Cell annotation is the process of labeling groups of cells based on known or unknown cellular phenotypes, which is crucial for understanding the cellular composition of your data.\n",
":::\n",
"\n",
":::{card}\n",
":link: cellular-structure-annotation-key-takeaway-2\n",
":link-type: ref\n",
"Manual annotation relies on known marker genes and clustering, but it can be subjective and labor-intensive.\n",
":::\n",
"\n",
":::{card}\n",
":link: cellular-structure-annotation-key-takeaway-3\n",
":link-type: ref\n",
"Automated annotation methods, such as CellTypist and scArches, offer faster and more scalable alternatives, but their accuracy depends on the quality of the reference data and the similarity between the reference and the query dataset.\n",
":::\n",
"\n",
"\n",
"\n",
"\n",
"```\n",
"\n",
"``````{dropdown} <i class=\"fa-solid fa-gear\"></i>&nbsp;&nbsp;&nbsp;Environment setup\n",
"`````{tab-set}\n",
" \n",
"````{tab-item} Steps\n",
@@ -21,21 +51,14 @@
"````\n",
"\n",
"`````\n",
"\n",
"``````"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Annotation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(cellular-structure-annotation-key-takeaway-1)=\n",
"## Motivation"
]
},
@@ -191,6 +214,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(cellular-structure-annotation-key-takeaway-2)=\n",
"## Manual annotation"
]
},
@@ -1140,6 +1164,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(cellular-structure-annotation-key-takeaway-3)=\n",
"## Automated annotation"
]
},
58 changes: 28 additions & 30 deletions jupyter-book/cellular_structure/clustering.ipynb
@@ -2,12 +2,35 @@
"cells": [
{
"cell_type": "markdown",
"id": "ba138202",
"id": "cdfeeaf0-cf82-40ba-bf84-b82620fb700f",
"metadata": {},
"source": [
"(cellular-structure:clustering)=\n",
"# Clustering"
]
},
{
"cell_type": "markdown",
"id": "46547ceb",
"metadata": {},
"source": [
"``````{admonition} How do I set up an environment with the yml file used in this chapter?\n",
":class: dropdown\n",
"```{dropdown} <i class=\"fas fa-brain\"></i>&nbsp;&nbsp;&nbsp;Key takeaways\n",
"\n",
":::{card}\n",
":link: cellular-structure-clustering-key-takeaway-1\n",
":link-type: ref\n",
"Use Leiden community detection on a single-cell KNN graph.\n",
":::\n",
"\n",
":::{card}\n",
":link: cellular-structure-clustering-key-takeaway-2\n",
":link-type: ref\n",
"Sub-clustering with different resolution parameters allows the user to focus on more detailed substructures in the dataset to potentially identify finer cell states. \n",
":::\n",
"\n",
"```\n",
"\n",
"``````{dropdown} <i class=\"fa-solid fa-gear\"></i>&nbsp;&nbsp;&nbsp;Environment setup\n",
"`````{tab-set}\n",
" \n",
"````{tab-item} Steps\n",
@@ -22,24 +45,16 @@
"````\n",
"\n",
"`````\n",
"\n",
"``````"
]
},
{
"cell_type": "markdown",
"id": "cdfeeaf0-cf82-40ba-bf84-b82620fb700f",
"metadata": {},
"source": [
"(cellular-structure:clustering)=\n",
"# Clustering"
]
},
{
"cell_type": "markdown",
"id": "974344a0",
"metadata": {},
"source": [
"(cellular-structure-clustering-key-takeaway-1)=\n",
"(cellular-structure-clustering-key-takeaway-2)=\n",
"## Motivation"
]
},
@@ -233,23 +248,6 @@
"We would like to highlight again that distances between the displayed clusters must be interpreted with caution. As the UMAP embedding is in 2D, distances are not necessarily captured well between all points. We recommend not interpreting distances between clusters visualized on UMAP embeddings."
]
},
{
"cell_type": "markdown",
"id": "d398c113",
"metadata": {},
"source": [
"## Key takeaways"
]
},
{
"cell_type": "markdown",
"id": "6f972720",
"metadata": {},
"source": [
"1. Use Leiden community detection on a single-cell KNN graph.\n",
"2. Sub-clustering with different resolution parameters allows the user to focus on more detailed substructures in the dataset to potentially identify finer cell states. "
]
},
{
"cell_type": "markdown",
"id": "5382a907",
67 changes: 37 additions & 30 deletions jupyter-book/cellular_structure/integration.ipynb
@@ -2,12 +2,43 @@
"cells": [
{
"cell_type": "markdown",
"id": "92574ffb",
"id": "4268c662-26d7-403e-8617-f1d69dfe3882",
"metadata": {},
"source": [
"``````{admonition} How do I set up an environment with the yml file used in this chapter?\n",
":class: dropdown\n",
"(cellular-structure:data-integration)=\n",
"# Data integration"
]
},
{
"cell_type": "markdown",
"id": "d144a53c",
"metadata": {},
"source": [
"```{dropdown} <i class=\"fas fa-brain\"></i>&nbsp;&nbsp;&nbsp;Key takeaways\n",
"\n",
"\n",
":::{card}\n",
":link: cellular-structure-integration-key-takeaway-1\n",
":link-type: ref\n",
"Visualize your data before attempting to correct for batch effects to assess the extent of the issue.\n",
"Batch effect correction is not always required and it might mask the biological variation of interest.\n",
":::\n",
"\n",
":::{card}\n",
":link: cellular-structure-integration-key-takeaway-2\n",
":link-type: ref\n",
"If cell labels are available and biological variation is the most important, the usage of methods that can use these labels (such as scANVI) is advised.\n",
":::\n",
"\n",
":::{card}\n",
":link: cellular-structure-integration-key-takeaway-3\n",
":link-type: ref\n",
"Consider running several integration methods on your dataset and evaluating them with the **scIB** metrics to use the integration that is most robust for your use case.\n",
":::\n",
"\n",
"```\n",
"\n",
"``````{dropdown} <i class=\"fa-solid fa-gear\"></i>&nbsp;&nbsp;&nbsp;Environment setup\n",
"`````{tab-set}\n",
" \n",
"````{tab-item} Steps\n",
@@ -22,23 +53,15 @@
"````\n",
"\n",
"`````\n",
"\n",
"``````"
]
},
{
"cell_type": "markdown",
"id": "4268c662-26d7-403e-8617-f1d69dfe3882",
"metadata": {},
"source": [
"# Data integration"
]
},
{
"cell_type": "markdown",
"id": "589a63c7",
"metadata": {},
"source": [
"(cellular-structure-integration-key-takeaway-1)=\n",
"## Motivation\n",
"\n",
"A central challenge in most scRNA-seq data analyses is presented by batch effects. Batch effects are changes in measured expression levels that are the result of handling cells in distinct groups or “batches”. For example, a batch effect can arise if two labs have taken samples from the same cohort, but these samples are dissociated differently. If Lab A optimizes its dissociation protocol to dissociate cells in the sample while minimizing the stress on them, and Lab B does not, then it is likely that the cells in the data from Lab B will express more stress-linked genes (JUN, JUNB, FOS, etc.; see {cite}`Van_den_Brink2017-si`) even if the cells had the same profile in the original tissue. In general, the origins of batch effects are diverse and difficult to pin down. Some batch effect sources might be technical such as differences in sample handling, experimental protocols, or sequencing depths, but biological effects such as donor variation, tissue, or sampling location are also often interpreted as a batch effect {cite}`Luecken2021-jo`. Whether or not biological factors should be considered batch effects can depend on the experimental design and the question being asked. Removing batch effects is crucial to enable joint analysis that can focus on finding common structure in the data across batches and enable us to perform queries across datasets. Often it is only after removing these effects that rare cell populations can be identified that were previously obscured by differences between batches. Enabling queries across datasets allows us to ask questions that could not be answered by analysing individual datasets, such as _Which cell types express SARS-CoV-2 entry factors and how does this expression differ between individuals?_ {cite}`Muus2021-ti`."
@@ -57,6 +80,7 @@
"id": "f3b3e2e6",
"metadata": {},
"source": [
"(cellular-structure-integration-key-takeaway-2)=\n",
"### Types of integration models\n",
"\n",
"Methods that remove batch effects in scRNA-seq are typically composed of (up to) three steps:\n",
@@ -105,6 +129,7 @@
"id": "903e7bfe",
"metadata": {},
"source": [
"(cellular-structure-integration-key-takeaway-3)=\n",
"### Comparisons of data integration methods\n",
"\n",
"Several benchmarks have previously evaluated the performance of methods for batch correction and data integration. When removing batch effects, methods may overcorrect and remove meaningful biological variation in addition to the batch effect. For this reason, it is important that integration performance is evaluated by considering both batch effect removal and the conservation of biological variation.\n",
@@ -3801,24 +3826,6 @@
"Existing benchmarks have suggested methods that generally perform well, but performance can also be quite variable across scenarios. For some analyses, it may be worthwhile performing your own evaluation of integration. The **scib** package makes this process easier, but it can still be a significant undertaking, relying on a good knowledge of the ground truth and interpretation of the metrics."
]
},
{
"cell_type": "markdown",
"id": "bd2e36f5",
"metadata": {},
"source": [
"## Key Takeaways"
]
},
{
"cell_type": "markdown",
"id": "b3b8b501",
"metadata": {},
"source": [
"1. Visualize your data before attempting to correct for batch effects to assess the extent of the issue. Batch effect correction is not always required and it might mask the biological variation of interest.\n",
"2. If cell labels are available and biological variation is the most important, the usage of methods that can use these labels (such as scANVI) is advised.\n",
"3. Consider running several integration methods on your dataset and evaluating them with the **scIB** metrics to use the integration that is most robust for your use case."
]
},
{
"cell_type": "markdown",
"id": "b7834cf1",