In #354, general support for spatial AIRR data is discussed. The purpose of this issue is to plan what would be required to support spot-based spatial AIRR data. Since we don't have single-cell resolution, and therefore no receptor pairing, this is in many ways similar to bulk AIRR data.
IO and Data Structure
Reading AIRR-compliant bulk data already works. The cell_id column would need to be (ab)used as a spot identifier.
Maybe add support for widely used formats (is there an off-the-shelf spatial AIRR assay yet?).
The AwkwardArray in adata.obsm["airr"] would then hold, for each spot, a variable-length list of all chains detected in that spot.
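The following plain-Python stand-in makes the intended layout concrete. It groups AIRR rearrangement rows by cell_id, which is assumed to hold the spot barcode, and keeps every chain of a spot. This is only a sketch: the real adata.obsm["airr"] is an AwkwardArray built by scirpy's IO functions, not a dict of lists. The sequences and counts are made up.

```python
from collections import defaultdict

# Made-up AIRR rearrangement rows from a spot-based assay; `cell_id` is
# assumed to hold the spot barcode instead of a cell barcode.
rearrangements = [
    {"cell_id": "spot_1", "locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "duplicate_count": 3},
    {"cell_id": "spot_1", "locus": "TRA", "junction_aa": "CAVRDSNYQLIW", "duplicate_count": 1},
    {"cell_id": "spot_1", "locus": "TRB", "junction_aa": "CASSPGTGELFF", "duplicate_count": 2},
    {"cell_id": "spot_2", "locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "duplicate_count": 5},
]


def group_chains_by_spot(rows):
    """Return {spot_id: [chain, ...]}, keeping every chain of a spot (ragged)."""
    spots = defaultdict(list)
    for row in rows:
        # Drop the spot identifier from the per-chain record.
        chain = {key: value for key, value in row.items() if key != "cell_id"}
        spots[row["cell_id"]].append(chain)
    return dict(spots)


airr_by_spot = group_chains_by_spot(rearrangements)
for spot, chains in airr_by_spot.items():
    print(spot, len(chains), [c["junction_aa"] for c in chains])
```

Unlike the single-cell layout, the number of chains per spot is unbounded. The ragged second dimension of the AwkwardArray would capture exactly this.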
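Chain indices
pp.index_chains needs to be adapted, but this should be straightforward. Instead of selecting up to two chains per arm (primary/secondary VJ and VDJ) per cell and flagging cells as multi-chain, simply create lists of all VJ and VDJ chains. adata.obsm["chain_indices"] would then hold, for each spot, one list of VJ chain indices and one list of VDJ chain indices, each of arbitrary length.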
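Spot-level chain indexing could look roughly like the sketch below. It assumes chains arrive as dicts with the AIRR locus and junction_aa fields. The function name index_spot_chains is hypothetical and not part of the scirpy API. The point is that every valid chain is kept, with no cap at two chains per arm and no multi-chain flag.

```python
# Loci of the VJ arm (alpha, gamma, kappa, lambda) and the VDJ arm (beta, delta, heavy).
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def index_spot_chains(chains):
    """Hypothetical spot-level indexing: return indices of *all* valid VJ and
    VDJ chains of one spot, without capping at two per arm."""
    indices = {"VJ": [], "VDJ": []}
    for i, chain in enumerate(chains):
        if not chain.get("junction_aa"):
            continue  # skip chains without a junction sequence
        if chain["locus"] in VJ_LOCI:
            indices["VJ"].append(i)
        elif chain["locus"] in VDJ_LOCI:
            indices["VDJ"].append(i)
    return indices


spot_chains = [
    {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF"},
    {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW"},
    {"locus": "TRB", "junction_aa": "CASSPGTGELFF"},
    {"locus": "TRB", "junction_aa": "CASSQETQYF"},
    {"locus": "TRA", "junction_aa": None},
]
print(index_spot_chains(spot_chains))
# {'VJ': [1], 'VDJ': [0, 2, 3]}
```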
Quality control
tl.chain_qc doesn't make sense for unpaired data, but chains with missing information still need to be removed. Additional QC metrics could be useful, such as the correlation of receptor chain counts with T cell fractions from deconvolution.
Clonotype definition
pp.ir_dist() should be straightforward to adapt. It simply needs to take all sequences into account instead of only the primary/secondary chains.
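The adapted distance step could work as sketched below. It collects the unique junction sequences from all chains of all spots and keeps only pairs within a cutoff, giving a sparse sequence × sequence distance table. Levenshtein distance stands in for whichever metric pp.ir_dist is configured with. The inclusive cutoff is an assumption of this sketch.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def sparse_seq_distances(spot_chains, cutoff=2):
    """Distances between all unique junction_aa sequences of *all* chains,
    keeping only pairs with distance <= cutoff (upper triangle only)."""
    seqs = sorted({c["junction_aa"] for chains in spot_chains.values() for c in chains})
    dist = {}
    for a, b in combinations(seqs, 2):
        d = levenshtein(a, b)
        if d <= cutoff:
            dist[(a, b)] = d
    return seqs, dist


spot_chains = {
    "spot_1": [{"junction_aa": "CASSLGQAYEQYF"}, {"junction_aa": "CASSPGTGELFF"}],
    "spot_2": [{"junction_aa": "CASSLGQSYEQYF"}],
}
seqs, dist = sparse_seq_distances(spot_chains)
print(seqs)
print(dist)
```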
tl.define_clonotypes/tl.define_clonotype_clusters would require substantial work. There are two ways we could conceive of "clonotype clusters" in spatial data:
(1) We still assign clonotype labels to individual receptor chains. Then each spot would have multiple clonotype labels, where each one has a count. This could be represented as a sparse count matrix, potentially as a MuData layer. Identifying the clonotypes would work very similarly to the current single-cell implementation, but simpler, because we don't have to deal with chain pairing and dual TCRs.
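Option (1) could be sketched as follows. The sketch assumes that clonotypes are defined by identity of (locus, junction_aa) and that each chain record may carry an AIRR duplicate_count (defaulting to 1). The result is a spot × clonotype count matrix in COO form (rows, columns, values), which could be passed to e.g. scipy.sparse.coo_matrix to build the layer. The helper name is hypothetical.

```python
from collections import Counter


def spot_clonotype_counts(spot_chains):
    """Hypothetical helper: label chains by (locus, junction_aa) identity and
    return (spot names, clonotype keys, COO triples of per-spot counts)."""
    spots = sorted(spot_chains)
    clonotype_ids = {}  # (locus, junction_aa) -> column index, by first appearance
    rows, cols, data = [], [], []
    for row, spot in enumerate(spots):
        counts = Counter()
        for chain in spot_chains[spot]:
            counts[(chain["locus"], chain["junction_aa"])] += chain.get("duplicate_count", 1)
        for key, n in sorted(counts.items()):
            col = clonotype_ids.setdefault(key, len(clonotype_ids))
            rows.append(row)
            cols.append(col)
            data.append(n)
    return spots, list(clonotype_ids), (rows, cols, data)


spot_chains = {
    "spot_2": [{"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "duplicate_count": 5}],
    "spot_1": [
        {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "duplicate_count": 3},
        {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW"},
    ],
}
spots, clonotypes, coo = spot_clonotype_counts(spot_chains)
print(spots, clonotypes, coo)
```

Because chains are never paired here, each clonotype is a single chain. This is what makes the spot variant simpler than the single-cell one.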
(2) We consider sequence-based distances between spots, defining the AIRR analogue of "niches". Which metrics make sense here warrants further discussion, but simple candidates include:
* at least one chain matches between spots (where match means distance(chain1, chain2) < threshold)
* sum of distances between all chains of spots < threshold
* at least one chain of each type (VJ/VDJ) matches between spots
Probably we'd want both. The resulting clonotype labels can be easily visualized on the spatial image and can be used as a variable for spatial algorithms.
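The spot-to-spot metrics listed under (2) might be sketched like this. Each spot is given as a list of (arm, junction_aa) pairs, and Levenshtein distance stands in for the chain-level distance. Restricting the summed distance to chain pairs of the same arm (VJ with VJ, VDJ with VDJ) is an assumption, since the issue leaves this open.

```python
from itertools import product


def levenshtein(a, b):
    """Edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def any_chain_match(spot_a, spot_b, threshold):
    """At least one chain pair between the two spots with distance < threshold."""
    return any(levenshtein(a, b) < threshold for (_, a), (_, b) in product(spot_a, spot_b))


def summed_distance(spot_a, spot_b):
    """Sum of distances over all same-arm chain pairs (assumption, see above)."""
    return sum(
        levenshtein(a, b)
        for (arm_a, a), (arm_b, b) in product(spot_a, spot_b)
        if arm_a == arm_b
    )


def each_arm_matches(spot_a, spot_b, threshold):
    """At least one matching VJ pair *and* at least one matching VDJ pair."""
    return all(
        any(
            levenshtein(a, b) < threshold
            for (arm_a, a), (arm_b, b) in product(spot_a, spot_b)
            if arm_a == arm_b == arm
        )
        for arm in ("VJ", "VDJ")
    )


spot_a = [("VJ", "CAVRDSNYQLIW"), ("VDJ", "CASSLGQAYEQYF")]
spot_b = [("VJ", "CAVRDSNYQLIW"), ("VDJ", "CASSPGTGELFF")]
print(any_chain_match(spot_a, spot_b, threshold=1))   # True
print(each_arm_matches(spot_a, spot_b, threshold=1))  # False
print(summed_distance(spot_a, spot_b))
```

The summed distance grows with the number of chains per spot. It is also non-zero even for two identical spots that each carry several chains of the same arm, so it would likely need some normalization before being compared to a threshold.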
Clonotype networks
For (1) described above, the current visualization of clonotype networks could still work well.
For (2), we have a spot × spot distance matrix that we can use to make network plots. I am not sure whether the single-cell network plots with separate clonotype components still make sense. Other visualizations like UMAP etc. could be explored.
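One way to turn the spot × spot distance matrix from (2) into network components is sketched below. It assumes two spots are connected whenever their distance is at most a cutoff. A depth-first search over the implied graph then assigns a component label to each spot. The resulting edges could equally be handed to a graph library such as networkx or igraph for layout and plotting.

```python
def connected_components(dist, cutoff):
    """Label spots by connected component of the graph with an edge
    between spots i and j whenever dist[i][j] <= cutoff."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to an earlier component
        labels[start] = current
        stack = [start]
        while stack:  # iterative depth-first search
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy spot x spot distance matrix: spots 0 and 1 are close, 2 and 3 are isolated.
dist = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 9],
    [9, 9, 9, 0],
]
labels = connected_components(dist, cutoff=2)
print(labels)  # [0, 0, 1, 2]
```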
Clonal expansion
tl.clonal_expansion should work; however, it would consider whether a clonotype occurs in multiple spots, not whether the same chain occurs multiple times within the same spot. Different definitions of "expanded" could be explored.
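The two notions of "expanded" contrasted above could be computed side by side as in the following sketch. It assumes per-spot clonotype counts such as those from option (1). The clonotype names, counts and the min_count default are illustrative only. Keep in mind that within-spot counts may reflect transcript abundance as well as cell number.

```python
from collections import defaultdict


def expansion_across_spots(counts):
    """Number of spots in which each clonotype is detected."""
    n_spots = defaultdict(int)
    for per_spot in counts.values():
        for clonotype, n in per_spot.items():
            if n > 0:
                n_spots[clonotype] += 1
    return dict(n_spots)


def expanded_within_spot(counts, min_count=2):
    """Clonotypes with at least `min_count` copies inside a single spot."""
    return {
        spot: sorted(c for c, n in per_spot.items() if n >= min_count)
        for spot, per_spot in counts.items()
    }


counts = {
    "spot_1": {"ct_A": 4, "ct_B": 1},
    "spot_2": {"ct_A": 1},
    "spot_3": {"ct_C": 1},
}
print(expansion_across_spots(counts))  # {'ct_A': 2, 'ct_B': 1, 'ct_C': 1}
print(expanded_within_spot(counts))    # {'spot_1': ['ct_A'], 'spot_2': [], 'spot_3': []}
```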
Query reference databases
In the single-cell case, this was just a wrapper around tl.define_clonotypes. Here, it is slightly different, because we want to query a single-cell dataset using bulk/spot data. Each spot would usually get multiple labels because there can be multiple T/B cells in each spot.
Still, the logic is similar to define_clonotypes: based on some distance metric, we find all entries in the reference database that match below a certain threshold.
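A minimal sketch of this matching logic follows. Each spot is compared against every reference entry and collects the IDs of all entries whose sequence lies below the distance threshold, so a spot can receive several labels. Levenshtein distance, the strict "< threshold" comparison (following the wording above) and the toy reference entries are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def query_reference(spot_seqs, reference, threshold=2):
    """For each spot, return the sorted IDs of all reference entries that
    match any of the spot's sequences with distance < threshold."""
    hits = {}
    for spot, seqs in spot_seqs.items():
        matched = {
            ref_id
            for ref_id, ref_seq in reference.items()
            for seq in seqs
            if levenshtein(seq, ref_seq) < threshold
        }
        hits[spot] = sorted(matched)
    return hits


# Toy reference database: entry ID -> junction_aa.
reference = {"ref_1": "CASSLGQAYEQYF", "ref_2": "CAVRDSNYQLIW", "ref_3": "CASSQETQYF"}
spot_seqs = {
    "spot_1": ["CASSLGQSYEQYF", "CAVRDSNYQLIW"],
    "spot_2": ["CASSPGTGELFF"],
}
print(query_reference(spot_seqs, reference))
```

Here spot_1 is expected to pick up two labels, one per matching chain. This illustrates why spots generally end up with multiple annotations.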
Diversity metrics
tbd
Gene usage
tbd
Comparing repertoires
tbd
Comparing with transcriptomics data
Clonotype modularity etc. -> tbd
CC @FFinotello @felixpetschko