The HCA provides a reference atlas of human cell types, states, and
the biological processes in which they engage. The utility of the
reference therefore requires that one can easily compare references to
each other, or a new sample to the compendium of reference
samples. Because they compress the feature space, low-dimensional representations
provide the building blocks for search approaches that can be
practically applied across very large datasets such as the HCA.
Our seed network proposes to compress HCA data
into fewer dimensions that preserve the important attributes of the
original high-dimensional data and yield interpretable, searchable
features.
We hypothesize that using latent space methods to identify low-dimensional
representations of HCA data will accurately capture biological
sources of variability and will be robust to measurement noise.
We propose to develop techniques that learn interpretable, biologically aligned
representations, to improve techniques for fast and
accurate quantification, and to implement these base enabling
technologies and methods for search, analysis, and latent space
transformations as freely available, open source software tools.
By using and extending our base enabling technologies, we will provide
three principal tools and resources for the HCA:
- software to enable fast and accurate search and annotation using low-dimensional representations of cellular features,
- a versioned and annotated catalog of latent spaces corresponding to signatures of cell types, states, and biological attributes across the HCA, and
- short course and educational materials that will increase the use and impact of low-dimensional representations and the HCA in general.
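
To make the idea of searching in a low-dimensional representation concrete, the following is a minimal sketch rather than the proposed software. It assumes PCA as a stand-in latent space method, Euclidean distance as the similarity measure, and synthetic data in place of HCA expression profiles; the function names (`fit_latent_space`, `project`, `nearest_reference`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_latent_space(reference, n_components):
    """Fit a PCA latent space to a (cells x genes) reference matrix."""
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes of the centered reference data.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, components):
    """Map (cells x genes) samples into the low-dimensional latent space."""
    return (samples - mean) @ components.T


def nearest_reference(query_latent, reference_latent, k=1):
    """Return indices of the k reference cells closest to the query."""
    dists = np.linalg.norm(reference_latent - query_latent, axis=1)
    return np.argsort(dists)[:k]


# Synthetic stand-in for a reference: two "cell types" with different means.
rng = np.random.default_rng(0)
type_a = rng.normal(0.0, 1.0, size=(50, 200))
type_b = rng.normal(3.0, 1.0, size=(50, 200))
reference = np.vstack([type_a, type_b])
labels = np.array(["A"] * 50 + ["B"] * 50)

# Compress 200 features into 5 latent dimensions.
mean, components = fit_latent_space(reference, n_components=5)
reference_latent = project(reference, mean, components)

# Annotate a new sample by searching the compressed reference.
new_cell = rng.normal(3.0, 1.0, size=(1, 200))
query_latent = project(new_cell, mean, components)[0]
hits = nearest_reference(query_latent, reference_latent, k=5)
predicted_label = labels[hits[0]]
```

In this sketch the search compares 5 numbers per cell instead of 200, which is the property that makes such representations attractive for search at the scale of the HCA. A real implementation would likely swap the brute-force distance computation for an approximate nearest-neighbor index.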