From fca4989c029a3f837ea105d655bc6d2327982642 Mon Sep 17 00:00:00 2001
From: Stephanie Hicks <stephaniechicks@gmail.com>
Date: Mon, 12 Nov 2018 20:41:13 -0500
Subject: [PATCH] adding changes to Aim 1

---
 content/04.body.md | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/content/04.body.md b/content/04.body.md
index ffb79b5..bcdbf28 100644
--- a/content/04.body.md
+++ b/content/04.body.md
@@ -34,7 +34,7 @@ non-negative matrix factorization method scCoGAPS [@doi:10.1101/378950,@doi:10.1
 (PIs Fertig & Goff). This technique learns biologically relevant features across contexts
 and data modalities [@doi:10.1186/1471-2164-13-160,@doi:10.18632/oncotarget.12075,@doi:10.1007/978-1-62703-721-1_6,@doi:10.1186/s13073-018-0545-2,@doi:10.1101/378950],
 including notably the HPN DREAM8 challenge [@doi:10.1038/nmeth.3773]. This technique is
-specifically selected as a base enabling technnology because its error distribution can
+specifically selected as a base enabling technology because its error distribution can
 naturally account for measurement-specific technical variation
 [@doi:10.1371/journal.pone.0078127] and its prior distributions for different feature
 quantifications or spatial information. For non-linear needs, neural networks with multiple
@@ -59,7 +59,7 @@ leading to systematic biases in gene expression estimates [@doi:10.1101/335000].
 this, we will build on our recently developed quantification method for tagged-end data that
 accounts for reads mapping to multiple genomic loci in a principled and consistent way
 [@doi:10.1101/335000] (PI Patro), and extend this into a production quality tool for
-scRNA-Seq preprocessing. Our tool will support: 1. Exploration of alternative models for
+scRNA-seq preprocessing. Our tool will support: 1. Exploration of alternative models for
 Unique Molecular Identifier (UMI) resolution. 2. Development of new approaches for quality
 control and filtering using the UMI-resolution graph. 3. Creation of a compressed and
 indexible data structure for the UMI-resolution graph to enable direct access, query, and
@@ -70,7 +70,7 @@ analysis, and latent space transformations as freely available, open source soft
 We will additionally develop platform-agnostic input and output data formats and standards
 for latent space representations of the HCA data to maximize interoperability. The software
 tools produced will be fast, scalable, and memory-efficient by leveraging the available
-assets and expertises of the R/Bioconductor project (PIs Hicks & Love) as well as the
+assets and expertise of the R/Bioconductor project (PIs Hicks & Love) as well as the
 broader HCA community.
 
 By using and extending our base enabling technologies, we will provide three principle
@@ -95,18 +95,19 @@ The primary approach to search in low-dimensional spaces is straightforward: one
 must create an appropriate low-dimensional representation and identify distance functions
 that enable biologically meaningful comparisons. Ideal low-dimensional representations are
 predicted to be much faster to search, and potentially more biologically relevant, as noise
-can be removed. In this aim, we will evaluate novel low-dimensional representations to
+can be removed. In this aim, we will evaluate novel, low-dimensional representations to
 identify those with optimal qualities of compression, noise reduction, and retention of
-biologically meaningful features. Current scRNA-Seq approaches require investigators to
-perform gene-level quantification on the entirety of a new sample. We aim to enable search
+biologically meaningful features. Current scRNA-seq approaches require investigators to
+perform gene-level quantification on the entirety of a new sample. We aim to support search
 during sample preprocessing, prior to gene-level quantification. This will enable in-line
 annotation of cell types and states and identification of novel features as samples are
 being processed. We will implement and evaluate techniques to learn and transfer shared
-low-dimensional representations between raw or lightly processed data (e.g., kmer representations or UMI-graphs) and quantified samples, so
-that samples where either quantified or raw data are available can be used for search and annotation
+low-dimensional representations between raw or lightly processed data (e.g., k-mer
+representations or UMI-graphs) and quantified samples, so that samples where either
+quantified or raw data are available can be used for search and annotation
 [@url:https://github.com/greenelab/shared-latent-space].
 
-Similarly to the approach by which comparisons to a reference genomes can identify specific
+Similar to the approach by which comparisons to a reference genome can identify specific
 differences in a genome of interest, we will use low-dimensional representations from latent
 spaces to define a reference transcriptome map (the HCA), and use this to quantify
 differences in target transcriptome maps from new samples of interest. We will leverage
@@ -186,7 +187,7 @@ individual-specific differences with the linear models proposed in Aim 1.
 *Rationale:* Low-dimensional representations of scRNA-seq and HCA data make tasks faster and
 provide interpretable summaries of complex high-dimensional cellular features. The HCA
 data-associated methods and workflows will be valuable to many biomedical fields, but their
-use will require an understanding of basic bioinformatics, scRNA-Seq, and how the tools
+use will require an understanding of basic bioinformatics, scRNA-seq, and how the tools
 being developed work. Furthermore, researchers will need exposure to the conceptual basis of
 low-dimensional interpretations of biological systems. This aim addresses these needs in
 three ways.