In this module, we perform four preprocessing steps on the SQLite files using pycytominer:
- Merge and annotate single cells from each SQLite file using the pycytominer `SingleCells` class.
- Normalize the single cells per plate with the standard scaler method, using negative controls as the reference (e.g., DMSO for compound treatments, non-targeting sgRNAs or sgRNAs targeting intergenic regions for CRISPR treatments, and genes with weak signatures for ORF treatments).
- Feature select the single-cell morphology data per plate using variance thresholding and correlation thresholding, and by removing columns that contain NaNs or are listed in the blocklist.
- Aggregate both the normalized and the feature-selected single-cell morphology data to the well level.
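pycytominer provides these steps as library functions (for example `pycytominer.normalize`, `pycytominer.feature_select` and `pycytominer.aggregate`). To make the arithmetic behind steps 2 to 4 concrete, here is a minimal, dependency-free sketch on cells stored as plain dicts. It is not pycytominer's implementation: the column names (`Metadata_Plate`, `Metadata_Well`, the feature names), the `is_negcon` predicate and the thresholds are hypothetical, the variance and correlation filters are simplified stand-ins for pycytominer's own criteria, and it assumes every plate contains negative-control cells.

```python
import math
from statistics import mean, median, pstdev, pvariance


def normalize_per_plate(cells, features, is_negcon):
    """Standardize each feature per plate against that plate's negative-control cells."""
    plates = {}
    for cell in cells:
        plates.setdefault(cell["Metadata_Plate"], []).append(cell)
    normalized = []
    for plate_cells in plates.values():
        controls = [c for c in plate_cells if is_negcon(c)]
        stats = {}
        for f in features:
            values = [c[f] for c in controls]
            sd = pstdev(values)  # population std (ddof=0), as StandardScaler uses
            # Like scikit-learn's StandardScaler, use a scale of 1 for constant features.
            stats[f] = (mean(values), sd if sd > 0 else 1.0)
        for c in plate_cells:
            row = dict(c)
            for f in features:
                mu, sd = stats[f]
                row[f] = (c[f] - mu) / sd
            normalized.append(row)
    return normalized


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def feature_select(cells, features, blocklist, min_variance=1e-8, max_corr=0.9):
    """Return the features that survive blocklist, NaN, variance and correlation filters."""
    kept = [f for f in features if f not in blocklist]
    # Drop any column with a missing value (a simplification of pycytominer's NaN filter).
    kept = [f for f in kept
            if not any(c[f] is None or math.isnan(c[f]) for c in cells)]
    # Drop (near-)constant columns.
    kept = [f for f in kept if pvariance([c[f] for c in cells]) > min_variance]
    # Greedy correlation filter: keep a feature only if it is not highly
    # correlated with a feature that was already kept.
    selected = []
    for f in kept:
        x = [c[f] for c in cells]
        if all(abs(pearson(x, [c[g] for c in cells])) <= max_corr for g in selected):
            selected.append(f)
    return selected


def aggregate_wells(cells, features, strata=("Metadata_Plate", "Metadata_Well")):
    """Collapse single cells to one median profile per (plate, well)."""
    groups = {}
    for c in cells:
        groups.setdefault(tuple(c[s] for s in strata), []).append(c)
    return {key: {f: median(c[f] for c in group) for f in features}
            for key, group in groups.items()}
```

Because the correlation filter is greedy, feature order matters: of two highly correlated features, the one listed first is kept. Normalizing against negative controls rather than all cells means each value reads as "standard deviations away from this plate's controls".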
To process the data, run the shell scripts below from the terminal. `process_data.sh` converts the notebook into a Python script and runs it (steps 1-3); `aggregate_sc_data.sh` performs the well-level aggregation (step 4).
```bash
# Move into the 1.process_data directory
cd 1.process_data
# Run steps 1-3 (merge, normalize, feature select)
./process_data.sh
# Run step 4 (aggregate to the well level)
./aggregate_sc_data.sh
```
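`process_data.sh` converts the notebook into a Python script before running it. A common way to do this is `jupyter nbconvert --to script <notebook>.ipynb` followed by `python <notebook>.py`, but whether `process_data.sh` does exactly that is an assumption. The stdlib-only sketch below illustrates the idea: an `.ipynb` file is JSON, and its code cells are concatenated, in order, into a `.py` file that is then run with the current interpreter. The function names are hypothetical, and unlike nbconvert this sketch does not translate IPython magics (`%` or `!` lines).

```python
import json
import subprocess
import sys
from pathlib import Path


def notebook_to_script(ipynb_path, py_path):
    """Write the code cells of a Jupyter notebook, in order, to a .py file."""
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat 4 stores a cell's source as a string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    Path(py_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return Path(py_path)


def run_notebook_as_script(ipynb_path):
    """Convert a notebook to a sibling .py file and run it with this interpreter."""
    py_path = notebook_to_script(ipynb_path, Path(ipynb_path).with_suffix(".py"))
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run([sys.executable, str(py_path)], check=True)
    return py_path
```

Running the generated script as a separate process, as the shell scripts do, means a failing cell stops the pipeline with a non-zero exit code instead of leaving a half-executed notebook.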