CubeGLM is a Python package for analyzing fluorescent hyperspectral images. Key features include the abilities to quantify known spectral components (e.g. GFP and chlorophyll, in the context of plant biology) in each pixel of hyperspectral images (A.K.A. hypercubes) as well as producing false color images to allow visualization of each component's concentration in each pixel. This package was developed for use in the GMOdetector workflow to study plant transformation; however, it can be used to analyze any hyperspectral images in which the number of spectral components is low enough for them to be quantified by regression without high levels of noise.
conda install -c "naglemi" cubeglm
- Install a GitHub interface (command line or graphical and clone this repository (e.g.
git clone https://github.com/naglemi/gmodetector_py.git
if using the command line. - (Recommended): Install Anaconda or Miniconda to support installation of dependencies without conflicts.
- Create an environment that already has most dependencies using this command:
conda create -y -c conda-forge -n test-environment python=3.7.4 pytest codecov numpy pandas matplotlib tzlocal h5py pytables rpy2 parallel
- Activate the new conda environment
conda activate test-environment
- Install Spectral Python from the pypi repository
pip install spectral
- Launch R and install scales from the R console via:
install.packages('scales', repos = 'http://cran.us.r-project.org')
- Note: To launch R, just type
R
on the command line and press enter. Once you have completed installation and are ready to leave R and return to command line, enterq()
. When prompted, entern
to indicate there is no need to save the workspace.
- Install gmodetector_py and test dependencies. The below command should be run from inside the gmodetector_py directory that is created from step 1 of this section.
pip install ".[test]"
We provide two sample hyperspectral images that can be used to test CubeGLM and make sure the package is working properly before running your own data. Since these hyperspectral files are too large for GitHub (~1.4GB each), they can be found on Google Drive. This folder also includes example outputs for comparison to ensure test analysis runs properly.
To quantify spectral components in our hypercube, we must have a design matrix with columns representing the spectra of each known component. This can be prepared using the XMatrix
class.
test_matrix = XMatrix(fluorophore_ID_vector = ['GFP', 'Chl', 'Noise'],
spectral_library_path = spectral_library_path,
intercept = 1,
wavelengths = read_wavelengths(file_path = file_path),
spectra_noise_threshold = 0.01,
min_desired_wavelength = 500,
max_desired_wavelength = 900)
intercept
can be set to 0 if there is no background signal in your hyperspectral images. To be clear, this means there is not even a faint signal when images are taken with no light source and perfect darkness is expected. If there is background signal, a y-intercept should be included by settingintercept
to 1 instead of 0.spectral_library_path
should point to the path for thespectral_library
folder in this GitHub repo, or alternatively your own spectral library with spectra in the same format.fluorophore_ID_vector
should be a list that matches file prefixes of spectra in this folder.file_path
should point to an ENVI-formathdr
file that lists wavelengths for a sample hyperspectral image. Assuming the wavelengths for all of your images are the same, it can be any one of these images. The prefix to thehdr
file (containing metadata) must match araw
file (containing the hypercube itself) in the same folder.
test_cube = Hypercube(file_path,
min_desired_wavelength = 500,
max_desired_wavelength = 900)
- See under "Preparing a design matrix..." for the description of
file_path
. When creating an object ofHypercube
class, it should point to thehdr
file for the desired hyperspectral image.
weight_array = WeightArray(test_matrix=test_matrix,
test_cube=test_cube,
relu = relu_before_plot)
test_matrix
andtest_cube
should point to objects ofXMatrix
andHypercube
classes, produced as described in the previos two steps.relu_before_plot
can be set toTrue
(recommended for standard use) orFalse
. IfTrue
, negative weights will be replaced with zero. This helps us make sure the scales for false color are consistent across images with the same settings (as described below).
To save a weight_array
object for downstream analysis with other tools, we can use the save
method for the WeightArray
class as follows:
weight_array.save(path = output_file_name_prefix,
format = weight_format,
output_dir = output_dir,
threshold = threshold)
path
points to where the file will be saved. If you wish for the path to match the file preix of the imported hypercube, this can be set tontpath.basename(weight_array.source)
. Note,ntpath
must first be imported (import ntpath
).format
can be set to eithercsv
orhdf
(the latter for faster loading in downstream analysis with other tools).threshold
refers to a significance threshold. Above this threshold, all cumulative signal will be counted to produce summary statistics. The units are arbitrary and depend on a given hyperspectral camera. In our research, we set this to 38 because this represents above which significant GFP or DsRed reporter protein signal can be clearly distinguished from background noise. This only affects the_summary.csv
files discussed below.
When a weight array is saved, two outputs will be produced. The first is the weight array itself, which can be analyzed with downstream tools such as in our GMOdetector workflow to study plant transformation. The second output is a _summary.csv
file that contains the cumulative signals for each fluorescent component (with thresholding according to the aforementioned threshold
parameter). This latter file can be used for statistical analysis in other workflows that do not involve further examination of the weight array itself.
We can produce false color images representing the intensities of up to three spectral components using the FalseColor
and ImageChannel
classes.
stacked_component_image = FalseColor([ImageChannel(weight_array = weight_array,
desired_component_or_wavelength = "GFP",
color = 'green',
cap = 400),
ImageChannel(weight_array = weight_array,
desired_component_or_wavelength = "Chl",
color = 'red',
cap = 200),
ImageChannel(weight_array = weight_array,
desired_component_or_wavelength = "Noise",
color = 'blue',
cap = 200)])
weight_array
can point to an object of classWeightArray
if you wish to plot spectral component weights. Alternatively, this parameter can be replaced withhypercube
and pointed to aHypercube
object if you wish to plot wavelength intensities.Desired_component_or_wavelength
can be set to either a string value referring to a known spectral component in the spectral library, or to a numeric value for a given wavelength for which data is collected.color
must be set tored
,green
, orblue
for a given channel.cap
indicates an upper limit of signal for each of these component. Any signal at or above these values will appear with maximum brightness; thus, these variables are comparable to exposure on an RGB camera. If caps are too high, not much signal at lower ranges will be seen. If cap for a given component is too low, the false color images will appear overexposed with respect to the component.
Saving false color images:
stacked_component_image.save(path_prefix = stacked_component_image.source,
output_dir = output_dir)
This will save a png
file for the FalseColor
image with the same file prefix as the input hypercube, in the user-defined output_dir
.
In this repository, analyze_sample.py
is provided in the wrappers
folder. This provides a high-level Python function, as well as a command line interface, to run over the previously described functions for a given sample.
For details, run python analyze_sample.py
from the command line to see command-line argument descriptions, or run help(analyze_sample)
from Python after loading the function.
See the notebook Designing_regression.ipynb
for a description of the mathematical approach.
We thank the National Science Foundation Plant Genome Research Program for support (IOS #1546900, Analysis of genes affecting plant regeneration and transformation in poplar), and members of GREAT TREES Research Cooperative at OSU for its support of the Strauss laboratory.