Container class to represent and manage multi-omics genomic experiments. MultiAssayExperiment
(MAE) simplifies the management of multiple experimental assays conducted on a shared set of specimens, follows Bioconductor's MAE R/Package.
To get started, install the package from PyPI
pip install multiassayexperiment
An MAE contains three main entities,
-
Primary information (
column_data
): Bio-specimen/sample information. Thecolumn_data
may provide information about patients, cell lines, or other biological units. Each row in this table represents an independent biological unit. It must contain anindex
that maps to the 'primary' insample_map
. -
Experiments (
experiments
): Genomic data from each experiment. either aSingleCellExperiment
,SummarizedExperiment
,RangedSummarizedExperiment
or any class that extends aSummarizedExperiment
. -
Sample Map (
sample_map
): Map biological units fromcolumn_data
to the list ofexperiments
. Must contain columns,- assay provides the names of the different experiments performed on the biological units. All experiment names from experiments must be present in this column.
- primary contains the sample name. All names in this column must match with row labels from col_data.
- colname is the mapping of samples/cells within each experiment back to its biosample information in col_data.
Each sample in
column_data
may map to one or more columns per assay.
Let's start by first creating few experiments:
from random import random
import numpy as np
from biocframe import BiocFrame
from genomicranges import GenomicRanges
from iranges import IRanges
nrows = 200
ncols = 6
counts = np.random.rand(nrows, ncols)
gr = GenomicRanges(
seqnames=[
"chr1",
"chr2",
"chr2",
"chr2",
"chr1",
"chr1",
"chr3",
"chr3",
"chr3",
"chr3",
] * 20,
ranges=IRanges(range(100, 300), range(110, 310)),
strand = ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
mcols=BiocFrame({
"score": range(0, 200),
"GC": [random() for _ in range(10)] * 20,
})
)
col_data_sce = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
row_names=[f"sce_{i}" for i in range(6)],
)
col_data_se = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
row_names=[f"se_{i}" for i in range(6)],
)
sample_map = BiocFrame({
"assay": ["sce", "se"] * 6,
"primary": ["sample1", "sample2"] * 6,
"colname": ["sce_0", "se_0", "sce_1", "se_1", "sce_2", "se_2", "sce_3", "se_3", "sce_4", "se_4", "sce_5", "se_5"]
})
sample_data = BiocFrame({"samples": ["sample1", "sample2"]}, row_names= ["sample1", "sample2"])
Finally, we can create an MultiAssayExperiment
object:
from multiassayexperiment import MultiAssayExperiment
from singlecellexperiment import SingleCellExperiment
from summarizedexperiment import SummarizedExperiment
tsce = SingleCellExperiment(
assays={"counts": counts}, row_data=gr.to_pandas(), column_data=col_data_sce
)
tse2 = SummarizedExperiment(
assays={"counts": counts.copy()},
row_data=gr.to_pandas().copy(),
column_data=col_data_se.copy(),
)
mae = MultiAssayExperiment(
experiments={"sce": tsce, "se": tse2},
column_data=sample_data,
sample_map=sample_map,
metadata={"could be": "anything"},
)
## output
class: MultiAssayExperiment containing 2 experiments
[0] sce: SingleCellExperiment with 200 rows and 6 columns
[1] se: SummarizedExperiment with 200 rows and 6 columns
column_data columns(1): ['samples']
sample_map columns(3): ['assay', 'primary', 'colname']
metadata(1): could be
For more use cases, checkout the documentation.
This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.