Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different results Mixscape from PertPy vs Seurat #679

Open
albacorp opened this issue Nov 12, 2024 · 8 comments
Open

Different results Mixscape from PertPy vs Seurat #679

albacorp opened this issue Nov 12, 2024 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@albacorp
Copy link

Report

Hello,
We (@jameshyojaelee) are running PertPy Mixscape vs Mixscape with the same parameters and we are getting very different results -- PertPy is calling more perturbed cells than Mixscape -- which is ideal, but we want to be sure if this is true. Could you help us out, please?

Or at least to understand which are the differences between the Mixscape code vs PertPy Mixscape if there are any.

Please find the code attached and examples of the different guide barplots that we get running PertPy vs Seurat

241112_Scanpy_vs._Seurat.txt
guide_barplot_PertPy.pdf
guide_barplot_Seurat.pdf

Version information

No response

@albacorp albacorp added the bug Something isn't working label Nov 12, 2024
@Zethson Zethson changed the title Different results Mixscape from PertPy vs Seurat - Help Different results Mixscape from PertPy vs Seurat Nov 12, 2024
@Zethson
Copy link
Member

Zethson commented Nov 13, 2024

Thanks for reporting! We will have a look.

@Lilly-May
Copy link
Collaborator

Hi @albacorp! Could you provide more information on the data you're using, please? Are you loading it via the pertpy dataloader? If not, it would be great if you could share the data with me. If sharing it publicly isn't possible, feel free to reach out to me via [email protected]

@jameshyojaelee
Copy link

Hello @Zethson and @Lilly-May, thank you so much for such a quick response. We've been enjoying your PertPy package a lot!
@albacorp has generated the dataset from her experiment on the 10x platform. We used Cellranger multi to generate the h5 matrix which contains both the RNA and CRISPR guide data. Below is the code I wrote to import the data.

If this is not enough info, I can also subsample one of our datasets and email it to you. Thank you!

`samples = {
"Sample1": "/count/filtered_feature_bc_matrix.h5",
"Sample2": "/count/filtered_feature_bc_matrix.h5"
}
rna_dict = {}
crispr_dict = {}

for sample_id, filepath in samples.items():
sample_data = mu.read_10x_h5(filepath)
rna_data = sample_data.mod['rna']
crispr_data = sample_data.mod['CRISPR Guide Capture']
rna_data.obs_names = [f"{barcode}{sample_id}" for barcode in rna_data.obs_names]
crispr_data.obs_names = [f"{barcode}
{sample_id}" for barcode in crispr_data.obs_names]
rna_data.var_names_make_unique()
crispr_data.var_names_make_unique()
rna_data.obs['sample'] = sample_id
crispr_data.obs['sample'] = sample_id
rna_dict[sample_id] = rna_data
crispr_dict[sample_id] = crispr_data

rna_combined = ad.concat(rna_dict, label='sample', join='outer')
crispr_combined = ad.concat(crispr_dict, label='sample', join='outer')
combined_mdata = mu.MuData({"rna": rna_combined, "gdo": crispr_combined})
combined_mdata.write("combined_mdata.h5mu")`

@Zethson
Copy link
Member

Zethson commented Nov 13, 2024

Thank you!

I do fear that we need a downsampled version of the dataset where you can reproduce the issue that you described above. We have done extensive tests with other datasets where we verified that the R and Python implementations are very similar. Therefore, your dataset is essential to debug this.

@jameshyojaelee
Copy link

Got it! We just sent a sample matrix file to [email protected]. Thank you!

@Lilly-May
Copy link
Collaborator

Hi @albacorp and @jameshyojaelee! I finally found the time to look into this (sorry it took me so long!). Unfortunately, I can't fully reproduce your steps as there seems to be some part of the code missing. For example, in the original .txt file you sent, you call pt.tl.Mixscape.perturbation_signature on the perturbation column and split by sample. However, as far as I can see, these columns are not generated during preprocessing and are also not present in the data excerpt you shared with me.

That said, while reviewing your code, I noticed the following: you run pertpy's Mixscape implementation on scaled data. Although in Seurat's implementation the data is also scaled, note that the perturbation signature is calculated from unscaled data in Seurat's version (combined <- CalcPerturbSig(... slot = "data", ...). I tested this on another dataset and found that running pertpy's Mixscape on scaled data produced quite different results compared to running Seurat's Mixscape on unscaled data, which might explain why you get different results. Keep in mind that PCA is indeed calculated on scaled data in Seurat's implementation, so you might still want to scale the data for PCA with pertpy/scanpy, but consider storing the scaled data in a separate layer.

If you'd like me to take a closer look, it would be helpful if you could send the entire code you used, starting from the raw data to the Mixscape results (per mail is also fine if you don't want to share it publicly). Additionally, please try to adjust the scaling on your end and report back if that resolves the issue.

One final note: some differences between Mixscape results are expected due to several non-deterministic steps in the function. Unless you specify a seed, the results might slightly vary between runs even for the same implementation. However, these differences should generally be minimal.

@albacorp
Copy link
Author

albacorp commented Dec 9, 2024 via email

@Zethson
Copy link
Member

Zethson commented Dec 9, 2024

Dear @albacorp

We don't want to add more work on your end

No, we are actually very interested in 100% ensuring that our results match the expectations. So please do not be afraid to add more work to our pile :)
Tiny deviations are expected because the R and Python numerical calculations deviate slightly. But the overall results simply must be very comparable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants