Convert Seurat object to h5ad file #59
Update: I've attempted to transform the Seurat object to an h5ad file using another R package, but unfortunately encountered the same error. This leads me to believe that the problem may be due to the existence of missing values (NA) in the expression data. However, after checking the data, I found that there are no NAs. Can you please provide some suggestions on other potential issues that could be causing this problem? Thank you very much!
Hi @uqzqiao Thank you for reporting the issue. It seems your I couldn't locate the exact issue. Being more careful with input data types on your side or forcing the data to have the same type on the software side may solve the problem. If you can provide us with a small set of input files that can reproduce this error, I can look into the issue and get back to you in a few days. Best, |
Hi @martinjzhang, Thank you so much for your response! I really appreciate your help! I noticed that the error I encountered was due to NAs in the covariate file, which was an oversight in my preparation process. I am sorry for any inconvenience this may have caused. I do have a couple more questions regarding the software (I apologize for posting my additional questions in this thread, as they may be better suited for a separate discussion),
Thank you again for your time and help! Best regards, |
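For anyone hitting the same error, one quick way to confirm whether a covariate file contains missing values before running scDRS is to scan it for empty or NA-like entries. The sketch below is only an illustration using the Python standard library. The tab-separated layout, the column names (`cell`, `const`, `n_genes`, `batch`), the helper name `find_missing`, and the set of tokens treated as missing are all assumptions made for the example, not part of scDRS.

```python
import csv
import io

# Tokens treated as "missing" in this sketch (an assumption; adjust to your data).
NA_TOKENS = {"", "na", "nan", "null", "none"}


def find_missing(handle, delimiter="\t"):
    """Return {column_name: [data_row_numbers]} for entries that look missing."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = {}
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # DictReader yields None when a row has fewer fields than the header.
            is_missing = value is None or (
                isinstance(value, str) and value.strip().lower() in NA_TOKENS
            )
            if is_missing:
                missing.setdefault(column, []).append(row_number)
    return missing


# Hypothetical covariate table standing in for a real file on disk.
example = io.StringIO(
    "cell\tconst\tn_genes\tbatch\n"
    "c1\t1\t1200\tA\n"
    "c2\t1\tNA\tB\n"
    "c3\t1\t900\t\n"
)
report = find_missing(example)
print(report)
```

With a real file, the same function can be called on `open(path, newline="")` instead of the in-memory example.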
Hi Martin, I apologize for asking so many questions, but I'm looking for some advice on how to proceed if no cells are significant at certain FDR thresholds (0.1 or 0.2). I'm aware that FDR correction may not effectively retain true positives when a large number of tests must be corrected for (e.g., I have 1.26 million cells). In this case, would it be appropriate to compute a score for each cell type to identify any cells associated with disease? Best wishes, |
Hi @uqzqiao Great that you can successfully run the software. Please see my suggestions below.
The MHC region has a complex LD structure. That's why we excluded this region. scDRS is compatible with any disease gene set. So if you can curate a set of disease genes using other software, that would work too. If you focus on signals of common SNPs (e.g., >5%), HapMap3 SNPs should capture most of the GWAS information. I am not sure about rare SNPs. It may be helpful to include other SNPs if you care about rare variant signals.
You can encode these covariates as dummy variables.
Even if no cells are significant individually, they may yield significant results in the group-level analysis. I suggest creating a UMAP visualization with color representing the normalized scDRS disease score. If you can notice any pattern, such as certain groups of cells having much greater scDRS disease scores, then it indicates that scDRS is likely capturing some disease signals. Please feel free to let us know if you have more questions. Best, |
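As an illustration of the dummy-variable suggestion above, a categorical covariate such as a batch label can be expanded into 0/1 indicator columns. This is a minimal sketch in plain Python, not scDRS code. The helper name `dummy_encode`, the `batch` labels, and the choice to drop the first level as the reference are assumptions made for the example.

```python
def dummy_encode(values, prefix):
    """Expand a categorical covariate into 0/1 indicator columns.

    The first level (in sorted order) is dropped and acts as the reference,
    so the indicators are not collinear with an intercept column.
    """
    levels = sorted(set(values))
    kept_levels = levels[1:]
    return {
        f"{prefix}_{level}": [1 if value == level else 0 for value in values]
        for level in kept_levels
    }


# Hypothetical batch labels for four cells.
batch = ["A", "B", "C", "A"]
encoded = dummy_encode(batch, "batch")
print(encoded)
```

Dropping one level matters if the covariate file also carries a constant column: keeping all levels would make the indicator columns sum to that constant. In pandas, `pandas.get_dummies(..., drop_first=True)` produces the same kind of encoding.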
Hi Martin, Thanks so much for your reply! It has been incredibly helpful!! I've got some interesting results and I'm currently working on understanding and interpreting them. I have a couple more questions: Could you please explain the methods used to correct for multiple testing (regarding the individual cell disease relevance scores)? If it is BH, would it be appropriate for me to recalculate the number of significant cells with alternative methods, using the scDRS p-values (pval) from the score file? In the downstream analyses ( I am particularly interested in the prioritized disease-relevant genes obtained from the Thank you very much for your help! Best regards, |
Yes, we used BH. The pval in the score files are raw p-values. You can apply any multiple testing methods on them.
Yes. Similar to the group-level analysis, the correlation analysis may be more powerful than the individual cell-level association analysis by pooling information across cells.
scDRS may give you NA results because it computes the correlation using all overlapping cells between adata and the score file. One way to get around this is to create smaller score files restricted to only cells with the cell-level variable values.
Have you looked at the drug targets in https://www.opentargets.org/ |
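Since the `pval` column holds raw p-values, alternative corrections can be applied to it directly. The sketch below shows Benjamini-Hochberg next to Bonferroni in plain Python, purely as an illustration. The function names and the four example p-values are made up, and this is not the scDRS implementation.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvals[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def bonferroni_adjust(pvals):
    """Bonferroni adjusted p-values, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


# Hypothetical raw p-values, standing in for the `pval` column of a score file.
raw = [0.001, 0.02, 0.03, 0.5]
bh = bh_adjust(raw)
bonf = bonferroni_adjust(raw)
n_bh = sum(q < 0.1 for q in bh)
n_bonf = sum(q < 0.1 for q in bonf)
print(n_bh, n_bonf)
```

For real score files with many cells, a vectorized routine such as `statsmodels.stats.multitest.multipletests` is likely more practical than these loops.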
Hi @martinjzhang, example data from
Hi @jjzixue, In your new data file, the gene names have become numbers. As a result, scDRS cannot match genes in the
Let me know if you need further help. |
Hi @martinjzhang , |
Hi @martinjzhang , Thank you for providing us with such a wonderful package. So my question is whether it would be acceptable to use the raw count data for multiple samples under the same conditions? Adding batch information into a covariate file could be one of the solutions? Sincerely, |
Hi @TakuroIwami , that's fine. |
Hi @martinjzhang , Thank you for your response. |
Hi @TakuroIwami , The results are likely still interpretable but this is not scDRS's intended use.
I suggest also performing this analysis and showing that the results are consistent. Martin |
Hi @martinjzhang , Thank you for your quick response. |
Yes, it's fine that the UMAP and scDRS disease scores are computed using different versions of the dataset. |
Hi @martinjzhang, Thank you for your answer. The analysis seemed to be ok with your tremendous support. I greatly appreciate it. |
Similar to the others, I am using a large single-cell dataset that I am trying to convert from .rds to .h5ad to use with scDRS. However, it appears that my single-cell dataset is larger than the Seurat package can handle (it seems the authors are aware of this limitation and will work on a solution at some point). My question for you is: is there an alternative way to work with single-cell datasets within scDRS, rather than requiring them to be unified in a single object? Thank you! |
Can you split the dataset into a few parts, convert each part to h5ad, and then read and merge them within Python? |
Hi!
Our scRNA-seq data was processed using the Seurat package. In order to run scDRS with our scRNA-seq data, I first converted the Seurat object saved as an RDS file to an h5ad file using the following script,
However, when I tried to use this h5ad file for the compute-score step, I encountered the following error,
I'm more experienced with R than Python, so I was hoping you could help me suggest a better way to convert the Seurat object to an h5ad file. Thanks so much!