Correcting cell-level p-values for multiple comparisons? #94
Comments
It is a really good question. I think whether or not to apply a correction really depends on your relative cost of false positives versus false negatives. BH correction reduces the number of false positives at the sacrifice of missing some true signals. In this case, I think the cost of missing a truly important cell type for a disease is larger than the cost of accepting a risky cell type for a disease, so I think it is OK to use the current p-value setting. https://stats.libretexts.org/Bookshelves/Applied_Statistics/Biological_Statistics_(McDonald)/06%3A_Multiple_Tests/6.01%3A_Multiple_Comparisons For the second point, I am considering improving it with more atlas-level datasets 🤔️.
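To put a rough number on the false-positive side of this trade-off, here is a minimal sketch. It assumes a hypothetical dataset of 100,000 cells in which no cell is truly associated, so the p-values are uniform on [0, 1]; the cell count and the uniform null are illustrative assumptions, not scDRS output.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

n_cells = 100_000  # hypothetical number of cells (assumption)
alpha = 0.05       # uncorrected per-cell cutoff

# Under the null, a well-calibrated p-value is uniform on [0, 1].
null_pvals = [random.random() for _ in range(n_cells)]

# Every cell passing the cutoff here is a false positive by construction.
n_false_pos = sum(p < alpha for p in null_pvals)
expected = alpha * n_cells

print(f"cells with p < {alpha}: {n_false_pos} (expected about {expected:.0f})")
```

In this setup roughly 5% of cells pass the uncorrected cutoff even though none is truly associated, which is the cost being weighed against the power lost by correcting.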
I recommend always using FDR control. Calling cells significant based on p<0.05 will give you a lot of false positives and goes against the statistical principles of hypothesis testing. If the analysis is very underpowered, consider increasing the FDR threshold, e.g., to 0.2.
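As a rough illustration of this recommendation, the sketch below applies the Benjamini-Hochberg adjustment to a small list of p-values and counts how many pass at FDR 0.1 and 0.2 compared with the raw p<0.05 cutoff. The p-values and the helper name `bh_adjust` are made up for illustration; they are not scDRS output or part of the scDRS API.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical cell-level p-values (illustrative only).
cell_pvals = [0.001, 0.004, 0.03, 0.04, 0.045, 0.2,
              0.35, 0.5, 0.6, 0.7, 0.8, 0.9]
qvals = bh_adjust(cell_pvals)

n_raw = sum(p < 0.05 for p in cell_pvals)  # uncorrected cutoff
n_fdr10 = sum(q < 0.1 for q in qvals)      # FDR < 0.1
n_fdr20 = sum(q < 0.2 for q in qvals)      # relaxed FDR < 0.2

print(n_raw, n_fdr10, n_fdr20)
```

On real data with many cells, `multipletests(pvals, method="fdr_bh")` from `statsmodels.stats.multitest` computes the same adjustment and is likely the more convenient choice.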
Yes, this may indeed be the reason: scDRS is underpowered here. Again, consider increasing the threshold. Also, consider imputing the data with MAGIC before applying scDRS, a procedure discussed in #32. This seems to be a good workaround for the low-power issue, as documented in a recent paper: https://www.biorxiv.org/content/10.1101/2024.02.05.579042v1.abstract Moreover, we are developing a much more powerful version of scDRS, which I hope to share in a few months.
Hi, thanks for a great package! I am working with a brain snRNAseq dataset and have run scDRS to test for the enrichment of MDD, ADHD, ALZ, MS, SCZ, and height GWAS hits (using the MAGMA scores from your original publication). For the cell-level MC p-values, is it appropriate to use a cutoff of 0.05 to say something like, X number of cells were significantly associated with X disease? Or should I be doing a B-H p-value correction based on the number of cells (i.e. total number of p-values computed)?
I also ran the group-level downstream analysis and found that very few cell types were significantly associated with these traits (FDR < 0.1; as plotted here: https://martinjzhang.github.io/scDRS/notebooks/quickstart.html), despite prior studies (including your original paper) showing that many more should be. Any thoughts on this? Is this because of what you noted in the discussion section of the paper: "Second, the fact that scDRS assesses the statistical significance of an individual cell’s association to disease by implicitly comparing it to other cells via matched control genes may reduce power if most cells in the data are truly causal."
Many thanks,
Margaret