Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identical assoc_mcp in results #107

Open
auesro opened this issue Oct 30, 2024 · 12 comments
Open

Identical assoc_mcp in results #107

auesro opened this issue Oct 30, 2024 · 12 comments

Comments

@auesro
Copy link

auesro commented Oct 30, 2024

Hi @martinjzhang ,

I have used scDRS (great tool, thank you!) to score a single cell dataset against a list of genes related to a disease (obtained from GWAS sum stats). I am a bit puzzled with the result for the cell type association:

`group n_cell n_ctrl assoc_mcp assoc_mcz hetero_mcp hetero_mcz n_fdr_0.05 n_fdr_0.1 n_fdr_0.2
A 8938 200 0.0049751243 9.102756 0.0049751243 5.8678365 2800 3027 3296
B 1432 200 0.0049751243 4.5017133 0.0099502485 3.6684802 278 312 354
C 214 200 0.0049751243 11.462115 0.0049751243 7.4737678 139 151 159
D 142 200 0.0049751243 27.9078 0.24875621 0.7671775 142 142 142
E 373 200 0.0049751243 26.924177 0.0049751243 4.5101657 373 373 373
F 316 200 0.0049751243 29.63548 0.0049751243 3.9916444 316 316 316
G 169 200 0.0049751243 30.313704 0.024875622 2.6987221 169 169 169
H 228 200 0.0049751243 28.936302 0.029850746 2.2715647 228 228 228
I 324 200 0.0049751243 30.983482 0.0049751243 4.5710645 322 322 322
J 301 200 0.0049751243 29.881008 0.049751244 2.0174477 300 300 300
K 232 200 0.0049751243 25.27538 0.07960199 1.5823182 232 232 232
L 372 200 0.0049751243 25.981464 0.0099502485 2.542919 369 369 369
M 282 200 0.0049751243 30.938341 0.029850746 2.1587946 282 282 282
N 173 200 0.0049751243 34.117924 0.06965174 1.4641023 173 173 173
O 88 200 0.0049751243 36.643044 0.024875622 1.9150562 88 88 88
P 259 200 0.0049751243 41.091362 0.014925373 2.2539406 259 259 259
Q 273 200 0.0049751243 34.183285 0.0099502485 2.7920082 273 273 273
R 263 200 0.0049751243 38.294342 0.0099502485 2.379525 257 258 258
S 454 200 0.0049751243 40.438503 0.0049751243 2.7016406 452 452 452
T 179 200 0.0049751243 35.299545 0.2636816 0.75084305 172 172 172
U 627 200 0.0049751243 38.881256 0.0049751243 5.3322315 627 627 627
V 165 200 0.0049751243 40.369034 0.11442786 1.2370316 159 160 160
W 89 200 0.0049751243 34.50571 0.094527364 1.3114314 89 89 89
X 304 200 0.0049751243 34.702892 0.11442786 1.3594794 304 304 304
Y 260 200 0.0049751243 33.966785 0.0049751243 2.3594382 259 259 259
Z 187 200 0.0049751243 32.84846 0.014925373 2.4981256 181 182 183
AA 143 200 0.0049751243 28.68645 0.014925373 2.4380805 139 139 139
AB 359 200 0.0049751243 38.488644 0.5920398 -0.046441507 358 358 358
AC 136 200 0.0049751243 31.9772 0.19900498 0.90170306 134 134 134
AD 293 200 0.0049751243 35.52879 0.0099502485 2.445356 292 292 293
AE 438 200 0.0049751243 32.291782 0.0099502485 2.7945273 437 437 437
AF 321 200 0.0049751243 35.203712 0.04477612 1.8123521 321 321 321
AG 143 200 0.0049751243 37.025703 0.03482587 1.7594645 143 143 143
AH 201 200 0.0049751243 39.599396 0.0049751243 3.6155078 201 201 201
AI 198 200 0.0049751243 39.15733 0.2885572 0.5785125 198 198 198
AJ 122 200 0.0049751243 38.167957 0.11442786 1.1606476 120 120 120
AK 1025 200 0.0049751243 33.04211 0.03482587 2.1185117 1021 1023 1023
AL 488 200 0.0049751243 6.744684 0.0049751243 4.9473066 306 334 359
AM 1953 200 0.0049751243 7.2396345 0.0049751243 3.588506 637 696 769
AN 15425 200 0.05472637 1.7903067 0.0049751243 11.273278 915 1156 1536
AO 1614 200 0.0049751243 14.856123 0.0049751243 8.108011 1581 1586 1595
AP 4768 200 0.0049751243 20.928226 0.0049751243 11.643588 4613 4633 4644
AQ 66266 200 0.0049751243 32.22967 0.0049751243 14.024827 15426 16408 17676
AR 607 200 0.0049751243 8.526978 0.0049751243 3.1208818 257 289 311
AS 921 200 0.0049751243 11.970217 0.0049751243 5.740329 662 688 712`

Whats the reason behind getting all (except one) cell types an assoc_mcp equal to 0.004975? Also I read you recommend looking at the mcz scores...sorry for the naive question but is it the higher mcz score the better?

Thank you!

Cheers

@martinjzhang
Copy link
Owner

it means your observed test statistic is more extreme than all control statistics (for scDRS group-level analyses), indicating your cell type-disease associations are very significant. I note that
0.0049751243 = 1 / 201
I am a little surprised that all (but 1) cell types are associated. The software runs ok, but probably you want to check if you done a proper QC...

@auesro
Copy link
Author

auesro commented Oct 30, 2024

I feared something like that...
Anything I could have missed beyond log-normalization, filtering to remove 0 count genes? I followed even your processing script so any suggestions more than welcome.
I must add that all the celltypes include glia and neuron subtype cells (edited for clarification).

@martinjzhang
Copy link
Owner

It's common for scDRS to associate most neuronal cells to a brain disease like SCZ. Maybe looking at the different association strength across cell types will tell you something. Also, increase n_ctrl (from 200) to 1000 can improve calibration.

@auesro
Copy link
Author

auesro commented Oct 30, 2024

Thank you for the suggestion, I will run it right away with 1000 n_ctrl.
How would you evaluate the "different association strength"?

@martinjzhang
Copy link
Owner

Higher norm disease scores --> stronger association. Maybe you can do a UMAP and look at which regions are more strongly associated?

@auesro
Copy link
Author

auesro commented Oct 30, 2024

Yea, sure, I did that. I can find some subtypes with clearly higher norm disease scores so I would delve into those.
Also regarding the scores in the group-level analyses, mc z-scores are "better" than mc p values, right? Now, what is not clear to me is whether the higher the "better" or the lower the "better"?
Thanks a lot for your help Martin!

@martinjzhang
Copy link
Owner

mc z-scores are "better" than mc p values in the sense that they can better distinguish disease association strength when the p-values are saturated.

higher z-score --> lower p-value.

p = 1 - CDF(z), roughly , where CDF is normal CDF.

@auesro
Copy link
Author

auesro commented Oct 30, 2024

Saturated p-values which is the case I find for most of my celltypes in the first table I posted, right?

@martinjzhang
Copy link
Owner

Yes, that your mc_p = 1 / (1+n_ctrl)

@NaomiHuntley
Copy link

Hi @martinjzhang
I am getting a similar result, with my lowest assoc_mcp being 0.000999001 for many of my cell type associations. From this thread I understand that first I should ensure that I have properly done QC to my scRNA seq data. I want to confirm that it was appropriate for me to use my raw single cell data in both the compute score and downstream analyses, as long as in both of these steps I set both --flag_filter True and --flag_raw_count True? Or should I have done additional preprocessing?

Alternatively may I have created way too large of a single cell dataset? I am trying to do a body wide analysis so I ended up with an AnnData object with n_obs × n_vars = 771060 × 79209. Maybe I should be trying to scale this back?

@martinjzhang
Copy link
Owner

martinjzhang commented Nov 26, 2024

raw single cell data in both the compute score and downstream analyses

Yes, you are supposed to use the raw count data

If assoc_mcz are different, the results should be considered as normal.

@auesro
Copy link
Author

auesro commented Nov 26, 2024

AnnData object with n_obs × n_vars = 771060 × 79209

@NaomiHuntley how did you end up with 79209 genes? Way more than any "normal" scRNAseq study I have ever seen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants