-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question about paired experimental design with correction plus testing interaction effect? #41
Comments
@haowulab Instead, Do you know of a way I can use the results from my paired analysis to perform a second statistical test? Basically, I am trying to find a work around here. I found those CpG sites that were significantly different between before and after surgery (after correcting for cell type fractions), and now I want to take those sites and see if they are associated with phenotypes of surgical complications... Code: DMLfit = DMLfit.multiFactor(chr2_bismarkBSseq, design, formula = ~Treatment + pair + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL) |
Sorry for my tardy reply. You can certainly analyze paired data using DSS multiple factor function. But you need to be careful to setup the design. Please read this for RNA-seq analysis using DESeq2: https://support.bioconductor.org/p/84241/#84256. Basically, the effect you try to test should be an interaction term. |
@haowulab Q1) Could you please help with how I setup (the paired design plus test for an interaction of chronic pain response) exactly using DSS multiple factor function for bsseq data? So far, mine looks like this: DMLfit = DMLfit.multiFactor(BSobj, design, formula = ~ Treatment + pair + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL) I had set it up using: covar <- read.csv(#path to .csv containing my cell type fractions to correct for next to filename of each patient...
Treatment = factor(rep(c("Preop","Postop"),each = 51))
pair = factor(rep(1:51,2))
design = data.frame(Treatment, pair, covar) And where my design dataframe ended up looking like this: design.csv I know I need to add in a "response" column to my dataframe in which 1= pain and 0 = no pain; then somewhere in my equation add in the response. Also not sure but DESeq2 example there were talking about it being important that their paired design was set up 1,1,2,2,3,3,4,4,5,5,6,6 instead of 1,2,3,4,5,6,1,2,3,4,5,6 for some reason? Should I change mine? Sincerely grateful for your help and time to allow us to perform this analysis, |
The paired design described in the DSS manual is a very simple one, where only treatment (surgery in your design) is tested. In your study, I believe you want to test the association between the change of methylation before/after surgery and chronic pain. So you want to study the change (with chronic pain) of change (with surgery), which is an interaction. Do you have cell type proportions before and after surgery? If so, your formula should be
Then you want to test the I wrote a simple simulation for you to play with. See below. You can study this to understand the concept.
|
Thank you Hao! Yes, I have cell type proportions both before and after surgery (please see The She told me to use the following formula instead : Thank you for being so helpful to allow us to perform this analysis! |
The singularity problem is because the design matrix is not of full rank. Perhaps you have to remove one of the cell type proportions in the model, since they sum up to 1 they are not independent. For the formula suggested by your statistician: you can use that formula too. But if the cell type proportions are independent with Response, it won't make much difference. However, if the proportions are associated with Response, the interpretation will be different. Even with the interaction, if you test Response:Treatment, it's not testing the "pain specific cell type effect". Such effect can be tested by testing things like Response:Mono_3, etc. You can play with the following codes:
|
By the way, I'm curious how you get the cell type proportions. Did you estimate them from the data? |
Hmm I am in the middle of studying your examples (and multiple regression in general), but I do not understand: For my blood fraction in silico estimation:
|
Oh! And Q4) Your DSS manual outlines reasoning for why one would not need to subset their whole-genome bisulfite seq data by minimum coverage threshold: does the same apply for the general hypothesis testing design that I am using?? ...I would think I would want to subset my data by at least mincov 4 or so, just to qualify that I am getting an accurate methylation call since my seq. data is from whole blood (mixed cells)... Thank you! |
See my comments below.
On Nov 12, 2023, at 12:24 AM, KristenKrolick ***@***.***> wrote:
@haowulab <https://github.com/haowulab>
Hmm I am in the middle of studying your examples (and multiple regression in general), but I do not understand:
Q1) The 'fit = lm()' function in R seems to be a little different from your 'DMLfit = DMLfit.multifactor function'-- as in the dependent variable (aka the y in y = B0 + B1x) usually goes to the left of the tilde. In your function are we assuming that to the left of the tilde would be the Bsobj, the methylation values?
Yes this is correct.
Q2) I am not understanding how we are testing the difference in methylation between preop and postop (after correcting for blood cell type fractions) from this formula? -- I suppose it's okay if I don't quite understand this one, as long as you and our Statistician confirm this for me.
Yes. consult your statistician.
Q3) Okay so if I use the DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response)). I get the error, Error in makeContrast(DMLfit, term) : Some term(s) to be tested can't be found in the formula . How would I fix this?
hmmm, this error should not appear in DMLfit.multiFactor function. It should be in DMLtest.multiFactor function, because you are trying to test something. What did you use for that? To choose the proper term to test is a bit complicated, especially in your design. Please read the manual, and consult your statistician.
For my blood fraction in silico estimation:
I used the Reference provided by Salas et al. 2022 (https://www.nature.com/articles/s41467-021-27864-7), actual Ref available:(https://github.com/immunomethylomics/FlowSorted.BloodExtended.EPIC/tree/main).
I converted this reference from the Illumina Array sites into UCSC chromosomal coordinates using the manifest provided by Illumina.
I used Epidish to perform the deconvolution.
All of this is outlined and the code I used here: https://github.com/KristenKrolick/Chidambaran_Lab_Manuscript/blob/main/n)%20Epidish
p.s. We had samples that we had flowcyto data for as well as whole-genome meth seq, so I was able to perform a Pearson correlation, and (at least for those cells whose same cell markers were used in our flow data vs. salas ref cell markers he used to make his Ref), they correlated! So, I really liked his Reference.
Okay thanks. I’m not sure if using epidish will be good for BS-seq. What’s the correlation?
… —
Reply to this email directly, view it on GitHub <#41 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AD7H4N7OSVYKSJ2N5H3XTZLYD6RDJAVCNFSM6AAAAAA6472JXKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBWHA2TONRQHE>.
You are receiving this because you were mentioned.
|
You can certainly filter the sites by coverage, but it’s not necessary. DSS will consider the coverage in the model. If the coverage is very shallow, it’s very unlikely that the site will be called as DML. This said, it doesn’t hurt to filter.
… On Nov 12, 2023, at 12:36 AM, KristenKrolick ***@***.***> wrote:
Oh! And Q4) Your DSS manual outlines reasoning for why one would not need to subset their whole-genome bisulfite seq data by minimum coverage threshold: does the same apply for the general hypothesis testing design that I am using??
...I would think I would want to subset my data by at least mincov 4 or so, just to qualify that I am getting an accurate methylation call since my seq. data is from whole blood (mixed cells)...
Thank you!
—
Reply to this email directly, view it on GitHub <#41 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AD7H4NYJPWGOJBPVKEJFIL3YD6SPRAVCNFSM6AAAAAA6472JXKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBWHA3DAMZWGQ>.
You are receiving this because you were mentioned.
|
@haowulab So the design matrix is the same as above (4 days ago comment). Here was my code: DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
dmlTest = DMLtest.multiFactor(DMLfit, term="(Treatment*response)")
#Error in makeContrast(DMLfit, term) : Some term(s) to be tested can't be found in the formula. treatment*response was clearly in the formula? Should I send all the beginning parts of my code again in me making the design dataframe? |
...and If I look at the doublearry in DMLfit called "X". It is 102 rows by 60 columns, and looks like this: retried the dmlTest like so: Oh I have to name it as coef if I use it that way, using: |
@haowulab I would like to point out that I had my design data.frame wrong because of the way I had input my pain values (they did not correspond to the same order as my file names & were hence random data). Here is me fixing this and now including the full code: Basically, now when my design frame,design.csv, is fixed so that my "response" aka pain "yes" or "no" variables are in the correct location, DMLfit no longer works, returning collinearity error: require(bsseq)
library(dmrseq)
files <-list.files("/data/path/bismarkcovfiles")
files
setwd("/data/path/bismarkcovfiles")
bismarkBSseq <- read.bismark(files,
loci = NULL,
colData = NULL,
rmZeroCov = FALSE,
strandCollapse = TRUE,
verbose = TRUE)
#covar dataframe consisting of columns of cell type estimates and rows consisting of each sample:
## (preop samples 1 through 51 as rows 1 through 51 and then postop samples as rows 52 - 102)
covar <- read.csv("/data/path/CellTypeCovar_same51Visit1_2.csv")
Treatment = factor(rep(c("Preop","Postop"),each = 51))
pair = factor(rep(1:51,2))
response <- factor(c("Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes"))
#also tried this guy immediately below just in case, but had same error type of collinearity after attempting DMLfit function:
#response = factor(c("1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1", "1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1"))
##and also tried this guy immediately below just in case, but had same error type of collinearity after attempting DMLfit function:
#response = c("1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1", "1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1")
design = data.frame(Treatment, pair, response, covar)
#need to filter bismarkBSseq by chromosome because my analysis is too large to run
chr1_bismarkBSseq<- chrSelectBSseq(bismarkBSseq, seqnames = "chr1", order = TRUE)
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.59732e-18"
##Hmmm above says it means I have a problem of collinearity, so I ran a Pearson Correlation and turns out CD8_total was related to pain, so getting rid of CD8:
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.61807e-18"
###trying it without NK and CD8 which were correlated > .1
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + B_TOTAL + CD4_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.62955e-18"
##trying it without NK and CD8, B, and Mono which were almost closer to correlated?
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + CD4_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.67338e-18"
#Also tried using Hao's formula:
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response*(Treatment + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.42241e-18"
#Trying Hao's formula without CD8:
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response*(Treatment + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + GRANULOCYTES_TOTAL))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.45254e-18"
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/local/R/4.2.1/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/4.2.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] dmrseq_1.18.1 DSS_2.46.0 BiocParallel_1.32.4 bsseq_1.34.0 SummarizedExperiment_1.28.0 Biobase_2.58.0 MatrixGenerics_1.10.0
[8] matrixStats_0.63.0 GenomicRanges_1.50.1 GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 rjson_0.2.21 ellipsis_0.3.2 XVector_0.38.0 rstudioapi_0.14 bit64_4.0.5
[7] interactiveDisplayBase_1.36.0 AnnotationDbi_1.60.0 fansi_1.0.4 xml2_1.3.3 codetools_0.2-18 splines_4.2.1
[13] R.methodsS3_1.8.2 sparseMatrixStats_1.10.0 cachem_1.0.6 Rsamtools_2.14.0 dbplyr_2.2.1 png_0.1-8
[19] R.oo_1.25.0 shiny_1.7.3 HDF5Array_1.26.0 BiocManager_1.30.19 readr_2.1.3 compiler_4.2.1
[25] httr_1.4.4 assertthat_0.2.1 Matrix_1.5-3 fastmap_1.1.0 limma_3.54.0 cli_3.6.1
[31] later_1.3.0 htmltools_0.5.4 prettyunits_1.1.1 tools_4.2.1 gtable_0.3.3 glue_1.6.2
[37] GenomeInfoDbData_1.2.9 annotatr_1.24.0 reshape2_1.4.4 dplyr_1.1.1 doRNG_1.8.2 rappdirs_0.3.3
[43] Rcpp_1.0.10 bumphunter_1.40.0 vctrs_0.6.1 Biostrings_2.66.0 rhdf5filters_1.10.0 nlme_3.1-160
[49] rtracklayer_1.58.0 iterators_1.0.14 DelayedMatrixStats_1.20.0 stringr_1.5.0 mime_0.12 lifecycle_1.0.3
[55] restfulr_0.0.15 rngtools_1.5.2 gtools_3.9.4 XML_3.99-0.13 AnnotationHub_3.6.0 zlibbioc_1.44.0
[61] scales_1.2.1 BSgenome_1.66.2 hms_1.1.2 promises_1.2.0.1 rhdf5_2.42.0 RColorBrewer_1.1-3
[67] yaml_2.3.6 curl_4.3.3 memoise_2.0.1 ggplot2_3.4.2 biomaRt_2.54.0 stringi_1.7.12
[73] RSQLite_2.2.19 BiocVersion_3.16.0 BiocIO_1.8.0 foreach_1.5.2 permute_0.9-7 GenomicFeatures_1.50.2
[79] filelock_1.0.2 rlang_1.1.0 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45 Rhdf5lib_1.20.0
[85] GenomicAlignments_1.34.0 bit_4.0.5 tidyselect_1.2.0 plyr_1.8.8 magrittr_2.0.3 R6_2.5.1
[91] generics_0.1.3 DelayedArray_0.24.0 DBI_1.1.3 pillar_1.9.0 KEGGREST_1.38.0 RCurl_1.98-1.9
[97] tibble_3.2.1 crayon_1.5.2 utf8_1.2.3 BiocFileCache_2.6.0 tzdb_0.3.0 progress_1.2.2
[103] locfit_1.5-9.6 grid_4.2.1 data.table_1.14.8 blob_1.2.3 digest_0.6.31 xtable_1.8-4
[109] httpuv_1.6.6 regioneR_1.30.0 outliers_0.15 R.utils_2.12.2 munsell_0.5.0
> ``` |
There might be confounding among your variables. You can look at the design matrix to see what’s going on. Please consult your statistician to see what you can do.
发件人: ***@***.*** ***@***.*** 代表 KristenKrolick
发送时间: 2023年11月17日 3:49
收件人: haowulab/DSS
抄送: Hao Wu; Mention
主题: Re: [haowulab/DSS] Question about paired experimental design with correction plus testing interaction effect? (Issue #41)
@haowulab <https://github.com/haowulab> Sorry about this, I would just like to point out that I had my design data.frame wrong because of the way I had input my pain values (they did not correspond to the same order as my file names & were hence random data). Here is me fixing this and now including the full code:
Basically, now when my design frame,design.csv <https://github.com/haowulab/DSS/files/13382740/design.csv> , is fixed so that my "response" aka pain "yes" or "no" variables are in the correct location, DMLfit no longer works, returning collinearity error:
require(bsseq)
library(dmrseq)
files <-list.files("/data/path/bismarkcovfiles")
files
setwd("/data/path/bismarkcovfiles")
bismarkBSseq <- read.bismark(files,
loci = NULL,
colData = NULL,
rmZeroCov = FALSE,
strandCollapse = TRUE,
verbose = TRUE)
#covar dataframe consisting of columns of cell type estimates and rows consisting of each sample:
## (preop samples 1 through 51 as rows 1 through 51 and then postop samples as rows 52 - 102)
covar <- read.csv("/data/path/CellTypeCovar_same51Visit1_2.csv")
Treatment = factor(rep(c("Preop","Postop"),each = 51))
pair = factor(rep(1:51,2))
response <- factor(c("Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes"))
#also tried this guy immediately below just in case, but had same error type of collinearity after attempting DMLfit function:
#response = factor(c("1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1", "1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1"))
##and also tried this guy immediately below just in case, but had same error type of collinearity after attempting DMLfit function:
#response = c("1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1", "1", "0", "1", "1", "1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "0", "1", "0", "0", "1", "0", "1", "1", "0", "0", "1", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0", "1", "0", "1", "1", "1", "1", "1", "0", "1", "1", "1", "1", "1")
design = data.frame(Treatment, pair, response, covar)
#need to filter bismarkBSseq by chromosome because my analysis is too large to run
chr1_bismarkBSseq<- chrSelectBSseq(bismarkBSseq, seqnames = "chr1", order = TRUE)
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.59732e-18"
##Hmmm above says it means I have a problem of collinearity, so I ran a Pearson Correlation and turns out CD8_total was related to pain, so getting rid of CD8:
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.61807e-18"
###trying it without NK and CD8 which were correlated > .1
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + Mono_3 + B_TOTAL + CD4_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.62955e-18"
##trying it without NK and CD8, B, and Mono which were almost closer to correlated?
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response + CD4_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.67338e-18"
#Also tried using Hao's formula:
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response*(Treatment + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.42241e-18"
#Trying Hao's formula without CD8:
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + response*(Treatment + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + GRANULOCYTES_TOTAL))
#Error returned: "Fitting DML model for CpG site: Error in solve.default(XTVinv %*% X) : system is computationally singular: reciprocal condition number = 1.45254e-18"
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/local/R/4.2.1/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/4.2.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] dmrseq_1.18.1 DSS_2.46.0 BiocParallel_1.32.4 bsseq_1.34.0 SummarizedExperiment_1.28.0 Biobase_2.58.0 MatrixGenerics_1.10.0
[8] matrixStats_0.63.0 GenomicRanges_1.50.1 GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 rjson_0.2.21 ellipsis_0.3.2 XVector_0.38.0 rstudioapi_0.14 bit64_4.0.5
[7] interactiveDisplayBase_1.36.0 AnnotationDbi_1.60.0 fansi_1.0.4 xml2_1.3.3 codetools_0.2-18 splines_4.2.1
[13] R.methodsS3_1.8.2 sparseMatrixStats_1.10.0 cachem_1.0.6 Rsamtools_2.14.0 dbplyr_2.2.1 png_0.1-8
[19] R.oo_1.25.0 shiny_1.7.3 HDF5Array_1.26.0 BiocManager_1.30.19 readr_2.1.3 compiler_4.2.1
[25] httr_1.4.4 assertthat_0.2.1 Matrix_1.5-3 fastmap_1.1.0 limma_3.54.0 cli_3.6.1
[31] later_1.3.0 htmltools_0.5.4 prettyunits_1.1.1 tools_4.2.1 gtable_0.3.3 glue_1.6.2
[37] GenomeInfoDbData_1.2.9 annotatr_1.24.0 reshape2_1.4.4 dplyr_1.1.1 doRNG_1.8.2 rappdirs_0.3.3
[43] Rcpp_1.0.10 bumphunter_1.40.0 vctrs_0.6.1 Biostrings_2.66.0 rhdf5filters_1.10.0 nlme_3.1-160
[49] rtracklayer_1.58.0 iterators_1.0.14 DelayedMatrixStats_1.20.0 stringr_1.5.0 mime_0.12 lifecycle_1.0.3
[55] restfulr_0.0.15 rngtools_1.5.2 gtools_3.9.4 XML_3.99-0.13 AnnotationHub_3.6.0 zlibbioc_1.44.0
[61] scales_1.2.1 BSgenome_1.66.2 hms_1.1.2 promises_1.2.0.1 rhdf5_2.42.0 RColorBrewer_1.1-3
[67] yaml_2.3.6 curl_4.3.3 memoise_2.0.1 ggplot2_3.4.2 biomaRt_2.54.0 stringi_1.7.12
[73] RSQLite_2.2.19 BiocVersion_3.16.0 BiocIO_1.8.0 foreach_1.5.2 permute_0.9-7 GenomicFeatures_1.50.2
[79] filelock_1.0.2 rlang_1.1.0 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45 Rhdf5lib_1.20.0
[85] GenomicAlignments_1.34.0 bit_4.0.5 tidyselect_1.2.0 plyr_1.8.8 magrittr_2.0.3 R6_2.5.1
[91] generics_0.1.3 DelayedArray_0.24.0 DBI_1.1.3 pillar_1.9.0 KEGGREST_1.38.0 RCurl_1.98-1.9
[97] tibble_3.2.1 crayon_1.5.2 utf8_1.2.3 BiocFileCache_2.6.0 tzdb_0.3.0 progress_1.2.2
[103] locfit_1.5-9.6 grid_4.2.1 data.table_1.14.8 blob_1.2.3 digest_0.6.31 xtable_1.8-4
[109] httpuv_1.6.6 regioneR_1.30.0 outliers_0.15 R.utils_2.12.2 munsell_0.5.0
```
—
Reply to this email directly, view it on GitHub <#41 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AD7H4N5RNWUKQ3PIWSOPULLYEZUZPAVCNFSM6AAAAAA6472JXKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJVGIYDSOBWGI> .
You are receiving this because you were mentioned. <https://github.com/notifications/beacon/AD7H4N4LHOUZJLQIGHZUGDLYEZUZPA5CNFSM6AAAAAA6472JXKWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTTMGHTYM.gif> Message ID: ***@***.*** ***@***.***> >
|
@haowulab , Question about your coverage reply:
I performed below just to get those sites sig after surgery (not associated with pain yet). But I am getting significant loci being returned that are lower coverage. I was trying to find more detailed info for how you take coverage into consideration during the stats, can you please provide? Also can you please check my code below to make sure I am not missing a step?-- like for example, I do not have to perform the smoothing step or anything?? require(bsseq)
library(dmrseq)
files <-list.files("/data/path/bismarkcovfiles")
files
setwd("/data/path/bismarkcovfiles")
bismarkBSseq <- read.bismark(files,
loci = NULL,
colData = NULL,
rmZeroCov = FALSE,
strandCollapse = TRUE,
verbose = TRUE)
#covar dataframe consisting of columns of cell type estimates and rows consisting of each sample:
## (preop samples 1 through 51 as rows 1 through 51 and then postop samples as rows 52 - 102)
covar <- read.csv("/data/path/CellTypeCovar_same51Visit1_2.csv")
Treatment = factor(rep(c("Preop","Postop"),each = 51))
pair = factor(rep(1:51,2))
design = data.frame(Treatment, pair, covar)
#need to filter bismarkBSseq by chromosome because my analysis is too large to run
chr1_bismarkBSseq<- chrSelectBSseq(bismarkBSseq, seqnames = "chr1", order = TRUE)
DMLfit = DMLfit.multiFactor(chr1_bismarkBSseq, design, formula = ~Treatment + pair + Mono_3 + NK_3 + B_TOTAL + CD4_TOTAL + CD8_TOTAL + GRANULOCYTES_TOTAL + (Treatment*response))
dmlTest = DMLtest.multiFactor(DMLfit, term = "Treatment") |
@haowulab
Hello, I just performed a paired analysis. For my next analysis, I will need the methylation values per individual sample for each of my significant CpG sites in the paired analysis. Since I performed the DMLfit to correct for differences in cell types, I do not want to take the original methylation values (from my bismark cov files) for performing my next analysis. Is there a way to get the DSS corrected methylation values per individual sample back?
Thank you so much for your help and tool!
Kristen
The text was updated successfully, but these errors were encountered: