Hi @mingzehuang, I hope this finds you well. I have a quick question about estimating latent correlation matrices.
Description
I would like to get an accurate estimate of the latent correlation matrix. The code runs without errors, but with both the 'original' and 'approx' methods I get a maximum-iteration warning, which is likely because my data are sparse. I would like to know whether this compromises the accuracy of the estimate.
I would also like to know which estimate is better, and whether there is a way to improve these estimates, because some of the results differ considerably. I have tried adjusting the tol parameter, but the results from the original method stay about the same. I have also experimented with different shrinkage values and lower-boundary values.
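To gauge how much a shrinkage parameter can move the estimate, here is a minimal numpy sketch. It assumes the shrinkage (the nu argument in the calls below) acts as a convex combination with the identity, (1 - nu) * R + nu * I; that form is an assumption, not something confirmed in this thread, and shrink_toward_identity is a hypothetical helper.

```python
import numpy as np

def shrink_toward_identity(r, nu):
    """Return (1 - nu) * r + nu * I for a k x k correlation matrix r.

    Hypothetical helper: ASSUMES nu is a convex shrinkage toward the identity.
    """
    r = np.asarray(r, dtype=float)
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    return (1.0 - nu) * r + nu * np.eye(r.shape[0])

# Made-up 3 x 3 "pointwise" estimate that is not positive semi-definite
# (its eigenvalues are -0.8, 1.9 and 1.9).
r_hat = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, -0.9],
                  [0.9, -0.9, 1.0]])

shrunk = shrink_toward_identity(r_hat, nu=0.01)
print(np.linalg.eigvalsh(r_hat).min())   # about -0.8
print(np.linalg.eigvalsh(shrunk).min())  # about 0.99 * (-0.8) + 0.01 = -0.782
```

Under that assumption every off-diagonal entry is multiplied by (1 - nu) and the smallest eigenvalue becomes (1 - nu) * lambda_min + nu, so a value like nu = 0.01 changes each correlation by at most 0.01.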
latentcor version: 0.2.5
Python version: Python 3.11.7
Operating System: macOS Ventura
What I Did
Consider the n x k matrix mat, where n > k. We want to estimate the latent correlations between its columns (the variables). Accordingly, the tps argument is simply an array containing "tru" for every column: the data are all gene expressions from single-cell data, so we assume a truncated Gaussian copula.
from latentcor import latentcor
R_original = latentcor(mat, tps=tps_arr, tol=1e-17, method='original', use_nearPD=True)['R']  # original method
R_approx = latentcor(mat, tps=tps_arr, method='approx', nu=0.01, ratio=0.9, use_nearPD=True)['R']  # approximation
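Very sparse columns leave few nonzero observations to estimate from, so a quick per-column check can help before comparing methods. A minimal numpy sketch: the helper names and the 0.9 cut-off are hypothetical, and the toy data are simulated, not the data from this issue.

```python
import numpy as np

def truncated_types(mat):
    """One "tru" (truncated) entry per column, like the tps_arr described above."""
    return np.full(np.asarray(mat).shape[1], "tru")

def zero_fractions(mat):
    """Proportion of exact zeros in each column of an n x k matrix."""
    return (np.asarray(mat, dtype=float) == 0).mean(axis=0)

def sparse_columns(mat, threshold=0.9):
    """Indices of columns whose zero fraction exceeds threshold (arbitrary cut-off)."""
    return np.flatnonzero(zero_fractions(mat) > threshold)

# Made-up zero-inflated toy data: 500 cells x 3 genes with increasing sparsity.
rng = np.random.default_rng(0)
toy = rng.exponential(size=(500, 3))
toy[rng.random((500, 3)) < np.array([0.2, 0.6, 0.97])] = 0.0

print(truncated_types(toy))          # ['tru' 'tru' 'tru']
print(zero_fractions(toy).round(2))  # per-gene share of zeros
print(sparse_columns(toy))           # genes above the cut-off
```

The output of truncated_types() could be passed directly as tps, e.g. tps=truncated_types(mat).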
As a result, the approximation gives correlations of larger magnitude, by roughly 0.1 (positive correlations are about 0.1 higher and negative ones about 0.1 lower). However, since both runs stop because they hit the maximum number of iterations, I'm not sure which is the better estimate.
After looking at some of the source code, I suspect this is related to nearest_corr(), but the n_fact parameter you set is already very high, so I am a bit confused by the difference in results. Thank you!
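One way to make "larger in magnitude by about 0.1" concrete is to summarize the off-diagonal differences and check the smallest eigenvalue of each estimate. A minimal numpy sketch; compare_corr is a hypothetical helper, meant to be called on the two ['R'] outputs, e.g. compare_corr(R_original, R_approx) (variable names assumed).

```python
import numpy as np

def compare_corr(r_a, r_b):
    """Summarize how correlation estimate r_b differs from r_a (both k x k)."""
    r_a = np.asarray(r_a, dtype=float)
    r_b = np.asarray(r_b, dtype=float)
    off = ~np.eye(r_a.shape[0], dtype=bool)       # off-diagonal mask
    diff = r_b[off] - r_a[off]
    gain = np.abs(r_b[off]) - np.abs(r_a[off])    # > 0 where r_b is stronger
    return {
        "max_abs_diff": float(np.abs(diff).max()),
        "mean_magnitude_gain": float(gain.mean()),
        "share_stronger_in_b": float((gain > 0).mean()),
        "min_eig_a": float(np.linalg.eigvalsh(r_a).min()),
        "min_eig_b": float(np.linalg.eigvalsh(r_b).min()),
    }

# Made-up example: b keeps the signs of a, but each |correlation| is 0.1 larger.
a = np.array([[1.0, 0.3, -0.2],
              [0.3, 1.0, 0.1],
              [-0.2, 0.1, 1.0]])
b = np.array([[1.0, 0.4, -0.3],
              [0.4, 1.0, 0.2],
              [-0.3, 0.2, 1.0]])
print(compare_corr(a, b))
```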
Hi @qianzach, sorry for the late reply! Generally speaking, 'original' is more accurate than 'approx'. However, I'm investigating the convergence problem you mentioned. I'll get back to you ASAP!
No worries, I see. Thank you so much! One additional detail: the warning comes from statsmodels.stats.correlation_tools.corr_nearest, which reaches its maximum number of iterations (likely because the data are very sparse). This step might be the source of the differences between the latent correlations from the approximate and original methods.
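To confirm which step emits the maximum-iteration warning, one can record warnings around a call and compare runs with use_nearPD=True and use_nearPD=False. A minimal standard-library sketch; it assumes the message is raised through Python's warnings module (rather than printed), and run_and_collect_warnings and toy_solver are hypothetical. With the package, the call would look like run_and_collect_warnings(latentcor, mat, tps=tps_arr, method='original', use_nearPD=True).

```python
import warnings

def run_and_collect_warnings(func, *args, **kwargs):
    """Call func(*args, **kwargs); return its result and every warning raised.

    Each warning is reported as (category name, message text).
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # keep repeated warnings too
        result = func(*args, **kwargs)
    return result, [(w.category.__name__, str(w.message)) for w in caught]

# Hypothetical stand-in for a solver that stops at its iteration limit.
def toy_solver(x):
    warnings.warn("maximum iteration reached", RuntimeWarning)
    return x

value, found = run_and_collect_warnings(toy_solver, 42)
print(value, found)  # 42 [('RuntimeWarning', 'maximum iteration reached')]
```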
Hi @qianzach. I think we use corr_nearest() only to adjust the output further so that the returned matrix is positive definite. If it doesn't converge properly, you can turn it off by setting use_nearPD = False. You then get the unadjusted output matrix, which is not guaranteed to be positive (semi-)definite and which you may adjust yourself. That is the original result from our algorithm :)
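If nearPD is turned off (use_nearPD = False) and the raw matrix needs adjusting by hand, one simple option is to check its smallest eigenvalue and, if it is negative, clip the eigenvalues and rescale to a unit diagonal. A minimal numpy sketch with hypothetical helper names; this one-shot eigenvalue clipping is a cruder technique than the iterative corr_nearest() and in general does not return the nearest correlation matrix. statsmodels also provides a clipping variant, corr_clipped, in statsmodels.stats.correlation_tools.

```python
import numpy as np

def min_eigenvalue(r):
    """Smallest eigenvalue of a symmetric matrix; negative means not PSD."""
    return float(np.linalg.eigvalsh(np.asarray(r, dtype=float)).min())

def clip_to_correlation(r, eps=1e-6):
    """Eigenvalue clipping followed by rescaling to a unit diagonal.

    A one-shot repair, NOT the iterative nearest-correlation algorithm.
    """
    r = np.asarray(r, dtype=float)
    r = (r + r.T) / 2                       # symmetrize against round-off
    vals, vecs = np.linalg.eigh(r)
    vals = np.clip(vals, eps, None)         # lift negative/tiny eigenvalues to eps
    fixed = (vecs * vals) @ vecs.T          # rebuild V diag(vals) V^T
    d = np.sqrt(np.diag(fixed))
    fixed = fixed / np.outer(d, d)          # rescale so the diagonal is 1
    np.fill_diagonal(fixed, 1.0)            # remove round-off on the diagonal
    return fixed

# Made-up raw estimate with eigenvalues -0.8, 1.9 and 1.9.
r_raw = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, -0.9],
                  [0.9, -0.9, 1.0]])
print(min_eigenvalue(r_raw))            # about -0.8
r_fixed = clip_to_correlation(r_raw)
print(min_eigenvalue(r_fixed) > 0)      # True
```

Rescaling by the diagonal is a congruence transform, so the clipped matrix stays positive definite while regaining unit diagonal entries.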