Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add calculate_cng_indices #97

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sjawhar
Copy link

@sjawhar sjawhar commented Mar 15, 2022

I implemented this function while working on a paper and decided it belongs in this excellent project. I will add tests (I have been comparing to output from R), but there is a major potential issue in the code which I will mark out with inline comments.

if model == "factors":
data -= np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(data))))
# TODO: Should this line be here?
data = covariance_to_correlation(data)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line comes from the eigenComputes function. When the data passed in are correlations and cor == TRUE, it calls eigen(cov2cor(corFa(x))). However, it doesn't do this if the variables/observations themselves are passed, in which case it calls eigen(corFA(cov(x))). This produces very different results! I think this is only needed if covariances are passed in rather than correlations, but since this implementation calculates the correlations I think this line should be removed. Agreed?

Copy link
Collaborator

@jbiggsets jbiggsets Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are we assuming the input value here will always be eigenForm == "data"? If so, I think you're right that this should be removed, but I may be misunderstanding.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you're right. I was getting confused by the case where dataType == correlation && cor == FALSE, but I think in that case the data passed in should actually be a covariance matrix, not a correlation matrix.


values = np.sort(np.linalg.eigvals(data))[::-1]

num_variables = len(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't len(data) give you the number of observations, rather than the number of variables, assuming data is a numpy array or pandas data frame?

Unfortunately, this doesn't translate exactly from R:

> df <- data.frame(A = c(1, 2, 3), B = c(1, 2, 3))
> length(df) 
[1] 2
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
>>> len(df)
3
>>> len(df.values)
3

Also, should we move the line below to the beginning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this line of the code data refers to the correlation matrix, which is square. data.shape[0] == data.shape[1]

The eigenvalues and CNG indices of the dataset
"""
data = corr(data)
if model == "factors":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're just going to have "factors", do we need the model argument? Does it make sense to exclude this for now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if model == "components", the adjustment underneath is not needed. In other words, only taking the correlation of the data is sufficient. We might want to add elif model != "components": raise ValueError

if model == "factors":
data -= np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(data))))
# TODO: Should this line be here?
data = covariance_to_correlation(data)
Copy link
Collaborator

@jbiggsets jbiggsets Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are we assuming the input value here will always be eigenForm == "data"? If so, I think you're right that this should be removed, but I may be misunderstanding.

@jbiggsets
Copy link
Collaborator

Thanks so much for submitting this PR! I had a few comments/questions.

Let me know if any of them don't make sense, or if you just want me to submit a PR to merge into this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants