Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of coauthorship dataset #180

Merged
merged 6 commits into from
Jul 10, 2023
Merged

Addition of coauthorship dataset #180

merged 6 commits into from
Jul 10, 2023

Conversation

Hellsegga
Copy link
Contributor

@Hellsegga Hellsegga commented Jun 27, 2023

This is the dataset used in the Simplicial Neural Networks paper (https://arxiv.org/abs/2010.03633). I'd like to include it here to be able to use it in my implementation in TopoModelX (pyt-team/TopoModelX#98). It can hopefully be useful to test other methods as well.

toponetx/datasets/graph.py Show resolved Hide resolved
toponetx/datasets/graph.py Show resolved Hide resolved
toponetx/datasets/graph.py Outdated Show resolved Hide resolved
cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True)

simplices = []
for dim in list(range(len(cochains)))[::-1]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to explicitly construct a list, use the step parameter of range:

Suggested change
for dim in list(range(len(cochains)))[::-1]:
for dim in range(len(cochains) - 1, -1, -1):

Comment on lines 171 to 172
li = list(cochains[dim].keys())
simplices += [list(l) for l in li]
Copy link
Member

@ffl096 ffl096 Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to explicitly construct the list here, you are only iterating over its values, which you can do with the iterator you have in the first place:

Suggested change
li = list(cochains[dim].keys())
simplices += [list(l) for l in li]
simplices += [list(l) for l in cochains[dim].keys()]

sc = SimplicialComplex(simplices)

for i in range(len(cochains)):
dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, explicit list construction is not necessary, sorted happily works on an iterable:

Suggested change
dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()}
dic = {tuple(sorted(k)): v for k, v in cochains[i].items()}

@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Merging #180 (4067f4c) into main (cbe1475) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #180      +/-   ##
==========================================
+ Coverage   77.18%   77.29%   +0.11%     
==========================================
  Files          22       22              
  Lines        2402     2414      +12     
==========================================
+ Hits         1854     1866      +12     
  Misses        548      548              
Impacted Files Coverage Δ
toponetx/datasets/graph.py 97.77% <100.00%> (+0.80%) ⬆️

Copy link
Contributor Author

@Hellsegga Hellsegga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the last commit + see comment about storage of dataset.

Copy link
Contributor

@USFCA-MSDS USFCA-MSDS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has been addressed by now.


References
----------
[SNN20] Stefania Ebli, Michael Defferrard and Gard Spreemann.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do no think the description of the dataset is clear enough.

  • What is the dimension of the SC?
  • What is the problem exactly ? Can you describe the training pairs in more details
  • Can you provide a brief statistics about the dataset?

- the number of citations attributed to the given collaborations of k authors.

"""
cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the name of the file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's choose one name for this dataset and stick to it:
"coauthorship", then "coauthorship.npy" and the variable itself can be called coauthorship

@@ -9,7 +10,9 @@
from toponetx.algorithms.spectrum import hodge_laplacian_eigenvectors
from toponetx.transform.graph_to_simplicial_complex import graph_2_clique_complex

__all__ = ["karate_club"]
__all__ = ["karate_club", "coauthorship"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the addition of this dataset to graph is justified, maybe it needs a different file all together. Can you justify your choice for graph? other people might not find it intuitive anyway,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, can you suggest a name for the file?
I'll be away for a week, I can fix the code based on the comments when I'm back.

@ffl096 ffl096 added the enhancement New feature or request label Jun 29, 2023
@Hellsegga
Copy link
Contributor Author

Hellsegga commented Jul 8, 2023

I have fixed the changes suggested above.

However please advise about where to place it. The dataset originates from a citation graph which is why I placed it in graph.py first, but it is already pre-processed as a simplicial complex and therefore I agree graph.py may not be ideal place. But I'm not sure where it should be placed.

@ninamiolane
Copy link
Collaborator

@USFCA-MSDS we need to merge this to allow the challenge's participant @Hellsegga to move forward with their submission.
--> If there are remaining issues with the code, could you point them out so that @Hellsegga can create a follow-up PR? Thanks!

@ninamiolane ninamiolane merged commit 6435f29 into pyt-team:main Jul 10, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants