-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of coauthorship dataset #180
Conversation
toponetx/datasets/graph.py
Outdated
cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True) | ||
|
||
simplices = [] | ||
for dim in list(range(len(cochains)))[::-1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to explicitly construct a list, use the step
parameter of range
:
for dim in list(range(len(cochains)))[::-1]: | |
for dim in range(len(cochains) - 1, -1, -1): |
toponetx/datasets/graph.py
Outdated
li = list(cochains[dim].keys()) | ||
simplices += [list(l) for l in li] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to explicitly construct the list here, you are only iterating over its values, which you can do with the iterator you have in the first place:
li = list(cochains[dim].keys()) | |
simplices += [list(l) for l in li] | |
simplices += [list(l) for l in cochains[dim].keys()] |
toponetx/datasets/graph.py
Outdated
sc = SimplicialComplex(simplices) | ||
|
||
for i in range(len(cochains)): | ||
dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, explicit list construction is not necessary, sorted
happily works on an iterable:
dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()} | |
dic = {tuple(sorted(k)): v for k, v in cochains[i].items()} |
Codecov Report
@@ Coverage Diff @@
## main #180 +/- ##
==========================================
+ Coverage 77.18% 77.29% +0.11%
==========================================
Files 22 22
Lines 2402 2414 +12
==========================================
+ Hits 1854 1866 +12
Misses 548 548
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the last commit + see comment about storage of dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has been addressed by now.
|
||
References | ||
---------- | ||
[SNN20] Stefania Ebli, Michael Defferrard and Gard Spreemann. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do no think the description of the dataset is clear enough.
- What is the dimension of the SC?
- What is the problem exactly ? Can you describe the training pairs in more details
- Can you provide a brief statistics about the dataset?
toponetx/datasets/graph.py
Outdated
- the number of citations attributed to the given collaborations of k authors. | ||
|
||
""" | ||
cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the name of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's choose one name for this dataset and stick to it:
"coauthorship", then "coauthorship.npy" and the variable itself can be called coauthorship
@@ -9,7 +10,9 @@ | |||
from toponetx.algorithms.spectrum import hodge_laplacian_eigenvectors | |||
from toponetx.transform.graph_to_simplicial_complex import graph_2_clique_complex | |||
|
|||
__all__ = ["karate_club"] | |||
__all__ = ["karate_club", "coauthorship"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the addition of this dataset to graph is justified, maybe it needs a different file all together. Can you justify your choice for graph? other people might not find it intuitive anyway,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, can you suggest a name for the file?
I'll be away for a week, I can fix the code based on the comments when I'm back.
I have fixed the changes suggested above. However please advise about where to place it. The dataset originates from a citation graph which is why I placed it in graph.py first, but it is already pre-processed as a simplicial complex and therefore I agree graph.py may not be ideal place. But I'm not sure where it should be placed. |
@USFCA-MSDS we need to merge this to allow the challenge's participant @Hellsegga to move forward with their submission. |
This is the dataset used in the Simplicial Neural Networks paper (https://arxiv.org/abs/2010.03633). I'd like to include it here to be able to use it in my implementation in TopoModelX (pyt-team/TopoModelX#98). It can hopefully be useful to test other methods as well.