Addition of coauthorship dataset #180

Hellsegga · 2023-06-27T20:41:32Z

This is the dataset used in the Simplicial Neural Networks paper (https://arxiv.org/abs/2010.03633). I'd like to include it here to be able to use it in my implementation in TopoModelX (pyt-team/TopoModelX#98). It can hopefully be useful to test other methods as well.


ffl096 · 2023-06-27T20:49:59Z

toponetx/datasets/graph.py

+    cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True)
+
+    simplices = []
+    for dim in list(range(len(cochains)))[::-1]:


There is no need to construct a list explicitly; use the step parameter of range instead:

Suggested change

-    for dim in list(range(len(cochains)))[::-1]:
+    for dim in range(len(cochains) - 1, -1, -1):
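
As a quick check of the suggestion above, this minimal sketch compares the spellings of a descending index loop. The value `n = 3` is an arbitrary stand-in for `len(cochains)`, and `reversed(range(n))` is an extra alternative that was not proposed in the review.

```python
# n stands in for len(cochains); the value 3 is an arbitrary assumption.
n = 3

# Original form: materializes a list, then copies it again when slicing.
via_list = list(range(n))[::-1]

# Suggested form: a range with a negative step, no intermediate list.
via_step = list(range(n - 1, -1, -1))

# Alternative (not from the review): reversed() accepts a range directly.
via_reversed = list(reversed(range(n)))

print(via_list, via_step, via_reversed)
```

All three produce the same descending sequence; only the first builds throwaway lists along the way.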

ffl096 · 2023-06-27T20:52:59Z

toponetx/datasets/graph.py

+        li = list(cochains[dim].keys())
+        simplices += [list(l) for l in li]


There is no need to construct the list here either. You only iterate over its values, which the keys view already lets you do:

Suggested change

-        li = list(cochains[dim].keys())
-        simplices += [list(l) for l in li]
+        simplices += [list(l) for l in cochains[dim].keys()]
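
The equivalence can be sketched on toy data. The `cochains` value below is a hypothetical stand-in for the loaded file, assuming one dict per dimension keyed by frozensets of node ids; the real layout may differ. Iterating over a dict yields its keys, so `.keys()` can be dropped as well.

```python
# Hypothetical stand-in for the loaded cochains (assumed layout):
# one dict per dimension, keyed by frozensets of node ids.
cochains = [
    {frozenset({0}): 1.0, frozenset({1}): 2.0, frozenset({2}): 3.0},
    {frozenset({0, 1}): 4.0, frozenset({1, 2}): 5.0},
]

simplices_with_list = []
simplices_direct = []
for dim in range(len(cochains) - 1, -1, -1):
    # Original form: copy the keys into a temporary list first.
    li = list(cochains[dim].keys())
    simplices_with_list += [list(s) for s in li]

    # Simplified form: iterate over the dict itself, which yields its keys.
    simplices_direct += [list(s) for s in cochains[dim]]

print(len(simplices_direct))
```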

ffl096 · 2023-06-27T20:54:35Z

toponetx/datasets/graph.py

+    sc = SimplicialComplex(simplices)
+
+    for i in range(len(cochains)):
+        dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()}


Again, explicit list construction is not necessary; sorted works on any iterable:

Suggested change

-        dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()}
+        dic = {tuple(sorted(k)): v for k, v in cochains[i].items()}
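
A small sketch of why the inner `list()` is redundant: `sorted` takes any iterable and always returns a new list, so a frozenset key can be passed to it directly. The `cochain` dict is made-up illustration data, not taken from the dataset.

```python
# Hypothetical cochain for one dimension: frozenset keys mapped to values.
cochain = {frozenset({2, 0}): 4.0, frozenset({1, 2}): 5.0}

# Original form: wraps each key in a list before sorting.
with_list = {tuple(sorted(list(k))): v for k, v in cochain.items()}

# Simplified form: sorted() consumes the frozenset directly.
without_list = {tuple(sorted(k)): v for k, v in cochain.items()}

print(without_list)
```

Sorting before converting to a tuple gives each simplex one canonical key regardless of the set's iteration order.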

codecov · 2023-06-28T06:24:11Z

Codecov Report

Merging #180 (4067f4c) into main (cbe1475) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #180      +/-   ##
==========================================
+ Coverage   77.18%   77.29%   +0.11%     
==========================================
  Files          22       22              
  Lines        2402     2414      +12     
==========================================
+ Hits         1854     1866      +12     
  Misses        548      548

Impacted Files	Coverage Δ
toponetx/datasets/graph.py	`97.77% <100.00%> (+0.80%)`	⬆️

Hellsegga

Fixed in the last commit + see comment about storage of dataset.

USFCA-MSDS

I think this has been addressed by now.

USFCA-MSDS · 2023-06-29T02:59:33Z

toponetx/datasets/graph.py

+
+    References
+    ----------
+    [SNN20] Stefania Ebli, Michael Defferrard and Gard Spreemann.


I do not think the description of the dataset is clear enough.

What is the dimension of the SC?

What exactly is the problem? Can you describe the training pairs in more detail?

Can you provide brief statistics about the dataset?

USFCA-MSDS · 2023-06-29T03:00:04Z

toponetx/datasets/graph.py

+            - the number of citations attributed to the given collaborations of k authors.
+
+    """
+    cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True)


Can you change the name of the file?

Agreed, let's choose one name for this dataset and stick to it:
"coauthorship", then "coauthorship.npy" and the variable itself can be called coauthorship

USFCA-MSDS · 2023-06-29T03:00:51Z

toponetx/datasets/graph.py

@@ -9,7 +10,9 @@
 from toponetx.algorithms.spectrum import hodge_laplacian_eigenvectors
 from toponetx.transform.graph_to_simplicial_complex import graph_2_clique_complex

-__all__ = ["karate_club"]
+__all__ = ["karate_club", "coauthorship"]


I am not sure adding this dataset to graph is justified; maybe it needs a different file altogether. Can you justify your choice of graph? Other people might not find it intuitive.

Yes, can you suggest a name for the file?
I'll be away for a week, I can fix the code based on the comments when I'm back.

Hellsegga · 2023-07-08T10:15:19Z

I have made the changes suggested above.

However, please advise on where to place it. The dataset originates from a citation graph, which is why I first placed it in graph.py, but it is already pre-processed as a simplicial complex, so I agree that graph.py may not be the ideal place. I'm not sure where it should go instead.

ninamiolane · 2023-07-10T18:32:01Z

@USFCA-MSDS we need to merge this to allow the challenge participant @Hellsegga to move forward with their submission.
If there are remaining issues with the code, could you point them out so that @Hellsegga can create a follow-up PR? Thanks!

Hellsegga added 3 commits June 12, 2023 23:40

Sorting when converting set to tuple

8737269

Merge branch 'main' of github.com:Hellsegga/TopoNetX

6771f64

Addition of coauthorship dataset

bd93e4e

ffl096 requested changes Jun 27, 2023


Suggested fixes on PR

2d986a3

Hellsegga commented Jun 28, 2023


ffl096 approved these changes Jun 28, 2023


USFCA-MSDS requested review from mhajij and USFCA-MSDS June 29, 2023 02:57

USFCA-MSDS reviewed Jun 29, 2023


USFCA-MSDS requested changes Jun 29, 2023


ffl096 added the enhancement New feature or request label Jun 29, 2023

Hellsegga added 2 commits July 8, 2023 11:23

Merge branch 'pyt-team:main' into main

4067f4c

Changed names and completed description of dataset

2beee70

Hellsegga requested a review from USFCA-MSDS July 10, 2023 17:19

ninamiolane merged commit 6435f29 into pyt-team:main Jul 10, 2023
4 checks passed
