Inhomogeneous walks length gives error #1

Open
arushi-08 opened this issue Aug 14, 2023 · 1 comment

Comments


arushi-08 commented Aug 14, 2023

Hi, thanks for sharing the code of your work. I'd love to run this project and see it in action.

I have loaded the BirdwatchSG dataset from HuggingFace:

import pandas as pd
from datasets import load_dataset

# Load the signed-graph edges and convert the train split to a DataFrame
edge_data = load_dataset("Twitter/SignedGraphs")
edge_data = pd.DataFrame(edge_data['train'])

However, the last training cell fails with the following error: ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (55,) + inhomogeneous part.
It is raised in this part of the code:

Cell In[26], line 124, in TrainingSamples.convert(self, walks, topic_idx, workers)
    122 print("check [len(w) for w in walks]")
    123 print([len(w) for w in walks])
--> 124 walks_lists = np.array_split(walks, workers)

This indicates that the walks in the walks list don't all have the same length.
I printed the length of each walk, and this is indeed the case:
[2, 2, 3, 2, 2, 3, 2, 2, 3, 2, 3, 2, 2, 2, 3, 3, 2, 3, 2, 2, 3, 3, 2, 2, 3, 2, 2, 2, 2, 4, 3, 2, 2, 2, 3, 2, 4, 2, 3, 2, 3, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 2, 2, 3, 2]
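A possible workaround for line 124, sketched under assumptions: the ValueError most likely arises because np.array_split has to convert the list of walks into a NumPy array, which recent NumPy versions refuse to do for ragged input. Splitting the plain Python list avoids that conversion. The helper name split_walks is hypothetical and not part of the project; it mimics the chunk sizes of np.array_split (the first len(walks) % workers chunks get one extra walk).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_walks(walks: Sequence[T], workers: int) -> List[List[T]]:
    """Split walks into `workers` contiguous chunks without building an array.

    Hypothetical helper: chunk sizes follow the np.array_split convention,
    so walks of different lengths can be distributed across workers as-is.
    """
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    base, extra = divmod(len(walks), workers)
    chunks: List[List[T]] = []
    start = 0
    for i in range(workers):
        # The first `extra` chunks take one additional walk each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(walks[start:start + size]))
        start += size
    return chunks
```

With this, the failing line could be replaced by walks_lists = split_walks(walks, workers), assuming the code that consumes walks_lists only iterates over each chunk and does not rely on NumPy array behavior.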

This happens because, for some walks, the last node still has neighbors, so walk_options is non-empty and the walk continues:

walk_options = d_graph[walk[-1]].get(neighbors_key, None)

For example, for source node 2665 the walk options are [482, 2804], and for 482 the walk options are [1595]. So the walk becomes [2665, 482, 1595], with length 3 (whereas the walk length is normally 2 for other nodes). For some source nodes, however, the walk is only [2345, 593], with length 2.
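To illustrate why the lengths differ, here is a toy sketch, not the project's actual walk code. It assumes a walk simply stops when the current node has no walk options. Only the entries 2665 -> [482, 2804] and 482 -> [1595] come from the observations above; the key name, the remaining graph entries, and the function generate_walk are hypothetical.

```python
import random
from typing import Dict, List, Optional

NEIGHBORS_KEY = "neighbors"  # hypothetical key name, stands in for neighbors_key

# Toy graph. Nodes without a neighbors entry are dead ends (assumed).
D_GRAPH: Dict[int, Dict[str, List[int]]] = {
    2665: {NEIGHBORS_KEY: [482, 2804]},
    482: {NEIGHBORS_KEY: [1595]},
    2345: {NEIGHBORS_KEY: [593]},
    1595: {},
    2804: {},
    593: {},
}


def generate_walk(
    d_graph: Dict[int, Dict[str, List[int]]],
    source: int,
    max_length: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Walk from `source` until max_length is reached or a dead end is hit."""
    rng = rng or random.Random()
    walk = [source]
    while len(walk) < max_length:
        walk_options = d_graph[walk[-1]].get(NEIGHBORS_KEY, None)
        if not walk_options:
            break  # dead end: the walk ends early, so lengths vary
        walk.append(rng.choice(walk_options))
    return walk
```

In this toy graph, a walk from 2345 always ends at 593 with length 2, while a walk from 2665 has length 3 if it passes through 482 and length 2 if it goes to 2804.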

I'd like to know whether the walks should be made homogeneous in length (e.g., by padding them), or whether inhomogeneous walks should be allowed?
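For reference, the padding option could look roughly like the sketch below. This is an assumption about one way to do it, not the maintainers' recommendation: the sentinel PAD_ID and the helper pad_walks are hypothetical, and any downstream sampling code would need to skip the sentinel so that it is never treated as a real node.

```python
from typing import List, Optional, Sequence

PAD_ID = -1  # hypothetical sentinel; must not collide with a real node id


def pad_walks(
    walks: Sequence[Sequence[int]],
    pad_id: int = PAD_ID,
    length: Optional[int] = None,
) -> List[List[int]]:
    """Right-pad every walk with pad_id so all walks share one length.

    If `length` is None, the longest walk determines the target length.
    """
    if length is None:
        target = max((len(w) for w in walks), default=0)
    else:
        target = length
    padded = []
    for w in walks:
        if len(w) > target:
            raise ValueError("walk is longer than the requested length")
        padded.append(list(w) + [pad_id] * (target - len(w)))
    return padded
```

For instance, pad_walks([[2665, 482, 1595], [2345, 593]]) pads the second walk with a single sentinel so that both walks have length 3, after which the list can be converted to a regular 2D array.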


arushi-08 commented Aug 22, 2023

I decided to pad the walks to a common length, and doing so resolved this issue.
However, my results are not close to the t-SNE visualization shown in the paper.
I understand that t-SNE is stochastic, so I don't expect identical results, but in my results the positive and negative opinions are not distinguishable at all.
My sklearn version is 1.3.0.
I'd appreciate your guidance here.
