
Methods for generating node embeddings from word embeddings #8

Open
caufieldjh opened this issue Jun 9, 2022 · 10 comments

@caufieldjh
Contributor

While updating NEAT to use the most recent grape release, @justaddcoffee and @hrshdhgd and I took a look at what we're using to generate node embeddings based on pretrained word embeddings like BERT etc. : https://github.com/Knowledge-Graph-Hub/NEAT/blob/main/neat/graph_embedding/graph_embedding.py

We know we can run something like get_okapi_tfidf_weighted_textual_embedding() on a graph, but is there a more "on demand" way to run this in grape now for an arbitrary graph?
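For context, the weighting scheme named above (Okapi BM25, a TF-IDF variant) can be sketched in a few lines: each node's text is tokenized, every token's word vector is weighted by its BM25 score against the corpus of node texts, and the weighted average becomes the node embedding. This is a minimal illustrative sketch, not grape's implementation; all names and the toy data are hypothetical.

```python
# Illustrative sketch: BM25-weighted average of word vectors as a node
# embedding. Not grape's actual code; function names are hypothetical.
import math
import numpy as np

def bm25_weighted_embedding(tokens, corpus, word_vectors, k1=1.5, b=0.75):
    """Average a node's word vectors, weighting each token by its BM25 score."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Document frequency of each token across the corpus of node texts.
    df = {t: sum(t in d for d in corpus) for t in set(tokens)}
    weights, vectors = [], []
    for t in dict.fromkeys(tokens):  # unique tokens, order preserved
        if t not in word_vectors:
            continue  # skip out-of-vocabulary tokens
        tf = tokens.count(t)
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        score = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        weights.append(score)
        vectors.append(word_vectors[t])
    return np.average(np.asarray(vectors), axis=0, weights=weights)

# Toy corpus: each "document" is the tokenized text of one node.
corpus = [["protein", "binding"], ["protein", "kinase"], ["viral", "entry"]]
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=4)
                for w in ["protein", "binding", "kinase", "viral", "entry"]}
emb = bm25_weighted_embedding(["protein", "binding"], corpus, word_vectors)
print(emb.shape)  # (4,)
```

In practice the word vectors would come from a pretrained model such as BERT rather than random draws, which is exactly the piece the grape method wires in.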

@justaddcoffee

Thanks @caufieldjh - specifically what we are looking for @LucaCappelletti94 @zommiommy is something like this:

g = Ensmallen.from_csv(**my_graph_params)
my_embeddings = get_okapi_tfidf_weighted_textual_embedding(g)

If I understand correctly (which I might not), the only way to do this now is:

get_okapi_tfidf_weighted_textual_embedding("KGCOVID19") # <- goes to KG-Hub and downloads graph files, gets text from nodes file, and gets embeddings from name and description columns

@LucaCappelletti94
Member

Hello @justaddcoffee and @caufieldjh, while there are methods already parametrized for the various repositories, the one you have reported here is the most generic one: it does not work on graphs, but on generic CSVs, and requires the path of the CSV to parse. You can see its documentation either by using Python's help function or by using the SHIFT+TAB shortcut in a Jupyter notebook.

@justaddcoffee

Okay, great - thanks @LucaCappelletti94

@caufieldjh can you have a look and see if this provides what we need in NEAT to switch to Grape for text embeddings? I think it should

@caufieldjh
Contributor Author

It looks like it should work, though there is some kind of name collision between Embiggen's transformers submodule and the Hugging Face transformers package that provides the tokenizer:

>>> get_okapi_tfidf_weighted_textual_embedding(path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/neat-env/lib/python3.8/site-packages/cache_decorator/cache.py", line 613, in wrapped
    result = function(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/ensmallen/datasets/get_okapi_tfidf_weighted_textual_embedding.py", line 88, in get_okapi_tfidf_weighted_textual_embedding
    from transformers import AutoTokenizer
ImportError: cannot import name 'AutoTokenizer' from 'transformers' (/home/harry/neat-env/lib/python3.8/site-packages/embiggen/transformers/__init__.py)
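The traceback suggests classic module shadowing: a package named `transformers` found earlier on `sys.path` (here, Embiggen's submodule) hides the Hugging Face package of the same name, so `AutoTokenizer` cannot be imported. A minimal, self-contained reproduction of that mechanism (using a temporary empty package rather than Embiggen itself):

```python
# Minimal reproduction of the collision mechanism: a package named
# "transformers" earlier on sys.path shadows the Hugging Face package,
# so `from transformers import AutoTokenizer` fails with an ImportError.
import os
import sys
import tempfile

shadow_root = tempfile.mkdtemp()
os.makedirs(os.path.join(shadow_root, "transformers"))
# Empty __init__.py: this stand-in package exports no AutoTokenizer.
open(os.path.join(shadow_root, "transformers", "__init__.py"), "w").close()

sys.path.insert(0, shadow_root)        # shadow wins import resolution
sys.modules.pop("transformers", None)  # drop any cached real package

try:
    from transformers import AutoTokenizer  # noqa: F401
    collided = False
except ImportError as err:
    collided = True
    print(err)  # e.g. cannot import name 'AutoTokenizer' from 'transformers'
```

The same thing happens whenever two installed packages (or a package and a submodule exposed on the path) share a top-level name, which is why renaming one of them resolves it.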

@LucaCappelletti94
Member

That's extremely odd, I'll look into it.

@LucaCappelletti94
Member

Ok so, I have managed to reproduce it and tried for a while to resolve this collision. It has turned out to be quite cursed, so I will fall back to the "I'm just going to rename that" option.

I'm thinking about what name would fit better. It's the submodule that, given a node embedding and a graph, gets you the edge embedding or the like. A name like graph_processing seems too vague. Do you have any proposals?
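For readers unfamiliar with what that submodule does: it turns per-node embeddings into per-edge embeddings via an elementwise combination of the two endpoint vectors. A hedged, self-contained sketch of the idea (names and methods here are illustrative, not Embiggen's actual API):

```python
# Illustrative sketch of an "embedding transformer": combine the embeddings
# of an edge's two endpoints into one edge embedding. Hypothetical names;
# not Embiggen's actual API.
import numpy as np

def edge_embedding(node_emb, src, dst, method="hadamard"):
    u, v = node_emb[src], node_emb[dst]
    if method == "hadamard":
        return u * v                    # elementwise product
    if method == "concatenate":
        return np.concatenate([u, v])   # stacked endpoint vectors
    raise ValueError(f"unknown method: {method}")

node_emb = {"a": np.array([1.0, 2.0]), "b": np.array([3.0, 4.0])}
print(edge_embedding(node_emb, "a", "b"))  # [3. 8.]
```

A name describing that node-to-edge transformation role is what the thread is trying to pin down.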

@LucaCappelletti94
Member

Maybe embedding_transformers?

@LucaCappelletti94
Member

I have renamed it for now from transformers to embedding_transformers. If we can find a better name, I'm absolutely up for it. At least for now there won't be a collision.

@caufieldjh
Contributor Author

I think that should work fine - at least I can't see a package on PyPI with that name, so it shouldn't create the same kind of collision.

@LucaCappelletti94
Member

This issue should now be resolved, @caufieldjh - could you confirm?


3 participants