
Methods for generating node embeddings from word embeddings #8

Open
caufieldjh opened this issue Jun 9, 2022 · 10 comments

@caufieldjh
Contributor

While updating NEAT to use the most recent grape release, @justaddcoffee and @hrshdhgd and I took a look at what we're using to generate node embeddings based on pretrained word embeddings like BERT etc. : https://github.com/Knowledge-Graph-Hub/NEAT/blob/main/neat/graph_embedding/graph_embedding.py

We know we can run something like get_okapi_tfidf_weighted_textual_embedding() on a graph, but is there a more "on demand" way to run this in grape now for an arbitrary graph?
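For context, the weighting scheme named above (Okapi BM25, a TF-IDF variant) can be sketched in a few lines: each node's text is tokenized, every token's word vector is weighted by its BM25 score against the corpus of node texts, and the weighted average becomes the node embedding. This is a minimal illustrative sketch, not grape's implementation; all names and the toy data are hypothetical.

```python
# Illustrative sketch: BM25-weighted average of word vectors as a node
# embedding. Not grape's actual code; function names are hypothetical.
import math
import numpy as np

def bm25_weighted_embedding(tokens, corpus, word_vectors, k1=1.5, b=0.75):
    """Average a node's word vectors, weighting each token by its BM25 score."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Document frequency of each token across the corpus of node texts.
    df = {t: sum(t in d for d in corpus) for t in set(tokens)}
    weights, vectors = [], []
    for t in dict.fromkeys(tokens):  # unique tokens, order preserved
        if t not in word_vectors:
            continue  # skip out-of-vocabulary tokens
        tf = tokens.count(t)
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        score = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        weights.append(score)
        vectors.append(word_vectors[t])
    return np.average(np.asarray(vectors), axis=0, weights=weights)

# Toy corpus: each "document" is the tokenized text of one node.
corpus = [["protein", "binding"], ["protein", "kinase"], ["viral", "entry"]]
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=4)
                for w in ["protein", "binding", "kinase", "viral", "entry"]}
emb = bm25_weighted_embedding(["protein", "binding"], corpus, word_vectors)
print(emb.shape)  # (4,)
```

In practice the word vectors would come from a pretrained model such as BERT rather than random draws, which is exactly the piece the grape method wires in.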

@justaddcoffee

Thanks @caufieldjh - specifically what we are looking for @LucaCappelletti94 @zommiommy is something like this:

g = Ensmallen.from_csv(**my_graph_params)
my_embeddings = get_okapi_tfidf_weighted_textual_embedding(g)

If I understand correctly (which I might not), the only way to do this now is:

get_okapi_tfidf_weighted_textual_embedding("KGCOVID19") # <- goes to KG-Hub and downloads graph files, gets text from nodes file, and gets embeddings from name and description columns

@LucaCappelletti94
Member

Hello @justaddcoffee and @caufieldjh, while there are methods already parametrized for the various repositories, the one you have reported here is the most generic one: it does not work on graphs, but on generic CSVs, and requires the path of the CSV to parse. You can see its documentation either by using Python's help function or by using the SHIFT+TAB shortcut in a Jupyter notebook.

@justaddcoffee

Okay, great - thanks @LucaCappelletti94

@caufieldjh can you have a look and see if this provides what we need in NEAT to switch to Grape for text embeddings? I think it should

@caufieldjh
Contributor Author

It looks like it should work, though there is some kind of name collision between Embiggen's transformers submodule and the Hugging Face transformers package that provides the tokenizer:

>>> get_okapi_tfidf_weighted_textual_embedding(path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/neat-env/lib/python3.8/site-packages/cache_decorator/cache.py", line 613, in wrapped
    result = function(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/ensmallen/datasets/get_okapi_tfidf_weighted_textual_embedding.py", line 88, in get_okapi_tfidf_weighted_textual_embedding
    from transformers import AutoTokenizer
ImportError: cannot import name 'AutoTokenizer' from 'transformers' (/home/harry/neat-env/lib/python3.8/site-packages/embiggen/transformers/__init__.py)
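The traceback suggests classic module shadowing: a package named `transformers` found earlier on `sys.path` (here, Embiggen's submodule) hides the Hugging Face package of the same name, so `AutoTokenizer` cannot be imported. A minimal, self-contained reproduction of that mechanism (using a temporary empty package rather than Embiggen itself):

```python
# Minimal reproduction of the collision mechanism: a package named
# "transformers" earlier on sys.path shadows the Hugging Face package,
# so `from transformers import AutoTokenizer` fails with an ImportError.
import os
import sys
import tempfile

shadow_root = tempfile.mkdtemp()
os.makedirs(os.path.join(shadow_root, "transformers"))
# Empty __init__.py: this stand-in package exports no AutoTokenizer.
open(os.path.join(shadow_root, "transformers", "__init__.py"), "w").close()

sys.path.insert(0, shadow_root)        # shadow wins import resolution
sys.modules.pop("transformers", None)  # drop any cached real package

try:
    from transformers import AutoTokenizer  # noqa: F401
    collided = False
except ImportError as err:
    collided = True
    print(err)  # e.g. cannot import name 'AutoTokenizer' from 'transformers'
```

The same thing happens whenever two installed packages (or a package and a submodule exposed on the path) share a top-level name, which is why renaming one of them resolves it.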

@LucaCappelletti94
Member

That's extremely odd, I'll look into it.

@LucaCappelletti94
Member

Ok so, I have managed to reproduce it and tried for a while to resolve this collision. It has turned out to be quite cursed, so I will fall back to the "I'm just going to rename that" option.

I'm thinking about what name would fit better. It's the submodule that, given a node embedding and a graph, gets you the edge embedding or the like. A name like graph_processing seems too vague. Do you have any proposals?
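For readers unfamiliar with what that submodule does: it turns per-node embeddings into per-edge embeddings via an elementwise combination of the two endpoint vectors. A hedged, self-contained sketch of the idea (names and methods here are illustrative, not Embiggen's actual API):

```python
# Illustrative sketch of an "embedding transformer": combine the embeddings
# of an edge's two endpoints into one edge embedding. Hypothetical names;
# not Embiggen's actual API.
import numpy as np

def edge_embedding(node_emb, src, dst, method="hadamard"):
    u, v = node_emb[src], node_emb[dst]
    if method == "hadamard":
        return u * v                    # elementwise product
    if method == "concatenate":
        return np.concatenate([u, v])   # stacked endpoint vectors
    raise ValueError(f"unknown method: {method}")

node_emb = {"a": np.array([1.0, 2.0]), "b": np.array([3.0, 4.0])}
print(edge_embedding(node_emb, "a", "b"))  # [3. 8.]
```

A name describing that node-to-edge transformation role is what the thread is trying to pin down.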

@LucaCappelletti94
Member

Maybe embedding_transformers?

@LucaCappelletti94
Member

I have renamed it for now from transformers to embedding_transformers. If we can find a better name, I'm absolutely up for it. At least for now there won't be a collision.

@caufieldjh
Contributor Author

I think that should work fine - at least I can't see a package on PyPI with that name, so it shouldn't create the same kind of collision.

@LucaCappelletti94
Member

This issue should now be resolved, @caufieldjh - could you confirm?


3 participants