
Can you be a bit more specific about using insurance_qa_embeddings.py? #13

Open
wailoktam opened this issue Jun 21, 2016 · 3 comments

@wailoktam

Hi, thanks for your great work and support. I have got insurance_qa_evaluation.py running with Keras's own embedding layer. How do I bridge insurance_qa_embeddings.py to insurance_qa_evaluation.py?
I want to substitute parts of the input sentences with their synonyms or antonyms and see how training and prediction go. Any suggestions on how this can be done?

@codekansas
Owner

Hmm, that seems pretty interesting; let me know how it goes. Embeddings can be trained in different ways, or you can use the ones I provided. You can make your own embeddings and specify them here. insurance_qa_embeddings.py is a stand-alone script for generating embeddings using Word2Vec, and it saves the resulting embeddings to a particular location, which you can then load elsewhere. So you could mess with the embeddings, save the file, then load it and run the program.
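
Roughly, the generating side boils down to something like this. This is just a sketch, not the exact code from insurance_qa_embeddings.py, and the corpus and file names here are illustrative:

```python
import numpy as np
from gensim.models import Word2Vec

# Each sentence is a list of tokens; the real script builds this from the dataset.
sentences = [
    ["what", "is", "term", "life", "insurance"],
    ["how", "do", "deductibles", "work"],
]

# 100-dimensional vectors to match word2vec_100_dim.embeddings
# (older gensim versions call this parameter `size` instead of `vector_size`).
model = Word2Vec(sentences, vector_size=100, min_count=1, workers=4)

# Keras only needs the raw weight matrix (vocab_size x 100), not the full gensim
# model, so save just that as a plain numpy array that np.load can read back.
# (model.wv.vectors is called syn0 in older gensim versions.)
with open("word2vec_100_dim.embeddings", "wb") as f:
    np.save(f, model.wv.vectors)
```

The important part is that the row order of the saved matrix matches the word-to-index mapping you use when feeding sentences to the model; otherwise the initial weights won't line up with the tokens.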

@wailoktam
Author

Hi, thanks for your prompt reply.

So is the following line necessary for using the embedding layer in Keras?

'initial_embed_weights': np.load('word2vec_100_dim.embeddings')

Is my understanding correct: Keras won't train its own word embeddings, so we need to supply the weights generated from word2vec/GloVe via the above line?

Also, is the file "models/word2vec_100_dim.h5" in insurance_qa_embeddings.py the same file as "word2vec_100_dim.embeddings" in insurance_qa_evaluation.py, just renamed, or are they different files?

If I want to use a different dataset, I need to generate a different "word2vec_100_dim.embeddings", right? How should I do that? I know how to save a model in gensim, but this "word2vec_100_dim.embeddings" is not the model file produced by gensim's save command, right?

Many thanks in advance.

@codekansas
Owner

Keras will train its own word embeddings; it just works better if you start from word2vec embeddings (and you can choose whether or not they keep getting trained). I think they're the same file, probably just a naming difference. And yes, you should generate a new embeddings file. The file word2vec_100_dim.embeddings is, specifically, the embedding part of the Gensim model (consult the code to see exactly what this means).
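
On the Keras side it amounts to something like this. Again a sketch rather than the exact code in insurance_qa_evaluation.py; newer Keras versions may prefer an embeddings_initializer over the weights argument:

```python
import numpy as np
from keras.layers import Embedding

# Load the matrix saved by the embeddings script; shape is (vocab_size, 100).
weights = np.load("word2vec_100_dim.embeddings")

embedding = Embedding(
    input_dim=weights.shape[0],   # vocabulary size
    output_dim=weights.shape[1],  # embedding dimension (100 here)
    weights=[weights],            # start from the word2vec vectors
    trainable=True,               # set to False to freeze the vectors instead
)
```

Setting trainable=False keeps the word2vec vectors fixed, while trainable=True lets the rest of training fine-tune them.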
