fasttextB_embeddings_300d.npy file #2
Hi, I am running the basic learner

./run_main.sh 0 DeepXML EURLex-4k 0 108

and everything is going fine, except I don't have the fastText embeddings file. The error output is:

Embedding File not found. Check path or set 'init' to null

Where/how was the .npy embeddings file created? Is it from the pretrained word vectors on fastText's website? Would appreciate any info to illuminate this issue! Thanks.
Comments
Hi, thanks for trying out DeepXML. In general, the embedding files are created using the pre-trained model available on fastText's website. You can use the following link to download the embedding file for EURLex-4K: https://owncloud.iitd.ac.in/nextcloud/index.php/s/5XsZAKLbHfbpfZA Please let me know if you need anything else.
Thanks for your insight! Does that mean that there are different embeddings for every dataset? The way I tried to generate the embedding files was, for example, reading wiki.en.vec into a zero-dimensional NumPy array and then saving that to a .npy file. It didn't give the same results as the embedding file that you shared; what did you do differently?
The embedding file in our case contains a V x D matrix, where V is the vocabulary size and D is the embedding dimensionality. In other words, there is a vector for each token in the dataset's vocabulary. So the embedding file will be different for each dataset, since the vocabulary differs. We use the fastText model to compute an embedding for each token in the vocabulary, which is then passed to our model.
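To make the shape concrete, here is a minimal sketch of how such a V x D matrix could be built and saved. This is not the repository's exact script; the vocabulary.txt file (one token per line), the wiki.en.bin model path, and the output filename are assumptions for illustration. It uses the official fasttext Python bindings:

import numpy as np
import fasttext

# Load a pre-trained fastText model (binary format); path is an assumption.
model = fasttext.load_model("wiki.en.bin")

# Read the dataset vocabulary; hypothetical file with one token per line.
with open("vocabulary.txt") as f:
    vocab = [line.strip() for line in f]

# Build the V x D matrix: one row per vocabulary token.
dim = model.get_dimension()  # D, e.g. 300
embeddings = np.zeros((len(vocab), dim), dtype=np.float32)
for i, token in enumerate(vocab):
    # get_word_vector also covers out-of-vocabulary tokens via subwords.
    embeddings[i] = model.get_word_vector(token)

np.save("fasttextB_embeddings_300d.npy", embeddings)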
Thanks for the clarification. So what I ended up doing was something like … and then using … Am I understanding your process correctly?
Hi, I have added an example here which computes embeddings from a pre-trained fastText model. You are free to train your own model, provided your corpus is (i) large enough and (ii) general English or relevant to the task.
Hi, please re-install pyxclib. The latest version contains the required files. See this link.
Hey,
Hi, you can follow this example to get embeddings for a given vocabulary: https://github.com/kunaldahiya/pyxclib/blob/master/xclib/examples/get_ftx_embeddings.py
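For reference, here is a rough sketch of the same lookup when only the text-format wiki.en.vec file is available. This is an illustration under stated assumptions, not the linked script: vocabulary.txt is hypothetical, and the zero-vector fallback for tokens missing from the .vec file is one possible choice.

import numpy as np

def load_vec(path):
    # Parse a fastText .vec file into {token: vector}.
    # First line is a header: "<word count> <dimension>".
    vectors = {}
    with open(path, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors, dim

vectors, dim = load_vec("wiki.en.vec")
with open("vocabulary.txt") as f:
    vocab = [line.strip() for line in f]

# Tokens absent from the .vec file keep zero vectors in this sketch.
embeddings = np.zeros((len(vocab), dim), dtype=np.float32)
for i, token in enumerate(vocab):
    if token in vectors:
        embeddings[i] = vectors[token]

np.save("fasttextB_embeddings_300d.npy", embeddings)

Note that, unlike the .bin model, the .vec file cannot produce subword-based vectors for out-of-vocabulary tokens, which may explain differences from the shared embedding file.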
I tried to do this, but EURLex only had BoW feature files and not the text corpus.