fasttextB_embeddings_300d.npy file #2

cairomo opened this issue Jun 16, 2021 · 9 comments

cairomo commented Jun 16, 2021

Hi, I am running the basic learner `./run_main.sh 0 DeepXML EURLex-4k 0 108` and everything is going fine, except I don't have the fastText embeddings file.

The error output is `Embedding File not found. Check path or set 'init' to null`. Where/how was the .npy embeddings file created? Is it from the pretrained word vectors on fastText's website?

Would appreciate any info to illuminate this issue! Thanks.

kunaldahiya (Collaborator) commented:

Hi,

Thanks for trying out DeepXML. In general, the embedding files are created using the pre-trained model available on fastText's website.

You can use the following link to download the embedding file for EURLex-4K: https://owncloud.iitd.ac.in/nextcloud/index.php/s/5XsZAKLbHfbpfZA

Please let me know if you need anything else.


cairomo commented Jun 16, 2021

Thanks for your insight! Does that mean there are different embeddings for every dataset?

The way I tried to generate the embedding files was, for example, reading wiki.en.vec in as a 0-dimensional NumPy array and then saving that to a .npy file. It didn't give the same results as the embedding file that you shared; what did you do differently?

kunaldahiya (Collaborator) commented:

The embedding file in our case contains a V x D matrix, where V is the vocabulary size and D is the embedding dimensionality. In other words, there is a vector for each token in the dataset. So, the embedding file would be different for each dataset, as the vocabulary will be different.

We use the fastText model to compute an embedding for each token in the vocabulary, which is then passed to our model.
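A minimal sketch of that process, assuming the `fasttext` Python package, a pre-trained `.bin` model from fastText's website, and a plain-text vocabulary file with one token per line (the file names here are placeholders, not the repo's actual layout):

```python
import numpy as np
import fasttext

# Load a pre-trained fastText model; "wiki.en.bin" is a placeholder path.
model = fasttext.load_model("wiki.en.bin")

# Dataset vocabulary, one token per line; "vocabulary.txt" is a placeholder.
with open("vocabulary.txt", encoding="utf-8") as f:
    vocab = [line.strip() for line in f]

# Build the V x D matrix with one row per token. get_word_vector() also
# covers out-of-vocabulary tokens via fastText's subword n-grams.
embeddings = np.stack([model.get_word_vector(w) for w in vocab])
assert embeddings.shape == (len(vocab), model.get_dimension())

np.save("fasttextB_embeddings_300d.npy", embeddings)
```

Since V depends on the dataset's vocabulary, this also shows why each dataset gets its own embedding file.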


cairomo commented Jun 23, 2021

Thanks for the clarification. So what I ended up doing was something like:

```python
model = fasttext.train_unsupervised(corpus_file, dim=dim)
vocab = model.words
# V x D array: one row per word in the vocabulary
embeddings = np.array([model.get_word_vector(word) for word in vocab])
```

Am I understanding your process correctly?

kunaldahiya (Collaborator) commented:

Hi,

I have added an example here which computes embeddings from a pre-trained fastText model. You are free to train your own model, provided your corpus is (i) large enough, and (ii) general English or relevant to the task.
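If you go the pre-trained route, the `fasttext.util` helper in the official Python package can download the published vectors; a sketch (note this fetches the Common Crawl model, cc.en.300.bin, rather than the wiki.en files mentioned earlier):

```python
import fasttext
import fasttext.util

# Download the pre-trained English model into the working directory,
# skipping the download if the file is already present.
fasttext.util.download_model("en", if_exists="ignore")

model = fasttext.load_model("cc.en.300.bin")
print(model.get_dimension())  # 300
```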

cairomo closed this as completed Jul 6, 2021
cairomo reopened this Jul 6, 2021

kunaldahiya commented Jul 6, 2021

Hi,

Please re-install pyxclib. The latest version contains the required files. See this link.

khatrimann commented:

Hey,
By any chance, is the .npy file still available with either of you? @kunaldahiya @cairomo
The file at the link above is missing.

kunaldahiya (Collaborator) commented:

> Hey, by any chance, is the .npy file still available with either of you? @kunaldahiya @cairomo The file at the link above is missing.

Hi,

You can follow this example to get embeddings for a given vocabulary: https://github.com/kunaldahiya/pyxclib/blob/master/xclib/examples/get_ftx_embeddings.py

kunaldahiya reopened this Nov 28, 2024
khatrimann commented:

> Hi,
>
> You can follow this example to get embeddings for a given vocabulary: https://github.com/kunaldahiya/pyxclib/blob/master/xclib/examples/get_ftx_embeddings.py

I tried to do this, but the EURLex release only had BoW feature files and not the text corpus.
