Evaluating word embeddings #1265

osotsia · 2021-01-10T01:16:22Z

I have a newly trained set of word embeddings (like GloVe) and I want to evaluate them on some benchmark datasets (with a basic classifier such as a random forest or an SVM). Ideally the embeddings would remain fixed and only the classifier would be trained.

I just wanted to verify whether this is the right project for this task. I wasn't sure where to begin, since the documentation seems aimed at more complicated setups than this. Any chance there is a tutorial for this? Thanks.

Answered by zphang

zphang · 2021-01-10T06:30:05Z

Maintainer

Hi osotsia,

My sense is that jiant might not be the appropriate library for your use case:

- If you have a set of newly trained word embeddings, they likely have a specific tokenization that differs from the models we currently support.
- We also do not currently support models such as SVMs and random forests; jiant primarily supports transformer-based models from the transformers library.

Instead, I would recommend you look into the following:

- Use Hugging Face's datasets library to obtain task data as well as metrics
- Use scikit-learn's suite of machine learning models for operating on the embeddings
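
As a rough illustration of that route, here is a minimal sketch that keeps the embeddings fixed and trains only a scikit-learn classifier on averaged word vectors. It makes several assumptions that are not from this thread: the embeddings are stored in a GloVe-style text file (one token per line followed by its vector), sentences are tokenized by lowercasing and splitting on whitespace, and the helper names `load_embeddings` and `featurize` are hypothetical. The toy sentences stand in for a real benchmark; with the datasets library you could instead load something like `load_dataset("glue", "sst2")` and pass its `sentence` and `label` columns.

```python
import io

import numpy as np
from sklearn.svm import LinearSVC


def load_embeddings(handle):
    """Parse GloVe-style text: one token per line followed by its vector."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def featurize(sentences, vectors, dim):
    """Average the fixed word vectors of each sentence; unknown words are skipped."""
    features = np.zeros((len(sentences), dim), dtype=np.float32)
    for i, sentence in enumerate(sentences):
        # Assumption: lowercased whitespace tokens match the embedding vocabulary.
        known = [vectors[tok] for tok in sentence.lower().split() if tok in vectors]
        if known:
            features[i] = np.mean(known, axis=0)
    return features


# Toy stand-in for an embedding file (hypothetical 2-dimensional vectors).
toy_embedding_file = io.StringIO(
    "good 1.0 0.1\n"
    "great 0.9 0.2\n"
    "bad -1.0 0.1\n"
    "awful -0.9 0.2\n"
    "movie 0.0 1.0\n"
)
vectors = load_embeddings(toy_embedding_file)

# Toy stand-in for a benchmark's train and test splits.
train_sentences = ["good movie", "great movie", "bad movie", "awful movie"]
train_labels = [1, 1, 0, 0]
test_sentences = ["great", "awful"]

# Only the classifier is trained; the embeddings are never updated.
classifier = LinearSVC()
classifier.fit(featurize(train_sentences, vectors, dim=2), train_labels)
predictions = classifier.predict(featurize(test_sentences, vectors, dim=2))
print(predictions)
```

Averaging is only one simple way to turn word vectors into a fixed-size input, and `LinearSVC` can be swapped for `RandomForestClassifier` or any other scikit-learn estimator without changing the featurization step.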

2 replies

osotsia Jan 10, 2021
Author

Many thanks.

jpmcd Apr 8, 2021

Hi,

I'm considering a similar question about evaluating word embeddings, and there seems to be a way to do this with the older GLUE library (using args.word_embs_file and then trying out the baseline models). However, its dependencies are pretty old. Does jiant implement the baseline models from GLUE anywhere? Thanks.
