Which text encoding model you are using in this code? #2

Open
crazySyaoran opened this issue Dec 6, 2022 · 5 comments
Labels
question Further information is requested

Comments

@crazySyaoran

In your paper, it says:

> At first, we encode T_B into an embedded vector v_B either by many-hot encoding or using a pre-trained NLP model such as BERT, FastText, or Word2Vec.

Can you tell me exactly which text encoding model you are using in your released code? Could you release the encoding model for custom images?

@prasunroy added the question (Further information is requested) label on Dec 6, 2022
@prasunroy
Owner

We use many-hot encoding in our code as it provides a straightforward way to encode the text for our application. However, BERT-encoded text also provides comparable results. We have tested with a pre-trained BERT model `uncased_L-24_H-1024_A-16` from the following repository: https://github.com/google-research/bert#pre-trained-models.
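For illustration, many-hot encoding over a fixed attribute vocabulary can be sketched roughly as follows (the vocabulary and description below are hypothetical placeholders, not the actual annotation schema used in the dataset):

```python
# Minimal sketch of many-hot text encoding over a fixed attribute vocabulary.
# The vocabulary and description below are hypothetical placeholders,
# not the actual annotation schema used in the dataset.
import numpy as np

VOCAB = ["male", "female", "shirt", "jeans", "shorts", "red", "blue", "black"]

def many_hot_encode(description: str, vocab=VOCAB) -> np.ndarray:
    """Set 1 at every vocabulary position whose attribute word appears in the description."""
    tokens = set(description.lower().replace(",", " ").split())
    return np.array([1.0 if word in tokens else 0.0 for word in vocab], dtype=np.float32)

# Example: produces a fixed-length binary vector of length len(VOCAB).
print(many_hot_encode("a male person wearing a blue shirt and black jeans"))
```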

@crazySyaoran
Author

Thanks a lot, it was a great help. I will try it soon.

@crazySyaoran
Author

crazySyaoran commented Dec 7, 2022

Hi, I noticed that the encoding length in encodings.csv is 84, while the output of the BERT model from the URL you provided has shape (61, 1024). I urgently need to reproduce your results from custom input text. Could you release the many-hot encoding model mentioned above, or release code that works with BERT's encoding shape?

@crazySyaoran reopened this on Dec 7, 2022
@prasunroy
Owner

The many-hot encoding was manually collected during data annotation. So, we do not have a model to infer this encoding directly from the image. It needs to be done manually as an interactive user input. In the case of a frozen text encoder, such as BERT, you need to consider the output from the last hidden layer. The final hidden layer output can be projected to the target shape through another linear layer if required. In our experiments, we tested with BERT encoding of length 384. Also, note that for any specific encoding type and/or length, the stage-1 network needs to be retrained.

Check the following resources on text encoding with BERT.
[1] https://medium.com/future-vision/real-time-natural-language-understanding-with-bert-315aff964bfa
[2] https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT

However, a more recent and currently recommended approach is to use Hugging Face Transformers.
https://github.com/huggingface/transformers
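For example, a rough sketch of that pipeline with Hugging Face Transformers might look like the following (the mean pooling and the 384-dimensional linear projection are illustrative assumptions, not the released code; the projection would need to be trained together with the stage-1 network):

```python
# Rough sketch: encode text with a frozen BERT and project the last hidden
# layer output to a fixed-length vector. The mean pooling and the untrained
# 384-dim projection below are illustrative assumptions, not the released code.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")  # HF counterpart of uncased_L-24_H-1024_A-16
bert = BertModel.from_pretrained("bert-large-uncased")
bert.eval()

proj = torch.nn.Linear(bert.config.hidden_size, 384)  # 1024 -> 384

with torch.no_grad():
    inputs = tokenizer("a man wearing a blue shirt and black jeans", return_tensors="pt")
    last_hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 1024)
    sentence_vec = last_hidden.mean(dim=1)            # (1, 1024), pooled over tokens
    text_encoding = proj(sentence_vec)                 # (1, 384)

print(text_encoding.shape)  # torch.Size([1, 384])
```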

A demo of TIPS with BERT encoding is (temporarily) available at
https://drive.google.com/file/d/1Jsms6hPKg6ESrJyRTdwkgKScIKow1RnU

@prasunroy changed the title from "which text encoding model you are using in this code?" to "Which text encoding model you are using in this code?" on Dec 8, 2022
@crazySyaoran
Author

Thanks for the reply, but I didn't find the BERT encoding of length 384 you mentioned above on Hugging Face. The BERT models I found on Hugging Face are:

|      | H=128 | H=256 | H=512 | H=768 |
|------|-------|-------|-------|-------|
| L=2  | 2/128 (BERT-Tiny) | 2/256 | 2/512 | 2/768 |
| L=4  | 4/128 | 4/256 (BERT-Mini) | 4/512 (BERT-Small) | 4/768 |
| L=6  | 6/128 | 6/256 | 6/512 | 6/768 |
| L=8  | 8/128 | 8/256 | 8/512 (BERT-Medium) | 8/768 |
| L=10 | 10/128 | 10/256 | 10/512 | 10/768 |
| L=12 | 12/128 | 12/256 | 12/512 | 12/768 (BERT-Base) |

from https://huggingface.co/google/bert_uncased_L-2_H-768_A-12

Could you please tell me where I can get your pretrained BERT encoding of length 384?
