While converting tokens to a vector for a complete sentence in the `preprocess_and_vectorize` method, I got the error `'Word2VecKeyedVectors' object has no attribute 'get_mean_vector'`.
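For context: `get_mean_vector()` was only added to gensim's `KeyedVectors` in the 4.x line; the 3.x `Word2VecKeyedVectors` class never had it, so upgrading gensim is one fix. A version-tolerant sketch, assuming `wv` is a loaded gensim `KeyedVectors`:

```python
import numpy as np

def mean_vector(wv, tokens):
    # Newer gensim ships get_mean_vector(); by default it skips missing
    # keys and unit-normalizes vectors before averaging, so the result
    # can differ slightly from a raw mean.
    if hasattr(wv, "get_mean_vector"):
        return wv.get_mean_vector(tokens)
    # Fallback for older gensim: average only the in-vocabulary tokens.
    vectors = [wv[t] for t in tokens if t in wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(wv.vector_size)
```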
shiv425 commented on Nov 3, 2022
I tried converting each token to a vector and then taking the mean with `np.mean`, but while converting `df['Text']` to vector form I get errors like `Key 'u.s.-based' not present`, `Key ' ' not present`, `Key '2018' not present`, etc. Please help.
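Those `Key ... not present` errors come from looking up out-of-vocabulary tokens. One sketch of a guard, with `wv` again assumed to be a gensim `KeyedVectors`:

```python
# Guard against out-of-vocabulary keys such as 'u.s.-based' or '2018':
# look up only tokens the model actually contains, instead of letting
# wv[token] raise KeyError.
filtered_tokens = ["u.s.-based", "company", "report", "2018"]  # example input
vectors = [wv[t] for t in filtered_tokens if t in wv]
```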
elandil2 commented on Nov 3, 2022
I think he used an old version of the gensim library; a lot of attributes changed from 3.8 to 4.0. I'm facing the same issues and have tried a couple of things, but nothing helped. A poorly documented library, to be honest; I've been searching for hours and couldn't find anything useful.
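For anyone hitting those 3.x → 4.x changes, a few of the renames that commonly break older tutorial code (a runnable sketch with a tiny throwaway model):

```python
from gensim.models import Word2Vec

# Tiny throwaway model just to demonstrate the renamed attributes.
model = Word2Vec(sentences=[["king", "queen", "royal"]], min_count=1)

words = model.wv.index_to_key   # gensim 3.x: model.wv.index2word
vocab = model.wv.key_to_index   # gensim 3.x: model.wv.vocab
vector = model.wv["king"]       # bracket lookup is unchanged
```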
shiv425 commented on Nov 4, 2022
```python
import numpy as np

# Assumes `nlp` (a loaded spaCy pipeline) and `wv` (gensim word
# vectors) are already defined elsewhere.
def preprocess_and_vectorize(text):
    # Remove stop words and punctuation, and lemmatize the text.
    doc = nlp(text)
    filtered_tokens = []
    arr = []
    for token in doc:
        if token.is_stop or token.is_punct:
            continue
        filtered_tokens.append(token.lemma_)
    for token in filtered_tokens:
        # Skip tokens that have no vector in the vocabulary.
        try:
            arr.append(wv[token])
        except KeyError:
            continue
    return np.mean(arr, axis=0)
```

I used this code, with try/except because many words have no vector in `wv`.
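One caveat with the try/except version: if every token in a document is out of vocabulary, `arr` stays empty and `np.mean(arr, axis=0)` returns `nan` with a `RuntimeWarning`. A possible guard, assuming `wv` exposes `vector_size` as gensim's `KeyedVectors` does:

```python
# At the end of preprocess_and_vectorize, before averaging:
if not arr:
    # No token had a vector; fall back to a zero vector of matching size.
    return np.zeros(wv.vector_size)
return np.mean(arr, axis=0)
```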
meet5398 commented on May 5, 2023
Solution to the problem
This is the alternative I found for this problem, and it works:
```python
import spacy
import numpy as np

nlp = spacy.load("en_core_web_lg")
# `wv` is assumed to be gensim word vectors loaded elsewhere, e.g.:
# import gensim.downloader as api; wv = api.load("word2vec-google-news-300")

def preprocess_and_vectorize(text):
    # Drop punctuation and stop words; keep lemmas.
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        if token.is_punct or token.is_stop:
            continue
        filtered_tokens.append(token.lemma_)
    # Indexing KeyedVectors with a list returns one vector per token;
    # average across tokens (axis=0) rather than over the flat array.
    return np.mean(wv[filtered_tokens], axis=0)
```
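Applied to the DataFrame column mentioned earlier in the thread (assuming `df` has the `Text` column; note this version still raises `KeyError` if any token is out of vocabulary, which the membership filter shown earlier avoids):

```python
# Vectorize every document in the 'Text' column.
df["vector"] = df["Text"].apply(preprocess_and_vectorize)
```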