This repository was archived by the owner on Sep 14, 2024. It is now read-only.
Thank you for your report!
The error means the n-gram "WNA" was never trained: the corpus (the one trained on UniProt) does not contain that sequence,
so you have to build your own corpus and train on it yourself.
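To see which n-grams would trigger this error before calling the model, something like the following may help. It is only a sketch in plain Python, not part of the library: `split_ngrams`, `missing_ngrams`, and the example `vocab` are hypothetical names made up for illustration, and it assumes sequences are split into 3 shifted lists of non-overlapping 3-grams, as in the ProtVec approach. You would have to fill `vocab` with the n-grams your own model was actually trained on.

```python
def split_ngrams(seq, n=3):
    """Split a sequence into n lists of non-overlapping n-grams, one per start offset."""
    return [
        [seq[i:i + n] for i in range(offset, len(seq) - n + 1, n)]
        for offset in range(n)
    ]


def missing_ngrams(sequences, vocab, n=3):
    """Return the sorted n-grams that occur in `sequences` but not in `vocab`."""
    missing = set()
    for seq in sequences:
        for frame in split_ngrams(seq, n):
            missing.update(gram for gram in frame if gram not in vocab)
    return sorted(missing)


# Hypothetical vocabulary and query sequence, for illustration only.
vocab = {"MKT", "KTA", "TAW", "AWN"}
print(missing_ngrams(["MKTAWNA"], vocab))  # prints ['WNA']
```

If the returned list is not empty, those n-grams are absent from the training corpus, which matches the situation described above.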
I am facing the exact same error on my end too, but for the n-gram "KQE" instead.
Here's my code snippet -
    pv = ProtVec('INPUT.FASTA', corpus_fname='OUTPUT.TXT', n=3)
    pv["QAT"]
    sequences = list(df[c])  # df[c] contains the AA sequences from which INPUT.FASTA was constructed
    embeddings = []
    for i in sequences:
        embed = pv.to_vecs(i)  # <- Error occurs here
        embeddings.append(embed)
Full code block, if it helps -
    for d in data:
        df = pd.read_csv(d)
        dN = d[:-4]
        for c in cols:
            count = 1
            with open('sequences_{a}_{b}.fasta'.format(a = c, b = dN), 'w') as f:
                for i in range(len(df)):
                    print('>' + str(count) + '\n', df[c][i], file = f)
                    count = count + 1
            pv = ProtVec('sequences_{a}_{b}.fasta'.format(a = c, b = dN), corpus_fname='output_{a}_{b}.txt'.format(a = c, b = dN), n=3)
            pv["QAT"]
            sequences = list(df[c])
            embeddings = []
            for i in sequences:
                embed = pv.to_vecs(i)
                embeddings.append(embed)
            embedding = np.asarray(embeddings)
            all_embeddings = np.reshape(embedding, newshape=(embedding.shape[0], 300))
            dF = pd.DataFrame(all_embeddings, columns = colN, dtype = object)
            dF['modification'] = df['modifications']
            dF.to_csv('dataset-{a}_{b}.model'.format(a = c, b = dN))
            pv.save('sequences_{a}_{b}.model'.format(a = c, b = dN))
(Idk why, but I can't seem to get this code block to indent properly.)
Please help me get past this error.