-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarifying last step of the 'transform_roc' function #26
Comments
transform_roc
function
I just found #12 and openai/finetune-transformer-lm#9 Not sure how I missed the first issue, I have looked at the open issues - but it shows that comments in this code will help with understanding. |
Hi @sharpsy, I also think that this comment would help people understand this code. Do you want to add it yourself or would you like me to create the pull request? |
Ok, I'll create the pull request. |
What I don't understand is why we need to start from |
Tokens from 0 to |
The indices from 0 to |
thx! |
Thanks for the implementation, it is a big help!
I'm trying to wrap my head around the code and I had some questions here and there. One of them was the last step in the
transform_roc
function. It was not clear to me why would they augment the input sequence with another dimension containing increasing embedding ids, ids that are greater than tokens from source data vocabulary. Namely:xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
The explanation is in the paper itself:
We used learned position embeddings instead of the sinusoidal version proposed in the original work.
Those are ids of position embeddings and are added to the input data in theforward()
method of theTransformerModel
. It took me some time to find this and we can help others that struggle with the same line.I would suggest adding comments to the code that will help with understanding.
In the
transform_roc
function, before the last step:# Position information that is added to input embeddings in the TransformerModel
And then in the
forward()
method of theTransformerModel
before theh = e.sum(dim=2)
line:# Add position embeddings to the input embeddings
The text was updated successfully, but these errors were encountered: