Hi,

In Table 5 of your paper, the (G, G, G, G) row uses the number (79.8%) from the PVT paper, which uses absolute positional encoding. However, I suppose the other model variants listed in this table use CPE, so they are not directly comparable. Should the accuracy of (G, G, G, G) with CPE be 81.2%, as shown in Table 1?

In general, I am interested in knowing whether there is a benefit to using global attention in the early layers.
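For context, here is how I currently read the per-stage notation. This is only a rough sketch in PyTorch with my own hypothetical module names and a simplified 1-D windowing scheme, not your implementation, so please correct me if I have misread the design:

```python
import torch
import torch.nn as nn


class GlobalAttention(nn.Module):
    """Plain multi-head self-attention over all tokens (a 'G' stage)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                  # x: (B, N, C)
        out, _ = self.attn(x, x, x)
        return out


class LocalAttention(nn.Module):
    """Window-restricted self-attention (an 'L' stage); simplified to 1-D windows here."""
    def __init__(self, dim, num_heads=4, window=49):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                  # x: (B, N, C), N assumed divisible by window
        B, N, C = x.shape
        w = self.window
        windows = x.reshape(B * (N // w), w, C)   # group tokens into windows
        out, _ = self.attn(windows, windows, windows)
        return out.reshape(B, N, C)


def build_stages(config, dim=64):
    """config is a string such as 'GGGG' or 'LLGG', one letter per stage."""
    return nn.ModuleList(
        GlobalAttention(dim) if c == 'G' else LocalAttention(dim) for c in config
    )


# (G, G, G, G): global attention in every stage, i.e. the PVT-style baseline row.
stages = build_stages('GGGG')
tokens = torch.randn(2, 196, 64)           # 2 images, 14x14 tokens, embedding dim 64
for stage in stages:
    tokens = stage(tokens)
print(tokens.shape)                        # torch.Size([2, 196, 64])
```

If this reading is right, my question is whether replacing the local stages in the early layers with global ones (as above) actually helps, once the positional encoding is held fixed.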
Thanks.