Hello @lucidrains, thank you for generously sharing this implementation. In Figure 2 of the original paper, a `rel_pos_bias(q, k)` term is added when computing the final attention weights. I can find this function in your FLASH implementation, but the operation is missing from GAU. Could you explain why, or is this term unnecessary in GAU?
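To make the question concrete, here is a minimal sketch of the GAU kernel as I read Figure 2: scores `q·k / n` plus an optional relative position bias, passed through a squared ReLU, then used to mix `v` and gated by `u`. It is pure Python so that it stays self-contained. The names `gau_kernel`, `gau_attention`, and `rel_pos_bias`, and the simple lookup-table form of the bias (one scalar per clipped offset `j - i`), are hypothetical illustrations only. The actual bias in the paper or in this repo may be parameterized differently (for example with bucketed distances).

```python
# Hypothetical sketch of GAU's attention kernel, as read from Figure 2:
# kernel = relu(q @ k^T / n + bias) ** 2, output = u * (kernel @ v).
from typing import List, Optional

Matrix = List[List[float]]


def rel_pos_bias(n: int, table: List[float], max_dist: int) -> Matrix:
    """Hypothetical bias: one scalar per clipped relative offset (j - i).

    `table` holds 2 * max_dist + 1 values for offsets -max_dist..max_dist.
    """
    bias = []
    for i in range(n):
        row = []
        for j in range(n):
            offset = max(-max_dist, min(max_dist, j - i))
            row.append(table[offset + max_dist])
        bias.append(row)
    return bias


def gau_kernel(q: Matrix, k: Matrix, bias: Optional[Matrix] = None) -> Matrix:
    """Squared-ReLU attention weights with an optional additive bias."""
    n = len(q)
    kernel = []
    for i in range(n):
        row = []
        for j in range(len(k)):
            score = sum(a * b for a, b in zip(q[i], k[j])) / n
            if bias is not None:
                score += bias[i][j]  # the term this question is about
            row.append(max(score, 0.0) ** 2)  # squared ReLU
        kernel.append(row)
    return kernel


def gau_attention(q: Matrix, k: Matrix, u: Matrix, v: Matrix,
                  bias: Optional[Matrix] = None) -> Matrix:
    """Mix v with the kernel, then gate elementwise with u."""
    kernel = gau_kernel(q, k, bias)
    e = len(v[0])
    out = []
    for i in range(len(kernel)):
        mixed = [sum(kernel[i][j] * v[j][c] for j in range(len(v)))
                 for c in range(e)]
        out.append([u[i][c] * mixed[c] for c in range(e)])
    return out


if __name__ == "__main__":
    q = k = [[1.0], [1.0]]
    print(gau_kernel(q, k))  # [[0.25, 0.25], [0.25, 0.25]]
    bias = rel_pos_bias(2, [-1.0, 0.0, 1.0], max_dist=1)
    print(gau_kernel(q, k, bias))  # [[0.25, 2.25], [0.0, 0.25]]
```

Without the bias, the weights depend only on the contents of `q` and `k`. With it, a negative bias can push a score below zero so the squared ReLU drops that pair entirely, which is why I am wondering whether leaving it out of GAU changes the behavior.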
Thanks!