Question about LongNet attention map overlap #108

@RmZeta2718

The attention maps for the different dilation rates overlap with each other. In the figure, the same token is computed only once, but in the code, an overlapping token is counted multiple times, because the output is a weighted sum over all dilation rates.

image
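
To make the overlap concrete, here is a minimal sketch that counts how many (segment length, dilation rate) branches cover each query/key pair. It is not the repository's implementation: `dilated_mask`, `coverage_counts`, and the example configuration `(4, 1), (8, 2), (16, 4)` are hypothetical names and values chosen for illustration, and the sketch assumes a single head with offset 0 and no causal mask.

```python
def dilated_mask(seq_len, segment_len, dilation, offset=0):
    """Return a seq_len x seq_len boolean mask for one dilated branch.

    Assumption for this sketch: query i attends to key j when both fall in
    the same segment and both are kept by the dilation (every `dilation`-th
    position within the segment, starting at `offset`).
    """
    def selected(p):
        return (p % segment_len) % dilation == offset % dilation

    return [
        [
            i // segment_len == j // segment_len and selected(i) and selected(j)
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]


def coverage_counts(seq_len, configs):
    """Count how many branches cover each (query, key) pair."""
    counts = [[0] * seq_len for _ in range(seq_len)]
    for segment_len, dilation in configs:
        mask = dilated_mask(seq_len, segment_len, dilation)
        for i in range(seq_len):
            for j in range(seq_len):
                counts[i][j] += mask[i][j]  # True adds 1, False adds 0
    return counts


# Hypothetical configuration: (segment length, dilation rate) per branch.
CONFIGS = [(4, 1), (8, 2), (16, 4)]
SEQ_LEN = 16
COVERAGE = coverage_counts(SEQ_LEN, CONFIGS)

# Pairs covered by more than one branch are the overlapping entries.
OVERLAPPED = sum(c > 1 for row in COVERAGE for c in row)

for row in COVERAGE:
    print(" ".join(str(c) for c in row))
print("pairs covered by more than one branch:", OVERLAPPED)
```

Under these assumptions, any entry greater than 1 in the printed grid is a query/key pair that contributes through more than one branch when the branch outputs are summed.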

Would you have any comments on this? Is this expected behavior? @donglixp @shumingma
