Skip to content

I am studying the code related to Flux. Could you please explain the underlying logic behind this core part of the code? #15

@deadsmither5

Description

@deadsmither5

image

I’m curious about why you chose the dimensions below. Looking forward to your reply. Thank you!

(1,24,4608,4608) -> (1,24,4096,512)#?why do this?

    attention_probs = attention_probs[:,:,text_length:,:text_length].cpu()

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions