
Some questions about your source code #14

@shawnnjupt

Description

In your source code (screenshot of the forward call attached), the first call to forward uses only tokens 0:1 of each batch. That is not correct for Llama 2, which is decoder-only.
Llama 2 generation can be divided into a prefill step and decode steps. In the prefill step, all prompt tokens should be passed through forward, not just the first token.
The prefill step then generates the first new token, which is appended to the token list before the next token is generated.
So the computation is: the first forward pass works on a [sequence_length, dim] input, and every later step (using the KV cache) works on only a [1, dim] input.
You can see the reference code in the llama repo: https://github.com/meta-llama/llama/blob/main/llama/generation.py (screenshot of the relevant generation loop attached).
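For clarity, here is a minimal sketch of the prefill/decode pattern I mean, assuming a hypothetical `model.forward(tokens, start_pos)` that keeps an internal KV cache and returns logits for the tokens it was given (the real signature in generation.py may differ; greedy sampling is used just to keep the example short):

```python
import torch

@torch.inference_mode()
def generate(model, prompt_tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    """Sketch of prefill + incremental decode with a KV cache.

    Assumes (hypothetically) that model.forward(tokens, start_pos) caches the
    keys/values for positions [start_pos, start_pos + tokens.shape[1]) and
    returns logits of shape [batch, tokens.shape[1], vocab_size].
    """
    tokens = prompt_tokens  # shape: [batch, prompt_len]
    prev_pos = 0

    for _ in range(max_new_tokens):
        # First iteration (prefill): prev_pos == 0, so the whole prompt is fed
        # through forward once and the KV cache is filled -> [prompt_len, dim] compute.
        # Later iterations (decode): only the single newest token is fed -> [1, dim] compute.
        logits = model.forward(tokens[:, prev_pos:], prev_pos)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy for simplicity
        prev_pos = tokens.shape[1]
        tokens = torch.cat([tokens, next_token], dim=1)

    return tokens
```

The point is that only the first call covers the full prompt; every subsequent call feeds a single new token and reuses the cached keys/values.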

Also, I think the decoder layers are named "encoder layer" in your source code; since Llama 2 is decoder-only, "decoder layer" would be the accurate name.
