Why are the context and target tensors of the same size (max_length) #864
krishnan-duraisamy started this conversation in General
Replies: 1 comment
Hi there, that's a good question. I assume you are referring to figure 2.13? Like you said, there is still only one target token per prediction, that's correct. But it would be wasteful to store and feed the targets one at a time. For example, for a sequence of four tokens, you could store and feed it to the LLM as four separate training samples, each with a progressively longer context and a single target token; in that method, the model is trained on one target token at a time. Instead, you can do it in one step, where the inputs are one tensor of length max_length and the targets are stored in a single tensor of the same length (they are the same target IDs as before, just gathered into one tensor).
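A minimal sketch of this idea, assuming PyTorch and made-up token IDs (not the book's exact code or data): a single input-target pair of length max_length encodes max_length next-token prediction tasks, with the targets simply being the inputs shifted one position to the right.

```python
import torch

# Made-up token IDs purely for illustration; not the IDs used in the book.
token_ids = [290, 4920, 2241, 287, 257]
max_length = 4

inputs  = torch.tensor(token_ids[0:max_length])      # tensor([ 290, 4920, 2241,  287])
targets = torch.tensor(token_ids[1:max_length + 1])  # tensor([4920, 2241,  287,  257])

# The same pair, viewed one target token at a time, corresponds to
# four separate training samples:
#   [290]                  -> 4920
#   [290, 4920]            -> 2241
#   [290, 4920, 2241]      -> 287
#   [290, 4920, 2241, 287] -> 257
for i in range(1, max_length + 1):
    print(inputs[:i].tolist(), "->", targets[i - 1].item())
```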
In Chapter 2, section 2.6 includes examples for creating input-target pairs. It is clearly stated that the target is always a single token (the next-word output). However, in all of the worked examples, why does the target tensor need to be the same shape and size as the context tensor?
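One way to see why the shapes end up matching (a sketch with stand-in tensors, not the book's training code): a GPT-style model outputs one next-token prediction per input position, so the loss needs one target ID per position, which is exactly a target tensor with the same shape as the input tensor.

```python
import torch
import torch.nn.functional as F

batch_size, max_length, vocab_size = 8, 4, 50257  # illustrative sizes

inputs  = torch.randint(0, vocab_size, (batch_size, max_length))
targets = torch.randint(0, vocab_size, (batch_size, max_length))  # same shape as inputs

# Stand-in for model(inputs): one logit vector over the vocabulary per position.
logits = torch.randn(batch_size, max_length, vocab_size)

# One cross-entropy loss over all positions at once -- this is why the targets
# are stored as a full (batch_size, max_length) tensor rather than one token.
loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
print(loss)
```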