Why are the context and target tensors of the same size (max_length) #864
krishnan-duraisamy started this conversation in General
Replies: 1 comment
Hi there, that's a good question. I assume you are referring to figure 2.13? Like you said, there is still only one target token per prediction, that's correct. But it would be wasteful to store and feed the targets one at a time. For example, for a sequence of four tokens, you could store and feed it to the LLM as four separate training samples, each with a progressively longer context and a single target token; in that method, the model is trained on one target token at a time. Instead, you can do it in one step, where the inputs are one tensor of length max_length and the targets are stored in a single tensor of the same length (they are the same target IDs as before, just gathered into one tensor).
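A minimal sketch of this idea, assuming PyTorch and made-up token IDs (not the book's exact code or data): a single input-target pair of length max_length encodes max_length next-token prediction tasks, with the targets simply being the inputs shifted one position to the right.

```python
import torch

# Made-up token IDs purely for illustration; not the IDs used in the book.
token_ids = [290, 4920, 2241, 287, 257]
max_length = 4

inputs  = torch.tensor(token_ids[0:max_length])      # tensor([ 290, 4920, 2241,  287])
targets = torch.tensor(token_ids[1:max_length + 1])  # tensor([4920, 2241,  287,  257])

# The same pair, viewed one target token at a time, corresponds to
# four separate training samples:
#   [290]                  -> 4920
#   [290, 4920]            -> 2241
#   [290, 4920, 2241]      -> 287
#   [290, 4920, 2241, 287] -> 257
for i in range(1, max_length + 1):
    print(inputs[:i].tolist(), "->", targets[i - 1].item())
```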
In Chapter 2, section 2.6 includes examples for creating input-target pairs. It is clearly stated that the target is always a single token (the next-word output). However, in all of the worked examples, why does the target tensor need to be the same shape and size as the context tensor?
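One way to see why the shapes end up matching (a sketch with stand-in tensors, not the book's training code): a GPT-style model outputs one next-token prediction per input position, so the loss needs one target ID per position, which is exactly a target tensor with the same shape as the input tensor.

```python
import torch
import torch.nn.functional as F

batch_size, max_length, vocab_size = 8, 4, 50257  # illustrative sizes

inputs  = torch.randint(0, vocab_size, (batch_size, max_length))
targets = torch.randint(0, vocab_size, (batch_size, max_length))  # same shape as inputs

# Stand-in for model(inputs): one logit vector over the vocabulary per position.
logits = torch.randn(batch_size, max_length, vocab_size)

# One cross-entropy loss over all positions at once -- this is why the targets
# are stored as a full (batch_size, max_length) tensor rather than one token.
loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
print(loss)
```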