
Single-node 8xH100 examples for long-context training of 32B models (near-32k-token regime) #279

@vadimkantorov

Description


Hi!

Are there any examples of realistic RL-tuning of a 32B model with a large model length (~32k, with ignore_eos=True so that responses can run near the maximum length)?
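
Concretely, by "large model length with ignore_eos" I mean something roughly like the sketch below (the key names are just my guess at how such a section could look, not the actual config schema):

    generation:
      max_model_len: 32768        # prompt + response budget
      max_response_tokens: 30720  # leave some headroom for the prompt
      ignore_eos: true            # never stop at EOS, so responses run near the length limit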

I found

model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-32B-Instruct}

but it suspiciously uses tp=1 and sp=1. Does 32B training fit on a single node in this experiment?

Or what is the actual number of response tokens in this experiment?
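
My rough back-of-envelope (please correct me if this is off): 32B parameters in bf16 is about 64 GB of weights, and with mixed-precision Adam (fp32 master weights plus two moments) the weights + gradients + optimizer states come to roughly 16 bytes per parameter, i.e. about 512 GB, against 8 x 80 GB = 640 GB on the node. That leaves little headroom for activations at ~32k tokens even with full sharding, which is why tp=1 / sp=1 surprised me.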

What would be your advice on setting tp and sp for 32B models, both single-node and multi-node? Should we try sp=2, tp=2, or tp=2 x sp=2 configs?

Or should we just try sp=8 first?
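
For reference, the combinations I have in mind would look roughly like this (the key names below are placeholders, not the real config options):

    # Option A: sequence parallel only (sp=8, tp=1)
    sequence_parallel_size: 8
    tensor_parallel_size: 1

    # Option B: tensor parallel only (tp=2, sp=1)
    tensor_parallel_size: 2
    sequence_parallel_size: 1

    # Option C: mixed (tp=2 x sp=2)
    tensor_parallel_size: 2
    sequence_parallel_size: 2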

Thanks!


Sorry, I got the label wrong. This should not be marked as a bug.
