
[BUG] CUDA OutOfMemoryError during backward pass in distributed training #284

@Bat-Reality

Bug Description

During distributed RFT training on 8 L20 GPUs, the actor update step intermittently fails with torch.OutOfMemoryError in the backward pass, even after reducing the batch size and response length (details below).

Environment Information

  • Python Version: 3.12.4
  • GPU: NVIDIA L20-40G * 8
  • CUDA Version: 12.4
  • Installation Method: git clone
  • Trinity-RFT Version: 0.3.0.dev0

Steps to Reproduce


  1. trinity run --config examples/XXX/XXX.yaml

Actual Behavior

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.76 GiB. 
GPU 0 has a total capacity of 44.52 GiB of which only 23 GiB is free. 
This process has ~41 GiB memory in use. Of the allocated memory, ~36 GiB is used by PyTorch.

ray.exceptions.RayTaskError(OutOfMemoryError): 
WorkerDict.actor_update_actor() failed with CUDA OOM

To reiterate: I am running on 8 L20 GPUs with the batch size reduced to 16, max_response_tokens set to 2048, and repeat_times set to 8, yet the error still occurs at random points during training. The relevant overrides are sketched below.
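
For reference, a rough sketch of the settings I changed relative to the example config (the key names are the ones I refer to above; the exact section nesting follows the example YAML, which I have otherwise left unchanged):

```yaml
# Sketch of my overrides; nesting abbreviated, everything else as in examples/XXX/XXX.yaml
batch_size: 16            # reduced from the example value
max_response_tokens: 2048 # cap on generated response length
repeat_times: 8           # rollouts per prompt
```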

Log Information


[Screenshots of the full error traceback attached as images.]
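
The screenshots capture the traceback only. If a fuller allocator breakdown would help with triage, I can re-run and dump PyTorch's memory summary around the failing actor-update step; a minimal sketch of what I would add (the helper and where I would call it are my own, not Trinity-RFT API):

```python
import torch

def dump_gpu_memory(tag: str) -> None:
    """Print PyTorch's allocator summary for every visible GPU."""
    for device in range(torch.cuda.device_count()):
        print(f"[{tag}] memory summary for cuda:{device}")
        print(torch.cuda.memory_summary(device=device, abbreviated=True))

# I would call this immediately before and after the step that OOMs, e.g.
# dump_gpu_memory("before actor update"), and attach the output here.
```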
