Commit 264c6bb

README.md clean up before release
1 parent e8ace42 commit 264c6bb

1 file changed: +5 −10 lines


README.md

Lines changed: 5 additions & 10 deletions
````diff
@@ -60,7 +60,7 @@ pip install rfdetr
 ```
 
 <details>
-<summary>From source</summary>
+<summary>Install from source</summary>
 
 <br>
 
````

````diff
@@ -278,10 +278,12 @@ model = RFDETRBase()
 model.train(dataset_dir=<DATASET_PATH>, epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4, output_dir=<OUTPUT_PATH>)
 ```
 
+Different GPUs have different VRAM capacities, so adjust batch_size and grad_accum_steps to maintain a total batch size of 16. For example, on a powerful GPU like the A100, use `batch_size=16` and `grad_accum_steps=1`; on smaller GPUs like the T4, use `batch_size=4` and `grad_accum_steps=4`. This gradient accumulation strategy helps train effectively even with limited memory.
+
 </details>
 
 <details>
-<summary>Parameters</summary>
+<summary>More parameters</summary>
 
 <br>
 
````
````diff
@@ -418,19 +420,12 @@ model = RFDETRBase()
 model.train(dataset_dir=<DATASET_PATH>, epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4, output_dir=<OUTPUT_PATH>, early_stopping=True)
 ```
 
-### Batch size
-
-Different GPUs have different amounts of VRAM (video memory), which limits how much data they can handle at once during training. To make training work well on any machine, you can adjust two settings: `batch_size` and `grad_accum_steps`. These control how many samples are processed at a time. The key is to keep their product equal to 16 — that’s our recommended total batch size. For example, on powerful GPUs like the A100, set `batch_size=16` and `grad_accum_steps=1`. On smaller GPUs like the T4, use `batch_size=4` and `grad_accum_steps=4`. We use a method called gradient accumulation, which lets the model simulate training with a larger batch size by gradually collecting updates before adjusting the weights.
-
 ### Multi-GPU training
 
 You can fine-tune RF-DETR on multiple GPUs using PyTorch’s Distributed Data Parallel (DDP). Create a `main.py` script that initializes your model and calls `.train()` as usual, then run it in the terminal.
 
 ```bash
-python -m torch.distributed.launch \
-    --nproc_per_node=8 \
-    --use_env \
-    main.py
+python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py
 ```
 
 Replace `8` in the `--nproc_per_node` argument with the number of GPUs you want to use. This approach creates one training process per GPU and splits the workload automatically. Note that your effective batch size is multiplied by the number of GPUs, so you may need to adjust your `batch_size` and `grad_accum_steps` to maintain the same overall batch size.
````
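The scaling note in that last context line can be made concrete with a small helper (the function name is illustrative, not part of the rfdetr API): under DDP every process contributes `batch_size * grad_accum_steps` samples per optimizer step, so the effective batch size is multiplied by the number of GPUs.

```python
# Illustrative arithmetic for DDP batch sizing (hypothetical helper, not rfdetr API).
def effective_batch_size(batch_size: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    # Each of the num_gpus processes accumulates grad_accum_steps micro-batches.
    return batch_size * grad_accum_steps * num_gpus

# Single T4 with the settings from the README: 4 * 4 * 1 = 16.
assert effective_batch_size(4, 4) == 16

# With --nproc_per_node=8, keeping batch_size=4 and grad_accum_steps=4 would
# inflate the total to 128; dropping to batch_size=2, grad_accum_steps=1
# restores the recommended total of 16.
assert effective_batch_size(4, 4, num_gpus=8) == 128
assert effective_batch_size(2, 1, num_gpus=8) == 16
```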
