
Commit 8387ae1: fix preprocessing command

1 parent e52bdab

File tree

1 file changed: +4 -4 lines

README.md

Lines changed: 4 additions & 4 deletions
@@ -93,7 +93,7 @@ An example script to prepare data for GPT training is:
 python tools/preprocess_data.py \
        --input my-corpus.json \
        --output-prefix my-gpt2 \
-       --vocab gpt2-vocab.json \
+       --vocab-file gpt2-vocab.json \
        --dataset-impl mmap \
        --tokenizer-type GPT2BPETokenizer \
        --merge-file gpt2-merges.txt \
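For context on the command being fixed: in Megatron-style repos, `tools/preprocess_data.py` reads loose JSON, one JSON object per line, with the document text under the key selected by `--json-keys` (`text` by default). A minimal sketch of producing a suitable `my-corpus.json` under that assumption:

```shell
# Write a tiny corpus in loose-JSON (JSONL) form: one document per line,
# each carrying a "text" field for the tokenizer to consume.
cat > my-corpus.json <<'EOF'
{"text": "The quick brown fox jumps over the lazy dog."}
{"text": "A second short document for preprocessing."}
EOF

# One line per document.
wc -l < my-corpus.json
```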
@@ -132,7 +132,7 @@ xz -d oscar-1GB.jsonl.xz
 python tools/preprocess_data.py \
        --input oscar-1GB.jsonl \
        --output-prefix my-gpt2 \
-       --vocab gpt2-vocab.json \
+       --vocab-file gpt2-vocab.json \
        --dataset-impl mmap \
        --tokenizer-type GPT2BPETokenizer \
        --merge-file gpt2-merges.txt \
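As a quick sanity check on JSONL inputs like the decompressed `oscar-1GB.jsonl`, each line should parse as a standalone JSON object containing the text key. A small stand-in sketch (the sample file here is illustrative, not the real OSCAR dump):

```shell
# Stand-in for oscar-1GB.jsonl: one JSON object per line.
printf '%s\n' '{"id": 0, "text": "example document"}' > oscar-sample.jsonl

# Parse the first record and confirm the "text" field is present.
head -n 1 oscar-sample.jsonl | python3 -c 'import json, sys; print("text" in json.load(sys.stdin))'
```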
@@ -192,13 +192,13 @@ DATA_ARGS=" \
     --data-path $DATA_PATH \
 "

-CMD="pretrain_gpt.py $GPT_ARGS $OUTPUT_ARGS $DATA_ARGS"
+CMD="pretrain_gpt.py $GPT_ARGS $OUTPUT_ARGS $DATA_ARGS"

 N_GPUS=1

 LAUNCHER="deepspeed --num_gpus $N_GPUS"

-$LAUNCHER $CMD
+$LAUNCHER $CMD
 ```

 Note, we replaced `python` with `deepspeed --num_gpus 1`. For multi-gpu training update `--num_gpus` to the number of GPUs you have.
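The launch lines in the hunk above just compose strings into a command for the DeepSpeed launcher. A sketch of how the variables expand (the argument values here are placeholders, and the command is echoed rather than executed):

```shell
# Placeholder argument strings; the README builds the real ones earlier.
GPT_ARGS="--num-layers 2"
OUTPUT_ARGS="--log-interval 10"
DATA_ARGS="--data-path my-gpt2_text_document"

CMD="pretrain_gpt.py $GPT_ARGS $OUTPUT_ARGS $DATA_ARGS"

N_GPUS=1
LAUNCHER="deepspeed --num_gpus $N_GPUS"

# Echo instead of running, to show the final command line.
echo "$LAUNCHER $CMD"
```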
