How should training steps be understood for EPL single-machine single-GPU vs. single-machine multi-GPU training? #30

@SueeH

Single machine, single GPU:
Launch command: TF_CONFIG='{"cluster":{"worker":["127.0.0.1:49119"]},"task":{"type":"worker","index":0}}' CUDA_VISIBLE_DEVICES=0 bash ./scripts/train_dp.sh
[image]

Single machine, two GPUs:
Launch command: TF_CONFIG='{"cluster":{"worker":["127.0.0.1:49119"]},"task":{"type":"worker","index":0}}' CUDA_VISIBLE_DEVICES=0,1 bash ./scripts/train_dp.sh
[image: 1693045873752]
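
For reference, both launch commands pass the same TF_CONFIG (a cluster with a single worker at 127.0.0.1:49119, task index 0); the only difference between the two runs is CUDA_VISIBLE_DEVICES. A minimal sketch of how that JSON string is put together, using only the Python standard library. The helper name make_tf_config is hypothetical and is not part of EPL or TensorFlow:

```python
import json
import os


def make_tf_config(worker_addrs, task_index=0):
    """Build a TF_CONFIG JSON string for a worker-only cluster.

    Hypothetical helper for illustration; not an EPL or TensorFlow API.
    """
    if not 0 <= task_index < len(worker_addrs):
        raise ValueError("task_index must index into worker_addrs")
    config = {
        "cluster": {"worker": list(worker_addrs)},
        "task": {"type": "worker", "index": task_index},
    }
    # Compact separators reproduce the exact string used in the commands above.
    return json.dumps(config, separators=(",", ":"))


tf_config = make_tf_config(["127.0.0.1:49119"])
os.environ["TF_CONFIG"] = tf_config  # same value for the 1-GPU and 2-GPU runs
print(tf_config)
```

With one worker entry, both runs are a single process on a single machine; the number of GPUs that process sees is controlled separately by CUDA_VISIBLE_DEVICES.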

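I modified the code slightly: I removed the last_step limit and set the dataset to repeat=10. The attached file can be run after renaming it from .txt to .py.
resnet_dp.txt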

Could someone explain how to interpret this? Did each GPU run 10 steps separately?
