Problems caused by launching multiple pods at the same time #25

Why do I get an error when I start multiple pods that request GPU resources at the same time using vcuda?

In vcuda's `loader.c`, I added `ferror` to print the `errno`-related error message and got the following:

<img width="1437" alt="image" src="https://user-images.githubusercontent.com/25273209/155914863-c36891c2-b7a0-435c-958b-3ebb767caf6e.png">
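For reference, here is a minimal sketch of errno-based failure logging in C. Note that `ferror` only tests a stream's error indicator; the human-readable text for `errno` comes from `strerror(errno)` or `perror`. If the loader opens the library with `dlopen`, the failure reason is reported by `dlerror()` rather than `errno`. The helper name `open_and_log` and the path in the usage note are hypothetical, not taken from vcuda's `loader.c`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of vcuda): try to open `path` for reading
 * and, on failure, log the errno-based reason to stderr.
 * Returns 0 on success, or the saved errno value on failure. */
int open_and_log(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        /* Save errno immediately; later library calls may overwrite it. */
        int err = errno;
        fprintf(stderr, "failed to open %s: %s (errno=%d)\n",
                path, strerror(err), err);
        return err;
    }
    fclose(fp);
    return 0;
}
```

A call such as `open_and_log("/some/missing/libcuda.so")` would print something like `failed to open ...: No such file or directory (errno=2)` on Linux, which distinguishes "file not there yet" (`ENOENT`) from permission or format problems.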

However, when I start the pods one at a time, the problem does not occur. So I suspect a race condition: kubelet may start the container before gpu-manager has finished placing the `libcuda.so` file.
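If that race hypothesis is correct, one possible workaround (a sketch, not a fix for the underlying ordering between kubelet and gpu-manager) is to have the loader poll for the library file for a bounded time before giving up. The function `wait_for_file`, its timeout and interval parameters, and the values in the usage note are all hypothetical; the issue does not say where `libcuda.so` is placed inside the container.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: poll until `path` exists as a regular file or
 * roughly `timeout_ms` milliseconds have elapsed.
 * Returns 0 if the file appeared, ETIMEDOUT otherwise. */
int wait_for_file(const char *path, long timeout_ms, long interval_ms)
{
    struct stat st;
    struct timespec ts;
    long waited = 0;

    if (interval_ms <= 0)
        interval_ms = 1; /* avoid a busy loop that never advances `waited` */
    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (;;) {
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            return 0; /* file is in place */
        if (waited >= timeout_ms)
            return ETIMEDOUT; /* give up; caller can log and fail */
        nanosleep(&ts, NULL);
        waited += interval_ms;
    }
}
```

For example, calling `wait_for_file(path, 5000, 100)` before loading the library would tolerate up to about five seconds of delay. Polling keeps the sketch simple; an inotify-based wait would react faster but is Linux-specific and more involved.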
