@zhuhan0 zhuhan0 commented Aug 15, 2025

Summary:
CUDA context initialization is not fork-safe. If a CUDA context is created in a parent process, and then the process is forked (using `os.fork()`), the child process may encounter errors or undefined behavior when using CUDA. This is because the CUDA driver and runtime are not designed to be safely duplicated via `fork()`. It's recommended to use `spawn` or `forkserver`.

Of the two, `forkserver` needs to be used carefully: specifically, it's recommended to call `multiprocessing.set_start_method('forkserver')` at the very start of the program, and the parent process also needs to avoid initializing the CUDA context. When upgrading APS to CUDA 12.8, we encountered a test failure. The test apparently initializes the CUDA context before starting up two child processes, and I suspect that caused the test to hang (see the [post](https://fb.workplace.com/groups/319878845696681/posts/1494595861558301)).
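As a minimal sketch of that ordering constraint (the function and variable names here are illustrative, not from the actual test):

```python
import multiprocessing as mp

def child_main() -> None:
    # Any CUDA context should be created here, inside the child,
    # never in the parent before the children are started.
    print("child running")

if __name__ == "__main__":
    # This must be the first multiprocessing-related call in the
    # program, before the parent touches CUDA in any way. It raises
    # RuntimeError if a start method was already set.
    # (forkserver is Unix-only; it is unavailable on Windows.)
    mp.set_start_method("forkserver")
    p = mp.Process(target=child_main)
    p.start()
    p.join()
```

If the parent has already initialized CUDA by the time `set_start_method` runs, this pattern no longer helps, which is exactly the situation described above.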

It's hard to avoid initializing the CUDA context early in this test, because the GPU count is checked in the test method's decorator ([code](https://fburl.com/code/27naz2eg)). Between the `spawn` and `forkserver` start methods, `spawn` is less efficient but the most robust. Let's switch to `spawn` instead to avoid any potential undefined behavior with CUDA 12.8 and multiprocessing.
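A hedged sketch of the switch, using `multiprocessing.get_context("spawn")` so the choice is local to the test rather than process-global (the worker below is a stand-in, not the real torchrec test body):

```python
import multiprocessing as mp

def gpu_worker(rank: int, queue) -> None:
    # In the real test this child would initialize CUDA; a spawned
    # child starts from a fresh interpreter, so it does not inherit
    # any CUDA context the parent may already have created.
    queue.put(rank)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    procs = [ctx.Process(target=gpu_worker, args=(rank, queue)) for rank in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = sorted(queue.get() for _ in range(2))
    print(results)  # [0, 1]
```

Using a context object avoids the `set_start_method` ordering constraint entirely, at the cost of spawn's slower child startup (each child re-imports the main module).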

Differential Revision: D80305233

@meta-cla meta-cla bot added the CLA Signed label Aug 15, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80305233

zhuhan0 added a commit to zhuhan0/torchrec that referenced this pull request Aug 15, 2025
Reviewed By: adamomainz, weifengpy

Differential Revision: D80305233
zhuhan0 added a commit to zhuhan0/torchrec that referenced this pull request Aug 15, 2025
Pull Request resolved: meta-pytorch#3284
@zhuhan0 zhuhan0 force-pushed the export-D80305233 branch 2 times, most recently from c9a654b to 528871e on August 15, 2025 22:09
zhuhan0 added a commit to zhuhan0/torchrec that referenced this pull request Aug 15, 2025