
Conversation

@zhaojuanmao
Contributor

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2180

The SSD L2 cache passes raw pointers to async parallel threads for set() and get(). Raw pointers are not tracked by the tensors' reference counts, so if the tensors are deallocated, or their memory allocation changes, before the parallel threads dereference the pointers, the process will crash.

In most cases the tensors are not deallocated while the async parallel threads access them, because futures.wait() is called before the function returns. However, PyTorch may still move or release the underlying allocation depending on its internal memory management, so raw pointers without ref-count tracking can still end up touching deallocated memory in rare cases.

Pass ref-counted tensor handles to the async parallel threads instead, so the storage stays alive for the lifetime of the tasks.
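A minimal sketch of the idea (the function names and the use of std::async are illustrative, not the actual FBGEMM thread pool): capture the at::Tensor by value so its refcount keeps the storage alive, instead of capturing only the raw data pointer.

```cpp
#include <ATen/ATen.h>
#include <future>

// Unsafe pattern: only the raw pointer is captured. If the caller's tensor is
// released, or its storage reallocated, before the task runs, the pointer
// dangles and the task crashes (or silently reads garbage).
std::future<void> set_async_unsafe(const at::Tensor& t) {
  float* data = t.data_ptr<float>();
  int64_t n = t.numel();
  return std::async(std::launch::async, [data, n] {
    // ... write `n` floats starting at `data` into the cache ...
  });
}

// Safer pattern: capture the tensor by value. at::Tensor is a ref-counted
// handle, so the copy held by the lambda keeps the storage alive until the
// task finishes, even if the caller's tensor goes away in the meantime.
std::future<void> set_async_safe(at::Tensor t) {
  return std::async(std::launch::async, [t = std::move(t)] {
    const float* data = t.data_ptr<float>();
    int64_t n = t.numel();
    // ... write `n` floats starting at `data` into the cache ...
  });
}
```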

Differential Revision: D87893640

Summary:

X-link: facebookresearch/FBGEMM#2178

There is no need to build the initializers when random init is disabled; skipping them saves a significant amount of CPU memory.
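A rough illustration of the saving (the struct and function names below are hypothetical, not the FBGEMM API): only allocate the random-init buffers when random init is actually enabled.

```cpp
#include <ATen/ATen.h>
#include <optional>
#include <vector>

struct TableConfig {
  int64_t rows;
  int64_t dim;
};

// Allocate per-table random-init buffers only when random init is enabled;
// otherwise leave them empty so the (potentially large) CPU buffers are never
// materialized.
std::vector<std::optional<at::Tensor>> make_initializers(
    const std::vector<TableConfig>& tables,
    bool enable_random_init) {
  std::vector<std::optional<at::Tensor>> initializers(tables.size());
  if (!enable_random_init) {
    return initializers;  // nothing to allocate
  }
  for (size_t i = 0; i < tables.size(); ++i) {
    initializers[i] =
        at::randn({tables[i].rows, tables[i].dim}, at::kFloat);
  }
  return initializers;
}
```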

Reviewed By: q10

Differential Revision: D86874544
Summary:

X-link: facebookresearch/FBGEMM#2179

It is recommended to use BlobDB when values are larger than 1 KB. Add an option to the SSD RocksDB wrapper that allows enabling BlobDB.
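For reference, enabling RocksDB's integrated BlobDB only requires setting a few Options fields; a minimal sketch (the `use_blob_db` flag plumbed through the wrapper is illustrative, the option fields are standard rocksdb::Options members):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

rocksdb::Options make_options(bool use_blob_db) {
  rocksdb::Options options;
  options.create_if_missing = true;
  if (use_blob_db) {
    // Store values above min_blob_size in blob files instead of the LSM tree,
    // which is the recommended setup when values exceed ~1 KB.
    options.enable_blob_files = true;
    options.min_blob_size = 1024;  // bytes
    options.enable_blob_garbage_collection = true;
  }
  return options;
}
```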

Differential Revision: D86874347
@meta-cla meta-cla bot added the cla signed label Dec 4, 2025
@meta-codesync
Contributor

meta-codesync bot commented Dec 4, 2025

@zhaojuanmao has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87893640.
