
Conversation

@zhaojuanmao
Contributor

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2180

The SSD L2 cache passes raw pointers to async parallel threads for set() and get(). Raw pointers are not tracked by the tensors' reference counts, so if the tensors are deallocated, or their memory allocation changes, before the parallel threads dereference the pointers, the process will crash.

In most cases the tensors are not deallocated while the async parallel threads access them, because futures.wait() is called before the function returns. However, PyTorch may still move or release the underlying allocation depending on its internal memory management, so raw pointers without ref-count tracking can still end up touching deallocated memory in rare cases.

Pass ref-counted tensor handles to the async parallel threads instead, so the storage stays alive for the lifetime of the tasks.
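A minimal sketch of the idea (the function names and the use of std::async are illustrative, not the actual FBGEMM thread pool): capture the at::Tensor by value so its refcount keeps the storage alive, instead of capturing only the raw data pointer.

```cpp
#include <ATen/ATen.h>
#include <future>

// Unsafe pattern: only the raw pointer is captured. If the caller's tensor is
// released, or its storage reallocated, before the task runs, the pointer
// dangles and the task crashes (or silently reads garbage).
std::future<void> set_async_unsafe(const at::Tensor& t) {
  float* data = t.data_ptr<float>();
  int64_t n = t.numel();
  return std::async(std::launch::async, [data, n] {
    // ... write `n` floats starting at `data` into the cache ...
  });
}

// Safer pattern: capture the tensor by value. at::Tensor is a ref-counted
// handle, so the copy held by the lambda keeps the storage alive until the
// task finishes, even if the caller's tensor goes away in the meantime.
std::future<void> set_async_safe(at::Tensor t) {
  return std::async(std::launch::async, [t = std::move(t)] {
    const float* data = t.data_ptr<float>();
    int64_t n = t.numel();
    // ... write `n` floats starting at `data` into the cache ...
  });
}
```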

Differential Revision: D87893640

Summary:

X-link: facebookresearch/FBGEMM#2178

There is no need to build the initializers when random init is disabled; skipping them saves a significant amount of CPU memory.
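A rough illustration of the saving (the struct and function names below are hypothetical, not the FBGEMM API): only allocate the random-init buffers when random init is actually enabled.

```cpp
#include <ATen/ATen.h>
#include <optional>
#include <vector>

struct TableConfig {
  int64_t rows;
  int64_t dim;
};

// Allocate per-table random-init buffers only when random init is enabled;
// otherwise leave them empty so the (potentially large) CPU buffers are never
// materialized.
std::vector<std::optional<at::Tensor>> make_initializers(
    const std::vector<TableConfig>& tables,
    bool enable_random_init) {
  std::vector<std::optional<at::Tensor>> initializers(tables.size());
  if (!enable_random_init) {
    return initializers;  // nothing to allocate
  }
  for (size_t i = 0; i < tables.size(); ++i) {
    initializers[i] =
        at::randn({tables[i].rows, tables[i].dim}, at::kFloat);
  }
  return initializers;
}
```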

Reviewed By: q10

Differential Revision: D86874544
Summary:

X-link: facebookresearch/FBGEMM#2179

It is recommended to use BlobDB when values are larger than 1 KB. Add an option to the SSD RocksDB wrapper that allows enabling BlobDB.
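For reference, enabling RocksDB's integrated BlobDB only requires setting a few Options fields; a minimal sketch (the `use_blob_db` flag plumbed through the wrapper is illustrative, the option fields are standard rocksdb::Options members):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

rocksdb::Options make_options(bool use_blob_db) {
  rocksdb::Options options;
  options.create_if_missing = true;
  if (use_blob_db) {
    // Store values above min_blob_size in blob files instead of the LSM tree,
    // which is the recommended setup when values exceed ~1 KB.
    options.enable_blob_files = true;
    options.min_blob_size = 1024;  // bytes
    options.enable_blob_garbage_collection = true;
  }
  return options;
}
```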

Differential Revision: D86874347
@meta-cla meta-cla bot added the cla signed label Dec 4, 2025
@meta-codesync
Contributor

meta-codesync bot commented Dec 4, 2025

@zhaojuanmao has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87893640.
