[WIP] Use VRAM as buffer #710
Conversation
Pull Request Overview
This work-in-progress PR introduces VRAM (Video RAM) as an L1 cache by adding support for GPU memory allocation and management alongside the existing RAM-based storage system.
Key Changes:
- Adds VRAM allocation and deallocation functions using CUDA runtime
- Extends the Segment and ReplicateConfig structures to support VRAM segments
- Implements VRAM-specific put/get operations with automatic eviction
Reviewed Changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| utils.cpp/utils.h | Adds CUDA-based VRAM allocation and aligned free functions |
| types.h | Extends Segment and ReplicateConfig to include VRAM support flags |
| segment.h/segment.cpp | Adds VRAM allocator tracking separate from regular allocators |
| client.h/client.cpp | Implements PutToVRAM method and extends MountSegment with VRAM flag |
| master_service.cpp | Updates allocation logic to use VRAM allocators when requested |
| allocation_strategy.h | Adds VRAM-only allocation constraint for local segments |
| store_py.h/store_py.cpp | Provides Python bindings for VRAM operations with eviction queue |
| CMakeLists.txt | Adds CUDA toolkit dependency |
| test files | Updates test calls to include new VRAM parameter (set to false) |
mooncake-store/include/segment.h (outdated)

```cpp
std::unordered_map<std::string,
                   std::vector<std::shared_ptr<BufferAllocatorBase>>>
```
Copilot (AI) · Aug 5, 2025
The vram_allocators_by_name_ member should be const and passed by reference like the other members for consistency with the class design pattern.
Suggested change:

```diff
-    std::unordered_map<std::string,
-                       std::vector<std::shared_ptr<BufferAllocatorBase>>>
+    const std::unordered_map<std::string,
+                             std::vector<std::shared_ptr<BufferAllocatorBase>>>&
```
```cpp
    return preferred_buffer;
}

// For now, vram is only for local use
```
Copilot (AI) · Aug 5, 2025
This early return for VRAM-only allocations lacks explanation. Add a comment explaining why VRAM allocations cannot fall back to random allocation among all allocators.
Suggested change:

```diff
 // For now, vram is only for local use
+// VRAM allocations cannot fall back to random allocation among all allocators
+// because VRAM is typically only accessible locally (e.g., on the same device),
+// and remote allocators cannot satisfy VRAM allocation requests due to
+// hardware constraints.
```
Force-pushed from a561e61 to beea491.
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
```diff
 }
 segment_ptrs_.emplace_back(ptr);
-auto mount_result = client_->MountSegment(ptr, segment_size);
+auto mount_result = client_->MountSegment(ptr, segment_size, false);
```
Don't modify the native API.
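One backward-compatible way to address this review comment is a defaulted parameter (or an overload), so existing `MountSegment(ptr, size)` call sites compile unchanged and keep their old meaning. The sketch below is a minimal stand-in, not the real `Client` class; the string return value exists only to make the behavior observable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: keep the native two-argument API intact and let the
// VRAM flag default to false, so old callers are untouched.
class ClientSketch {
   public:
    std::string MountSegment(void* /*ptr*/, size_t size, bool is_vram = false) {
        return (is_vram ? "vram:" : "dram:") + std::to_string(size);
    }
};
```

With this shape, only the new VRAM code path needs to pass the third argument explicitly.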
```cpp
tl::expected<void, ErrorCode> DistributedObjectStore::setup_vram_internal(
    size_t vram_buffer_size) {
    auto max_mr_size = globalConfig().max_mr_size;  // Max segment size
    uint64_t total_glbseg_size = vram_buffer_size;  // For logging
    uint64_t current_glbseg_size = 0;               // For logging
    // Normally, the VRAM buffer is smaller than max_mr_size.
    while (vram_buffer_size > 0) {
        size_t segment_size = std::min(vram_buffer_size, max_mr_size);
        vram_buffer_size -= segment_size;
        current_glbseg_size += segment_size;
        LOG(INFO) << "Mounting VRAM segment: " << segment_size << " bytes, "
                  << current_glbseg_size << " of " << total_glbseg_size;
        void *ptr = allocate_vram_buffer_allocator_memory(segment_size);
        if (!ptr) {
            LOG(ERROR) << "Failed to allocate vram segment memory";
            return tl::unexpected(ErrorCode::INVALID_PARAMS);
        }
        local_vram_segment_ptrs_.emplace_back(ptr);
        auto mount_result = client_->MountSegment(ptr, segment_size, true);
        if (!mount_result.has_value()) {
            LOG(ERROR) << "Failed to mount vram segment: "
                       << toString(mount_result.error());
            return tl::unexpected(mount_result.error());
        }
    }
    return {};
}

int DistributedObjectStore::setup_vram(size_t vram_buffer_size) {
    return to_py_ret(setup_vram_internal(vram_buffer_size));
}
```
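The loop in `setup_vram_internal` splits a VRAM buffer larger than `max_mr_size` into full-size chunks plus a remainder, mounting each separately. That chunking policy can be isolated into a small pure function; `SplitIntoSegments` is an illustrative name, not a function in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the chunking policy: each segment is at most max_mr_size bytes,
// and the sizes sum back to the original buffer size.
std::vector<uint64_t> SplitIntoSegments(uint64_t buffer_size,
                                        uint64_t max_mr_size) {
    std::vector<uint64_t> sizes;
    while (buffer_size > 0) {
        uint64_t segment = std::min(buffer_size, max_mr_size);
        buffer_size -= segment;
        sizes.push_back(segment);
    }
    return sizes;
}
```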
Merge these two functions together?
```cpp
// of the server that owns the segment
uintptr_t base{0};
size_t size{0};
bool is_vram{false};
```
Perhaps we should introduce an additional segment type, such as SegmentType::DRAM / SegmentType::VRAM, so that:
- Function signatures become more self-documenting.
- Future extension is straightforward; e.g., we might add an SSD-backed segment type later.
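The suggested refactor can be sketched as follows: replace the `bool is_vram` flag with an enum, so call sites read as `SegmentType::VRAM` rather than a bare `true`, and a future backend is one more enumerator instead of another bool. The `SegmentSketch` struct and `ToString` helper are hypothetical, modeled on the `Segment` fields shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical enum replacing the bool is_vram flag; values beyond DRAM and
// VRAM (e.g. SSD) could be added later without changing signatures.
enum class SegmentType : uint8_t { DRAM, VRAM };

struct SegmentSketch {
    uintptr_t base{0};
    size_t size{0};
    SegmentType type{SegmentType::DRAM};  // replaces bool is_vram{false}
};

std::string ToString(SegmentType t) {
    switch (t) {
        case SegmentType::DRAM: return "DRAM";
        case SegmentType::VRAM: return "VRAM";
    }
    return "UNKNOWN";
}
```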
Good idea! We can discuss how to refactor it.
```cpp
// Start put operation
ReplicateConfig conf = config;
conf.local_vram_only = true;
```
Here, the incoming const parameter is effectively overridden by constructing a modified copy of the config and passing that copy onward, which is semantically odd. Is there a better way to handle this?
We may delete the const qualifier.
```cpp
segment_manager_->allocators_by_name_[segment.name].push_back(allocator);
segment_manager_->client_segments_[client_id].push_back(segment.id);

if (segment.is_vram)
```
You may want to modify GetClientSegments to also return VRAM segments. This function is used in HA mode to automatically unmount segments from clients that may have crashed. Also, check the other segment-related interfaces in master_service for similar issues.
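The risk this reviewer is pointing at can be sketched as follows: if DRAM and VRAM allocators live in separate maps, any per-client lookup must merge both, or HA-mode cleanup will silently miss the VRAM segments of a crashed client. `SegmentTables` and the `int` segment IDs are invented for illustration; they do not match the PR's actual types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical pair of tables mirroring allocators_by_name_ and a separate
// VRAM table keyed by client.
struct SegmentTables {
    std::unordered_map<std::string, std::vector<int>> dram_segments;
    std::unordered_map<std::string, std::vector<int>> vram_segments;

    std::vector<int> GetClientSegments(const std::string& client_id) const {
        std::vector<int> out;
        auto add = [&out](const auto& table, const std::string& id) {
            auto it = table.find(id);
            if (it != table.end())
                out.insert(out.end(), it->second.begin(), it->second.end());
        };
        add(dram_segments, client_id);
        add(vram_segments, client_id);  // without this line, VRAM segments leak
        return out;
    }
};
```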
See issue 954.
This PR introduces VRAM as an L1 cache.