[WIP] Use VRAM as buffer #710
Conversation
Pull Request Overview
This work-in-progress PR introduces VRAM (Video RAM) as an L1 cache by adding support for GPU memory allocation and management alongside the existing RAM-based storage system.
Key Changes:
- Adds VRAM allocation and deallocation functions using CUDA runtime
- Extends the Segment and ReplicateConfig structures to support VRAM segments
- Implements VRAM-specific put/get operations with automatic eviction
Reviewed Changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| utils.cpp/utils.h | Adds CUDA-based VRAM allocation and aligned free functions |
| types.h | Extends Segment and ReplicateConfig to include VRAM support flags |
| segment.h/segment.cpp | Adds VRAM allocator tracking separate from regular allocators |
| client.h/client.cpp | Implements PutToVRAM method and extends MountSegment with VRAM flag |
| master_service.cpp | Updates allocation logic to use VRAM allocators when requested |
| allocation_strategy.h | Adds VRAM-only allocation constraint for local segments |
| store_py.h/store_py.cpp | Provides Python bindings for VRAM operations with eviction queue |
| CMakeLists.txt | Adds CUDA toolkit dependency |
| test files | Updates test calls to include new VRAM parameter (set to false) |
mooncake-store/include/segment.h (outdated)

```cpp
std::unordered_map<std::string,
                   std::vector<std::shared_ptr<BufferAllocatorBase>>>
```
Copilot (AI) · Aug 5, 2025
The vram_allocators_by_name_ member should be const and passed by reference like the other members for consistency with the class design pattern.
Suggested change:

```diff
-    std::unordered_map<std::string,
-                       std::vector<std::shared_ptr<BufferAllocatorBase>>>
+    const std::unordered_map<std::string,
+                             std::vector<std::shared_ptr<BufferAllocatorBase>>>&
```
```cpp
    return preferred_buffer;
}

// For now, vram is only for local use
```
Copilot (AI) · Aug 5, 2025
This early return for VRAM-only allocations lacks explanation. Add a comment explaining why VRAM allocations cannot fall back to random allocation among all allocators.
Suggested change:

```diff
 // For now, vram is only for local use
+// VRAM allocations cannot fall back to random allocation among all allocators
+// because VRAM is typically only accessible locally (e.g., on the same device),
+// and remote allocators cannot satisfy VRAM allocation requests due to
+// hardware constraints.
```
Force-pushed from a561e61 to beea491.
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
```diff
 }
 segment_ptrs_.emplace_back(ptr);
-auto mount_result = client_->MountSegment(ptr, segment_size);
+auto mount_result = client_->MountSegment(ptr, segment_size, false);
```
Don't modify the native API.
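One backward-compatible way to address this review comment is a defaulted parameter (or an overload), so existing `MountSegment(ptr, size)` call sites compile unchanged and keep their old meaning. The sketch below is a minimal stand-in, not the real `Client` class; the string return value exists only to make the behavior observable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: keep the native two-argument API intact and let the
// VRAM flag default to false, so old callers are untouched.
class ClientSketch {
   public:
    std::string MountSegment(void* /*ptr*/, size_t size, bool is_vram = false) {
        return (is_vram ? "vram:" : "dram:") + std::to_string(size);
    }
};
```

With this shape, only the new VRAM code path needs to pass the third argument explicitly.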
```cpp
tl::expected<void, ErrorCode> DistributedObjectStore::setup_vram_internal(
    size_t vram_buffer_size) {
    auto max_mr_size = globalConfig().max_mr_size;  // Max segment size
    uint64_t total_glbseg_size = vram_buffer_size;  // For logging
    uint64_t current_glbseg_size = 0;               // For logging
    // Normally, the VRAM buffer is smaller than max_mr_size.
    while (vram_buffer_size > 0) {
        size_t segment_size = std::min(vram_buffer_size, max_mr_size);
        vram_buffer_size -= segment_size;
        current_glbseg_size += segment_size;
        LOG(INFO) << "Mounting VRAM segment: " << segment_size << " bytes, "
                  << current_glbseg_size << " of " << total_glbseg_size;
        void *ptr = allocate_vram_buffer_allocator_memory(segment_size);
        if (!ptr) {
            LOG(ERROR) << "Failed to allocate vram segment memory";
            return tl::unexpected(ErrorCode::INVALID_PARAMS);
        }
        local_vram_segment_ptrs_.emplace_back(ptr);
        auto mount_result = client_->MountSegment(ptr, segment_size, true);
        if (!mount_result.has_value()) {
            LOG(ERROR) << "Failed to mount vram segment: "
                       << toString(mount_result.error());
            return tl::unexpected(mount_result.error());
        }
    }
    return {};
}

int DistributedObjectStore::setup_vram(size_t vram_buffer_size) {
    return to_py_ret(setup_vram_internal(vram_buffer_size));
}
```
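The loop in `setup_vram_internal` splits a VRAM buffer larger than `max_mr_size` into full-size chunks plus a remainder, mounting each separately. That chunking policy can be isolated into a small pure function; `SplitIntoSegments` is an illustrative name, not a function in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the chunking policy: each segment is at most max_mr_size bytes,
// and the sizes sum back to the original buffer size.
std::vector<uint64_t> SplitIntoSegments(uint64_t buffer_size,
                                        uint64_t max_mr_size) {
    std::vector<uint64_t> sizes;
    while (buffer_size > 0) {
        uint64_t segment = std::min(buffer_size, max_mr_size);
        buffer_size -= segment;
        sizes.push_back(segment);
    }
    return sizes;
}
```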
Merge these two functions together?
```cpp
// of the server that owns the segment
uintptr_t base{0};
size_t size{0};
bool is_vram{false};
```
Perhaps we should introduce an additional segment type, such as SegmentType::DRAM / SegmentType::VRAM, so that:
- Function signatures become more self-documenting.
- Future extension is straightforward; e.g., we might add an SSD-backed segment type later.
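The suggested refactor can be sketched as follows: replace the `bool is_vram` flag with an enum, so call sites read as `SegmentType::VRAM` rather than a bare `true`, and a future backend is one more enumerator instead of another bool. The `SegmentSketch` struct and `ToString` helper are hypothetical, modeled on the `Segment` fields shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical enum replacing the bool is_vram flag; values beyond DRAM and
// VRAM (e.g. SSD) could be added later without changing signatures.
enum class SegmentType : uint8_t { DRAM, VRAM };

struct SegmentSketch {
    uintptr_t base{0};
    size_t size{0};
    SegmentType type{SegmentType::DRAM};  // replaces bool is_vram{false}
};

std::string ToString(SegmentType t) {
    switch (t) {
        case SegmentType::DRAM: return "DRAM";
        case SegmentType::VRAM: return "VRAM";
    }
    return "UNKNOWN";
}
```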
Good idea! We can discuss how to refactor it.
```cpp
// Start put operation
ReplicateConfig conf = config;
conf.local_vram_only = true;
```
Here, the incoming const parameter is effectively overridden by constructing a modified copy of the config and passing that copy onward, which is semantically odd. Is there a better way to handle this?
We may delete the const qualifier.
```cpp
segment_manager_->allocators_by_name_[segment.name].push_back(allocator);
segment_manager_->client_segments_[client_id].push_back(segment.id);

if (segment.is_vram)
```
You may want to modify GetClientSegments to also return VRAM segments. This function is used in HA mode to automatically unmount segments from clients that may have crashed. Also, check the other segment-related interfaces in master_service for similar issues.
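The risk this reviewer is pointing at can be sketched as follows: if DRAM and VRAM allocators live in separate maps, any per-client lookup must merge both, or HA-mode cleanup will silently miss the VRAM segments of a crashed client. `SegmentTables` and the `int` segment IDs are invented for illustration; they do not match the PR's actual types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical pair of tables mirroring allocators_by_name_ and a separate
// VRAM table keyed by client.
struct SegmentTables {
    std::unordered_map<std::string, std::vector<int>> dram_segments;
    std::unordered_map<std::string, std::vector<int>> vram_segments;

    std::vector<int> GetClientSegments(const std::string& client_id) const {
        std::vector<int> out;
        auto add = [&out](const auto& table, const std::string& id) {
            auto it = table.find(id);
            if (it != table.end())
                out.insert(out.end(), it->second.begin(), it->second.end());
        };
        add(dram_segments, client_id);
        add(vram_segments, client_id);  // without this line, VRAM segments leak
        return out;
    }
};
```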
See issue 954.
This PR introduces VRAM as an L1 cache.