Skip to content

Conversation

@XucSh
Copy link
Collaborator

@XucSh XucSh commented Aug 4, 2025

This PR introduce VRAM as L1 cache.

@stmatengss stmatengss requested a review from Copilot August 5, 2025 06:58
@stmatengss stmatengss added the enhancement New feature or request label Aug 5, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This work-in-progress PR introduces VRAM (Video RAM) as an L1 cache by adding support for GPU memory allocation and management alongside the existing RAM-based storage system.

Key Changes:

  • Adds VRAM allocation and deallocation functions using CUDA runtime
  • Extends the Segment and ReplicateConfig structures to support VRAM segments
  • Implements VRAM-specific put/get operations with automatic eviction

Reviewed Changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
utils.cpp/utils.h Adds CUDA-based VRAM allocation and aligned free functions
types.h Extends Segment and ReplicateConfig to include VRAM support flags
segment.h/segment.cpp Adds VRAM allocator tracking separate from regular allocators
client.h/client.cpp Implements PutToVRAM method and extends MountSegment with VRAM flag
master_service.cpp Updates allocation logic to use VRAM allocators when requested
allocation_strategy.h Adds VRAM-only allocation constraint for local segments
store_py.h/store_py.cpp Provides Python bindings for VRAM operations with eviction queue
CMakeLists.txt Adds CUDA toolkit dependency
test files Updates test calls to include new VRAM parameter (set to false)

Comment on lines 151 to 152
std::unordered_map<std::string,
std::vector<std::shared_ptr<BufferAllocatorBase>>>
Copy link

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vram_allocators_by_name_ member should be const and passed by reference like the other members for consistency with the class design pattern.

Suggested change
std::unordered_map<std::string,
std::vector<std::shared_ptr<BufferAllocatorBase>>>
const std::unordered_map<std::string,
std::vector<std::shared_ptr<BufferAllocatorBase>>>&

Copilot uses AI. Check for mistakes.
return preferred_buffer;
}

// For now, vram is only for local use
Copy link

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early return for VRAM-only allocations lacks explanation. Add a comment explaining why VRAM allocations cannot fall back to random allocation among all allocators.

Suggested change
// For now, vram is only for local use
// For now, vram is only for local use
// VRAM allocations cannot fall back to random allocation among all allocators
// because VRAM is typically only accessible locally (e.g., on the same device),
// and remote allocators cannot satisfy VRAM allocation requests due to hardware constraints.

Copilot uses AI. Check for mistakes.
@XucSh XucSh force-pushed the hbm-devel branch 6 times, most recently from a561e61 to beea491 Compare August 6, 2025 03:33
XucSh added 5 commits August 6, 2025 13:37
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
}
segment_ptrs_.emplace_back(ptr);
auto mount_result = client_->MountSegment(ptr, segment_size);
auto mount_result = client_->MountSegment(ptr, segment_size, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't modify the native API.

Comment on lines +295 to +260
tl::expected<void, ErrorCode> DistributedObjectStore::setup_vram_internal(
size_t vram_buffer_size) {
auto max_mr_size = globalConfig().max_mr_size; // Max segment size
uint64_t total_glbseg_size = vram_buffer_size; // For logging
uint64_t current_glbseg_size = 0; // For logging
// Normally, The vram buffer is smaller than max_mr_size.
while (vram_buffer_size > 0) {
size_t segment_size = std::min(vram_buffer_size, max_mr_size);
vram_buffer_size -= segment_size;
current_glbseg_size += segment_size;
LOG(INFO) << "Mounting VRAM segment: " << segment_size << " bytes, "
<< current_glbseg_size << " of " << total_glbseg_size;
void *ptr = allocate_vram_buffer_allocator_memory(segment_size);
if (!ptr) {
LOG(ERROR) << "Failed to allocate vram segment memory";
return tl::unexpected(ErrorCode::INVALID_PARAMS);
}
local_vram_segment_ptrs_.emplace_back(ptr);
auto mount_result = client_->MountSegment(ptr, segment_size, true);
if (!mount_result.has_value()) {
LOG(ERROR) << "Failed to mount vram segment: "
<< toString(mount_result.error());
return tl::unexpected(mount_result.error());
}
}
return {};
}

int DistributedObjectStore::setup_vram(size_t vram_buffer_size) {
return to_py_ret(setup_vram_internal(vram_buffer_size));
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge these two functions together?

// of the server that owns the segment
uintptr_t base{0};
size_t size{0};
bool is_vram{false};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should introduce an additional segment type? such as SegmentType::DRAM / VRAM, so that:

  • Function signatures become more self-documenting.
  • Future extension is straightforward—e.g., maybe we will add an SSD-backed segment type in the future?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! We can discuss how to refactor it.


// Start put operation
ReplicateConfig conf = config;
conf.local_vram_only = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the incoming const parameter is being modified by constructing a new Config and passing it back, which is semantically odd. Is there a better way to handle this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may delete the const expr

segment_manager_->allocators_by_name_[segment.name].push_back(allocator);
segment_manager_->client_segments_[client_id].push_back(segment.id);

if (segment.is_vram)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to modify GetClientSegments to also return vram segment. This function will be used in HA mode to auto unmount segments from clients that may have crashed. Also, you may want to check other segment-related interfaces in master_service for similar issues.

@stmatengss stmatengss mentioned this pull request Aug 12, 2025
29 tasks
@stmatengss stmatengss requested a review from SgtPepperr August 18, 2025 05:18
@XucSh
Copy link
Collaborator Author

XucSh commented Oct 24, 2025

see issue 954

@XucSh XucSh closed this Oct 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants