@hufumans hufumans commented Aug 26, 2025

Purpose

What this PR does / why we need it?

This PR integrates support for Mooncake Store as a unified cache backend in unified-cache-management.

It introduces a new UcmMooncakeStore connector that wraps MooncakeDistributedStore, enabling seamless dump/load/lookup operations for KV cache tensors in vLLM and related systems.

This provides:

  • Improved extensibility for distributed cache offloading
  • Full async event loop support and task scheduling
  • Compatibility with safetensors-based serialization
  • A consistent interface aligned with UcmKVStoreBase
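The shape of such a connector can be sketched roughly as follows. This is a minimal illustration, not the code in ucm_mooncake.py: `InMemoryStore` and `SketchConnector` are hypothetical names, the store is a plain dict standing in for a put/get backend, and the real UcmKVStoreBase signatures may differ. It only shows how lookup/dump/load can be layered over put/get, with work scheduled on a background event loop.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a put/get key-value backend (not the Mooncake API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class SketchConnector:
    """Illustrative connector exposing lookup/dump/load over a put/get store."""

    def __init__(self, store: InMemoryStore) -> None:
        self._store = store
        # A private event loop on a daemon thread lets callers submit work
        # without owning an event loop themselves.
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def lookup(self, block_ids: List[str]) -> List[bool]:
        # One boolean per block: True if the block is already stored.
        return [self._store.get(block_id) is not None for block_id in block_ids]

    def dump(self, block_id: str, payload: bytes) -> Future:
        async def _put() -> None:
            self._store.put(block_id, payload)

        # Returns a future the caller can wait on, acting as the task handle.
        return asyncio.run_coroutine_threadsafe(_put(), self._loop)

    def load(self, block_id: str) -> Future:
        async def _get() -> Optional[bytes]:
            return self._store.get(block_id)

        return asyncio.run_coroutine_threadsafe(_get(), self._loop)

    def shutdown(self) -> None:
        # Stop the loop from its own thread, then release it.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

A caller would dump a payload, wait on the returned future, and later use `lookup` to decide whether a `load` is worthwhile. In a real connector the payload would be a serialized KV cache tensor rather than raw bytes.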

Modifications

This PR adds the following files:

  • unifiedcache/ucm_connector/ucm_mooncake.py: Mooncake connector implementation.
  • test/test_mooncake.py: Unit tests for dump/load/lookup logic. Load Mooncake config from dict.
  • docs/source/getting-started/example/mooncake_conn.md: How to use Mooncake Store in UCM.
  • docs/source/getting-started/images/:
    • mooncake_performance.png: performance with Mooncake.
    • mooncake_default_performance.png: default performance (without Mooncake), used for comparison.

This PR adjusts the following files:

  • docs/source/getting-started/example/index.md: add mooncake_conn.md.
  • unifiedcache/ucm_connector/factory.py: add mooncake config.

Test

Unit test

This patch was tested via:

  • ✅ Unit tests in test/test_mooncake.py:
    • test_lookup_not_found
    • test_lookup_found
    • test_dump_once
    • test_dump_repeated
    • test_load_existing_data
    • test_load_non_existent_data
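For readers who want a feel for what tests with these names might check, here is a small sketch using `unittest`. It does not reproduce test/test_mooncake.py: `StubStore` is a hypothetical dict-backed stand-in, and the expected behaviour of a repeated dump (the latest payload stays readable) is an assumption rather than something the PR states.

```python
import unittest
from typing import Dict, Optional


class StubStore:
    """Hypothetical dict-backed store; not the Mooncake API."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.put_calls = 0

    def put(self, key: str, value: bytes) -> None:
        self.put_calls += 1
        self.data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)


class StoreBehaviourTest(unittest.TestCase):
    def setUp(self) -> None:
        # Each test starts from an empty store.
        self.store = StubStore()

    def test_lookup_not_found(self) -> None:
        self.assertIsNone(self.store.get("missing"))

    def test_lookup_found(self) -> None:
        self.store.put("blk", b"x")
        self.assertIsNotNone(self.store.get("blk"))

    def test_dump_repeated(self) -> None:
        # Assumption: dumping the same key twice leaves the latest payload readable.
        self.store.put("blk", b"v1")
        self.store.put("blk", b"v2")
        self.assertEqual(self.store.get("blk"), b"v2")
        self.assertEqual(self.store.put_calls, 2)

    def test_load_non_existent_data(self) -> None:
        self.assertIsNone(self.store.get("nope"))
```

The class can be run with `python -m unittest` from the file that contains it.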

Precision test

  • ✅ Precision test in example/offline_inference.py

End to end test

This test follows the steps in mooncake_conn.md to start the vLLM server.

  • model: QWQ32B
| tokens | mooncake-first | mooncake-second | default |
|---|---|---|---|
| 2k | 1.9231491860002279 | 0.8265988459810615 | 0.5419427898712457 |
| 4k | 3.9460434830747544 | 1.5273493870627135 | 0.991630249004811 |
| 8k | 7.577957597002387 | 2.7632693520281464 | 2.0716467570047827 |
| 16k | 16.823639799049126 | 5.515289016952738 | 4.742832682048902 |
| 32k | 81.98759594326839 | 14.217441103421152 | 12.310140203218907 |
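To make the columns easier to compare, the ratios between them can be computed directly from the figures above. The snippet below is only arithmetic on the reported numbers. The unit is not stated in the PR (seconds is an assumption, suggested by the TTFT figure name), and the reading of "mooncake-first" and "mooncake-second" as a first and a repeated request is likewise an assumption.

```python
# Timings copied from the table above as (mooncake-first, mooncake-second, default).
# The unit is assumed to be seconds; the PR does not state it.
timings = {
    "2k": (1.9231491860002279, 0.8265988459810615, 0.5419427898712457),
    "4k": (3.9460434830747544, 1.5273493870627135, 0.991630249004811),
    "8k": (7.577957597002387, 2.7632693520281464, 2.0716467570047827),
    "16k": (16.823639799049126, 5.515289016952738, 4.742832682048902),
    "32k": (81.98759594326839, 14.217441103421152, 12.310140203218907),
}


def ratios(first: float, second: float, default: float) -> dict:
    """Return how the three columns of one row relate to each other."""
    return {
        "first_over_second": first / second,
        "second_over_default": second / default,
    }


for size, row in timings.items():
    r = ratios(*row)
    print(
        f"{size:>3}: first/second = {r['first_over_second']:.2f}, "
        f"second/default = {r['second_over_default']:.2f}"
    )
```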

[Figure: results with Mooncake enabled (image: docqa_TTFT_QwQ-32B_MoonCake_connector_MoonCacke1)]

@propanone1006 propanone1006 changed the title Add mooncake store [Feature] Add mooncake store Aug 26, 2025
@propanone1006 propanone1006 requested a review from ygwpz August 26, 2025 08:16
@propanone1006 propanone1006 requested review from flesher0813 and ygwpz and removed request for flesher0813 and ygwpz August 27, 2025 02:15
what's this?

In this test we measured performance with UCM Mooncake and with the default setup (without UCM) separately, with prefix_caching disabled. Should we merge the two test charts, or present them some other way?

# Mooncake only has get and put interfaces, this operation is not supported
pass

def shutdown(self):
Maybe this should be a __del__() method?
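One common way to combine both ideas is to keep an explicit, idempotent `shutdown()` and have `__del__()` call it as a fallback. The sketch below uses hypothetical classes (`ManagedResource`, `ConnectorWithCleanup`) and is not the connector's actual code. Note that Python does not guarantee `__del__()` runs at interpreter exit, so an explicit `shutdown()` call remains the more reliable path.

```python
class ManagedResource:
    """Hypothetical component that should be released exactly once."""

    def __init__(self) -> None:
        self.closed = False
        self.close_calls = 0

    def close(self) -> None:
        self.closed = True
        self.close_calls += 1


class ConnectorWithCleanup:
    """Illustrative owner of a resource with explicit and fallback cleanup."""

    def __init__(self, resource: ManagedResource) -> None:
        self._resource = resource
        self._shut_down = False

    def shutdown(self) -> None:
        # Idempotent: a second call is a no-op, so shutdown() and __del__()
        # can both run without releasing the resource twice.
        if self._shut_down:
            return
        self._shut_down = True
        self._resource.close()

    def __del__(self) -> None:
        # Best-effort fallback. Errors are swallowed because raising from a
        # finalizer only produces a warning and cannot be handled by callers.
        try:
            self.shutdown()
        except Exception:
            pass
```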

@qyh111 qyh111 merged commit ccbee78 into develop Aug 30, 2025
6 checks passed
@ygwpz ygwpz deleted the develop_mooncake branch September 1, 2025 08:19
ygwpz added a commit that referenced this pull request Sep 4, 2025
* fix issue#26 and issue#36 (#55)

* [Doc] Add vllm institution (#61)

* [CI] Add issue and pull request template; [Fix][Doc] Fix nfs doc error. (#62) (#64)

* [CI] Add issue template

* [CI] Add pr template

* [Fix][Doc] Fix nfs doc error, close #57

Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>

* [Doc] update install doc using patch to build from source code (#68)

* [Feat] Merge 0.0.1 back into develop (#72)

* [CI] Add issue and pull request template; [Fix][Doc] Fix nfs doc error. (#62)

* [CI] Add issue template

* [CI] Add pr template

* [Fix][Doc] Fix nfs doc error, close #57

* [CI][Style] Add Github workflow for pre commit and format the codestyle (#70)

* [CI] Add github flow for pre-commit and unittest

* [Style] Fix typo and sytle problem in repo

---------

Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>

* [Style] Fix codestyle problems and typo in develop (#75)

* [Style] Fix codestyle problems and typo

* [Fix] Fix CI bug

* [CI] Add workflow trigger on push

* [CI] Add support pyproject.toml to enable using python -m build to compile whl package

* ucm_sparse framework v1.0 (#79)

* [Fix] Fix cant find cmake error when using pip install -e .

* Revert "ucm_sparse framework v1.0 (#79)" (#82)

This reverts commit b965dc8.

* [Feature] add Mooncake Store

* [Fix bug] fix docker build err and installation.md (#87)

* adapt deepseek (#89)

* [Feature][P/D] add example for disaggregated prefill (#90)

* [Perf] Pipelined ucmnfsstore (#97)

* pipelined ucmnfsstore

* update default stream number

* Revert "[Feature] add Mooncake Store" (#98)

* [Fix bug] fix uc_connector ut and change hash generation method

* [Fix] Fix .so build error (#104)

[Fix] Fix so file import error in build and edit mode

[Fix] format the code

[Feat] Add device recognize function

* [Fix] Fix ascend compile error (#106)

* ESA 1.0

fix typo

ESA: add vllm and vllm-ascend patch

add vllm and vllm_ascend patch

* fix typo

* [fix] compatible with prefix cache

* add sparse_attn example

* add sparse_attn docs

* Modify start_load_kv (#103)

* [Fix] Fix duplicate create/commit errors upon preemption (#109)

* [refact] format

* adapt for vllm 0.9.1 (#113)

Co-authored-by: y00945504 <yuhui87@huawei.com>

* add patch

* fix: uc_connector,rm .gitkeep ucm_oceanstor.py

* rename vllm-adapt-2 to vllm-adapt-sparse

* [Fix] Fix spelling issues with PR templates (#119)

* remove load_tasks

* [bugfix] bugfix in ucmnfsstore (#123)

* trans task timeout support

* [Fix] posix file open interface bugfix

* add config parameter

* Fix rank handling in multi-node PP setup (#129)

* [Feat]Support UCM Sparse on cuda (#126)

* [Feat]Support UCM Sparse on cuda

* [DOCS]Add doc for format code.

* [Feature] Add mooncake store (#117)

* Temporary save (work in progress)

* [Feature] Mooncake connector supports both config and file

* [Doc] Add docs for Ucm Mooncake Connector

* [Feature] Add mooncake to ucm factory

* [Doc][Fix] Modify the description of configuration to match usage.

* [Feature] [Fix] Load Mooncake config from dict, when lack params, load from env config file.

* [Doc] update the performance and modify description.

* [Test] Example config file for Mooncake test `test_mooncake_env.py`.

* [Test] [Del] Removed unnecessary tests that do not match the current functionality

* [Feat!] [Del] Adjust the mooncake configuration method, remove the configuration file method, and only retain the parameter transmission method

* [Doc] [Fix] modify the performance figure of Mooncake Store.

* [Feat] add __del__() to shutdown all the mooncake components

---------

Co-authored-by: z00452769 <zhangyichen@huawei.com>
Co-authored-by: propanone1006 <1035097916@qq.com>
Co-authored-by: propanone1006 <1035067916@qq.com>

* [bugfix]modify mla dump (#128)

* modify mla dump

* fix ci problem

* [BugFix] aggregate work ouputs to decide dumped blocks

* [BugFix] Modify npu worker for aggregating modelrunner_outputs

* [CI] Add vllm patch for sparse in dockerfile (#134)

* [CI] Add vllm patch for sparse in dockerfile

* [Fix] Add patch in dockerfile and pip mirror.

* [Fix] Update version 0.0.2

* ESA: skip processing for short requests (#147)

* ucm_sparse: skip processing for  short requests

* add comments

---------

Co-authored-by: flesher0813 <33923823+flesher0813@users.noreply.github.com>
Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>
Co-authored-by: hek14 <1023129548@qq.com>
Co-authored-by: Chen Deng <120033622+propanone1006@users.noreply.github.com>
Co-authored-by: propanone1006 <1035067916@qq.com>
Co-authored-by: qyh111 <qiuyuhao1@huawei.com>
Co-authored-by: Mag1c.H <hemajun815@163.com>
Co-authored-by: t00939662 <tianxuehan@huawei.com>
Co-authored-by: Fate469434 <58885253+Fate469434@users.noreply.github.com>
Co-authored-by: y00945504 <yuhui87@huawei.com>
Co-authored-by: Zbm1996 <370478722@qq.com>
Co-authored-by: NaganooMei <290992347@qq.com>
Co-authored-by: NaganooMei <104300720+NaganooMei@users.noreply.github.com>
Co-authored-by: f00943869 <fenghao0720@outlook.com>
Co-authored-by: hufumans <113507465+hufumans@users.noreply.github.com>
Co-authored-by: z00452769 <zhangyichen@huawei.com>
Co-authored-by: propanone1006 <1035097916@qq.com>
Co-authored-by: zhou-haitao <74044944+zhou-haitao@users.noreply.github.com>
Co-authored-by: flesher0813 <1208954694@qq.com>
Co-authored-by: AooooooA-C <chenaozhu@outlook.com>
5 participants