[Feature] Add mooncake store #117
…d from env config file.
what's this?
In this test we measured the performance of UCM with Mooncake and of the default setup (without UCM) separately, with prefix_caching disabled. Should we merge the two test charts, or present them some other way?
        # Mooncake only has get and put interfaces; this operation is not supported
        pass

    def shutdown(self):
Maybe a `__del__()` method instead?
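The review thread weighs an explicit `shutdown()` against `__del__()`, for a store that only exposes get and put. A minimal sketch of one way to combine them, assuming a put/get-only store: `InMemoryGetPutStore`, `GetPutConnector`, and the no-op `commit` are hypothetical names for illustration, not the UCM or Mooncake API.

```python
from typing import Dict, List, Optional


class InMemoryGetPutStore:
    """Hypothetical stand-in for a store that only offers put and get."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self.closed = False

    def put(self, key: str, value: bytes) -> int:
        self._data[key] = value
        return 0  # 0 signals success in this sketch

    def get(self, key: str) -> bytes:
        # Assumption: a missing key yields empty bytes instead of raising.
        return self._data.get(key, b"")

    def close(self) -> None:
        self.closed = True


class GetPutConnector:
    """Hypothetical connector whose dump/load/lookup use only put and get."""

    def __init__(self, store: InMemoryGetPutStore) -> None:
        self.store: Optional[InMemoryGetPutStore] = store

    def dump(self, keys: List[str], payloads: List[bytes]) -> List[int]:
        return [self.store.put(k, v) for k, v in zip(keys, payloads)]

    def load(self, keys: List[str]) -> List[bytes]:
        return [self.store.get(k) for k in keys]

    def lookup(self, keys: List[str]) -> List[bool]:
        # Without an existence query, infer presence from a non-empty get();
        # this cannot tell a stored empty payload from a missing key.
        return [len(self.store.get(k)) > 0 for k in keys]

    def commit(self, keys: List[str], is_success: bool = True) -> None:
        # No put/get equivalent exists, so this operation is a no-op.
        pass

    def shutdown(self) -> None:
        # Idempotent: safe to call explicitly and again from __del__.
        if self.store is not None:
            self.store.close()
            self.store = None

    def __del__(self) -> None:
        # Never let __del__ raise: it may run during interpreter teardown
        # or after a failed __init__ left attributes unset.
        try:
            self.shutdown()
        except Exception:
            pass
```

Keeping `shutdown()` idempotent lets callers release resources explicitly while `__del__()` serves only as a safety net; Python does not guarantee that `__del__()` is called for objects that still exist when the interpreter exits.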
…nfiguration file method, and only retain the parameter transmission method
* fix issue#26 and issue#36 (#55)
* [Doc] Add vllm institution (#61)
* [CI] Add issue and pull request template; [Fix][Doc] Fix nfs doc error. (#62) (#64)
* [CI] Add issue template
* [CI] Add pr template
* [Fix][Doc] Fix nfs doc error, close #57

Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>

* [Doc] update install doc using patch to build from source code (#68)
* [Feat] Merge 0.0.1 back into develop (#72)
* [CI] Add issue and pull request template; [Fix][Doc] Fix nfs doc error. (#62)
* [CI] Add issue template
* [CI] Add pr template
* [Fix][Doc] Fix nfs doc error, close #57
* [CI][Style] Add Github workflow for pre commit and format the codestyle (#70)
* [CI] Add github flow for pre-commit and unittest
* [Style] Fix typo and style problem in repo

---------

Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>

* [Style] Fix codestyle problems and typo in develop (#75)
* [Style] Fix codestyle problems and typo
* [Fix] Fix CI bug
* [CI] Add workflow trigger on push
* [CI] Add support pyproject.toml to enable using python -m build to compile whl package
* ucm_sparse framework v1.0 (#79)
* [Fix] Fix can't find cmake error when using pip install -e .
* Revert "ucm_sparse framework v1.0 (#79)" (#82)

This reverts commit b965dc8.

* [Feature] add Mooncake Store
* [Fix bug] fix docker build err and installation.md (#87)
* adapt deepseek (#89)
* [Feature][P/D] add example for disaggregated prefill (#90)
* [Perf] Pipelined ucmnfsstore (#97)
* pipelined ucmnfsstore
* update default stream number
* Revert "[Feature] add Mooncake Store" (#98)
* [Fix bug] fix uc_connector ut and change hash generation method
* [Fix] Fix .so build error (#104) [Fix] Fix so file import error in build and edit mode [Fix] format the code [Feat] Add device recognize function
* [Fix] Fix ascend compile error (#106)
* ESA 1.0 fix typo ESA: add vllm and vllm-ascend patch add vllm and vllm_ascend patch
* fix typo
* [fix] compatible with prefix cache
* add sparse_attn example
* add sparse_attn docs
* Modify start_load_kv (#103)
* [Fix] Fix duplicate create/commit errors upon preemption (#109)
* [refact] format
* adapt for vllm 0.9.1 (#113)

Co-authored-by: y00945504 <yuhui87@huawei.com>

* add patch
* fix: uc_connector,rm .gitkeep ucm_oceanstor.py
* rename vllm-adapt-2 to vllm-adapt-sparse
* [Fix] Fix spelling issues with PR templates (#119)
* remove load_tasks
* [bugfix] bugfix in ucmnfsstore (#123)
* trans task timeout support
* [Fix] posix file open interface bugfix
* add config parameter
* Fix rank handling in multi-node PP setup (#129)
* [Feat]Support UCM Sparse on cuda (#126)
* [Feat]Support UCM Sparse on cuda
* [DOCS]Add doc for format code.
* [Feature] Add mooncake store (#117)
* WIP (temporary save)
* [Feature] Mooncake connector support both config and file
* [Doc] Add docs for Ucm Mooncake Connector
* [Feature] Add mooncake to ucm factory
* [Doc][Fix] Modify the description of configuration to match usage.
* [Feature] [Fix] Load Mooncake config from dict, when lack params, load from env config file.
* [Doc] update the performance and modify description.
* [Test] Example config file for Mooncake test `test_mooncake_env.py`.
* [Test] [Del] Removed unnecessary tests that do not match the current functionality
* [Feat!] [Del] Adjust the mooncake configuration method, remove the configuration file method, and only retain the parameter transmission method
* [Doc] [Fix] modify the performance figure of Mooncake Store.
* [Feat] add `__del__()` to shutdown all the mooncake components

---------

Co-authored-by: z00452769 <zhangyichen@huawei.com>
Co-authored-by: propanone1006 <1035097916@qq.com>
Co-authored-by: propanone1006 <1035067916@qq.com>

* [bugfix]modify mla dump (#128)
* modify mla dump
* fix ci problem
* [BugFix] aggregate work outputs to decide dumped blocks
* [BugFix] Modify npu worker for aggregating modelrunner_outputs
* [CI] Add vllm patch for sparse in dockerfile (#134)
* [CI] Add vllm patch for sparse in dockerfile
* [Fix] Add patch in dockerfile and pip mirror.
* [Fix] Update version 0.0.2
* ESA: skip processing for short requests (#147)
* ucm_sparse: skip processing for short requests
* add comments

---------

Co-authored-by: flesher0813 <33923823+flesher0813@users.noreply.github.com>
Co-authored-by: harrisonyhq <harrisonyhq@gmail.com>
Co-authored-by: hek14 <1023129548@qq.com>
Co-authored-by: Chen Deng <120033622+propanone1006@users.noreply.github.com>
Co-authored-by: propanone1006 <1035067916@qq.com>
Co-authored-by: qyh111 <qiuyuhao1@huawei.com>
Co-authored-by: Mag1c.H <hemajun815@163.com>
Co-authored-by: t00939662 <tianxuehan@huawei.com>
Co-authored-by: Fate469434 <58885253+Fate469434@users.noreply.github.com>
Co-authored-by: y00945504 <yuhui87@huawei.com>
Co-authored-by: Zbm1996 <370478722@qq.com>
Co-authored-by: NaganooMei <290992347@qq.com>
Co-authored-by: NaganooMei <104300720+NaganooMei@users.noreply.github.com>
Co-authored-by: f00943869 <fenghao0720@outlook.com>
Co-authored-by: hufumans <113507465+hufumans@users.noreply.github.com>
Co-authored-by: z00452769 <zhangyichen@huawei.com>
Co-authored-by: propanone1006 <1035097916@qq.com>
Co-authored-by: zhou-haitao <74044944+zhou-haitao@users.noreply.github.com>
Co-authored-by: flesher0813 <1208954694@qq.com>
Co-authored-by: AooooooA-C <chenaozhu@outlook.com>
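The Mooncake commits register the connector in the UCM factory and, in their final form, drop the config-file path so that all settings arrive as passed-in parameters. A minimal sketch of that shape, assuming a dict-based config: a dataclass that rejects unknown keys and reports missing required ones (no file fallback), plus a name-to-builder registry. The field names, placeholder defaults, the `"mooncake"` key, and the `register`/`create` helpers are illustrative assumptions, not the contents of UCM's `factory.py` or Mooncake's real configuration schema.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Callable, Dict

Builder = Callable[[Dict[str, Any]], Any]


@dataclass
class MooncakeConfig:
    # Hypothetical field names and placeholder defaults, for illustration only.
    local_hostname: str
    metadata_server: str
    master_server_address: str
    protocol: str = "tcp"
    device_name: str = ""
    global_segment_size: int = 1 << 30
    local_buffer_size: int = 1 << 30

    @classmethod
    def from_dict(cls, cfg: Dict[str, Any]) -> "MooncakeConfig":
        names = {f.name for f in fields(cls)}
        required = {
            f.name
            for f in fields(cls)
            if f.default is MISSING and f.default_factory is MISSING
        }
        unknown = set(cfg) - names
        if unknown:
            raise ValueError(f"unknown Mooncake config keys: {sorted(unknown)}")
        missing = required - set(cfg)
        if missing:
            # No config-file fallback: every required value must be passed in.
            raise ValueError(f"missing Mooncake config keys: {sorted(missing)}")
        return cls(**cfg)


# Hypothetical name -> builder registry standing in for a connector factory.
_REGISTRY: Dict[str, Builder] = {}


def register(name: str) -> Callable[[Builder], Builder]:
    def decorator(builder: Builder) -> Builder:
        if name in _REGISTRY:
            raise ValueError(f"connector {name!r} already registered")
        _REGISTRY[name] = builder
        return builder

    return decorator


def create(name: str, config: Dict[str, Any]) -> Any:
    try:
        builder = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown connector {name!r}") from None
    return builder(config)


@register("mooncake")
def _build_mooncake(config: Dict[str, Any]) -> MooncakeConfig:
    # A real builder would construct the connector; this sketch returns the
    # validated config so it runs without Mooncake installed.
    return MooncakeConfig.from_dict(config)
```

Failing fast on missing keys replaces the old "load from env config file" fallback with an explicit error, so a misconfigured deployment is caught when the connector is created rather than at the first dump or load.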
Purpose
What this PR does / why we need it?
This PR integrates support for Mooncake Store as a unified cache backend in unified-cache-management. It introduces a new `UcmMooncakeStore` connector that wraps `MooncakeDistributedStore`, enabling dump/load/lookup operations for KV cache tensors in vLLM and related systems. This provides:

* `UcmKVStoreBase`

Modifications
This PR adds the following files:

* `unifiedcache/ucm_connector/ucm_mooncake.py`: Mooncake connector implementation.
* `test/test_mooncake.py`: unit tests for the dump/load/lookup logic, loading the Mooncake config from a dict.
* `docs/source/getting-started/example/mooncake_conn.md`: how to use Mooncake Store in UCM.
* `docs/source/getting-started/images/`:
  * `mooncake_performance.png`: performance with Mooncake.
  * `mooncake_default_performance.png`: default performance (without UCM), used for comparison with Mooncake.

This PR adjusts the following files:

* `docs/source/getting-started/example/index.md`: add `mooncake_conn.md`.
* `unifiedcache/ucm_connector/factory.py`: add the Mooncake config.

Test

Unit test

This patch was tested via:

* `test/test_mooncake.py`

Precision test

* `example/offine_inference.py`

End to end test

This test follows the steps in `mooncake_conn.md` to start the vLLM server.

* Model: QwQ-32B

Using Mooncake (figure):