Commit a7538ac

Merge pull request #48 from flesher0813/0.0.1-tag
[Feat] Merge 0.0.1 back into main
2 parents f0f9455 + b17e77f commit a7538ac

100 files changed: +6038 -1 lines changed

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# Development Environment
.vscode/**
.idea/**
.git/**
**/build/**
**/output/**
.venv/**

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

README.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/source/logos/UCM.png">
<img alt="UCM" src="docs/source/logos/UCM.png" width=70%>
</picture>
</p>

<p align="center">
| <a href="docs/source/index.md"><b>Documentation</b></a> | <a href="https://github.com/ModelEngine-Group/unified-cache-management/issues/16"><b>Roadmap</b></a> |
</p>

---

*Latest News* 🔥
- [2025/08/01] We are excited to announce the alpha release of Unified Cache Manager.

---

## Performance
The NFS connector achieves roughly a 4x speedup in time to first token (TTFT).

![perf](docs/source/images/nfs_performance.png)

## Overview

### Motivation
As model sizes grow, the KV cache becomes larger and sparser, especially for long-sequence requests. To reduce GPU memory usage, a popular direction is to offload the full KV cache to external storage and keep only a partial or compressed KV cache in GPU memory. This also reduces GPU computation and allows longer sequences and larger batch sizes during decoding.

There are many approaches to sparse KV caching. Recent work points out that no single approach fits all scenarios and all models. It is therefore better to build a common framework into which different sparse algorithms can be plugged, just as the KV connector does for prefix caching (PC).

### Proposed Change
![idea](docs/source/images/idea.png)

All gray boxes are existing classes in vLLM 0.9.2. Green boxes are proposed additions. Light green boxes show future subclasses based on this framework.

SparseKVBase is the base class for the different algorithms. Like the KV connector design, it hooks into a few places in the scheduler and layer.py so that sparse algorithms can perform additional loading, dumping, and computation of sparse KV blocks.

SparseKVManager provides different KV block allocation methods for different algorithms. To keep all implementations under SparseKVBase, it calls SparseKVBase, and the actual implementation lives in the subclasses for the individual sparse algorithms.

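
The plug-in relationship between SparseKVManager and SparseKVBase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names come from the text above, but the method `select_blocks`, the helper `resident_blocks`, and the toy `KeepRecentBlocks` algorithm are hypothetical and are not taken from the UCM code base. The sketch shows the manager delegating every decision to whichever subclass is plugged in.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SparseKVBase(ABC):
    """Hypothetical plug-in interface; the method name is illustrative only."""

    @abstractmethod
    def select_blocks(self, block_ids: list[int], budget: int) -> list[int]:
        """Return the block IDs that should stay resident in GPU memory."""


class KeepRecentBlocks(SparseKVBase):
    """Toy algorithm (an assumption): keep only the newest `budget` blocks."""

    def select_blocks(self, block_ids: list[int], budget: int) -> list[int]:
        return block_ids[-budget:] if budget > 0 else []


class SparseKVManager:
    """Delegates block decisions to the plugged-in algorithm."""

    def __init__(self, algorithm: SparseKVBase) -> None:
        self.algorithm = algorithm

    def resident_blocks(self, block_ids: list[int], budget: int) -> list[int]:
        # The manager holds no policy of its own; the subclass decides.
        return self.algorithm.select_blocks(block_ids, budget)


manager = SparseKVManager(KeepRecentBlocks())
print(manager.resident_blocks([10, 11, 12, 13], budget=2))  # prints [12, 13]
```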
KVStoreBase decouples sparse algorithms from external storage. It defines the methods for talking to external storage, so any sparse algorithm can work with any external storage. The core concept is a block identified by an ID and accessed at an offset. This suits not only sparse KV but also prefix caching. KVStoreConnector connects it to the existing KVConnectorBase_V1 to provide the prefix cache (PC) function.

NFSStore is a sample implementation that stores blocks in the local file system or, in multi-server deployments, on an NFS mount point.

LocalCachedStore can reference any store to provide a local DRAM read cache layer.

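
The storage layering described above (blocks addressed by ID and offset, with a read cache wrapping any backend) might look roughly like the following. Treat it as a sketch, not the project's API: the `dump`/`load` signatures, the in-memory `DictStore` standing in for a backend such as NFSStore, and the `hits` counter are all assumptions made for illustration. Only the names KVStoreBase and LocalCachedStore come from the text.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class KVStoreBase(ABC):
    """Hypothetical storage interface: blocks addressed by ID plus offset."""

    @abstractmethod
    def dump(self, block_id: str, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def load(self, block_id: str, offset: int, length: int) -> bytes: ...


class DictStore(KVStoreBase):
    """In-memory stand-in (an assumption) for an external backend."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytearray] = {}

    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        block = self._blocks.setdefault(block_id, bytearray())
        end = offset + len(data)
        if len(block) < end:
            block.extend(b"\x00" * (end - len(block)))  # zero-pad up to `end`
        block[offset:end] = data

    def load(self, block_id: str, offset: int, length: int) -> bytes:
        return bytes(self._blocks[block_id][offset:offset + length])


class LocalCachedStore(KVStoreBase):
    """Read cache that can wrap any KVStoreBase implementation."""

    def __init__(self, backend: KVStoreBase) -> None:
        self.backend = backend
        self._cache: dict[tuple[str, int, int], bytes] = {}
        self.hits = 0

    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        # Drop cached reads of this block so later loads see the new data.
        for key in [k for k in self._cache if k[0] == block_id]:
            del self._cache[key]
        self.backend.dump(block_id, offset, data)

    def load(self, block_id: str, offset: int, length: int) -> bytes:
        key = (block_id, offset, length)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        data = self.backend.load(block_id, offset, length)
        self._cache[key] = data
        return data


store = LocalCachedStore(DictStore())
store.dump("block-0", 4, b"kv")
first = store.load("block-0", 4, 2)   # fetched from the backend
second = store.load("block-0", 4, 2)  # served from the local cache
```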
---

## Quick Start
Please refer to the [installation](docs/source/getting-started/installation.md) guide and the [example](docs/source/getting-started/example/dram_conn.md).

---

## Branch Policy
Unified Cache has a main branch, a develop branch, and release branches.
- **main**: main is the most stable branch. Only release branches can be merged into it, and release tags are created on it.
- **develop**: develop is the daily development branch; new features are merged here.
- **x.x.x-release**: each time we decide to release a new version, we check out a release branch and test on it. This branch accepts only [bugfix] changes. Once the branch passes testing, we merge it into develop and main, create the corresponding x.x.x tag on main, and finish the release.

Usually, a commit should first be merged into the develop branch only.

---

## Contributing
To contribute a feature to the Unified Cache community, first fork the repository (usually working from the develop branch), commit your changes in your fork, and then submit a pull request to the community.

---

## License

MIT License, as found in the [LICENSE](./LICENSE) file.

docker/Dockerfile

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# Change the base image if needed
FROM vllm/vllm-openai:v0.9.2

WORKDIR /workspace

# Reinstall vLLM from source so it can be edited
RUN pip uninstall -y vllm && rm -rf /vllm-workspace/*
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_TAG=v0.9.2
RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /vllm-workspace/vllm

# Change VLLM_TARGET_DEVICE or the extra index if needed
ENV VLLM_USE_PRECOMPILED=1
RUN VLLM_TARGET_DEVICE=cuda pip install -v -e /vllm-workspace/vllm --extra-index=https://download.pytorch.org/whl/nightly/cu128

# Install unified-cache-management
COPY . /vllm-workspace/unified-cache-management

RUN export PLATFORM="cuda" && \
    pip install -v -e /vllm-workspace/unified-cache-management

# Apply patch for vLLM
RUN cd /vllm-workspace/vllm \
    && git apply /vllm-workspace/unified-cache-management/unifiedcache/patch/vllm-adapt.patch

ENTRYPOINT ["/bin/bash"]

docker/Dockerfile-NPU

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Change the base image if needed
FROM quay.io/ascend/vllm-ascend:v0.9.2rc1

WORKDIR /workspace

# Install unified-cache-management
COPY . /vllm-workspace/unified-cache-management

RUN export PLATFORM="ascend" && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
    pip install -v -e /vllm-workspace/unified-cache-management

# Apply patch for vLLM
RUN cd /vllm-workspace/vllm \
    && git apply /vllm-workspace/unified-cache-management/unifiedcache/patch/vllm-adapt.patch

# Apply patch for vLLM-Ascend
RUN cd /vllm-workspace/vllm-ascend \
    && git apply /vllm-workspace/unified-cache-management/unifiedcache/patch/vllm-ascend-adapt.patch


CMD ["/bin/bash"]

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Unified Cache Manager Documentation

Live doc: Coming soon

## Build the docs

```bash
# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html


# Serve the built docs locally so you can open them in a browser.
python -m http.server -d build/html/
```

Launch your browser and open:
- English version: http://localhost:8000

docs/make.bat

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd

docs/requirements-docs.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
sphinx
sphinx-argparse
sphinx-book-theme
sphinx-copybutton
sphinx-design
sphinx-togglebutton
myst-parser
msgspec
sphinx-substitution-extensions
sphinx-intl

docs/source/about.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# About Us
