<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="docs/source/logos/UCM.png">
    <img alt="UCM" src="docs/source/logos/UCM.png" width=70%>
  </picture>
</p>

<p align="center">
| <a href="docs/source/index.md"><b>Documentation</b></a> | <a href="https://github.com/ModelEngine-Group/unified-cache-management/issues/16"><b>Roadmap</b></a> |
</p>

---

*Latest News* 🔥
- [2025/08/01] We are excited to announce the alpha release of Unified Cache Manager.

---

## Performance
The NFS connector achieves roughly a 4x speedup in TTFT (time to first token).

## Overview

### Motivation
As model sizes grow, the KV cache becomes larger and sparser, especially for long-sequence requests. To reduce GPU memory usage, offloading the full KV cache to external storage and keeping only partial or compressed KV in GPU memory has become a popular direction. This also reduces GPU computation and increases the sequence length and batch size that decoding can handle.

Sparse KV caching can be implemented in many different ways. Recent papers point out that no single method fits all scenarios and all models, so it is better to build a common framework into which different sparse algorithms can be plugged, just as KV connectors plug in for the prefix cache (PC).

### Proposed Change


All gray boxes are existing classes in vLLM 0.9.2. Green boxes are proposed additions, and light green boxes show future subclasses that can be built on this framework.

SparseKVBase is the base class for the different sparse algorithms. Just like the KV connector design, it hooks a few places in the scheduler and layer.py so that sparse algorithms can perform additional loading, dumping, and computation of sparse KV blocks.
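
Below is a minimal sketch of what a sparse-algorithm plugin built on such hooks could look like. The class and method names (`SparseKVBase`, `on_schedule`, `load_blocks`, `dump_blocks`) are illustrative assumptions, not the actual UCM API.

```python
# Illustrative sketch only: hook names and signatures are assumptions,
# not the real UCM interface.
from abc import ABC, abstractmethod
from typing import List


class SparseKVBase(ABC):
    """Base class a sparse-KV algorithm would derive from.

    The framework calls these hooks from the scheduler and the attention
    layer so the algorithm can decide which KV blocks stay on the GPU and
    which are loaded from or dumped to external storage."""

    @abstractmethod
    def on_schedule(self, request_id: str, block_ids: List[int]) -> List[int]:
        """Called by the scheduler; return the block ids to keep in GPU memory."""

    @abstractmethod
    def load_blocks(self, request_id: str, block_ids: List[int]) -> None:
        """Fetch previously offloaded blocks before the attention forward pass."""

    @abstractmethod
    def dump_blocks(self, request_id: str, block_ids: List[int]) -> None:
        """Offload cold blocks to external storage after the forward pass."""


class RecentWindowSparseKV(SparseKVBase):
    """Toy algorithm: keep only the most recent `window` blocks on the GPU."""

    def __init__(self, window: int = 4):
        self.window = window

    def on_schedule(self, request_id: str, block_ids: List[int]) -> List[int]:
        return block_ids[-self.window:]

    def load_blocks(self, request_id: str, block_ids: List[int]) -> None:
        pass  # a real algorithm would read these blocks from a KV store

    def dump_blocks(self, request_id: str, block_ids: List[int]) -> None:
        pass  # a real algorithm would write these blocks to a KV store
```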

SparseKVManager provides different KV block allocation methods for different algorithms. To keep every implementation under SparseKVBase, it calls into SparseKVBase, and the real allocation logic lives in the subclass for each sparse algorithm.
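
Continuing the sketch above, the manager could stay algorithm-agnostic and simply delegate to whichever SparseKVBase subclass is registered; again, names and signatures are assumptions rather than the real code.

```python
# Illustrative sketch only, continuing the SparseKVBase example above.
from typing import List


class SparseKVManager:
    """Routes KV block allocation to the registered sparse algorithm so the
    manager itself contains no algorithm-specific logic."""

    def __init__(self, algorithm: "SparseKVBase"):
        self.algorithm = algorithm

    def allocate_blocks(self, request_id: str, num_tokens: int,
                        block_size: int) -> List[int]:
        # Candidate logical blocks for the request; the concrete algorithm
        # decides which of them actually occupy GPU memory.
        num_blocks = -(-num_tokens // block_size)  # ceiling division
        candidate_ids = list(range(num_blocks))
        return self.algorithm.on_schedule(request_id, candidate_ids)
```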

KVStoreBase decouples the sparse algorithms from external storage. It defines how to talk to the external storage, so any sparse algorithm can work with any storage backend. The core concept is that blocks are identified by an ID plus an offset; this fits not only sparse KV but, naturally, prefix caching as well. KVStoreConnector connects it to the current KVConnectorBase_V1 to provide the prefix cache (PC) function.
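
A rough sketch of what such a store contract could look like follows; the method names and types are assumptions, not the actual KVStoreBase API.

```python
# Illustrative sketch only: block-oriented store contract keyed by
# block id plus offset. Names and signatures are assumptions.
from abc import ABC, abstractmethod
from typing import Iterable, List, Optional


class KVStoreBase(ABC):
    """Talks to external storage purely in terms of (block id, offset, bytes),
    so any sparse algorithm or the prefix cache can reuse the same backend."""

    @abstractmethod
    def lookup(self, block_ids: Iterable[str]) -> List[bool]:
        """Return, per block id, whether the store already holds that block."""

    @abstractmethod
    def load(self, block_id: str, offset: int, length: int) -> Optional[bytes]:
        """Read `length` bytes of one block starting at `offset`; None if absent."""

    @abstractmethod
    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        """Write `data` into one block starting at `offset`."""
```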

NFSStore is a sample implementation that stores blocks in the local file system, or on an NFS mount point in the multi-server case.
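
For instance, a file-backed store along those lines might map each block id to one file under a directory that is either local or an NFS mount shared by multiple servers. The sketch below only illustrates the idea; the path layout and class name are assumptions, not the real NFSStore code.

```python
# Illustrative sketch only: one file per block under a local or NFS-mounted
# directory. Layout and names are assumptions, not the real NFSStore.
import os
from typing import Iterable, List, Optional


class FileBlockStore:
    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, block_id: str) -> str:
        return os.path.join(self.root, block_id)

    def lookup(self, block_ids: Iterable[str]) -> List[bool]:
        return [os.path.exists(self._path(b)) for b in block_ids]

    def load(self, block_id: str, offset: int, length: int) -> Optional[bytes]:
        path = self._path(block_id)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        path = self._path(block_id)
        mode = "r+b" if os.path.exists(path) else "w+b"  # keep existing bytes
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)
```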

LocalCachedStore can wrap any store to provide a local DRAM read-cache layer.
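
Such a read-cache layer could be as simple as an LRU map in front of any backend store. The sketch below is an assumption about the shape of that wrapper, not the real LocalCachedStore.

```python
# Illustrative sketch only: DRAM read cache wrapped around any block store.
from collections import OrderedDict
from typing import Optional


class LocalCachedStore:
    """Serves repeated block reads from process memory (LRU), falling back to
    the wrapped backend store on a miss; writes go through to the backend."""

    def __init__(self, backend, max_entries: int = 1024):
        self.backend = backend
        self.max_entries = max_entries
        self._cache: "OrderedDict[tuple, bytes]" = OrderedDict()

    def load(self, block_id: str, offset: int, length: int) -> Optional[bytes]:
        key = (block_id, offset, length)
        if key in self._cache:
            self._cache.move_to_end(key)         # cache hit: refresh recency
            return self._cache[key]
        data = self.backend.load(block_id, offset, length)
        if data is not None:
            self._cache[key] = data
            if len(self._cache) > self.max_entries:
                self._cache.popitem(last=False)  # evict least recently used
        return data

    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        self.backend.dump(block_id, offset, data)
        self._cache.pop((block_id, offset, len(data)), None)  # invalidate
```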

---

## Quick Start
Please refer to [installation](docs/source/getting-started/installation.md) and [example](docs/source/getting-started/example/dram_conn.md).

---

## Branch Policy
Unified Cache Management has a main branch, a develop branch, and release branches.
- **main**: the most stable branch. Only release branches are merged into it, and release tags are attached to it.
- **develop**: the daily development branch; new features are merged here.
- **x.x.x-release**: each time we decide to release a new version, we check out a release branch and test on it; this branch only accepts bug fixes. Once the branch passes testing, we merge it into develop and main, create the corresponding x.x.x tag on main, and finish the release.

A commit should normally be merged into the develop branch first.

---

## Contributing
To contribute a feature to the Unified Cache community, first fork the repository (usually from the develop branch), commit your changes in your own repository, and then submit a pull request to the community.

---

## License

Apache License 2.0, as found in the [LICENSE](./LICENSE) file.