
GGML: Fix leak of backend buffer memory address in RPC #14882

Open · wants to merge 1 commit into master

Conversation

@struct (Contributor) commented Jul 26, 2025

The RPC server sends back raw memory addresses for backend buffers via RPC in fields named remote_ptr. Some backends allocate this memory in the host process, so the leaked addresses can be used to defeat Address Space Layout Randomization (ASLR), an important exploit mitigation, and make other vulnerabilities in the server easier to exploit. This change replaces the container the rpc_server class uses to store buffers with an unordered_map<uint64_t, ggml_backend_buffer_t> keyed by an opaque random ID. The server hands this ID to the client as a handle; the client sends it back in future commands, and the server uses it to look up the corresponding backend buffer.
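For illustration, here is a minimal sketch of the handle-table pattern described above. The type and member names (rpc_buffer_table, insert, find) are assumptions for this sketch, not the exact code in the PR, and a real implementation should draw IDs from a cryptographically secure source rather than std::mt19937_64:

```cpp
#include <cstdint>
#include <random>
#include <unordered_map>

// Opaque ggml type; the real typedef lives in ggml-backend.h.
struct ggml_backend_buffer;
typedef struct ggml_backend_buffer * ggml_backend_buffer_t;

// Hypothetical sketch of the handle table; names are illustrative and may
// not match the rpc_server class in ggml-rpc.cpp.
struct rpc_buffer_table {
    std::unordered_map<uint64_t, ggml_backend_buffer_t> buffers;
    std::mt19937_64 rng{std::random_device{}()}; // for brevity; use a CSPRNG in practice

    // Register a buffer and hand back a random ID. The client only ever
    // sees this ID, so no host address crosses the wire.
    uint64_t insert(ggml_backend_buffer_t buf) {
        uint64_t id;
        do { id = rng(); } while (id == 0 || buffers.count(id) != 0);
        buffers[id] = buf;
        return id;
    }

    // Resolve a client-supplied ID; unknown IDs yield nullptr so the
    // server can reject the request instead of dereferencing garbage.
    ggml_backend_buffer_t find(uint64_t id) const {
        auto it = buffers.find(id);
        return it == buffers.end() ? nullptr : it->second;
    }
};
```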

Note that there are other ASLR leaks in the server implementation. I haven't attempted to fix them in this PR.

Tested locally with llama-bench, loading Llama-3-8b via RPC with both CPU and Metal backends. I cannot test other backends at this time, but they should all work unchanged since these changes are local to the RPC server.

$ build/bin/llama-bench -rpc localhost:50052
llama-bench(85842,0x20aa59f00) malloc: nano zone abandoned due to inability to reserve vm space.
warning: asserts enabled, performance may be affected
warning: debug build, performance may be affected
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M4 Max)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M4 Max)
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS,RPC |      12 |           pp512 |      3249.54 ± 67.20 |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS,RPC |      12 |           tg128 |         89.78 ± 0.53 |

build: 990e815b (5991)

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jul 26, 2025
@CISC requested a review from rgerganov on Jul 26, 2025 07:46
@rgerganov (Collaborator)

We are also using remote addresses when serializing tensors, and this indirection will bring noticeable overhead. I prefer to keep things simple and use the real addresses.

@struct (Contributor, Author) commented Jul 27, 2025

Yes, there will certainly be a small amount of overhead versus the raw addresses, but it is practically unmeasurable relative to the overhead of the network communication overall. The unordered_map insertion and lookup are O(1) in the average case, and we do not store many of these pointer values. It is a very small tradeoff for an essential security control. Disclosing memory addresses over the protocol makes other memory-safety issues in the implementation easier to exploit, and no other IPC or RPC implementation does this. If improving the security of the server is a long-term goal, this design, or a similar one, will be needed.
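As a concrete illustration of that cost, a hypothetical command handler built on the sketch above does exactly one average-O(1) hash lookup before acting, and gains input validation for free. The handler name and message shape are assumptions; the real handlers in ggml-rpc.cpp differ, though ggml_backend_buffer_free is the real ggml API for releasing a buffer:

```cpp
// Declared in ggml-backend.h; repeated here so the sketch is self-contained.
extern "C" void ggml_backend_buffer_free(ggml_backend_buffer_t buffer);

// Hypothetical free-buffer handler using the rpc_buffer_table sketch above.
// The only work added versus a raw pointer is the single hash lookup, which
// is negligible next to a network round trip. As a bonus, an unknown ID is
// now rejected instead of being dereferenced.
bool handle_free_buffer(rpc_buffer_table & table, uint64_t remote_id) {
    auto it = table.buffers.find(remote_id);
    if (it == table.buffers.end()) {
        return false; // malformed or malicious request
    }
    ggml_backend_buffer_free(it->second);
    table.buffers.erase(it);
    return true;
}
```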
