
GGML: Fix leak of backend buffer memory address in RPC #14882

Open · wants to merge 1 commit into master

Conversation

@struct (Contributor) commented Jul 26, 2025

The RPC server sends back raw memory addresses for backend buffers via RPC in fields named remote_ptr. Some backends allocate this memory in the host process, so the leaked addresses can be used to defeat Address Space Layout Randomization (ASLR), an important exploit mitigation, and make other vulnerabilities in the server easier to exploit. This change replaces the container the rpc_server class uses to store buffers with an unordered_map<uint64_t, ggml_backend_buffer_t> keyed by an opaque random ID. The server hands this ID to the client as a handle; the client sends it back in future commands, and the server uses it to look up the corresponding backend buffer.
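For illustration, here is a minimal sketch of the handle-table pattern described above. The type and member names (rpc_buffer_table, insert, find) are assumptions for this sketch, not the exact code in the PR, and a real implementation should draw IDs from a cryptographically secure source rather than std::mt19937_64:

```cpp
#include <cstdint>
#include <random>
#include <unordered_map>

// Opaque ggml type; the real typedef lives in ggml-backend.h.
struct ggml_backend_buffer;
typedef struct ggml_backend_buffer * ggml_backend_buffer_t;

// Hypothetical sketch of the handle table; names are illustrative and may
// not match the rpc_server class in ggml-rpc.cpp.
struct rpc_buffer_table {
    std::unordered_map<uint64_t, ggml_backend_buffer_t> buffers;
    std::mt19937_64 rng{std::random_device{}()}; // for brevity; use a CSPRNG in practice

    // Register a buffer and hand back a random ID. The client only ever
    // sees this ID, so no host address crosses the wire.
    uint64_t insert(ggml_backend_buffer_t buf) {
        uint64_t id;
        do { id = rng(); } while (id == 0 || buffers.count(id) != 0);
        buffers[id] = buf;
        return id;
    }

    // Resolve a client-supplied ID; unknown IDs yield nullptr so the
    // server can reject the request instead of dereferencing garbage.
    ggml_backend_buffer_t find(uint64_t id) const {
        auto it = buffers.find(id);
        return it == buffers.end() ? nullptr : it->second;
    }
};
```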

Note that there are other ASLR leaks in the server implementation. I haven't attempted to fix them in this PR.

Tested locally with llama-bench, loading Llama-3-8b via RPC with both CPU and Metal backends. I cannot test other backends at this time, but they should all work unchanged since these changes are local to the RPC server.

$ build/bin/llama-bench -rpc localhost:50052
llama-bench(85842,0x20aa59f00) malloc: nano zone abandoned due to inability to reserve vm space.
warning: asserts enabled, performance may be affected
warning: debug build, performance may be affected
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M4 Max)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M4 Max)
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS,RPC |      12 |           pp512 |      3249.54 ± 67.20 |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS,RPC |      12 |           tg128 |         89.78 ± 0.53 |

build: 990e815b (5991)

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jul 26, 2025
@CISC requested a review from rgerganov on Jul 26, 2025 07:46
@rgerganov (Collaborator)

We are also using remote addresses when serializing tensors, and this indirection will bring noticeable overhead. I prefer to keep things simple and use the real addresses.

@struct (Contributor, Author) commented Jul 27, 2025

Yes, there will certainly be a small amount of overhead versus the raw addresses, but it is practically unmeasurable relative to the overhead of the network communication overall. The unordered_map insertion and lookup are O(1) in the average case, and we do not store many of these pointer values. It is a very small tradeoff for an essential security control. Disclosing memory addresses over the protocol makes other memory-safety issues in the implementation easier to exploit, and no other IPC or RPC implementation does this. If improving the security of the server is a long-term goal, this design, or a similar one, will be needed.
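As a concrete illustration of that cost, a hypothetical command handler built on the sketch above does exactly one average-O(1) hash lookup before acting, and gains input validation for free. The handler name and message shape are assumptions; the real handlers in ggml-rpc.cpp differ, though ggml_backend_buffer_free is the real ggml API for releasing a buffer:

```cpp
// Declared in ggml-backend.h; repeated here so the sketch is self-contained.
extern "C" void ggml_backend_buffer_free(ggml_backend_buffer_t buffer);

// Hypothetical free-buffer handler using the rpc_buffer_table sketch above.
// The only work added versus a raw pointer is the single hash lookup, which
// is negligible next to a network round trip. As a bonus, an unknown ID is
// now rejected instead of being dereferenced.
bool handle_free_buffer(rpc_buffer_table & table, uint64_t remote_id) {
    auto it = table.buffers.find(remote_id);
    if (it == table.buffers.end()) {
        return false; // malformed or malicious request
    }
    ggml_backend_buffer_free(it->second);
    table.buffers.erase(it);
    return true;
}
```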
