
Conversation


@ngxson ngxson commented Nov 12, 2025

Fix #16488

How it works:

```mermaid
sequenceDiagram
    participant User
    participant server_http_context
    participant server_http_req
    participant handler
    participant server_http_res

    User->>server_http_context: request
    server_http_context->>server_http_req: create request
    server_http_req->>handler: dispatch request
    handler->>server_http_res: create response

    loop for each result
        server_http_res->>server_http_context: response chunk
        server_http_context->>User: response chunk
        server_http_context->>server_http_res: next()
    end

    server_http_res->>server_http_context: terminate
    server_http_context->>User: close connection
```
  • Each endpoint handler returns a server_res_generator, which is a class derived from server_http_res
  • The server_res_generator operates in one of two modes: stream or non-stream
    • In non-stream mode, we simply return the data back to the user
    • In stream mode, we call server_res_generator::next() until it returns false; each call to next() yields a new chunk of data (see the sketch after this list)
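
A minimal C++ sketch of how these pieces could fit together. This is hypothetical, not the code in this PR: only server_http_res, server_res_generator, next(), and the stream/non-stream split come from the description above; every other name and member is invented for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch -- not the actual definitions from this PR.
struct server_http_res {
    bool        stream = false;   // stream vs. non-stream mode
    std::string data;             // current chunk (stream) or whole body (non-stream)
    virtual ~server_http_res() = default;
    // advance to the next chunk; returns false once the response is finished
    virtual bool next() = 0;
};

// each endpoint handler returns a server_res_generator (derived from server_http_res)
struct server_res_generator : server_http_res {
    std::vector<std::string> chunks;  // toy stand-in for the per-task results
    size_t                   pos = 0;
    bool next() override {
        if (pos >= chunks.size()) {
            return false;             // nothing left to send -> terminate
        }
        data = chunks[pos++];         // expose the next chunk to the HTTP layer
        return true;
    }
};

// how the HTTP context side could drive the response
void send_response(server_http_res & res) {
    if (!res.stream) {
        // non-stream mode: simply return the data back to the user
        // write_body(res.data);
        return;
    }
    // stream mode: call next() until it returns false, forwarding one chunk per call
    while (res.next()) {
        // write_chunk(res.data);
    }
    // terminate: close the connection
}
```

The point of this shape is that the transport layer only ever sees server_http_res, so the HTTP code does not need to know anything about the inference tasks that produce the chunks.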

TODO:

  • fix error handling
  • add exception handler at server_routes level

Testing:

  • passed the automated tests.sh
  • tested normal usage with the web UI (including multimodal input)
  • tested the web UI with concurrent requests and random interruptions

@ngxson ngxson force-pushed the xsn/split_http_server_context branch from 0594df9 to 45b2fe1 on November 12, 2025 at 17:53
@ngxson ngxson marked this pull request as ready for review November 13, 2025 10:21
@ngxson ngxson requested a review from ggerganov as a code owner November 13, 2025 10:21
ngxson commented Nov 13, 2025

No rush on reviewing this; I would appreciate it if you could do some testing on your side @ggerganov

In the next PR, I'll try to break server.cpp into smaller pieces; the rough plan is:

  • server-context.cpp
  • server-queue.cpp
  • server-task.cpp (containing the task, response, and queue)
  • server-common.cpp (everything else)

While working on this, I'm also thinking about re-using the server code in llama-cli (I made a demo here). The main benefit would be bringing the same web UI experience to the CLI, including multimodal support, conversation control (delete/regenerate messages), tool calls, etc. The old CLI could be moved to llama-completion, with its chat support removed. What do you think about this idea?

