A lightweight local LLM chat with a web UI and a C‑based server that runs any LLM chat executable as a child process and communicates with it over pipes.
- General Information
- Technologies Used
- Features
- Screenshots
- Setup
- Usage
- Project Status
- Room for Improvement
- Acknowledgements
- Contact
- License
## General Information

LLMux makes running a local LLM chat easier by providing a Tailwind‑powered web UI plus a minimal C server that simply spawns any compatible chat executable and talks to it over UNIX pipes (see the sketch below for the basic pattern). Everything runs on your machine, with no third‑party services, so you retain full privacy and control. LLMux is a good fit for:
- Privacy‑conscious users who want a self‑hosted, browser‑based chat interface.
- Developers who need to prototype a chat front‑end around a custom model without writing HTTP or JavaScript plumbing from scratch.
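
The spawn‑and‑pipe pattern described above is plain POSIX plumbing: create two pipes, fork, wire the child's stdin/stdout to them, then exec the chat executable. The sketch below is a generic illustration of that pattern, not the actual `server.c` code; the executable name `./llm_chat`, the prompt and the buffer size are placeholders.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int to_child[2];   /* parent writes prompts to to_child[1]    */
    int from_child[2]; /* parent reads replies from from_child[0] */

    if (pipe(to_child) == -1 || pipe(from_child) == -1)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                         /* child: become the chat executable */
        dup2(to_child[0], STDIN_FILENO);    /* prompts arrive on stdin           */
        dup2(from_child[1], STDOUT_FILENO); /* replies leave via stdout          */
        close(to_child[1]);
        close(from_child[0]);
        execlp("./llm_chat", "llm_chat", (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* parent: close the unused pipe ends, then exchange one message */
    close(to_child[0]);
    close(from_child[1]);

    const char *prompt = "Hello, model!\n";
    write(to_child[1], prompt, strlen(prompt));

    char reply[4096];
    ssize_t n = read(from_child[0], reply, sizeof(reply) - 1);
    if (n > 0) {
        reply[n] = '\0';
        printf("child replied: %s\n", reply);
    }
    return 0;
}
```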
## Technologies Used

- llama.cpp — tag `b5391`
- CivetWeb — commit `85d361d85dd3992bf5aaa04a392bc58ce655ad9d`
- Tailwind CSS — `v3.4.16`
- C++ 17 for the example chat executable
- GNU Make / Bash for build orchestration
## Features

- Browser‑based chat UI served by a tiny C HTTP server
- Pluggable LLM chat executable — just point at any compatible binary
- Configurable model name, context length, server port and max response length via `#define` in `server.c` and `llm.cpp` (see the sketch after this list)
- Build script (`build.sh`) to compile everything into `out/` and run `clang-format` on sources
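
As an example of the configuration item above, these knobs are ordinary `#define`s. The snippet below is illustrative only: `LLM_CHAT_EXECUTABLE_NAME` is the macro named elsewhere in this README, while the remaining names and values are placeholders rather than the actual sources.

```c
/* server.c: placeholders except LLM_CHAT_EXECUTABLE_NAME, which this README names */
#define LLM_CHAT_EXECUTABLE_NAME "llm_chat" /* child chat binary to spawn */
#define SERVER_PORT 8080                    /* port the web UI listens on */
#define MAX_RESPONSE_LENGTH 8192            /* fixed reply buffer size    */

/* llm.cpp: placeholders */
#define MODEL_NAME "models/your-model.gguf" /* model file to load         */
#define CONTEXT_LENGTH 4096                 /* tokens of context to keep  */
```

After changing any of these, re‑run `./build.sh` so the binaries under `out/` pick up the new values.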
## Setup

- Obtain a model compatible with `llama.cpp` (e.g. a `.gguf` file) and place it in the `models/` directory.
- (Optional) If you don't use the example C++ chat app (`llm_chat`, aka `llm.cpp`), update the `LLM_CHAT_EXECUTABLE_NAME` macro to match your chosen binary.
- Get llama.cpp and CivetWeb.
- Run `./build.sh`. This will:
  - Compile the C server and C++ example chat app
  - Place all outputs under `out/`
  - Format the source files with `clang-format`
- In `out/`, set the `LLM_CHAT_EXECUTABLE_NAME` macro in `server.c` to your chat binary name and re‑build if needed.

## Usage

- Start the server: `./out/server`
- Note the printed port number (e.g. `Server started on port 8080`).
- Open your browser at `http://localhost:<port>` to start chatting (the sketch below shows how a prompt could travel through the server).
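
For a rough picture of what happens when you send a message: the server accepts the HTTP request, writes the prompt to the child's stdin pipe and sends the reply back. The handler below is a minimal sketch of that bridge using the CivetWeb API; the `/chat` URI, the `g_to_child`/`g_from_child` descriptors, the port and the buffer sizes are assumptions for illustration, not the actual `server.c` layout.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#include "civetweb.h"

/* In a real server these would be the pipe ends created when the chat
 * executable is spawned (see the earlier fork/exec sketch). */
static int g_to_child = -1;   /* write end: server -> chat executable */
static int g_from_child = -1; /* read end:  chat executable -> server */

static int chat_handler(struct mg_connection *conn, void *cbdata) {
    (void)cbdata;

    char prompt[4096];
    int n = mg_read(conn, prompt, sizeof(prompt) - 1); /* POST body = prompt */
    if (n <= 0) {
        mg_printf(conn, "HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n");
        return 400;
    }
    prompt[n] = '\0';

    write(g_to_child, prompt, (size_t)n); /* forward the prompt to the child */
    write(g_to_child, "\n", 1);

    char reply[8192];
    ssize_t r = read(g_from_child, reply, sizeof(reply) - 1); /* blocking read */
    if (r <= 0) {
        mg_printf(conn, "HTTP/1.1 500 Internal Server Error\r\nContent-Length: 0\r\n\r\n");
        return 500;
    }
    reply[r] = '\0';

    mg_printf(conn,
              "HTTP/1.1 200 OK\r\n"
              "Content-Type: text/plain\r\n"
              "Content-Length: %ld\r\n\r\n%s",
              (long)r, reply);
    return 200; /* non-zero tells CivetWeb the request was handled */
}

int main(void) {
    const char *options[] = {"listening_ports", "8080", NULL};
    struct mg_callbacks callbacks;
    memset(&callbacks, 0, sizeof(callbacks));

    struct mg_context *ctx = mg_start(&callbacks, NULL, options);
    if (ctx == NULL)
        return 1;
    mg_set_request_handler(ctx, "/chat", chat_handler, NULL);

    printf("Server started on port 8080\n");
    getchar(); /* keep serving until Enter is pressed */
    mg_stop(ctx);
    return 0;
}
```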
## Project Status

Project is complete. All planned functionality (spawning the LLM, piping I/O, rendering a chat UI) is implemented.
## Room for Improvement

To do:

- Dynamic response buffer: switch from fixed buffers to dynamic allocation in `server.c`.
- Prompt unescape: properly unescape JSON‑style sequences (`\"`, `\\`, etc.) in incoming prompts before forwarding (see the sketch after this list).
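
As a rough illustration of the prompt‑unescape item, a single pass can copy the prompt into a freshly allocated buffer while collapsing common escape sequences; since unescaping never grows the string, one allocation of the input length suffices. This is a sketch under those assumptions, not the planned `server.c` change (the dynamic‑buffer item would similarly swap the fixed reply buffer for a `realloc`‑grown one).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, unescaped copy of `in` (caller frees),
 * turning JSON-style sequences such as \" \\ \n \t into raw characters. */
static char *unescape_prompt(const char *in) {
    size_t len = strlen(in);
    char *out = malloc(len + 1); /* unescaping never grows the string */
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; ++i) {
        if (in[i] == '\\' && i + 1 < len) {
            switch (in[++i]) {
            case '"':  out[j++] = '"';  break;
            case '\\': out[j++] = '\\'; break;
            case 'n':  out[j++] = '\n'; break;
            case 't':  out[j++] = '\t'; break;
            default: /* unknown escape: keep it verbatim */
                out[j++] = '\\';
                out[j++] = in[i];
                break;
            }
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    return out;
}

int main(void) {
    char *p = unescape_prompt("She said \\\"hi\\\"\\nto the model.");
    if (p != NULL) {
        printf("%s\n", p); /* prints: She said "hi" (newline) to the model. */
        free(p);
    }
    return 0;
}
```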
## Acknowledgements

- Inspired by the simple‑chat example in llama.cpp

## Contact

Created by @lurkydismal - feel free to contact me!

## License

This project is open source and available under the GNU Affero General Public License v3.0.