Commit c556418

llama-bench : use local GPUs along with RPC servers (#14917)
Currently, if RPC servers are specified with '--rpc' and a local GPU is available (e.g. CUDA), the benchmark is performed only on the RPC device(s), yet the backend column in the results says "CUDA,RPC", which is incorrect. This patch adds all local GPU devices to the device list, making llama-bench consistent with llama-cli.
1 parent db16e28 commit c556418

File tree

1 file changed (+15, -0 lines)


tools/llama-bench/llama-bench.cpp

Lines changed: 15 additions & 0 deletions
@@ -950,6 +950,7 @@ struct cmd_params_instance {
         }
         static std::vector<ggml_backend_dev_t> devices;
         devices.clear();
+        // RPC devices should always come first for performance reasons
         for (const std::string & server : rpc_servers) {
             ggml_backend_dev_t dev = ggml_backend_rpc_add_device_fn(server.c_str());
             if (dev) {
@@ -959,6 +960,20 @@ struct cmd_params_instance {
                 exit(1);
             }
         }
+        // add local GPU devices if any
+        for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
+            ggml_backend_dev_t dev = ggml_backend_dev_get(i);
+            switch (ggml_backend_dev_type(dev)) {
+                case GGML_BACKEND_DEVICE_TYPE_CPU:
+                case GGML_BACKEND_DEVICE_TYPE_ACCEL:
+                    // skip CPU backends since they are handled separately
+                    break;
+
+                case GGML_BACKEND_DEVICE_TYPE_GPU:
+                    devices.push_back(dev);
+                    break;
+            }
+        }
         devices.push_back(nullptr);
         mparams.devices = devices.data();
     }
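Not part of the commit, but for readers unfamiliar with the device API used above: the sketch below is a minimal standalone program, assuming ggml is built and "ggml-backend.h" is on the include path, that enumerates the registered backend devices with the same ggml_backend_dev_count / ggml_backend_dev_get / ggml_backend_dev_type calls and prints how llama-bench treats each device type.

```cpp
// Minimal sketch (illustration only): list ggml backend devices and their types.
// Assumes ggml is built and "ggml-backend.h" is available on the include path.
#include <cstdio>

#include "ggml-backend.h"

int main() {
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        const char * name = ggml_backend_dev_name(dev);

        switch (ggml_backend_dev_type(dev)) {
            case GGML_BACKEND_DEVICE_TYPE_CPU:
                printf("%s: CPU device (handled separately by llama-bench)\n", name);
                break;
            case GGML_BACKEND_DEVICE_TYPE_ACCEL:
                printf("%s: accelerator device (skipped, like CPU)\n", name);
                break;
            case GGML_BACKEND_DEVICE_TYPE_GPU:
                printf("%s: GPU device (added to the benchmark device list)\n", name);
                break;
        }
    }
    return 0;
}
```

The ordering follows the comment in the patch: RPC devices come first for performance reasons, local GPU devices are appended after them, and the list is terminated with a nullptr entry before being handed to mparams.devices.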
