Open
Description
I deployed multiple models on a Triton server (running in Docker). The BERT models use the GPU, and the XGBoost models use the CPU. I now want to limit the number of CPU cores used by the FIL backend so that it does not affect other services. How should I do this?
- Edit the model configuration in config.pbtxt? The Rate Limiter does not seem to precisely limit the number of CPU cores used.
- Use tritonserver --backend-config=fil,xxx:xxx? It seems that only the ONNX and TensorRT backends provide command-line configuration; I did not find similar information in the FIL backend documentation.
- Limit the CPUs at docker run time? If the XGBoost model runs under high load, will this affect the overall performance of tritonserver?
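To make the third option concrete, here is a minimal Python sketch that assembles a `docker run` command using Docker's `--cpuset-cpus` and `--cpus` flags. The helper name `build_docker_run_args`, the image tag placeholder, and the model repository path are assumptions for illustration only. These limits apply to the whole container, so they would constrain every CPU-side task in tritonserver, not only the FIL backend.

```python
import shlex


def build_docker_run_args(image, model_repo, cpuset=None, cpus=None):
    """Build a `docker run` argument list for Triton with optional CPU limits.

    cpuset: CPU IDs the container may run on, e.g. "0-3" (--cpuset-cpus).
    cpus:   maximum CPU time as a number of cores, e.g. 4 (--cpus).
    Both limits are container-wide, not per backend.
    """
    args = ["docker", "run", "--rm", "--gpus", "all"]
    if cpuset is not None:
        args += ["--cpuset-cpus", cpuset]
    if cpus is not None:
        args += ["--cpus", str(cpus)]
    # Mount the (hypothetical) model repository and start the server.
    args += [
        "-v", f"{model_repo}:/models",
        image,
        "tritonserver", "--model-repository=/models",
    ]
    return args


if __name__ == "__main__":
    # Image tag and host path are placeholders, not values from this issue.
    cmd = build_docker_run_args(
        image="nvcr.io/nvidia/tritonserver:<xx.yy>-py3",
        model_repo="/path/to/model_repository",
        cpuset="0-3",
        cpus=4,
    )
    print(shlex.join(cmd))
```

The printed command can be inspected before it is run. Whether a container-wide cap is acceptable depends on how much CPU the GPU models also need for request handling and pre/post-processing, which is exactly the concern raised in the third bullet.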