This is an NVIDIA AI Workbench project to try out the observability capabilities of NVIDIA NIMs. It lets you:
- Run inference locally with the Llama 3.2 3B Instruct model via an NVIDIA Inference Microservice (NIM) on Docker (see the sketch below)
- Use the Gradio chat UI application to interact with the hosted NIM
- Use a Jupyter notebook to interact with the model using LangChain
- Visualize performance metrics and traces using Prometheus, Zipkin, and Grafana.
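
Once the stack is running, the NIM exposes an OpenAI-compatible API that the Gradio chat app and the notebook talk to. The sketch below is one minimal way to query it directly; the port (`localhost:8000`) and the model id (`meta/llama-3.2-3b-instruct`) are assumptions, so adjust both to match your Compose configuration.

```python
from openai import OpenAI

# Point the OpenAI client at the locally hosted NIM (assumed port 8000).
# A local NIM does not validate the API key, but the client requires a value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="meta/llama-3.2-3b-instruct",  # assumed model id; confirm with client.models.list()
    messages=[{"role": "user", "content": "Give me a one-sentence summary of OpenTelemetry."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```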
This project has two main capabilities:

- A Docker Compose stack with the following services:
  - NVIDIA Llama 3.2 3B Instruct NIM
  - OpenTelemetry Collector
  - Prometheus
  - Grafana for metrics visualization
  - Zipkin for distributed tracing visualization
- A JupyterLab notebook to interact with the deployed Llama 3.2 3B NIM (see the LangChain sketch below)
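
The notebook drives the same OpenAI-compatible endpoint through LangChain. A minimal sketch, assuming the `langchain-openai` package is installed and using the same assumed port and model id as above:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# ChatOpenAI works against any OpenAI-compatible server, including a local NIM.
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # assumed NIM port mapping
    api_key="not-used",
    model="meta/llama-3.2-3b-instruct",   # assumed model id
    temperature=0.2,
)

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a concise assistant."), ("user", "{question}")]
)
chain = prompt | llm
print(chain.invoke({"question": "What does an OpenTelemetry Collector do?"}).content)
```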
Before getting started, create your NGC API key and add it as an environment variable under Environment -> Project Container -> Variables.
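A quick way to confirm the key is visible inside the project container before starting the stack; this assumes the variable is named `NGC_API_KEY`, which is the name the NIM container expects for pulling model weights from NGC.

```python
import os

# Fail fast if the key is missing; the NIM container needs it to download
# model weights from NGC when the Compose stack starts.
if not os.environ.get("NGC_API_KEY"):
    raise RuntimeError("NGC_API_KEY is not set in the project container environment")
print("NGC_API_KEY is set")
```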
The next step is to deploy the Llama NIM and the rest of the stack using Docker Compose. Go to Environment -> Project Container -> Compose. Select the `all` profile to deploy the entire stack, or choose just the `nim` profile to deploy only the Llama 3.2 3B Instruct NIM. Click the Start button to deploy the stack.
Wait for the containers to be up and running; this can take a while the first time you deploy, because the Llama model weights must be downloaded and loaded before the NIM starts. Once all the containers are running, you can access the following components on localhost.
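
Rather than watching the container logs, you can poll the NIM readiness endpoint from the notebook. This is only a sketch: it assumes the NIM is published on `localhost:8000` and that its standard `/v1/health/ready` route is available.

```python
import time
import requests

NIM_READY_URL = "http://localhost:8000/v1/health/ready"  # assumed port mapping

def wait_for_nim(timeout_s: int = 900, interval_s: int = 10) -> None:
    """Poll the NIM readiness endpoint until the model has finished loading."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(NIM_READY_URL, timeout=5).status_code == 200:
                print("NIM is ready")
                return
        except requests.ConnectionError:
            pass  # container still starting; keep waiting
        time.sleep(interval_s)
    raise TimeoutError("NIM did not become ready within the timeout")

wait_for_nim()
```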
- Grafana can be accessed locally by navigating to the following URL in your web browser:
  For the first-time login, use the default credentials (admin/admin).
- Zipkin can be accessed locally by navigating to the following URL in your web browser:
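
To confirm that metrics are actually flowing into Prometheus (and therefore into the Grafana dashboards), you can query the Prometheus HTTP API directly. The port (`9090`) and the metric name below are assumptions; check the Compose file and the Prometheus UI for the exact values in your deployment.

```python
import requests

# Query Prometheus (assumed to be published on localhost:9090) for one of the
# inference metrics the NIM exports; adjust the metric name to what appears
# in the Prometheus UI for your deployment.
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "num_requests_running"},
    timeout=5,
)
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```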