Run Petals server on Windows
Petals doesn't support Windows natively at the moment - you have to use WSL and/or Docker. In this guide, we'll show how to set up Petals on WSL.
- On a Windows admin console, install WSL:

  ```
  wsl --install
  ```

- Open WSL and check that GPUs are available:

  ```
  nvidia-smi
  ```
- In WSL, install basic Python stuff:

  ```
  sudo apt update
  sudo apt install python3-pip python-is-python3
  ```

- Then, install Petals:

  ```
  python -m pip install git+https://github.com/bigscience-workshop/petals
  ```
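  Since Petals pulls in PyTorch as a dependency, you can optionally sanity-check at this point that CUDA is visible from Python inside WSL. A one-liner like this should work:

  ```
  python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
  ```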
- Run the Petals server:

  ```
  python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b
  ```

  This will host a part of LLaMA-65B with optional Guanaco adapters on your machine. You can also host `meta-llama/Llama-2-70b-hf`, `meta-llama/Llama-2-70b-chat-hf`, `bigscience/bloom`, `bigscience/bloomz`, and other compatible models from the 🤗 Model Hub, or add support for new model architectures.

  If you want to share multiple GPUs, you should run a separate Petals server for each one. Open a separate WSL console per GPU, then run this in the first console:

  ```
  CUDA_VISIBLE_DEVICES=0 python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b
  ```

  Do the same in each remaining console, replacing `CUDA_VISIBLE_DEVICES=0` with `CUDA_VISIBLE_DEVICES=1`, `CUDA_VISIBLE_DEVICES=2`, and so on (or script it, as in the sketch below).
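  If you'd rather script this than open one console per GPU, here is a minimal sketch. It assumes the same model and adapters as above and writes each server's output to its own log file; adjust to taste:

  ```bash
  #!/bin/bash
  # Launch one Petals server per GPU, each logging to its own file.
  NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)
  for GPU in $(seq 0 $((NUM_GPUS - 1))); do
    CUDA_VISIBLE_DEVICES=$GPU nohup python -m petals.cli.run_server \
      enoch/llama-65b-hf --adapters timdettmers/guanaco-65b \
      > "petals_gpu_${GPU}.log" 2>&1 &
  done
  ```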
- Once all blocks are loaded, check that your server is available at https://health.petals.dev/
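  You can also generate a few tokens through the swarm to confirm that everything works end to end. This sketch follows the usage example from the Petals README (the prompt is arbitrary, and the first run downloads the tokenizer and embeddings):

  ```
  python - <<'EOF'
  # Minimal end-to-end test: generate a few tokens via the public swarm.
  from transformers import AutoTokenizer
  from petals import AutoDistributedModelForCausalLM

  model_name = "enoch/llama-65b-hf"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

  inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
  outputs = model.generate(inputs, max_new_tokens=5)
  print(tokenizer.decode(outputs[0]))
  EOF
  ```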
Petals will use NAT traversal via relays by default, but you can make it available directly if your computer has a public IP address.
- In WSL, find out the IP address of your WSL container (`172.X.X.X`):

  ```
  sudo apt install net-tools
  ifconfig
  ```
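  If you'd rather not install `net-tools`, the preinstalled `ip` tool reports the same address (the interface name is typically `eth0` under WSL 2, but that's an assumption worth checking):

  ```
  ip addr show eth0
  ```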
- In a Windows admin console, allow traffic to be routed into the WSL container (replace `172.X.X.X` with your actual IP):

  ```
  netsh interface portproxy add v4tov4 listenport=31330 listenaddress=0.0.0.0 connectport=31330 connectaddress=172.X.X.X
  ```
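  To confirm the rule took effect, you can list the active port proxies:

  ```
  netsh interface portproxy show v4tov4
  ```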
- Set up your firewall (e.g., Windows Defender) to allow traffic from the outside world to port 31330/tcp.
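  If you prefer the command line to the firewall GUI, a rule like this from an admin PowerShell should do the same thing (the display name is arbitrary):

  ```
  New-NetFirewallRule -DisplayName "Petals 31330" -Direction Inbound -Protocol TCP -LocalPort 31330 -Action Allow
  ```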
- If you have a router, set it up to forward connections from the outside world (port 31330/tcp) to your computer (port 31330/tcp).
- Run the Petals server with the parameter `--port 31330`:

  ```
  python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b --port 31330
  ```
- Ensure that the server prints `This server is available directly` (not `via relays`) after startup.
- I get this error: `hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others` on WSL. What should I do?

  Petals needs clocks on all nodes to be synchronized. Please set the date using an NTP server:

  ```
  sudo ntpdate pool.ntp.org
  ```
- I get this error: `torch.cuda.OutOfMemoryError: CUDA out of memory`. What should I do?

  If you use an Anaconda env, run this before starting the server:

  ```
  export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
  ```

  If you use Docker, add this argument after `--rm` in the Docker command:

  ```
  -e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
  ```
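  For context, here is roughly where that flag lands in a full invocation. This sketch is modeled on the Docker command from the Petals README, so treat the image tag and surrounding flags as assumptions to check against the current docs:

  ```
  # Run a Petals server in Docker with the allocator setting applied.
  sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    -e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128" \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 enoch/llama-65b-hf
  ```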