Run Petals server on Windows

Alexander Borzunov edited this page Jul 22, 2023 · 27 revisions

Petals doesn't support Windows natively at the moment, so you have to use WSL and/or Docker. In this guide, we'll show how to set up Petals on WSL.

Tutorial

  1. In a Windows console run as administrator, install WSL:

    wsl --install
  2. Open WSL and check that your GPUs are visible:

    nvidia-smi
  3. In WSL, install the basic Python tooling:

    sudo apt update
    sudo apt install python3-pip python-is-python3
  4. Then, install Petals:

    python -m pip install git+https://github.com/bigscience-workshop/petals
  5. Run the Petals server:

    python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b

    This will host a part of LLaMA-65B with optional Guanaco adapters on your machine. You can also host meta-llama/Llama-2-70b-hf, meta-llama/Llama-2-70b-chat-hf, bigscience/bloom, bigscience/bloomz, and other compatible models from 🤗 Model Hub, or add support for new model architectures.

    If you want to share multiple GPUs, run a separate Petals server for each one. Open a separate WSL console per GPU, then run this in the first console:

    CUDA_VISIBLE_DEVICES=0 python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b

    Do the same for each console, replacing CUDA_VISIBLE_DEVICES=0 with CUDA_VISIBLE_DEVICES=1, CUDA_VISIBLE_DEVICES=2, etc.

  6. Once all blocks are loaded, check that your server is listed as available at https://health.petals.dev/
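
The per-GPU launch commands from step 5 can be sketched as a small loop. This is a dry run that only prints one command line per GPU (the model and adapter names are the ones used in this tutorial); to actually start the servers, run each printed line in its own WSL console, or drop the `echo` and append `&`:

```shell
# Print one Petals server launch command per GPU (dry run).
# Each server gets its own GPU via CUDA_VISIBLE_DEVICES.
launch_commands() {
  local n_gpus=$1
  for i in $(seq 0 $((n_gpus - 1))); do
    echo "CUDA_VISIBLE_DEVICES=$i python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b"
  done
}

launch_commands 2
```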

Making a server directly available

Petals will use NAT traversal via relays by default, but you can make it available directly if your computer has a public IP address.

  1. In WSL, find out the IP address of your WSL container (it usually looks like 172.X.X.X):

    sudo apt install net-tools
    ifconfig
  2. In a Windows console run as administrator, allow traffic to be routed into the WSL container (replace 172.X.X.X with the IP address you found in the previous step):

    netsh interface portproxy add v4tov4 listenport=31330 listenaddress=0.0.0.0 connectport=31330 connectaddress=172.X.X.X
  3. Set up your firewall (e.g., Windows Defender) to allow inbound traffic from the outside world on port 31330/tcp.

  4. If you have a router, set it up to forward incoming connections from the outside world (port 31330/tcp) to your computer (port 31330/tcp).

  5. Run the Petals server with the --port 31330 argument:

    python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b --port 31330
  6. Ensure that the server prints This server is available directly (not via relays) after startup.
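
As an alternative to installing net-tools, steps 1-2 can be scripted from WSL. This is a minimal sketch, assuming `hostname -I` is available (it is on Ubuntu-based WSL): it discovers the WSL IP and prints the `netsh` command for you to paste into an administrator Windows console:

```shell
# Build the Windows-side port-proxy command from the WSL IP address.
# PORT must match the --port value passed to the Petals server.
PORT=31330
WSL_IP=$(hostname -I | awk '{print $1}')
PROXY_CMD="netsh interface portproxy add v4tov4 listenport=$PORT listenaddress=0.0.0.0 connectport=$PORT connectaddress=$WSL_IP"
echo "$PROXY_CMD"
```

Note that the WSL IP address can change between reboots, so you may need to re-run the `netsh` command after restarting.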

Troubleshooting

  1. I get this error: hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others on WSL. What should I do?

    Petals needs the clocks on all nodes to be synchronized. Set the date from an NTP server: sudo ntpdate pool.ntp.org (if the command is missing, install it first with sudo apt install ntpdate)

  2. I get this error: torch.cuda.OutOfMemoryError: CUDA out of memory. What should I do?

    If you use an Anaconda env, run this before starting the server:

    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

    If you use Docker, add this argument after --rm in the Docker command:

    -e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
