
HTTP proxies via env variables no longer working in the main branch of transformers #41301

@jerryzh168

Description

System Info

The problem was encountered when using the main branch of the transformers repo, which depends on huggingface-hub==1.0.0.rc2, together with an HTTP proxy supplied via the standard environment variables (HTTP_PROXY, HTTPS_PROXY, etc.). This worked before the dependency upgrade was introduced.

This PR mentions that HTTP proxies are now supposed to be configured via environment variables, so this setup should be supported: #40889

Proxy environment variables are also officially supported by httpx, the new HTTP library introduced in that PR: https://www.python-httpx.org/environment_variables/#proxies
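For reference, this is all it takes for httpx itself to honour the proxy settings (a minimal sketch, assuming HTTP_PROXY / HTTPS_PROXY are already exported in the environment):

import httpx

# trust_env defaults to True, so a plain client reads HTTP_PROXY / HTTPS_PROXY /
# ALL_PROXY / NO_PROXY from the environment and routes requests through the proxy.
client = httpx.Client()
response = client.get("https://huggingface.co/api/models?limit=1")
print(response.status_code)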

However, httpx ignores the proxy environment variables whenever a custom transport is passed to the client: https://github.com/encode/httpx/blob/c0b46ebf4cf724d813c8595602e6a5b59aef5177/httpx/_client.py#L685
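This can be seen without any huggingface_hub code at all: once an explicit transport is supplied, httpx uses it verbatim and never builds the proxy mounts from the environment (a minimal sketch):

import httpx

# Honours HTTP_PROXY / HTTPS_PROXY (trust_env=True by default):
plain_client = httpx.Client()

# Ignores them: the supplied transport is used as-is, so no proxy mounts are
# created from the environment. This is the code path huggingface_hub hits.
custom_client = httpx.Client(transport=httpx.HTTPTransport())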

huggingface_hub passes a custom transport to httpx, so with the default client configured here it is not possible to use a proxy via environment variables: https://github.com/huggingface/huggingface_hub/blob/f3334dd2da172093264dcc7a681d6eea14c7793d/src/huggingface_hub/utils/_http.py#L133C1-L141C6
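For context, the default factory at the linked lines is roughly shaped like this (a paraphrased sketch, not the exact code): an explicit transport is passed and no proxy is set, so neither route can pick up the proxy.

import httpx
from huggingface_hub import HfHubTransport, constants

# Approximate shape of huggingface_hub's default client factory (see link above).
# An explicit transport is passed and no proxy is given, so the proxy
# environment variables are never consulted by httpx.
def default_client_factory() -> httpx.Client:
    return httpx.Client(
        transport=HfHubTransport(),
        follow_redirects=True,
        timeout=httpx.Timeout(constants.DEFAULT_REQUEST_TIMEOUT, write=60.0),
    )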

In short, this dependency update appears to have broken HTTP proxy support in transformers entirely: the proxies arguments were removed, and configuring the proxy via environment variables no longer works either.

A workaround is to register custom factories via set_client_factory / set_async_client_factory in huggingface_hub, but it would be preferable if the library handled this transparently whenever the proxy environment variable(s) are present, which is how it worked before.

Workaround:

import os
import httpx
from huggingface_hub import set_client_factory, set_async_client_factory, HfHubTransport, constants


def env_proxy_client_factory():
    # Same configuration as the default factory, but with the proxy passed
    # explicitly, since httpx ignores the env variables when a custom
    # transport is supplied.
    return httpx.Client(
        transport=HfHubTransport(),
        follow_redirects=True,
        timeout=httpx.Timeout(constants.DEFAULT_REQUEST_TIMEOUT, write=60.0),
        proxy=os.environ.get("HTTP_PROXY"),
    )


def async_env_proxy_client_factory():
    return httpx.AsyncClient(
        transport=HfHubTransport(),
        follow_redirects=True,
        timeout=httpx.Timeout(constants.DEFAULT_REQUEST_TIMEOUT, write=60.0),
        proxy=os.environ.get("HTTP_PROXY"),
    )


# Register the custom factories before any hub requests are made.
set_client_factory(env_proxy_client_factory)
set_async_client_factory(async_env_proxy_client_factory)

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')

Who can help?

cc @Wauplin @ArthurZucker @ydshieh

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This only happens in my internal dev server setup, which requires a proxy; the proxy is no longer picked up, so downloads fail with a network error:

python -c "from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')"

or
hf download TinyLlama/TinyLlama-1.1B-Chat-v1.0

Both fail with:

httpcore.ConnectError: [Errno 101] Network is unreachable

Expected behavior

Models should download successfully when an HTTP(S) proxy is configured via the standard environment variables.
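Concretely, something like the following should work without any custom factories (a minimal sketch; the proxy URL is a placeholder for whatever the environment already provides):

import os

# Placeholder proxy URL; in the real setup this is already exported in the shell.
os.environ["HTTPS_PROXY"] = "http://proxy.internal.example:3128"

from transformers import AutoModelForCausalLM

# Expected: the hub client transparently routes traffic through the proxy,
# as it did before the huggingface-hub upgrade.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")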
