Description
System Info
The problem occurs when using the main branch of the transformers repo, which depends on the package huggingface-hub==1.0.0.rc2, with an HTTP proxy supplied through the standard environment variables (HTTP_PROXY, HTTPS_PROXY, etc.). This worked before this dependency upgrade was introduced.
This PR mentions that HTTP proxies are now supposed to be configured via environment variables, so it seems it should be supported: #40889
This is also officially supported by httpx, which is the new HTTP library introduced in the PR: https://www.python-httpx.org/environment_variables/#proxies
However, httpx appears to ignore the proxy environment variables if a custom Transport is used: https://github.com/encode/httpx/blob/c0b46ebf4cf724d813c8595602e6a5b59aef5177/httpx/_client.py#L685
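To make the behavior described above concrete, here is a simplified model of that check. It is a sketch of my understanding, not httpx's actual code: the function names are hypothetical, and the environment lookup is reduced to a single fallback instead of httpx's per-scheme handling.

```python
from typing import Mapping, Optional


def env_proxies_allowed(trust_env: bool, transport: Optional[object]) -> bool:
    # Simplified model: environment proxies are only considered when
    # trust_env is enabled AND the caller did not pass a custom transport.
    return trust_env and transport is None


def resolve_proxy(
    explicit_proxy: Optional[str],
    trust_env: bool,
    transport: Optional[object],
    environ: Mapping[str, str],
) -> Optional[str]:
    # An explicitly passed proxy is used regardless of the transport.
    if explicit_proxy is not None:
        return explicit_proxy
    # Otherwise fall back to the environment, but only if allowed.
    if env_proxies_allowed(trust_env, transport):
        return environ.get("HTTPS_PROXY") or environ.get("HTTP_PROXY")
    return None
```

Under this model, passing any custom transport makes the environment variables irrelevant unless a proxy is also passed explicitly, which matches what I see with huggingface_hub.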
huggingface_hub provides a custom Transport to httpx, so it seems it is not possible to use a proxy through environment variables with the default transport configured here: https://github.com/huggingface/huggingface_hub/blob/f3334dd2da172093264dcc7a681d6eea14c7793d/src/huggingface_hub/utils/_http.py#L133C1-L141C6
It seems this dependency update completely broke the use of HTTP proxies with transformers, because the proxy arguments were removed and the environment variable setting no longer works.
A workaround is possible with the set_client_factory functions in huggingface_hub, but it would be preferable if the library did this transparently when the environment variable(s) for proxies are present, which is how it worked before.
Workaround:
import os

import httpx
from huggingface_hub import set_client_factory, set_async_client_factory, HfHubTransport, constants
from huggingface_hub.utils._http import default_client_factory, default_async_client_factory


def env_proxy_client_factory():
    # Sync client: huggingface_hub's transport plus an explicit proxy read from HTTP_PROXY.
    return httpx.Client(
        transport=HfHubTransport(),
        follow_redirects=True,
        timeout=httpx.Timeout(constants.DEFAULT_REQUEST_TIMEOUT, write=60.0),
        proxy=os.environ.get("HTTP_PROXY"),
    )


def async_env_proxy_client_factory():
    # Async client built with the same settings.
    return httpx.AsyncClient(
        transport=HfHubTransport(),
        follow_redirects=True,
        timeout=httpx.Timeout(constants.DEFAULT_REQUEST_TIMEOUT, write=60.0),
        proxy=os.environ.get("HTTP_PROXY"),
    )


# Register the factories before anything triggers a download.
set_client_factory(env_proxy_client_factory)
set_async_client_factory(async_env_proxy_client_factory)

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')
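The workaround above only reads HTTP_PROXY. For setups where HTTPS_PROXY differs or NO_PROXY matters, a possible variant is to pick the proxy per URL using the standard library. This is only a sketch: pick_proxy is a hypothetical helper, not part of huggingface_hub or httpx, and I have not tested it against every proxy configuration.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit


def pick_proxy(url: str) -> Optional[str]:
    """Return the proxy URL the environment specifies for `url`, or None."""
    parts = urlsplit(url)
    # Maps scheme -> proxy URL, built from <scheme>_proxy variables
    # (for example HTTP_PROXY, https_proxy, ALL_PROXY).
    proxies = urllib.request.getproxies_environment()
    # Honor NO_PROXY / no_proxy for hosts that must bypass the proxy.
    if parts.hostname and urllib.request.proxy_bypass_environment(parts.hostname, proxies):
        return None
    return proxies.get(parts.scheme) or proxies.get("all")
```

The result of something like pick_proxy("https://huggingface.co") could then be passed as proxy= in the factories instead of os.environ.get("HTTP_PROXY").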
Who can help?
cc @Wauplin @ArthurZucker @ydshieh
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
This only happens in my internal dev server setup, which uses a proxy. The proxy is no longer picked up, so the following commands fail with a network error:
python -c "from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')"
or
hf download TinyLlama/TinyLlama-1.1B-Chat-v1.0
Got error:
httpcore.ConnectError: [Errno 101] Network is unreachable
Expected behavior
We should be able to download models successfully through the internet proxy.