Description
What happened?
As you can see at https://github.com/search?q=repo%3ABerriAI%2Flitellm%20routing%23fallbacks&type=code, the page https://docs.litellm.ai/docs/routing#fallbacks is referenced in the code, but that page has no "fallbacks" section.
The error message also suggests setting context_window_fallback, but that parameter is not documented on the page either; only context_window_fallback_dict appears there.
Looking at the LiteLLM code, however, context_window_fallback does exist: https://github.com/search?q=repo%3ABerriAI%2Flitellm+context_window_fallback&type=code
So it is not clear to me how to set a maximum context so that the request is handled (or blocked) by LiteLLM instead of raising an exception.
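For reference, here is a minimal sketch of what I would expect the configuration to look like, based only on the attribute names printed in the error message (`context_window_fallbacks=None. fallbacks=None.`). The model names, API bases, and the fallback deployment are placeholders from my setup, not something taken from the docs:

```python
from litellm import Router

# Sketch (assumption): route to a larger-context deployment when the
# context window is exceeded. "LongWriter-Zero-32B" is my model;
# "larger-context-model" and the api_base values are hypothetical.
router = Router(
    model_list=[
        {
            "model_name": "LongWriter-Zero-32B",
            "litellm_params": {
                "model": "hosted_vllm/LongWriter-Zero-32B",
                "api_base": "http://localhost:8000/v1",  # placeholder vLLM endpoint
            },
        },
        {
            "model_name": "larger-context-model",
            "litellm_params": {
                "model": "hosted_vllm/larger-context-model",  # hypothetical fallback
                "api_base": "http://localhost:8001/v1",  # placeholder
            },
        },
    ],
    # The error message prints `context_window_fallbacks=None`, so this appears
    # to be the parameter the router expects, yet the docs page only mentions
    # context_window_fallback_dict.
    context_window_fallbacks=[{"LongWriter-Zero-32B": ["larger-context-model"]}],
)
```

If this is the intended way to configure it, it would help if the docs page linked from the error message actually documented it.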
Relevant log output
09:19:56 - LiteLLM Proxy:ERROR: proxy_server.py:3681 - litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.ContextWindowExceededError: litellm.BadRequestError: ContextWindowExceededError: Hosted_vllmException - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
model=LongWriter-Zero-32B. context_window_fallbacks=None. fallbacks=None.
Set 'context_window_fallback' - https://docs.litellm.ai/docs/routing#fallbacks
Received Model Group=LongWriter-Zero-32B
Available Model Group Fallbacks=None LiteLLM Retried: 1 times, LiteLLM Max Retries: 2
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 950, in async_streaming
headers, response = await self.make_openai_chat_completion_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<4 lines>...
)
^
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/logging_utils.py", line 131, in async_wrapper
result = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 439, in make_openai_chat_completion_request
raise e
File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 421, in make_openai_chat_completion_request
await openai_aclient.chat.completions.with_raw_response.create(
**data, timeout=timeout
)
File "/usr/lib/python3.13/site-packages/openai/_legacy_response.py", line 381, in wrapped
return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/openai/resources/chat/completions/completions.py", line 2000, in create
return await self._post(
^^^^^^^^^^^^^^^^^
...<43 lines>...
)
^
File "/usr/lib/python3.13/site-packages/openai/_base_client.py", line 1767, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/openai/_base_client.py", line 1461, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
...<5 lines>...
)
^
File "/usr/lib/python3.13/site-packages/openai/_base_client.py", line 1562, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 467, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 998, in async_streaming
raise OpenAIError(
...<3 lines>...
)
litellm.llms.openai.common_utils.OpenAIError: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py", line 3568, in chat_completion
responses = await llm_responses
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 894, in acompletion
raise e
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 870, in acompletion
response = await self.async_function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3048, in async_function_with_fallbacks
raise original_exception
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 2866, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3239, in async_function_with_retries
raise original_exception
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3132, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3248, in make_call
response = await response
^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1019, in _acompletion
raise e
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 987, in _acompletion
response = await _response
^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1395, in wrapper_async
raise e
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1254, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 486, in acompletion
raise exception_type(
~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<3 lines>...
extra_kwargs=kwargs,
^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2201, in exception_type
raise e
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 282, in exception_type
raise ContextWindowExceededError(
...<5 lines>...
)
litellm.exceptions.ContextWindowExceededError: litellm.ContextWindowExceededError: litellm.BadRequestError: ContextWindowExceededError: Hosted_vllmException - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
model=LongWriter-Zero-32B. context_window_fallbacks=None. fallbacks=None.
Set 'context_window_fallback' - https://docs.litellm.ai/docs/routing#fallbacks
Received Model Group=LongWriter-Zero-32B
Available Model Group Fallbacks=None LiteLLM Retried: 1 times, LiteLLM Max Retries: 2
Are you a ML Ops Team?
No
What LiteLLM version are you on ?
latest
Twitter / LinkedIn details
No response