[Bug]: Link to documentation fallback for context does not exist #12680


Description

@Mte90

What happened?

As you can see from this code search (https://github.com/search?q=repo%3ABerriAI%2Flitellm%20routing%23fallbacks&type=code), the page https://docs.litellm.ai/docs/routing#fallbacks is referenced in the codebase, but a "fallbacks" section does not exist on that page.

The error message also suggests looking for context_window_fallback, which does not appear on that page either; the page only documents context_window_fallback_dict. The name context_window_fallback does exist in the LiteLLM code, though: https://github.com/search?q=repo%3ABerriAI%2Flitellm+context_window_fallback&type=code
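For reference, this is how I read the documented context_window_fallback_dict setting in the SDK; this is a sketch based on the docs page, not something I have confirmed, and "larger-context-model" is just a placeholder name:

import litellm

# Sketch of the documented SDK-level setting: when a call to the key model
# raises ContextWindowExceededError, retry on the mapped model instead.
# "larger-context-model" is a placeholder, not a real deployment.
litellm.context_window_fallback_dict = {
    "LongWriter-Zero-32B": "larger-context-model"
}

But it is not clear whether or how this setting maps to the proxy/router setup that the error message is coming from.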

So it is not clear to me how to set a maximum context so that LiteLLM handles the request itself (blocking it or falling back) instead of raising an exception.
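Based on the router log below (it prints context_window_fallbacks=None), my guess is that the Router / proxy expects a context_window_fallbacks list along these lines. Treat this as a sketch, since the docs section the error points to is missing; "larger-context-model", the api_base values, and the hosted_vllm prefixes are assumptions on my part:

from litellm import Router

# Sketch: map the model group that overflows its context window to a group
# with a bigger one, using the `context_window_fallbacks` name printed in
# the log. Deployment names and endpoints below are made up.
router = Router(
    model_list=[
        {
            "model_name": "LongWriter-Zero-32B",
            "litellm_params": {
                "model": "hosted_vllm/LongWriter-Zero-32B",  # assumed provider prefix
                "api_base": "http://localhost:8000/v1",      # assumed vLLM endpoint
            },
        },
        {
            "model_name": "larger-context-model",  # hypothetical fallback group
            "litellm_params": {
                "model": "hosted_vllm/larger-context-model",
                "api_base": "http://localhost:8001/v1",
            },
        },
    ],
    # On ContextWindowExceededError for the first group, retry on the second.
    context_window_fallbacks=[{"LongWriter-Zero-32B": ["larger-context-model"]}],
)

If that is roughly right, the #fallbacks anchor in the error message should point at wherever this parameter is actually documented.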

Relevant log output

09:19:56 - LiteLLM Proxy:ERROR: proxy_server.py:3681 - litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.ContextWindowExceededError: litellm.BadRequestError: ContextWindowExceededError: Hosted_vllmException - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
model=LongWriter-Zero-32B. context_window_fallbacks=None. fallbacks=None.
Set 'context_window_fallback' - https://docs.litellm.ai/docs/routing#fallbacks
Received Model Group=LongWriter-Zero-32B
Available Model Group Fallbacks=None LiteLLM Retried: 1 times, LiteLLM Max Retries: 2
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 950, in async_streaming
    headers, response = await self.make_openai_chat_completion_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
    )
    ^
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/logging_utils.py", line 131, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 439, in make_openai_chat_completion_request
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 421, in make_openai_chat_completion_request
    await openai_aclient.chat.completions.with_raw_response.create(
        **data, timeout=timeout
    )
  File "/usr/lib/python3.13/site-packages/openai/_legacy_response.py", line 381, in wrapped
    return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/openai/resources/chat/completions/completions.py", line 2000, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
    ...<43 lines>...
    )
    ^
  File "/usr/lib/python3.13/site-packages/openai/_base_client.py", line 1767, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/openai/_base_client.py", line 1461, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
    )
    ^
  File "/usr/lib/python3.13/site-packages/openai/_base_client.py", line 1562, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/main.py", line 467, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/openai.py", line 998, in async_streaming
    raise OpenAIError(
    ...<3 lines>...
    )
litellm.llms.openai.common_utils.OpenAIError: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py", line 3568, in chat_completion
    responses = await llm_responses
                ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 894, in acompletion
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 870, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3048, in async_function_with_fallbacks
    raise original_exception
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 2866, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3239, in async_function_with_retries
    raise original_exception
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3132, in async_function_with_retries
    response = await self.make_call(original_function, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3248, in make_call
    response = await response
               ^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1019, in _acompletion
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 987, in _acompletion
    response = await _response
               ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1395, in wrapper_async
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1254, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/main.py", line 486, in acompletion
    raise exception_type(
          ~~~~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<3 lines>...
        extra_kwargs=kwargs,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2201, in exception_type
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 282, in exception_type
    raise ContextWindowExceededError(
    ...<5 lines>...
    )
litellm.exceptions.ContextWindowExceededError: litellm.ContextWindowExceededError: litellm.BadRequestError: ContextWindowExceededError: Hosted_vllmException - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33077 tokens (1077 in the messages, 32000 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
model=LongWriter-Zero-32B. context_window_fallbacks=None. fallbacks=None.
Set 'context_window_fallback' - https://docs.litellm.ai/docs/routing#fallbacks
Received Model Group=LongWriter-Zero-32B
Available Model Group Fallbacks=None LiteLLM Retried: 1 times, LiteLLM Max Retries: 2

Are you a ML Ops Team?

No

What LiteLLM version are you on?

latest

Twitter / LinkedIn details

No response
