
Conversation

@saar-win (Contributor) commented Dec 2, 2025

Title

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature

Changes

Add support for ElevenLabs' /with-timestamps endpoint, which returns
character-level timing information alongside the generated audio. This
enables use cases like lip-sync, captions, and audio-text synchronization.

Changes

Core Implementation

  • litellm/llms/elevenlabs/text_to_speech/transformation.py:

    • Add ELEVENLABS_WITH_TIMESTAMPS_KEY constant
    • Modify get_complete_url() to append /with-timestamps when requested
    • Update transform_text_to_speech_response() to handle JSON responses
    • Extract with_timestamps from optional_params in map_openai_params() (see the sketch after this list)
  • litellm/main.py:

    • Add with_timestamps parameter to speech() function signature
    • Extract and pass with_timestamps flag to litellm_params_dict
    • Update return type to Union[HttpxBinaryResponseContent, dict]
  • litellm/proxy/proxy_server.py:

    • Handle dict responses in audio_speech endpoint
    • Return ORJSONResponse for with_timestamps requests instead of StreamingResponse
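
A minimal sketch of the transformation-layer changes, using the constant and function names listed above; the actual LiteLLM signatures differ, so treat this as illustrative rather than the real diff:

# Illustrative sketch only; real signatures in litellm differ.
ELEVENLABS_WITH_TIMESTAMPS_KEY = "with_timestamps"

def map_openai_params(non_default_params: dict, optional_params: dict) -> dict:
    # Pull the provider-specific flag out of the request params so it can be
    # forwarded through litellm_params to the URL builder below.
    if ELEVENLABS_WITH_TIMESTAMPS_KEY in non_default_params:
        optional_params[ELEVENLABS_WITH_TIMESTAMPS_KEY] = non_default_params.pop(
            ELEVENLABS_WITH_TIMESTAMPS_KEY
        )
    return optional_params

def get_complete_url(api_base: str, voice_id: str, with_timestamps: bool) -> str:
    # Append the /with-timestamps suffix only when the caller asked for it.
    url = f"{api_base}/v1/text-to-speech/{voice_id}"
    if with_timestamps:
        url += "/with-timestamps"
    return url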

Supporting Changes

  • litellm/types/llms/elevenlabs.py (new file):

    • Add type definitions for ElevenLabsAlignment
    • Add type definitions for ElevenLabsNormalizedAlignment
    • Add ElevenLabsTextToSpeechWithTimestampsResponse TypedDict (sketched after this list)
  • litellm/llms/base_llm/text_to_speech/transformation.py:

    • Update transform_text_to_speech_response return type to support dict
  • litellm/llms/custom_httpx/llm_http_handler.py:

    • Update handler return types to support dict responses
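
The new type definitions mirror the shape of the ElevenLabs /with-timestamps payload. A sketch follows; field names are taken from the ElevenLabs API and may differ slightly from the committed file:

# Sketch of litellm/types/llms/elevenlabs.py; verify against the actual file.
from typing import List, Optional
from typing_extensions import TypedDict

class ElevenLabsAlignment(TypedDict):
    characters: List[str]
    character_start_times_seconds: List[float]
    character_end_times_seconds: List[float]

class ElevenLabsNormalizedAlignment(TypedDict):
    characters: List[str]
    character_start_times_seconds: List[float]
    character_end_times_seconds: List[float]

class ElevenLabsTextToSpeechWithTimestampsResponse(TypedDict):
    audio_base64: str
    alignment: Optional[ElevenLabsAlignment]
    normalized_alignment: Optional[ElevenLabsNormalizedAlignment]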

Tests

  • tests/llm_translation/test_elevenlabs.py:
    • Add test_with_timestamps_parameter
    • Add test_without_timestamps_parameter
    • Add test_transform_response_json
    • Add test_transform_response_binary

Documentation

  • docs/my-website/docs/providers/elevenlabs.md:
    • Add "Text-to-Speech with Timestamps" section
    • Include Python SDK and curl examples
    • Document response format with alignment data

Usage

# Python SDK
import litellm

response = litellm.speech(
    model="elevenlabs/eleven_turbo_v2",
    input="Hello world",
    voice="alloy",
    with_timestamps=True
)
print(response["audio_base64"])
print(response["alignment"])

# curl via LiteLLM Proxy
curl -X POST "http://localhost:4000/v1/audio/speech" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "eleven_turbo_v2", "input": "Hello", "voice": "alloy", "with_timestamps": true}'

Response Format

When with_timestamps=True, the call returns JSON instead of binary audio (example below):

  • audio_base64: Base64-encoded audio data
  • alignment: Character-level start/end times
  • normalized_alignment: Normalized timing data
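
An illustrative payload (shape based on the ElevenLabs /with-timestamps response; values are made up and shortened):

# Rough shape of the returned dict; values are illustrative only.
{
    "audio_base64": "SUQzBAAAAAAA...",
    "alignment": {
        "characters": ["H", "e", "l", "l", "o"],
        "character_start_times_seconds": [0.0, 0.08, 0.14, 0.2, 0.26],
        "character_end_times_seconds": [0.08, 0.14, 0.2, 0.26, 0.35],
    },
    "normalized_alignment": {
        "characters": ["H", "e", "l", "l", "o"],
        "character_start_times_seconds": [0.0, 0.08, 0.14, 0.2, 0.26],
        "character_end_times_seconds": [0.08, 0.14, 0.2, 0.26, 0.35],
    },
}
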
[Screenshot attached: 2025-12-02 at 10:15:03]

vercel bot commented Dec 2, 2025

The latest updates on your projects:

Project: litellm | Deployment: Error | Preview: Error | Updated (UTC): Dec 2, 2025 8:29am

@saar-win changed the title from "Elevenlabs/enrichment" to "feat(elevenlabs): Add with_timestamps support for TTS alignment data" on Dec 2, 2025
  raw_response: httpx.Response,
  logging_obj: LiteLLMLoggingObj,
- ) -> "HttpxBinaryResponseContent":
+ ) -> Union["HttpxBinaryResponseContent", Dict]:
Contributor commented:

@saar-win wouldn't this be non-OpenAI-compatible?

Let me know if there is a scenario where OpenAI returns a dictionary-like object.

@saar-win (Contributor, Author) replied Dec 3, 2025

ElevenLabs supports a with_timestamps option which uses their /with-timestamps endpoint. This returns a JSON dict containing audio_base64 (base64-encoded audio) and an alignment object with character-level timing data. This is a provider-specific extension—OpenAI TTS has no equivalent and always returns binary audio.
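
So existing OpenAI-style callers are unaffected unless they opt in. A rough illustration of how a caller can branch on the two return types (a sketch only, assuming the usual HttpxBinaryResponseContent helpers):

# Only callers that opt in to with_timestamps ever see a dict;
# the default path keeps returning binary audio as before.
import litellm

resp = litellm.speech(
    model="elevenlabs/eleven_turbo_v2",
    input="Hello world",
    voice="alloy",
    with_timestamps=True,
)

if isinstance(resp, dict):
    # Provider-specific JSON: base64 audio plus character-level timing.
    alignment = resp["alignment"]
else:
    # OpenAI-compatible binary audio response.
    resp.stream_to_file("speech.mp3")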

@krrishdholakia (Contributor) commented

  • @Sameerlite can you monitor this PR and ensure we get this / something like this in main

@saar-win (Contributor, Author) commented Dec 7, 2025

@krrishdholakia, any update on this?
