feat(elevenlabs): Add with_timestamps support for TTS alignment data #17344
Conversation
```diff
     raw_response: httpx.Response,
     logging_obj: LiteLLMLoggingObj,
-) -> "HttpxBinaryResponseContent":
+) -> Union["HttpxBinaryResponseContent", Dict]:
```
@saar-win wouldn't this be non-OpenAI-compatible?
Let me know if there is a scenario where OpenAI returns a dictionary-like object.
Attachment: json.json
ElevenLabs supports a with_timestamps option which uses their /with-timestamps endpoint. This returns a JSON dict containing audio_base64 (base64-encoded audio) and an alignment object with character-level timing data. This is a provider-specific extension—OpenAI TTS has no equivalent and always returns binary audio.
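To illustrate the widened return type in the diff above, here is a minimal sketch of how a caller could branch on the result; the import path and handling logic are assumptions, not code from this PR:

```python
from typing import Dict, Union

# Import path assumed; LiteLLM re-exports OpenAI's binary response wrapper.
from litellm.types.llms.openai import HttpxBinaryResponseContent


def handle_tts_result(result: Union[HttpxBinaryResponseContent, Dict]) -> None:
    if isinstance(result, dict):
        # with_timestamps=True path: JSON dict from /with-timestamps.
        audio_b64 = result["audio_base64"]       # base64-encoded audio
        alignment = result.get("alignment", {})  # character-level timing data
        print(f"got {len(audio_b64)} base64 chars, alignment keys: {list(alignment)}")
    else:
        # Default path: binary audio, same as OpenAI TTS.
        result.stream_to_file("speech.mp3")
```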
@krrishdholakia, any update on this?
Title
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on make test-unit
Type
🆕 New Feature
Changes
Add support for ElevenLabs' /with-timestamps endpoint, which returns
character-level timing information alongside the generated audio. This
enables use cases like lip-sync, captions, and audio-text synchronization.
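As a small illustration of the captions use case mentioned above, here is a sketch that groups character-level timings into word-level timings; the alignment field names follow ElevenLabs' documented with-timestamps format and are an assumption here, not part of this PR:

```python
from typing import Dict, List


def words_from_alignment(alignment: Dict) -> List[Dict]:
    """Group character-level alignment into word-level timings for captions."""
    chars = alignment["characters"]
    starts = alignment["character_start_times_seconds"]
    ends = alignment["character_end_times_seconds"]

    words: List[Dict] = []
    current, word_start, word_end = "", 0.0, 0.0
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            if current:
                words.append({"word": current, "start": word_start, "end": word_end})
                current = ""
            continue
        if not current:
            word_start = start
        current += ch
        word_end = end
    if current:
        words.append({"word": current, "start": word_start, "end": word_end})
    return words
```

Feeding it the alignment dict from a with_timestamps response would yield entries like {"word": "Hello", "start": 0.0, "end": 0.33}.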
Core Implementation
- litellm/llms/elevenlabs/text_to_speech/transformation.py (see the sketch after this list)
- litellm/main.py
- litellm/proxy/proxy_server.py
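For orientation only, a rough, hypothetical sketch of the kind of change transformation.py needs; the class and method names below are illustrative assumptions, not the actual diff:

```python
from typing import Any, Dict


class ElevenLabsTextToSpeechConfigSketch:
    BASE_URL = "https://api.elevenlabs.io/v1"

    def get_complete_url(self, voice_id: str, optional_params: Dict) -> str:
        # Route to the /with-timestamps variant of the TTS endpoint when requested.
        url = f"{self.BASE_URL}/text-to-speech/{voice_id}"
        if optional_params.get("with_timestamps"):
            url += "/with-timestamps"
        return url

    def transform_response(self, raw_response: Any, optional_params: Dict) -> Any:
        # /with-timestamps responds with JSON (audio_base64 + alignment);
        # the regular endpoint responds with raw audio bytes.
        if optional_params.get("with_timestamps"):
            return raw_response.json()
        return raw_response  # wrapped as HttpxBinaryResponseContent upstream
```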
Supporting Changes
- litellm/types/llms/elevenlabs.py (new file)
- litellm/llms/base_llm/text_to_speech/transformation.py
- litellm/llms/custom_httpx/llm_http_handler.py
Tests
Documentation
Usage
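The usage snippet from the PR body did not survive extraction; the following is a sketch based on the description above, assuming with_timestamps is passed through litellm.speech as an extra provider-specific parameter (the model name and voice ID are placeholders):

```python
import base64

import litellm

# Regular call: returns binary audio (HttpxBinaryResponseContent).
audio = litellm.speech(
    model="elevenlabs/eleven_multilingual_v2",
    voice="21m00Tcm4TlvDq8ikWAM",  # placeholder ElevenLabs voice ID
    input="Hello from LiteLLM",
)
audio.stream_to_file("speech.mp3")

# With timestamps: per this PR, the ElevenLabs handler returns a JSON dict instead.
result = litellm.speech(
    model="elevenlabs/eleven_multilingual_v2",
    voice="21m00Tcm4TlvDq8ikWAM",
    input="Hello from LiteLLM",
    with_timestamps=True,  # provider-specific flag added by this PR
)
with open("speech.mp3", "wb") as f:
    f.write(base64.b64decode(result["audio_base64"]))
print(result["alignment"])  # character-level timing data
```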
Response Format
When with_timestamps=True, the call returns a JSON object instead of binary audio:
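The concrete example from the PR body was stripped during extraction; based on the response described in the conversation above, the shape is roughly as follows (field names inside alignment follow ElevenLabs' documented format and are shown here as an assumption, with illustrative values):

```python
# Approximate response shape; values are illustrative, not real output.
response = {
    "audio_base64": "SUQzBAAAAAAA...",  # base64-encoded audio
    "alignment": {
        "characters": ["H", "e", "l", "l", "o"],
        "character_start_times_seconds": [0.0, 0.08, 0.14, 0.19, 0.25],
        "character_end_times_seconds": [0.08, 0.14, 0.19, 0.25, 0.33],
    },
}
```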