
Chat UI Component - Speech to text Specification

Contents

  1. Overview
  2. User Stories
  3. Functionality
  4. Test Scenarios
  5. Accessibility
  6. Assumptions and Limitations
  7. References

Owned by

CodeX Team

Ivan Petrov

Designer Name

Requires approval from

  • Peer Developer Name | Date:
  • Design Manager Name | Date:

Signed off by

  • Product Owner Name | Date:
  • Platform Architect Name | Date:

Revision History

| Version | Users | Date | Notes |
|---|---|---|---|
| 1 | Ivan Petrov | 14.10.2025 | Initial specification |

Objectives

Add speech-to-text (STT) functionality to the Chat UI Component, allowing users to dictate messages using their voice. The feature supports two STT modes:

  1. Backend Transcription Mode – Audio is streamed via WebSocket/SignalR to a backend service that integrates with a third-party transcription service (Google Speech-to-Text, Vertex AI, etc.). This backend service is provided as a NuGet package. Repository here
  2. Frontend (Web Speech API) Mode – Browser-native transcription handled entirely in the frontend (no server dependency).
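The choice between the two modes could be sketched as follows. This is an illustrative helper under assumed names (`resolveProvider`, `SttConfig`); it is not part of the component's actual API.

```typescript
// Hypothetical helper sketching how the component might choose between the
// two STT modes; all names here are assumptions, not the real implementation.
type SttProvider = 'backend' | 'webspeech';

interface SttConfig {
  serviceProvider: SttProvider;
  serviceUri?: string; // needed for backend mode (SignalR hub endpoint)
}

function resolveProvider(config: SttConfig, hasWebSpeech: boolean): SttProvider {
  // Backend transcription needs a hub endpoint to stream audio to.
  if (config.serviceProvider === 'backend' && config.serviceUri) {
    return 'backend';
  }
  // Otherwise fall back to browser-native Web Speech, if the browser has it.
  if (hasWebSpeech) {
    return 'webspeech';
  }
  throw new Error('No speech-to-text provider available');
}
```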

PoC: https://github.com/IgniteUI/igniteui-webcomponents/pull/1893

Complementary Backend project: https://github.com/IgniteUI/igniteui-speech-to-text-server

Acceptance criteria

Must-haves before the feature can be considered a sprint candidate

  1. Users can record and transcribe voice messages directly in the chat input in real-time.
  2. Developers can configure which STT provider is used.
  3. Transcription output appears in the message input field in real-time.
  4. The system automatically stops on silence timeout.
  5. Works across Chrome, Edge, Safari (Web Speech fallback).
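Criterion 5 implies a feature check: Chrome, Edge, and Safari all expose the Web Speech recognition interface, typically under the `webkitSpeechRecognition` prefix. A minimal sketch (the function name is illustrative; it takes the global object as a parameter so it can be exercised outside a browser):

```typescript
// Minimal feature detection for the Web Speech fallback. Checks both the
// unprefixed and the webkit-prefixed recognition interface.
function webSpeechSupported(globals: Record<string, unknown>): boolean {
  return 'SpeechRecognition' in globals || 'webkitSpeechRecognition' in globals;
}
```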

User Stories

Developer stories:

  • Story 1: As a developer, I want to enable STT via component options so I don’t need to write custom integration code.
  • Story 2: As a developer, I want to choose between backend or frontend transcription providers.

End-user stories:

  • Story 1: As an end-user, I want to dictate a message in the chat box using my microphone.
  • Story 2: As an end-user, I want visual feedback (mic pulse and silence countdown) during recording.
  • Story 3: As an end-user, I want the transcription to stop automatically when I stop speaking and auto-submit the message.
  • Story 4: As an end-user, I want the ability to manually stop the transcription. This should not auto-submit the message, so that it remains available for further editing.

Functionality

3.1. End-User Experience

  • A microphone icon is displayed next to the message input field.
  • Clicking the icon starts recording. The microphone icon is replaced by a stop icon.
  • Visual feedback begins when voice is detected - a pulsing stop icon.
  • Live transcription text appears in the message input field.
  • When silence is detected, a timeout animation is presented (countdown circle). If voice is detected again during the countdown, the countdown resets.
  • When the silence timeout ends or the user clicks stop, recording stops and the transcription is finalized.
  • When transcription finishes due to silence timeout, the message is auto-submitted.
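The silence handling described above (grace period, countdown, reset on voice, auto-stop) can be modeled as a small state machine. The following is an illustrative sketch, not the component's implementation; the class name, constants, and time handling are assumptions.

```typescript
// Illustrative model of the silence behavior: while the user speaks the
// tracker stays in 'recording'; after a grace period of silence the
// countdown animation runs; any detected voice resets the timer; when the
// full timeout elapses, recording auto-stops.
class SilenceTracker {
  private lastVoiceAt: number;

  constructor(
    private readonly graceMs: number,   // e.g. SILENCE_GRACE_PERIOD
    private readonly timeoutMs: number, // e.g. SILENCE_TIMEOUT_MS
    now: number,
  ) {
    this.lastVoiceAt = now;
  }

  voiceDetected(now: number): void {
    // Voice resets the silence countdown.
    this.lastVoiceAt = now;
  }

  state(now: number): 'recording' | 'countdown' | 'stopped' {
    const silentFor = now - this.lastVoiceAt;
    if (silentFor >= this.timeoutMs) return 'stopped';
    if (silentFor >= this.graceMs) return 'countdown';
    return 'recording';
  }
}
```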

3.2. Developer Experience

Frontend setup: Add speech-to-text options in the chat component options:

speakPlaceholder: 'Speak...',
...
speechToText: {
  enable: true,
  lang: 'en-US',
  serviceProvider: 'webspeech', // 'webspeech' | 'backend'
  serviceUri: 'https://localhost:5000/sttHub',
},

| Name | Description | Type | Default | Valid values |
|---|---|---|---|---|
| enable | Enables speech-to-text | Boolean | false | true / false |
| lang | Language for transcription | String | null | e.g. "en-US", "de-DE" |
| serviceProvider | Which transcription provider to use. Requires serviceUri | String | null | "backend" / "webspeech" |
| serviceUri | Backend Hub endpoint (SignalR) | String | null | URL |

3.3. Globalization/Localization

Language setting controls transcription locale.

3.4. Keyboard Navigation

| Keys | Description |
|---|---|

3.5. API

Options

| Name | Description | Type | Default value | Valid values |
|---|---|---|---|---|
| SILENCE_TIMEOUT_MS | Timeout before automatic stop, in ms | Number | 4000 | Any integer ≥ 0 |
| SILENCE_GRACE_PERIOD | Time before the silence countdown animation starts, in ms | Number | 1000 | Any integer < SILENCE_TIMEOUT_MS |
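The constraints in the table above might be enforced with a small validation step. This is a sketch under assumed names (`validateSilenceOptions` is hypothetical):

```typescript
// Sketch of validating the two timing options against the table's
// constraints: the timeout is a non-negative integer, and the grace period
// must be an integer strictly less than the timeout.
function validateSilenceOptions(timeoutMs: number, gracePeriodMs: number): void {
  if (!Number.isInteger(timeoutMs) || timeoutMs < 0) {
    throw new RangeError('SILENCE_TIMEOUT_MS must be an integer >= 0');
  }
  if (!Number.isInteger(gracePeriodMs) || gracePeriodMs >= timeoutMs) {
    throw new RangeError('SILENCE_GRACE_PERIOD must be an integer < SILENCE_TIMEOUT_MS');
  }
}
```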

Methods

| Name | Description | Return type | Parameters |
|---|---|---|---|
| start() | Begin recording and transcription | Promise | language?: string |
| stop() | Stop recording and finalize transcription | void | |
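A hedged usage sketch of the two methods, assuming the chat component exposes them as described in the table (the `SpeechToTextApi` interface and `dictate` helper are illustrative, not real exports):

```typescript
// Minimal shape of the methods table above, plus an example caller.
interface SpeechToTextApi {
  start(language?: string): Promise<void>;
  stop(): void;
}

async function dictate(stt: SpeechToTextApi): Promise<void> {
  // Begin recording; the language parameter overrides the configured `lang`.
  await stt.start('en-US');
  // ... the user speaks; a manual stop finalizes the transcription without
  // auto-submitting the message, per Story 4.
  stt.stop();
}
```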

Events

| Name | Description | Cancelable | Parameters |
|---|---|---|---|
| onPulseSignal | Fired when STT detects voice (in practice, fired when a transcription of that voice is received, for simplification) | No | |
| onStartCountdown | Fired when the silence countdown animation should start | No | { ms: number \| null } |
| onTranscript | Fired when transcription text updates | No | { text: string } |
| onStopInProgress | Fired when the user clicks stop but the service awaits the final transcription result | No | |
| onFinishedTranscribing | Fired when transcription completes | No | { finish: 'auto' \| 'manual' } |
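Wiring the events above could look roughly like this. The `SttEmitter` class is a stand-in for however the component actually dispatches events; the event names and payload shapes are taken from the table, everything else is an assumption.

```typescript
// Illustrative typed event wiring for the events table. Payload types
// mirror the Parameters column; the emitter itself is a sketch.
type FinishMode = 'auto' | 'manual';

interface SttEventMap {
  onTranscript: { text: string };
  onStartCountdown: { ms: number | null };
  onFinishedTranscribing: { finish: FinishMode };
}

class SttEmitter {
  private handlers = new Map<string, Array<(detail: unknown) => void>>();

  on<K extends keyof SttEventMap>(name: K, fn: (detail: SttEventMap[K]) => void): void {
    const list = this.handlers.get(name) ?? [];
    list.push(fn as (detail: unknown) => void);
    this.handlers.set(name, list);
  }

  emit<K extends keyof SttEventMap>(name: K, detail: SttEventMap[K]): void {
    for (const fn of this.handlers.get(name) ?? []) fn(detail);
  }
}
```

A consumer would typically subscribe to onTranscript to mirror live text into the message input, and to onFinishedTranscribing to decide whether to auto-submit ('auto') or leave the text editable ('manual').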

Automation

  • Scenario 1:
  • Scenario 2:

ARIA Support

RTL Support

| Assumptions | Limitation | Notes |
|---|---|---|

References
