Live transcription #10
base: main
Conversation
…th custom key bindings and toolbar
…ance interrupt timer functionality
…es and update related templates and tests
…-related commands and cleaning up unnecessary sections
…y and functionality
…in transcription process
- Introduced `live_types.py` to define data structures for audio frames, speech chunks, transcript segments, and dashboard events (see the sketch below).
- Implemented `vad_chunker.py` for voice activity detection and chunking of audio streams.
- Updated README with debugging instructions for live transcription.
- Created `debug_live_transcript.py` for inspecting audio chunks.
- Added `test_live_vad.py` to test VAD functionality with audio capture.
- Enhanced `test_device_manager.py` to include tests for aggregate device detection.
- Developed `test_live_transcriber.py` to validate live transcription events and exports.
- Modified `test_settings.py` to reflect updated default settings.
- Added `test_vad_chunker.py` to ensure the VAD chunker emits chunks correctly.
- Updated `whisper_transcriber.py` to support segment callbacks during transcription.
- Enhanced `batch_processor.py` to allow segment callbacks during batch processing.
- Updated dependency management in `uv.lock` for new packages and versions.
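For orientation, here is a minimal sketch of the shapes these structures appear to have, judging from how the diff below uses them; the actual fields in `live_types.py` may differ:

```python
# Hypothetical reconstruction of live_types.py based on usage in the
# diff (chunk.data / chunk.start / chunk.end, DashboardEvent(type=...,
# payload=...)); not the PR's actual definitions.
from dataclasses import dataclass, field


@dataclass
class SpeechChunk:
    data: bytes    # raw 16-bit mono PCM for one detected speech region
    start: float   # wall-clock start of the chunk, in seconds
    end: float     # wall-clock end of the chunk, in seconds


@dataclass
class TranscriptSegment:
    text: str
    start: float   # absolute start time within the session, in seconds
    end: float     # absolute end time within the session, in seconds


@dataclass
class DashboardEvent:
    type: str      # e.g. "chunk" or "segment"
    payload: dict = field(default_factory=dict)
```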
📦 Package Published to TestPyPI
Version:
🧪 This is a test release. Install with:
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ chirp-notes-ai
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
```python
def _process_chunk(self, chunk: SpeechChunk):
    # Publish a dashboard event for the chunk, then buffer its PCM data.
    self._publish_event("chunk", {"duration": chunk.end - chunk.start})
    self._pcm_buffer.extend(chunk.data)
    self._last_chunk_end = max(self._last_chunk_end, chunk.end)

    self._maybe_transcribe(force=False)

def _publish_event(self, event_type: str, payload: dict):
    event = DashboardEvent(type=event_type, payload=payload)
    try:
        self.event_queue.put_nowait(event)
    except queue.Full:
        # Drop the event rather than block the audio pipeline.
        pass

@staticmethod
def _convert_chunk_to_array(chunk_bytes: bytes) -> np.ndarray:
    if not chunk_bytes:
        return np.array([], dtype=np.float32)
    # Interpret raw bytes as 16-bit PCM and normalize to [-1.0, 1.0).
    pcm = np.frombuffer(chunk_bytes, dtype=np.int16).astype(np.float32)
    if pcm.size == 0:
        return np.array([], dtype=np.float32)
    normalized = pcm / 32768.0
    return np.ascontiguousarray(normalized, dtype=np.float32)

@staticmethod
def _resample_audio(
    audio: np.ndarray, original_rate: int, target_rate: int
) -> np.ndarray:
    if original_rate == target_rate or audio.size == 0:
        return audio
    # Linear-interpolation resampling onto a uniformly spaced grid.
    duration = audio.shape[0] / float(original_rate)
    target_length = max(1, int(round(duration * target_rate)))
    x_old = np.linspace(0, duration, num=audio.shape[0], endpoint=False)
    x_new = np.linspace(0, duration, num=target_length, endpoint=False)
    resampled = np.interp(x_new, x_old, audio)
    return np.ascontiguousarray(resampled.astype(np.float32))

def _maybe_transcribe(self, force: bool):
    if not self._pcm_buffer:
        return

    # Throttle transcription unless forced or the interval has elapsed.
    if not force and self.transcription_interval > 0:
        if (
            self._last_chunk_end - self._last_transcribe_at
            < self.transcription_interval
        ):
            return

    pcm_bytes = bytes(self._pcm_buffer)

    # Write the buffered PCM to a temporary mono 16-bit WAV file.
    with tempfile.NamedTemporaryFile(
        suffix=".wav",
        delete=False,
        dir="/tmp" if Path("/tmp").exists() else None,
    ) as tmp:
        temp_path = Path(tmp.name)
        with wave.open(tmp, "wb") as fh:
            fh.setnchannels(1)
            fh.setsampwidth(2)
            fh.setframerate(self.sample_rate)
            fh.writeframes(pcm_bytes)

    try:
        result = self.transcriber.transcribe_file(
            temp_path,
            fast_mode=True,
            language=self._language,
        )
    finally:
        if temp_path.exists():
            temp_path.unlink(missing_ok=True)

    # Lock onto the detected language once Whisper reports one.
    metadata = result.get("metadata", {})
    if metadata and metadata.get("language") and not self._language:
        self._language = metadata.get("language")

    segments = result.get("segments", [])
    new_segments: list[TranscriptSegment] = []

    max_end = self._last_chunk_end
    for seg in segments:
        text = seg.get("text", "").strip()
        if not text:
            continue
        start = float(seg.get("start", 0.0))
        end = float(seg.get("end", start))

        absolute_start = self._buffer_offset_seconds + start
        absolute_end = self._buffer_offset_seconds + end
```
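As a quick, standalone sanity check of the linear resampler above (not part of the PR; the function body is inlined with concrete rates):

```python
import numpy as np

# One second of a 440 Hz tone at 48 kHz, resampled to 16 kHz using the
# same np.interp scheme as _resample_audio above.
audio = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000).astype(np.float32)
duration = audio.shape[0] / 48_000.0
target_length = max(1, int(round(duration * 16_000)))
x_old = np.linspace(0, duration, num=audio.shape[0], endpoint=False)
x_new = np.linspace(0, duration, num=target_length, endpoint=False)
resampled = np.interp(x_new, x_old, audio).astype(np.float32)

print(resampled.shape)  # (16000,)
```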
Use actual chunk timestamps when computing transcript offsets
The live transcriber builds absolute timestamps from `self._buffer_offset_seconds` (the length of audio already sent to Whisper) and ignores the real start time of each `SpeechChunk`. Because the VAD chunker strips silence before queuing a chunk, `self._buffer_offset_seconds` only advances by speech duration, so any gap between chunks is dropped. For example, if a user speaks for 1 s, stays silent for 5 s, and then speaks again, the second segment will be emitted around 2 s after recording started instead of ~6 s. This causes the dashboard and the exported transcript to drift whenever there is silence. Track the actual wall-clock offset (e.g. the chunk's `start` time) when appending to the buffer, and base `absolute_start`/`absolute_end` on that instead of on accumulated audio length.
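A minimal sketch of one way to implement this, assuming the transcriber keeps a list of per-chunk anchors (the names `_chunk_anchors` and `_to_wall_clock` are illustrative, not from the PR):

```python
# Illustrative sketch: record where each chunk lands in the PCM buffer
# together with its real (wall-clock) start time, then map Whisper's
# buffer-relative timestamps through the nearest preceding anchor.
def _process_chunk(self, chunk: SpeechChunk):
    # 16-bit mono PCM: 2 bytes per sample.
    buffer_pos = len(self._pcm_buffer) / (2.0 * self.sample_rate)
    self._chunk_anchors.append((buffer_pos, chunk.start))
    self._publish_event("chunk", {"duration": chunk.end - chunk.start})
    self._pcm_buffer.extend(chunk.data)
    self._last_chunk_end = max(self._last_chunk_end, chunk.end)
    self._maybe_transcribe(force=False)

def _to_wall_clock(self, buffer_time: float) -> float:
    # Find the last chunk that starts at or before this buffer position
    # and offset from its wall-clock start, so silence stripped by the
    # VAD chunker no longer compresses the timeline.
    anchor_pos, anchor_wall = self._chunk_anchors[0]
    for pos, wall in self._chunk_anchors:
        if pos > buffer_time:
            break
        anchor_pos, anchor_wall = pos, wall
    return anchor_wall + (buffer_time - anchor_pos)

# In _maybe_transcribe, the segment times would then become:
# absolute_start = self._to_wall_clock(start)
# absolute_end = self._to_wall_clock(end)
```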
Useful? React with 👍 / 👎.