
Conversation

@colbytimm
Contributor

No description provided.

…-related commands and cleaning up unnecessary sections
- Introduced `live_types.py` to define data structures for audio frames, speech chunks, transcript segments, and dashboard events.
- Implemented `vad_chunker.py` for voice activity detection and chunking of audio streams.
- Updated README with debugging instructions for live transcription.
- Created `debug_live_transcript.py` for inspecting audio chunks.
- Added `test_live_vad.py` to test VAD functionality with audio capture.
- Enhanced `test_device_manager.py` to include tests for aggregate device detection.
- Developed `test_live_transcriber.py` to validate live transcription events and exports.
- Modified `test_settings.py` to reflect updated default settings.
- Added `test_vad_chunker.py` to ensure VAD chunker emits chunks correctly.
- Updated `whisper_transcriber.py` to support segment callbacks during transcription (a rough sketch follows this list).
- Enhanced `batch_processor.py` to allow segment callbacks during batch processing.
- Updated dependency management in `uv.lock` for new packages and versions.
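
For reference, a minimal sketch of what the segment-callback flow could look like. The field names and the `segment_callback` parameter are illustrative assumptions, not necessarily the exact definitions in `live_types.py` or `whisper_transcriber.py`:

    from dataclasses import dataclass

    @dataclass
    class TranscriptSegment:
        """Illustrative shape only; the real definition lives in live_types.py."""
        start: float  # seconds from the start of the recording
        end: float
        text: str

    def print_segment(segment: TranscriptSegment) -> None:
        # A segment callback receives each segment as soon as it is decoded,
        # so the dashboard (or batch processor) can stream results incrementally.
        print(f"[{segment.start:6.2f}-{segment.end:6.2f}] {segment.text}")

    # Hypothetical wiring; the actual parameter name may differ:
    # transcriber.transcribe_file(audio_path, segment_callback=print_segment)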
@github-actions

📦 Package Published to TestPyPI

Version: 0.1.0.dev47
Repository: testpypi

🧪 This is a test release. Install with:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ chirp-notes-ai


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines +92 to +180
    def _process_chunk(self, chunk: SpeechChunk):
        self._publish_event("chunk", {"duration": chunk.end - chunk.start})
        self._pcm_buffer.extend(chunk.data)
        self._last_chunk_end = max(self._last_chunk_end, chunk.end)

        self._maybe_transcribe(force=False)

    def _publish_event(self, event_type: str, payload: dict):
        event = DashboardEvent(type=event_type, payload=payload)
        try:
            self.event_queue.put_nowait(event)
        except queue.Full:
            pass

    @staticmethod
    def _convert_chunk_to_array(chunk_bytes: bytes) -> np.ndarray:
        if not chunk_bytes:
            return np.array([], dtype=np.float32)
        pcm = np.frombuffer(chunk_bytes, dtype=np.int16).astype(np.float32)
        if pcm.size == 0:
            return np.array([], dtype=np.float32)
        normalized = pcm / 32768.0
        return np.ascontiguousarray(normalized, dtype=np.float32)

    @staticmethod
    def _resample_audio(
        audio: np.ndarray, original_rate: int, target_rate: int
    ) -> np.ndarray:
        if original_rate == target_rate or audio.size == 0:
            return audio
        duration = audio.shape[0] / float(original_rate)
        target_length = max(1, int(round(duration * target_rate)))
        x_old = np.linspace(0, duration, num=audio.shape[0], endpoint=False)
        x_new = np.linspace(0, duration, num=target_length, endpoint=False)
        resampled = np.interp(x_new, x_old, audio)
        return np.ascontiguousarray(resampled.astype(np.float32))

    def _maybe_transcribe(self, force: bool):
        if not self._pcm_buffer:
            return

        if not force and self.transcription_interval > 0:
            if (
                self._last_chunk_end - self._last_transcribe_at
                < self.transcription_interval
            ):
                return

        pcm_bytes = bytes(self._pcm_buffer)

        with tempfile.NamedTemporaryFile(
            suffix=".wav",
            delete=False,
            dir="/tmp" if Path("/tmp").exists() else None,
        ) as tmp:
            temp_path = Path(tmp.name)
            with wave.open(tmp, "wb") as fh:
                fh.setnchannels(1)
                fh.setsampwidth(2)
                fh.setframerate(self.sample_rate)
                fh.writeframes(pcm_bytes)

        try:
            result = self.transcriber.transcribe_file(
                temp_path,
                fast_mode=True,
                language=self._language,
            )
        finally:
            if temp_path.exists():
                temp_path.unlink(missing_ok=True)

        metadata = result.get("metadata", {})
        if metadata and metadata.get("language") and not self._language:
            self._language = metadata.get("language")

        segments = result.get("segments", [])
        new_segments: list[TranscriptSegment] = []

        max_end = self._last_chunk_end
        for seg in segments:
            text = seg.get("text", "").strip()
            if not text:
                continue
            start = float(seg.get("start", 0.0))
            end = float(seg.get("end", start))

            absolute_start = self._buffer_offset_seconds + start
            absolute_end = self._buffer_offset_seconds + end


P1: Use actual chunk timestamps when computing transcript offsets

The live transcriber builds absolute timestamps from self._buffer_offset_seconds (the length of audio already sent to Whisper) and ignores the real start time of each SpeechChunk. Because the VAD chunker strips silence before queuing a chunk, self._buffer_offset_seconds only advances by speech duration, so any gap between chunks is dropped. For example, if a user speaks for 1s, stays silent for 5s and then speaks again, the second segment will be emitted around 2s after recording started instead of ~6s. This causes the dashboard and exported transcript to drift whenever there is silence. Track the actual wall‑clock offset (e.g. the chunk’s start time) when appending to the buffer and base absolute_start/absolute_end on that instead of accumulated audio length.
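
A minimal sketch of one way to follow this suggestion, assuming a small `_chunk_offsets` list is added to the live transcriber; the helper name and bookkeeping below are illustrative, not the PR's code:

    # Sketch: remember the real (wall-clock) start of each chunk as it is buffered,
    # so transcript offsets can be re-anchored even when silence was stripped.

    def _process_chunk(self, chunk: SpeechChunk):
        buffered_seconds = len(self._pcm_buffer) / (2 * self.sample_rate)  # 16-bit mono
        self._chunk_offsets.append((buffered_seconds, chunk.start))  # assumed new list
        self._pcm_buffer.extend(chunk.data)
        self._last_chunk_end = max(self._last_chunk_end, chunk.end)
        self._maybe_transcribe(force=False)

    def _to_absolute(self, buffer_time: float) -> float:
        # Map a Whisper timestamp (relative to the buffered audio) onto the chunk
        # that contains it, then offset from that chunk's real start time.
        # _chunk_offsets would need to be reset whenever the PCM buffer is flushed.
        for buffered_start, wall_start in reversed(self._chunk_offsets):
            if buffer_time >= buffered_start:
                return wall_start + (buffer_time - buffered_start)
        return buffer_time

    # ...then in _maybe_transcribe:
    # absolute_start = self._to_absolute(start)
    # absolute_end = self._to_absolute(end)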

Useful? React with 👍 / 👎.
