fix: Enable Base64 audio input/output handling for API endpoints #2719

neon-aiart · 2025-11-24T03:02:18Z

Pull request checklist

The PR has a proper title. Use Semantic Commit Messages.
Make sure this is ready to be merged into the relevant branch.
Ensure you can run the code you submitted successfully. These submissions will be prioritized for review:

Fix existing bugs reported by user feedback (or you met);
Introduce more convenient user operations.

PR type

Bug fix / new feature

Description

This PR implements essential fixes and extensions to enable robust Base64 audio
data transmission for API endpoints (e.g., infer_convert), directly addressing existing
input failures.

Rationale and Value

This change addresses a critical gap where Base64 handling, despite being implied in API documentation,
was non-functional.

Enabling Extensibility: Allows programs that cannot access the local file system (such as
JavaScript running in a browser) to use RVC's API, significantly boosting integration
potential. This utility is already demonstrated by the 'Neon Spitch Link' UserScript.
Achieving API Functionality: Resolves the issue that prevented Base64 input from being
processed correctly as per API expectations, making the feature fully functional as intended
by the API design.

Detailed Changes

This change revolves around resolving input failures and providing a clean output path.

1. Core Logic Fix (modules.py)

Input Fix: Corrected vc_single to properly handle file-like objects (Base64 input).
Output Feature & Backward Compatibility (Key Point): Added Base64 data as the 3rd return parameter in vc_single. This approach ensures no modification to existing code relying only on the 1st and 2nd return parameters (text info and audio component tuple). This preserves compatibility.
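The output path described above can be sketched roughly as follows. This is a minimal illustration, not the code in modules.py: `encode_wav_data_uri` and `convert_sketch` are hypothetical names, and the sketch uses the standard-library `wave` module with an in-memory buffer instead of `soundfile.write` and a temporary file, purely so it runs without extra dependencies.

```python
import base64
import io
import struct
import wave


def encode_wav_data_uri(samples, sample_rate):
    """Pack 16-bit mono PCM samples into a WAV and return a Base64 data URI."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return "data:audio/wav;base64," + encoded


def convert_sketch(samples, sample_rate):
    """Hypothetical stand-in that mimics the three-element return shape."""
    info = "Success."
    audio = (sample_rate, samples)
    return info, audio, encode_wav_data_uri(samples, sample_rate)


result = convert_sketch([0, 1000, -1000, 0], 16000)
# Call sites that index only the first two elements are unaffected.
info, audio = result[0], result[1]
base64_opt = result[2]
```

Note that index-based access keeps working, while a call site that unpacks the result into exactly two names would need a third target, which is why the vc_multi call site is updated in this PR.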

2. Path Reliability Fix (audio.py)

Implemented glob wildcard search in load_audio to successfully resolve temporary audio file paths containing random suffixes, preventing 'file not found' errors.
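A rough sketch of the wildcard fallback, assuming the lookup happens before the file is opened. `resolve_audio_path` is a hypothetical helper name, and choosing the first sorted match is an assumption of this sketch; the actual load_audio change may select a match differently.

```python
import glob
import os
import tempfile


def resolve_audio_path(file):
    """Return `file` if it exists, else the first match of `<stem>*<ext>`."""
    if os.path.exists(file):
        return file
    stem, ext = os.path.splitext(file)
    # Escape the stem so characters like '[' in the path are matched literally.
    matches = sorted(glob.glob(glob.escape(stem) + "*" + ext))
    if not matches:
        raise FileNotFoundError(file)
    return matches[0]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Simulate a temporary upload whose name gained a random suffix.
    actual = os.path.join(tmp_dir, "input_x7f3.wav")
    open(actual, "wb").close()
    resolved = resolve_audio_path(os.path.join(tmp_dir, "input.wav"))
    resolved_name = os.path.basename(resolved)
```

Because 'file*.wav' can match more than one file, a stricter implementation might reject ambiguous matches rather than pick one.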

3. WebUI Component Update (infer-web.py)

Added the necessary hidden Gradio JSON output component to carry the Base64 data from the backend to the frontend.

What will it affect

Positive: Enables functional Base64 audio input and output for API users.
Minimal: Controlled changes to three core files.
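For API users, consuming the Base64 output could look something like the sketch below. It assumes the payload has the 'data:audio/wav;base64,...' form described in this PR; `decode_wav_data_uri` is a hypothetical helper, and the WAV is generated locally so the example does not depend on a running server.

```python
import base64
import io
import wave


def decode_wav_data_uri(data_uri):
    """Strip the data URI header and return the raw WAV bytes."""
    header, _, payload = data_uri.partition(",")
    if header != "data:audio/wav;base64":
        raise ValueError("unexpected data URI header: " + header)
    return base64.b64decode(payload)


# Build a tiny silent WAV in memory to stand in for an API response.
source = io.BytesIO()
with wave.open(source, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 160)
data_uri = "data:audio/wav;base64," + base64.b64encode(source.getvalue()).decode("ascii")

# Decode the payload and confirm it parses as a WAV file.
wav_bytes = decode_wav_data_uri(data_uri)
with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
    frame_rate = wav_file.getframerate()
    frame_count = wav_file.getnframes()
```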

Screenshot

Please include a screenshot if applicable
- Not applicable (API/Backend feature).

Implement Base64 audio encoding/decoding logic in vc_single

This commit focuses on the core logic within modules.py to fully enable Base64 audio data flow for API endpoints (e.g., infer_convert).

1. Fix Base64 Input Handling (f0_file object):
- Corrects the bug where Base64 input failed because the function expected a file path.
- If f0_file is a file-like object, the code now extracts the path via the '.name' attribute to ensure compatibility with subsequent file-based processing.

2. Implement Base64 Output:
- The converted audio (audio_opt) is now safely written to a temporary WAV file using soundfile.write.
- The WAV content is read and encoded into a Base64 Data URI (data:audio/wav;base64,...).
- The function now returns this Base64 data as the 3rd element in the return tuple, allowing external tools to bypass the Gradio Audio component and receive raw data.

3. Fix vc_multi Compatibility:
- The return value of vc_single was increased from 2 to 3 elements.
- Updated the call site in vc_multi to accept the 3rd element as 'base64_opt' to prevent runtime errors and ensure code clarity in batch conversion mode.
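The input handling in item 1 can be illustrated with a small sketch. `resolve_input_path` is a hypothetical helper, not a function in modules.py; it only shows the idea of accepting either a path string or a file-like object that exposes its path through '.name'.

```python
import os
import tempfile


def resolve_input_path(f0_file):
    """Return a filesystem path for a path string or a file-like object."""
    if f0_file is None:
        return None
    if isinstance(f0_file, (str, os.PathLike)):
        return os.fspath(f0_file)
    # File-like objects (e.g. temporary uploads) expose their path as .name.
    name = getattr(f0_file, "name", None)
    if not isinstance(name, str):
        raise TypeError("expected a path or a file-like object with a .name path")
    return name


with tempfile.NamedTemporaryFile(suffix=".wav") as handle:
    from_object = resolve_input_path(handle)
    from_string = resolve_input_path(handle.name)
```

Both call styles yield the same path, so the file-based processing that follows does not need to know which form the caller used.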

Implement wildcard file search to resolve audio path mismatch

This commit addresses a file path resolution issue in audio.py, specifically when dealing with audio files generated with random suffixes in their filenames (e.g., temporary files).

1. Add glob import: Imports the standard 'glob' module for wildcard searching.

2. Wildcard Path Resolution: If os.path.exists(file) is False, a wildcard search is performed by inserting '*' before the file extension (e.g., 'file.wav' becomes 'file*.wav'). This correctly finds the temporary audio file even if it contains a random suffix, preventing the original 'file not found' error.


neon-aiart added 5 commits November 24, 2025 10:09

feat: Update infer-web.py to include Base64 output component

fda7ec5
