@neon-aiart

Pull request checklist

  • The PR has a proper title. Use Semantic Commit Messages.

  • Make sure this is ready to be merged into the relevant branch.

  • Ensure that the code you submitted runs successfully. The following submissions will be prioritized for review:

    Fix existing bugs reported in user feedback (or that you encountered);
    Introduce more convenient user operations.

PR type

  • Bug fix / new feature

Description

This PR implements essential fixes and extensions to enable robust Base64 audio
data transmission for API endpoints (e.g., infer_convert), directly addressing existing
input failures.

Rationale and Value

This change addresses a critical gap where Base64 handling, despite being implied in API documentation,
was non-functional.

  1. Enabling Extensibility: Allows programs that cannot access local file systems (such as
    JavaScript running in a browser) to utilize RVC's API, significantly boosting integration
    potential. Its utility has already been demonstrated by the 'Neon Spitch Link' UserScript.
  2. Achieving API Functionality: Resolves the issue that prevented Base64 input from being
    processed correctly as per API expectations, making the feature fully functional as intended
    by the API design.

Detailed Changes

This change revolves around resolving input failures and providing a clean output path.

1. Core Logic Fix (modules.py)

  • Input Fix: Corrected vc_single to properly handle file-like objects (Base64 input).
  • Output Feature & Backward Compatibility (Key Point): Added the Base64 data as the 3rd return value of vc_single. Code that reads only the 1st and 2nd return values (the text info and the audio component tuple) by index needs no modification; call sites that unpack exactly two values must be updated to accept the third, as was done for vc_multi.
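The input fix can be illustrated with two hypothetical helpers (`data_uri_to_tempfile` and `to_path` are illustrative names, not functions from this PR). The sketch assumes that a Base64 payload reaches vc_single as a file-like object backed by a temporary file, which in the real request path is presumably done by the framework rather than by RVC code; the PR's change corresponds to the `.name` lookup in `to_path`.

```python
import base64
import os
import tempfile


def data_uri_to_tempfile(data_uri, suffix=".wav"):
    """Decode a 'data:<mime>;base64,<payload>' URI into a closed named temp file."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    with tmp:  # closes the handle; delete=False keeps the file on disk
        tmp.write(base64.b64decode(payload))
    return tmp  # file-like object whose .name is the on-disk path


def to_path(f0_file):
    """Return a filesystem path for a path string/PathLike or a file-like object."""
    if f0_file is None:
        return None
    if isinstance(f0_file, (str, os.PathLike)):
        return os.fspath(f0_file)
    name = getattr(f0_file, "name", None)  # file-like objects expose their path here
    if isinstance(name, str):
        return name
    raise TypeError(f"cannot derive a path from {type(f0_file).__name__}")
```

For example, `to_path(data_uri_to_tempfile(uri))` yields a plain path that downstream file-based code can open, while callers that already pass a path string are unaffected.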

2. Path Reliability Fix (audio.py)

  • Implemented glob wildcard search in load_audio to successfully resolve temporary audio file paths containing random suffixes, preventing 'file not found' errors.

3. WebUI Component Update (infer-web.py)

  • Added the necessary hidden Gradio JSON output component to carry the Base64 data from the backend to the frontend.
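A minimal sketch of how such a Base64 payload can be produced, mirroring the PR's description (write the audio to a temporary WAV file, read it back, and encode it as a `data:audio/wav;base64,...` URI). Assumptions: the audio is mono 16-bit PCM, the helper name `audio_to_data_uri` is hypothetical, and the standard-library `wave` module stands in for `soundfile.write` so the sketch has no third-party dependencies.

```python
import array
import base64
import os
import sys
import tempfile
import wave


def audio_to_data_uri(samples, sample_rate):
    """Encode mono 16-bit PCM samples as a 'data:audio/wav;base64,...' URI."""
    pcm = array.array("h", samples)  # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open reopens the file by name
    try:
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
            wf.setframerate(sample_rate)
            wf.writeframes(pcm.tobytes())
        with open(path, "rb") as f:
            wav_bytes = f.read()
    finally:
        os.remove(path)  # the temporary file is only a staging area
    return "data:audio/wav;base64," + base64.b64encode(wav_bytes).decode("ascii")
```

Writing into an `io.BytesIO` buffer instead of a temporary file would avoid disk I/O; the temporary file here only mirrors the approach described in the PR.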

What will it affect

  • Positive: Enables functional Base64 audio input and output for API users.
  • Minimal scope: changes are limited to three core files.

Screenshot

  • Please include a screenshot if applicable
    • Not applicable (API/Backend feature).

Implement Base64 audio encoding/decoding logic in vc_single

This commit focuses on the core logic within modules.py to fully enable Base64 audio 
data flow for API endpoints (e.g., infer_convert).

1. Fix Base64 Input Handling (f0_file object):
   - Corrects the bug where Base64 input failed because the function expected a file path.
   - If f0_file is a file-like object, the code now extracts the path via the '.name' attribute 
     to ensure compatibility with subsequent file-based processing.

2. Implement Base64 Output:
   - The converted audio (audio_opt) is now safely written to a temporary WAV file using soundfile.write. 
   - The WAV content is read and encoded into a Base64 Data URI (data:audio/wav;base64,...).
   - The function now returns this Base64 data as the 3rd element in the return tuple, 
     allowing external tools to bypass the Gradio Audio component and receive raw data.

3. Fix vc_multi Compatibility:
   - vc_single now returns 3 elements instead of 2.
   - Updated the call site in vc_multi to accept the 3rd element as 'base64_opt' 
     to prevent runtime errors and ensure code clarity in batch conversion mode.
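The compatibility issue can be sketched with a hypothetical stub standing in for vc_single (`vc_single_stub` and `batch_convert` are illustrative names, not RVC code). Index-based access keeps working after a third value is added, but unpacking into exactly two names fails, which is why the vc_multi call site needed the extra `base64_opt` target.

```python
def vc_single_stub():
    """Hypothetical stand-in for vc_single after this PR: returns three values."""
    info = "Success."
    audio_tuple = (16000, [0, 1, 2])  # (sample_rate, audio samples)
    base64_opt = "data:audio/wav;base64,UklGRg=="  # placeholder; payload decodes to b"RIFF"
    return info, audio_tuple, base64_opt


def batch_convert(n):
    """Sketch of a vc_multi-style loop that unpacks all three return values."""
    infos = []
    for _ in range(n):
        # Unpacking into two names (info, opt = ...) would raise
        # "ValueError: too many values to unpack" now that a third value is returned.
        info, opt, base64_opt = vc_single_stub()
        infos.append(info)
    return infos


# Index-based access is unaffected by the extra element.
result = vc_single_stub()
print(result[0], result[1][0])
```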

Implement wildcard file search to resolve audio path mismatch

This commit addresses a file path resolution issue in audio.py, specifically when 
dealing with audio files generated with random suffixes in their filenames (e.g., temporary files).

1. Add glob import: Imports the standard 'glob' module for wildcard searching.
2. Wildcard Path Resolution: If os.path.exists(file) is False, a wildcard search is performed 
   by inserting '*' before the file extension (e.g., 'file.wav' becomes 'file*.wav').
   This correctly finds the temporary audio file even if it contains a random suffix, 
   preventing the original 'file not found' error.
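A minimal sketch of the wildcard fallback described above, assuming the random suffix sits between the file stem and the extension. `resolve_audio_path` is a hypothetical name; using `glob.escape` (so brackets or `*` in the stem match literally) and `sorted` (so the choice is deterministic) are choices of this sketch, not necessarily what audio.py does.

```python
import glob
import os


def resolve_audio_path(file):
    """Return `file` if it exists, else the first match for '<stem>*<ext>'."""
    if os.path.exists(file):
        return file
    stem, ext = os.path.splitext(file)
    # 'file.wav' -> 'file*.wav'; glob.escape makes special characters in the stem literal
    pattern = glob.escape(stem) + "*" + ext
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(file)
    return matches[0]
```

If several files match, the sketch returns the first in sorted order; a stricter variant could raise an error when the match is ambiguous.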