Commit d01c1f0

Add OpenAI-compatible Web API and loudness normalization
Implements a new FastAPI-based web API in the `webapi/` directory, providing an OpenAI-compatible endpoint for TTS generation. Includes:
- API implementation and dependencies.
- Unit tests (`test/test_webapi.py`).
- Documentation (`docs/webapi.md`) and updates to main docs.

Also integrates loudness normalization (based on myshell-ai#221) to improve audio output consistency (`melo/api.py`, `melo/utils.py`).

Additional updates include:
- New Android deployment documentation.
- Training guide and script adjustments.
- Updated requirements.
1 parent 2091453 commit d01c1f0

File tree

14 files changed

+622
-33
lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,9 @@ multilingual_ckpts
66
basetts_outputs_package/
77
build/
88
*.egg-info/
9+
melo/data/Teto/
910

1011
*.zip
12+
*.mp3
13+
*.flac
1114
*.wav

README.md

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,8 @@ Some other features include:
2727
## Usage
2828
- [Use without Installation](docs/quick_use.md)
2929
- [Install and Use Locally](docs/install.md)
30+
- [Deploy on Android](docs/deploy_on_android.md)
31+
- [OpenAI-compatible Web API](docs/webapi.md)
3032
- [Training on Custom Dataset](docs/training.md)
3133

3234
The Python API and model cards can be found in [this repo](https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#python-api) or on [HuggingFace](https://huggingface.co/myshell-ai).

changes.txt

15.5 KB
Binary file not shown.

docs/deploy_on_android.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
1+
# Tutorial: Deploying MeloTTS on Android using Termux + proot Debian + Micromamba
2+
3+
MeloTTS is designed to be lightweight and efficient, making it a viable choice for deployment on Android devices.
4+
5+
This guide details how to install and run MeloTTS on an Android device using Termux, a Debian environment managed by `proot-distro`, and Micromamba for isolated Python environment management.
6+
7+
**Disclaimer:** This setup is primarily for experimental purposes; do not use it in production.
8+
9+
## Prerequisites
10+
11+
1. **Android Device:** A reasonably capable Android phone (a flagship model released after 2022 is recommended).
12+
- **Important for Android 12+ users:** Before installing Termux, you should disable phantom process killing to prevent background processes from being terminated unexpectedly. This requires either root access or ADB with proper permissions. See the [Troubleshooting section](#troubleshooting-and-tips) for detailed instructions.
13+
2. **Termux App:** Installed on your device. Download the latest release from the [official GitHub repository](https://github.com/termux/termux-app/releases)
14+
3. **Internet Connection**
15+
4. **Storage Space:** Sufficient free space for Termux, Debian rootfs, Micromamba environments, Python dependencies (PyTorch), and MeloTTS source/models
16+
5. **Basic Linux Command Line Familiarity:** Helpful
17+
18+
## Step 1: Install and Prepare Termux
19+
20+
1. **Download and Install Termux:** Go to the [Termux GitHub Releases page](https://github.com/termux/termux-app/releases), download the latest `.apk` file appropriate for your device's architecture (usually `arm64-v8a`), and install it. Enable installation from unknown sources in Android settings if needed.
21+
2. **Open Termux.**
22+
3. **Update and upgrade Termux packages:** Run this command and answer `Y` (yes) to any prompts.
23+
```bash
24+
pkg update && pkg upgrade -y
25+
```
26+
4. **Install `proot-distro`, `git`, and `curl`:** `proot-distro` manages Linux distributions, `git` clones repositories, and `curl` downloads files.
27+
```bash
28+
pkg install proot-distro git curl -y
29+
```
30+
5. **Grant Storage Access:** Allows Termux/Debian to access your phone's shared storage.
31+
```bash
32+
termux-setup-storage
33+
```
34+
Confirm the permission request from Android. Shared storage is typically at `~/storage/shared/`.
35+
36+
## Step 2: Install Debian Environment
37+
38+
1. **Install Debian using `proot-distro`:** Downloads the Debian filesystem.
39+
```bash
40+
proot-distro install debian
41+
```
42+
43+
## Step 3: Enter Debian, Install Micromamba and System Dependencies
44+
45+
1. **Log in to Debian:**
46+
```bash
47+
proot-distro login debian
48+
```
49+
Your prompt should change (e.g., `root@localhost:~#`). **All subsequent commands in Steps 3-5 are run inside this Debian environment unless stated otherwise.**
50+
2. **Update Debian's package list and upgrade packages:**
51+
```bash
52+
apt update && apt upgrade -y
53+
```
54+
3. **Install essential build tools and runtime dependencies:**
55+
```bash
56+
yes | apt install build-essential libsndfile1 ffmpeg curl bzip2 git nano mecab libmecab-dev mecab-ipadic-utf8
57+
```
58+
4. **Install Micromamba:** Run the official installation script.
59+
```bash
60+
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
61+
```
62+
*Follow any on-screen instructions. Defaults are usually fine.*
63+
5. **Initialize Shell for Micromamba:** Ensure the `micromamba` command is accessible.
64+
```bash
65+
source ~/.bashrc
66+
```
67+
*(Or exit and re-login to Debian: `exit`, then `proot-distro login debian`)*.
68+
6. **Verify Micromamba Installation:**
69+
```bash
70+
micromamba --version
71+
```
72+
73+
## Step 4: Create Micromamba Environment and Install MeloTTS
74+
75+
1. **Create a dedicated environment:** Use Python 3.10.
76+
```bash
77+
micromamba create -n melotts python=3.10 -c conda-forge -y
78+
```
79+
2. **Activate the environment:** **Crucial step before proceeding.**
80+
```bash
81+
micromamba activate melotts
82+
```
83+
Your prompt should now be prefixed with `(melotts)`.
84+
3. **Clone the MeloTTS Repository:** Navigate to a suitable directory (e.g., `~/`) and clone the repo.
85+
```bash
86+
# Example: Clone into ~/MeloTTS
87+
cd ~
88+
git clone https://github.com/not-hanjo-mei/MeloTTS.git
89+
```
90+
4. **Navigate into the Cloned Directory:**
91+
```bash
92+
cd MeloTTS
93+
```
94+
5. **Install MeloTTS and Dependencies:**
95+
```bash
96+
pip install -e .
97+
```
98+
6. **Download Japanese Dictionary Data (UniDic):**
99+
```bash
100+
python -m unidic download
101+
```
102+
7. ***(Optional) Install eunjeon (for Korean support):***
103+
```bash
104+
pip install eunjeon python-mecab-ko python-mecab-ko-dic
105+
```
106+
8. **Download NLTK tagger:**
107+
```bash
108+
python -m nltk.downloader averaged_perceptron_tagger_eng
109+
```
110+
If the NLTK download fails, try this alternative method:
111+
```bash
112+
# Ensure you're in the MeloTTS directory and have NLTK installed
113+
python webapi/nltk_res.py
114+
```
115+
## Step 5: Use MeloTTS (Inside Activated Environment)
116+
117+
**IMPORTANT:** Ensure the `melotts` environment is active (`micromamba activate melotts`). Check for the `(melotts)` prefix in your prompt.
118+
119+
**Note:** Example scripts and test resources are available in the `test/` directory of the MeloTTS repository. You can use these files (such as `test_base_model_tts_package.py` and various example text files) to quickly verify your installation or experiment with the TTS functionality. See the contents of the `test/` folder for ready-to-use scripts and sample inputs.
120+
121+
For the latest and most detailed usage instructions (including WebUI, CLI, and Python API), please refer to the **Usage** section in [install.md](./install.md#usage).
122+
123+
That section covers how to:
124+
125+
- Launch and use the WebUI
126+
- Use the command-line interface (CLI) for TTS
127+
- Use the Python API for programmatic access
128+
- Find example scripts and test resources
129+
130+
## Step 6: Accessing Output Files from Android (Inside Debian)
131+
132+
1. **Identify File Location:** Run `pwd` in the directory that contains the output file to get its full path (e.g., `~/MeloTTS/`).
133+
2. **Copy Files to Shared Storage:**
134+
* Example: Copy `output.wav` to Downloads folder:
135+
```bash
136+
# Adjust path if needed
137+
cp ./output.wav /sdcard/Download/output.wav
138+
```
139+
3. **Access on Android:** Use a File Manager app.
140+
141+
## Step 7: Exiting and Re-entering (Inside Debian)
142+
143+
1. **To fully exit:** Run `exit` inside Debian to return to Termux, then run `exit` at the Termux prompt or close the app.
144+
2. **To re-enter:**
145+
* Open Termux.
146+
* `proot-distro login debian`
147+
* `micromamba activate melotts`
148+
* `cd ~/MeloTTS` (if needed)
149+
* Run commands.
150+
* Run `micromamba deactivate`, then `exit`, when done.
151+
152+
## Troubleshooting and Tips
153+
154+
* **Check Environment Activation:** Always ensure `(melotts)` prefix is present.
155+
* **MeCab Issues:** If you see "RuntimeError: Could not configure working env. Have you installed MeCab?" during `pip install -e .` or runtime, this could be due to:
156+
- Missing system packages: Ensure `mecab`, `libmecab-dev`, and `mecab-ipadic-utf8` are properly installed via `apt`.
157+
- Python version mismatch: Make sure you created the Micromamba environment with Python 3.10 as specified in [Step 4](#step-4-create-micromamba-environment-and-install-melotts).
158+
* **Phantom Process Killing (Android 12+):** Android 12 and newer versions limit background processes, which can affect Termux. If you experience processes being killed unexpectedly:
159+
- **Using ADB from a PC (Recommended):** Connect your Android device to a PC with ADB installed and run:
160+
```bash
161+
# Disable phantom process killing
162+
adb shell settings put global settings_enable_monitor_phantom_procs 0
163+
164+
# Set max_phantom_processes to maximum value to permanently disable killing of phantom processes
165+
adb shell "/system/bin/device_config put activity_manager max_phantom_processes 2147483647"
166+
```
167+
- Alternatively, for Android 12+, you can disable phantom process killing by running these commands in Termux (not in Debian):
168+
```bash
169+
# Disable phantom process killing
170+
settings put global settings_enable_monitor_phantom_procs 0
171+
172+
# Disable device config sync to prevent settings from being reset
173+
device_config set_sync_disabled_for_tests persistent
174+
175+
# Verify settings
176+
settings get global settings_enable_monitor_phantom_procs
177+
device_config get_sync_disabled_for_tests
178+
```
179+
- These commands require either root access or ADB with proper permissions.
180+
- This is especially important for long-running processes or when multiple processes are spawned.
181+
- For more detailed instructions on disabling phantom process killing, refer to [this comprehensive guide](https://github.com/agnostic-apollo/Android-Docs/blob/master/en/docs/apps/processes/phantom-cached-and-empty-processes.md#commands-to-disable-phantom-process-killing-and-tldr).
182+
* **Performance/RAM:** Expect significant performance and memory limitations on mobile devices.

docs/install.md

Lines changed: 10 additions & 3 deletions
@@ -5,13 +5,16 @@
55
- [Docker Install for Windows and macOS](#docker-install)
66
- [Usage](#usage)
77
- [Web UI](#webui)
8+
- [Web API (OpenAI Compatible)](#web-api-openai-compatible)
89
- [CLI](#cli)
910
- [Python API](#python-api)
1011

1112
### Linux and macOS Install
12-
The repo is developed and tested on `Ubuntu 20.04` and `Python 3.9`.
13+
**Tested Environments:**
14+
- [Original repository](https://github.com/myshell-ai/MeloTTS): Ubuntu 20.04 + Python 3.9
15+
- [This fork](https://github.com/not-hanjo-mei/MeloTTS): Ubuntu 24.04 + Python 3.10 (conda 24.9.2), Debian 12 + Python 3.10 (Micromamba 2.1.0)
1316
```bash
14-
git clone https://github.com/myshell-ai/MeloTTS.git
17+
git clone https://github.com/not-hanjo-mei/MeloTTS.git
1518
cd MeloTTS
1619
pip install -e .
1720
python -m unidic download
@@ -25,7 +28,7 @@ To avoid compatibility issues, for Windows users and some macOS users, we sugges
2528

2629
This could take a few minutes.
2730
```bash
28-
git clone https://github.com/myshell-ai/MeloTTS.git
31+
git clone https://github.com/not-hanjo-mei/MeloTTS.git
2932
cd MeloTTS
3033
docker build -t melotts .
3134
```
@@ -51,6 +54,10 @@ melo-ui
5154
# Or: python melo/app.py
5255
```
5356

57+
### Web API (OpenAI Compatible)
58+
59+
See [webapi.md](./webapi.md) for more details.
60+
5461
### CLI
5562

5663
You may use the MeloTTS CLI to interact with MeloTTS. The CLI may be invoked using either `melotts` or `melo`. Here are some examples:

docs/training.md

Lines changed: 5 additions & 5 deletions
@@ -1,8 +1,9 @@
11
## Training
22

3-
Before training, please install MeloTTS in dev mode and go to the `melo` folder.
4-
```
3+
Before training, install MeloTTS in dev mode along with the required dependencies, then go to the `melo` folder. Note: this training process assumes a standard Linux environment. If you run into installation problems, the [deploy_on_android.md](deploy_on_android.md) guide has additional troubleshooting tips.
4+
```bash
55
pip install -e .
6+
pip install matplotlib==3.5.3
67
cd melo
78
```
89

@@ -16,14 +17,14 @@ path/to/audio_002.wav |<speaker_name>|<language_code>|<text_002>
1617
The transcribed text can be obtained with an ASR model (e.g., [whisper](https://github.com/openai/whisper)). An example metadata file can be found at `data/example/metadata.list`.
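
Each metadata line carries four pipe-separated fields: audio path, speaker name, language code, and text. The following is a minimal parsing sketch of that layout. `parse_metadata_line` and `MetadataEntry` are hypothetical names, not part of the repository's preprocessing code, and stripping whitespace around each field is an assumption.

```python
from typing import NamedTuple


class MetadataEntry(NamedTuple):
    audio_path: str
    speaker: str
    language: str
    text: str


def parse_metadata_line(line: str) -> MetadataEntry:
    """Parse 'path|speaker|language|text' into its four fields.

    Splits at most three times, so any '|' inside the text is kept.
    """
    parts = [p.strip() for p in line.rstrip("\n").split("|", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}")
    return MetadataEntry(*parts)
```

Running a check like this over the metadata file before preprocessing can catch malformed lines early.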
1718

1819
We can then run the preprocessing code:
19-
```
20+
```bash
2021
python preprocess_text.py --metadata data/example/metadata.list
2122
```
2223
A config file `data/example/config.json` will be generated. Feel free to edit some hyper-parameters in that config file (for example, you may decrease the batch size if you encounter a CUDA out-of-memory error).
2324

2425
### Training
2526
The training can be launched by:
26-
```
27+
```bash
2728
bash train.sh <path/to/config.json> <num_of_gpus>
2829
```
2930

@@ -34,4 +35,3 @@ Simply run:
3435
```
3536
python infer.py --text "<some text here>" -m /path/to/checkpoint/G_<iter>.pth -o <output_dir>
3637
```
37-

docs/webapi.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
# MeloTTS Web API
2+
3+
[This fork](https://github.com/not-hanjo-mei/MeloTTS) of MeloTTS provides an OpenAI-compatible web API for text-to-speech conversion, allowing you to use MeloTTS with the OpenAI Python SDK or any other client that supports the OpenAI API format.
4+
5+
## Starting the Web API Server
6+
7+
To start the web API server, run the following command from the MeloTTS root directory:
8+
9+
```bash
10+
python webapi/webapi.py
11+
```
12+
13+
This will start the server on port 18000 by default. Simple API documentation is available at `http://localhost:18000/docs`.
14+
15+
## API Endpoint
16+
17+
The API implements the OpenAI-compatible endpoint for text-to-speech:
18+
19+
```
20+
POST /v1/audio/speech
21+
```
22+
23+
### Request Parameters
24+
25+
| Parameter | Type | Description | Default |
26+
|-----------|------|-------------|--------|
27+
| `model` | string | The model name, accepted for OpenAI compatibility; currently ignored | `"tts-1"` |
28+
| `input` | string | The text to convert to speech | Required |
29+
| `voice` | string | The voice to use, format can be `"lang/speaker"` or just a speaker ID | `"EN/EN-Default"` |
30+
| `response_format` | string | The format of the response (mp3, flac, wav) | `"mp3"` |
31+
| `speed` | float | The speed of the speech | `1.0` |
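
As a rough illustration of how these parameters map onto a request, the sketch below builds a `POST /v1/audio/speech` request with Python's standard library. The helper name `build_speech_request` is hypothetical, and the JSON body shape is assumed from the OpenAI speech endpoint rather than taken from this repository. The port and default values follow the table above.

```python
import json
import urllib.request

# Assumed base URL: the server's default port from this document.
BASE_URL = "http://localhost:18000/v1"


def build_speech_request(text, voice="EN/EN-Default",
                         response_format="mp3", speed=1.0, model="tts-1"):
    """Build (but do not send) a POST request for /v1/audio/speech.

    Hypothetical helper; assumes the server accepts a JSON body shaped
    like the OpenAI speech endpoint.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        BASE_URL + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, passing the returned request to `urllib.request.urlopen` and writing the response bytes to a file should yield the audio in the requested format.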
32+
33+
### Voice Format
34+
35+
The `voice` parameter can be specified in two formats:
36+
37+
1. `"language/speaker"` - e.g., `"EN/EN-Default"`, `"ZH/ZH"`, etc.
38+
2. Just the speaker ID - e.g., `"EN-Default"`, `"ZH"`, etc.
39+
40+
If only the speaker ID is provided, the language will be auto-detected from the input text.
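
The two accepted forms can be told apart by the presence of a slash. The sketch below illustrates that rule. `parse_voice` is a hypothetical helper, not the server's actual code, and returning `None` for the language stands in for "detect it from the input text".

```python
from typing import Optional, Tuple


def parse_voice(voice: str) -> Tuple[Optional[str], str]:
    """Split a voice string into (language, speaker).

    Hypothetical helper: language is None when only a speaker ID is
    given, signalling that the caller should auto-detect it.
    """
    if "/" in voice:
        language, speaker = voice.split("/", 1)
        return language, speaker
    return None, voice
```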
41+
42+
### Supported Languages and Voices
43+
44+
The API supports the following languages and voices:
45+
46+
- English (EN): `EN-Default`, `EN-US`, `EN-BR`, `EN_INDIA`, `EN-AU`
47+
- Spanish (ES): `ES`
48+
- French (FR): `FR`
49+
- Chinese (ZH): `ZH` (supports mixed Chinese and English)
50+
- Japanese (JP): `JP`
51+
- Korean (KR): `KR`
52+
53+
## Example Usage with OpenAI Python SDK
54+
55+
You can use the MeloTTS web API with the OpenAI Python SDK as follows:
56+
57+
```python
58+
# You may want to run this file in a separate environment that has the OpenAI Python SDK installed
59+
60+
from pathlib import Path
61+
import openai
62+
63+
client = openai.OpenAI(api_key="sk-xxx", base_url="http://localhost:18000/v1")
64+
65+
speech_file_path = Path(__file__).parent / "speech.mp3"
66+
67+
with client.audio.speech.with_streaming_response.create(
68+
model="tts-1",
69+
voice="EN/EN-Default",
70+
input="Dirty deeds done dirt cheap.",
71+
) as response:
72+
response.stream_to_file(speech_file_path)
73+
```
74+
75+
## Language Auto-detection
76+
77+
The API includes automatic language detection. If the language is not specified in the `voice` parameter, it will be detected from the input text. If the detected language doesn't match the specified language, the API will use the appropriate model for the detected language.
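
This document does not describe how detection is implemented. Purely as an illustration of the idea, the sketch below guesses a language code from the Unicode script of the text. It is an assumed heuristic, not the server's method: `guess_language` is a hypothetical name, and the approach cannot tell Latin-script languages (EN, ES, FR) apart.

```python
def guess_language(text: str, default: str = "EN") -> str:
    """Guess a language code from the Unicode script of the text.

    Heuristic only: Latin-script languages (EN/ES/FR) cannot be
    distinguished this way, so they fall back to `default`.
    """
    has_han = False
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:    # Hangul syllables
            return "KR"
        if 0x3040 <= code <= 0x30FF:    # Hiragana and Katakana
            return "JP"
        if 0x4E00 <= code <= 0x9FFF:    # CJK unified ideographs
            has_han = True
    # Han characters without any kana are treated as Chinese.
    return "ZH" if has_han else default
```

Because mixed Chinese and English text still contains Han characters, a heuristic of this kind would route it to `ZH`, which matches the mixed-language support listed for that voice.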
78+
79+
## Error Handling
80+
81+
If an error occurs during speech generation, the API returns an HTTP 500 response with details about the failure.
82+
83+
## Notes
84+
85+
- The API automatically selects the appropriate hardware (CPU/GPU) for inference.
86+
- Temporary files are automatically cleaned up after streaming.
87+
- For best performance, specify the language in the `voice` parameter.

melo/api.py

Lines changed: 4 additions & 2 deletions
@@ -121,8 +121,10 @@ def tts_to_file(self, text, speaker_id, output_path=None, sdp_ratio=0.2, noise_s
121121
length_scale=1. / speed,
122122
)[0][0, 0].data.cpu().float().numpy()
123123
del x_tst, tones, lang_ids, bert, ja_bert, x_tst_lengths, speakers
124-
#
125-
audio_list.append(audio)
124+
125+
# Ref:
126+
# https://github.com/myshell-ai/MeloTTS/pull/221
127+
audio_list.append(utils.fix_loudness(audio,self.hps.data.sampling_rate))
126128
torch.cuda.empty_cache()
127129
audio = self.audio_numpy_concat(audio_list, sr=self.hps.data.sampling_rate, speed=speed)
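
The body of `utils.fix_loudness` is not part of this hunk. As a simplified illustration of normalizing each chunk before concatenation, the sketch below scales a chunk to a target RMS level. It is an RMS-based stand-in written with the standard library, not the implementation from the referenced PR. `normalize_rms` and the -20 dBFS target are assumptions.

```python
import math
from typing import List


def normalize_rms(samples: List[float], target_dbfs: float = -20.0) -> List[float]:
    """Scale float samples (nominally in [-1, 1]) to a target RMS level.

    Simplified stand-in for loudness normalization; real loudness
    meters (e.g. ITU-R BS.1770) weight frequencies perceptually.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # silence: nothing to scale
    target_rms = 10.0 ** (target_dbfs / 20.0)
    gain = target_rms / rms
    # Clip to the valid range so a large gain cannot overflow.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Applying the same target to every chunk keeps sentence-level segments at a similar level once they are joined, which is the consistency goal the commit message describes.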
128130

melo/text/chinese_mix.py

Lines changed: 1 addition & 1 deletion
@@ -238,7 +238,7 @@ def _g2p_v2(segments):
238238

239239
text = "NFT啊!chemistry 但是《原神》是由,米哈\游自主, [研发]的一款全.新开放世界.冒险游戏"
240240
text = '我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。'
241-
text = '今天下午,我们准备去shopping mall购物,然后晚上去看一场movie。'
241+
text = '你们有一个好,全世界跑到什么地方,你们比其他的西方记者啊,跑得还快。但是呢,问来问去的问题啊,都 too simple , sometimes naive !'
242242
text = '我们现在 also 能够 help 很多公司 use some machine learning 的 algorithms 啊!'
243243
text = text_normalize(text)
244244
print(text)
