PaddleSpeech Architecture

Layout of Source Files

The full directory structure is as follows:

paddlespeech
├── cli
│   ├── asr
│   ├── cls
│   ├── st
│   ├── text
│   └── tts
├── cls
│   ├── exps
│   │   └── panns
│   │       └── deploy
│   └── models
│       └── panns
├── s2t
│   ├── decoders
│   │   ├── beam_search
│   │   ├── ctcdecoder
│   │   │   └── tests
│   │   └── scorers
│   ├── exps
│   │   ├── deepspeech2
│   │   │   └── bin
│   │   │       └── deploy
│   │   ├── lm
│   │   │   └── transformer
│   │   │       └── bin
│   │   ├── u2
│   │   │   └── bin
│   │   ├── u2_kaldi
│   │   │   └── bin
│   │   └── u2_st
│   │       └── bin
│   ├── frontend
│   │   ├── augmentor
│   │   └── featurizer
│   ├── io
│   ├── models
│   │   ├── ds2
│   │   ├── ds2_online
│   │   ├── lm
│   │   ├── u2
│   │   └── u2_st
│   ├── modules
│   ├── training
│   │   ├── extensions
│   │   ├── triggers
│   │   └── updaters
│   ├── transform
│   └── utils
├── t2s
│   ├── audio
│   ├── data
│   ├── datasets
│   ├── exps
│   │   ├── fastspeech2
│   │   ├── gan_vocoder
│   │   │   ├── hifigan
│   │   │   ├── multi_band_melgan
│   │   │   ├── parallelwave_gan
│   │   │   └── style_melgan
│   │   ├── new_tacotron2
│   │   ├── speedyspeech
│   │   ├── tacotron2
│   │   ├── transformer_tts
│   │   ├── voice_cloning
│   │   │   └── tacotron2_ge2e
│   │   └── waveflow
│   ├── frontend
│   │   ├── normalizer
│   │   └── zh_normalization
│   ├── models
│   │   ├── fastspeech2
│   │   ├── hifigan
│   │   ├── melgan
│   │   ├── new_tacotron2
│   │   ├── parallel_wavegan
│   │   ├── speedyspeech
│   │   └── transformer_tts
│   ├── modules
│   │   ├── conformer
│   │   ├── predictor
│   │   ├── tacotron2
│   │   └── transformer
│   ├── training
│   │   ├── extensions
│   │   ├── triggers
│   │   └── updaters
│   └── utils
├── text
│   ├── exps
│   │   └── ernie_linear
│   └── models
│       ├── ernie_crf
│       └── ernie_linear
└── vector
    ├── exps
    │   └── ge2e
    └── models

97 directories
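
A listing like the one above can be regenerated from a local checkout, which helps when the layout changes between releases. The following is a minimal sketch, not the tool used to produce the tree above: the helper names `list_subdirectories` and `render_tree` are hypothetical, the path `paddlespeech` assumes the script runs from the repository root, and the directory count it prints may differ from 97 depending on the version checked out.

```python
import os
from pathlib import Path


def list_subdirectories(root):
    """Return every directory below root as a POSIX-style relative path."""
    root = Path(root)
    found = []
    for current, dirnames, _filenames in os.walk(root):
        # Prune hidden directories and bytecode caches in place so that
        # os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d != "__pycache__"
        ]
        for name in dirnames:
            found.append((Path(current) / name).relative_to(root).as_posix())
    # Sort by path components so children always follow their parent.
    return sorted(found, key=lambda p: p.split("/"))


def render_tree(paths):
    """Render relative directory paths as an indented outline."""
    lines = []
    for path in paths:
        parts = path.split("/")
        lines.append("    " * (len(parts) - 1) + parts[-1])
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumption: the current directory is the root of a PaddleSpeech checkout.
    dirs = list_subdirectories("paddlespeech")
    print(render_tree(dirs))
    print(f"{len(dirs)} directories")
```

The outline uses plain indentation rather than box-drawing characters to keep the sketch short; the nesting it shows is the same.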

Speech Tasks and Toolboxes

The speech tasks are as follows:

paddlespeech
├── cls (audio classification/detection, emotion/gender/age recognition, and so on)
├── s2t (speech-to-text tasks, e.g. ASR, ST)
├── t2s (text-to-speech tasks, e.g. TTS, Voice Cloning, Voice Conversion, Music, Singing Voice Synthesis)
├── text (speech-related text tasks, e.g. punctuation restoration, text correction, model-based text front end)
└── vector (speech tasks that require extracting vector features, e.g. Speaker Verification/Identification, Language Identification, Speaker Diarization)

The speech toolboxes are as follows:

paddlespeech
├── cli (CLI toolbox for multiple speech tasks)
└── server (server/client for speech tasks)
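
Which of these subpackages are present depends on the installed release (for example, `server` appears in the toolbox list but not in the directory tree above). The sketch below checks for each one using only the standard library. The subpackage names come from the two listings; the `SUBPACKAGES` table and the `is_available` and `report` helpers are hypothetical names introduced for illustration and are not part of PaddleSpeech.

```python
import importlib.util

# Subpackage names and short descriptions, taken from the listings above.
SUBPACKAGES = {
    "cls": "audio classification and detection",
    "s2t": "speech to text (ASR, ST)",
    "t2s": "text to speech (TTS, voice cloning, ...)",
    "text": "speech-related text tasks",
    "vector": "vector-feature tasks (speaker verification, ...)",
    "cli": "command-line toolbox",
    "server": "server/client toolbox",
}


def is_available(module_name):
    """Return True if module_name can be located on the import path."""
    try:
        # Note: for a dotted name, find_spec imports the parent package,
        # so the first call may be slow if that package has heavy imports.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing parent package;
        # ValueError covers a module with a broken or missing spec.
        return False


def report(package="paddlespeech"):
    """Map each known subpackage name to whether it can be found."""
    return {name: is_available(f"{package}.{name}") for name in SUBPACKAGES}


if __name__ == "__main__":
    for name, present in report().items():
        status = "found" if present else "missing"
        print(f"{name:8s} {status:8s} {SUBPACKAGES[name]}")
```

If PaddleSpeech is not installed at all, every entry is reported as missing rather than raising an error.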
