The Enterprise AI includes XIM (Xeon Inference Microservice) and scalable cloud native framework which is part of OPEA(Open Platform Enterprise AI).
Xeon Inference Microservice (XIM) is a scalable and stateless container service exposing standard resful APIs. It allow Intel accelerators to optimize the inference engine and customized model for AIGC workload.
Layer name | Description |
---|---|
Accelerators | A XIM could be optimized by any of Intel Accelerators like AMX/VNNI/AVX512 etc |
Optimized Engine | Intel provide many engine for different purposes like OneAPI, xFT, IPEX |
Models | A model can be customized for xFT format in different Quantization like BF16/INT8/FP4 etc |
Microservices | A container services with stateless design to support scalable ochrestartion |
API | LangChain/LlamaIndex and existing vendor like OpenAI provide industrial standard restfule API to expsoe service |
Please refer here for more details.
More Business pipeline please refer to OPEA's GenAIExamples
Name | Description | Registry |
---|---|---|
ASR (whisper) | Auto Speech Recognition | registry.cn-hangzhou.aliyuncs.com/kenplusplus/whisper-server |
ASR + Diarize (whisperx) | Speech Recognition + Speaker Recognition | registry.cn-hangzhou.aliyuncs.com/kenplusplus/whisperx-server |
ASR (fast-whisper) | Accelerated ASR | registry.cn-hangzhou.aliyuncs.com/kenplusplus/faster-whisper-server |
FastChat | AMX opted IPEX based LLM | registry.cn-hangzhou.aliyuncs.com/kenplusplus/fastchat-server |
TTS (OpenVoice) | Text to Speech | registry.cn-hangzhou.aliyuncs.com/kenplusplus/openvoice-server |
TTS (OpenTTS) | Text to Speech | registry.cn-hangzhou.aliyuncs.com/kenplusplus/opentts-server |
Following models are used:
Name | Size | Micro Services | Description |
---|---|---|---|
THUDM/chatglm2-6b | 12G | FastChat | LLM model |
Trelis/Llama-2-7b-chat-hf-shared-bf16 | 25G | FastChat | LLM model using BF16 for AMX |
lmsys/vicuna-7b-v1.3 | 13.5G | FastChat | LLM model using INT8 for VNNI |
Systran/faster-whisper-tiny | 75M | faster-whisper | Speech Recognition model |
pyannote/speaker-diarization-3.1 | 14M | whisperx-server | Speaker Diarize |
pyannote/segmentation-3.0 | 5.8M | whisperx-server | Speech Segmentation |
jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn | 2.4G | whisperx-server | Chinese Speech to vector |
pyannote/wespeaker-voxceleb-resnet34-LM | 51M | whisperx-server | Extract embedding |
silero-vad | 17M | openvoice-server | Voice Activity Detector |
whisper(small) | 244M | whisper-server | OpenAI whisper model |
TBD
TBD
TBD
TBD
TBD
TBD
TBD