
Enterprise AI

1. Overall

Enterprise AI includes XIM (Xeon Inference Microservice) and a scalable cloud-native framework, which is part of OPEA (Open Platform for Enterprise AI).

Xeon Inference Microservice (XIM) is a scalable, stateless container service that exposes standard RESTful APIs. It allows Intel accelerators to optimize the inference engine and customized models for AIGC workloads.

| Layer name | Description |
| --- | --- |
| Accelerators | A XIM can be optimized by any of the Intel accelerators, such as AMX, VNNI, and AVX-512. |
| Optimized Engine | Intel provides multiple engines for different purposes, such as oneAPI, xFT, and IPEX. |
| Models | A model can be customized into the xFT format with different quantizations, such as BF16, INT8, and FP4. |
| Microservices | Container services with a stateless design to support scalable orchestration. |
| API | LangChain/LlamaIndex and existing vendors such as OpenAI provide industry-standard RESTful APIs to expose the service. |

Please refer here for more details.
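As a sketch of the API layer described above, the snippet below builds an OpenAI-compatible chat-completion request for a XIM endpoint. The URL, port, and model name are hypothetical placeholders that depend on your deployment, not values taken from this repository.

```python
import json

# Hypothetical XIM endpoint -- adjust host/port to your deployment.
XIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "chatglm2-6b") -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

# The payload can then be POSTed to XIM_URL with any HTTP client.
payload = build_chat_request("Summarize the attached meeting notes.")
print(json.dumps(payload, indent=2))
```

Because the request shape follows the OpenAI convention, the same client code can target a local XIM or a hosted vendor API by changing only the URL.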

2. Business Pipeline

2.1 ChatBot Pipeline

2.2 Meeting Summary Pipeline

For more business pipelines, please refer to OPEA's GenAIExamples.

3. XIM (Xeon Inference Microservice)

| Name | Description | Registry |
| --- | --- | --- |
| ASR (whisper) | Automatic speech recognition | registry.cn-hangzhou.aliyuncs.com/kenplusplus/whisper-server |
| ASR + Diarize (whisperx) | Speech recognition + speaker recognition | registry.cn-hangzhou.aliyuncs.com/kenplusplus/whisperx-server |
| ASR (faster-whisper) | Accelerated ASR | registry.cn-hangzhou.aliyuncs.com/kenplusplus/faster-whisper-server |
| FastChat | AMX-optimized, IPEX-based LLM | registry.cn-hangzhou.aliyuncs.com/kenplusplus/fastchat-server |
| TTS (OpenVoice) | Text to speech | registry.cn-hangzhou.aliyuncs.com/kenplusplus/openvoice-server |
| TTS (OpenTTS) | Text to speech | registry.cn-hangzhou.aliyuncs.com/kenplusplus/opentts-server |
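As an illustration of calling one of the ASR microservices above, the sketch below assembles a transcription request. The endpoint path, port, and form-field names are assumptions modeled on OpenAI's audio-transcription API, not values taken from this repository.

```python
# Hypothetical client sketch for the whisper-server container above.
# Endpoint and field names mirror OpenAI's transcription API (assumed).
ASR_URL = "http://localhost:9000/v1/audio/transcriptions"

def build_transcription_request(audio_path: str,
                                model: str = "whisper-small") -> dict:
    """Assemble the pieces of a multipart transcription request."""
    return {
        "url": ASR_URL,
        "files": {"file": audio_path},   # audio payload to upload
        "data": {"model": model},        # requested ASR model
    }

req = build_transcription_request("meeting.wav")
```

The resulting dictionary maps directly onto the arguments of an HTTP client's multipart POST (e.g. `requests.post(req["url"], files=..., data=...)`).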

The following models are used:

| Name | Size | Microservice | Description |
| --- | --- | --- | --- |
| THUDM/chatglm2-6b | 12G | FastChat | LLM model |
| Trelis/Llama-2-7b-chat-hf-shared-bf16 | 25G | FastChat | LLM model using BF16 for AMX |
| lmsys/vicuna-7b-v1.3 | 13.5G | FastChat | LLM model using INT8 for VNNI |
| Systran/faster-whisper-tiny | 75M | faster-whisper | Speech recognition model |
| pyannote/speaker-diarization-3.1 | 14M | whisperx-server | Speaker diarization |
| pyannote/segmentation-3.0 | 5.8M | whisperx-server | Speech segmentation |
| jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn | 2.4G | whisperx-server | Chinese speech-to-vector |
| pyannote/wespeaker-voxceleb-resnet34-LM | 51M | whisperx-server | Speaker embedding extraction |
| silero-vad | 17M | openvoice-server | Voice activity detector |
| whisper (small) | 244M | whisper-server | OpenAI Whisper model |
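For deployment or pre-fetch scripts, the model-to-service mapping in the table can be captured as a small lookup. The model and service names are copied from the table; the helper function itself is illustrative, not part of the repository.

```python
# Microservice -> models mapping, copied from the table above.
XIM_MODELS = {
    "FastChat": [
        "THUDM/chatglm2-6b",
        "Trelis/Llama-2-7b-chat-hf-shared-bf16",
        "lmsys/vicuna-7b-v1.3",
    ],
    "faster-whisper": ["Systran/faster-whisper-tiny"],
    "whisperx-server": [
        "pyannote/speaker-diarization-3.1",
        "pyannote/segmentation-3.0",
        "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn",
        "pyannote/wespeaker-voxceleb-resnet34-LM",
    ],
    "openvoice-server": ["silero-vad"],
    "whisper-server": ["whisper (small)"],
}

def models_for(service: str) -> list:
    """Return the models a given microservice loads (empty if unknown)."""
    return XIM_MODELS.get(service, [])

print(models_for("whisperx-server"))
```

A pre-fetch step could iterate over `models_for(service)` and download each entry from the Hugging Face Hub before the container starts.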

4. Business Pipeline Orchestration

4.1 Flowise

TBD

4.2 Dify

TBD

5. Cloud Native Services Orchestration

TBD

5.1 Scalability for Concurrency

TBD

5.2 Sustainability

TBD

5.3 Confidentiality

TBD

6. Deployment

TBD
