Skip to content

PFCCLab/ERNIE4.5-Developer-Resource

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🚀 ERNIE 4.5: The Developer's Resource Guide 🤖

Welcome to the developer resource guide for ERNIE 4.5, a powerful family of open-source models from Baidu. This guide provides all the essential information, links, and code examples to help you get started with deploying ERNIE 4.5 models.

🔗 Quick Links

Resource URL
📝 Blog https://yiyan.baidu.com/blog
📄 Technical Report https://yiyan.baidu.com/blog/publication
🤗 Hugging Face https://huggingface.co/baidu
🔧 ERNIEKit https://github.com/PaddlePaddle/ERNIE
⚡ FastDeploy https://www.modelscope.cn/studios/PaddlePaddle
💡 Baidu AI Studio https://aistudio.baidu.com/
🔅 ModelScope https://www.modelscope.cn/studios/PaddlePaddle

📦 Open Source Models

ERNIE 4.5 is available under the Apache 2.0 License. The open-source release includes 10 models across 3 series, along with code for pre-training, fine-tuning, and inference deployment.

Series Activated Parameters Model Name Suffix Description
0.3B Series ~300 Million -0.3B Lightweight models suitable for local and on-device deployment.
A3B Series ~3 Billion -A3B Efficient models offering a balance of performance and resource usage.
A47B Series ~47 Billion -A47B State-of-the-art models for maximum performance on complex tasks.

🏷️ Naming Conventions:

  • -Base: The foundational pre-trained model.
  • (no suffix): The instruction-tuned chat model.
  • -VL: The Vision-Language multimodal model.
  • Hybrid Thinking: The VL model features a "thinking mode" (controlled by a parameter) that enhances reasoning, alongside a standard non-thinking mode for fast perception.

👩‍💻 Getting Started: Running ERNIE 4.5 Locally

You can run the lightweight ERNIE 4.5 models on your local machine. Below are examples using llama.cpp for general CPU inference and MNN for optimized on-device deployment.

🍎 Example 1: Running with llama.cpp (for ERNIE-4.5-0.3B)

The llama.cpp project supports the ERNIE 4.5 0.3B models, allowing you to run them efficiently on a CPU.

Step 1️⃣: Clone and Build llama.cpp First, get the latest version of llama.cpp which includes support for ERNIE 4.5.

# Clone the repository
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Build the project
mkdir build  
cd build
cmake ..
make

Step 2️⃣: Download the ERNIE 4.5 GGUF Model download the .gguf file.

# Install huggingface_hub
pip install -U huggingface_hub
huggingface-cli download --resume-download unsloth/ERNIE-4.5-0.3B-PT-GGUF --local-dir path/to/dir
# If timeout,use 
export HF_ENDPOINT=https://hf-mirror.com

Step 3️⃣: Run Inference Use the main executable from llama.cpp to run the model.

# Run the model in interactive mode
cd llama.cpp/build/bin
./llama-cli -m /path/to/dir/ERNIE-4.5-0.3B-PT.gguf --jinja -p "Hello, who are you?" -n 128
  • -m: Specifies the path to your GGUF model file.
  • -p: Provides an initial prompt.
  • -n: Sets the number of tokens to generate.

🍏 Example 2: Running with MNN (for ERNIE-4.5-0.3B-PT-MNN)

Reference project: https://huggingface.co/taobao-mnn/ERNIE-4.5-0.3B-PT-MNN, welcome to visit the original author link

MNN is a highly efficient deep learning inference engine, perfect for edge and mobile devices. A 4-bit quantized version of ERNIE 4.5 is available specifically for MNN.

Step 1️⃣: Download the MNN Model You can download the model from Hugging Face or ModelScope.

# Install Hugging Face Hub
pip install -U huggingface_hub
# Download the model files
# shell download
huggingface-cli download --resume-download taobao-mnn/ERNIE-4.5-0.3B-PT-MNN --local-dir path/to/dir
# If timeout,use 
export HF_ENDPOINT=https://hf-mirror.com
# SDK download
from huggingface_hub import snapshot_download
model_dir = snapshot_download('taobao-mnn/ERNIE-4.5-0.3B-PT-MNN')
# git clone
git clone https://www.modelscope.cn/MNN/ERNIE-4.5-0.3B-PT-MNN

Step 2️⃣: Clone and Compile MNN You need to compile the MNN engine from the source with the correct flags to enable LLM support.

# Clone the MNN repository
git clone https://github.com/alibaba/MNN.git
cd MNN

# Create build directory and compile
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j

Step 3️⃣: Run the Demo Use the llm_demo application to run the model.

# Run the MNN demo
./llm_demo /path/to/ERNIE-4.5-0.3B-PT-MNN/config.json prompt.txt

🍊 Example 3: Running with mlx (for ERNIE-4.5-0.3B-PT-bf16)

Reference project: https://huggingface.co/mlx-community/ERNIE-4.5-0.3B-PT-bf16, welcome to visit the original author link

MLX LM is a Python package for generating text and fine-tuning large language models on Apple silicon with MLX.

This model mlx-community/ERNIE-4.5-0.3B-PT-bf16 was converted to MLX format from baidu/ERNIE-4.5-0.3B-PT using mlx-lm version 0.25.2.

Step 1️⃣: Download the mlx Model

# Install Hugging Face Hub
pip install -U huggingface_hub
# Download the model files
# shell download
huggingface-cli download --resume-download mlx-community/ERNIE-4.5-0.3B-PT-bf16 --local-dir path/to/dir
# If timeout,use 
export HF_ENDPOINT=https://hf-mirror.com

Step 2️⃣: Use with mlx

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/ERNIE-4.5-0.3B-PT-bf16")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)

🌍 Developer Ecosystem and Tools

🛠️ Official Toolkits (PaddlePaddle Based)

  • ERNIEKit: An industrial-grade toolkit for the full development lifecycle of ERNIE models. It supports high-performance pre-training, SFT, DPO, LoRA, and quantization (QAT/PTQ).
  • FastDeploy: A production-ready inference and deployment toolkit. It features advanced acceleration (speculative decoding, MTP), comprehensive quantization support, and compatibility with numerous hardware backends (NVIDIA, Kunlunxin, Ascend, etc.).

🤝 Friends of OSS Projects (Third-Party Integrations)

ERNIE 4.5 is being actively integrated into the wider open-source ecosystem. Here is the current status of support in popular projects:

Project Status
transformers Merged 🎉 ! Ernie 0.3B and MoE models are now integrated! Directly usable. ⚙️ (Repo)(PR #39228)
In Progress: Ernie 4.5 VL models #39585 (Draft)
vLLM Merged 🎉 ! Native support for ERNIE 4.5 text models is now available in the main branch. (PR #20220)
Merged 🎉 ! Added ERNIE 4.5 VL Model Support (PR #22514)
Merged 🎉 !: Enable EPLB on ernie4.5-moe (PR #22100)
sglang Merged 🎉 ! ERNIE 4.5 is now supported in sglang, enabling streamlined usage in structured generation and multi-agent orchestration scenarios. (PR #7657)
llama.cpp/ollama Merged 🎉 ! 0.3B models and Ernie4.5 MoE are already supported in llama.cpp — efficient local CPU inference available. (PR #14408)(PR #14746)
ms-swift Merged 🎉 ! Support for ERNIE 4.5 has been integrated, enabling streamlined fine-tuning and inference within the ModelScope ecosystem. (PR #4757)
Open: Add ERNIE VL support (PR #4763)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •