Release v3.7.0 · modelscope/ms-swift

New Features

GRPO
a. Added support for the GSPO algorithm. Use --importance_sampling_level sequence during GRPO training. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/GSPO.html
b. GRPO server mode now supports multi-node rollout: pass multiple hosts and ports through vllm_server_host / vllm_server_port. Example script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/multi_node/server_multi_node.sh
c. GRPO rollout is now compatible with the GYM environment specification (thanks to contributor Mouse). Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/DeveloperGuide/gym_env.html
d. GRPO supports entropy_mask, which excludes low-entropy tokens from the loss computation; the logger now also records entropy dynamics. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/entropy_mask.html
e. Added support for training with the multi-turn DeepEyes algorithm. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/deepeyes.html
f. GRPO supports --truncation_strategy delete: remove samples whose input length exceeds max_length and resample.
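The two loss-side GRPO changes above, GSPO (item a) and entropy_mask (item d), can be illustrated with a minimal plain-Python sketch. This is not ms-swift's implementation. The sequence ratio follows the length-normalized definition that GSPO uses. The entropy filter keeps a top fraction of tokens by entropy; `top_quantile` is a hypothetical name chosen for illustration, not a documented ms-swift argument.

```python
import math


def token_level_ratios(logp_new, logp_old):
    """GRPO-style ratios: one pi_new / pi_old value per response token."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_level_ratio(logp_new, logp_old):
    """GSPO-style ratio: one length-normalized value for the whole response.

    (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
    == exp(mean over tokens of (log pi_new - log pi_old))
    """
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_mask(entropies, top_quantile=0.2):
    """Return a 0/1 mask that keeps only the highest-entropy tokens.

    Hypothetical rule: keep the top `top_quantile` fraction of tokens by
    entropy (ties at the threshold are kept). Masked tokens contribute
    nothing to the policy loss.
    """
    k = max(1, math.ceil(len(entropies) * top_quantile))
    threshold = sorted(entropies, reverse=True)[k - 1]
    return [1 if h >= threshold else 0 for h in entropies]


if __name__ == "__main__":
    logp_old = [-1.0, -0.5, -2.0]
    logp_new = [-0.9, -0.6, -1.7]
    print(token_level_ratios(logp_new, logp_old))    # three per-token ratios
    print(sequence_level_ratio(logp_new, logp_old))  # about exp(0.1)
    print(entropy_mask([0.1, 2.0, 0.5, 1.5, 0.05], top_quantile=0.4))  # [0, 1, 0, 1, 0]
```

The sequence ratio averages log-ratios over the whole response. As a result, one outlier token shifts it far less than that token's own per-token ratio shifts.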
Megatron-SWIFT
a. Added LoRA training (CPT/SFT/DPO) to significantly accelerate MoE training.
- Docs: https://swift.readthedocs.io/en/latest/Instruction/Megatron-SWIFT-Training.html#lora-training
- Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/lora
b. Added loss_scale support to simplify Agent training. Script: https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/loss_scale.sh
c. Default megatron-core upgraded to 0.13.
d. Added bshd tensor format to facilitate custom attention_mask.
e. Logging improvements: GPU memory usage and estimated remaining training time are now printed, and training logs are written to logging.jsonl.
f. Faster model loading & conversion plus a progress bar.
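The loss scaling in item b can be pictured as a per-token weight on the training loss. The sketch below is a hedged illustration, not Megatron-SWIFT code. The role names, the weights, and the choice to normalize by the number of tokens with a non-zero weight are all assumptions made for the example.

```python
import math


def build_loss_scale(segments, role_weights):
    """Expand (role, n_tokens) segments into one weight per token.

    role_weights maps a role name to its weight; both the roles and the
    weights used below are hypothetical.
    """
    scale = []
    for role, n_tokens in segments:
        scale.extend([role_weights[role]] * n_tokens)
    return scale


def scaled_nll(token_logprobs, loss_scale):
    """Weighted negative log-likelihood over target tokens.

    A weight of 0 drops a token from the loss; weights above 1 emphasize it.
    Normalizing by the count of non-zero-weight tokens is an assumption.
    """
    if len(token_logprobs) != len(loss_scale):
        raise ValueError("loss_scale needs exactly one weight per token")
    active = sum(1 for w in loss_scale if w != 0)
    if active == 0:
        return 0.0
    total = sum(-lp * w for lp, w in zip(token_logprobs, loss_scale))
    return total / active


if __name__ == "__main__":
    # Toy agent turn: prompt and tool output are ignored, the tool call counts double.
    segments = [("user", 2), ("assistant", 2), ("tool_call", 1), ("tool_output", 1)]
    weights = {"user": 0.0, "assistant": 1.0, "tool_call": 2.0, "tool_output": 0.0}
    scale = build_loss_scale(segments, weights)
    logprobs = [math.log(0.5)] * len(scale)
    print(scale)                        # [0.0, 0.0, 1.0, 1.0, 2.0, 0.0]
    print(scaled_nll(logprobs, scale))  # equals 4 * log(2) / 3, about 0.924
```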
Training
a. Added Flash-Attention-3 support (including Megatron-SWIFT). Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/flash_attention_3
b. New --new_special_tokens flag for adding special tokens. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/new_special_tokens
c. New --cached_dataset flag for offline tokenization in CPT/SFT. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset
d. Refactored the sequence-packing module: packing is faster, and disk storage for multimodal packing is optimized.
e. Qwen2.5-VL can now be trained with DeepSpeed on mixed-modality data (multiple modalities within a single sample).
f. Multimodal model training now supports loss_scale.
g. rope_scaling now accepts a dict; setting max_model_len also adjusts the rope_scaling factor automatically.
h. Added DeepSpeed-AutoTP (not compatible with LoRA).
i. Multimodal packing is compatible with transformers ≥ 4.53; sequence parallelism with transformers ≥ 4.52.
j. resume_only_model now skips already-trained data by default; control this with ignore_data_skip.
k. MoE training supports router_aux_loss_coef.
l. Templates gain a max_length truncation safeguard that never truncates image/video tokens.
m. tuner_backend unsloth now supports MoE models, device_map, and DDP.
n. Embedding training supports liger_kernel.
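Two of the sequence-handling items above can be sketched as follows: the max_length safeguard for media tokens (item l) and sequence packing (item d). Both functions are illustrative assumptions, not ms-swift's code. The truncation assumes media tokens can be identified by ID and keeps text tokens from the start. The packer uses the classic first-fit-decreasing heuristic, which may differ from what the refactored packing module actually does.

```python
def truncate_keep_media(input_ids, max_length, media_token_ids):
    """Trim a sequence to max_length without cutting any image/video token.

    Every media token is kept; the remaining budget is filled with text
    tokens in their original order, and later text tokens are dropped.
    """
    n_media = sum(1 for t in input_ids if t in media_token_ids)
    if n_media > max_length:
        raise ValueError("media tokens alone exceed max_length")
    text_budget = max_length - n_media
    kept = []
    for t in input_ids:
        if t in media_token_ids:
            kept.append(t)
        elif text_budget > 0:
            kept.append(t)
            text_budget -= 1
    return kept


def pack_first_fit_decreasing(lengths, max_length):
    """Group sample indices into packs whose total length is <= max_length."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs, loads = [], []
    for i in order:
        if lengths[i] > max_length:
            raise ValueError(f"sample {i} is longer than max_length")
        for p, load in enumerate(loads):
            if load + lengths[i] <= max_length:
                packs[p].append(i)
                loads[p] += lengths[i]
                break
        else:  # no existing pack has room, so open a new one
            packs.append([i])
            loads.append(lengths[i])
    return packs


if __name__ == "__main__":
    IMAGE_TOKEN = 9  # hypothetical placeholder ID for image patches
    print(truncate_keep_media([1, 2, 9, 9, 3, 4, 5], 5, {IMAGE_TOKEN}))  # [1, 2, 9, 9, 3]
    print(pack_first_fit_decreasing([5, 3, 8, 2, 6], 10))  # [[2, 3], [4, 1], [0]]
```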
RLHF
a. Added MPO training. Script: https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/mpo.sh
b. Multimodal DPO now supports images for rejected responses: add a rejected_images column to the dataset.
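MPO (item a) trains on a weighted sum of a preference loss, a quality loss, and a generation (SFT) loss. The sketch below computes that combination for a single chosen/rejected pair in plain Python. It is a hedged illustration: it uses the DPO sigmoid loss and a BCO-pair-style quality loss. The default weights (0.8, 0.2, 1.0) and the constant reward shift `delta` are example values only. Check the linked mpo.sh for the loss types and weights the script actually passes.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def mpo_pair_loss(chosen_logratio, rejected_logratio, chosen_nll,
                  beta=0.1, delta=0.0, weights=(0.8, 0.2, 1.0)):
    """Weighted MPO-style loss for one preference pair.

    chosen_logratio / rejected_logratio: log pi(y|x) - log pi_ref(y|x).
    chosen_nll: mean negative log-likelihood of the chosen response.
    delta: reward shift for the quality term (illustrative constant here).
    """
    # Preference loss: DPO sigmoid loss on the margin between the two responses.
    preference = -log_sigmoid(beta * (chosen_logratio - rejected_logratio))
    # Quality loss: score each response on its own against the shift delta.
    quality = (-log_sigmoid(beta * chosen_logratio - delta)
               - log_sigmoid(-(beta * rejected_logratio - delta)))
    # Generation loss: plain SFT on the chosen response.
    generation = chosen_nll
    w_pref, w_qual, w_gen = weights
    return w_pref * preference + w_qual * quality + w_gen * generation


if __name__ == "__main__":
    # With no preference signal yet, every log-sigmoid term equals log(0.5).
    print(mpo_pair_loss(0.0, 0.0, 0.0))  # 1.2 * log(2), about 0.832
    # Favoring the chosen response lowers the loss.
    print(mpo_pair_loss(5.0, -5.0, 0.0) < mpo_pair_loss(0.0, 0.0, 0.0))  # True
```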
Inference & Deployment
a. Added inference and deployment for embedding models with the pt/vllm/sglang infer_backend. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/deploy/embedding
b. InferEngine supports return_details to output prompt_token_ids and token_ids.
c. vLLM back-end now supports more multimodal models: ovis2, glm4_1v, keye-vl, kimi-vl, glm4v, phi4-multimodal, llama4.
d. Refactored vLLM arguments: their names now carry the vllm_ prefix, and the GRPO module reuses them.
Export
a. QLoRA now supports Merge-LoRA. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora
b. Added FP8 / BNB quantization for MoE and multimodal models. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize
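Merge-LoRA (item a) folds the low-rank update back into the base weight: W' = W + (alpha / r) * B A. For QLoRA, the 4-bit base weight has to be dequantized before that addition. The pure-Python sketch below shows only the arithmetic. Here `base` stands in for an already dequantized weight, and the nested-list matrices and function names are illustrative rather than ms-swift or PEFT APIs.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Return base + (alpha / rank) * lora_b @ lora_a.

    base:   d_out x d_in weight (for QLoRA: the dequantized 4-bit weight)
    lora_a: rank x d_in, lora_b: d_out x rank
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    lora_a = [[1.0, 2.0]]    # rank 1, d_in 2
    lora_b = [[1.0], [0.0]]  # d_out 2, rank 1
    print(merge_lora(base, lora_a, lora_b, alpha=2, rank=1))  # [[3.0, 4.0], [0.0, 1.0]]
```

Once the weights are merged, inference no longer needs the separate adapter matrices.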

New Models

Text-only
a. Qwen/Qwen3-235B-A22B-[Instruct/Thinking]-2507, Qwen/Qwen3-Coder-480B-A35B-Instruct, and Qwen/Qwen3-4B-[Instruct/Thinking]-2507 (Megatron-SWIFT supported). Training script: #5033
b. openai-mirror/gpt-oss-20b family. Best practices: #5277
c. ZhipuAI/GLM-4.5 family (Megatron-SWIFT supported). Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/glm4_5_106b.sh
d. Hunyuan-7B-Instruct family. Best practices: #5236
e. mistralai/Devstral-Small-2505
Multimodal
a. OpenBMB/MiniCPM-V-4. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/models/minicpmv/train.sh

What's Changed

[grpo] fix server arg check by @hjh0119 in #4865
[SP] clean up imports by @hjh0119 in #4878
fix loss_scale sp by @tastelikefeet in #4880
fix seq_cls generation_config by @Jintao-Huang in #4882
optimize imports by @tastelikefeet in #4883
[model] fix qwen eos_token by @Jintao-Huang in #4888
Fix: Correct training hang for Keye-VL on DeepSpeed with mixed data by @0russwest0 in #4889
[megatron] support LoRA & support loss_scale by @Jintao-Huang in #4812
update framework.txt by @Jintao-Huang in #4896
[megatron] fix pp mla by @Jintao-Huang in #4904
[megatron] update to mcore 0.13 by @Jintao-Huang in #4903
Fix the template suffix of qwen3 embedding by @tastelikefeet in #4909
Fix FlorenceTemplate for florence2 by @yaqiangsun in #4871
[megatron] add logging jsonl by @Jintao-Huang in #4908
[megatron] Support dpo lora by @Jintao-Huang in #4913
feat: rlhf generation samples log to swanlab by @Zeyi-Lin in #4907
[bugfix] fix profiling patch by @hjh0119 in #4915
[doc] fix reranker index by @hjh0119 in #4921
[megatron] support lora modules_to_save by @Jintao-Huang in #4916
[model] support Kimi-K2 template by @Jintao-Huang in #4925
Support infer and deploy of embedding models by @tastelikefeet in #4927
Add more comments on embedding deployment by @tastelikefeet in #4929
Fix the missing eval_acc issue (when use_logits_to_keep is True) by @Jintao-Huang in #4938
[train] fix channel_loss (qwen2.5-vl & packing) by @Jintao-Huang in #4941
[train] fix packing/padding_free & predict_with_generate by @Jintao-Huang in #4942
Support new special tokens by @Jintao-Huang in #4945
[sp] remove deprecated tokenizer by @hjh0119 in #4934
fix new_special_tokens by @Jintao-Huang in #4947
fix internvl3 eos_token by @Jintao-Huang in #4948
[loss_scale] fix loss_scale when meeting ,,
fix format reward by @tpx818 in #4949
[deploy/rollout] fix deploy ddp eval_human by @Jintao-Huang in #4952
Fix: Defer MathAccuracy initialization to init to avoid import-time dependency check by @hjh0119 in #4954
[infer] support more vllm multimodal model (vllm==0.9.2) by @Jintao-Huang in #4959
[megatron] support new_special_tokens by @Jintao-Huang in #4956
[rlhf] support rejected images by @hjh0119 in #4964
[megatron] support resume_lora by @Jintao-Huang in #4968
[doc] update grpo document by @hjh0119 in #4971
[grpo] fix save last-checkpoint by @hjh0119 in #4969
[Feature] Support a GYM-like environment training interface for end-to-end RL training by @woshixiaobai2019 in #4890
[grpo] support truncation_strategy delete by @hjh0119 in #4977
fix bugs by @Jintao-Huang in #4981
Remove the dependency on numpy<2.0 by @Jintao-Huang in #4960
[doc] fix index by @hjh0119 in #4984
fix SkipBatchSampler by @Jintao-Huang in #4985
[train] refactor resume_only_model by @Jintao-Huang in #4979
[megatron] update benchmark docs by @Jintao-Huang in #4991
fix: trainer state strict saving by @firefighter-eric in #4995
[grpo] fix gym check by @hjh0119 in #4998
[Fix] deepspeed example shell command path for host.txt by @dykderrick in #5001
[template] fix/pixtral/pixel_values & image_sizes by @CrownStar7 in #4982
refactor packing by @Jintao-Huang in #5000
[grpo] support multi node servers by @hjh0119 in #4999
support multimodal lazy_tokenize=False by @Jintao-Huang in #5006
fix pixtral-12b by @Jintao-Huang in #5007
fix vllm version parse by @hjh0119 in #5012
update docs by @Jintao-Huang in #5014
[model] support glm4_5_moe by @Jintao-Huang in #5015
[megatron] support glm4 moe by @Jintao-Huang in #5016
[megatron] compat peft==0.16 by @Jintao-Huang in #5017
[template] support mixed_data Qwen2.5-VL by @Jintao-Huang in #5018
Fix sp+tf4.53 by @tastelikefeet in #5024
fix finish_reason by @Jintao-Huang in #5025
support multimodal & moe fp8 & bnb by @Jintao-Huang in #5026
fix packing by @Jintao-Huang in #5027
update packing by @Jintao-Huang in #5029
[docs] update swanlab docs by @Jintao-Huang in #5030
[grpo] entropy mask by @hjh0119 in #4850
fix glm4_1v by @Jintao-Huang in #5032
[preprocessor] Multiple video formats can be concatenated. by @Jintao-Huang in #5038
Refactor vllm args by @Jintao-Huang in #5035
fix sp+tf52 by @tastelikefeet in #5041
[model] support Qwen3-235B-A22B-Instruct-250718 by @Jintao-Huang in #5033
fix qwen3 model_id by @Jintao-Huang in #5042
[infer/deploy] fix max_model_len by @Jintao-Huang in #5043
temporary fix ppo by @Jintao-Huang in #5045
[dpo] fix dpo get_train_dataloader by @Jintao-Huang in #5047
[grpo] fix log entropy by @hjh0119 in #5056
[rollout] add trl version check by @hjh0119 in #5049
[megatron] fix loss_scale by @Jintao-Huang in #5059
[grpo] fix entropy mask when set log_entropy by @hjh0119 in #5062
[megatron] support padding_free=False by @Jintao-Huang in #5057
[megatron] support mlp_padding_free by @Jintao-Huang in #5066
fix run_dataset_info by @Jintao-Huang in #5068
Fix awq quant by @Jintao-Huang in #5072
[model] support qwen3_coder by @Jintao-Huang in #5076
[model] support glm4_5 by @Jintao-Huang in #5031
[infer] support vllm_enable_expert_parallel by @Jintao-Huang in #5084
add qwen3_235b_shell by @Jintao-Huang in #5086
Support swift/Qwen3-235B-A22B-Instruct-2507-AWQ by @Jintao-Huang in #5090
Update shell by @Jintao-Huang in #5094
update model_id by @Jintao-Huang in #5100
fix kimi_vl by @Jintao-Huang in #5103
Fix llama4 by @Jintao-Huang in #5105
fix web_ui infer error by @Jintao-Huang in #5106
fix sglang infer hang by @Jintao-Huang in #5108
support Qwen3-235B-A22B-Thinking-2507 by @Jintao-Huang in #5113
fix hang by @tastelikefeet in #5114
[megatron] add fp8 shell by @Jintao-Huang in #5112
[template] fix qwen3_moe_thinking template by @Jintao-Huang in #5116
compat vllm 0.5.1 by @Jintao-Huang in #5117
Fix bug by @Jintao-Huang in #5121
Support AutoTP by @slin000111 in #5093
fix bug by @loadingyy in #5080
Support sglang enable_ep_moe & update code by @Jintao-Huang in #5122
fix sglang response_prefix by @Jintao-Huang in #5125
[grpo] support GSPO by @hjh0119 in #5126
fix new_special_tokens multimodal by @Jintao-Huang in #5129
[Safety]Fix torch load by @tastelikefeet in #4802
fix get_hf_endpoint by @Jintao-Huang in #5135
[doc] fix gym example and doc by @hjh0119 in #5079
fix unsloth and support device_map by @tastelikefeet in #5139
InferEngine support return_details by @Jintao-Huang in #5134
Support ddp of unsloth by @tastelikefeet in #5141
[model] support ZhipuAI/GLM-4.5 series by @Jintao-Huang in #5142
[bugfix] fix vllm sleep&wake_up produces meaningless output by @hjh0119 in #5143
fix bugs by @tastelikefeet in #5147
[bugfix] grpo length context compatible with latest set_default_max_tokens by @hjh0119 in #5154
support Qwen3-30B-A3B-Instruct-2507 by @Jintao-Huang in #5149
Fix rope-scaling by @tastelikefeet in #5155
[grpo recipe] support deepeyes by @hjh0119 in #5082
[infer] support prompt_token_ids by @Jintao-Huang in #5152
[model] support mistralai/Devstral-Small-2505 by @hieuchi911 in #5102
[bugfix] fix grpo SP without padding_free error by @hjh0119 in #5164
update trl version requirement by @hjh0119 in #5170
Fix ci by @tastelikefeet in #5158
upgrade vllm to 0.10.0 by @hjh0119 in #5168
[rlhf] support MPO & DPO compatible with trl 0.20 by @hjh0119 in #5177
Fix run cmd with os.system by @slin000111 in #5182
[grpo] compatible with trl 0.2 by @hjh0119 in #5178
fix CVE-2025-50460: https://github.com/Anchor0221/CVE-2025-50460 by @tastelikefeet in #5174
support export cached_dataset by @Jintao-Huang in #4992
[train] moe support aux_loss by @Jintao-Huang in #5187
Support qlora merge-lora by @Jintao-Huang in #5190
[megatron] support more metrics by @Jintao-Huang in #5194
fix mcore 0.12 by @Jintao-Huang in #5195
support qwen3_coder 30b by @Jintao-Huang in #5207
Support flash-attention-3 by @Jintao-Huang in #5208
[utils] optimize git clone by @Jintao-Huang in #5214
[megatron] support flash-attention-3 by @Jintao-Huang in #5216
fix emb with liger kernel by @tastelikefeet in #5222
update wechat by @Jintao-Huang in #5223
fix reward_model by @Jintao-Huang in #5224
update swift image by @Jintao-Huang in #5225
update requirements by @Jintao-Huang in #5198
packing compat_transformers 4.53 by @Jintao-Huang in #4972
Fix: init_rope_scaling from model config first by @mungg in #5220
lint code by @tastelikefeet in #5233
[grpo] support more log by @hjh0119 in #5229
[template] content support loss_scale by @Jintao-Huang in #5234
[doc] simplify reranker dataset formats to remove ambiguity by @0russwest0 in #5237
refactor rope_scaling by @Jintao-Huang in #5239
[model] support Hunyuan-7B-Instruct by @Jintao-Huang in #5236
fix web-ui bug by @slin000111 in #5257
fix merge_lora by @Jintao-Huang in #5261
[grpo] set default gas to 1 by @hjh0119 in #5267
optimize mcore load by @Jintao-Huang in #5232
[megatron] optimize test_convert_precision by @Jintao-Huang in #5272
update grpo doc & args check for rollout and rlhf by @hjh0119 in #5268
[megatron] remove non blocking by @Jintao-Huang in #5276
[model] support openai/gpt-oss-20b by @Jintao-Huang in #5277
[model] support minicpmv4 by @Jintao-Huang in #5288
[bugfix] fix process_images in multi turn rollout by @hjh0119 in #5243
Fix: Using DDP for PPO training causes AttributeError: 'DistributedDataParallel' object has no attribute 'policy' by @kiritoxkiriko in #5287
support Qwen/Qwen3-4B-Instruct-2507 by @Jintao-Huang in #5294
[shell] add cached dataset examples by @Jintao-Huang in #5282

New Contributors

@yaqiangsun made their first contribution in #4871
@CrownStar7 made their first contribution in #4922
@woshixiaobai2019 made their first contribution in #4890
@loadingyy made their first contribution in #5080
@hieuchi911 made their first contribution in #5102
@mungg made their first contribution in #5220
@kiritoxkiriko made their first contribution in #5287

Full Changelog: v3.6.0...v3.7.0
