v3.7.0
中文版
新特性
- GRPO:
a. 支持GSPO算法,在GRPO训练中使用参数 --importance_sampling_level sequence,参考文档:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/GSPO.html
b. GRPO server mode 支持多机 rollout,支持传入多个 vllm_server_host/port,参考脚本:https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/multi_node/server_multi_node.sh
c. GRPO rollout 兼容 GYM 环境规范(感谢开发者Mouse的贡献),参考文档 https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/DeveloperGuide/GYM%E7%8E%AF%E5%A2%83%E8%AE%AD%E7%BB%83.html
d. GRPO 支持 entropy_mask 来过滤低熵token损失计算,同时logger支持记录熵值动态,参考文档https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/entropy_mask.html
e. 支持多轮算法DeepEyes训练,文档参考:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/deepeyes.html
f. GRPO 支持 --truncation_strategy delete,删除输入长度超过max_length的数据,并重新采样。
- Megatron-SWIFT:
a. 支持LoRA训练,现支持CPT/SFT/DPO,显著加速MoE训练速度。
- 文档参考:https://swift.readthedocs.io/zh-cn/latest/Instruction/Megatron-SWIFT%E8%AE%AD%E7%BB%83.html#lora
- 训练脚本:https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/lora
b. 支持loss scale,方便Agent训练,训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/loss_scale.sh
c. 默认megatron-core版本升级至0.13。
d. 支持bshd格式,方便自定义attention_mask。
e. 日志优化:新增GPU占用、剩余训练时间等信息打印,并输出 logging.jsonl 存储训练日志。
f. 模型加载与转换速度优化,并增加模型加载进度条。
- 训练:
a. 支持Flash-Attention-3(含Megatron-SWIFT),训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/train/flash_attention_3
b. 新增 --new_special_tokens 参数,方便新增特殊tokens。训练脚本参考: https://github.com/modelscope/ms-swift/tree/main/examples/train/new_special_tokens
c. 新增 --cached_dataset 参数,支持CPT/SFT的离线tokenize。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset
d. 序列Packing模块重构。加速Packing速度,并对多模态packing的磁盘存储问题优化。
e. 支持Qwen2.5-VL混合模态数据(即单条数据中含多种模态) + deepspeed训练。
f. 多模态模型训练支持 loss_scale。
g. rope_scaling 支持传入字典,此外支持设置 max_model_len 对 rope_scaling 的 factor 自动调整。
h. 支持DeepSpeed-AutoTP(该技术不支持LoRA)。
i. 多模态Packing兼容 transformers>=4.53;序列并行兼容 transformers>=4.52。
j. resume_only_model默认将进行数据跳过,并使用ignore_data_skip参数进行控制。
k. MoE模型训练支持 router_aux_loss_coef 参数。
l. template新增max_length裁剪保护机制,不对图像/视频等tokens进行裁剪。
m. tuner_backend unsloth 支持moe模型、device_map和DDP。
n. embedding训练支持liger_kernel。
- RLHF:
a. 支持MPO训练,训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/mpo.sh
b. 多模态DPO支持了拒绝图片输入,在数据集中加入 rejected_images 列。
- 推理部署:
a. 支持embedding系列模型的推理部署,包括pt/vllm/sglang的infer_backend。部署脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/deploy/embedding
b. InferEngine支持return_details参数,以输出prompt_token_ids和token_ids。
c. vLLM推理引擎兼容更多多模态模型:ovis2, glm4_1v, keye-vl, kimi-vl, glm4v, phi4-multimodal, llama4。
d. vLLM参数重构,参数名前加入 vllm_ 前缀。GRPO模块复用vLLM参数。
- 导出:
a. QLoRA支持Merge-LoRA,脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora
b. 支持MoE/多模态模型的FP8/BNB量化,脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize
新模型
- 纯文本模型:
a. Qwen/Qwen3-235B-A22B-[Instruct/Thinking]-2507, Qwen/Qwen3-Coder-480B-A35B-Instruct, Qwen/Qwen3-4B-[Instruct/Thinking]-2507系列(含Megatron-SWIFT),训练脚本参考:#5033
b. openai-mirror/gpt-oss-20b系列,最佳实践参考:#5277
c. ZhipuAI/GLM-4.5系列(含Megatron-SWIFT),训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/glm4_5_106b.sh
d. Hunyuan-7B-Instruct系列,最佳实践参考:#5236
e. mistralai/Devstral-Small-2505
- 多模态模型:
a. OpenBMB/MiniCPM-V-4,训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/models/minicpmv/train.sh
English Version
New Features
- GRPO
a. Added support for the GSPO algorithm. Use --importance_sampling_level sequence during GRPO training. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/GSPO.html
b. GRPO server mode now supports multi-node rollout; pass in multiple vllm_server_host/port. Example script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/multi_node/server_multi_node.sh
c. GRPO rollout is now GYM-compatible (thanks to contributor Mouse). Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/DeveloperGuide/gym_env.html
d. Added entropy_mask for filtering low-entropy tokens out of the loss computation; the logger now tracks entropy dynamics. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/entropy_mask.html
e. Added support for the multi-turn DeepEyes algorithm. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/deepeyes.html
f. GRPO supports --truncation_strategy delete: samples whose input length exceeds max_length are removed and resampled.
- Megatron-SWIFT
a. Added LoRA training (CPT/SFT/DPO) to significantly accelerate MoE training.
- Docs: https://swift.readthedocs.io/en/latest/Instruction/Megatron-SWIFT-Training.html#lora-training
- Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/lora
b. Added loss_scale support to simplify Agent training. Script: https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/loss_scale.sh
c. Default megatron-core version upgraded to 0.13.
d. Added the bshd tensor format to facilitate custom attention_mask.
e. Logging improvements: prints GPU memory usage and estimated remaining training time, and writes training logs to logging.jsonl.
f. Faster model loading and conversion, plus a model-loading progress bar.
- Training
a. Added Flash-Attention-3 support (including Megatron-SWIFT). Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/flash_attention_3
b. New --new_special_tokens flag for adding special tokens. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/new_special_tokens
c. New --cached_dataset flag for offline tokenization in CPT/SFT. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset
d. Re-implemented the sequence-packing module for faster packing and better multimodal disk I/O.
e. Added support for Qwen2.5-VL mixed-modality data (multiple modalities in a single sample) with DeepSpeed training.
f. Multimodal model training now supports loss_scale.
g. rope_scaling now accepts a dict; setting max_model_len automatically adjusts the rope_scaling factor.
h. Added DeepSpeed-AutoTP (not compatible with LoRA).
i. Multimodal packing is compatible with transformers ≥ 4.53; sequence parallelism with transformers ≥ 4.52.
j. With resume_only_model, data skipping is now performed by default; control it via ignore_data_skip.
k. MoE training supports router_aux_loss_coef.
l. Templates add a max_length truncation safeguard: image/video tokens are never truncated.
m. tuner_backend unsloth now supports MoE models, device_map, and DDP.
n. Embedding training supports liger_kernel.
- RLHF
a. Added MPO training. Script: https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/mpo.sh
b. Multimodal DPO now supports rejected images as input: add a rejected_images column to the dataset.
- Inference & Deployment
a. Added deployment for embedding models across pt/vllm/sglang back-ends. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/deploy/embedding
b. InferEngine supports return_details to output prompt_token_ids and token_ids.
c. vLLM back-end now supports more multimodal models: ovis2, glm4_1v, keye-vl, kimi-vl, glm4v, phi4-multimodal, llama4.
d. vLLM arguments refactored: all argument names now start with the vllm_ prefix. The GRPO module reuses the same arguments.
- Export
a. QLoRA now supports Merge-LoRA. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora
b. Added FP8 / BNB quantization for MoE and multimodal models. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize
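
As a rough illustration of the GSPO item in the GRPO list above: the change amounts to replacing one importance ratio per token with a single length-normalized ratio per sampled sequence. The sketch below is plain Python, not ms-swift code; the function names and the log-probability values are hypothetical and chosen only to show the arithmetic.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """One importance ratio per token (token-level weighting)."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """One length-normalized ratio for the whole sequence (sequence-level weighting).

    This equals the geometric mean of the token-level ratios.
    """
    if len(new_logps) != len(old_logps) or not new_logps:
        raise ValueError("log-prob lists must be non-empty and of equal length")
    mean_log_ratio = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_log_ratio)


# Hypothetical per-token log-probabilities for one sampled completion,
# under the old (rollout) policy and the current policy.
old_logps = [-1.0, -2.0, -0.5]
new_logps = [-0.9, -2.3, -0.5]

per_token = token_level_ratios(new_logps, old_logps)
per_sequence = sequence_level_ratio(new_logps, old_logps)
```

With token-level weighting each token is clipped and weighted by its own ratio; with sequence-level weighting every token in the completion shares the same ratio, which is what selecting sequence for the importance sampling level is meant to express. The linked GSPO documentation remains the reference for the actual behavior.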
New Models
- Text-only
a. Qwen/Qwen3-235B-A22B-[Instruct/Thinking]-2507, Qwen/Qwen3-Coder-480B-A35B-Instruct, and Qwen/Qwen3-4B-[Instruct/Thinking]-2507 (Megatron-SWIFT supported). Training script: #5033
b. openai-mirror/gpt-oss-20b family. Best-practice: #5277
c. ZhipuAI/GLM-4.5 family (Megatron-SWIFT supported). Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/glm4_5_106b.sh
d. Hunyuan-7B-Instruct family. Best-practice: #5236
e. mistralai/Devstral-Small-2505
- Multimodal
a. OpenBMB/MiniCPM-V-4. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/models/minicpmv/train.sh
What's Changed
- [grpo] fix server arg check by @hjh0119 in #4865
- [SP] clean up imports by @hjh0119 in #4878
- fix loss_scale sp by @tastelikefeet in #4880
- fix seq_cls generation_config by @Jintao-Huang in #4882
- optimize imports by @tastelikefeet in #4883
- [model] fix qwen eos_token by @Jintao-Huang in #4888
- Fix: Correct training hang for Keye-VL on DeepSpeed with mixed data by @0russwest0 in #4889
- [megatron] support LoRA & support loss_scale by @Jintao-Huang in #4812
- update framework.txt by @Jintao-Huang in #4896
- [megatron] fix pp mla by @Jintao-Huang in #4904
- [megatron] update to mcore 0.13 by @Jintao-Huang in #4903
- Fix the template suffix of qwen3 embedding by @tastelikefeet in #4909
- Fix FlorenceTemplate for florence2 by @yaqiangsun in #4871
- [megatron] add logging jsonl by @Jintao-Huang in #4908
- [megatron] Support dpo lora by @Jintao-Huang in #4913
- feat: rlhf generation samples log to swanlab by @Zeyi-Lin in #4907
- [bugfix] fix profiling patch by @hjh0119 in #4915
- [doc] fix reranker index by @hjh0119 in #4921
- [megatron] support lora modules_to_save by @Jintao-Huang in #4916
- [model] support Kimi-K2 template by @Jintao-Huang in #4925
- Support infer and deploy of embedding models by @tastelikefeet in #4927
- Add more comments on embedding deployment by @tastelikefeet in #4929
- Fix the missing eval_acc issue (when use_logits_to_keep is True) by @Jintao-Huang in #4938
- [train] fix channel_loss (qwen2.5-vl & packing) by @Jintao-Huang in #4941
- [train] fix packing/padding_free & predict_with_generate by @Jintao-Huang in #4942
- Support new special tokens by @Jintao-Huang in #4945
- [sp] remove deprecated tokenizer by @hjh0119 in #4934
- fix new_special_tokens by @Jintao-Huang in #4947
- fix internvl3 eos_token by @Jintao-Huang in #4948
- [loss_scale] fix loss_scale when meeting
,,
- fix format reward by @tpx818 in #4949
- [deploy/rollout] fix deploy ddp eval_human by @Jintao-Huang in #4952
- Fix: Defer MathAccuracy initialization to init to avoid import-time dependency check by @hjh0119 in #4954
- [infer] support more vllm multimodal model (vllm==0.9.2) by @Jintao-Huang in #4959
- [megatron] support new_special_tokens by @Jintao-Huang in #4956
- [rlhf] support rejected images by @hjh0119 in #4964
- [megatron] support resume_lora by @Jintao-Huang in #4968
- [doc] update grpo document by @hjh0119 in #4971
- [grpo] fix save last-checkpoting by @hjh0119 in #4969
- [Feature] Support a GYM-like environment training interface for end-to-end RL training by @woshixiaobai2019 in #4890
- [grpo] support truncation_strategy delete by @hjh0119 in #4977
- fix bugs by @Jintao-Huang in #4981
- Remove the dependency on numpy<2.0 by @Jintao-Huang in #4960
- [doc] fix index by @hjh0119 in #4984
- fix SkipBatchSampler by @Jintao-Huang in #4985
- [train] refactor resume_only_model by @Jintao-Huang in #4979
- [megatron] update benchmark docs by @Jintao-Huang in #4991
- fix: trainer state strict saving by @firefighter-eric in #4995
- [grpo] fix gym check by @hjh0119 in #4998
- [Fix] deepspeed example shell command path for host.txt by @dykderrick in #5001
- [template] fix/pixtral/pixel_values & image_sizes by @CrownStar7 in #4982
- refactor packing by @Jintao-Huang in #5000
- [grpo] support multi node servers by @hjh0119 in #4999
- support multimodal lazy_tokenize=False by @Jintao-Huang in #5006
- fix pixtral-12b by @Jintao-Huang in #5007
- fix vllm version parse by @hjh0119 in #5012
- update docs by @Jintao-Huang in #5014
- [model] support glm4_5_moe by @Jintao-Huang in #5015
- [megatron] support glm4 moe by @Jintao-Huang in #5016
- [megatron] compat peft==0.16 by @Jintao-Huang in #5017
- [template] support mixed_data Qwen2.5-VL by @Jintao-Huang in #5018
- Fix sp+tf4.53 by @tastelikefeet in #5024
- fix finish_reason by @Jintao-Huang in #5025
- support multimodal & moe fp8 & bnb by @Jintao-Huang in #5026
- fix packing by @Jintao-Huang in #5027
- update packing by @Jintao-Huang in #5029
- [docs] update swanlab docs by @Jintao-Huang in #5030
- [grpo] entropy mask by @hjh0119 in #4850
- fix glm4_1v by @Jintao-Huang in #5032
- [preprocessor] Multiple video formats can be concatenated. by @Jintao-Huang in #5038
- Refactor vllm args by @Jintao-Huang in #5035
- fix sp+tf52 by @tastelikefeet in #5041
- [model] support Qwen3-235B-A22B-Instruct-250718 by @Jintao-Huang in #5033
- fix qwen3 model_id by @Jintao-Huang in #5042
- [infer/deploy] fix max_model_len by @Jintao-Huang in #5043
- temporary fix ppo by @Jintao-Huang in #5045
- [dpo] fix dpo get_train_dataloader by @Jintao-Huang in #5047
- [grpo] fix log entropy by @hjh0119 in #5056
- [rollout] add trl version check by @hjh0119 in #5049
- [megatron] fix loss_scale by @Jintao-Huang in #5059
- [grpo] fix entropy mask when set log_entropy by @hjh0119 in #5062
- [megatron] support padding_free=False by @Jintao-Huang in #5057
- [megatron] support mlp_padding_free by @Jintao-Huang in #5066
- fix run_dataset_info by @Jintao-Huang in #5068
- Fix awq quant by @Jintao-Huang in #5072
- [model] support qwen3_coder by @Jintao-Huang in #5076
- [model] support glm4_5 by @Jintao-Huang in #5031
- [infer] support vllm_enable_expert_parallel by @Jintao-Huang in #5084
- add qwen3_235b_shell by @Jintao-Huang in #5086
- Support swift/Qwen3-235B-A22B-Instruct-2507-AWQ by @Jintao-Huang in #5090
- Update shell by @Jintao-Huang in #5094
- update model_id by @Jintao-Huang in #5100
- fix kimi_vl by @Jintao-Huang in #5103
- Fix llama4 by @Jintao-Huang in #5105
- fix web_ui infer error by @Jintao-Huang in #5106
- fix sglang infer hang by @Jintao-Huang in #5108
- support Qwen3-235B-A22B-Thinking-2507 by @Jintao-Huang in #5113
- fix hang by @tastelikefeet in #5114
- [megatron] add fp8 shell by @Jintao-Huang in #5112
- [template] fix qwen3_moe_thinking template by @Jintao-Huang in #5116
- compat vllm 0.5.1 by @Jintao-Huang in #5117
- Fix bug by @Jintao-Huang in #5121
- Support AutoTP by @slin000111 in #5093
- fix bug by @loadingyy in #5080
- Support sglang enable_ep_moe & update code by @Jintao-Huang in #5122
- fix sglang response_prefix by @Jintao-Huang in #5125
- [grpo] support GSPO by @hjh0119 in #5126
- fix new_speical_tokens multimodal by @Jintao-Huang in #5129
- [Safety]Fix torch load by @tastelikefeet in #4802
- fix get_hf_endpoint by @Jintao-Huang in #5135
- [doc] fix gym example and doc by @hjh0119 in #5079
- fix unsloth and support device_map by @tastelikefeet in #5139
- InferEngine support return_details by @Jintao-Huang in #5134
- Support ddp of unsloth by @tastelikefeet in #5141
- [model] support ZhipuAI/GLM-4.5 series by @Jintao-Huang in #5142
- [bugfix] fix vllm sleep&wake_up produces meaningless output by @hjh0119 in #5143
- fix bugs by @tastelikefeet in #5147
- [bugfix] grpo length context compatible with latest set_default_max_tokens by @hjh0119 in #5154
- support Qwen3-30B-A3B-Instruct-2507 by @Jintao-Huang in #5149
- Fix rope-scaling by @tastelikefeet in #5155
- [grpo recipe] support deepeyes by @hjh0119 in #5082
- [infer] support prompt_token_ids by @Jintao-Huang in #5152
- [model] support mistralai/Devstral-Small-2505 by @hieuchi911 in #5102
- [bugfix] fix grpo SP without padding_free error by @hjh0119 in #5164
- update trl version requirement by @hjh0119 in #5170
- Fix ci by @tastelikefeet in #5158
- upgrade vllm to 0.10.0 by @hjh0119 in #5168
- [rlhf] support MPO & DPO compatible with trl 0.20 by @hjh0119 in #5177
- Fix run cmd with os.system by @slin000111 in #5182
- [grpo] compatible with trl 0.2 by @hjh0119 in #5178
- fix CVE-2025-50460: https://github.com/Anchor0221/CVE-2025-50460 by @tastelikefeet in #5174
- support export cached_dataset by @Jintao-Huang in #4992
- [train] moe support aux_loss by @Jintao-Huang in #5187
- Support qlora merge-lora by @Jintao-Huang in #5190
- [megatron] support more metrics by @Jintao-Huang in #5194
- fix mcore 0.12 by @Jintao-Huang in #5195
- support qwen3_coder 30b by @Jintao-Huang in #5207
- Support flash-attention-3 by @Jintao-Huang in #5208
- [utils] optimize git clone by @Jintao-Huang in #5214
- [megatron] support flash-attention-3 by @Jintao-Huang in #5216
- fix emb with liger kernel by @tastelikefeet in #5222
- update wechat by @Jintao-Huang in #5223
- fix reward_model by @Jintao-Huang in #5224
- update swift image by @Jintao-Huang in #5225
- update requirements by @Jintao-Huang in #5198
- packing compat_transformers 4.53 by @Jintao-Huang in #4972
- Fix: init_rope_scaling from model config first by @mungg in #5220
- lint code by @tastelikefeet in #5233
- [grpo] support more log by @hjh0119 in #5229
- [template] content support loss_scale by @Jintao-Huang in #5234
- [doc] simplify reranker dataset formats to remove ambiguity by @0russwest0 in #5237
- refactor rope_scaling by @Jintao-Huang in #5239
- [model] support Hunyuan-7B-Instruct by @Jintao-Huang in #5236
- fix web-ui bug by @slin000111 in #5257
- fix merge_lora by @Jintao-Huang in #5261
- [grpo] set default gas to 1 by @hjh0119 in #5267
- optimize mcore load by @Jintao-Huang in #5232
- [megatron] optimize test_convert_precision by @Jintao-Huang in #5272
- update grpo doc & args check for rollout and rlhf by @hjh0119 in #5268
- [megatron] remove non blocking by @Jintao-Huang in #5276
- [model] support openai/gpt-oss-20b by @Jintao-Huang in #5277
- [model] support minicpmv4 by @Jintao-Huang in #5288
- [bugfix] fix process_images in multi turn rollout by @hjh0119 in #5243
- Fix: Use DDP for PPO traning will cause AttributeError: 'DistributedDataParallel' object has no attribute 'policy' by @kiritoxkiriko in #5287
- support Qwen/Qwen3-4B-Instruct-2507 by @Jintao-Huang in #5294
- [shell] add cached dataset examples by @Jintao-Huang in #5282
New Contributors
- @yaqiangsun made their first contribution in #4871
- @CrownStar7 made their first contribution in #4922
- @woshixiaobai2019 made their first contribution in #4890
- @loadingyy made their first contribution in #5080
- @hieuchi911 made their first contribution in #5102
- @mungg made their first contribution in #5220
- @kiritoxkiriko made their first contribution in #5287
Full Changelog: v3.6.0...v3.7.0