@@ -935,6 +935,8 @@ The table below introduces information about the datasets integrated with ms-swi
|-|auto_math_text<br>khanacademy<br>openstax<br>stanford<br>stories<br>web_samples_v1<br>web_samples_v2<br>wikihow|huge dataset|-|multi-domain, en, qa|[HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)|
|[HumanLLMs/Human-Like-DPO-Dataset](https://modelscope.cn/datasets/HumanLLMs/Human-Like-DPO-Dataset)|default|10884|47.5±7.9, min=32, max=85|rlhf, dpo|[HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)|
|[LLM-Research/xlam-function-calling-60k](https://modelscope.cn/datasets/LLM-Research/xlam-function-calling-60k)|default<br>grpo|120000|453.7±219.5, min=164, max=2779|agent, grpo, 🔥|[Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)|
+|[MTEB/scidocs-reranking](https://modelscope.cn/datasets/MTEB/scidocs-reranking)|default|39193|41.9±5.8, min=31, max=107|rerank, 🔥|[mteb/scidocs-reranking](https://huggingface.co/datasets/mteb/scidocs-reranking)|
+|[MTEB/stackoverflowdupquestions-reranking](https://modelscope.cn/datasets/MTEB/stackoverflowdupquestions-reranking)|default|26485|39.9±4.6, min=31, max=77|rerank, 🔥|[mteb/stackoverflowdupquestions-reranking](https://huggingface.co/datasets/mteb/stackoverflowdupquestions-reranking)|
|[OmniData/Zhihu-KOL](https://modelscope.cn/datasets/OmniData/Zhihu-KOL)|default|huge dataset|-|zhihu, qa|[wangrui6/Zhihu-KOL](https://huggingface.co/datasets/wangrui6/Zhihu-KOL)|
|[OmniData/Zhihu-KOL-More-Than-100-Upvotes](https://modelscope.cn/datasets/OmniData/Zhihu-KOL-More-Than-100-Upvotes)|default|271261|1003.4±1826.1, min=28, max=52541|zhihu, qa|[bzb2023/Zhihu-KOL-More-Than-100-Upvotes](https://huggingface.co/datasets/bzb2023/Zhihu-KOL-More-Than-100-Upvotes)|
|[PowerInfer/LONGCOT-Refine-500K](https://modelscope.cn/datasets/PowerInfer/LONGCOT-Refine-500K)|default|521921|296.5±158.4, min=39, max=4634|chat, sft, 🔥, cot|[PowerInfer/LONGCOT-Refine-500K](https://huggingface.co/datasets/PowerInfer/LONGCOT-Refine-500K)|
@@ -982,13 +984,15 @@ The table below introduces information about the datasets integrated with ms-swi
|[open-r1/verifiable-coding-problems-python-10k_decontaminated](https://modelscope.cn/datasets/open-r1/verifiable-coding-problems-python-10k_decontaminated)|default|1574|575.7±234.3, min=136, max=2022|grpo, code|[open-r1/verifiable-coding-problems-python-10k_decontaminated](https://huggingface.co/datasets/open-r1/verifiable-coding-problems-python-10k_decontaminated)|
|[open-r1/verifiable-coding-problems-python_decontaminated](https://modelscope.cn/datasets/open-r1/verifiable-coding-problems-python_decontaminated)|default|27839|561.9±252.2, min=74, max=6191|grpo, code|[open-r1/verifiable-coding-problems-python_decontaminated](https://huggingface.co/datasets/open-r1/verifiable-coding-problems-python_decontaminated)|
|[open-thoughts/OpenThoughts-114k](https://modelscope.cn/datasets/open-thoughts/OpenThoughts-114k)|default|113957|413.2±186.9, min=265, max=13868|chat, sft, cot, r1|[open-thoughts/OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)|
-|[sentence-transformers/stsb](https://modelscope.cn/datasets/sentence-transformers/stsb)|default<br>generate<br>reg|5748|21.0±0.0, min=21, max=21|similarity, 🔥|[sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)|
+|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition)|default<br>qwen3<br>empty_think|108|58.9±20.3, min=32, max=131|chat, self-cognition, 🔥|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
+|[sentence-transformers/stsb](https://modelscope.cn/datasets/sentence-transformers/stsb)|default<br>positive<br>generate<br>reg|5748|21.0±0.0, min=21, max=21|similarity, 🔥|[sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)|
|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2)|backbone<br>caller<br>planner<br>summarizer|huge dataset|-|chat, agent, 🔥|-|
|[simpleai/HC3](https://modelscope.cn/datasets/simpleai/HC3)|finance<br>finance_cls<br>medicine<br>medicine_cls|11021|296.0±153.3, min=65, max=2267|text-generation, classification, 🔥|[Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)|
|[simpleai/HC3-Chinese](https://modelscope.cn/datasets/simpleai/HC3-Chinese)|baike<br>baike_cls<br>open_qa<br>open_qa_cls<br>nlpcc_dbqa<br>nlpcc_dbqa_cls<br>finance<br>finance_cls<br>medicine<br>medicine_cls<br>law<br>law_cls<br>psychology<br>psychology_cls|39781|179.9±70.2, min=90, max=1070|text-generation, classification, 🔥|[Hello-SimpleAI/HC3-Chinese](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)|
|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets)|train<br>validation<br>test|141600|40.8±3.3, min=33, max=53|chat, multi-modal, audio|-|
|[swift/A-OKVQA](https://modelscope.cn/datasets/swift/A-OKVQA)|default|18201|43.5±7.9, min=27, max=94|multi-modal, en, vqa, quality|[HuggingFaceM4/A-OKVQA](https://huggingface.co/datasets/HuggingFaceM4/A-OKVQA)|
|[swift/ChartQA](https://modelscope.cn/datasets/swift/ChartQA)|default|28299|36.8±6.5, min=26, max=74|en, vqa, quality|[HuggingFaceM4/ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA)|
+|[swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT](https://modelscope.cn/datasets/swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT)|default|110000|72.1±60.9, min=29, max=2315|🔥, distill, sft|-|
|[swift/GRIT](https://modelscope.cn/datasets/swift/GRIT)|caption<br>grounding<br>vqa|huge dataset|-|multi-modal, en, caption-grounding, vqa, quality|[zzliang/GRIT](https://huggingface.co/datasets/zzliang/GRIT)|
|[swift/GenQA](https://modelscope.cn/datasets/swift/GenQA)|default|huge dataset|-|qa, quality, multi-task|[tomg-group-umd/GenQA](https://huggingface.co/datasets/tomg-group-umd/GenQA)|
|[swift/Infinity-Instruct](https://modelscope.cn/datasets/swift/Infinity-Instruct)|3M<br>7M<br>0625<br>Gen<br>7M_domains|huge dataset|-|qa, quality, multi-task|[BAAI/Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct)|
@@ -1031,7 +1035,6 @@ The table below introduces information about the datasets integrated with ms-swi
|[swift/pixelprose](https://modelscope.cn/datasets/swift/pixelprose)|default|huge dataset|-|caption, multi-modal, vision|[tomg-group-umd/pixelprose](https://huggingface.co/datasets/tomg-group-umd/pixelprose)|
|[swift/refcoco](https://modelscope.cn/datasets/swift/refcoco)|caption<br>grounding|92430|45.4±3.0, min=37, max=63|multi-modal, en, grounding|[jxu124/refcoco](https://huggingface.co/datasets/jxu124/refcoco)|
|[swift/refcocog](https://modelscope.cn/datasets/swift/refcocog)|caption<br>grounding|89598|50.3±4.6, min=39, max=91|multi-modal, en, grounding|[jxu124/refcocog](https://huggingface.co/datasets/jxu124/refcocog)|
-|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition)|default<br>qwen3<br>empty_think|108|58.9±20.3, min=32, max=131|chat, self-cognition, 🔥|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
|[swift/sharegpt](https://modelscope.cn/datasets/swift/sharegpt)|common-zh<br>unknow-zh<br>common-en|194063|820.5±366.1, min=25, max=2221|chat, general, multi-round|-|
|[swift/swift-sft-mixture](https://modelscope.cn/datasets/swift/swift-sft-mixture)|sharegpt<br>firefly<br>codefuse<br>metamathqa|huge dataset|-|chat, sft, general, 🔥|-|
|[swift/tagengo-gpt4](https://modelscope.cn/datasets/swift/tagengo-gpt4)|default|76437|468.1±276.8, min=28, max=1726|chat, multi-lingual, quality|[lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4)|
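As a minimal sketch (not part of the table itself), one of the newly added rows can be inspected via its HF Dataset ID column using the Hugging Face `datasets` library; the `swift sft` invocation in the trailing comment is a hypothetical illustration of the ms-swift subset syntax, not something stated in this diff:

```python
# Minimal sketch: load the HF mirror of a newly added row (mteb/scidocs-reranking,
# tagged "rerank" above) and print its splits and columns.
from datasets import load_dataset

ds = load_dataset("mteb/scidocs-reranking")
print(ds)

# The second column of the table lists the subset names registered in ms-swift
# (e.g. default, positive, generate, reg for sentence-transformers/stsb).
# Hypothetical CLI invocation for selecting one subset -- check the ms-swift
# command-line docs for the exact syntax:
#   swift sft --dataset 'sentence-transformers/stsb:positive' ...
```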