ModelEngine-Group · Mermaid97 · Oct 13, 2025
@@ -0,0 +1,115 @@
+# Async Knowledge Summary Prompt Templates (Chinese)
+
+# Summary Generation Prompt
+SUMMARY_GENERATION_PROMPT: |-
+  ### 你是【知识总结专家】，负责生成简洁准确的知识总结。
+
+  请为以下内容生成简洁的知识总结（不超过{{ max_length }}个中文字符）：
+
+  内容：
+  {{ text }}
+
+  ### 要求：
+  1. 提取核心观点和关键信息
+  2. 使用简洁清晰的语言
+  3. 保持客观准确
+  4. 突出重点内容
+  5. 不要使用markdown格式符号（如#、*、-等）
+  6. 直接输出总结内容，无需额外说明
+
+  知识总结：
+
+# Keyword Extraction Prompt
+KEYWORD_EXTRACTION_PROMPT: |-
+  ### 你是【关键词提取专家】，负责从文本中提取核心关键词。
+
+  请从以下文本中提取{{ max_keywords }}个最重要的关键词：
+
+  {{ text }}
+
+  ### 要求：
+  1. 关键词应准确反映文本主题
+  2. 优先提取专有名词和核心概念
+  3. 每个关键词用逗号分隔
+  4. 只输出关键词，不要其他内容
+  5. 使用中文输出
+
+  关键词：
+
+# Knowledge Card Generation Prompt
+KNOWLEDGE_CARD_GENERATION_PROMPT: |-
+  ### 你是【知识卡片生成专家】，负责将文本内容提炼成结构化的知识卡片。
+
+  请为以下内容生成一个知识卡片，包含摘要和关键词：
+
+  内容：
+  {{ text }}
+
+  ### 要求：
+  1. 摘要部分：
+     - 不超过200个中文字符
+     - 提炼核心内容和关键信息
+     - 语言简洁清晰，逻辑连贯
+     - 不使用markdown格式符号
+
+  2. 关键词部分：
+     - 提取5-10个核心关键词
+     - 用逗号分隔
+     - 反映内容主题
+
+  3. 输出格式：
+     - 第一行：摘要内容
+     - 第二行：关键词（以“关键词：”为前缀）
+
+  请直接输出，无需额外说明。
+
+# Cluster Integration Prompt
+CLUSTER_INTEGRATION_PROMPT: |-
+  ### 你是【知识整合专家】，负责将多个知识卡片整合成连贯的集群总结。
+
+  请将以下知识卡片整合成一个连贯完整的集群总结：
+
+  {{ summaries_text }}
+
+  ### 要求：
+  1. 将所有卡片的核心信息整合成统一主题
+  2. 根据内容重要性和相关性调整权重，重要内容详细描述，次要内容简要提及
+  3. 保持清晰逻辑和完整结构，确保所有信息都得到体现
+  4. 字数控制在200字以内
+  5. 使用简洁清晰的语言
+  6. 不要遗漏任何信息，只调整描述权重
+  7. 不要使用markdown格式符号
+  8. 直接输出纯文本内容
+
+  集群整合总结：
+
+# Global Integration Prompt
+GLOBAL_INTEGRATION_PROMPT: |-
+  ### 你是【知识库总结专家】，负责生成清晰明确的知识库整体总结。
+
+  请将以下{{ cluster_count }}个集群总结整合成一个清晰明确的知识库内容总结：
+
+  {{ summaries_text }}
+
+  ### 要求：
+
+  #### 1. 内容整合要求：
+  - 分析{{ cluster_count }}个集群总结的内容相似性和关联性
+  - 将相似或关联的内容合并到同一个要点中
+  - 最终要点数量不能超过{{ cluster_count }}个（即≤{{ cluster_count }}个要点）
+  - 如果内容差异很大，可以保持{{ cluster_count }}个独立要点
+
+  #### 2. 内容要求：
+  - 总结要清晰、完整、不遗漏关键信息
+  - 每个要点突出核心观点和关键数据
+  - 语言简洁明确，便于大模型识别查询意图
+  - 保持逻辑连贯性和主题关联性
+
+  #### 3. 输出要求：
+  - 使用纯文本格式，不使用Markdown标记
+  - 分点使用"一、"、"二、"等序号
+  - 每个要点之间用空行分隔
+  - 直接输出内容，无需额外说明
+
+  知识库内容总结：
+
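The knowledge-card prompts ask for a fixed two-line reply: the summary first, then a keyword line prefixed with "关键词：" (Chinese template) or "Keywords:" (English template). The diff does not show the code that consumes this reply. The following is only a minimal Python sketch of such a parser, assuming the model follows the requested format. `parse_knowledge_card` and `split_keywords` are hypothetical names. The sketch also accepts full-width commas and the Chinese enumeration comma, because a model answering in Chinese may use either one instead of an ASCII comma.

```python
import re
from typing import List, Tuple

# Keyword-line prefixes taken from the Chinese and English KNOWLEDGE_CARD_GENERATION_PROMPT
# templates; the ASCII-colon variant is an assumption about imperfect model output.
KEYWORD_PREFIXES = ("关键词：", "关键词:", "Keywords:")

# ASCII comma, full-width comma, and Chinese enumeration comma.
_SEPARATORS = re.compile(r"[,，、]")


def split_keywords(line: str) -> List[str]:
    """Split a keyword line into trimmed keywords, dropping blanks and duplicates (order kept)."""
    seen = set()
    keywords: List[str] = []
    for part in _SEPARATORS.split(line):
        word = part.strip()
        if word and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


def parse_knowledge_card(reply: str) -> Tuple[str, List[str]]:
    """Hypothetical parser: return (summary, keywords) from a knowledge-card reply."""
    summary_parts: List[str] = []
    keywords: List[str] = []
    for raw_line in reply.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Find the first matching keyword prefix, if any.
        prefix = next((p for p in KEYWORD_PREFIXES if line.startswith(p)), None)
        if prefix is None:
            summary_parts.append(line)  # any non-keyword line counts as summary text
        else:
            keywords = split_keywords(line[len(prefix):])
    return " ".join(summary_parts), keywords


summary, keywords = parse_knowledge_card("本文介绍了异步知识总结流程。\n关键词：知识卡片，聚类，摘要")
print(summary, keywords)
```

Lines without a keyword prefix are joined with spaces. This tolerates a model that wraps the summary over several lines, at the cost of inserting a space into wrapped Chinese text.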
@@ -0,0 +1,114 @@
+# Async Knowledge Summary Prompt Templates (English)
+
+# Summary Generation Prompt
+SUMMARY_GENERATION_PROMPT: |-
+  ### You are a [Knowledge Summary Expert] responsible for generating concise and accurate knowledge summaries.
+
+  Please generate a concise knowledge summary (no more than {{ max_length }} characters) for the following content:
+
+  Content:
+  {{ text }}
+
+  ### Requirements:
+  1. Extract core viewpoints and key information
+  2. Use concise and clear language
+  3. Maintain objectivity and accuracy
+  4. Highlight important content
+  5. Do not use Markdown formatting symbols (such as #, *, or -)
+  6. Output the summary directly without additional explanation
+
+  Knowledge Summary:
+
+# Keyword Extraction Prompt
+KEYWORD_EXTRACTION_PROMPT: |-
+  ### You are a [Keyword Extraction Expert] responsible for extracting core keywords from text.
+
+  Please extract the {{ max_keywords }} most important keywords from the following text:
+
+  {{ text }}
+
+  ### Requirements:
+  1. Keywords should accurately reflect the main topic of the text
+  2. Prioritize proper nouns and core concepts
+  3. Separate each keyword with a comma
+  4. Output only keywords, no other content
+
+  Keywords:
+
+# Knowledge Card Generation Prompt
+KNOWLEDGE_CARD_GENERATION_PROMPT: |-
+  ### You are a [Knowledge Card Generation Expert] responsible for refining text content into structured knowledge cards.
+
+  Please generate a knowledge card for the following content, including a summary and keywords:
+
+  Content:
+  {{ text }}
+
+  ### Requirements:
+  1. Summary section:
+     - No more than 200 characters
+     - Distill the core content and key information
+     - Use concise and clear language with coherent logic
+     - Do not use Markdown formatting symbols
+
+  2. Keywords section:
+     - Extract 5-10 core keywords
+     - Separate with commas
+     - Reflect the main topic of the content
+
+  3. Output format:
+     - First line: Summary content
+     - Second line: Keywords (with "Keywords:" prefix)
+
+  Please output directly without additional explanation.
+
+# Cluster Integration Prompt
+CLUSTER_INTEGRATION_PROMPT: |-
+  ### You are a [Knowledge Integration Expert] responsible for integrating multiple knowledge cards into coherent cluster summaries.
+
+  Please integrate the following knowledge cards into a coherent and complete cluster summary:
+
+  {{ summaries_text }}
+
+  ### Requirements:
+  1. Integrate core information from all cards into a unified theme
+  2. Weight content by importance and relevance: describe important content in detail and mention secondary content briefly
+  3. Maintain clear logic and a complete structure, and ensure all information is represented
+  4. Keep the summary within 200 words
+  5. Use concise and clear language
+  6. Do not omit any information; only adjust how much detail each item receives
+  7. Do not use Markdown formatting symbols
+  8. Output plain text directly
+
+  Cluster Integration Summary:
+
+# Global Integration Prompt
+GLOBAL_INTEGRATION_PROMPT: |-
+  ### You are a [Knowledge Base Summary Expert] responsible for generating clear, well-defined summaries of an entire knowledge base.
+
+  Please integrate the following {{ cluster_count }} cluster summaries into one clear, well-defined summary of the knowledge base content:
+
+  {{ summaries_text }}
+
+  ### Requirements:
+
+  #### 1. Content Integration Requirements:
+  - Analyze the similarity and relatedness of the {{ cluster_count }} cluster summaries
+  - Merge similar or related content into the same point
+  - The final number of points must not exceed {{ cluster_count }} (i.e., ≤ {{ cluster_count }} points)
+  - If the summaries differ substantially, you may keep {{ cluster_count }} independent points
+
+  #### 2. Content Requirements:
+  - The summary should be clear and complete, without missing key information
+  - Each point should highlight core viewpoints and key data
+  - Use concise, explicit language so that large language models can easily identify query intent
+  - Maintain logical coherence and thematic relevance
+
+  #### 3. Output Requirements:
+  - Use plain text format, do not use Markdown markup
+  - Use numbered points like "1.", "2.", etc.
+  - Separate points with a blank line
+  - Output content directly without additional explanation
+
+  Knowledge Base Content Summary:
+
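All templates use `{{ name }}` placeholders (`max_length`, `max_keywords`, `text`, `summaries_text`, `cluster_count`). The diff does not show how they are rendered. The syntax matches Jinja2, but that is an assumption. Below is a stdlib-only sketch of strict substitution. The hypothetical `render_prompt` raises an error on a missing variable rather than sending a literal `{{ text }}` to the model.

```python
import re
from typing import Mapping

# Matches simple Jinja2-style placeholders such as "{{ max_length }}" or "{{text}}".
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, variables: Mapping[str, object]) -> str:
    """Substitute {{ name }} placeholders; raise KeyError if a variable is missing.

    Minimal stand-in for a template engine: no filters, loops, or conditionals.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)


# Excerpt of the English SUMMARY_GENERATION_PROMPT as it would look after YAML parsing.
SUMMARY_TEMPLATE = (
    "Please generate a concise knowledge summary "
    "(no more than {{ max_length }} characters) for the following content:\n"
    "\n"
    "Content:\n"
    "{{ text }}"
)

print(render_prompt(SUMMARY_TEMPLATE, {"max_length": 150, "text": "Ray clusters."}))
```

`re.sub` makes a single pass over the template, so braces inside the substituted document text are not expanded a second time. A full template engine would be needed only if the prompts later use filters or control flow.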
@@ -14,16 +14,18 @@ dependencies = [
     "pyyaml>=6.0.2",
     "redis>=5.0.0",
     "fastmcp==2.12.0",
-    "langchain>=0.3.26"
+    "langchain>=0.3.26",
+    "scikit-learn>=1.3.0"
 ]
 
 [project.optional-dependencies]
 data-process = [
-    "ray[default]>=2.9.3",
+    "ray[default]>=2.8.0,<2.10.0",
     "celery>=5.3.6",
     "flower>=2.0.1",
     "nest_asyncio>=1.5.6",
-    "unstructured[csv,docx,pdf,pptx,xlsx,md]"
+    "unstructured[csv,docx,pdf,pptx,xlsx,md]",
+    "pydantic>=2.0.0,<3.0.0"
 ]
 test = [
     "pytest",