115 changes: 115 additions & 0 deletions backend/prompts/async_knowledge_summary.yaml
@@ -0,0 +1,115 @@
# Async Knowledge Summary Prompt Templates (Chinese)
Review comment (Collaborator): Could the prompts here avoid mentioning "async"?


# Summary Generation Prompt
SUMMARY_GENERATION_PROMPT: |-
  ### 你是【知识总结专家】,负责生成简洁准确的知识总结。

  请为以下内容生成简洁的知识总结(不超过{{ max_length }}个中文字符):

  内容:
  {{ text }}

  ### 要求:
  1. 提取核心观点和关键信息
  2. 使用简洁清晰的语言
  3. 保持客观准确
  4. 突出重点内容
  5. 不要使用markdown格式符号(如#、*、-等)
  6. 直接输出总结内容,无需额外说明

  知识总结:

# Keyword Extraction Prompt
KEYWORD_EXTRACTION_PROMPT: |-
  ### 你是【关键词提取专家】,负责从文本中提取核心关键词。

  请从以下文本中提取{{ max_keywords }}个最重要的关键词:

  {{ text }}

  ### 要求:
  1. 关键词应准确反映文本主题
  2. 优先提取专有名词和核心概念
  3. 每个关键词用逗号分隔
  4. 只输出关键词,不要其他内容
  5. 使用中文输出

  关键词:

# Knowledge Card Generation Prompt
KNOWLEDGE_CARD_GENERATION_PROMPT: |-
  ### 你是【知识卡片生成专家】,负责将文本内容提炼成结构化的知识卡片。

  请为以下内容生成一个知识卡片,包含摘要和关键词:

  内容:
  {{ text }}

  ### 要求:
  1. 摘要部分:
  - 不超过200个中文字符
  - 提炼核心内容和关键信息
  - 语言简洁清晰,逻辑连贯
  - 不使用markdown格式符号

  2. 关键词部分:
  - 提取5-10个核心关键词
  - 用逗号分隔
  - 反映内容主题

  3. 输出格式:
  - 第一行:摘要内容
  - 第二行:关键词(用"关键词:"前缀)

  请直接输出,无需额外说明。

# Cluster Integration Prompt
CLUSTER_INTEGRATION_PROMPT: |-
  ### 你是【知识整合专家】,负责将多个知识卡片整合成连贯的集群总结。

  请将以下知识卡片整合成一个连贯完整的集群总结:

  {{ summaries_text }}

  ### 要求:
  1. 将所有卡片的核心信息整合成统一主题
  2. 根据内容重要性和相关性调整权重,重要内容详细描述,次要内容简要提及
  3. 保持清晰逻辑和完整结构,确保所有信息都得到体现
  4. 字数控制在200字以内
  5. 使用简洁清晰的语言
  6. 不要遗漏任何信息,只调整描述权重
  7. 不要使用markdown格式符号
  8. 直接输出纯文本内容

  集群整合总结:

# Global Integration Prompt
GLOBAL_INTEGRATION_PROMPT: |-
  ### 你是【知识库总结专家】,负责生成清晰明确的知识库整体总结。

  请将以下{{ cluster_count }}个集群总结整合成一个清晰明确的知识库内容总结:

  {{ summaries_text }}

  ### 要求:

  #### 1. 内容整合要求:
  - 分析{{ cluster_count }}个集群总结的内容相似性和关联性
  - 将相似或关联的内容合并到同一个要点中
  - 最终要点数量不能超过{{ cluster_count }}个(即≤{{ cluster_count }}个要点)
  - 如果内容差异很大,可以保持{{ cluster_count }}个独立要点

  #### 2. 内容要求:
  - 总结要清晰、完整、不遗漏关键信息
  - 每个要点突出核心观点和关键数据
  - 语言简洁明确,便于大模型识别查询意图
  - 保持逻辑连贯性和主题关联性

  #### 3. 输出要求:
  - 使用纯文本格式,不使用Markdown标记
  - 分点使用"一、"、"二、"等序号
  - 每个要点之间用空行分隔
  - 直接输出内容,无需额外说明

  知识库内容总结:
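
The templates above use `{{ name }}` placeholders (`text`, `max_length`, `max_keywords`, `summaries_text`, `cluster_count`), which looks like Jinja-style syntax. The diff does not show how the backend renders them, so the following is only a minimal sketch that uses a standard-library regex as a stand-in for a real template engine. `render_prompt` and `SUMMARY_TEMPLATE` are hypothetical names introduced for illustration, and the template text is an abbreviated copy, not the full prompt.

```python
import re

# Matches Jinja-style placeholders such as "{{ text }}" or "{{max_length}}".
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, **values: object) -> str:
    """Substitute every {{ name }} placeholder; raise KeyError if a value is missing."""

    def _substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing template variable: {name}")
        return str(values[name])

    return _PLACEHOLDER.sub(_substitute, template)


# Abbreviated, hypothetical stand-in for SUMMARY_GENERATION_PROMPT.
SUMMARY_TEMPLATE = (
    "Please generate a concise knowledge summary "
    "(no more than {{ max_length }} characters) for the following content:\n\n"
    "Content:\n{{ text }}\n\n"
    "Knowledge Summary:"
)

prompt = render_prompt(
    SUMMARY_TEMPLATE,
    max_length=150,
    text="Ray schedules tasks across a cluster.",
)
```

Failing loudly on a missing variable is a deliberate choice in this sketch: Jinja2's default behaviour renders an undefined variable as an empty string, which would silently send a prompt with no content to the model.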

114 changes: 114 additions & 0 deletions backend/prompts/async_knowledge_summary_en.yaml
@@ -0,0 +1,114 @@
# Async Knowledge Summary Prompt Templates (English)

# Summary Generation Prompt
SUMMARY_GENERATION_PROMPT: |-
  ### You are a [Knowledge Summary Expert] responsible for generating concise and accurate knowledge summaries.

  Please generate a concise knowledge summary (no more than {{ max_length }} characters) for the following content:

  Content:
  {{ text }}

  ### Requirements:
  1. Extract core viewpoints and key information
  2. Use concise and clear language
  3. Maintain objectivity and accuracy
  4. Highlight important content
  5. Do not use markdown format symbols (such as #, *, -, etc.)
  6. Output the summary directly without additional explanation

  Knowledge Summary:

# Keyword Extraction Prompt
KEYWORD_EXTRACTION_PROMPT: |-
  ### You are a [Keyword Extraction Expert] responsible for extracting core keywords from text.

  Please extract the {{ max_keywords }} most important keywords from the following text:

  {{ text }}

  ### Requirements:
  1. Keywords should accurately reflect the text theme
  2. Prioritize proper nouns and core concepts
  3. Separate each keyword with a comma
  4. Output only keywords, no other content

  Keywords:

# Knowledge Card Generation Prompt
KNOWLEDGE_CARD_GENERATION_PROMPT: |-
  ### You are a [Knowledge Card Generation Expert] responsible for distilling text content into structured knowledge cards.

  Please generate a knowledge card for the following content, including a summary and keywords:

  Content:
  {{ text }}

  ### Requirements:
  1. Summary section:
  - No more than 200 characters
  - Distill the core content and key information
  - Use concise and clear language with coherent logic
  - Do not use markdown format symbols

  2. Keywords section:
  - Extract 5-10 core keywords
  - Separate with commas
  - Reflect the content theme

  3. Output format:
  - First line: Summary content
  - Second line: Keywords (with "Keywords:" prefix)

  Please output directly without additional explanation.

# Cluster Integration Prompt
CLUSTER_INTEGRATION_PROMPT: |-
  ### You are a [Knowledge Integration Expert] responsible for integrating multiple knowledge cards into coherent cluster summaries.

  Please integrate the following knowledge cards into a coherent and complete cluster summary:

  {{ summaries_text }}

  ### Requirements:
  1. Integrate core information from all cards into a unified theme
  2. Adjust weight based on content importance and relevance: describe important content in detail and mention secondary content briefly
  3. Maintain clear logic and complete structure, and ensure all information is represented
  4. Keep the summary within 200 words
  5. Use concise and clear language
  6. Do not omit any information, only adjust description weight
  7. Do not use markdown format symbols
  8. Output plain text content directly

  Cluster Integration Summary:

# Global Integration Prompt
GLOBAL_INTEGRATION_PROMPT: |-
  ### You are a [Knowledge Base Summary Expert] responsible for generating clear and explicit overall knowledge base summaries.

  Please integrate the following {{ cluster_count }} cluster summaries into a clear and explicit knowledge base content summary:

  {{ summaries_text }}

  ### Requirements:

  #### 1. Content Integration Requirements:
  - Analyze the content similarity and relevance of the {{ cluster_count }} cluster summaries
  - Merge similar or related content into the same point
  - Final number of points must not exceed {{ cluster_count }} (i.e., ≤{{ cluster_count }} points)
  - If the content is very different, keep {{ cluster_count }} independent points

  #### 2. Content Requirements:
  - The summary should be clear and complete, without missing key information
  - Each point should highlight core viewpoints and key data
  - Keep language concise and clear so that large models can easily identify query intent
  - Maintain logical coherence and thematic relevance

  #### 3. Output Requirements:
  - Use plain text format, do not use Markdown markup
  - Use numbered points like "1.", "2.", etc.
  - Separate each point with blank lines
  - Output content directly without additional explanation

  Knowledge Base Content Summary:
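
KNOWLEDGE_CARD_GENERATION_PROMPT asks the model for two lines: the summary first, then the keywords behind a `Keywords:` prefix (`关键词:` in the Chinese file). The diff does not include the code that consumes this output, so the parser below is a hedged sketch of how such a response could be split. `KnowledgeCard` and `parse_knowledge_card` are hypothetical names, and accepting both ASCII and full-width colons and commas is an assumption about what models tend to emit.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class KnowledgeCard:
    summary: str
    keywords: List[str]


# Prefixes named by the two prompt files; ASCII and full-width colons are both accepted.
_KEYWORD_PREFIX = re.compile(r"^\s*(?:Keywords|关键词)\s*[::]\s*", re.IGNORECASE)
# Keywords may be separated by ASCII or full-width commas.
_SEPARATORS = re.compile(r"[,,]")


def parse_knowledge_card(raw: str) -> KnowledgeCard:
    """Split a model response into summary text and a keyword list."""
    summary_parts: List[str] = []
    keywords: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between the two parts
        if _KEYWORD_PREFIX.match(line):
            body = _KEYWORD_PREFIX.sub("", line, count=1)
            keywords = [k.strip() for k in _SEPARATORS.split(body) if k.strip()]
        else:
            summary_parts.append(line)
    return KnowledgeCard(summary=" ".join(summary_parts), keywords=keywords)


card = parse_knowledge_card(
    "Ray is a distributed runtime.\nKeywords: Ray, distributed, Celery"
)
```

If the model omits the keyword line, this sketch returns an empty keyword list rather than raising, so a caller could fall back to KEYWORD_EXTRACTION_PROMPT.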

8 changes: 5 additions & 3 deletions backend/pyproject.toml
@@ -14,16 +14,18 @@ dependencies = [
     "pyyaml>=6.0.2",
     "redis>=5.0.0",
     "fastmcp==2.12.0",
-    "langchain>=0.3.26"
+    "langchain>=0.3.26",
+    "scikit-learn>=1.3.0"
 ]

 [project.optional-dependencies]
 data-process = [
-    "ray[default]>=2.9.3",
+    "ray[default]>=2.8.0,<2.10.0",
     "celery>=5.3.6",
     "flower>=2.0.1",
     "nest_asyncio>=1.5.6",
-    "unstructured[csv,docx,pdf,pptx,xlsx,md]"
+    "unstructured[csv,docx,pdf,pptx,xlsx,md]",
+    "pydantic>=2.0.0,<3.0.0"
 ]
 test = [
     "pytest",
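
The Ray constraint changes from `>=2.9.3` to `>=2.8.0,<2.10.0`, which lowers the floor and adds a ceiling. As a rough illustration of which releases the new range admits, the sketch below compares versions numerically. `parse_version` and `in_ray_range` are hypothetical helpers that handle only plain `X.Y.Z` strings, not the full PEP 440 grammar that packaging tools implement.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "2.9.3" into (2, 9, 3), padding short versions such as "2.10" with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_ray_range(version: str) -> bool:
    """True if the version satisfies the new constraint >=2.8.0,<2.10.0."""
    return (2, 8, 0) <= parse_version(version) < (2, 10, 0)


# The previously required minimum is still inside the new range.
still_allowed = in_ray_range("2.9.3")
# Ray 2.10.0 and later are now excluded.
now_excluded = not in_ray_range("2.10.0")
```

Components are compared as integers because a plain string comparison would sort "2.10.0" before "2.9.3" and accept it by mistake.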