Document Enrichment Processors #460


# 🧠 Document Enrichment Processors

Implement a set of **processors** that can transform, enrich, or normalize documents after they are fetched by a connector and before indexing.

These processors operate independently of the data source and can be chained in a pipeline to apply transformations such as field renaming, extraction, normalization, tagging, and more.
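As a rough illustration of chaining, the sketch below treats a document as a plain map and a pipeline as an ordered list of steps. The names `Document`, `Step`, `Run`, `Lowercase`, and `SetValue` are hypothetical and exist only for this example. They are not taken from any existing framework API.

```go
package main

import (
	"fmt"
	"strings"
)

// Document is a hypothetical in-memory form of a fetched document.
type Document map[string]any

// Step is a hypothetical transformation applied to a document in place.
type Step func(doc Document) error

// Run applies each step in order and stops at the first error.
func Run(doc Document, steps ...Step) error {
	for i, step := range steps {
		if err := step(doc); err != nil {
			return fmt.Errorf("step %d: %w", i, err)
		}
	}
	return nil
}

// Lowercase returns a step that lowercases a string field, if present.
func Lowercase(field string) Step {
	return func(doc Document) error {
		if s, ok := doc[field].(string); ok {
			doc[field] = strings.ToLower(s)
		}
		return nil
	}
}

// SetValue returns a step that sets or overrides a field with a constant.
func SetValue(field string, value any) Step {
	return func(doc Document) error {
		doc[field] = value
		return nil
	}
}

func main() {
	doc := Document{"title": "Quarterly REPORT"}
	if err := Run(doc, Lowercase("title"), SetValue("source", "demo")); err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println(doc["title"], doc["source"]) // prints "quarterly report demo"
}
```

Because each step only sees the document, the same chain can run regardless of which connector produced it.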


## ✅ Objectives

- Define a standard processor interface
- Support common field-level transformations
- Allow flexible and composable enrichment pipelines
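
One possible shape for these objectives, sketched under assumptions: a small `Processor` interface, a registry that maps a processor type name to a factory, and a `Build` function that turns an ordered list of configured steps into a pipeline. Every identifier here (`Processor`, `Factory`, `Register`, `Build`, `StepConfig`, and the `field` option) is hypothetical and not the interface of any existing codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// Document is a hypothetical in-memory form of a fetched document.
type Document map[string]any

// Processor is a hypothetical standard interface for enrichment steps.
type Processor interface {
	Name() string
	Process(doc Document) error
}

// Factory builds a processor from its configuration options.
type Factory func(cfg map[string]any) (Processor, error)

// registry maps a processor type name to its factory.
var registry = map[string]Factory{}

// Register makes a processor type available to Build.
func Register(name string, f Factory) { registry[name] = f }

// StepConfig is one configured pipeline entry: a type name plus options.
type StepConfig struct {
	Type   string
	Config map[string]any
}

// Build resolves configured steps into processors, preserving order.
func Build(steps []StepConfig) ([]Processor, error) {
	out := make([]Processor, 0, len(steps))
	for _, s := range steps {
		f, ok := registry[s.Type]
		if !ok {
			return nil, fmt.Errorf("unknown processor type %q", s.Type)
		}
		p, err := f(s.Config)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", s.Type, err)
		}
		out = append(out, p)
	}
	return out, nil
}

// lowercase is an example processor that lowercases one string field.
type lowercase struct{ field string }

func (p lowercase) Name() string { return "lowercase" }

func (p lowercase) Process(doc Document) error {
	if s, ok := doc[p.field].(string); ok {
		doc[p.field] = strings.ToLower(s)
	}
	return nil
}

// newLowercase validates options and constructs the lowercase processor.
func newLowercase(cfg map[string]any) (Processor, error) {
	field, ok := cfg["field"].(string)
	if !ok || field == "" {
		return nil, fmt.Errorf("missing string option %q", "field")
	}
	return lowercase{field: field}, nil
}

func main() {
	Register("lowercase", newLowercase)
	procs, err := Build([]StepConfig{
		{Type: "lowercase", Config: map[string]any{"field": "title"}},
	})
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	doc := Document{"title": "Hello World"}
	for _, p := range procs {
		if err := p.Process(doc); err != nil {
			fmt.Println(p.Name(), "failed:", err)
			return
		}
	}
	fmt.Println(doc["title"]) // prints "hello world"
}
```

Keeping construction in a factory means a bad option fails when the pipeline is built, not when the first document arrives.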

## 🛠️ Example Processor Types

| Processor Type     | Description |
|--------------------|-------------|
| `rename_field`     | Rename fields in the document (e.g., `title` → `doc.title`) |
| `extract_regex`    | Extract substrings using regex from text fields |
| `set_value`        | Set or override a field with a constant value |
| `timestamp_parser` | Convert string timestamps to a unified format |
| `truncate`         | Limit string or array field lengths |
| `add_tags`         | Append static or dynamic tags to a document |
| `lowercase`        | Normalize text to lowercase |

## 🔧 Configuration Example

```yaml
pipeline:
  - name: enrich_documents
    auto_start: false
    keep_running: true
    processor:
      - consumer:
          auto_commit_offset: true
          queue_selector:
            keys:
              - indexing_documents
          consumer:
            group: enriched_documents
            fetch_max_messages: 10
          processor:
            - document_summarization:
                model: $[[env.ENRICHMENT_MODEL]]
                input_queue: "indexing_documents"
                min_input_document_length: 500
                output_queue:
                  name: "enriched_documents"
                  label:
                    tag: "enriched"
```

## 📁 Reference

- https://github.com/infinilabs/gateway/tree/main/pipeline



Some existing legacy code can be migrated to processors: https://github.com/infinilabs/crawler/tree/master/pipeline/joints
