@@ -45,13 +45,13 @@ uv add genai-processors-url-fetch[markitdown]
 
 ```python
 from genai_processors import processor
-from genai_processors_url_fetch import UrlFetchProcessor, FetchConfig
+from genai_processors_url_fetch import UrlFetchProcessor, FetchConfig, ContentProcessor
 
 # Basic usage with defaults (BeautifulSoup text extraction)
 fetcher = UrlFetchProcessor()
 
 # Use markitdown for richer content processing
-config = FetchConfig(content_processor="markitdown")
+config = FetchConfig(content_processor=ContentProcessor.MARKITDOWN)
 markitdown_fetcher = UrlFetchProcessor(config)
 
 # Process text containing URLs
@@ -81,7 +81,7 @@ All security features are enabled by default but can be configured via the Fetch
 The processor uses a dataclass-based configuration system for clean, type-safe settings. You can customize the processor's behavior by passing a FetchConfig object during initialization.
 
 ```python
-from genai_processors_url_fetch import UrlFetchProcessor, FetchConfig
+from genai_processors_url_fetch import UrlFetchProcessor, FetchConfig, ContentProcessor
 
 # Example of a customized security configuration
 config = FetchConfig(
@@ -104,12 +104,13 @@ The `FetchConfig` dataclass provides comprehensive configuration options organiz
 * **user_agent** (str, default: "GenAI-Processors/UrlFetchProcessor"): The User-Agent string to send with HTTP requests.
 * **include_original_part** (bool, default: True): If True, the original ProcessorPart that contained the URL(s) will be yielded at the end of processing.
 * **fail_on_error** (bool, default: False): If True, the processor will raise a RuntimeError on the first failed fetch.
-* **content_processor** (Literal["beautifulsoup", "markitdown", "raw"], default: "beautifulsoup"): Content processing method.
-  - `"beautifulsoup"`: Extract clean text using BeautifulSoup (fastest, good for simple HTML)
-  - `"markitdown"`: Convert content to markdown using Microsoft's markitdown library (best for rich content, requires optional dependency)
-  - `"raw"`: Return the raw HTML content without processing
+* **content_processor** (ContentProcessor, default: ContentProcessor.BEAUTIFULSOUP): Content processing method.
+  * `ContentProcessor.BEAUTIFULSOUP`: Extract clean text using BeautifulSoup (fastest, good for simple HTML)
+  * `ContentProcessor.MARKITDOWN`: Convert content to markdown using Microsoft's markitdown library (best for rich content, requires optional dependency)
+  * `ContentProcessor.RAW`: Return the raw HTML content without processing
+  * **Note:** String values ("beautifulsoup", "markitdown", "raw") are automatically converted to enum values for backward compatibility.
 * **markitdown_options** (dict[str, Any], default: {}): Options passed to the markitdown MarkItDown constructor when using markitdown processor.
-* **extract_text_only** (bool | None, default: None): **Deprecated.** Use `content_processor` instead. For backward compatibility: `True` maps to `"beautifulsoup"`, `False` maps to `"raw"`.
+* **extract_text_only** (bool | None, default: None): **Deprecated.** Use `content_processor` instead. For backward compatibility: `True` maps to `ContentProcessor.BEAUTIFULSOUP`, `False` maps to `ContentProcessor.RAW`.
 
 ##### Security Controls
 
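The note above says plain strings are still accepted and that `extract_text_only` still maps onto the new enum. One common way to get that behaviour is a `str`-based `Enum` plus coercion in a dataclass `__post_init__`. The sketch below assumes that design; `FetchConfigSketch` is a hypothetical, trimmed-down stand-in rather than the package's actual source, the enum values are assumed to match the legacy strings, and letting the deprecated flag override `content_processor` is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ContentProcessor(str, Enum):
    """Stand-in for the enum described in the README (values assumed)."""

    BEAUTIFULSOUP = "beautifulsoup"
    MARKITDOWN = "markitdown"
    RAW = "raw"


@dataclass
class FetchConfigSketch:
    """Hypothetical, trimmed-down config showing only the coercion logic."""

    content_processor: ContentProcessor | str = ContentProcessor.BEAUTIFULSOUP
    markitdown_options: dict[str, Any] = field(default_factory=dict)
    extract_text_only: bool | None = None  # deprecated

    def __post_init__(self) -> None:
        # Assumption: when the deprecated flag is set, it overrides content_processor.
        # True maps to BEAUTIFULSOUP, False maps to RAW.
        if self.extract_text_only is not None:
            self.content_processor = (
                ContentProcessor.BEAUTIFULSOUP
                if self.extract_text_only
                else ContentProcessor.RAW
            )
        # Legacy strings are looked up by value; enum members pass through
        # unchanged, and unknown strings raise ValueError.
        self.content_processor = ContentProcessor(self.content_processor)


# Legacy string and enum spellings end up as the same member.
legacy = FetchConfigSketch(content_processor="markitdown")
modern = FetchConfigSketch(content_processor=ContentProcessor.MARKITDOWN)
print(legacy.content_processor is modern.content_processor)  # True
```

With the `str` mixin, members also compare equal to their legacy strings (`ContentProcessor.RAW == "raw"` is `True`), so code that checks the configured value against a string keeps working.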
@@ -128,7 +129,7 @@ The UrlFetchProcessor supports three content processing methods via the `content
 #### BeautifulSoup (Default)
 
 ```python
-config = FetchConfig(content_processor="beautifulsoup")
+config = FetchConfig(content_processor=ContentProcessor.BEAUTIFULSOUP)
 fetcher = UrlFetchProcessor(config)
 # Returns: Clean text extracted from HTML, fastest processing
 # Mimetype: "text/plain; charset=utf-8"
@@ -140,7 +141,7 @@ The markitdown processor provides the richest content extraction by converting H
 
 ```python
 config = FetchConfig(
-    content_processor="markitdown",
+    content_processor=ContentProcessor.MARKITDOWN,
     markitdown_options={
         "extract_tables": True,  # Preserve table structure
         "preserve_links": True,  # Keep link formatting
@@ -169,7 +170,7 @@ fetcher = UrlFetchProcessor(config)
 #### Raw HTML
 
 ```python
-config = FetchConfig(content_processor="raw")
+config = FetchConfig(content_processor=ContentProcessor.RAW)
 fetcher = UrlFetchProcessor(config)
 # Returns: Original HTML content without processing
 # Mimetype: "text/html; charset=utf-8"
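The BeautifulSoup and raw hunks record a different output mimetype per processor (`text/plain; charset=utf-8` and `text/html; charset=utf-8`). A minimal sketch of dispatching on the enum, assuming a hypothetical `process_content` helper and using the standard library's `html.parser` as a rough stand-in for BeautifulSoup; the markitdown branch is deliberately left unimplemented because its output mimetype is not shown in this diff.

```python
from __future__ import annotations

from enum import Enum
from html.parser import HTMLParser


class ContentProcessor(str, Enum):
    """Stand-in for the enum described in the README (values assumed)."""

    BEAUTIFULSOUP = "beautifulsoup"
    MARKITDOWN = "markitdown"
    RAW = "raw"


class _TextExtractor(HTMLParser):
    """Collects non-blank text nodes; a rough stdlib stand-in for BeautifulSoup."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def process_content(html: str, processor: ContentProcessor | str) -> tuple[str, str]:
    """Hypothetical helper: return (content, mimetype) for the chosen processor."""
    processor = ContentProcessor(processor)  # legacy strings resolve to members
    if processor is ContentProcessor.RAW:
        # Raw mode: hand back the HTML untouched.
        return html, "text/html; charset=utf-8"
    if processor is ContentProcessor.BEAUTIFULSOUP:
        # Text mode: keep only the text nodes, joined by single spaces.
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        return " ".join(parser.chunks), "text/plain; charset=utf-8"
    # MARKITDOWN needs the optional dependency and is not modelled here.
    raise NotImplementedError(f"{processor.value!r} is not covered by this sketch")


print(process_content("<p>Hello <b>world</b></p>", "beautifulsoup"))
# ('Hello world', 'text/plain; charset=utf-8')
```

Because the helper passes its argument through `ContentProcessor(...)`, the legacy spelling `"raw"` resolves to the same branch as `ContentProcessor.RAW`.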
@@ -229,11 +230,11 @@ for content_part in successful_content:
 
 ```python
 from genai_processors import streams
-from genai_processors_url_fetch import UrlFetchProcessor, FetchConfig
+from genai_processors_url_fetch import UrlFetchProcessor, FetchConfig, ContentProcessor
 
 # Configure markitdown processor for rich content extraction
 config = FetchConfig(
-    content_processor="markitdown",
+    content_processor=ContentProcessor.MARKITDOWN,
     include_original_part=False,
     markitdown_options={
         "extract_tables": True,