171 changes: 171 additions & 0 deletions docs/AI_PROVIDERS.md
@@ -0,0 +1,171 @@
# AI Provider Configuration

This document explains how to configure different AI providers for the GUM system.

## Overview

The GUM system uses a unified AI client that supports multiple providers for different tasks:

- **Text Completion**: Azure OpenAI (default) or OpenAI
- **Vision Completion**: OpenRouter (default)

## Provider Configuration

### Text Providers

#### Azure OpenAI (Default)
```bash
# Required environment variables
export AZURE_OPENAI_API_KEY="your-azure-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o" # Optional, defaults to gpt-4o

# Optional: Explicitly set text provider (defaults to azure)
export TEXT_PROVIDER="azure"
```
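If initialization fails with a configuration error, a quick way to see which of the variables above are missing is a check along these lines (the `check_azure_env` helper is illustrative, not part of the codebase):

```python
import os

def check_azure_env() -> list:
    """Return the names of required Azure OpenAI variables that are unset."""
    required = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION"]
    return [name for name in required if not os.getenv(name)]
```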

#### OpenAI
```bash
# Required environment variables
export OPENAI_API_KEY="your-openai-api-key"

# Optional environment variables
export OPENAI_MODEL="gpt-4o" # Optional, defaults to gpt-4o
export OPENAI_API_BASE="https://api.openai.com/v1" # Optional, uses default
export OPENAI_ORGANIZATION="your-org-id" # Optional

# Set text provider to OpenAI
export TEXT_PROVIDER="openai"
```

### Vision Providers

#### OpenRouter (Default)
```bash
# Required environment variables
export OPENROUTER_API_KEY="your-openrouter-api-key"

# Optional environment variables
export OPENROUTER_MODEL="qwen/qwen-2.5-vl-72b-instruct:free" # Optional, uses default

# Optional: Explicitly set vision provider (defaults to openrouter)
export VISION_PROVIDER="openrouter"
```
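OpenRouter exposes an OpenAI-compatible chat API, so a vision request is an ordinary chat message whose content mixes text with a base64 image data URL. A minimal sketch of building such a payload (the `build_vision_messages` helper is hypothetical; only the message shape follows the OpenAI-compatible format):

```python
import base64

def build_vision_messages(prompt: str, image_bytes: bytes) -> list:
    """Build an OpenAI-style multimodal message with an inline PNG."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]
```

Messages shaped like this can then be sent with the standard OpenAI SDK pointed at `https://openrouter.ai/api/v1`, authenticated with `OPENROUTER_API_KEY`.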

## Usage Examples

### Using Azure OpenAI for Text (Default)
```python
import asyncio
from gum import gum
from gum.observers import Observer

async def main():
    # No special configuration needed - Azure is the default
    async with gum("username", "model") as g:
        # Your GUM code here
        pass

asyncio.run(main())
```

### Using OpenAI for Text
```python
import asyncio
import os
from gum import gum
from gum.observers import Observer

async def main():
    # Set OpenAI as text provider (before the client is created)
    os.environ["TEXT_PROVIDER"] = "openai"

    async with gum("username", "model") as g:
        # Your GUM code here
        pass

asyncio.run(main())
```

### Testing Different Providers

#### Test OpenAI Client
```bash
python test_openai_client.py
```

#### Test Unified Client with Different Providers
```bash
# Test with Azure OpenAI (default)
python -c "import asyncio; from unified_ai_client import test_unified_client; asyncio.run(test_unified_client())"

# Test with OpenAI
TEXT_PROVIDER=openai python -c "import asyncio; from unified_ai_client import test_unified_client; asyncio.run(test_unified_client())"
```

## Provider Features

| Provider | Text Completion | Vision Completion | Notes |
|----------|-----------------|-------------------|-------|
| Azure OpenAI | ✓ | — | Enterprise-grade, requires Azure subscription |
| OpenAI | ✓ | — | Direct OpenAI API, requires OpenAI account |
| OpenRouter | — | ✓ | Multiple vision models, cost-effective |

## Error Handling

The unified client includes automatic retry logic with exponential backoff for transient errors. You can configure retry behavior:

```python
from unified_ai_client import UnifiedAIClient

client = UnifiedAIClient(
    max_retries=5,        # Maximum retry attempts
    base_delay=2.0,       # Base delay in seconds
    max_delay=120.0,      # Maximum delay between retries
    backoff_factor=2.0,   # Exponential backoff multiplier
    jitter_factor=0.1,    # Random jitter to prevent thundering herd
)
```
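For reference, one plausible way these parameters combine into a per-attempt delay (this formula is illustrative; the client's internal implementation may differ):

```python
import random

def backoff_delay(attempt: int, base_delay: float = 2.0, max_delay: float = 120.0,
                  backoff_factor: float = 2.0, jitter_factor: float = 0.1) -> float:
    """Exponential backoff capped at max_delay, plus a small random jitter."""
    delay = min(base_delay * (backoff_factor ** attempt), max_delay)
    return delay + delay * jitter_factor * random.random()
```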

## Environment Variables Reference

### Azure OpenAI
- `AZURE_OPENAI_API_KEY` (required)
- `AZURE_OPENAI_ENDPOINT` (required)
- `AZURE_OPENAI_API_VERSION` (required)
- `AZURE_OPENAI_DEPLOYMENT` (optional, defaults to "gpt-4o")

### OpenAI
- `OPENAI_API_KEY` (required)
- `OPENAI_MODEL` (optional, defaults to "gpt-4o")
- `OPENAI_API_BASE` (optional, defaults to "https://api.openai.com/v1")
- `OPENAI_ORGANIZATION` (optional)

### OpenRouter
- `OPENROUTER_API_KEY` (required)
- `OPENROUTER_MODEL` (optional, defaults to "qwen/qwen-2.5-vl-72b-instruct:free")

### Provider Selection
- `TEXT_PROVIDER` (optional, "azure" or "openai", defaults to "azure")
- `VISION_PROVIDER` (optional, "openrouter", defaults to "openrouter")
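Putting the defaults together, provider selection amounts to logic along these lines (`resolve_providers` is an illustrative name, not an actual API in the codebase):

```python
import os

def resolve_providers() -> tuple:
    """Read the provider env vars, falling back to the documented defaults."""
    text = os.getenv("TEXT_PROVIDER", "azure").lower()
    vision = os.getenv("VISION_PROVIDER", "openrouter").lower()
    if text not in {"azure", "openai"}:
        raise ValueError(f"Unsupported TEXT_PROVIDER: {text}")
    if vision != "openrouter":
        raise ValueError(f"Unsupported VISION_PROVIDER: {vision}")
    return text, vision
```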

## Troubleshooting

### Common Issues

1. **Missing API Keys**: Ensure all required environment variables are set
2. **Network Issues**: Check firewall/proxy settings
3. **Rate Limits**: The client includes automatic retry with backoff
4. **Model Availability**: Verify the model name is correct for your provider

### Debug Logging

Enable debug logging to troubleshoot issues:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

This will show detailed HTTP requests and responses for debugging.
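Global `DEBUG` logging can be noisy; if you only need the HTTP traffic, raising the level on the relevant loggers is a narrower alternative (`httpx` and `openai` are the logger names these SDKs use):

```python
import logging

logging.basicConfig(level=logging.INFO)
# Verbose output only for the HTTP layer and the OpenAI SDK
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("openai").setLevel(logging.DEBUG)
```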
39 changes: 38 additions & 1 deletion docs/requirements.txt
@@ -3,4 +3,41 @@
mkdocs-material>=9.0.0
mkdocstrings>=0.24.0
mkdocstrings-python>=1.7.0
mistune==3.0.2
# pymdown-extensions>=10.0.0

# Core dependencies for GUM (General User Models)
# Image processing and screen capture
pillow
mss
pynput
shapely

# macOS window management (conditionally installed)
pyobjc-framework-Quartz; sys_platform == "darwin"

# AI and OpenAI clients
openai>=1.0.0

# Database and ORM
SQLAlchemy>=2.0.0
aiosqlite
greenlet

# Data validation and serialization
pydantic>=2.0.0

# Environment and configuration
python-dotenv>=1.0.0

# Machine learning and data processing
scikit-learn
numpy

# Date/time utilities
python-dateutil

# Development and building tools (optional)
setuptools>=42
wheel
build
twine
167 changes: 167 additions & 0 deletions gum/azure_text_client.py
@@ -0,0 +1,167 @@
#!/usr/bin/env python3
"""
Azure OpenAI Text Completion Utility

This utility handles text completions using the official Azure OpenAI Python SDK
with proper error handling and logging.
"""

import asyncio
import os
import logging
from typing import List, Dict, Any
from dotenv import load_dotenv
from openai import AsyncAzureOpenAI

# Load environment variables at module level, override existing ones
load_dotenv(override=True)

# Set up logging with debug level for httpx
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Enable httpx debug logging to see exact HTTP requests
httpx_logger = logging.getLogger("httpx")
httpx_logger.setLevel(logging.DEBUG)
httpx_handler = logging.StreamHandler()
httpx_handler.setFormatter(logging.Formatter("HTTPX: %(message)s"))
httpx_logger.addHandler(httpx_handler)


class AzureOpenAITextClient:
    """Azure OpenAI client for text completions using the official Azure OpenAI SDK."""

    def __init__(self):
        self.api_key = os.getenv("AZURE_OPENAI_API_KEY")
        self.endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
        self.api_version = os.getenv("AZURE_OPENAI_API_VERSION")
        self.deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")

        logger.info("Azure OpenAI Environment Debug:")
        logger.info(f"  API Key: {self.api_key[:10] + '...' + self.api_key[-4:] if self.api_key else 'None'}")
        logger.info(f"  Endpoint: {self.endpoint}")
        logger.info(f"  API Version: {self.api_version}")
        logger.info(f"  Deployment: {self.deployment}")

        if not all([self.api_key, self.endpoint, self.api_version]):
            raise ValueError("Azure OpenAI configuration incomplete. Check environment variables.")

        # Initialize the Azure OpenAI client
        self.client = AsyncAzureOpenAI(
            api_key=self.api_key,
            azure_endpoint=self.endpoint,  # type: ignore
            api_version=self.api_version,
        )

        logger.info("Azure OpenAI Text Client initialized")
        logger.info(f"  Endpoint: {self.endpoint}")
        logger.info(f"  Deployment: {self.deployment}")
        logger.info(f"  API Version: {self.api_version}")

    async def chat_completion(
        self,
        messages: List[Dict[str, Any]],
        max_tokens: int = 1000,
        temperature: float = 0.1,
    ) -> str:
        """
        Send a chat completion request to Azure OpenAI.

        Args:
            messages: List of message dictionaries
            max_tokens: Maximum tokens to generate
            temperature: Temperature for generation

        Returns:
            The AI response content as a string
        """
        logger.info("Azure OpenAI text completion request")
        logger.info(f"  Deployment: {self.deployment}")
        logger.info(f"  Messages: {len(messages)} message(s)")
        logger.info(f"  Max tokens: {max_tokens}")

        try:
            response = await self.client.chat.completions.create(
                model=self.deployment,  # Use deployment name as model
                messages=messages,  # type: ignore
                max_tokens=max_tokens,
                temperature=temperature,
            )

            content = response.choices[0].message.content

            if content:
                logger.info("Azure OpenAI success")
                logger.info(f"  Response length: {len(content)} characters")
                return content
            else:
                error_msg = "Azure OpenAI returned empty response"
                logger.error(f"Error: {error_msg}")
                raise ValueError(error_msg)

        except Exception as e:
            error_msg = f"Azure OpenAI request failed: {str(e)}"
            logger.error(f"Error: {error_msg}")
            raise


# Global client instance
_azure_client = None


async def get_azure_text_client() -> AzureOpenAITextClient:
    """Get the global Azure OpenAI text client instance."""
    global _azure_client
    if _azure_client is None:
        _azure_client = AzureOpenAITextClient()
    return _azure_client


async def azure_text_completion(
    messages: List[Dict[str, Any]],
    max_tokens: int = 1000,
    temperature: float = 0.1,
) -> str:
    """
    Convenience function for Azure OpenAI text completion.

    Args:
        messages: List of message dictionaries
        max_tokens: Maximum tokens to generate
        temperature: Temperature for generation

    Returns:
        The AI response content as a string
    """
    client = await get_azure_text_client()
    return await client.chat_completion(messages, max_tokens, temperature)


async def test_azure_text_client():
    """Test the Azure OpenAI text client."""
    print("Testing Azure OpenAI Text Client...")

    test_messages = [
        {"role": "user", "content": "Hello! Please respond with exactly 'Azure OpenAI text working correctly'."}
    ]

    try:
        response = await azure_text_completion(
            messages=test_messages,
            max_tokens=20,
            temperature=0.0,
        )
        print(f"Azure OpenAI Text Success: {response}")
        return True
    except Exception as e:
        print(f"Azure OpenAI Text Failed: {e}")
        return False


if __name__ == "__main__":
    success = asyncio.run(test_azure_text_client())
    if success:
        print("Azure OpenAI text client is working!")
    else:
        print("Azure OpenAI text client has issues.")