171 changes: 171 additions & 0 deletions docs/AI_PROVIDERS.md
@@ -0,0 +1,171 @@
# AI Provider Configuration

This document explains how to configure different AI providers for the GUM system.

## Overview

The GUM system uses a unified AI client that supports multiple providers for different tasks:

- **Text Completion**: Azure OpenAI (default) or OpenAI
- **Vision Completion**: OpenRouter (default)

## Provider Configuration

### Text Providers

#### Azure OpenAI (Default)
```bash
# Required environment variables
export AZURE_OPENAI_API_KEY="your-azure-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o" # Optional, defaults to gpt-4o

# Optional: Explicitly set text provider (defaults to azure)
export TEXT_PROVIDER="azure"
```
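If initialization fails with a configuration error, a quick way to see which of the variables above are missing is a check along these lines (the `check_azure_env` helper is illustrative, not part of the codebase):

```python
import os

def check_azure_env() -> list:
    """Return the names of required Azure OpenAI variables that are unset."""
    required = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION"]
    return [name for name in required if not os.getenv(name)]
```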

#### OpenAI
```bash
# Required environment variables
export OPENAI_API_KEY="your-openai-api-key"

# Optional environment variables
export OPENAI_MODEL="gpt-4o" # Optional, defaults to gpt-4o
export OPENAI_API_BASE="https://api.openai.com/v1" # Optional, uses default
export OPENAI_ORGANIZATION="your-org-id" # Optional

# Set text provider to OpenAI
export TEXT_PROVIDER="openai"
```

### Vision Providers

#### OpenRouter (Default)
```bash
# Required environment variables
export OPENROUTER_API_KEY="your-openrouter-api-key"

# Optional environment variables
export OPENROUTER_MODEL="qwen/qwen-2.5-vl-72b-instruct:free" # Optional, uses default

# Optional: Explicitly set vision provider (defaults to openrouter)
export VISION_PROVIDER="openrouter"
```
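OpenRouter exposes an OpenAI-compatible chat API, so a vision request is an ordinary chat message whose content mixes text with a base64 image data URL. A minimal sketch of building such a payload (the `build_vision_messages` helper is hypothetical; only the message shape follows the OpenAI-compatible format):

```python
import base64

def build_vision_messages(prompt: str, image_bytes: bytes) -> list:
    """Build an OpenAI-style multimodal message with an inline PNG."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]
```

Messages shaped like this can then be sent with the standard OpenAI SDK pointed at `https://openrouter.ai/api/v1`, authenticated with `OPENROUTER_API_KEY`.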

## Usage Examples

### Using Azure OpenAI for Text (Default)
```python
import asyncio
from gum import gum
from gum.observers import Observer

async def main():
    # No special configuration needed - Azure is the default
    async with gum("username", "model") as g:
        # Your GUM code here
        pass

asyncio.run(main())
```

### Using OpenAI for Text
```python
import asyncio
import os
from gum import gum
from gum.observers import Observer

async def main():
    # Set OpenAI as text provider (before the client is created)
    os.environ["TEXT_PROVIDER"] = "openai"

    async with gum("username", "model") as g:
        # Your GUM code here
        pass

asyncio.run(main())
```

### Testing Different Providers

#### Test OpenAI Client
```bash
python test_openai_client.py
```

#### Test Unified Client with Different Providers
```bash
# Test with Azure OpenAI (default)
python -c "import asyncio; from unified_ai_client import test_unified_client; asyncio.run(test_unified_client())"

# Test with OpenAI
TEXT_PROVIDER=openai python -c "import asyncio; from unified_ai_client import test_unified_client; asyncio.run(test_unified_client())"
```

## Provider Features

| Provider | Text Completion | Vision Completion | Notes |
|----------|-----------------|-------------------|-------|
| Azure OpenAI | ✓ | — | Enterprise-grade, requires Azure subscription |
| OpenAI | ✓ | — | Direct OpenAI API, requires OpenAI account |
| OpenRouter | — | ✓ | Multiple vision models, cost-effective |

## Error Handling

The unified client includes automatic retry logic with exponential backoff for transient errors. You can configure retry behavior:

```python
from unified_ai_client import UnifiedAIClient

client = UnifiedAIClient(
    max_retries=5,        # Maximum retry attempts
    base_delay=2.0,       # Base delay in seconds
    max_delay=120.0,      # Maximum delay between retries
    backoff_factor=2.0,   # Exponential backoff multiplier
    jitter_factor=0.1,    # Random jitter to prevent thundering herd
)
```
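For reference, one plausible way these parameters combine into a per-attempt delay (this formula is illustrative; the client's internal implementation may differ):

```python
import random

def backoff_delay(attempt: int, base_delay: float = 2.0, max_delay: float = 120.0,
                  backoff_factor: float = 2.0, jitter_factor: float = 0.1) -> float:
    """Exponential backoff capped at max_delay, plus a small random jitter."""
    delay = min(base_delay * (backoff_factor ** attempt), max_delay)
    return delay + delay * jitter_factor * random.random()
```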

## Environment Variables Reference

### Azure OpenAI
- `AZURE_OPENAI_API_KEY` (required)
- `AZURE_OPENAI_ENDPOINT` (required)
- `AZURE_OPENAI_API_VERSION` (required)
- `AZURE_OPENAI_DEPLOYMENT` (optional, defaults to "gpt-4o")

### OpenAI
- `OPENAI_API_KEY` (required)
- `OPENAI_MODEL` (optional, defaults to "gpt-4o")
- `OPENAI_API_BASE` (optional, defaults to "https://api.openai.com/v1")
- `OPENAI_ORGANIZATION` (optional)

### OpenRouter
- `OPENROUTER_API_KEY` (required)
- `OPENROUTER_MODEL` (optional, defaults to "qwen/qwen-2.5-vl-72b-instruct:free")

### Provider Selection
- `TEXT_PROVIDER` (optional, "azure" or "openai", defaults to "azure")
- `VISION_PROVIDER` (optional, "openrouter", defaults to "openrouter")
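Putting the defaults together, provider selection amounts to logic along these lines (`resolve_providers` is an illustrative name, not an actual API in the codebase):

```python
import os

def resolve_providers() -> tuple:
    """Read the provider env vars, falling back to the documented defaults."""
    text = os.getenv("TEXT_PROVIDER", "azure").lower()
    vision = os.getenv("VISION_PROVIDER", "openrouter").lower()
    if text not in {"azure", "openai"}:
        raise ValueError(f"Unsupported TEXT_PROVIDER: {text}")
    if vision != "openrouter":
        raise ValueError(f"Unsupported VISION_PROVIDER: {vision}")
    return text, vision
```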

## Troubleshooting

### Common Issues

1. **Missing API Keys**: Ensure all required environment variables are set
2. **Network Issues**: Check firewall/proxy settings
3. **Rate Limits**: The client includes automatic retry with backoff
4. **Model Availability**: Verify the model name is correct for your provider

### Debug Logging

Enable debug logging to troubleshoot issues:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

This will show detailed HTTP requests and responses for debugging.
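Global `DEBUG` logging can be noisy; if you only need the HTTP traffic, raising the level on the relevant loggers is a narrower alternative (`httpx` and `openai` are the logger names these SDKs use):

```python
import logging

logging.basicConfig(level=logging.INFO)
# Verbose output only for the HTTP layer and the OpenAI SDK
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("openai").setLevel(logging.DEBUG)
```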
39 changes: 38 additions & 1 deletion docs/requirements.txt
@@ -3,4 +3,41 @@
mkdocs-material>=9.0.0
mkdocstrings>=0.24.0
mkdocstrings-python>=1.7.0
mistune==3.0.2
# pymdown-extensions>=10.0.0

# Core dependencies for GUM (General User Models)
# Image processing and screen capture
pillow
mss
pynput
shapely

# macOS window management (conditionally installed)
pyobjc-framework-Quartz; sys_platform == "darwin"

# AI and OpenAI clients
openai>=1.0.0

# Database and ORM
SQLAlchemy>=2.0.0
aiosqlite
greenlet

# Data validation and serialization
pydantic>=2.0.0

# Environment and configuration
python-dotenv>=1.0.0

# Machine learning and data processing
scikit-learn
numpy

# Date/time utilities
python-dateutil

# Development and building tools (optional)
setuptools>=42
wheel
build
twine
167 changes: 167 additions & 0 deletions gum/azure_text_client.py
@@ -0,0 +1,167 @@
#!/usr/bin/env python3
"""
Azure OpenAI Text Completion Utility

This utility handles text completions using the official Azure OpenAI Python SDK
with proper error handling and logging.
"""

import asyncio
import os
import logging
from typing import List, Dict, Any
from dotenv import load_dotenv
from openai import AsyncAzureOpenAI

# Load environment variables at module level, override existing ones
load_dotenv(override=True)

# Set up logging with debug level for httpx
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Enable httpx debug logging to see exact HTTP requests
httpx_logger = logging.getLogger("httpx")
httpx_logger.setLevel(logging.DEBUG)
httpx_handler = logging.StreamHandler()
httpx_handler.setFormatter(logging.Formatter("HTTPX: %(message)s"))
httpx_logger.addHandler(httpx_handler)


class AzureOpenAITextClient:
    """Azure OpenAI client for text completions using the official Azure OpenAI SDK."""

    def __init__(self):
        self.api_key = os.getenv("AZURE_OPENAI_API_KEY")
        self.endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
        self.api_version = os.getenv("AZURE_OPENAI_API_VERSION")
        self.deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")

        logger.info("Azure OpenAI Environment Debug:")
        logger.info(f"  API Key: {self.api_key[:10] + '...' + self.api_key[-4:] if self.api_key else 'None'}")
        logger.info(f"  Endpoint: {self.endpoint}")
        logger.info(f"  API Version: {self.api_version}")
        logger.info(f"  Deployment: {self.deployment}")

        if not all([self.api_key, self.endpoint, self.api_version]):
            raise ValueError("Azure OpenAI configuration incomplete. Check environment variables.")

        # Initialize the Azure OpenAI client
        self.client = AsyncAzureOpenAI(
            api_key=self.api_key,
            azure_endpoint=self.endpoint,  # type: ignore
            api_version=self.api_version,
        )

        logger.info("Azure OpenAI Text Client initialized")
        logger.info(f"  Endpoint: {self.endpoint}")
        logger.info(f"  Deployment: {self.deployment}")
        logger.info(f"  API Version: {self.api_version}")

    async def chat_completion(
        self,
        messages: List[Dict[str, Any]],
        max_tokens: int = 1000,
        temperature: float = 0.1,
    ) -> str:
        """
        Send a chat completion request to Azure OpenAI.

        Args:
            messages: List of message dictionaries
            max_tokens: Maximum tokens to generate
            temperature: Temperature for generation

        Returns:
            The AI response content as a string
        """
        logger.info("Azure OpenAI text completion request")
        logger.info(f"  Deployment: {self.deployment}")
        logger.info(f"  Messages: {len(messages)} message(s)")
        logger.info(f"  Max tokens: {max_tokens}")

        try:
            response = await self.client.chat.completions.create(
                model=self.deployment,  # Use deployment name as model
                messages=messages,  # type: ignore
                max_tokens=max_tokens,
                temperature=temperature,
            )

            content = response.choices[0].message.content

            if content:
                logger.info("Azure OpenAI success")
                logger.info(f"  Response length: {len(content)} characters")
                return content
            else:
                error_msg = "Azure OpenAI returned empty response"
                logger.error(f"Error: {error_msg}")
                raise ValueError(error_msg)

        except Exception as e:
            error_msg = f"Azure OpenAI request failed: {str(e)}"
            logger.error(f"Error: {error_msg}")
            raise


# Global client instance
_azure_client = None


async def get_azure_text_client() -> AzureOpenAITextClient:
    """Get the global Azure OpenAI text client instance."""
    global _azure_client
    if _azure_client is None:
        _azure_client = AzureOpenAITextClient()
    return _azure_client


async def azure_text_completion(
    messages: List[Dict[str, Any]],
    max_tokens: int = 1000,
    temperature: float = 0.1,
) -> str:
    """
    Convenience function for Azure OpenAI text completion.

    Args:
        messages: List of message dictionaries
        max_tokens: Maximum tokens to generate
        temperature: Temperature for generation

    Returns:
        The AI response content as a string
    """
    client = await get_azure_text_client()
    return await client.chat_completion(messages, max_tokens, temperature)


async def test_azure_text_client():
    """Test the Azure OpenAI text client."""
    print("Testing Azure OpenAI Text Client...")

    test_messages = [
        {"role": "user", "content": "Hello! Please respond with exactly 'Azure OpenAI text working correctly'."}
    ]

    try:
        response = await azure_text_completion(
            messages=test_messages,
            max_tokens=20,
            temperature=0.0,
        )
        print(f"Azure OpenAI Text Success: {response}")
        return True
    except Exception as e:
        print(f"Azure OpenAI Text Failed: {e}")
        return False


if __name__ == "__main__":
    success = asyncio.run(test_azure_text_client())
    if success:
        print("Azure OpenAI text client is working!")
    else:
        print("Azure OpenAI text client has issues.")