Commit 646dfbc

kerberos auth for proxy
1 parent aee6863 commit 646dfbc
File tree

10 files changed
+114791 −4 lines changed

CLAUDE.md

Lines changed: 136 additions & 0 deletions

@@ -0,0 +1,136 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

This is the official Python client for Databricks SQL. It implements PEP 249 (DB API 2.0) and uses Apache Thrift for communication with Databricks clusters/SQL warehouses.

## Essential Development Commands

```bash
# Install dependencies
poetry install

# Install with PyArrow support (recommended)
poetry install --all-extras

# Run unit tests
poetry run python -m pytest tests/unit

# Run a specific test
poetry run python -m pytest tests/unit/test_client.py::ClientTestSuite::test_method_name

# Code formatting (required before commits)
poetry run black src

# Type checking
poetry run mypy --install-types --non-interactive src

# Check formatting without changing files
poetry run black src --check
```

## High-Level Architecture

### Core Components

1. **Client Layer** (`src/databricks/sql/client.py`)
   - Main entry point implementing DB API 2.0
   - Handles connections, cursors, and query execution
   - Key classes: `Connection`, `Cursor`

2. **Backend Layer** (`src/databricks/sql/backend/`)
   - Thrift-based communication with Databricks
   - Handles protocol-level operations
   - Key files: `thrift_backend.py`, `databricks_client.py`
   - SEA (Statement Execution API) support in `experimental/backend/sea_backend.py`

3. **Authentication** (`src/databricks/sql/auth/`)
   - Multiple auth methods: OAuth U2M/M2M, PAT, custom providers
   - Authentication flow abstraction
   - OAuth persistence support for token caching

4. **Data Transfer** (`src/databricks/sql/cloudfetch/`)
   - Cloud fetch for large results
   - Arrow format support for efficiency
   - Handles data pagination and streaming
   - Result set management in `result_set.py`

5. **Parameters** (`src/databricks/sql/parameters/`)
   - Native parameters (v3.0.0+): server-side parameterization
   - Inline parameters (legacy): client-side interpolation
   - SQL injection prevention
   - Type mapping and conversion

6. **Telemetry** (`src/databricks/sql/telemetry/`)
   - Usage metrics and performance monitoring
   - Configurable batch processing and time-based flushing
   - Server-side flag integration
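The client layer follows the standard DB API 2.0 flow: connect, open a cursor, execute, fetch. A minimal sketch (the helper name `fetch_one_row` is hypothetical, and the connector is imported lazily so the sketch only needs `databricks-sql-connector` installed when it is actually called):

```python
def fetch_one_row(server_hostname: str, http_path: str, access_token: str):
    """Open a connection, run a trivial query, and return its rows."""
    # Imported lazily so this sketch only requires the connector when called.
    from databricks import sql

    # Connection and Cursor are context managers, guaranteeing cleanup.
    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchall()
```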
### Key Design Patterns

- **Result Sets**: Uses Arrow format by default for efficient data transfer
- **Error Handling**: Comprehensive retry logic with exponential backoff
- **Resource Management**: Context managers for proper cleanup
- **Type System**: Strong typing with MyPy throughout
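The retry pattern noted above can be illustrated with a generic sketch; this is an illustration of exponential backoff with jitter, not the connector's actual implementation (which lives in `src/databricks/sql/utils.py`):

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, capped at max_delay, with jitter
            # so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```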
## Testing Strategy

### Unit Tests (No Databricks account needed)

```bash
poetry run python -m pytest tests/unit
```

### E2E Tests (Requires Databricks account)

1. Set environment variables or create a `test.env` file:
   ```bash
   export DATABRICKS_SERVER_HOSTNAME="****"
   export DATABRICKS_HTTP_PATH="/sql/1.0/endpoints/****"
   export DATABRICKS_TOKEN="dapi****"
   ```
2. Run: `poetry run python -m pytest tests/e2e`

Test organization:

- `tests/unit/` - Fast, isolated unit tests
- `tests/e2e/` - Integration tests against a real Databricks workspace
- Test files follow the `test_*.py` naming convention
- Test suites: core, large queries, staging ingestion, retry logic
## Important Development Notes

1. **Dependency Management**: Always use Poetry, never pip directly
2. **Code Style**: Black formatter with a 100-character line limit (PEP 8 otherwise)
3. **Type Annotations**: Required for all new code
4. **Thrift Files**: Generated code in `thrift_api/` - do not edit manually
5. **Parameter Security**: Always use native parameters, never string interpolation
6. **Arrow Support**: Optional but highly recommended for performance
7. **Python Support**: 3.8+ (up to 3.13)
8. **DCO**: Sign commits with the Developer Certificate of Origin
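To illustrate the parameter-security note: with native parameters the value travels separately from the SQL text instead of being interpolated into it, so there is nothing to escape. A sketch, assuming the connector's documented `:name` named-parameter marker style (the `users` table and `find_user` helper are hypothetical):

```python
def find_user(cursor, user_id: int):
    # Native (server-side) parameters: the value never touches the SQL string,
    # so there is no injection risk.
    cursor.execute(
        "SELECT * FROM users WHERE id = :user_id",
        {"user_id": user_id},
    )
    return cursor.fetchall()

# Never do this (inline string interpolation, vulnerable to SQL injection):
# cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
```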
## Common Development Tasks

### Adding a New Feature

1. Implement it in the appropriate module under `src/databricks/sql/`
2. Add unit tests in `tests/unit/`
3. Add integration tests in `tests/e2e/` if needed
4. Update type hints and ensure MyPy passes
5. Run the Black formatter before committing

### Debugging Connection Issues

- Check the auth configuration in the `auth/` modules
- Review the retry logic in `src/databricks/sql/utils.py`
- Enable debug logging for a detailed trace
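For the debug-logging step, standard library logging configuration is enough; the sketch below assumes the connector emits records under the `databricks.sql` logger namespace (the usual convention for loggers created with `logging.getLogger(__name__)`):

```python
import logging

# Send all log records, including DEBUG, to stderr.
logging.basicConfig(level=logging.DEBUG)

# If root-level DEBUG is too noisy, narrow it to the connector's own loggers.
logging.getLogger("databricks.sql").setLevel(logging.DEBUG)
```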
### Working with Thrift

- Protocol definitions live in `src/databricks/sql/thrift_api/`
- Backend implementation is in `backend/thrift_backend.py`
- Don't modify generated Thrift files directly

### Running Examples

Example scripts live in the `examples/` directory:

- Basic query execution examples
- OAuth authentication patterns
- Parameter usage (native vs. inline)
- Staging ingestion operations
- Custom credential providers

docs/proxy_configuration.md

Lines changed: 175 additions & 0 deletions

@@ -0,0 +1,175 @@
# Proxy Configuration Guide

This guide explains how to configure the Databricks SQL Connector for Python to work with HTTP/HTTPS proxies, including support for Kerberos authentication.

## Table of Contents

- [Basic Proxy Configuration](#basic-proxy-configuration)
- [Proxy with Basic Authentication](#proxy-with-basic-authentication)
- [Proxy with Kerberos Authentication](#proxy-with-kerberos-authentication)
- [Troubleshooting](#troubleshooting)

## Basic Proxy Configuration

The connector automatically detects proxy settings from environment variables:

```bash
# For HTTPS connections (most common)
export HTTPS_PROXY=http://proxy.example.com:8080

# For HTTP connections
export HTTP_PROXY=http://proxy.example.com:8080

# Hosts that should bypass the proxy
export NO_PROXY=localhost,127.0.0.1,.internal.company.com
```
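A quick way to confirm what Python sees from these variables is the standard library's proxy lookup; it reads the same environment the connector picks its settings up from:

```python
import os
import urllib.request

# Example value only; normally this is exported in your shell, not set in code.
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

# getproxies() returns a {scheme: proxy_url} mapping built from the environment.
proxies = urllib.request.getproxies()
print(proxies.get("https"))  # → http://proxy.example.com:8080
```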
Then connect normally:

```python
from databricks import sql

connection = sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-token"
)
```
## Proxy with Basic Authentication

For proxies requiring username/password authentication, include the credentials in the proxy URL:

```bash
export HTTPS_PROXY=http://username:password@proxy.example.com:8080
```
## Proxy with Kerberos Authentication

For enterprise environments using Kerberos authentication on proxies:

### Prerequisites

1. Install the Kerberos dependencies:
   ```bash
   pip install databricks-sql-connector[kerberos]
   ```

2. Obtain a valid Kerberos ticket:
   ```bash
   kinit user@EXAMPLE.COM
   ```

3. Set the proxy environment variables (without credentials):
   ```bash
   export HTTPS_PROXY=http://proxy.example.com:8080
   ```

### Connection with Kerberos Proxy

```python
from databricks import sql

connection = sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-databricks-token",

    # Enable Kerberos proxy authentication
    _proxy_auth_type="kerberos",

    # Optional Kerberos settings
    _proxy_kerberos_service_name="HTTP",           # Default: "HTTP"
    _proxy_kerberos_principal="user@EXAMPLE.COM",  # Optional: uses the default principal if not set
    _proxy_kerberos_delegate=False,                # Default: no credential delegation
    _proxy_kerberos_mutual_auth="REQUIRED"         # Options: REQUIRED, OPTIONAL, DISABLED
)
```

### Kerberos Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `_proxy_auth_type` | `None` | Set to `"kerberos"` to enable Kerberos proxy auth |
| `_proxy_kerberos_service_name` | `"HTTP"` | Kerberos service name for the proxy |
| `_proxy_kerberos_principal` | `None` | Specific principal to use (uses the default if not set) |
| `_proxy_kerberos_delegate` | `False` | Whether to delegate credentials to the proxy |
| `_proxy_kerberos_mutual_auth` | `"REQUIRED"` | Mutual authentication requirement level |
### Example: Custom Kerberos Settings

```python
# Using a specific service principal with delegation
connection = sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-token",

    _proxy_auth_type="kerberos",
    _proxy_kerberos_service_name="HTTP",
    _proxy_kerberos_principal="dbuser@CORP.EXAMPLE.COM",
    _proxy_kerberos_delegate=True,           # Allow credential delegation
    _proxy_kerberos_mutual_auth="OPTIONAL"   # Less strict verification
)
```
## Troubleshooting

### Kerberos Authentication Issues

1. **No Kerberos ticket**:
   ```bash
   # Check whether you have a valid ticket
   klist

   # If not, obtain one
   kinit user@EXAMPLE.COM
   ```

2. **Wrong service principal**:
   - Check with your IT team for the correct proxy service principal name
   - It is typically `HTTP@proxy.example.com`, but may vary

3. **Import errors**:
   ```
   ImportError: Kerberos proxy authentication requires 'pykerberos'
   ```
   Solution: install with `pip install databricks-sql-connector[kerberos]`
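The ticket check from step 1 can also be done programmatically: MIT Kerberos's `klist -s` runs silently and exits 0 when a valid, non-expired ticket cache exists. A sketch (the `have_kerberos_ticket` helper is hypothetical, and it treats a missing `klist` binary as "no ticket"):

```python
import shutil
import subprocess


def have_kerberos_ticket() -> bool:
    """Return True if a valid (non-expired) Kerberos ticket cache exists."""
    if shutil.which("klist") is None:
        return False  # Kerberos CLI tools not installed
    # `klist -s` prints nothing; its exit status is the answer.
    return subprocess.run(["klist", "-s"], capture_output=True).returncode == 0
```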
### Proxy Connection Issues

1. **Enable debug logging**:
   ```python
   import logging
   logging.basicConfig(level=logging.DEBUG)
   ```

2. **Test proxy connectivity**:
   ```bash
   # Check that the proxy is reachable
   curl -x http://proxy.example.com:8080 https://www.databricks.com
   ```

3. **Verify environment variables**:
   ```python
   import os
   print(f"HTTPS_PROXY: {os.environ.get('HTTPS_PROXY')}")
   print(f"NO_PROXY: {os.environ.get('NO_PROXY')}")
   ```
### Platform-Specific Notes

- **Linux/macOS**: Uses the `pykerberos` library
- **Windows**: Uses the `winkerberos` library (automatically selected)
- **Docker/Containers**: Ensure Kerberos configuration files are mounted into the container
## Security Considerations

1. **Avoid hardcoding credentials** - use environment variables or a secure credential store
2. **Use HTTPS connections** - even through proxies, maintain encrypted connections to Databricks
3. **Credential delegation** - only enable `_proxy_kerberos_delegate=True` if your proxy requires it
4. **Mutual authentication** - keep `_proxy_kerberos_mutual_auth="REQUIRED"` for maximum security
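Point 1 above can be as simple as reading the token from the environment at startup; a minimal sketch (the `get_databricks_token` helper is hypothetical):

```python
import os


def get_databricks_token() -> str:
    """Read the access token from the environment instead of hardcoding it."""
    token = os.environ.get("DATABRICKS_TOKEN")
    if not token:
        # Fail fast with a clear message rather than at connect time.
        raise RuntimeError("DATABRICKS_TOKEN environment variable is not set")
    return token
```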
## See Also

- [Kerberos Proxy Example](../examples/kerberos_proxy_auth.py)
- [Databricks SQL Connector Documentation](https://docs.databricks.com/dev-tools/python-sql-connector.html)
