Commit 62953b8

Add support to deploy local minikube Ray test environment
1 parent 64ddd69 commit 62953b8

6 files changed

+527
-0
lines changed

etc/minikube/ray/README.md

Lines changed: 358 additions & 0 deletions
@@ -0,0 +1,358 @@
# Ray + Jupyter Enterprise Gateway + JupyterHub on Minikube

This directory contains scripts and configuration files to deploy a complete Jupyter development environment on Minikube with Ray cluster support for distributed computing.

## Overview

This setup provides:

- **Minikube Kubernetes Cluster**: Local Kubernetes environment for testing and development
- **Ray Operator**: Manages Ray clusters for distributed Python workloads
- **Jupyter Enterprise Gateway**: Enables remote kernel execution on Ray and Kubernetes
- **JupyterHub**: Multi-user notebook server with custom spawner integration

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Minikube Cluster                     │
│                                                         │
│  ┌──────────────┐         ┌─────────────────────────┐   │
│  │  JupyterHub  │────────▶│  Enterprise Gateway     │   │
│  │   (hub ns)   │         │  (enterprise-gateway ns)│   │
│  └──────────────┘         └─────────────────────────┘   │
│         │                              │                │
│         │                              │                │
│         ▼                              ▼                │
│  ┌──────────────┐         ┌─────────────────────────┐   │
│  │  User Pods   │         │  Ray Kernels            │   │
│  │  (Notebooks) │         │  (ray_python_operator)  │   │
│  └──────────────┘         └─────────────────────────┘   │
│                                        │                │
│                                        ▼                │
│                           ┌─────────────────────────┐   │
│                           │  KubeRay Operator       │   │
│                           │  (Ray Clusters)         │   │
│                           └─────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```

## Prerequisites

Before running the installation, ensure you have:

- **Docker Desktop**: Running and accessible
- **Minikube**: Installed (`brew install minikube` on macOS)
- **kubectl**: Kubernetes command-line tool
- **Helm 3**: Package manager for Kubernetes
- **EG_HOME**: Environment variable pointing to the Enterprise Gateway repository root

```bash
# Example setup (adjust the path to your local checkout)
export EG_HOME=/Users/lresende/opensource/jupyter/enterprise-gateway
```

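
The prerequisites above can be checked before running the install script. The following is a minimal sketch, not part of the repository: the helper name `missing_prerequisites` and the executable names are assumptions, and it only verifies that the tools are on `PATH` and that `EG_HOME` points to a directory. It does not check tool versions or whether Docker is actually running.

```python
import os
import shutil

# Executable names are assumed from the prerequisites list above.
REQUIRED_TOOLS = ("docker", "minikube", "kubectl", "helm")


def missing_prerequisites(tools=REQUIRED_TOOLS, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = [f"{tool} not found on PATH" for tool in tools
                if shutil.which(tool) is None]
    eg_home = env.get("EG_HOME")
    if not eg_home:
        problems.append("EG_HOME is not set")
    elif not os.path.isdir(eg_home):
        problems.append(f"EG_HOME does not point to a directory: {eg_home}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(f"[missing] {problem}")
```

Running the file directly prints one line per missing item and nothing when everything is in place.
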
## Installation

### Initial Cluster Setup

Run `install-minikube-ray.sh` to create a complete new cluster from scratch:

```bash
./install-minikube-ray.sh
```

**What this script does:**

1. **Launches Docker Desktop** (if not running)
2. **Stops existing Minikube cluster** named `ray` (if it exists)
3. **Starts Minikube** with:
   - Profile: `ray`
   - Driver: `docker`
   - Kubernetes version: `v1.31`
   - Memory: `12GB`
4. **Installs KubeRay Operator** (v1.5.0) via Helm
   - Manages Ray cluster lifecycle
   - Handles Ray pod scheduling and scaling
5. **Deploys Enterprise Gateway** via Helm chart
   - Uses local development build from `$EG_HOME/dist/`
   - Configured with `enterprise-gateway-minikube-helm.yaml`
   - Default kernel: `ray_python_operator`
   - Service exposed on NodePort 30088
6. **Applies Network Policy** (`enterprise-gateway-network.yaml`)
   - Allows all ingress/egress for Enterprise Gateway namespace
7. **Installs JupyterHub** (v4.3.1) via Helm
   - Custom KubeSpawner configuration
   - Integrates with Enterprise Gateway
   - Admin users: root, jovyan, lresende
   - Exposed via NodePort service
8. **Displays service URL** for JupyterHub proxy

**Expected Output:**

At the end of installation, you'll see the JupyterHub URL:

```
http://127.0.0.1:XXXXX
```

Open this URL in your browser to access JupyterHub.

### Notes on Installation Script

The script includes two options for cluster management (lines 3-4):

```bash
minikube -p ray stop                                # Updates existing cluster
# minikube -p ray stop && minikube -p ray delete    # Creates fresh cluster
```

- **Default behavior**: Stops and restarts the existing cluster (preserves state)
- **Alternative**: Uncomment line 4 to completely delete and recreate the cluster

## Development Workflow

### Building and Updating Images

Use `update-minikube-ray.sh` when you've made changes to Enterprise Gateway or kernel images:

```bash
./update-minikube-ray.sh
```

**What this script does:**

1. **Navigates to EG_HOME** and builds distributions:
   ```bash
   make clean dist
   ```
   - Creates Helm chart tarball for Enterprise Gateway

2. **Builds and pushes Docker images**:
   ```bash
   make clean-enterprise-gateway enterprise-gateway push-enterprise-gateway \
        clean-kernel-ray-py kernel-ray-py push-kernel-ray-py \
        HUB_ORG=lresende TAG=dev
   ```
   - Builds `lresende/enterprise-gateway:dev`
   - Builds `lresende/kernel-ray-py:dev`
   - Pushes to Docker Hub (requires authentication)

3. **Loads images into Minikube**:
   ```bash
   minikube image load lresende/enterprise-gateway:dev
   minikube image load lresende/kernel-ray-py:dev
   ```
   - Makes images available to Kubernetes without pulling from registry

4. **Restarts Enterprise Gateway deployment**:
   ```bash
   kubectl rollout restart deployment/enterprise-gateway -n enterprise-gateway
   ```
   - Picks up new image versions
   - Zero-downtime rolling update

5. **Displays Enterprise Gateway service URL** for verification

**When to use this script:**

- After modifying Enterprise Gateway source code
- After updating kernel image definitions
- When testing new features or bug fixes
- Before creating pull requests

**Note**: The script assumes you have push access to the `lresende` Docker Hub organization. Modify `HUB_ORG` in the script if using a different registry.

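
To see how `HUB_ORG` and `TAG` flow through the steps listed above, the sequence can be modelled as data. This is an illustrative sketch only, not the contents of `update-minikube-ray.sh`: the function name `update_commands` is hypothetical, and the optional `profile` argument is an added assumption (the commands listed in this README do not pass a Minikube profile).

```python
import shlex


def update_commands(hub_org="lresende", tag="dev", profile=None):
    """Build the command sequence described above as argument lists.

    `profile` is a hypothetical extra: when given, `-p <profile>` is added
    to the minikube calls.
    """
    images = [f"{hub_org}/enterprise-gateway:{tag}",
              f"{hub_org}/kernel-ray-py:{tag}"]
    minikube = ["minikube"] + (["-p", profile] if profile else [])
    commands = [
        ["make", "clean", "dist"],
        ["make",
         "clean-enterprise-gateway", "enterprise-gateway", "push-enterprise-gateway",
         "clean-kernel-ray-py", "kernel-ray-py", "push-kernel-ray-py",
         f"HUB_ORG={hub_org}", f"TAG={tag}"],
    ]
    commands += [minikube + ["image", "load", image] for image in images]
    commands.append(["kubectl", "rollout", "restart",
                     "deployment/enterprise-gateway", "-n", "enterprise-gateway"])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in update_commands(hub_org="myorg"):
        print(shlex.join(command))
```

If the organization or tag changes, the image referenced in `enterprise-gateway-minikube-helm.yaml` (`lresende/enterprise-gateway:dev`) would need to change with it.
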
## Configuration Files

### enterprise-gateway-minikube-helm.yaml

Helm values for Enterprise Gateway deployment:

- **Image**: `lresende/enterprise-gateway:dev`
- **Kernel Configuration**:
  - Allowed kernels: `ray_python_operator`, `python_kubernetes`
  - Default kernel: `ray_python_operator`
  - Launch/timeout settings: 500 seconds (helpful for debugging)
  - Idle timeout: 3600 seconds (1 hour)
- **Service**: NodePort on 30088 (HTTP) and 30077 (responses)
- **RBAC**: Enabled with `enterprise-gateway-sa` service account
- **KIP** (Kernel Image Puller): Enabled for pre-pulling kernel images from Docker Hub

### jupyterhub-config.yaml

JupyterHub configuration with custom spawner:

- **Database**: In-memory SQLite (not for production!)
- **Authenticator**: Admin users configured (root, jovyan, lresende)
- **Custom Spawner**: `CustomKubeSpawner` extends KubeSpawner
  - Sets `JUPYTER_GATEWAY_URL` to Enterprise Gateway service
  - Configures kernel namespace: `enterprise-gateway`
  - Sets service account: `enterprise-gateway-sa`
  - Passes username to kernels via environment variables
- **Single-user Image**: `quay.io/jupyterhub/k8s-singleuser-sample:4.3.1`
- **Storage**: Ephemeral (type: none) - notebooks are not persisted
- **Default UI**: JupyterLab (`/lab`)

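
The custom spawner settings listed above amount to a small set of environment variables handed to each single-user server. The sketch below is not the code in `jupyterhub-config.yaml`; it is a standalone illustration with a hypothetical helper name. The URL, namespace, and service account come from this README, while the `KERNEL_*` variable names are assumed to follow Enterprise Gateway's usual conventions.

```python
# Values taken from this README; the helper itself is hypothetical.
GATEWAY_URL = "http://enterprise-gateway.enterprise-gateway:8888"
KERNEL_NAMESPACE = "enterprise-gateway"
KERNEL_SERVICE_ACCOUNT = "enterprise-gateway-sa"


def gateway_environment(username):
    """Environment a single-user server could be started with to use the gateway."""
    return {
        # Tells the notebook server to delegate kernel management to the gateway.
        "JUPYTER_GATEWAY_URL": GATEWAY_URL,
        # KERNEL_* names are assumed Enterprise Gateway conventions.
        "KERNEL_NAMESPACE": KERNEL_NAMESPACE,
        "KERNEL_SERVICE_ACCOUNT_NAME": KERNEL_SERVICE_ACCOUNT,
        "KERNEL_USERNAME": username,
    }


if __name__ == "__main__":
    for key, value in gateway_environment("jovyan").items():
        print(f"{key}={value}")
```

In a KubeSpawner subclass, a dictionary like this would typically be merged into the spawner's environment for the user being spawned.
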
### enterprise-gateway-network.yaml

Kubernetes NetworkPolicy that allows all traffic to/from Enterprise Gateway namespace:

- Required for Enterprise Gateway to communicate with kernels
- Allows kernels in the same namespace to connect back to gateway
- In production, consider more restrictive policies

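
For readers unfamiliar with what an allow-all policy looks like, here is a sketch that builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. It is an assumption about the general shape of such a policy, not a copy of `enterprise-gateway-network.yaml`; the function name and the policy name `allow-all` are hypothetical.

```python
import json


def allow_all_network_policy(namespace="enterprise-gateway", name="allow-all"):
    """Build a NetworkPolicy permitting all ingress and egress in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},                 # empty selector: every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{}],                   # one empty rule matches all sources
            "egress": [{}],                    # one empty rule matches all destinations
        },
    }


if __name__ == "__main__":
    print(json.dumps(allow_all_network_policy(), indent=2))
```

A more restrictive variant would replace the empty rules with explicit `from`/`to` selectors and port lists.
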
## Usage

### Access JupyterHub

1. After installation, open the URL displayed by the script
2. Log in with username: `root`, `jovyan`, or `lresende` (any password works in dev mode)
3. Wait for your user pod to start (first launch may take 1-2 minutes)

### Launch a Ray Kernel

1. In JupyterLab, create a new notebook
2. Select kernel: **Ray Python (ray_python_operator)**
3. Run Python code that executes on Ray:

```python
import ray
ray.init(address='auto')

@ray.remote
def compute_pi(n):
    import random
    count = sum(1 for _ in range(n)
                if random.random()**2 + random.random()**2 <= 1)
    return 4.0 * count / n

# Distributed computation across Ray cluster
futures = [compute_pi.remote(1000000) for _ in range(10)]
results = ray.get(futures)
print(f"Pi estimate: {sum(results) / len(results)}")
```

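
The Monte Carlo estimator used in the Ray example can be sanity-checked without a cluster. The sketch below is a plain-Python version (no Ray) plus a rough error estimate; the function names are illustrative. A point drawn uniformly from the unit square falls inside the quarter circle with probability pi/4, so the estimate has a standard error of about `4 * sqrt(p * (1 - p) / n)` with `p = pi/4`. For the ten tasks of one million samples each in the example above, that works out to roughly 0.0005.

```python
import math
import random


def estimate_pi(n, seed=None):
    """Monte Carlo estimate of pi from n random points in the unit square."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1)
    return 4.0 * inside / n


def standard_error(n):
    """Approximate one-sigma error of the estimate for n samples."""
    p = math.pi / 4  # probability that a point lands inside the quarter circle
    return 4.0 * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    n = 200_000
    print(f"estimate={estimate_pi(n, seed=0):.4f} +/- {standard_error(n):.4f}")
```

If the value printed by the Ray notebook differs from `math.pi` by much more than a few standard errors, the problem is more likely in the cluster setup than in the estimator.
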
### Verify Cluster Status

```bash
# Check all pods across namespaces
kubectl get pods --all-namespaces

# Check Enterprise Gateway logs
kubectl logs -n enterprise-gateway deployment/enterprise-gateway -f

# Check JupyterHub logs
kubectl logs -n hub deployment/hub -f

# List running Ray clusters
kubectl get rayclusters --all-namespaces

# Access Enterprise Gateway directly
minikube -p ray service enterprise-gateway -n enterprise-gateway --url
```

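
The first check above can be scripted when the pod list gets long. The following sketch, with a hypothetical function name, parses the JSON form of `kubectl get pods --all-namespaces` and reports pods that are neither completed nor fully ready. It assumes `kubectl` is on `PATH` and pointed at the `ray` cluster.

```python
import json


def pods_not_ready(pod_list):
    """Return 'namespace/name (phase)' for pods that are not fully ready.

    `pod_list` is the parsed output of
    `kubectl get pods --all-namespaces -o json`.
    """
    problems = []
    for pod in pod_list.get("items", []):
        meta, status = pod["metadata"], pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed pods are not a problem
        containers = status.get("containerStatuses", [])
        ready = (phase == "Running" and bool(containers)
                 and all(c.get("ready") for c in containers))
        if not ready:
            problems.append(f"{meta['namespace']}/{meta['name']} ({phase})")
    return problems


if __name__ == "__main__":
    import subprocess
    raw = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for line in pods_not_ready(json.loads(raw)):
        print(line)
```

An empty output means every pod is either running with all containers ready or has completed.
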
## Troubleshooting

### Minikube won't start

```bash
# Clean up and retry
minikube -p ray delete
./install-minikube-ray.sh
```

### Pods stuck in ImagePullBackOff

```bash
# Verify images are loaded
minikube -p ray image ls | grep lresende

# Re-run update script
./update-minikube-ray.sh
```

### Kernels fail to start

Check timeout settings and logs:

```bash
# View Enterprise Gateway logs
kubectl logs -n enterprise-gateway deployment/enterprise-gateway --tail=100

# Check kernel pods
kubectl get pods -n enterprise-gateway -l kernel_id

# Describe a failing pod
kubectl describe pod -n enterprise-gateway <kernel-pod-name>
```

### JupyterHub can't connect to Enterprise Gateway

Verify the service URL configuration:

```bash
# Inspect the Enterprise Gateway service (type, cluster IP, ports)
kubectl get svc -n enterprise-gateway enterprise-gateway

# Inside the cluster the service is reachable at
#   http://enterprise-gateway.enterprise-gateway:8888
# This should match JUPYTER_GATEWAY_URL in jupyterhub-config.yaml
```

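
Beyond checking that the service exists, it can help to confirm that the gateway actually answers requests. The sketch below queries the gateway's kernelspecs endpoint using only the standard library. The helper names are hypothetical, and the example URL assumes the Enterprise Gateway port-forward from the Development Tips section is active on `localhost:8888`.

```python
import json
import urllib.request


def kernelspecs_url(gateway_url):
    """Build the kernelspecs endpoint from a gateway base URL."""
    return gateway_url.rstrip("/") + "/api/kernelspecs"


def kernel_names(payload):
    """Extract (default kernel, sorted kernel names) from a kernelspecs response."""
    return payload.get("default"), sorted(payload.get("kernelspecs", {}))


def list_kernels(gateway_url, timeout=10):
    """Query the gateway; raises URLError if it cannot be reached."""
    with urllib.request.urlopen(kernelspecs_url(gateway_url), timeout=timeout) as resp:
        return kernel_names(json.load(resp))


if __name__ == "__main__":
    # Assumes a port-forward to the gateway is listening on localhost:8888.
    print(list_kernels("http://localhost:8888"))
```

With the configuration described in this README, the returned names would be expected to include `ray_python_operator` and `python_kubernetes`.
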
### Ray cluster issues

```bash
# Check KubeRay operator (add -n <namespace> if it runs outside the current namespace)
kubectl get pods -l app.kubernetes.io/name=kuberay-operator

# View operator logs
kubectl logs -l app.kubernetes.io/name=kuberay-operator -f
```

## Cleanup

### Stop the cluster (preserves state)

```bash
minikube -p ray stop
```

### Delete everything

```bash
minikube -p ray delete
```

This removes all data, configurations, and the Minikube node itself (a Docker container when using the `docker` driver).

## Development Tips

1. **Faster iteration**: Use `imagePullPolicy: Never` in Helm configs during development to force local image usage

2. **Debug mode**: Both Enterprise Gateway and JupyterHub are configured with debug logging enabled

3. **Resource monitoring**:
   ```bash
   # Watch resource usage (requires metrics-server:
   # minikube -p ray addons enable metrics-server)
   kubectl top nodes
   kubectl top pods --all-namespaces
   ```

4. **Port forwarding** (alternative to NodePort):
   ```bash
   kubectl port-forward -n hub svc/proxy-public 8000:80
   kubectl port-forward -n enterprise-gateway svc/enterprise-gateway 8888:8888
   ```

5. **Multi-arch builds**: Uncomment the `MULTIARCH_BUILD=true` line in `update-minikube-ray.sh` if building for both ARM and x86 architectures

## References
350+
351+
- [Jupyter Enterprise Gateway Documentation](https://jupyter-enterprise-gateway.readthedocs.io/)
352+
- [KubeRay Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html)
353+
- [JupyterHub on Kubernetes](https://z2jh.jupyter.org/)
354+
- [Ray Documentation](https://docs.ray.io/)
355+
356+
## License
357+
358+
This configuration is part of the Jupyter Enterprise Gateway project. See the parent repository for license information.
