Overview
This guide provides step-by-step instructions to optimize Docker build performance across all Dockerfiles in the MCP Gateway Registry project.
What You're Optimizing
Three Docker build configurations that currently have suboptimal layer caching and build strategies:
- Root Dockerfile - Main application container
- docker/Dockerfile.registry - Registry service with frontend
- metrics-service/Dockerfile - Metrics service
Expected Improvements
- Build time: Reduce from ~10-15 minutes to ~2-3 minutes (after first build)
- Rebuild time: When only app code changes, rebuild in <1 minute
- Image size: Reduce by 30-50% with multi-stage builds
- CI/CD: Faster deployments, more frequent releases
Key References
- Docker layer caching: https://docs.docker.com/build/cache/
- Multi-stage builds: https://docs.docker.com/build/building/multi-stage/
- npm ci vs npm install: https://docs.npmjs.com/cli/v8/commands/npm-ci
- Current Dockerfiles: /Dockerfile, /docker/Dockerfile.registry, /metrics-service/Dockerfile
Architecture
Current Problems
Current Flow (Inefficient):
1. Copy ALL application code
2. Install dependencies
3. Build application
Problem: Any code change invalidates the COPY layer and every layer after it, so dependencies are reinstalled on every build
Optimized Flow
Optimized Flow:
1. Copy ONLY dependency files (package.json, pyproject.toml)
2. Install dependencies (CACHED if dependencies unchanged)
3. Copy application code
4. Build application (CACHED if the source and build inputs are unchanged)
Benefit: Code changes only rebuild final layers, reusing cached dependencies
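To make the benefit concrete, here is a minimal sketch of the idea behind the cache check for a COPY step: the step is reused only while the content of the copied files stays the same. The helper name dep_fingerprint and the use of cksum are assumptions for illustration; Docker computes its own checksums internally.

```shell
# Print a fingerprint of the dependency manifests that feed the "install" layer.
# Hypothetical helper: it only approximates Docker's content-based cache check.
dep_fingerprint() {
    # Missing files are skipped so the helper works for Python-only or Node-only projects.
    for f in "$@"; do
        [ -f "$f" ] && cat "$f"
    done | cksum | cut -d' ' -f1
}
```

Running dep_fingerprint pyproject.toml frontend/package.json before and after an edit to application code should print the same value, which is the situation in which the dependency layer can be reused.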
Multi-stage Build Pattern
Stage 1: Builder (with build tools)
→ Install build dependencies
→ Build application
→ Large image size
Stage 2: Runtime (minimal)
→ Copy built artifacts from Stage 1
→ NO build tools in final image
→ Small, secure image
Step-by-Step Implementation
File Overview
Files that will be modified:
- /Dockerfile (root)
- /docker/Dockerfile.registry
- /metrics-service/Dockerfile
- No nginx changes needed (.dockerignore already optimal)
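If you want to double-check the .dockerignore before starting, a small sketch like the following lists expected patterns that are missing. The helper name missing_ignores is hypothetical, and the exact-line match is a simplification: an equivalent pattern written differently (for example without the trailing slash) would be reported as missing.

```shell
# Print every expected pattern that is missing from a .dockerignore file.
# Usage: missing_ignores <ignore_file> <pattern>...
missing_ignores() {
    ignore_file=$1; shift
    for pattern in "$@"; do
        # -F: literal match, -x: whole line, so "*.pyc" is not treated as a regex.
        grep -qFx -- "$pattern" "$ignore_file" || echo "$pattern"
    done
}
```

For example, missing_ignores .dockerignore '__pycache__/' '*.pyc' '.git/' 'logs/' 'node_modules/' prints nothing when all five entries are present.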
Step 1: Optimize Root Dockerfile
File: /Dockerfile
Lines to modify: 23-34
Current Code (BEFORE)
Lines 23-34:
# Set the working directory in the container
WORKDIR /app
# Copy the application code
COPY . /app/
# Copy nginx configurations (both HTTP-only and HTTP+HTTPS versions)
COPY docker/nginx_rev_proxy_http_only.conf /app/docker/nginx_rev_proxy_http_only.conf
COPY docker/nginx_rev_proxy_http_and_https.conf /app/docker/nginx_rev_proxy_http_and_https.conf
# Make the entrypoint script executable
COPY docker/entrypoint.sh /app/docker/entrypoint.sh
RUN chmod +x /app/docker/entrypoint.sh
Problem: Line 26 copies the entire application before dependencies are installed, breaking the layer cache.
New Code (AFTER)
Replace lines 23-34 with:
# Set the working directory in the container
WORKDIR /app
# Copy dependency files first for layer caching
COPY pyproject.toml ./
# Copy nginx configurations and scripts (rarely change, cache-friendly)
COPY docker/nginx_rev_proxy_http_only.conf /app/docker/nginx_rev_proxy_http_only.conf
COPY docker/nginx_rev_proxy_http_and_https.conf /app/docker/nginx_rev_proxy_http_and_https.conf
COPY docker/entrypoint.sh /app/docker/entrypoint.sh
RUN chmod +x /app/docker/entrypoint.sh
# Install Python dependencies (cached unless pyproject.toml changes)
RUN pip install uv && \
uv pip install --system -e .
# Copy the application code AFTER dependencies are installed
COPY . /app/
Why This Works
- Line 26: Copy only pyproject.toml first
- Lines 29-32: Copy nginx configs (rarely change)
- Lines 35-36: Install dependencies (cached if pyproject.toml unchanged)
- Line 39: Copy full app code LAST
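Impact: Code changes only rebuild from line 39 onwards, saving 5-10 minutes per build. Note that the editable install (-e .) now runs before the source tree is copied, the same concern Step 4 raises for the metrics service, so verify that it still works with the project's build backend.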
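A quick way to confirm the new ordering is to compare line numbers in the Dockerfile. This is a rough sketch, not a full Dockerfile parser: the helper names are hypothetical, and it assumes the install step starts with RUN on the same line as pip install or npm ci/install.

```shell
# Print the line number of the first line matching an extended regex, or nothing.
first_line_of() {
    grep -n -E -- "$2" "$1" | head -n 1 | cut -d: -f1
}

# Succeed if the full-context copy ("COPY . ") comes after the dependency install step.
copy_all_after_install() {
    install_line=$(first_line_of "$1" '^RUN .*(pip install|npm ci|npm install)')
    copy_line=$(first_line_of "$1" '^COPY \. ')
    [ -n "$install_line" ] && [ -n "$copy_line" ] && [ "$copy_line" -gt "$install_line" ]
}
```

Calling copy_all_after_install Dockerfile && echo "ordering OK" should print the message for the AFTER version and stay silent for the BEFORE version.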
Step 2: Optimize Registry Dockerfile (Option A - Quick Fix)
File: /docker/Dockerfile.registry
Lines to modify: 23-63
Current Code (BEFORE)
Lines 23-37 (Frontend build):
WORKDIR /app
# Copy the application code
COPY . /app/
# Copy nginx configurations (both HTTP-only and HTTP+HTTPS versions)
COPY docker/nginx_rev_proxy_http_only.conf /app/docker/nginx_rev_proxy_http_only.conf
COPY docker/nginx_rev_proxy_http_and_https.conf /app/docker/nginx_rev_proxy_http_and_https.conf
# Build React frontend
WORKDIR /app/frontend
COPY frontend/package.json ./
RUN npm install --legacy-peer-deps
COPY frontend/ ./
RUN npm run build
Lines 39-63 (Python setup):
# Return to app directory
WORKDIR /app
# Install uv and setup Python environment
RUN pip install uv && \
uv venv .venv --python 3.12 && \
. .venv/bin/activate && \
uv pip install \
"fastapi>=0.115.12" \
# ... (more dependencies)
"hf_xet>=0.1.0" && \
uv pip install -e .
Problems:
- Line 26: Copies entire app before dependencies
- Line 35: Uses npm install instead of npm ci (slower)
- Lines 43-63: Python dependencies installed after app copy
New Code (AFTER)
Replace lines 23-63 with:
WORKDIR /app
# Copy nginx configurations first (rarely change)
COPY docker/nginx_rev_proxy_http_only.conf /app/docker/nginx_rev_proxy_http_only.conf
COPY docker/nginx_rev_proxy_http_and_https.conf /app/docker/nginx_rev_proxy_http_and_https.conf
# Build React frontend - optimize layer caching
WORKDIR /app/frontend
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci --legacy-peer-deps
COPY frontend/ ./
RUN npm run build
# Return to app directory
WORKDIR /app
# Copy Python dependency files first
COPY pyproject.toml ./
# Install uv and Python dependencies (cached unless pyproject.toml changes)
RUN pip install uv && \
uv venv .venv --python 3.12 && \
. .venv/bin/activate && \
uv pip install \
"fastapi>=0.115.12" \
"itsdangerous>=2.2.0" \
"jinja2>=3.1.6" \
"mcp>=1.6.0" \
"pydantic>=2.11.3" \
"httpx>=0.27.0" \
"python-dotenv>=1.1.0" \
"python-multipart>=0.0.20" \
"uvicorn[standard]>=0.34.2" \
"faiss-cpu>=1.7.4" \
"sentence-transformers>=2.2.2" \
"websockets>=15.0.1" \
"scikit-learn>=1.3.0" \
"torch>=1.6.0" \
"huggingface-hub[cli,hf_xet]>=0.31.1" \
"hf_xet>=0.1.0" && \
uv pip install -e .
# Copy the rest of the application code LAST
COPY . /app/
Key Changes
- Line 31: Copy both package.json AND package-lock.json
- Line 32: Changed npm install → npm ci (faster, deterministic)
- Line 41: Copy pyproject.toml before pip install
- Line 59: Copy full app code LAST
Impact
- First build: Similar time (~10-15 min)
- Code-only changes: ~1-2 minutes (90% time savings)
- Dependency changes: ~5-8 minutes (50% time savings)
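Because npm ci needs a lock file (package-lock.json or npm-shrinkwrap.json), it can help to check for one before switching the Dockerfile over. The sketch below picks the command to use; the helper name npm_install_cmd is hypothetical, and the --legacy-peer-deps flag is carried over from the Dockerfile above.

```shell
# Print the install command suited to a frontend directory.
# "npm ci" requires a lock file; without one, fall back to "npm install".
npm_install_cmd() {
    if [ -f "$1/package-lock.json" ] || [ -f "$1/npm-shrinkwrap.json" ]; then
        echo "npm ci --legacy-peer-deps"
    else
        echo "npm install --legacy-peer-deps"
    fi
}
```

For this project the call would be npm_install_cmd frontend; if it prints the npm install variant, generate and commit a lock file before applying Option A.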
Step 3: Optimize Registry Dockerfile (Option B - Multi-stage Build)
File: /docker/Dockerfile.registry
Replace entire file
New Code (FULL FILE REPLACEMENT)
Replace ALL 79 lines with:
# =============================================================================
# Stage 1: Frontend Builder
# =============================================================================
FROM node:20-slim AS frontend-builder
WORKDIR /build
# Copy package files and install dependencies
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci --legacy-peer-deps
# Copy frontend source and build
COPY frontend/ ./
RUN npm run build
# =============================================================================
# Stage 2: Python Builder
# =============================================================================
FROM python:3.12-slim AS python-builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
git \
&& rm -rf /var/lib/apt/lists/*
# Copy and install Python dependencies
COPY pyproject.toml ./
RUN pip install uv && \
uv venv .venv --python 3.12 && \
. .venv/bin/activate && \
uv pip install \
"fastapi>=0.115.12" \
"itsdangerous>=2.2.0" \
"jinja2>=3.1.6" \
"mcp>=1.6.0" \
"pydantic>=2.11.3" \
"httpx>=0.27.0" \
"python-dotenv>=1.1.0" \
"python-multipart>=0.0.20" \
"uvicorn[standard]>=0.34.2" \
"faiss-cpu>=1.7.4" \
"sentence-transformers>=2.2.2" \
"websockets>=15.0.1" \
"scikit-learn>=1.3.0" \
"torch>=1.6.0" \
"huggingface-hub[cli,hf_xet]>=0.31.1" \
"hf_xet>=0.1.0"
# Copy app code and install as editable
COPY . /app/
RUN . .venv/bin/activate && uv pip install -e .
# =============================================================================
# Stage 3: Final Runtime
# =============================================================================
FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1 \
DEBIAN_FRONTEND=noninteractive
# Install only runtime dependencies (no build tools)
RUN apt-get update && apt-get install -y --no-install-recommends \
nginx \
nginx-extras \
lua-cjson \
curl \
procps \
openssl \
ca-certificates \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy Python virtual environment from builder
COPY --from=python-builder /app/.venv /app/.venv
# Copy frontend build from frontend-builder
COPY --from=frontend-builder /build/build /app/frontend/build
# Copy application code
COPY . /app/
# Copy nginx configurations
COPY docker/nginx_rev_proxy_http_only.conf /app/docker/nginx_rev_proxy_http_only.conf
COPY docker/nginx_rev_proxy_http_and_https.conf /app/docker/nginx_rev_proxy_http_and_https.conf
# Create logs directory
RUN mkdir -p /app/logs
# Expose ports
EXPOSE 80 443 7860
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD curl -f http://localhost:7860/health || exit 1
# Entrypoint
COPY docker/registry-entrypoint.sh /app/registry-entrypoint.sh
RUN chmod +x /app/registry-entrypoint.sh
ENTRYPOINT ["/app/registry-entrypoint.sh"]
Why Multi-stage Is Better
Benefits:
- Image size: Reduce from ~2GB to ~800MB-1.2GB
- Security: No build tools in production image
- Layer caching: Each stage caches independently
- Parallel builds: BuildKit can build independent stages (here, the frontend and Python builders) concurrently
Stage 1 (lines 1-14): Builds frontend in isolated Node.js environment
Stage 2 (lines 16-54): Builds Python dependencies in isolated environment
Stage 3 (lines 56-110): Copies artifacts from both stages into minimal runtime image
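The "no build tools in production image" benefit can be spot-checked by looking only at the last stage of the Dockerfile. The following is a rough sketch with hypothetical helper names; it does plain text matching on package names and does not understand Dockerfile syntax beyond finding the final FROM.

```shell
# Print the instructions of the last stage of a Dockerfile (from the final FROM onward).
final_stage() {
    awk 'toupper($1) == "FROM" { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }' "$1"
}

# Succeed (exit 0) if the final stage mentions any of the given build-only packages.
# Usage: final_stage_has_build_tools <dockerfile> <package>...
final_stage_has_build_tools() {
    dockerfile=$1; shift
    for pkg in "$@"; do
        if final_stage "$dockerfile" | grep -qw -- "$pkg"; then
            echo "build tool in final stage: $pkg"
            return 0
        fi
    done
    return 1
}
```

With the Option B file in place, final_stage_has_build_tools docker/Dockerfile.registry build-essential git should print nothing and return a non-zero status, since those packages are installed only in the python-builder stage.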
Step 4: Optimize Metrics Service Dockerfile
File: /metrics-service/Dockerfile
Lines to modify: 10-12
Current Code (BEFORE)
Lines 10-12:
# Install dependencies
COPY metrics-service/pyproject.toml .
RUN pip install uv && uv pip install --system -e .
Problem: Line 12 uses -e . (editable install) but the app code has not been copied yet, which can cause issues.
New Code (AFTER) - Quick Fix
Replace lines 10-12 with:
# Install dependencies - copy pyproject.toml first
COPY metrics-service/pyproject.toml ./
RUN pip install uv && uv pip install --system .
Change: Line 12 now uses . instead of -e . (no longer an editable install).
Impact
- Minimal change, fixes logical issue
- Already well-structured for layer caching
Step 5: Optimize Metrics Service Dockerfile (Multi-stage)
File: /metrics-service/Dockerfile
Replace entire file (OPTIONAL but recommended)
New Code (FULL FILE REPLACEMENT)
Replace ALL 28 lines with:
# =============================================================================
# Stage 1: Builder
# =============================================================================
FROM python:3.12-slim AS builder
WORKDIR /app
# Install system dependencies needed for building
RUN apt-get update && apt-get install -y \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY metrics-service/pyproject.toml ./
RUN pip install uv && uv pip install --system .
# =============================================================================
# Stage 2: Runtime
# =============================================================================
FROM python:3.12-slim
WORKDIR /app
# Install only runtime dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
# Copy application code
COPY metrics-service/app/ app/
COPY metrics-service/create_api_key.py ./
# Create data directory
RUN mkdir -p /var/lib/sqlite
# Expose port
EXPOSE 8890
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8890/health || exit 1
CMD ["python", "-m", "app.main"]
Impact
- Quick fix: Minimal improvement (already well-structured)
- Multi-stage: Reduce image size by ~30-40%
Implementation Priority
High Priority - Quick Wins
- [ ] Fix /Dockerfile (root) - Replace lines 23-34
  - Effort: 5 minutes
  - Impact: High - used for main application builds
- [ ] Fix /docker/Dockerfile.registry - Option A (lines 23-63)
  - Effort: 10 minutes
  - Impact: Very High - most complex, largest time savings
- [ ] Change npm install to npm ci - Line 35 in registry Dockerfile
  - Effort: 1 minute (included in Option A)
  - Impact: Medium - 20-30% faster npm installs
Medium Priority
- [ ] Implement multi-stage build for /docker/Dockerfile.registry - Option B
  - Effort: 15 minutes
  - Impact: High - significant image size reduction
- [ ] Fix /metrics-service/Dockerfile - Lines 10-12
  - Effort: 2 minutes
  - Impact: Low - already well-structured
Low Priority
- [ ] Implement multi-stage build for /metrics-service/Dockerfile
  - Effort: 10 minutes
  - Impact: Medium - image size reduction only
Testing Guide
Validation Steps
After implementing changes:
# 1. Build and time the first build
time docker build -f Dockerfile -t test-root-build .
# 2. Make a small code change (e.g., add comment to a .py file)
echo "# test comment" >> registry/main.py
# 3. Rebuild and time it (should be MUCH faster)
time docker build -f Dockerfile -t test-root-build .
# 4. Check image size
docker images test-root-build
Expected Results
Root Dockerfile:
- First build: ~8-10 minutes
- Second build (code only): ~30-45 seconds ✅
- Image size: Similar (no multi-stage yet)
Registry Dockerfile (Option A):
- First build: ~12-15 minutes
- Second build (code only): ~60-90 seconds ✅
- Image size: Similar (~2GB)
Registry Dockerfile (Option B - Multi-stage):
- First build: ~12-15 minutes
- Second build (code only): ~60-90 seconds ✅
- Image size: ~800MB-1.2GB ✅ (40-60% reduction)
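To compare your own measurements against these figures, a small sketch like the one below times a command and computes the percentage reduction. The helper names are hypothetical, the math is integer-only, and ~2GB is treated as 2000 MB for the size check.

```shell
# Run a command and print how many whole seconds it took (its output is discarded).
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Print the percentage reduction from a "before" value to an "after" value.
pct_reduction() {
    echo $(( ( ($1 - $2) * 100 ) / $1 ))
}
```

For instance, pct_reduction 2000 800 and pct_reduction 2000 1200 give the two ends of the 40-60% size range, and first=$(elapsed_seconds docker build -f Dockerfile -t test-root-build .) captures a build time that can be passed to pct_reduction together with the second build's time.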
Test Each Dockerfile
# Test root Dockerfile
time docker build -f Dockerfile -t mcp-gateway:optimized .
# Test registry Dockerfile
time docker build -f docker/Dockerfile.registry -t mcp-registry:optimized .
# Test metrics service Dockerfile
time docker build -f metrics-service/Dockerfile -t mcp-metrics:optimized .
# Check all image sizes
docker images | grep mcp-
Integration Testing
# 1. Build all containers
docker-compose build
# 2. Start services
docker-compose up -d
# 3. Verify services are healthy
docker-compose ps
# 4. Check logs for errors
docker-compose logs | grep -i error
# 5. Test API endpoints
curl http://localhost/health
curl http://localhost:8890/health
Troubleshooting
Issue 1: Build fails with "pyproject.toml not found"
Symptoms:
COPY pyproject.toml ./
ERROR: failed to compute cache key: "/pyproject.toml" not found
Cause: File path incorrect or .dockerignore is blocking it
Solution:
# Check if pyproject.toml exists
ls -la pyproject.toml
# Check .dockerignore doesn't exclude it
cat .dockerignore | grep pyproject
# Verify build context includes it
docker build -f Dockerfile . --no-cache
Issue 2: Frontend build fails with "package-lock.json not found"
Symptoms:
COPY frontend/package.json frontend/package-lock.json ./
ERROR: failed to compute cache key: "/frontend/package-lock.json" not found
Cause: No package-lock.json in frontend directory (using yarn or pnpm instead of npm)
Solution:
# Check what lock file exists
ls -la frontend/
# If package-lock.json doesn't exist, generate it:
cd frontend
npm install
cd ..
# OR modify Dockerfile to only copy package.json:
COPY frontend/package.json ./
RUN npm install --legacy-peer-deps # Keep npm install if no lock file
Issue 3: Multi-stage build larger than expected
Symptoms:
Image size after multi-stage build is not significantly smaller
Cause: Copying unnecessary files or directories in final stage
Solution:
# Check what's being copied in final stage
docker history mcp-registry:optimized
# Use dive tool to inspect layers
docker run --rm -it \
-v /var/run/docker.sock:/var/run/docker.sock \
wagoodman/dive mcp-registry:optimized
# Only copy necessary files in final COPY statement:
# (instead of COPY . /app/; Dockerfile comments must be on their own line)
COPY registry/ /app/registry/
Issue 4: "npm ci" fails but "npm install" works
Symptoms:
npm ERR! cipm can only install packages when your package.json and package-lock.json
npm ERR! are in sync.
Cause: package-lock.json out of sync with package.json
Solution:
# Regenerate package-lock.json
cd frontend
rm package-lock.json
npm install
git add package-lock.json
git commit -m "Update package-lock.json"
# Then rebuild
docker build -f docker/Dockerfile.registry -t mcp-registry:optimized .
Issue 5: Layer cache not working as expected
Symptoms:
Docker rebuilds dependencies every time even though no changes
Cause: COPY command includes files that change frequently
Solution:
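# Rebuild with plain progress output to see which step first misses the cache
# (--cache-from names an existing image as an additional cache source)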
docker build --cache-from mcp-registry:latest -f docker/Dockerfile.registry . --progress=plain
# Check what files are being copied
# Ensure .dockerignore excludes volatile files:
cat .dockerignore
# Should include:
__pycache__/
*.pyc
.git/
logs/
# If copying from host (.dockerignore comments must be on their own line):
node_modules/
Issue 6: Virtual environment not activating in multi-stage
Symptoms:
ModuleNotFoundError: No module named 'fastapi'
Cause: Virtual environment path incorrect or not activated
Solution:
Option A: Copy site-packages directly (recommended):
# In builder stage
RUN pip install uv && uv pip install --system <packages>
# In final stage
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
Option B: Put the venv on PATH in the final stage:
# In final stage
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
Issue 7: Image builds but fails at runtime
Symptoms:
Container starts but immediately exits with error
Cause: Missing runtime dependencies in final stage
Solution:
# Check container logs
docker logs <container-id>
# Common missing dependencies:
# - curl (for health checks)
# - nginx (for registry)
# - shared libraries
# Add to final stage:
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
nginx \
&& rm -rf /var/lib/apt/lists/*
Validation Checklist
Use this to verify optimizations are working:
Build Performance
- First build completes successfully
- Second build (code change only) takes <2 minutes
- Dependency change rebuilds take ~50% less time
- No cache warnings in build output
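One way to put a number on the build-performance items is to count the steps that the build log marks as CACHED. This is a minimal sketch: the helper name count_cached is hypothetical, and it assumes the log format in which reused steps contain the word CACHED.

```shell
# Count build steps reported as reused from cache.
# Reads a plain-text build log on stdin and prints the number of matching lines.
count_cached() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the helper's status clean.
    grep -c 'CACHED' || true
}
```

A typical call would be docker build --progress=plain -f Dockerfile . 2>&1 | count_cached; after a code-only change, every step before the final COPY should be counted.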
Image Quality
- Multi-stage images are 30-50% smaller
- Containers start without errors
- Health checks pass
- All services respond to requests
Functionality
- Registry API responds: curl http://localhost/v0/servers
- Metrics service responds: curl http://localhost:8890/health
- Frontend loads correctly
- Authentication works
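Services may need a few seconds after startup before these checks pass, so a retry loop avoids false failures. The sketch below is a generic helper with a hypothetical name (wait_until); the attempt count and delay in the usage note are arbitrary example values.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$(( i + 1 ))
        # Sleep only when another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

For example, wait_until 10 3 curl -fsS http://localhost:8890/health tries the metrics health endpoint up to ten times, three seconds apart, and returns non-zero if it never responds.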
Layer Caching
# Verify cache is being used:
docker build -f Dockerfile . 2>&1 | grep "CACHED"
# Should see output like:
# => CACHED [2/8] WORKDIR /app
# => CACHED [3/8] COPY pyproject.toml ./
# => CACHED [4/8] RUN pip install uv && ...
Summary
You've now optimized:
✅ Root Dockerfile - Proper dependency layer caching
✅ Registry Dockerfile - npm ci + layer caching + multi-stage option
✅ Metrics Dockerfile - Fixed editable install + multi-stage option
✅ Build times - Reduced from 10-15min to 1-3min for code changes
✅ Image sizes - Reduced by 30-50% with multi-stage builds
Quick Reference
| Dockerfile | Priority | Effort | Time Savings | Size Reduction |
|---|---|---|---|---|
| /Dockerfile | High | 5 min | 80-90% | 0% |
| /docker/Dockerfile.registry (Option A) | Very High | 10 min | 85-90% | 0% |
| /docker/Dockerfile.registry (Option B) | Medium | 15 min | 85-90% | 40-60% |
| /metrics-service/Dockerfile (Quick) | Medium | 2 min | 10% | 0% |
| /metrics-service/Dockerfile (Multi-stage) | Low | 10 min | 10% | 30-40% |
Next Steps
- Start with High Priority quick wins (Steps 1-2)
- Test thoroughly after each change
- Commit changes incrementally
- Implement multi-stage builds when time allows
- Update CI/CD pipelines to leverage cache
Good luck with your optimization! 🚀