diff --git a/utilities/glue_python_dependency_packager/README.md b/utilities/glue_python_dependency_packager/README.md deleted file mode 100644 index 436fb3b..0000000 --- a/utilities/glue_python_dependency_packager/README.md +++ /dev/null @@ -1,155 +0,0 @@ -# Glue Python Dependency Packager - -This command line utility packages Python dependencies into a single wheel file for use with AWS Glue jobs. The tool creates an "uber wheel" containing all your dependencies and their transitive dependencies, ensuring consistent and reliable deployments across AWS Glue environments. - -## Features - -- Resolves all transitive dependencies for target Glue environments -- Creates a single wheel file containing all dependencies -- Automatically selects correct Python version and platform tags based on Glue version -- Provides runtime installation utilities for easy dependency loading in Glue jobs -- Validates glibc compatibility for native dependencies - -## Supported Glue Versions - -- AWS Glue 5.0 (Python 3.11, glibc 2.34) -- AWS Glue 4.0 (Python 3.10, glibc 2.26) -- AWS Glue 3.0 (Python 3.7, glibc 2.26) -- AWS Glue 2.0 (Python 3.7, glibc 2.17) -- AWS Glue 1.0 (Python 3.6, glibc 2.17) -- AWS Glue 0.9 (Python 2.7, glibc 2.17) - -## How to use it - -You can run this utility in any location where you have Python and the following environment. 
- -### Pre-requisite - -- Python 3.6+ (matching or compatible with your target Glue version) -- pip, build, wheel, setuptools, and pip-tools -- A `requirements.txt` file with your dependencies - -### Command line Syntax - -```bash -./wheel_packager.sh -g GLUE_VERSION [OPTIONS] -``` - -**Required Arguments:** - -- `-g, --glue-version VERSION` - AWS Glue version (required) - -**Optional Arguments:** - -- `-r, --requirements FILE` - Path to requirements.txt file (default: requirements.txt) -- `-o, --wheel-output DIR` - Output directory for final wheel (default: current directory) -- `-n, --name NAME` - Package name (default: current directory name) -- `-v, --version VERSION` - Package version (default: 0.1.0) -- `-h, --help` - Show help message - -## Examples - -### Example 1. Basic usage with requirements.txt - -```bash -./wheel_packager.sh -g 4.0 -r path/to/requirements.txt -``` - -### Example 2. Custom output directory - -```bash -./wheel_packager.sh -g 4.0 -r path/to/requirements.txt -o dist -``` - -### Example 3. Full configuration - -```bash -./wheel_packager.sh -g 4.0 -r path/to/requirements.txt -n my_glue_dependencies -v 1.0.0 -o dist -``` - -### Example of command output - -``` -========================================= -Building wheel for my_glue_dependencies with all dependencies from requirements.txt -========================================= -Using Glue version 4.0 -Using Glue python version 3.10 -Using Glue glibc version 2.26 - -Step 1/5: Installing build tools... -✓ Build tools installed successfully - -Step 2/5: Creating build environment... -✓ Build environment created successfully - -Step 3/5: Resolving all dependencies... -✓ Dependencies resolved successfully - -Step 4/5: Downloading all dependency wheels... -✓ Downloaded 15 dependency wheels successfully - -Step 5/5: Creating uber wheel with all dependencies included... -✓ Uber wheel created successfully! - -========================================= -BUILD COMPLETED SUCCESSFULLY! 
-========================================= -Final wheel: ./my_glue_dependencies-1.0.0-py3-none-any.whl -Wheel size: 25M -Dependencies included: 15 packages - -To install the bundle, run: - pip install ./my_glue_dependencies-1.0.0-py3-none-any.whl -``` - -## Using the Generated Wheel - -### 1. Upload to S3 - -```bash -aws s3 cp my_glue_dependencies-1.0.0-py3-none-any.whl s3://your-bucket/glue-dependencies/ -``` - -### 2. Configure your Glue job - -Add the following job parameter: -``` ---additional-python-modules s3://your-bucket/glue-dependencies/my_glue_dependencies-1.0.0-py3-none-any.whl -``` - -### 3. Use in your Glue script - -```python -# Option 1: Automatic installation (recommended) -import my_glue_dependencies.auto - -# Option 2: Manual installation -from my_glue_dependencies import load_wheels -load_wheels() - -# Your dependencies are now available -import pandas as pd -import numpy as np -``` - -## Troubleshooting - -### Error: "No Python executable found" -Ensure Python 3 is installed and available in your PATH. - -### Error: "Failed to resolve dependencies" -Check for conflicting version requirements in your requirements.txt. Ensure all packages are available for your target platform. - -### Error: "Unsupported glue version" -Verify you're using a supported Glue version (0.9, 1.0, 2.0, 3.0, 4.0, 5.0). - -### Warning: "Package name contains dashes" -The script automatically converts dashes to underscores for Python compatibility. 
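The "Unsupported glue version" and "Package name contains dashes" messages above come from two small checks, and the platform tags mentioned under Features follow from the glibc version of the chosen Glue release. A minimal Python sketch of that logic, assuming the version table from "Supported Glue Versions" and the manylinux candidates used by `wheel_packager.sh`; the names `GLUE_RUNTIMES`, `normalize_package_name`, `resolve_runtime`, and `platform_tags` are hypothetical and not part of the utility:

```python
# Hypothetical helper names; the data mirrors the README's version list.
GLUE_RUNTIMES = {
    "5.0": ("3.11", "2.34"),
    "4.0": ("3.10", "2.26"),
    "3.0": ("3.7", "2.26"),
    "2.0": ("3.7", "2.17"),
    "1.0": ("3.6", "2.17"),
    "0.9": ("2.7", "2.17"),
}

# (minimum glibc, pip --platform tag) pairs checked by the script.
MANYLINUX_CANDIDATES = [
    ("2.17", "manylinux2014_x86_64"),
    ("2.28", "manylinux_2_28_x86_64"),
    ("2.34", "manylinux_2_34_x86_64"),
    ("2.39", "manylinux_2_39_x86_64"),
]


def normalize_package_name(name):
    """Replace dashes with underscores so the name is importable."""
    return name.replace("-", "_")


def resolve_runtime(glue_version):
    """Return (python_version, glibc_version) or raise for unknown versions."""
    try:
        return GLUE_RUNTIMES[glue_version]
    except KeyError:
        raise ValueError(f"Unsupported glue version '{glue_version}'") from None


def _as_tuple(version):
    """Turn 'major.minor' into a tuple of ints for numeric comparison."""
    major, minor = version.split(".")
    return int(major), int(minor)


def platform_tags(glue_glibc):
    """Keep tags whose required glibc is not newer than Glue's glibc."""
    limit = _as_tuple(glue_glibc)
    return [tag for need, tag in MANYLINUX_CANDIDATES if _as_tuple(need) <= limit]
```

For example, `platform_tags(resolve_runtime("4.0")[1])` keeps only the `manylinux2014_x86_64` tag, because Glue 4.0 ships glibc 2.26, which is older than the 2.28 required by the next candidate.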
- -## Limitations - -- Must be run in an environment compatible with your target Glue version -- Requires network access to download dependencies -- Large wheels may impact Glue job startup time -- Native dependencies must be compatible with the target Glue environment \ No newline at end of file diff --git a/utilities/glue_python_dependency_packager/wheel_packager.sh b/utilities/glue_python_dependency_packager/wheel_packager.sh deleted file mode 100755 index 2778291..0000000 --- a/utilities/glue_python_dependency_packager/wheel_packager.sh +++ /dev/null @@ -1,335 +0,0 @@ -#!/bin/bash -set -e -REQUIREMENTS_FILE="requirements.txt" -FINAL_WHEEL_OUTPUT_DIRECTORY="." -PACKAGE_NAME=$(basename "$(pwd)") -PACKAGE_VERSION="0.1.0" -# Help message -show_help() { - echo "Usage: $0 [options]" - echo "" - echo "Options:" - echo " -r, --requirements FILE Path to requirements.txt file (default: requirements.txt)" - echo " -o, --wheel-output DIR Output directory for final wheel (default: current directory)" - echo " -n, --name NAME Package name (default: current directory name)" - echo " -v, --version VERSION Package version (default: 0.1.0)" - echo " -h, --help Show this help message" - echo " -g, --glue-version Glue version (required)" - echo "" - echo "Example:" - echo " $0 -r custom-requirements.txt -o dist -n my_package -v 1.2.3 -g 4.0" -} -# Parse command line arguments -while [[ $# -gt 0 ]]; do - key="$1" - case $key in - -r | --requirements) - REQUIREMENTS_FILE="$2" - shift 2 - ;; - -o | --wheel-output) - FINAL_WHEEL_OUTPUT_DIRECTORY="$2" - shift 2 - ;; - -n | --name) - PACKAGE_NAME="$2" - shift 2 - ;; - -v | --version) - PACKAGE_VERSION="$2" - shift 2 - ;; - -g | --glue-version) - GLUE_VERSION="$2" - shift 2 - ;; - -h | --help) - show_help - exit 0 - ;; - *) - echo "Unknown option: $1" - show_help - exit 1 - ;; - esac -done -# If package name has dashes, convert to underscores and notify user. We need to check this since we cant import a package with dashes. 
-if [[ "$PACKAGE_NAME" =~ "-" ]]; then - echo "Warning: Package name '$PACKAGE_NAME' contains dashes. Converting to underscores." - PACKAGE_NAME=$(echo "$PACKAGE_NAME" | tr '-' '_') -fi -UBER_WHEEL_NAME="${PACKAGE_NAME}-${PACKAGE_VERSION}-py3-none-any.whl" -# Check if glue version is provided -if [ -z "$GLUE_VERSION" ]; then - echo "Error: Glue version is required." - exit 1 -fi -# Validate version format (basic check) -if [[ ! "$PACKAGE_VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]] && [[ ! "$PACKAGE_VERSION" =~ ^[0-9]+\.[0-9]+$ ]]; then - echo "Warning: Version '$PACKAGE_VERSION' doesn't follow semantic versioning (x.y.z or x.y)" -fi -# Check if requirements file exists -if [ ! -f "$REQUIREMENTS_FILE" ]; then - echo "Error: Requirements file '$REQUIREMENTS_FILE' not found." - exit 1 -fi -# Get relevant platform tags/python versions based on glue version -if [[ "$GLUE_VERSION" == "5.0" ]]; then - PYTHON_VERSION="3.11" - GLIBC_VERSION="2.34" -elif [[ "$GLUE_VERSION" == "4.0" ]]; then - PYTHON_VERSION="3.10" - GLIBC_VERSION="2.26" -elif [[ "$GLUE_VERSION" == "3.0" ]]; then - PYTHON_VERSION="3.7" - GLIBC_VERSION="2.26" -elif [[ "$GLUE_VERSION" == "2.0" ]]; then - PYTHON_VERSION="3.7" - GLIBC_VERSION="2.17" -elif [[ "$GLUE_VERSION" == "1.0" ]]; then - PYTHON_VERSION="3.6" - GLIBC_VERSION="2.17" -elif [[ "$GLUE_VERSION" == "0.9" ]]; then - PYTHON_VERSION="2.7" - GLIBC_VERSION="2.17" -else - echo "Error: Unsupported glue version '$GLUE_VERSION'." 
- exit 1 -fi -echo "Using Glue version $GLUE_VERSION" -echo "Using Glue python version $PYTHON_VERSION" -echo "Using Glue glibc version $GLIBC_VERSION" -PIP_PLATFORM_FLAG="" -is_glibc_compatible() { - # assumes glibc version in the form of major.minor (ex: 2.17) - # glue glibc must be >= platform glibc - local glue_glibc_version="$GLIBC_VERSION" - local platform_glibc_version="$1" - # 2.27 (platform) can run on 2.27 (glue) - if [[ "$platform_glibc_version" == "$glue_glibc_version" ]]; then - return 0 - fi - local glue_glibc_major="${glue_glibc_version%%.*}" - local glue_glibc_minor="${glue_glibc_version#*.}" - local platform_glibc_major="${platform_glibc_version%%.*}" - local platform_glibc_minor="${platform_glibc_version#*.}" - # 3.27 (platform) cannot run on 2.27 (glue) - if [[ "$platform_glibc_major" -gt "$glue_glibc_major" ]]; then - return 1 - fi - # 2.34 (platform) cannot run on 2.27 (glue) - if [[ "$platform_glibc_major" -eq "$glue_glibc_major" ]] && [[ "$platform_glibc_minor" -gt "$glue_glibc_minor" ]]; then - return 1 - fi - # 2.17 (platform) can run on 2.27 (glue) - return 0 -} -PIP_PLATFORM_FLAG="" -if is_glibc_compatible "2.17"; then - PIP_PLATFORM_FLAG="${PIP_PLATFORM_FLAG} --platform manylinux2014_x86_64" -fi -if is_glibc_compatible "2.28"; then - PIP_PLATFORM_FLAG="${PIP_PLATFORM_FLAG} --platform manylinux_2_28_x86_64" -fi -if is_glibc_compatible "2.34"; then - PIP_PLATFORM_FLAG="${PIP_PLATFORM_FLAG} --platform manylinux_2_34_x86_64" -fi -if is_glibc_compatible "2.39"; then - PIP_PLATFORM_FLAG="${PIP_PLATFORM_FLAG} --platform manylinux_2_39_x86_64" -fi -echo "Using pip platform flags: $PIP_PLATFORM_FLAG" -# Convert to absolute paths -REQUIREMENTS_FILE=$(realpath "$REQUIREMENTS_FILE") -FINAL_WHEEL_OUTPUT_DIRECTORY=$(realpath "$FINAL_WHEEL_OUTPUT_DIRECTORY") -TEMP_WORKING_DIR=$(mktemp -d) -VENV_DIR="${TEMP_WORKING_DIR}/.build_venv" -WHEEL_OUTPUT_DIRECTORY="${TEMP_WORKING_DIR}/wheelhouse" -# Cleanup function -cleanup() { - echo "Cleaning up temporary 
files..." - rm -rf "$TEMP_WORKING_DIR" -} -trap cleanup EXIT -echo "=========================================" -echo "Building wheel for $PACKAGE_NAME with all dependencies from $REQUIREMENTS_FILE" -echo "=========================================" -# Determine Python executable to use consistently -PYTHON_EXEC=$(which python3 2>/dev/null || which python 2>/dev/null) -if [ -z "$PYTHON_EXEC" ]; then - echo "Error: No Python executable found" - exit 1 -fi -echo "Using Python: $PYTHON_EXEC" -echo "" -# Install build requirements -echo "Step 1/5: Installing build tools..." -echo "----------------------------------------" -"$PYTHON_EXEC" -m pip install --upgrade pip build wheel setuptools -echo "✓ Build tools installed successfully" -echo "" -# Create a virtual environment for building -echo "Step 2/5: Creating build environment..." -echo "----------------------------------------" -"$PYTHON_EXEC" -m venv "$VENV_DIR" -# Check if virtual environment was created successfully -if [ ! -f "$VENV_DIR/bin/activate" ]; then - echo "Error: Failed to create virtual environment" - exit 1 -fi -source "$VENV_DIR/bin/activate" -# Install pip-tools for dependency resolution -"$VENV_DIR/bin/pip" install pip-tools -echo "✓ Build environment created successfully" -echo "" -# Compile requirements to get all transitive dependencies -GLUE_PIP_ARGS="$PIP_PLATFORM_FLAG --python-version $PYTHON_VERSION --only-binary=:all:" -echo "Step 3/5: Resolving all dependencies..." -echo "----------------------------------------" -if ! "$VENV_DIR/bin/pip-compile" --pip-args "$GLUE_PIP_ARGS" --no-emit-index-url --output-file "$TEMP_WORKING_DIR/.compiled_requirements.txt" "$REQUIREMENTS_FILE"; then - echo "Error: Failed to resolve dependencies. Check for conflicts in $REQUIREMENTS_FILE" - exit 1 -fi -echo "✓ Dependencies resolved successfully" -echo "" -# Download all wheels for dependencies -echo "Step 4/5: Downloading all dependency wheels..." 
-echo "----------------------------------------" -"$VENV_DIR/bin/pip" download -r "$TEMP_WORKING_DIR/.compiled_requirements.txt" -d "$WHEEL_OUTPUT_DIRECTORY" $GLUE_PIP_ARGS -# Check if any wheels were downloaded -if [ ! "$(ls -A "$WHEEL_OUTPUT_DIRECTORY")" ]; then - echo "Error: No wheels were downloaded. Check your requirements file." - exit 1 -fi -# Count downloaded wheels (using find instead of ls for better handling) -WHEEL_COUNT=$(find "$WHEEL_OUTPUT_DIRECTORY" -name "*.whl" -type f | wc -l | tr -d ' ') -echo "✓ Downloaded $WHEEL_COUNT dependency wheels successfully" -echo "" -# Create a single uber wheel with all dependencies -echo "Step 5/5: Creating uber wheel with all dependencies included..." -echo "----------------------------------------" -# Create a temporary directory for the uber wheel -UBER_WHEEL_DIR="$TEMP_WORKING_DIR/uber" -mkdir -p "$UBER_WHEEL_DIR" -# Create the setup.py file with custom install command -cat >"$UBER_WHEEL_DIR/setup.py" <"$UBER_WHEEL_DIR/MANIFEST.in" <"$UBER_WHEEL_DIR/${PACKAGE_NAME}/__init__.py" <"$UBER_WHEEL_DIR/${PACKAGE_NAME}/auto.py" <