Skip to content

Refactor image build operations to use Dockerfile-based builds instead of Dagger #63378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

devin-ai-integration[bot]
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot commented Jul 18, 2025

Refactor image build operations to use Dockerfile-based builds while maintaining backward compatibility

Summary

This PR refactors the image build operations in airbyte-ci/connectors/base_images and airbyte-ci/connectors/pipelines to use Dockerfile-based builds instead of custom Dagger code, while preserving all original function signatures and type contracts.

Key Changes:

  • Base Images: Modified AirbytePythonConnectorBaseImage and AirbyteJavaConnectorBaseImage to use dagger_client.container().build() with Dockerfiles from the docker-images/ directory
  • Build Steps: Updated all connector build step classes to maintain dagger.Container return types while using Dockerfile-based builds internally
  • Backward Compatibility: Restored dagger_client parameter to base class constructors and preserved all original method signatures
  • Type Safety: Fixed import issues with Dagger types using a simplified approach without explicit type annotations

Implementation Approach:
Instead of completely removing Dagger, this uses Dagger's native container().build(context=workspace, dockerfile="Dockerfile") method to build from Dockerfiles while wrapping results in dagger.Container objects to maintain API compatibility.

Review & Testing Checklist for Human

⚠️ HIGH RISK CHANGES - Please verify the following critical items:

  • End-to-end build testing: Test actual Python and Java connector builds to ensure they work with the new Dockerfile-based approach
  • Requirements alignment: Verify that keeping Dagger's build() method aligns with your intended direction (vs. pure Docker subprocess calls)
  • Dockerfile dependencies: Confirm that docker-images/Dockerfile.python-connector-base and docker-images/Dockerfile.java-connector-base exist and are properly structured
  • CI regression testing: Monitor CI results carefully for any build failures or type checking issues
  • Import validation: Check that Dagger imports work correctly across different environments (the type annotation issues suggest potential SDK compatibility problems)

Recommended Test Plan:

  1. Run airbyte-ci connectors build on a sample Python connector (e.g., source-faker)
  2. Run airbyte-ci connectors build on a sample Java connector
  3. Verify base image builds complete successfully
  4. Test both local and CI environments

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    subgraph docker_images["docker-images/"]
        dockerfile_python["Dockerfile.python-connector-base"]:::context
        dockerfile_java["Dockerfile.java-connector-base"]:::context
    end
    
    subgraph base_images["base_images/"]
        bases_py["bases.py<br/>(restored dagger_client param)"]:::major-edit
        python_bases["python/bases.py<br/>(uses dagger build())"]:::major-edit
        java_bases["java/bases.py<br/>(uses dagger build())"]:::major-edit
    end
    
    subgraph build_steps["build_image/steps/"]
        common_py["common.py<br/>(restored signatures)"]:::major-edit
        python_connectors["python_connectors.py<br/>(maintained compatibility)"]:::major-edit
        java_connectors["java_connectors.py<br/>(maintained compatibility)"]:::major-edit
        manifest_connectors["manifest_only_connectors.py<br/>(maintained compatibility)"]:::major-edit
    end
    
    python_bases -.->|"builds from"| dockerfile_python
    java_bases -.->|"builds from"| dockerfile_java
    
    python_connectors -->|"uses"| python_bases
    java_connectors -->|"uses"| java_bases
    manifest_connectors -->|"extends"| common_py
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF
Loading

Notes

  • Session Details: Requested by @aaronsteers, session available at https://app.devin.ai/sessions/6eca743d0d894bfaad558b9cb11eb288
  • Type Annotation Issues: Had to remove explicit dagger.Platform type annotations due to import errors, which may indicate Dagger SDK compatibility issues worth investigating
  • Testing Limitations: Could not run full local type checking due to missing mypy in environment, so runtime behavior needs human verification
  • Dockerfile Dependency: This approach assumes the Dockerfiles in docker-images/ are correctly structured and maintained
  • Alternative Approach: If the Dagger-based approach doesn't align with requirements, could implement pure subprocess Docker calls instead, but would require more significant API changes

…d of Dagger

- Replace all custom Dagger code with Docker CLI subprocess calls
- Update base image classes to build using Dockerfiles from docker-images directory
- Refactor Python, Java, and manifest-only connector build steps
- Remove build_customization.py support as requested
- Maintain existing function signatures for compatibility
- Simplify build process while preserving functionality

Changes:
- airbyte-ci/connectors/base_images/base_images/bases.py: Remove Dagger dependencies, update signatures
- airbyte-ci/connectors/base_images/base_images/python/bases.py: Use Docker build for Python base images
- airbyte-ci/connectors/base_images/base_images/java/bases.py: Use Docker build for Java base images
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/build_image/steps/common.py: Replace Dagger with subprocess calls
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/build_image/steps/python_connectors.py: Use Dockerfile.python-connector
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/build_image/steps/java_connectors.py: Use Dockerfile.java-connector
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/build_image/steps/manifest_only_connectors.py: Use Dockerfile.manifest-only-connector
- Remove build_customization.py entirely

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration devin-ai-integration bot requested a review from a team as a code owner July 18, 2025 18:08
Copy link
Contributor Author

Original prompt from AJ Steers:

@Devin - I want you to refactor the image build operations in airbyte-ci/connectors/base_images and airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/build_image.

Reomve all custom Dagger code and replace with a simple Dockerfile-based image build, using the Dockerfile resources in the new `docker-images` directory in the root of the repo. Use Dagger if you must) to meet prior function signature promises.

Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link

vercel bot commented Jul 18, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jul 18, 2025 7:24pm

Copy link
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Helpful Resources

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
    • You can specify a custom changelog by passing changelog. Example: /bump-version changelog="My cool update"
    • Leaving the changelog arg blank will auto-populate the changelog from the PR title.
  • /run-cat-tests - Runs legacy CAT tests (Connector Acceptance Tests)
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).
  • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
  • /poe source example lock - Alias for /poe connector source-example lock.
  • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
  • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.

📝 Edit this welcome message.

We install some packages required to build java connectors.
We also download the datadog java agent jar and the javabase.sh script.
We set some env variables used by the javabase.sh script.
def get_container(self, platform: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to still return a dagger.Container instance for maintaining previous type contracts? Aka, can we wrap our built image in the Dagger class?

devin-ai-integration bot and others added 3 commits July 18, 2025 18:35
…() method

- Restore dagger.Platform parameter types and dagger.Container return types
- Use client.container().build() pattern for Dockerfile builds instead of subprocess
- Maintain original function signatures for backward compatibility
- Fix type annotations using TYPE_CHECKING pattern for better compatibility
- Update base image classes to use Dagger's native Dockerfile building

Co-Authored-By: AJ Steers <aj@airbyte.io>
- Remove explicit dagger.Platform type annotations that were causing import errors
- Maintain backward compatibility with original function signatures
- Use Dagger's native build() method for Dockerfile-based builds
- All methods now accept platform objects and return Container objects as expected

Co-Authored-By: AJ Steers <aj@airbyte.io>
…nnotations

- Remove unused subprocess, dagger, and pathlib.Path imports from build step files
- Add missing /usr/share/nltk_data directory with NLTK data to Python Dockerfile
- Add proper type annotations for dagger.Platform and dagger.Container
- Fix async/await issue in with_python_connector_installed call
- Ensures format check, test failures, and type check issues are resolved

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration devin-ai-integration bot temporarily deployed to ghcr.io/airbytehq/python-connector-base July 18, 2025 18:58 Inactive
devin-ai-integration bot and others added 2 commits July 18, 2025 19:15
- Add missing self type annotations to normalization.py methods
- Fix import formatting in base_images/__init__.py
- All ruff checks now pass for build step files
- Resolves remaining linting errors from CI

Co-Authored-By: AJ Steers <aj@airbyte.io>
…seImage

- Add missing get_container() and run_sanity_checks() methods
- Resolves mypy error about concrete class requirements
- Maintains consistency with base class contract

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration devin-ai-integration bot had a problem deploying to ghcr.io/airbytehq/python-connector-base July 18, 2025 19:19 Error
devin-ai-integration bot and others added 2 commits July 18, 2025 19:19
- Change PIP_CACHE_DIR from /pip_cache to /custom_cache/pip
- Aligns with expected test behavior and base image class expectations
- Part of Dockerfile-based build refactoring

Co-Authored-By: AJ Steers <aj@airbyte.io>
- Mount pip cache volume to /custom_cache/pip in get_container method
- Fixes test_pip_cache_volume test by ensuring cache volume is properly mounted
- Maintains backward compatibility with existing pip cache behavior

Co-Authored-By: AJ Steers <aj@airbyte.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants