Parallel builds compete for archive_file -> unwanted lambda code updates (source_code_hash flips) force a new version #1333

@pireba

Describe the bug

When instantiating the autoscaler module multiple times in the same root (e.g., arm64/amd64/*-dind variants), tofu plan intermittently reports a Lambda code change even though the Python source (lambda_function.py) has not changed. The plan shows a flip in source_code_hash and then forces a new Lambda version and a replacement of aws_lambda_permission (due to the updated qualifier).

This only happens with parallel planning/applying. Running with -parallelism=1 makes the issue disappear, suggesting a race condition between module instances that share the same ZIP output path produced by data.archive_file.

To Reproduce

  1. Create a root module that instantiates the autoscaler module multiple times (e.g., arm64, amd64, arm64-dind, amd64-dind), all of which include the terminate_agent_hook submodule and package the same lambda_function.py.
  2. Ensure default parallelism (do not pass -parallelism=1).
  3. Run:
tofu init
tofu plan
  4. Intermittently observe a plan similar to:
OpenTofu will perform the following actions:

  # module.linux-small-arm64-test.module.autoscaler.module.terminate_agent_hook.aws_lambda_function.terminate_runner_instances will be updated in-place
  ~ resource "aws_lambda_function" "terminate_runner_instances" {
        id                             = "linux-small-arm64-test-terminate-instances"
      ~ last_modified                  = "2025-10-01T07:51:40.000+0000" -> (known after apply)
      ~ qualified_arn                  = "arn:aws:lambda:eu-central-1:285552316323:function:linux-small-arm64-test-terminate-instances:13" -> (known after apply)
      ~ qualified_invoke_arn           = "arn:aws:apigateway:eu-central-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-central-1:285552316323:function:linux-small-arm64-test-terminate-instances:13/invocations" -> (known after apply)
      ~ source_code_hash               = "3rwZzxv0oni04lbl0sUXnhbxOgiOKCjW1whJoXQUjX8=" -> "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
        tags                           = {
            "Environment" = "linux-small-arm64-test"
            "Name"        = "linux-small-arm64-test"
        }
      ~ version                        = "13" -> (known after apply)
        # (20 unchanged attributes hidden)

        # (5 unchanged blocks hidden)
    }

  # module.linux-small-arm64-test.module.autoscaler.module.terminate_agent_hook.aws_lambda_permission.current_version_triggers must be replaced
-/+ resource "aws_lambda_permission" "current_version_triggers" {
      ~ id                  = "TerminateInstanceEvent" -> (known after apply)
      ~ qualifier           = "13" # forces replacement -> (known after apply) # forces replacement
      + statement_id_prefix = (known after apply)
        # (6 unchanged attributes hidden)
    }

Plan: 1 to add, 1 to change, 1 to destroy.
  5. Re-run with:
tofu plan -parallelism=1

No changes are reported across repeated runs.

Expected behavior

  • No changes when Lambda source code hasn’t changed.
  • source_code_hash remains stable across plans.
  • No unnecessary new Lambda version.

Additional context

Evidence & observations

  • The submodule uses a data.archive_file with an output_path like:

builds/lambda_function_${local.source_sha256}.zip

Because the Python source is identical across instances, ${local.source_sha256} is identical, so all instances target the same ZIP file.

  • During tofu plan, the ZIP at that path is written by archive_file. With parallel runs, multiple instances can write/read the same file concurrently, leading to intermittent empty/partial reads by one instance while another is writing/truncating.
  • The source_code_hash sometimes flips to 47DEQpj8…, which is the Base64-encoded SHA-256 of zero bytes of input. This is consistent with one instance reading the ZIP while it is truncated/empty.
  • Outside of the race, the ZIP file’s content is stable (shasum -a 256 unchanged).
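The empty-input observation above can be checked independently. The following is a minimal Python sketch; the helper name b64_sha256 is hypothetical, and it assumes source_code_hash uses the same encoding as a Base64-encoded raw SHA-256 digest (as the plan output suggests):

```python
import base64
import hashlib


def b64_sha256(data: bytes) -> str:
    """Return the Base64-encoded SHA-256 digest of the given bytes."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Hash of zero bytes, i.e. what a reader would compute from an empty file.
EMPTY_HASH = b64_sha256(b"")

if __name__ == "__main__":
    # Compare with the "new" source_code_hash value shown in the plan.
    print(EMPTY_HASH)
```

If the printed value matches the right-hand side of the source_code_hash diff, the instance hashed an empty file rather than a different ZIP.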

Workarounds tried

  • tofu plan/apply -parallelism=1: eliminates the issue but disables parallelism globally.
  • Normalizing mtimes of source files: ZIP bytes remain stable but the race persists (root cause is shared file path).
  • Sorting layer_arns: good hygiene but unrelated; issue persists.
  • We cannot modify the module source ourselves to change the output path.

Hypothesis

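A race condition among multiple data.archive_file data sources that share the same output_path. Without a per-instance-unique path, one instance can hash the ZIP while another is rewriting it under parallel planning, a time-of-check to time-of-use (TOCTOU) problem that produces spurious Lambda code updates.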
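To illustrate the hypothesis, here is a deterministic Python sketch of the suspected window. It does not use the archive provider; it only assumes that a writer truncates the shared file before writing the new bytes (how archive_file actually writes is an assumption here), and the function names are hypothetical:

```python
import base64
import hashlib
import os
import tempfile


def file_b64_sha256(path: str) -> str:
    """Base64-encoded SHA-256 of a file's current contents."""
    with open(path, "rb") as f:
        return base64.b64encode(hashlib.sha256(f.read()).digest()).decode("ascii")


def simulate_truncation_window(payload: bytes):
    """Hash a shared file before, during, and after a truncating rewrite."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "lambda_function.zip")

        # Instance A has already produced the ZIP.
        with open(zip_path, "wb") as f:
            f.write(payload)
        stable = file_b64_sha256(zip_path)

        # Instance B reopens the same path for writing, which truncates it.
        writer = open(zip_path, "wb")
        try:
            # Instance A hashes the file inside the window: it sees zero bytes.
            during = file_b64_sha256(zip_path)
            writer.write(payload)
        finally:
            writer.close()

        after = file_b64_sha256(zip_path)
        return stable, during, after


if __name__ == "__main__":
    stable, during, after = simulate_truncation_window(b"identical zip bytes")
    print("before rewrite:", stable)
    print("during rewrite:", during)
    print("after rewrite: ", after)
```

In this sketch the hash taken during the rewrite is the empty-input hash, while the hashes before and after agree, which mirrors the intermittent flip seen in the plan.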

Requested fix / proposal

Please make the ZIP output path unique per module instance (for example, by including the instance or environment name in the path) to avoid shared-file races. For example:

output_path = "${path.module}/builds/${var.name}/lambda_function_${local.source_sha256}.zip"

Why this matters

Spurious versioning and permission replacements add noise to plans, increase deployment time, and may briefly disrupt triggers/monitoring. A per-instance output path would eliminate the race without forcing users to globally disable parallelism.
