Description
Describe the bug
When instantiating the autoscaler module multiple times in the same root (e.g., arm64/amd64/*-dind variants), tofu plan intermittently reports a Lambda code change even though the Python source (lambda_function.py) has not changed. The plan shows a flip in source_code_hash and then forces a new Lambda version and a replacement of aws_lambda_permission (due to the updated qualifier).
This only happens with parallel planning/applying. Running with -parallelism=1 makes the issue disappear, suggesting a race condition between module instances that share the same ZIP output path produced by data.archive_file.
To Reproduce
- Create a root module that instantiates the autoscaler module multiple times (e.g., arm64, amd64, arm64-dind, amd64-dind), all of which include the terminate_agent_hook submodule and package the same lambda_function.py.
- Ensure default parallelism (do not pass -parallelism=1).
- Run:
    tofu init
    tofu plan
- Intermittently observe a plan similar to:
    OpenTofu will perform the following actions:
      # module.linux-small-arm64-test.module.autoscaler.module.terminate_agent_hook.aws_lambda_function.terminate_runner_instances will be updated in-place
      ~ resource "aws_lambda_function" "terminate_runner_instances" {
            id = "linux-small-arm64-test-terminate-instances"
          ~ last_modified = "2025-10-01T07:51:40.000+0000" -> (known after apply)
          ~ qualified_arn = "arn:aws:lambda:eu-central-1:285552316323:function:linux-small-arm64-test-terminate-instances:13" -> (known after apply)
          ~ qualified_invoke_arn = "arn:aws:apigateway:eu-central-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-central-1:285552316323:function:linux-small-arm64-test-terminate-instances:13/invocations" -> (known after apply)
          ~ source_code_hash = "3rwZzxv0oni04lbl0sUXnhbxOgiOKCjW1whJoXQUjX8=" -> "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
            tags = {
                "Environment" = "linux-small-arm64-test"
                "Name" = "linux-small-arm64-test"
            }
          ~ version = "13" -> (known after apply)
            # (20 unchanged attributes hidden)
            # (5 unchanged blocks hidden)
        }
      # module.linux-small-arm64-test.module.autoscaler.module.terminate_agent_hook.aws_lambda_permission.current_version_triggers must be replaced
    -/+ resource "aws_lambda_permission" "current_version_triggers" {
          ~ id = "TerminateInstanceEvent" -> (known after apply)
          ~ qualifier = "13" # forces replacement -> (known after apply) # forces replacement
          + statement_id_prefix = (known after apply)
            # (6 unchanged attributes hidden)
        }
    Plan: 1 to add, 1 to change, 1 to destroy.
- Re-run with:
    tofu plan -parallelism=1
  No changes are reported across repeated runs.
Expected behavior
- No changes when Lambda source code hasn’t changed.
- source_code_hash remains stable across plans.
- No unnecessary new Lambda version.
Additional context
Evidence & observations
- The submodule uses a data.archive_file with an output_path like:
    builds/lambda_function_${local.source_sha256}.zip
  Because the Python source is identical across instances, ${local.source_sha256} is identical, so all instances target the same ZIP file.
- During tofu plan, the ZIP at that path is written by archive_file. With parallel runs, multiple instances can write/read the same file concurrently, leading to intermittent empty/partial reads by one instance while another is writing/truncating.
- The source_code_hash sometimes flips to 47DEQpj8…, which is the Base64-encoded SHA-256 digest of empty input. This is consistent with one instance reading a truncated/empty file.
- Outside of the race, the ZIP file’s content is stable (shasum -a 256 unchanged).
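The empty-input observation can be checked independently of OpenTofu. A minimal Python sketch, assuming source_code_hash is the Base64-encoded SHA-256 of the ZIP bytes (the helper name base64_sha256 is illustrative and not part of the module):

```python
import base64
import hashlib


def base64_sha256(data: bytes) -> str:
    """Return the Base64-encoded SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Digest of zero bytes, i.e. what a reader computes if the ZIP is
# empty or has just been truncated to length 0.
empty_hash = base64_sha256(b"")
print(empty_hash)  # 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
```

The printed value matches the "new" source_code_hash in the plan above, which points at an empty read rather than a real code change.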
Workarounds tried
- tofu plan/apply -parallelism=1: eliminates the issue but disables parallelism globally.
- Normalizing mtimes of source files: ZIP bytes remain stable but the race persists (root cause is shared file path).
- Sorting layer_arns: good hygiene but unrelated; issue persists.
- We cannot modify the module source ourselves to change the output path.
Hypothesis
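The cause appears to be a race condition among multiple data.archive_file data sources that share the same output_path. Without a per-instance-unique path, parallel planning lets one instance read the ZIP while another is truncating or rewriting it (a TOCTOU-style issue), which produces spurious Lambda code updates.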
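The suspected interleaving can be illustrated deterministically outside of OpenTofu. The Python sketch below is an assumption-laden model, not the archive provider's actual implementation: it assumes the writer truncates the output file before writing new bytes, and it uses a stand-in payload and made-up file names instead of a real ZIP.

```python
import base64
import hashlib
import os
import tempfile


def base64_sha256(data: bytes) -> str:
    """Return the Base64-encoded SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def read_hash(path: str) -> str:
    """Hash whatever bytes are currently in the file at path."""
    with open(path, "rb") as f:
        return base64_sha256(f.read())


payload = b"stand-in for the ZIP bytes"  # hypothetical content, not a real archive

with tempfile.TemporaryDirectory() as tmp:
    # Case 1: two instances share one output path.
    shared = os.path.join(tmp, "lambda_function_shared.zip")
    with open(shared, "wb") as f:  # instance A finishes writing
        f.write(payload)
    stable_hash = read_hash(shared)

    # Instance B starts rewriting the same path. Opening with "wb"
    # truncates the file to zero bytes before any data is written.
    writer_b = open(shared, "wb")
    racy_hash = read_hash(shared)  # instance A reads inside that window
    writer_b.write(payload)
    writer_b.close()
    settled_hash = read_hash(shared)  # content is stable again afterwards

    # Case 2: each instance writes to its own directory.
    path_a = os.path.join(tmp, "a", "lambda_function.zip")
    path_b = os.path.join(tmp, "b", "lambda_function.zip")
    for p in (path_a, path_b):
        os.makedirs(os.path.dirname(p))
    with open(path_a, "wb") as f:
        f.write(payload)
    writer_b = open(path_b, "wb")  # B truncates only its own file
    isolated_hash = read_hash(path_a)  # A's read is unaffected
    writer_b.write(payload)
    writer_b.close()

print("shared path, read during rewrite:", racy_hash)
print("shared path, after rewrite:      ", settled_hash)
print("separate paths:                  ", isolated_hash)
```

In this model the read on the shared path during the rewrite yields the empty-input digest, while the read with separate paths yields the stable digest. In a real parallel run the window is timing-dependent, which would explain why the flip is intermittent.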
Requested fix / proposal
Please make the ZIP output path unique per module instance (include the environment in the path) to avoid shared-file races. For example:
    output_path = "${path.module}/builds/${var.name}/lambda_function_${local.source_sha256}.zip"
Why this matters
Spurious versioning and permission replacements add noise to plans, increase deployment time, and may briefly disrupt triggers/monitoring. A per-instance output path would eliminate the race without forcing users to globally disable parallelism.