Skip to content

Use ECS properties instead of container properties #6270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jorgee
Copy link
Contributor

@jorgee jorgee commented Jul 11, 2025

Closes #5096

Ports the @tom-seqera implementation in #5391 to AWS SDK v2 implementation.

With the RegisterJobDefinitionModel and ContainerPropertiesModel introduced by @pditommaso, these changes shouldn't affect the xpack as the interface it extends remains untouched.

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Copy link

netlify bot commented Jul 11, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 06b3a07
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/6889d5f01bf0580008cd631a

@pditommaso
Copy link
Member

Nice, let's test using [e2e prod] in the commit message

@pditommaso
Copy link
Member

@jorgee aws tests are failing

@jorgee
Copy link
Contributor Author

jorgee commented Jul 21, 2025

It seems the tests are failing because of the job definition reuse. The old definition is using containerProperties and the new one is trying to override with ecsProperties. If I am not wrong, the job definition is reused if container name and nf-token is the same, isn't it? So, I think we should include something else to the token to ensure both are using the same container or ecs properties.

jorgee added 2 commits July 21, 2025 13:36
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee
Copy link
Contributor Author

jorgee commented Jul 21, 2025

Token was basically hashing the result of containerProperties.toString. I have added a fake property ecsMode=true in the new version. If there is a job definition with same values but with the previous version, the token will be different and it will generate a new revision. Now, the job is failing because of CI user role permissions as in the other CI pipelines.

@pditommaso
Copy link
Member

Launching some e2e tests

@pditommaso
Copy link
Member

@jorgee would not this impact also the xpack plugin for amazon?

@jorgee
Copy link
Contributor Author

jorgee commented Jul 25, 2025

@jorgee would not this impact also the xpack plugin for amazon?

The xpack is overriding the configJobDefRequest method that is not changed. The changes are internal in the RegisterJobDefinitionModel.toBatchRequest method where it sets the ecs properties. But let me do a test to see if I miss something.

@jorgee
Copy link
Contributor Author

jorgee commented Jul 25, 2025

When testing with xpack I have got this error. Not sure yet if it is related to this change or happening also with the master.

java.lang.ClassCastException: class software.amazon.awssdk.services.batch.model.MountPoint cannot be cast to class software.amazon.awssdk.services.batch.model.MountPoint (software.amazon.awssdk.services.batch.model.MountPoint is in unnamed module of loader org.pf4j.PluginClassLoader @4ea17147; software.amazon.awssdk.services.batch.model.MountPoint is in unnamed module of loader org.pf4j.PluginClassLoader @9a20cbd)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)
        at software.amazon.awssdk.services.batch.model.MountPointsCopier.copy(MountPointsCopier.java:32)
        at software.amazon.awssdk.services.batch.model.ContainerProperties$BuilderImpl.mountPoints(ContainerProperties.java:2578)
        at nextflow.cloud.aws.batch.model.ContainerPropertiesModel.toBatchContainerProperties(ContainerPropertiesModel.groovy:254)

@jorgee
Copy link
Contributor Author

jorgee commented Jul 25, 2025

It was an xpack gradle problem. Working after applying the fix.

Comment on lines +251 to +252
taskArn = getTaskProperties(job)?.taskRoleArn()
?: job?.container()?.taskArn()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why now check first getTaskProperties(job) and then job?.container()?.taskArn() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it from Tom's PR #5391. I think it is needed because of the reuse of job definitions. If the job was defined with ECS, the information is inside TaskProperties. If it is reusing an old definition with container properties, the taskProperties are empty and you need to get in the Container. It is the same for other job properties.

Maybe, with the change I did to do not reuse jobs definitions with container properties, it is not needed (54ee92b). However, I am not 100% sure because the hash is only applied when container properties are defined, and there could be cases were old definitions are still reused. Maybe the best is just add a comment to explain it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to follow, if it's an improvement it should go into a separate PR and keep this as a mere SDK upgrade

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not an improvement. When using ECS properties in the task definition (the goal of the PR), some properties that were obtained before from the container are now inside TaskProperties. This is why we check them first in getTaskProperties(job), it is the expected place with the new definition. We have a fallback to the old container properties because of the job definition reusage. If a job submission is reused with the old container properties, the task properties will be empty and we should get them from the container properties as before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, triggered e2e tests

@jorgee
Copy link
Contributor Author

jorgee commented Jul 30, 2025

It seems that something is broken with the e2e test triggering condition. It has not triggered the e2e tests.

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee
Copy link
Contributor Author

jorgee commented Jul 30, 2025

The problem was the git commit message was returning a string with escapes due to %q option in
echo "commit_message=$(printf "%q" "$COMMIT_MESSAGE")" | head -n 1 >> $GITHUB_OUTPUT
The logs of the get_commit_message show the escaped string:

GitHub event=pull_request
Commit message=\[e2e\ prod\]\ Merge\ branch\ \'master\'\ into\ migrate-ecs-properties

I have changed %q to %s to generate a plain string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS: Switch from containerProperties to ecsProperties
3 participants