@vkuma17
Contributor

@vkuma17 vkuma17 commented Sep 20, 2025

Description

Added an addon test and exposed a few virtual variables in ibm_catalog.json for the OpenShift cluster.

Release required?

  • No release
  • Patch release (x.x.X)
  • Minor release (x.X.x)
  • Major release (X.x.x)

Release notes content

The following variable has been renamed in the DA:

ocp_version -> openshift_version

Run the pipeline

If the CI pipeline doesn't run when you create the PR, the PR requires a user with GitHub collaborators access to run the pipeline.

Run the CI pipeline when the PR is ready for review and you expect tests to pass. Add a comment to the PR with the following text:

/run pipeline

Checklist for reviewers

  • If relevant, a test for the change is included or updated with this PR.
  • If relevant, documentation for the change is included or updated with this PR.

For mergers

  • Use a conventional commit message to set the release level. Follow the guidelines.
  • Include information that users need to know about the PR in the commit message. The commit message becomes part of the GitHub release notes.
  • Use the Squash and merge option.

@vkuma17
Contributor Author

vkuma17 commented Sep 20, 2025

/run pipeline

vkuma17 commented Sep 20, 2025

/run pipeline

vkuma17 commented Sep 20, 2025

/run pipeline

vkuma17 commented Sep 20, 2025

/run pipeline

vkuma17 commented Sep 21, 2025

The build has failed 3 times so far. Twice it failed because I set allow_outbound_traffic to false in ibm_catalog.json. I think it has to be true, because outbound traffic must be allowed to reach the Red Hat Marketplace. Once it failed because of a timeout while building the cluster. Re-running the pipeline.

vkuma17 commented Sep 21, 2025

/run pipeline

vkuma17 commented Sep 22, 2025

/run pipeline

vkuma17 commented Sep 22, 2025

/run pipeline

vkuma17 commented Sep 22, 2025

/run pipeline

vkuma17 commented Sep 22, 2025

/run pipeline

Contributor

@ocofaigh ocofaigh left a comment

Please update to use OCP DA v3.63.1.

ocofaigh commented Oct 9, 2025

Bump to the latest test wrapper version too.

vkuma17 commented Oct 9, 2025

/run pipeline

vkuma17 commented Oct 10, 2025

/run pipeline

vkuma17 commented Oct 10, 2025

The cluster is built but stays in a warning state. I checked with Valerio, and he created a debug pod from which no external URLs are reachable, even though public gateways are attached and outbound traffic is allowed. Currently debugging the issue.

vkuma17 commented Oct 23, 2025

/run pipeline

vkuma17 commented Oct 23, 2025

/run pipeline

vkuma17 commented Oct 23, 2025

The network_acls were causing the issue. Ports 443, 80, and 30000-32767 were allowed inbound and outbound, but when I tried to start an nginx pod it did not come up because the image could not be pulled from Docker Hub. Allowing all inbound and outbound traffic fixed the problem, but we still need to work out which additional ports to allow instead of opening everything.

Later I hit another issue while the virtualization module was running, and I have asked Aashiq whether he has seen the error before. I saw it twice.
cc: @ocofaigh

2025/10/23 19:11:06 Terraform apply | 1 error occurred:
 2025/10/23 19:11:06 Terraform apply | 	* Internal error occurred: failed calling webhook
 2025/10/23 19:11:06 Terraform apply | "mutate-hyperconverged-hco.kubevirt.io": failed to call webhook: Post
 2025/10/23 19:11:06 Terraform apply | "https://hco-webhook-service.openshift-cnv.svc:4343/mutate-hco-kubevirt-io-v1beta1-hyperconverged?timeout=10s":
 2025/10/23 19:11:06 Terraform apply | tls: failed to verify certificate: x509: certificate signed by unknown
 2025/10/23 19:11:06 Terraform apply | authority
 2025/10/23 19:11:06 Terraform apply | 

@ocofaigh ocofaigh mentioned this pull request Oct 24, 2025
6 tasks
@ocofaigh ocofaigh left a comment

@vkuma17 FYI, Aashiq created a PR to update the network ACLs -> #56
However I see you are making them wide open in this PR. Is that what we want to do here? Or should we use the changes he made in that PR?

@ocofaigh

@vkuma17 Aashiq has confirmed we should use what was in his PR - can you update this PR with those changes please?

@ocofaigh

FYI, the last test failed with:

 2025/10/23 19:11:06 Terraform apply | Warning: Helm release created with warnings
 2025/10/23 19:11:06 Terraform apply | 
 2025/10/23 19:11:06 Terraform apply |   with module.virtualization.helm_release.operator,
 2025/10/23 19:11:06 Terraform apply |   on ../../main.tf line 107, in resource "helm_release" "operator":
 2025/10/23 19:11:06 Terraform apply |  107: resource "helm_release" "operator" {
 2025/10/23 19:11:06 Terraform apply | 
 2025/10/23 19:11:06 Terraform apply | Helm release "d3t6ep2f0cctkv3hsqvg-operator" was created but has a failed
 2025/10/23 19:11:06 Terraform apply | status. Use the `helm` command to investigate the error, correct it, then run
 2025/10/23 19:11:06 Terraform apply | Terraform again.
 2025/10/23 19:11:06 Terraform apply | 
 2025/10/23 19:11:06 Terraform apply | Error: Helm release error
 2025/10/23 19:11:06 Terraform apply | 
 2025/10/23 19:11:06 Terraform apply |   with module.virtualization.helm_release.operator,
 2025/10/23 19:11:06 Terraform apply |   on ../../main.tf line 107, in resource "helm_release" "operator":
 2025/10/23 19:11:06 Terraform apply |  107: resource "helm_release" "operator" {
 2025/10/23 19:11:06 Terraform apply | 
 2025/10/23 19:11:06 Terraform apply | 1 error occurred:
 2025/10/23 19:11:06 Terraform apply | 	* Internal error occurred: failed calling webhook
 2025/10/23 19:11:06 Terraform apply | "mutate-hyperconverged-hco.kubevirt.io": failed to call webhook: Post
 2025/10/23 19:11:06 Terraform apply | "https://hco-webhook-service.openshift-cnv.svc:4343/mutate-hco-kubevirt-io-v1beta1-hyperconverged?timeout=10s":
 2025/10/23 19:11:06 Terraform apply | tls: failed to verify certificate: x509: certificate signed by unknown
 2025/10/23 19:11:06 Terraform apply | authority

Let's make sure to leave a comment in this PR any time the test fails, so we understand what issues we have been seeing.

vkuma17 commented Oct 24, 2025

I think you missed my comment above, @ocofaigh. I had already shared the same error:
#47 (comment)

vkuma17 commented Oct 25, 2025

/run pipeline

vkuma17 commented Oct 25, 2025

/run pipeline

vkuma17 commented Oct 25, 2025

/run pipeline

vkuma17 commented Oct 25, 2025

2025-10-25T15:19:42.3351600Z          2025/10/25 15:19:05 Terraform init | request discovery document: Get
2025-10-25T15:19:42.3352099Z          2025/10/25 15:19:05 Terraform init | "https://registry.terraform.io/.well-known/terraform.json": net/http: TLS
2025-10-25T15:19:42.3352578Z          2025/10/25 15:19:05 Terraform init | handshake timeout

The last pipeline run failed with this error during the consistency plan; the addon test passed.

vkuma17 commented Oct 26, 2025

2025/10/23 19:11:06 Terraform apply | 1 error occurred:
 2025/10/23 19:11:06 Terraform apply | 	* Internal error occurred: failed calling webhook
 2025/10/23 19:11:06 Terraform apply | "mutate-hyperconverged-hco.kubevirt.io": failed to call webhook: Post
 2025/10/23 19:11:06 Terraform apply | "https://hco-webhook-service.openshift-cnv.svc:4343/mutate-hco-kubevirt-io-v1beta1-hyperconverged?timeout=10s":
 2025/10/23 19:11:06 Terraform apply | tls: failed to verify certificate: x509: certificate signed by unknown
 2025/10/23 19:11:06 Terraform apply | authority
 2025/10/23 19:11:06 Terraform apply | 

This error occurs because the hco-webhook deployment, which validates the contents of the HyperConverged custom resource, is not ready in time. I am increasing the wait between the subscription Helm chart and the CR creation in the operator Helm chart from 120 seconds to 240 seconds.
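
As a rough sketch of the change described above, the delay might look like this in the virtualization module. The resource name time_sleep.wait_for_subscription and the 120s/240s values come from the plan output posted in this PR, and helm_release.operator appears in the error log. The name helm_release.subscription and the depends_on wiring are assumptions for illustration and may not match the actual module.

```hcl
# Sketch only: assumes the subscription is installed by a Helm release named
# "subscription" (hypothetical name, not confirmed by this PR).
resource "time_sleep" "wait_for_subscription" {
  depends_on = [helm_release.subscription]

  # Raised from "120s" to give the hco-webhook deployment more time to
  # become ready before the HyperConverged CR is created.
  create_duration = "240s"
}

# The operator release, which creates the HyperConverged CR, would then wait
# on the sleep, for example by adding this to helm_release.operator:
#   depends_on = [time_sleep.wait_for_subscription]
```

A fixed delay is still a heuristic: if the webhook takes longer than 240 seconds to become ready, the same x509 error could appear again.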

vkuma17 commented Oct 28, 2025

/run pipeline

vkuma17 commented Oct 28, 2025

The upgrade test would fail because we are increasing the delay in time_sleep. I am skipping the upgrade test; this is not a breaking change.

         2025/10/28 02:07:23 Terraform plan |   # module.virtualization.time_sleep.wait_for_subscription will be updated in-place
         2025/10/28 02:07:23 Terraform plan |   ~ resource "time_sleep" "wait_for_subscription" {
         2025/10/28 02:07:23 Terraform plan |       ~ create_duration = "120s" -> "240s"
         2025/10/28 02:07:23 Terraform plan |         id              = "2025-10-28T02:01:33Z"
         2025/10/28 02:07:23 Terraform plan |     }
         2025/10/28 02:07:23 Terraform plan | 
         2025/10/28 02:07:23 Terraform plan | Plan: 0 to add, 1 to change, 0 to destroy.

vkuma17 commented Oct 28, 2025

/run pipeline

@ocofaigh ocofaigh merged commit 8b05cd5 into main Oct 28, 2025
2 checks passed
@ocofaigh ocofaigh deleted the addon-tests branch October 28, 2025 10:10
@terraform-ibm-modules-ops

🎉 This PR is included in version 1.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
