@ssulav
Contributor

@ssulav ssulav commented Jun 18, 2025

What changes were proposed in this pull request?

HDDS-13251. Adding Byteman support in acceptance test suites via new docker-compose

Please describe your PR in detail:

  • This lets us run acceptance test suites with Byteman fault injection.
  • Currently it's added as a new docker-compose environment, but if this goes through we can plan to add it to the existing suites.
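
For illustration only, here is a minimal sketch of how a test helper might assemble the `bmsubmit` calls for an add / list / remove cycle against one component. The helper name `bmsubmit_command` and the port `9091` (Byteman's default agent port) are assumptions and are not taken from this patch; the flags used are the standard `bmsubmit` ones (`-h` host, `-p` port, `-l` load, `-u` unload, no file arguments to list rules).

```python
from typing import List, Optional

# Assumption: Byteman's default agent port. The suite may configure another one.
BYTEMAN_PORT = 9091


def bmsubmit_command(host: str, action: str = "list",
                     rule: Optional[str] = None,
                     port: int = BYTEMAN_PORT) -> List[str]:
    """Build the argv for a bmsubmit call (hypothetical helper, not from the patch)."""
    base = ["bmsubmit", "-h", host, "-p", str(port)]
    if action == "list":
        # With no script files, bmsubmit lists the rules currently installed.
        return base
    if action not in ("add", "remove"):
        raise ValueError(f"unknown action: {action}")
    if rule is None:
        # Guard against '-u' with no file, which would unload every rule.
        raise ValueError(f"action '{action}' needs a rule file")
    flag = "-l" if action == "add" else "-u"
    return base + [flag, rule]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` inside a container that has `bmsubmit` on its `PATH`, for example with `host="datanode1"` and a rule file such as `/opt/hadoop/share/ozone/byteman/skip-put-block.btm`.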

What is the link to the Apache JIRA

HDDS-13251

How was this patch tested?

This has to be tested locally and also via CI

@ssulav ssulav requested review from Tejaskriya and adoroszlai June 18, 2025 11:59
@adoroszlai adoroszlai marked this pull request as draft June 18, 2025 12:13
Contributor

@adoroszlai adoroszlai left a comment

Thanks @ssulav for working on this.

@ssulav ssulav marked this pull request as ready for review June 23, 2025 20:09
@ssulav
Contributor Author

ssulav commented Jun 23, 2025

This PR is required to be merged before
apache/ozone-docker-runner#47

@adoroszlai adoroszlai marked this pull request as draft June 24, 2025 04:02
@adoroszlai
Contributor

> This PR is required to be merged before apache/ozone-docker-runner#47

I think the dependency is the other way around: we need the new image with bmsubmit before this PR.

  1. merge the ozone-runner change
  2. tag new image
  3. update ozone-runner version in this PR

Until then, let's keep it draft.

@ssulav ssulav marked this pull request as ready for review June 24, 2025 19:06
@ssulav ssulav marked this pull request as draft June 24, 2025 19:31
@ssulav ssulav changed the title HDDS-13251. Adding Byteman support in acceptance test suites via new docker-compose HDDS-13251. Adding Byteman support in acceptance test suites via new suite Jun 24, 2025
@ssulav ssulav changed the title HDDS-13251. Adding Byteman support in acceptance test suites via new suite HDDS-13251. Adding Byteman support via new acceptance test suites #ozone-fi Jun 24, 2025
@ssulav ssulav marked this pull request as ready for review June 24, 2025 19:58
@adoroszlai
Contributor

Failure in Rclone Client Test is due to rclone version change in the new image. We need to fix it first (HDDS-13333).

@adoroszlai adoroszlai marked this pull request as draft June 25, 2025 07:27
@adoroszlai adoroszlai changed the title HDDS-13251. Adding Byteman support via new acceptance test suites #ozone-fi HDDS-13251. Support dynamic Byteman scripts via bmsubmit in ozonesecure-ha Jun 25, 2025
Contributor

@adoroszlai adoroszlai left a comment

Thanks @ssulav for updating the patch. Mostly LGTM, but the test output is mixed with Robot's execution output. Two example test cases:

------------------------------------------------------------------------------
Inject Byteman Rule in one component                                  Add rule /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm for datanode1 successful.
List rules for datanode1 successful.
Active rules in datanode1:
# File /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm line 21
Remove rule /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm for datanode1 successful.
| PASS |
------------------------------------------------------------------------------
Inject Multiple Byteman Rules in one component                        Add rule /opt/hadoop/share/ozone/byteman/skip-put-block.btm for datanode1 successful.
Add rule /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm for datanode1 successful.
List rules for datanode1 successful.
Active rules in datanode1:
# File /opt/hadoop/share/ozone/byteman/skip-put-block.btm line 21
# File /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm line 21
List rules for datanode1 successful.
Active rules in datanode1:
# File /opt/hadoop/share/ozone/byteman/skip-put-block.btm line 21
# File /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm line 21
Remove rule /opt/hadoop/share/ozone/byteman/skip-put-block.btm for datanode1 successful.
Remove rule /opt/hadoop/share/ozone/byteman/skip-notify-group-remove.btm for datanode1 successful.
List rules for datanode1 successful.
Active rules in datanode1: No rules found
| PASS |
------------------------------------------------------------------------------

Compare this to other tests:

------------------------------------------------------------------------------
Create Encrypted Bucket                                               | PASS |
------------------------------------------------------------------------------
Create Key in Encrypted Bucket                                        | PASS |
------------------------------------------------------------------------------
Key Can Be Written                                                    | PASS |
------------------------------------------------------------------------------
Key Can Be Written To Bucket With Replication Type                    | PASS |
------------------------------------------------------------------------------

This can be fixed by changing the log level from console to info; the output then becomes:

==============================================================================
Byteman Faults Sample                                                         
==============================================================================
Print All Byteman Rules                                               | PASS |
------------------------------------------------------------------------------
Inject Byteman Rule in one component                                  | PASS |
------------------------------------------------------------------------------
Inject Multiple Byteman Rules in one component                        | PASS |
------------------------------------------------------------------------------
Test Datanode Only Fault Injection                                    | PASS |
------------------------------------------------------------------------------
Test OM Only Fault Injection                                          | PASS |
------------------------------------------------------------------------------
Test SCM Only Fault Injection                                         | PASS |
------------------------------------------------------------------------------
Byteman Faults Sample                                                 | PASS |
6 tests, 6 passed, 0 failed
==============================================================================
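
As a rough sketch of that change, assuming the status messages are emitted from a Python keyword library (the patch may instead use Robot keywords, which this does not reproduce): `robot.api.logger.console` writes straight to the console, while `robot.api.logger.info` records the message in `log.html` and keeps the console summary clean. The helper names `rule_status_message` and `report_rule_action` are hypothetical.

```python
import logging

try:
    # robot.api.logger.info goes to log.html and, by default, not to the console.
    from robot.api import logger as robot_logger
    _log_info = robot_logger.info
except ImportError:
    # Fallback so this sketch also runs where Robot Framework is not installed.
    _log_info = logging.getLogger("byteman").info


def rule_status_message(action: str, rule: str, component: str) -> str:
    """Format a status line like the ones in the test output above."""
    return f"{action} rule {rule} for {component} successful."


def report_rule_action(action: str, rule: str, component: str, log=_log_info) -> str:
    """Log the status at info level instead of printing it to the console."""
    message = rule_status_message(action, rule, component)
    log(message)
    return message
```

Passing `log=robot_logger.console` would reproduce the mixed output shown in the first two examples.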

@ssulav
Contributor Author

ssulav commented Jun 25, 2025

Updated logger

bash-5.1$ robot byteman_faults_sample.robot 
==============================================================================
Byteman Faults Sample                                                         
==============================================================================
Print All Byteman Rules                                               | PASS |
------------------------------------------------------------------------------
Inject Byteman Rule in one component                                  | PASS |
------------------------------------------------------------------------------
Inject Multiple Byteman Rules in one component                        | PASS |
------------------------------------------------------------------------------
Test Datanode Only Fault Injection                                    | PASS |
------------------------------------------------------------------------------
Test OM Only Fault Injection                                          | PASS |
------------------------------------------------------------------------------
Test SCM Only Fault Injection                                         | PASS |
------------------------------------------------------------------------------
Byteman Faults Sample                                                 | PASS |
6 tests, 6 passed, 0 failed
==============================================================================
Output:  /opt/hadoop/smoketest/ozone-fi/output.xml
Log:     /opt/hadoop/smoketest/ozone-fi/log.html
Report:  /opt/hadoop/smoketest/ozone-fi/report.html

@ssulav ssulav marked this pull request as ready for review June 25, 2025 16:43
Contributor

@adoroszlai adoroszlai left a comment

Thanks @ssulav for iterating on this patch.

@ssulav ssulav merged commit d0d09aa into apache:master Jun 25, 2025
49 checks passed
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2025
* master: (90 commits)
  HDDS-12468. Check for space availability for all dns while container creation in pipeline (apache#8663)
  HDDS-12151. Fail write when volume is full considering min free space (apache#8642)
  HDDS-13251. Update Byteman usage README license (apache#8700)
  HDDS-13251. Support dynamic Byteman scripts via bmsubmit in ozonesecure-ha (apache#8654)
  HDDS-12070. Bump Ratis to 3.2.0 (apache#8689)
  HDDS-12984. Use InodeID to identify the SST files inside the tarball. (apache#8477)
  HDDS-13319. Simplify KeyPrefixFilter (apache#8692)
  HDDS-13295. Remove jackson1 exclusions for hadoop-common (apache#8687)
  HDDS-13314. Remove unused maven-pdf-plugin (apache#8686)
  HDDS-13324. Optimize memory footprint for Recon listKeys API (apache#8680)
  HDDS-13240. Add newly added metrics into grafana dashboard. (apache#8656)
  HDDS-13309. Add keyIterator/valueIterator methods to Table. (apache#8675)
  HDDS-13325. Introduce OZONE_SERVER_OPTS for common options for server processes (apache#8685)
  HDDS-13318. Simplify the getRangeKVs methods in Table (apache#8683)
  HDDS-13322. Remove module auto-detection from flaky-test-check (apache#8679)
  HDDS-13289. Remove usage of Jetty StringUtil (apache#8684)
  HDDS-13270. Reduce getBucket API invocations in S3 bucket owner verification (apache#8653)
  HDDS-13288. Container checksum file proto changes to account for deleted blocks. (apache#8649)
  HDDS-13306. Intermittent failure in testDirectoryDeletingServiceIntervalReconfiguration (apache#8682)
  HDDS-13317. Table should support empty array/String (apache#8676)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2025
* master: (170 commits)
  HDDS-12468. Check for space availability for all dns while container creation in pipeline (apache#8663)
  HDDS-12151. Fail write when volume is full considering min free space (apache#8642)
  HDDS-13251. Update Byteman usage README license (apache#8700)
  HDDS-13251. Support dynamic Byteman scripts via bmsubmit in ozonesecure-ha (apache#8654)
  HDDS-12070. Bump Ratis to 3.2.0 (apache#8689)
  HDDS-12984. Use InodeID to identify the SST files inside the tarball. (apache#8477)
  HDDS-13319. Simplify KeyPrefixFilter (apache#8692)
  HDDS-13295. Remove jackson1 exclusions for hadoop-common (apache#8687)
  HDDS-13314. Remove unused maven-pdf-plugin (apache#8686)
  HDDS-13324. Optimize memory footprint for Recon listKeys API (apache#8680)
  HDDS-13240. Add newly added metrics into grafana dashboard. (apache#8656)
  HDDS-13309. Add keyIterator/valueIterator methods to Table. (apache#8675)
  HDDS-13325. Introduce OZONE_SERVER_OPTS for common options for server processes (apache#8685)
  HDDS-13318. Simplify the getRangeKVs methods in Table (apache#8683)
  HDDS-13322. Remove module auto-detection from flaky-test-check (apache#8679)
  HDDS-13289. Remove usage of Jetty StringUtil (apache#8684)
  HDDS-13270. Reduce getBucket API invocations in S3 bucket owner verification (apache#8653)
  HDDS-13288. Container checksum file proto changes to account for deleted blocks. (apache#8649)
  HDDS-13306. Intermittent failure in testDirectoryDeletingServiceIntervalReconfiguration (apache#8682)
  HDDS-13317. Table should support empty array/String (apache#8676)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java