@AlexKehayov AlexKehayov commented Oct 16, 2025

Description:

  • Added pipelineOperationTimeout config parameter (default: 3s) to detect unresponsive block nodes during gRPC pipeline operations
  • Implemented timeout detection for onNext() operations when sending block items - triggers stream failure and retry on timeout
  • Implemented timeout detection for onComplete() operations when closing the stream - logs timeout but allows graceful close
  • Timeout tasks are cancelled if operations complete successfully
  • Added conn_pipelineOperationTimeout counter to track timeouts across both operations
  • Updated Grafana dashboard with panels for the new timeout metric
  • Updated unit tests to verify timeout behavior
  • Updated documentation with timeout details
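
For readers skimming the description, the onNext() timeout behaviour listed above might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `TimedPipelineSender`, its `send` method and the in-memory counter are hypothetical names, and it waits on a `Future` with a deadline rather than scheduling and cancelling a separate timeout task as the description mentions.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names are taken from the PR.
final class TimedPipelineSender<T> {
    private final Consumer<T> onNext; // stands in for the pipeline's onNext(...)
    private final Duration timeout;   // plays the role of pipelineOperationTimeout
    private final AtomicLong timeoutCount = new AtomicLong(); // stands in for the timeout counter metric
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "pipeline-op");
        t.setDaemon(true); // do not keep the JVM alive for a stuck operation
        return t;
    });

    TimedPipelineSender(Consumer<T> onNext, Duration timeout) {
        this.onNext = onNext;
        this.timeout = timeout;
    }

    /** Returns true if onNext finished in time; false means the caller should fail the stream and retry. */
    boolean send(T item) {
        Future<?> op = executor.submit(() -> onNext.accept(item));
        try {
            op.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            op.cancel(true);                // interrupt the stuck call
            timeoutCount.incrementAndGet(); // record the timeout
            return false;
        } catch (ExecutionException e) {
            return false;                   // onNext itself threw
        } catch (InterruptedException e) {
            op.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        }
    }

    long timeoutCount() {
        return timeoutCount.get();
    }
}
```

With a 3s timeout (the default named in the description), a caller would treat a `false` return from `send` as a stream failure and go through its usual retry path.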

Closes #21605

Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
@AlexKehayov AlexKehayov requested review from a team as code owners October 16, 2025 11:59
@AlexKehayov AlexKehayov self-assigned this Oct 16, 2025
lfdt-bot commented Oct 16, 2025

Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |


codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 91.54930% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...app/blocks/impl/streaming/BlockNodeConnection.java | 93.84% | 2 Missing and 2 partials ⚠️ |
| ...om/hedera/node/app/metrics/BlockStreamMetrics.java | 60.00% | 2 Missing ⚠️ |

@@             Coverage Diff              @@
##               main   #21663      +/-   ##
============================================
+ Coverage     70.62%   70.66%   +0.03%     
- Complexity    24353    24367      +14     
============================================
  Files          2673     2673              
  Lines        104202   104262      +60     
  Branches      10935    10937       +2     
============================================
+ Hits          73591    73672      +81     
+ Misses        26567    26547      -20     
+ Partials       4044     4043       -1     
| Files with missing lines | Coverage Δ | Complexity Δ |
|---|---|---|
| ...cks/impl/streaming/BlockNodeConnectionManager.java | 91.86% <100.00%> (+0.01%) | 74.00 <0.00> (ø) |
| ...ra/node/config/data/BlockNodeConnectionConfig.java | 100.00% <ø> (ø) | 1.00 <0.00> (ø) |
| ...om/hedera/node/app/metrics/BlockStreamMetrics.java | 72.24% <60.00%> (+0.62%) | 29.00 <0.00> (+1.00) |
| ...app/blocks/impl/streaming/BlockNodeConnection.java | 90.72% <93.84%> (+<0.01%) | 78.00 <2.00> (+2.00) |

... and 12 files with indirect coverage changes


# Conflicts:
#	hedera-node/hedera-app/src/main/java/com/hedera/node/app/blocks/impl/streaming/BlockNodeConnection.java
#	hedera-node/hedera-app/src/test/java/com/hedera/node/app/blocks/impl/streaming/BlockNodeConnectionTest.java
Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
@AlexKehayov AlexKehayov force-pushed the 21605-timeout-onnext-oncomplete branch from 565a49d to 582fc3b Compare October 20, 2025 14:53
petreze previously approved these changes Oct 20, 2025
@petreze petreze mentioned this pull request Oct 20, 2025
2 tasks
petreze commented Oct 20, 2025

A mirror PR has been created for this work.

…eads

Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
@derektriley commented

We should also have timeouts around creating the PbjGrpcClient and pipeline:

    blockStreamPublishServiceClient = createNewGrpcClient();
    final Pipeline<? super PublishStreamRequest> pipeline =
            blockStreamPublishServiceClient.publishBlockStream(this);
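
One way to bound the two setup calls quoted above is sketched below. It is an assumption-laden illustration, not what the PR does: `TimedSetup` and `callWithTimeout` are hypothetical names, and the executor is supplied by the caller.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of the PR or of any library.
final class TimedSetup {
    private TimedSetup() {}

    /** Runs a blocking setup step on the executor and gives up after the timeout. */
    static <T> T callWithTimeout(Callable<T> step, Duration timeout, ExecutorService executor)
            throws TimeoutException {
        Future<T> future = executor.submit(step);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck setup step
            throw e;
        } catch (ExecutionException e) {
            throw new IllegalStateException("setup step failed", e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during setup", e);
        }
    }
}
```

Under this sketch, `createNewGrpcClient()` and `publishBlockStream(this)` would each be passed as the `step`, and a `TimeoutException` could be handled the same way as a timed-out onNext().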

Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
# Conflicts:
#	hedera-node/docs/design/app/blocks/BlockNodeConnection.md
#	hedera-node/hedera-app/src/main/java/com/hedera/node/app/blocks/impl/streaming/BlockNodeConnection.java
#	hedera-node/hedera-app/src/test/java/com/hedera/node/app/blocks/impl/streaming/BlockNodeConnectionTest.java
#	hedera-node/hedera-config/src/main/java/com/hedera/node/config/data/BlockNodeConnectionConfig.java
Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
…hanged default pipeline timeout to 3s

Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
# Conflicts:
#	hedera-node/hedera-app/src/main/java/com/hedera/node/app/blocks/impl/streaming/BlockNodeConnection.java
#	hedera-node/hedera-app/src/test/java/com/hedera/node/app/blocks/impl/streaming/BlockNodeConnectionTest.java
Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
@derektriley commented

Can we please get the rolling log file renamed from blocknode-comms.log to block-node-comms.log?

Signed-off-by: Alex Kehayov <aleks.kehayov@limechain.tech>
@petreze petreze left a comment

Checks are passing on the mirror PR

@lukelee-sl lukelee-sl left a comment

LGTM. Seems unrelated to SC

Development

Successfully merging this pull request may close these issues.

A Timeout for sending PublishStreamRequest and completing pipeline

8 participants