Skip to content

Conversation

@nicktindall
Copy link
Contributor

My previous attempt at logging more info on NOT_PREFERRED decisions that are acted on missed that we discard the explanation unless debugDecision() is turned on.

This PR means we re-run decideMove() with debugDecision() turned on when the .not_preferred logger is turned on and we're about to act on a NOT_PREFERRED decision. I think this will be generally useful as we roll out more deciders capable of returning NOT_PREFERRED.

@nicktindall nicktindall added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Nov 19, 2025
@elasticsearchmachine elasticsearchmachine added Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0 labels Nov 19, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@nicktindall nicktindall changed the title Re-run decideMove if debug logging is turned on to populate explanation Make sure explanation is populated when .not_preferred logging is turned on Nov 19, 2025
Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Left some superficial suggestions.

Comment on lines +906 to +908
if (notPreferredLogger.isDebugEnabled()) {
allocation.setDebugMode(RoutingAllocation.DebugMode.EXCLUDE_YES_DECISIONS);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (notPreferredLogger.isDebugEnabled()) {
allocation.setDebugMode(RoutingAllocation.DebugMode.EXCLUDE_YES_DECISIONS);
}
allocation.setDebugMode(notPreferredLogger.isDebugEnabled() ? RoutingAllocation.DebugMode.EXCLUDE_YES_DECISIONS : oldDebugMode);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this one because I don't think either form is significantly more or less readable, and it seems weird to set the value if it's not being changed.

Comment on lines 887 to 893
* If a shard has moved we need to re-run the decision to check that it's still valid. If {@link #notPreferredLogger} is
* set to debug, we re-run the decision with explain turned on so we can populate the explanation in our log message
*
* @param index The index of the shard
* @param storedShardMovement The existing shard movement decision
* @param shardMoved True if a shard moved in this balancing round, false otherwise
* @return The move decision to act on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found

"If a shard has moved we need to re-run the decision to check that it's still valid."

to be difficult to follow. Could it be replaced with something like

"If any shards in the cluster have been relocated since the stored shard movement was calculated, the move decision must be re-run to ensure it is still allowed."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved(?) in c9a8751

nicktindall and others added 2 commits November 20, 2025 14:57
…ion/allocator/BalancedShardsAllocator.java

Co-authored-by: Dianna Hohensee <artemisapple@gmail.com>
@nicktindall nicktindall merged commit ee8a1a7 into elastic:main Nov 20, 2025
34 checks passed
@nicktindall nicktindall deleted the rerun_decidemove_to_get_explanation branch November 20, 2025 05:43
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants