Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,10 @@
import org.apache.kafka.controller.stream.NodeMetadata;
import org.apache.kafka.controller.stream.NodeState;

import com.automq.stream.utils.Systems;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.TimeUnit;

/**
* NodeRuntimeMetadata is a runtime view of a node's metadata.
Expand All @@ -40,7 +41,7 @@ public final class NodeRuntimeMetadata {
* @see ClusterControlManager#getNextNodeId()
*/
private static final int MAX_CONTROLLER_ID = 1000 - 1;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new environment variable AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS should be documented with a comment explaining its purpose, acceptable values (non-negative long in milliseconds), and the default value (60000ms = 1 minute). This helps operators understand how to configure this parameter.

Suggested change
private static final int MAX_CONTROLLER_ID = 1000 - 1;
private static final int MAX_CONTROLLER_ID = 1000 - 1;
/**
* The minimum time (in milliseconds) after a node starts a new epoch before failover is allowed.
* Controlled by the environment variable {@code AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS}.
* <p>
* Purpose: Prevents failover immediately after a node restarts, allowing time for recovery.
* Acceptable values: non-negative long (milliseconds).
* Default value: 60000ms (1 minute).
*/

Copilot uses AI. Check for mistakes.
private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = TimeUnit.MINUTES.toMillis(1);
private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of DONT_FAILOVER_AFTER_NEW_EPOCH_MS now depends on an environment variable, but there are no tests verifying that shouldFailover() works correctly with different values of this parameter. Consider adding tests that validate the failover logic with various timeout values to ensure the environment variable configuration works as expected.

Copilot uses AI. Check for mistakes.
private final int id;
Comment on lines +44 to 45
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Systems.getEnvLong method can throw NumberFormatException if the environment variable AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS is set to an invalid value (e.g., non-numeric string). This would cause the class to fail during static initialization, potentially crashing the application at startup. Consider adding validation or catching the exception with a fallback to the default value, or document the expected format clearly.

Suggested change
private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000);
private final int id;
private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = getDontFailoverAfterNewEpochMs();
private final int id;
private static long getDontFailoverAfterNewEpochMs() {
final long defaultValue = 60L * 1000;
try {
return Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", defaultValue);
} catch (NumberFormatException e) {
// Optionally log the error here if a logger is available
return defaultValue;
}
}

Copilot uses AI. Check for mistakes.
private final long epoch;
private final String walConfigs;
Expand Down
Loading