-
Notifications
You must be signed in to change notification settings - Fork 575
feat(core): Replace fixed parameters with environment variables #3083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request replaces a hardcoded failover protection period with a configurable environment variable, enabling dynamic adjustment of the timeout without code modification. The main change allows operators to control when a node should be eligible for failover after entering a new epoch.
- Replaced
TimeUnit.MINUTES.toMillis(1)withSystems.getEnvLong()for dynamic configuration - Removed unused
TimeUnitimport and addedSystemsutility import - Default value remains 60000ms (1 minute) when environment variable is not set
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000); | ||
| private final int id; |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Systems.getEnvLong method can throw NumberFormatException if the environment variable AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS is set to an invalid value (e.g., non-numeric string). This would cause the class to fail during static initialization, potentially crashing the application at startup. Consider adding validation or catching the exception with a fallback to the default value, or document the expected format clearly.
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000); | |
| private final int id; | |
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = getDontFailoverAfterNewEpochMs(); | |
| private final int id; | |
| private static long getDontFailoverAfterNewEpochMs() { | |
| final long defaultValue = 60L * 1000; | |
| try { | |
| return Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", defaultValue); | |
| } catch (NumberFormatException e) { | |
| // Optionally log the error here if a logger is available | |
| return defaultValue; | |
| } | |
| } |
| * | ||
| * @see ClusterControlManager#getNextNodeId() | ||
| */ | ||
| private static final int MAX_CONTROLLER_ID = 1000 - 1; |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new environment variable AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS should be documented with a comment explaining its purpose, acceptable values (non-negative long in milliseconds), and the default value (60000ms = 1 minute). This helps operators understand how to configure this parameter.
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | |
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | |
| /** | |
| * The minimum time (in milliseconds) after a node starts a new epoch before failover is allowed. | |
| * Controlled by the environment variable {@code AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS}. | |
| * <p> | |
| * Purpose: Prevents failover immediately after a node restarts, allowing time for recovery. | |
| * Acceptable values: non-negative long (milliseconds). | |
| * Default value: 60000ms (1 minute). | |
| */ |
| */ | ||
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | ||
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = TimeUnit.MINUTES.toMillis(1); | ||
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000); |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of DONT_FAILOVER_AFTER_NEW_EPOCH_MS now depends on an environment variable, but there are no tests verifying that shouldFailover() works correctly with different values of this parameter. Consider adding tests that validate the failover logic with various timeout values to ensure the environment variable configuration works as expected.
The failover protection period parameter can be dynamically configured. This time can be adjusted by setting environment variables without requiring code modification and recompilation.