Labels: area/docdb, kind/bug, priority/medium, status/awaiting-triage
Description
Jira Link: DB-18374
Steps to reproduce:
- Start group of 3 masters:
./bin/yb-master \
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
--fs_data_dirs=$HOME/yugabyte/node1/data \
--rpc_bind_addresses=127.0.0.1:7100
sudo ifconfig lo0 alias 127.0.0.2
./bin/yb-master \
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
--fs_data_dirs=$HOME/yugabyte/node2/data \
--rpc_bind_addresses=127.0.0.2:7100
sudo ifconfig lo0 alias 127.0.0.3
./bin/yb-master \
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
--fs_data_dirs=$HOME/yugabyte/node3/data \
--rpc_bind_addresses=127.0.0.3:7100
- Check they are healthy:
% ./bin/yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_all_masters
Master UUID RPC Host/Port State Role Broadcast Host/Port
af08844be93d4cdf9e0b94858fe33675 127.0.0.1:7100 ALIVE FOLLOWER N/A
8bff6598e2624fbdbd20000c5dde8f0f 127.0.0.2:7100 ALIVE FOLLOWER N/A
240ce9373a8a42d18b9efa7e44021969 127.0.0.3:7100 ALIVE LEADER N/A
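A quick way to script this health check is to count ALIVE states and LEADER roles in the `list_all_masters` output. This is a hypothetical helper, not a YugabyteDB tool; the sample output from above is embedded so the sketch is self-contained, whereas in practice you would pipe the real command's output into the awk program:

```shell
#!/bin/sh
# Summarize `yb-admin list_all_masters` output: for each data row,
# field 3 is State and field 4 is Role; NR > 1 skips the header line.
summary=$(awk 'NR > 1 { states[$3]++; roles[$4]++ }
               END { printf "alive=%d leaders=%d", states["ALIVE"], roles["LEADER"] }' <<'EOF'
Master UUID                       RPC Host/Port   State   Role      Broadcast Host/Port
af08844be93d4cdf9e0b94858fe33675  127.0.0.1:7100  ALIVE   FOLLOWER  N/A
8bff6598e2624fbdbd20000c5dde8f0f  127.0.0.2:7100  ALIVE   FOLLOWER  N/A
240ce9373a8a42d18b9efa7e44021969  127.0.0.3:7100  ALIVE   LEADER    N/A
EOF
)
echo "$summary"
```

As the bug below shows, though, ALIVE in this listing does not guarantee the node is actually caught up.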
- Stop node3 and clear its data:
rm -fr $HOME/yugabyte/node3/data/yb-data/*
- Start it again:
./bin/yb-master \
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
--fs_data_dirs=$HOME/yugabyte/node3/data \
--rpc_bind_addresses=127.0.0.3:7100
- Check list of masters:
% ./bin/yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_all_masters
Master UUID RPC Host/Port State Role Broadcast Host/Port
af08844be93d4cdf9e0b94858fe33675 127.0.0.1:7100 ALIVE LEADER N/A
8bff6598e2624fbdbd20000c5dde8f0f 127.0.0.2:7100 ALIVE FOLLOWER N/A
6e9269eaa24740eaa5bc7bccda343917 127.0.0.3:7100 ALIVE FOLLOWER N/A
node3 looks like a healthy FOLLOWER.
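One hint in the output above is that node3 rejoined with a brand-new UUID (6e92… instead of the original 240c…), because wiping the data directory discarded its old identity. A hedged sketch of spotting this (UUID lists embedded from the outputs above; in practice you would capture them from `list_all_masters` before and after the restart):

```shell
#!/bin/sh
# Any UUID present in the "after" list but not in the "before" list is a
# master that rejoined with a fresh identity, e.g. after a data wipe.
before="af08844be93d4cdf9e0b94858fe33675
8bff6598e2624fbdbd20000c5dde8f0f
240ce9373a8a42d18b9efa7e44021969"
after="af08844be93d4cdf9e0b94858fe33675
8bff6598e2624fbdbd20000c5dde8f0f
6e9269eaa24740eaa5bc7bccda343917"
# -F: fixed strings, -x: whole-line match, -v: keep lines NOT in $before.
new_uuids=$(echo "$after" | grep -Fxv "$before")
echo "rejoined with new UUID: $new_uuids"
```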
- But if you try to promote it to LEADER:
% ./bin/yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 master_leader_stepdown 6e9269eaa24740eaa5bc7bccda343917
E0923 21:02:23.128075 47841792 yb-admin_client.cc:729] LeaderStepDown for af08844be93d4cdf9e0b94858fe33675 received error code: LEADER_NOT_READY_TO_STEP_DOWN status { code: ILLEGAL_STATE message: "Suggested peer is not caught up yet" source_file: "../../src/yb/consensus/raft_consensus.cc" source_line: 851 errors: "\000" }
Error running master_leader_stepdown: Illegal state (yb/consensus/raft_consensus.cc:851): Suggested peer is not caught up yet
It turns out the node is not actually healthy: it remains in this state indefinitely and never catches up.
This is very misleading and can cause serious trouble if you keep operating the cluster in this state.
For example, if you then replace the disk of another yb-master, cluster metadata becomes unavailable (presumably because the yb-master Raft group loses quorum).
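The quorum arithmetic behind that failure mode can be sketched as follows (assuming all three masters are Raft voters):

```shell
#!/bin/sh
# Raft needs a majority of the voter set: quorum = floor(N/2) + 1.
# With 3 masters, quorum is 2. node3 never caught up, so only 2 masters are
# genuinely healthy; losing one more (e.g. a disk change on another master)
# drops the healthy count to 1, below quorum, and the master Raft group can
# no longer commit, making cluster metadata unavailable.
n=3
quorum=$(( n / 2 + 1 ))
healthy_after_second_failure=1
echo "masters=$n quorum=$quorum healthy=$healthy_after_second_failure"
if [ "$healthy_after_second_failure" -lt "$quorum" ]; then
  echo "quorum lost"
fi
```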
Expected behavior:
Such a yb-master node should be shown as unhealthy in the masters list.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
- I confirm this issue does not contain any sensitive information.