
[DocDB] yb-master shown healthy when it's not #28675

@vvosadchy

Jira Link: DB-18374

Description

Steps to reproduce (steps 3-6 are also collected into a scripted sketch further below):

  1. Start a group of 3 masters:
./bin/yb-master \
    --master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
    --fs_data_dirs=$HOME/yugabyte/node1/data \
    --rpc_bind_addresses=127.0.0.1:7100

sudo ifconfig lo0 alias 127.0.0.2

./bin/yb-master \
    --master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
    --fs_data_dirs=$HOME/yugabyte/node2/data \
    --rpc_bind_addresses=127.0.0.2:7100

sudo ifconfig lo0 alias 127.0.0.3

./bin/yb-master \
    --master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
    --fs_data_dirs=$HOME/yugabyte/node3/data \
    --rpc_bind_addresses=127.0.0.3:7100
  2. Check that they are healthy:
% ./bin/yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_all_masters                                       
Master UUID                      	RPC Host/Port        	State    	Role 	Broadcast Host/Port 
af08844be93d4cdf9e0b94858fe33675 	127.0.0.1:7100       	ALIVE    	FOLLOWER 	N/A                 
8bff6598e2624fbdbd20000c5dde8f0f 	127.0.0.2:7100       	ALIVE    	FOLLOWER 	N/A                 
240ce9373a8a42d18b9efa7e44021969 	127.0.0.3:7100       	ALIVE    	LEADER 	N/A
  3. Stop node3 and clear its data:
rm -fr $HOME/yugabyte/node3/data/yb-data/*
  4. Start it again:
./bin/yb-master \
    --master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
    --fs_data_dirs=$HOME/yugabyte/node3/data \
    --rpc_bind_addresses=127.0.0.3:7100
  5. Check the list of masters:
% ./bin/yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_all_masters
Master UUID                      	RPC Host/Port        	State    	Role 	Broadcast Host/Port 
af08844be93d4cdf9e0b94858fe33675 	127.0.0.1:7100       	ALIVE    	LEADER 	N/A                 
8bff6598e2624fbdbd20000c5dde8f0f 	127.0.0.2:7100       	ALIVE    	FOLLOWER 	N/A                 
6e9269eaa24740eaa5bc7bccda343917 	127.0.0.3:7100       	ALIVE    	FOLLOWER 	N/A 

node3 looks like a healthy FOLLOWER.

  6. But if you try to promote it to LEADER:
% ./bin/yb-admin --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 master_leader_stepdown 6e9269eaa24740eaa5bc7bccda343917
E0923 21:02:23.128075 47841792 yb-admin_client.cc:729] LeaderStepDown for af08844be93d4cdf9e0b94858fe33675received error code: LEADER_NOT_READY_TO_STEP_DOWN status { code: ILLEGAL_STATE message: "Suggested peer is not caught up yet" source_file: "../../src/yb/consensus/raft_consensus.cc" source_line: 851 errors: "\000" }
Error running master_leader_stepdown: Illegal state (yb/consensus/raft_consensus.cc:851): Suggested peer is not caught up yet

It turns out the node is not actually healthy.
It remains in this state indefinitely, i.e. it never catches up.
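
For convenience, here are steps 3-6 collected into a single shell sketch. It only repeats the commands already shown above; the pkill invocation used to stop node3, the backgrounding of yb-master, the sleep, and the awk parsing of the list_all_masters output are my own additions and may need adjusting for your environment.

MASTERS=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100

# Step 3: stop node3 and clear its data (pkill is just one way to stop it)
pkill -f 'yb-master.*rpc_bind_addresses=127.0.0.3:7100'
rm -fr $HOME/yugabyte/node3/data/yb-data/*

# Step 4: start it again (in the background here)
./bin/yb-master \
    --master_addresses=$MASTERS \
    --fs_data_dirs=$HOME/yugabyte/node3/data \
    --rpc_bind_addresses=127.0.0.3:7100 &
sleep 5   # rough wait for the restarted master to register

# Step 5: node3 is listed as ALIVE / FOLLOWER
./bin/yb-admin --master_addresses $MASTERS list_all_masters

# Step 6: but a step-down to node3 fails with "Suggested peer is not caught up yet"
NODE3_UUID=$(./bin/yb-admin --master_addresses $MASTERS list_all_masters \
             | awk '$2 == "127.0.0.3:7100" {print $1}')
./bin/yb-admin --master_addresses $MASTERS master_leader_stepdown "$NODE3_UUID"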

This is very misleading and can cause serious trouble if you keep operating the cluster in this state.
For example, if you then replace the disk of another yb-master, the cluster metadata becomes unavailable, presumably because the yb-master Raft group loses its quorum (with 3 masters a majority of 2 is required, and with node3 silently broken plus one more master down, only 1 of 3 remains).

Expected behavior:
Such a yb-master node should be shown as unhealthy in the masters list.

Issue Type

kind/bug

