Description
Environment: two-node active-passive setup with a third node dedicated as monitor (PostgreSQL 14, Ubuntu 22.04)
Purpose: manage automatic failover/failback without manual intervention
After configuration, the monitor node shows the active node (node_1) and the passive node (node_2) as healthy, with reported and assigned states of PRIMARY and SECONDARY respectively. All prerequisites indicated that the nodes were correctly configured and set up.
Scenario: before go-live, I tried a manual failover to get a feel for the process.
Command used: pg_autoctl perform failover # run from the current primary node
Findings:
- A background process changed the state of node_1 to demoted/catchingup and of node_2 to wait_primary/wait_primary.
- pg_basebackup was initiated from node_2 onto node_1 after the contents of the data directory had been removed.
- Because the data directory was empty, the cluster went DOWN.
- The special files did not match between the config directory and the data directory (symlinks are used).
- Because the required special files were not where they were expected, the base backup restore went into an endless loop.
- The pgautofailover_replicator password had to be assigned on the system.
- Manual intervention was needed to get the base backup into the data directory.
- All required files had to be made available at their respective locations.
- Finally, the max_wal_senders parameter was the show stopper (its value was the same as defined at the earlier stage).
- Ultimately, after time-consuming effort, both nodes reached their required states.
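
For anyone trying to reproduce the symlink finding, here is a minimal Python sketch of a check that could be run on each node before a failover test. It assumes the special files (for example postgresql.conf) are symlinked into the data directory as in my setup, and it relies on the fact that pg_basebackup copies only the data directory, so files that live outside it are not restored. The function names and the example path are hypothetical and are not part of pg_auto_failover.

```python
import os
import tempfile


def find_symlinked_entries(data_dir):
    """Return {name: target} for every top-level symlink in data_dir."""
    found = {}
    for entry in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, entry)
        if os.path.islink(path):
            found[entry] = os.readlink(path)
    return found


def missing_targets(data_dir):
    """Return the names of symlinks in data_dir whose target does not exist."""
    # os.path.exists follows symlinks, so it is False for a dangling link.
    return [
        name
        for name in find_symlinked_entries(data_dir)
        if not os.path.exists(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory layout (hypothetical).
    # On a real node, pass the actual data directory instead, for example
    # /var/lib/postgresql/14/main (an assumption based on Ubuntu defaults).
    with tempfile.TemporaryDirectory() as root:
        conf_dir = os.path.join(root, "etc")
        data_dir = os.path.join(root, "data")
        os.mkdir(conf_dir)
        os.mkdir(data_dir)

        target = os.path.join(conf_dir, "postgresql.conf")
        open(target, "w").close()
        os.symlink(target, os.path.join(data_dir, "postgresql.conf"))

        print("symlinks:", find_symlinked_entries(data_dir))
        print("dangling:", missing_targets(data_dir))
```

Any entry reported here is a file that has to be put back by hand (or kept outside the symlink layout) after the data directory is emptied and rebuilt. Note that pg_wal may also show up if it is a symlink on your system.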
If a manual failover takes this much effort, how reliable is the tool for a GO-LIVE scenario? Is any extra work needed to make the process fully automatic?
Any experience or suggestions from the expert team are always welcome.
Thanks in advance