pg_autoctl perform failover #1088

@ashwanidave


Environment: two-node active/passive setup, with a third node dedicated as the monitor (PostgreSQL 14, Ubuntu 22.04)
Purpose: manage automatic failover/failback without manual intervention

After configuration, the monitor node shows the active node (node_1) and the passive node (node_2) as healthy, with reported and assigned states of PRIMARY and SECONDARY respectively. All prerequisites indicated that the nodes were ready for configuration and setup.
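The healthy state described above amounts to every node's reported state matching the state the monitor assigned to it. A minimal Python sketch of that check, assuming the states have already been read (for example by hand from the monitor's state listing) into a hypothetical `NodeState` record; the record, its field names, and the sample values are illustrative and are not pg_autoctl's own output format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of one node as listed by the monitor."""
    name: str
    reported: str  # state the node last reported to the monitor
    assigned: str  # goal state the monitor has assigned to the node


def pending_nodes(nodes: list[NodeState]) -> list[str]:
    """Return the names of nodes whose reported state differs from the assigned one."""
    return [n.name for n in nodes if n.reported != n.assigned]


def is_converged(nodes: list[NodeState]) -> bool:
    """True when every node has reached its assigned state."""
    return not pending_nodes(nodes)


if __name__ == "__main__":
    # Hypothetical values mirroring this issue: healthy before the failover ...
    before = [
        NodeState("node_1", "primary", "primary"),
        NodeState("node_2", "secondary", "secondary"),
    ]
    # ... and mid-failover, where node_1 has not yet caught up.
    during = [
        NodeState("node_1", "demoted", "catchingup"),
        NodeState("node_2", "wait_primary", "wait_primary"),
    ]
    print(is_converged(before))   # True
    print(pending_nodes(during))  # ['node_1']
```

Read this way, a failover is only finished once the check passes again; the demoted/catchingup pair in the findings below is a node that never converged.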

Scenario: before go-live, I tried a manual failover to get a feel for how it works.

Command used: pg_autoctl perform failover (run from the current primary node)

Findings:

  • The background process changed the states of node_1 to demoted/catchingup and of node_2 to wait_primary/wait_primary.
  • It initiated pg_basebackup from node_2 onto node_1 after removing the contents of node_1's data directory.
  • Because the data directory was now empty, the cluster was DOWN.
  • Some special files did not match between the config directory and the data directory (we use symlinks).
  • With the required special files missing from their expected location, the base backup restore went into an endless loop.
  • The pgautofailover_replicator password had to be set on the system.
  • A base backup had to be provided to the data directory manually.
  • All required files had to be made available at their respective locations.
  • Finally, the max_wal_senders parameter was the showstopper (even though it had the same value as defined at an earlier stage).
  • Ultimately, after time-consuming effort, both nodes reached their required states.
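For the missing special files, a quick pre-check before any failover is to confirm that every file the setup expects inside the data directory resolves to a real file, including through symlinks. A minimal Python sketch; the function name and the file names in the example are hypothetical placeholders for whatever files this setup links into the data directory:

```python
import os
import tempfile
from pathlib import Path


def check_required_files(pgdata: str, required: list[str]) -> dict[str, str]:
    """Map each problematic required file name (hypothetical list) to a short reason."""
    problems: dict[str, str] = {}
    for name in required:
        path = Path(pgdata) / name
        if path.is_symlink() and not path.exists():
            # The link itself is present, but its target is gone.
            problems[name] = "dangling symlink"
        elif not path.exists():
            problems[name] = "missing"
        elif not path.is_file():
            problems[name] = "not a regular file"
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pgdata:
        # Simulated data directory: one real file, one broken symlink, one absent file.
        Path(pgdata, "postgresql.conf").write_text("# placeholder\n")
        os.symlink(os.path.join(pgdata, "no-such-target"), os.path.join(pgdata, "pg_hba.conf"))
        print(check_required_files(pgdata, ["postgresql.conf", "pg_hba.conf", "pg_ident.conf"]))
        # {'pg_hba.conf': 'dangling symlink', 'pg_ident.conf': 'missing'}
```

Running it on each node before a planned failover would surface the kind of missing-file condition described above before a rebuild starts, rather than during it.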
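On the max_wal_senders point: PostgreSQL's hot standby documentation requires a standby's max_connections, max_prepared_transactions, max_locks_per_transaction, max_wal_senders and max_worker_processes to be equal to or greater than the primary's values. A minimal Python sketch of that comparison, assuming both nodes' settings have already been collected (for example with SHOW on each node) into plain dictionaries; the function name and the sample values are hypothetical:

```python
# Parameters that PostgreSQL's hot standby documentation requires to be
# at least as large on the standby as on the primary.
HOT_STANDBY_MINIMUMS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_wal_senders",
    "max_worker_processes",
)


def standby_shortfalls(primary: dict[str, int], standby: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each parameter that is too low on the standby to its (primary, standby) values.

    Parameters missing from either dictionary are skipped, not reported.
    """
    shortfalls: dict[str, tuple[int, int]] = {}
    for name in HOT_STANDBY_MINIMUMS:
        if name in primary and name in standby and standby[name] < primary[name]:
            shortfalls[name] = (primary[name], standby[name])
    return shortfalls


if __name__ == "__main__":
    # Hypothetical values, e.g. from "SHOW max_wal_senders;" on each node.
    primary = {"max_connections": 100, "max_wal_senders": 12}
    standby = {"max_connections": 100, "max_wal_senders": 10}
    print(standby_shortfalls(primary, standby))  # {'max_wal_senders': (12, 10)}
```

Because a failover swaps the roles, the check is worth running in both directions. Since the values here were reportedly identical on both nodes, this only rules the constraint in or out; it does not by itself explain the failure.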

If this much effort is needed for a manual failover, how reliable is the tool for a go-live scenario? Is any extra effort required to make the process fully automatic?

Any experience or suggestions from the expert team are welcome.

Thanks in advance
