Skip to content

pcp_promote_node switchover reports success when pg_rewind from follow_primary fails #147

@tallenaz

Description

@tallenaz

Using pgpool 4.7.0:

root@dlss-azanella-db-01:~# pgpool --version
pgpool-II version 4.7.0 (tasukiboshi)

I have a healthy pgpool cluster:

root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:39:31
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:39:47
root@dlss-azanella-db-01:~# su - postgres -c 'psql -c "show checkpoint_timeout;"'
 checkpoint_timeout 
--------------------
 5min
(1 row)

After the checkpoint_timeout, I perform a switchover:

root@dlss-azanella-db-01:~# date
Fri Jan 30 11:47:32 AM PST 2026
root@dlss-azanella-db-01:~# pcp_promote_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 1 --switchover -w
pcp_promote_node -- Command Successful

The follow_primary script succeeds:

2026-01-30 11:47:49.972: follow_child pid 353841: LOG:  === Starting follow primary command for node 0 ===
2026-01-30 11:47:49.972: follow_child pid 353841: LOG:  execute command: /etc/pgpool2/follow_primary.sh  0 dlss-azanella-db-01.stanford.edu  5432 /var/lib/postgresql/16/main  1 dlss-azanella-db-02.stanford.edu 0 0 5432 /var/lib/postgresql/16/main
+ NODE_ID=0
+ NODE_HOST=dlss-azanella-db-01.stanford.edu
+ NODE_PORT=5432
+ NODE_PGDATA=/var/lib/postgresql/16/main
+ NEW_PRIMARY_NODE_ID=1
+ NEW_PRIMARY_NODE_HOST=dlss-azanella-db-02.stanford.edu
+ OLD_MAIN_NODE_ID=0
+ OLD_PRIMARY_NODE_ID=0
+ NEW_PRIMARY_NODE_PORT=5432
+ NEW_PRIMARY_NODE_PGDATA=/var/lib/postgresql/16/main
+ PGHOME=/usr/lib/postgresql/16
+ REPLUSER=repl
+ PCP_USER=pgpool
+ PGPOOL_PATH=/sbin
+ PCP_PORT=9898
++ tr -- -. _
++ echo dlss-azanella-db-01.stanford.edu
+ REPL_SLOT_NAME=dlss_azanella_db_01_stanford_edu
+ POSTGRESQL_STARTUP_USER=postgres
+ SSH_KEY_FILE=id_rsa
+ SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa'
+ PGCONFIG_DIR=/etc/postgresql/16/main
+ echo follow_primary.sh: start: Standby node 0
follow_primary.sh: start: Standby node 0
+ /usr/lib/postgresql/16/bin/pg_isready -h dlss-azanella-db-01.stanford.edu -p 5432
2026-01-30 11:47:49.983: sr_check_worker pid 351959: LOG:  worker process received restart request
2026-01-30 11:47:49.984: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:47:49.984: watchdog pid 175581: LOG:  watchdog is informed of failover end by the main process
2026-01-30 11:47:49.984: main pid 175576: LOG:  === Failover done. shutdown host dlss-azanella-db-01.stanford.edu(5432) ===
+ '[' 0 -ne 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' [email protected] ls /tmp
Warning: Permanently added 'dlss-azanella-db-02.stanford.edu' (ED25519) to the list of known hosts.
+ '[' 0 -ne 0 ']'
++ /usr/lib/postgresql/16/bin/psql -V
++ awk '{print $3}'
++ sed 's/\..*//'
++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
+ PGVERSION=16
+ '[' 16 -ge 12 ']'
+ RECOVERYCONF=/var/lib/postgresql/16/main/myrecovery.conf
+ echo follow_primary.sh: pg_rewind for node 0
follow_primary.sh: pg_rewind for node 0
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'checkpoint;'
CHECKPOINT
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'SELECT pg_create_physical_replication_slot('\''dlss_azanella_db_01_stanford_edu'\'');'
+ '[' 0 -ne 0 ']'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' [email protected] '

    set -o errexit

    /usr/bin/sudo /usr/bin/systemctl stop postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -w -m f -D /var/lib/postgresql/16/main stop

    /usr/lib/postgresql/16/bin/pg_rewind -D /var/lib/postgresql/16/main --source-server="user=postgres host=dlss-azanella-db-02.stanford.edu port=5432 dbname=postgres"

    [ -d "/var/lib/postgresql/16/main" ] && rm -rf /var/lib/postgresql/16/main/pg_replslot/*

    cat > /var/lib/postgresql/16/main/myrecovery.conf << EOT
primary_conninfo = '\''host=dlss-azanella-db-02.stanford.edu port=5432 user=repl application_name=dlss-azanella-db-01.stanford.edu passfile='\'''\''/var/lib/postgresql/.pgpass'\'''\'''\''
recovery_target_timeline = '\''latest'\''
primary_slot_name = '\''dlss_azanella_db_01_stanford_edu'\''
EOT

    if [ 16 -ge 12 ]; then
        sed -i -e "\$ainclude_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''"                -e "/^include_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''/d" /etc/postgresql/16/main/postgresql.conf
        touch /var/lib/postgresql/16/main/standby.signal
    else
        echo "standby_mode = '\''on'\''" >> /var/lib/postgresql/16/main/myrecovery.conf
    fi

    /usr/bin/sudo /usr/bin/systemctl start postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -l /dev/null -w -D /var/lib/postgresql/16/main start

'
Warning: Permanently added 'dlss-azanella-db-01.stanford.edu' (ED25519) to the list of known hosts.
pg_rewind: servers diverged at WAL location 0/7E000148 on timeline 73
pg_rewind: rewinding from last common checkpoint at 0/7E000098 on timeline 73
2026-01-30 11:47:50.984: pcp_main pid 351958: LOG:  restart request received in pcp child process
2026-01-30 11:47:50.986: main pid 175576: LOG:  PCP child 351958 exits with status 0 in failover()
2026-01-30 11:47:50.987: main pid 175576: LOG:  fork a new PCP child pid 353904 in failover()
2026-01-30 11:47:50.987: pcp_main pid 353904: LOG:  PCP process: 353904 started
2026-01-30 11:47:50.988: sr_check_worker pid 353905: LOG:  process started
pg_rewind: Done!
+ '[' 0 -eq 0 ']'
+ /sbin/pcp_attach_node -w -h localhost -U pgpool -p 9898 -n 0
2026-01-30 11:47:53.793: pcp_child pid 353914: LOG:  received failback request for node_id: 0 from pid [353914]
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  watchdog is processing the failover command [FAILBACK_REQUEST] received from local pgpool-II on IPC interface
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  The failover request does not need quorum to hold
2026-01-30 11:47:53.793: watchdog pid 175581: DETAIL:  proceeding with the failover
2026-01-30 11:47:53.793: watchdog pid 175581: HINT:  REQ_DETAIL_CONFIRMED
2026-01-30 11:47:53.793: pcp_child pid 353914: LOG:  signal_user1_to_parent_with_reason(0)
2026-01-30 11:47:53.793: main pid 175576: LOG:  Pgpool-II parent process received SIGUSR1
2026-01-30 11:47:53.793: main pid 175576: LOG:  Pgpool-II parent process has received failover request
pcp_attach_node -- Command Successful
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  watchdog is informed of failover start by the main process
+ '[' 0 -ne 0 ']'
2026-01-30 11:47:53.794: main pid 175576: LOG:  === Starting fail back. reconnect host dlss-azanella-db-01.stanford.edu(5432) ===
2026-01-30 11:47:53.794: main pid 175576: LOG:  Node 1 is not down (status: 1)
+ echo follow_primary.sh: end: follow primary command is completed successfully
follow_primary.sh: end: follow primary command is completed successfully
+ exit 0
2026-01-30 11:47:53.794: main pid 175576: LOG:  Do not restart children because we are failing back node id 0 host: dlss-azanella-db-01.stanford.edu port: 5432 and we are in streaming replication mode and not all backends were down
2026-01-30 11:47:53.794: main pid 175576: LOG:  find_primary_node_repeatedly: follow primary is ongoing. return current primary: 1
2026-01-30 11:47:53.794: main pid 175576: LOG:  failover: set new primary node: 1
2026-01-30 11:47:53.794: main pid 175576: LOG:  failover: set new main node: 0
2026-01-30 11:47:53.794: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:47:53.794: watchdog pid 175581: LOG:  watchdog is informed of failover end by the main process
2026-01-30 11:47:53.794: sr_check_worker pid 353905: LOG:  worker process received restart request
2026-01-30 11:47:53.794: main pid 175576: LOG:  === Failback done. reconnect host dlss-azanella-db-01.stanford.edu(5432) ===
2026-01-30 11:47:53.794: follow_child pid 353841: LOG:  === Follow primary command for node 0 ended ===
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:47:53
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:47:49

Also, I can perform a switchover in the other direction right away:

root@dlss-azanella-db-01:~# pcp_promote_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 0 --switchover -w
pcp_promote_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:52:01
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:52:05

I can also successfully simulate a failover and perform a recovery in both directions:

root@dlss-azanella-db-01:~# systemctl stop postgresql@16-main
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 3 0.500000 down down standby unknown 0 none none 2026-01-30 11:55:22
dlss-azanella-db-02.stanford.edu 5432 2 0.500000 up up primary primary 0 none none 2026-01-30 11:55:22
root@dlss-azanella-db-01:~# pcp_recovery_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:56:03
dlss-azanella-db-02.stanford.edu 5432 2 0.500000 up up primary primary 0 none none 2026-01-30 11:55:22
root@dlss-azanella-db-02:~# systemctl stop postgresql@16-main
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:57:05
dlss-azanella-db-02.stanford.edu 5432 3 0.500000 down down standby unknown 0 none none 2026-01-30 11:57:05
root@dlss-azanella-db-01:~# pcp_recovery_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 1 -w
pcp_recovery_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:57:05
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:57:31

But if I do not wait for the next checkpoint_timeout, a pcp_promote_node –switchover says the command succeeds, but follow_primary fails:

root@dlss-azanella-db-01:~# pcp_promote_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 1 --switchover -w
pcp_promote_node -- Command Successful
2026-01-30 11:59:10.048: follow_child pid 355683: LOG:  === Starting follow primary command for node 0 ===
2026-01-30 11:59:10.048: follow_child pid 355683: LOG:  execute command: /etc/pgpool2/follow_primary.sh  0 dlss-azanella-db-01.stanford.edu  5432 /var/lib/postgresql/16/main  1 dlss-azanella-db-02.stanford.edu 0 0 5432 /var/lib/postgresql/16/main
+ NODE_ID=0
+ NODE_HOST=dlss-azanella-db-01.stanford.edu
+ NODE_PORT=5432
+ NODE_PGDATA=/var/lib/postgresql/16/main
+ NEW_PRIMARY_NODE_ID=1
+ NEW_PRIMARY_NODE_HOST=dlss-azanella-db-02.stanford.edu
+ OLD_MAIN_NODE_ID=0
+ OLD_PRIMARY_NODE_ID=0
+ NEW_PRIMARY_NODE_PORT=5432
+ NEW_PRIMARY_NODE_PGDATA=/var/lib/postgresql/16/main
+ PGHOME=/usr/lib/postgresql/16
+ REPLUSER=repl
+ PCP_USER=pgpool
+ PGPOOL_PATH=/sbin
+ PCP_PORT=9898
++ echo dlss-azanella-db-01.stanford.edu
++ tr -- -. _
+ REPL_SLOT_NAME=dlss_azanella_db_01_stanford_edu
+ POSTGRESQL_STARTUP_USER=postgres
+ SSH_KEY_FILE=id_rsa
+ SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa'
+ PGCONFIG_DIR=/etc/postgresql/16/main
+ echo follow_primary.sh: start: Standby node 0
follow_primary.sh: start: Standby node 0
+ /usr/lib/postgresql/16/bin/pg_isready -h dlss-azanella-db-01.stanford.edu -p 5432
2026-01-30 11:59:10.058: sr_check_worker pid 355388: LOG:  worker process received restart request
2026-01-30 11:59:10.058: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:59:10.058: watchdog pid 175581: LOG:  watchdog is informed of failover end by the main process
2026-01-30 11:59:10.059: main pid 175576: LOG:  === Failover done. shutdown host dlss-azanella-db-01.stanford.edu(5432) ===
+ '[' 0 -ne 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' [email protected] ls /tmp
Warning: Permanently added 'dlss-azanella-db-02.stanford.edu' (ED25519) to the list of known hosts.
+ '[' 0 -ne 0 ']'
++ /usr/lib/postgresql/16/bin/psql -V
++ awk '{print $3}'
++ sed 's/\..*//'
++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
+ PGVERSION=16
+ '[' 16 -ge 12 ']'
+ RECOVERYCONF=/var/lib/postgresql/16/main/myrecovery.conf
+ echo follow_primary.sh: pg_rewind for node 0
follow_primary.sh: pg_rewind for node 0
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'checkpoint;'
CHECKPOINT
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'SELECT pg_create_physical_replication_slot('\''dlss_azanella_db_01_stanford_edu'\'');'
+ '[' 0 -ne 0 ']'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' [email protected] '

    set -o errexit

    /usr/bin/sudo /usr/bin/systemctl stop postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -w -m f -D /var/lib/postgresql/16/main stop

    /usr/lib/postgresql/16/bin/pg_rewind -D /var/lib/postgresql/16/main --source-server="user=postgres host=dlss-azanella-db-02.stanford.edu port=5432 dbname=postgres"

    [ -d "/var/lib/postgresql/16/main" ] && rm -rf /var/lib/postgresql/16/main/pg_replslot/*

    cat > /var/lib/postgresql/16/main/myrecovery.conf << EOT
primary_conninfo = '\''host=dlss-azanella-db-02.stanford.edu port=5432 user=repl application_name=dlss-azanella-db-01.stanford.edu passfile='\'''\''/var/lib/postgresql/.pgpass'\'''\'''\''
recovery_target_timeline = '\''latest'\''
primary_slot_name = '\''dlss_azanella_db_01_stanford_edu'\''
EOT

    if [ 16 -ge 12 ]; then
        sed -i -e "\$ainclude_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''"                -e "/^include_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''/d" /etc/postgresql/16/main/postgresql.conf
        touch /var/lib/postgresql/16/main/standby.signal
    else
        echo "standby_mode = '\''on'\''" >> /var/lib/postgresql/16/main/myrecovery.conf
    fi

    /usr/bin/sudo /usr/bin/systemctl start postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -l /dev/null -w -D /var/lib/postgresql/16/main start

'
Warning: Permanently added 'dlss-azanella-db-01.stanford.edu' (ED25519) to the list of known hosts.
2026-01-30 11:59:11.059: pcp_main pid 355387: LOG:  restart request received in pcp child process
2026-01-30 11:59:11.061: main pid 175576: LOG:  PCP child 355387 exits with status 0 in failover()
2026-01-30 11:59:11.061: main pid 175576: LOG:  fork a new PCP child pid 355743 in failover()
2026-01-30 11:59:11.061: pcp_main pid 355743: LOG:  PCP process: 355743 started
2026-01-30 11:59:11.062: sr_check_worker pid 355744: LOG:  process started
pg_rewind: servers diverged at WAL location 0/84000060 on timeline 77
pg_rewind: error: could not open file "/var/lib/postgresql/16/main/pg_wal/0000004D0000000000000083": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/83000138
+ '[' 1 -eq 0 ']'
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'SELECT pg_drop_replication_slot('\''dlss_azanella_db_01_stanford_edu'\'');'
+ '[' 0 -ne 0 ']'
+ echo ERROR: follow_primary.sh: end: follow primary command failed
ERROR: follow_primary.sh: end: follow primary command failed
+ exit 1
2026-01-30 11:59:11.133: follow_child pid 355683: LOG:  === Follow primary command for node 0 ended ===
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 3 0.500000 down down standby unknown 0 none none 2026-01-30 11:59:10
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:59:10

Although follow_primary failed, the failover promotion succeeded:

failover.sh: end: new_main_node_id=1 on dlss-azanella-db-02.stanford.edu was successfully promoted to primary

02 is the primary, but there is no replication to 01, even though logs show follow_primary executed the checkpoint. At this point I can recover 01 as the standby and end up with a healthy cluster again:

root@dlss-azanella-db-01:~# pcp_recovery_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 12:07:04
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:59:10

And at this point, unless I perform a manual checkpoint, or wait until the next checkpoint_timeout has expired a switchover will do the same thing: pcp_promote_node –switchover will say it succeeded, but logs and pcp_node_info will show a promoted node without replication. If I perform a manual checkpoint or wait until the next checkpoint_timeout, a pcp_promote_node –switchover will say it succeeded, and logs will show follow_primary succeeded, and pcp_node_info will show a promoted node with streaming replication.

I guess pcp_promote_node does succeed, in the sense that I end up with a promoted node. But as the switchover does not succeed, I would have expected pcp_promote_node –switchover to error out when the logs show follow_primary does not succeed, and when pcp_node_info does not show a switchover. Unless that is the expected behavior, or I have something misconfigured?

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions