
Lesson 15: Recovering Failed Nodes

Learning Objectives

After this lesson, you will be able to:

  • Rejoin an old primary after failover
  • Use pg_rewind to resynchronize data
  • Rebuild a replica with pg_basebackup
  • Handle timeline divergence
  • Recover from split-brain scenarios
  • Automate recovery with Patroni

1. Node Recovery Overview

1.1. Recovery Scenarios

When does a node need to be recovered?

Scenario 1: Old primary after failover

TEXT
Before:
  node1 (primary) → FAILS
  node2 (replica) → promoted to primary

After:
  node1: Needs rejoin as replica
  node2: Current primary

Scenario 2: Replica disconnected

TEXT
Before:
  node3 (replica) → Network partition / Crash

After:
  node3: Needs to catch up with primary

Scenario 3: Hardware replacement

TEXT
Before:
  node2: Disk failure

After:
  node2: New disk, needs full rebuild

Scenario 4: Timeline divergence

TEXT
Before:
  node1 accepted writes AFTER losing leader lock

After:
  node1: Diverged timeline, conflicts with cluster

1.2. Recovery Methods

Method            When to use                       Time        Data loss
Auto-rejoin       Node was cleanly shut down        ~10 s       None
pg_rewind         Timeline divergence               ~1-5 min    None
pg_basebackup     Major corruption / full rebuild   ~30 min+    None
Manual recovery   Complex split-brain scenarios     Varies      Possible
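
A quick way to choose between these methods is to compare the failed node's local timeline with the cluster's. A minimal sketch, assuming the data directory and Patroni REST API port used throughout this lesson:

TEXT
# On the failed node: local timeline from the control file (works while PostgreSQL is stopped)
sudo -u postgres pg_controldata /var/lib/postgresql/18/data | grep "TimeLineID"

# Cluster timeline, from any healthy member's Patroni REST API
curl -s http://10.0.1.12:8008/patroni | jq '.timeline'

# Same timeline     → auto-rejoin should work
# Diverged timeline → pg_rewind (or pg_basebackup if rewind fails)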

2. Auto-Rejoin (Patroni Default)

2.1. How auto-rejoin works

When node comes back online:

TEXT
1. Patroni starts
2. Checks DCS for cluster state
3. Finds current leader (e.g., node2)
4. Compares local timeline with cluster timeline
5. If compatible → auto-rejoin as replica
6. If diverged → need pg_rewind or reinit
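
To see the same cluster state Patroni consults in step 2, you can read the keys directly from the DCS. A sketch, assuming etcd v3 and the default /service/postgres scope used in this series:

TEXT
# Leader key and cluster metadata stored in etcd
etcdctl get --prefix /service/postgres/ --keys-only

# Name of the current leader
etcdctl get /service/postgres/leader --print-value-only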

2.2. Example: Clean rejoin

Setup:

TEXT
# Current cluster state
patronictl list postgres

# + Cluster: postgres ----+----+-----------+
# | Member | Host        | Role    | State   | TL | Lag in MB |
# +--------+-------------+---------+---------+----+-----------+
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |
# | node3  | 10.0.1.13   | Replica | running |  2 |         0 |
# +--------+-------------+---------+---------+----+-----------+

Simulate node3 failure:

TEXT
# On node3: Stop Patroni cleanly
sudo systemctl stop patroni

# Cluster now:
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |
# | node3  | 10.0.1.13   | -       | stopped |  - |           | ← Down

Recovery:

TEXT
# On node3: Start Patroni
sudo systemctl start patroni

# Watch logs
sudo journalctl -u patroni -f

Log output:

TEXT
2024-11-25 10:00:00 INFO: Starting Patroni...
2024-11-25 10:00:01 INFO: Connected to DCS (etcd)
2024-11-25 10:00:02 INFO: Cluster timeline: 2, local timeline: 2 ✅
2024-11-25 10:00:03 INFO: Current leader: node1
2024-11-25 10:00:04 INFO: Rejoining as replica
2024-11-25 10:00:05 INFO: Starting PostgreSQL in recovery mode
2024-11-25 10:00:08 INFO: Replication started, streaming from node1
2024-11-25 10:00:10 INFO: Successfully rejoined cluster ✅

Verify:

TEXT
patronictl list postgres

# + Cluster: postgres ----+----+-----------+
# | Member | Host        | Role    | State   | TL | Lag in MB |
# +--------+-------------+---------+---------+----+-----------+
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |
# | node3  | 10.0.1.13   | Replica | running |  2 |         0 | ← Rejoined!
# +--------+-------------+---------+---------+----+-----------+

Time: ~10 seconds ✅

2.3. Configuration for auto-rejoin

TEXT
# In patroni.yml
postgresql:
  use_pg_rewind: true  # Enable automatic pg_rewind if needed
  remove_data_directory_on_rewind_failure: false  # Safety
  remove_data_directory_on_diverged_timelines: false  # Safety

# Patroni will attempt:
# 1. Auto-rejoin (if timelines match)
# 2. pg_rewind (if timeline diverged but recoverable)
# 3. Full reinit (if pg_rewind fails and auto-reinit enabled)
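
These keys live in the local patroni.yml on each node; after editing them, Patroni can pick up the change without a full restart. A sketch of one way to apply it (the patroni.yml path is an assumption):

TEXT
# Edit patroni.yml on the node (e.g., /etc/patroni/patroni.yml), then reload
patronictl reload postgres node3

# Or reload every member at once
patronictl reload postgres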

3. Using pg_rewind

3.1. What is pg_rewind?

pg_rewind = Tool to resync a PostgreSQL instance that diverged from the current timeline.

When needed:

TEXT
Scenario: Old primary received writes AFTER failover

Timeline:
  T+0: node1 (primary), node2 (replica)
  T+1: Network partition
  T+2: node2 promoted (timeline: 1 → 2)
  T+3: node1 still thinks it's primary, accepts writes (timeline: 1)
  T+4: Network restored
  T+5: Conflict! node1 timeline=1, cluster timeline=2

Solution: pg_rewind node1 to match node2's timeline

How it works:

TEXT
1. Find common ancestor (last shared WAL position)
2. Replay WAL from new primary
3. Overwrite conflicting blocks
4. Node rejoins as replica on new timeline
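
The common ancestor from step 1 is recorded in the timeline history file the new primary writes at promotion; inspecting it shows exactly where the new timeline branched off. For example, assuming the data directory used elsewhere in this lesson:

TEXT
# On the new primary: history file for timeline 2
sudo -u postgres cat /var/lib/postgresql/18/data/pg_wal/00000002.history

# Example content (parent timeline, branch LSN, reason):
# 1	0/3000000	no recovery target specified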

3.2. Prerequisites for pg_rewind

Requirements:

TEXT
# In patroni.yml → postgresql.parameters
wal_log_hints: 'on'  # Required! (unless the cluster was initialized with data checksums)

# Or use data checksums (set during initdb):
# initdb --data-checksums

# Also ensure:
max_wal_senders: 10  # For replication
wal_level: replica   # For replication

Why wal_log_hints?

TEXT
Without wal_log_hints:
  pg_rewind cannot determine which blocks changed
  → Cannot resync
  → Must use full rebuild (pg_basebackup)

With wal_log_hints:
  PostgreSQL tracks all block changes
  → pg_rewind can identify divergence
  → Fast resync ✅

Trade-off: ~1-2% write performance overhead
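
Before relying on pg_rewind during an incident, it is worth confirming that at least one of the two prerequisites is actually in place:

TEXT
# Check wal_log_hints on a running node
sudo -u postgres psql -c "SHOW wal_log_hints;"

# Check whether the cluster was initialized with data checksums
sudo -u postgres pg_controldata /var/lib/postgresql/18/data | grep checksum
# "Data page checksum version: 1" → enabled, 0 → disabled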

3.3. Manual pg_rewind

Scenario: node1 (old primary) needs resync after failover.

Step 1: Stop PostgreSQL on node1

TEXT
# On node1
sudo systemctl stop patroni
sudo systemctl stop postgresql

Step 2: Run pg_rewind

TEXT
# On node1: Rewind to match node2 (current primary)
sudo -u postgres pg_rewind \
  --target-pgdata=/var/lib/postgresql/18/data \
  --source-server="host=10.0.1.12 port=5432 user=replicator dbname=postgres" \
  --progress \
  --debug

# Output:
# connected to server
# servers diverged at WAL location 0/3000000 on timeline 1
# rewinding from last common checkpoint at 0/2000000 on timeline 1
# reading source file list
# reading target file list
# reading WAL in target
# need to copy 124 MB (total source directory size is 2048 MB)
# creating backup label and updating control file
# syncing target data directory
# Done!

Step 3: Create standby.signal

TEXT
# On node1: Mark as standby
sudo -u postgres touch /var/lib/postgresql/18/data/standby.signal

Step 4: Update primary_conninfo

TEXT
# On node1: Point to the new primary (node2)
# (Patroni normally manages this itself once it starts; shown here for the manual flow)
sudo -u postgres tee /var/lib/postgresql/18/data/postgresql.auto.conf <<EOF
primary_conninfo = 'host=10.0.1.12 port=5432 user=replicator password=replica_password'
EOF

Step 5: Start PostgreSQL

TEXT
# On node1
sudo systemctl start patroni

# Patroni will start PostgreSQL in recovery mode

Step 6: Verify

TEXT
patronictl list postgres

# node1 should now be a Replica following node2 ✅

Time: ~1-5 minutes (depends on divergence size)

3.4. Automatic pg_rewind (Patroni)

Enable in patroni.yml:

TEXT
# Patroni will automatically run pg_rewind if needed
postgresql:
  use_pg_rewind: true
  
  parameters:
    wal_log_hints: 'on'  # Required!

Behavior:

TEXT
When node rejoins after failover:
  1. Patroni detects timeline divergence
  2. Automatically runs pg_rewind
  3. Restarts PostgreSQL as replica
  4. Node rejoins cluster

No manual intervention needed! ✅

Example log:

TEXT
2024-11-25 10:05:00 INFO: Local timeline 1, cluster timeline 2
2024-11-25 10:05:01 WARNING: Timeline divergence detected
2024-11-25 10:05:02 INFO: use_pg_rewind enabled, attempting rewind...
2024-11-25 10:05:03 INFO: Running pg_rewind...
2024-11-25 10:05:45 INFO: pg_rewind completed successfully
2024-11-25 10:05:46 INFO: Starting PostgreSQL as replica
2024-11-25 10:05:50 INFO: Rejoined cluster ✅

4. Full Rebuild with pg_basebackup

4.1. When to use pg_basebackup

Use cases:

  1. pg_rewind failed - Data too diverged
  2. Corruption detected - Data integrity issues
  3. Major version upgrade - Different PostgreSQL versions
  4. New node - Adding fresh replica to cluster
  5. Disk replaced - Empty data directory
  6. Paranoid safety - Want guaranteed clean state

Trade-off: Slower (~30min-2hrs for large DB) but guaranteed clean.
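
Because rebuild time scales with database size, it helps to check the total size on the current primary before committing to a full pg_basebackup:

TEXT
# On the current primary: total size of all databases
sudo -u postgres psql -c "
  SELECT pg_size_pretty(sum(pg_database_size(datname))) AS total_size
  FROM pg_database;
"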

4.2. Manual pg_basebackup

Step 1: Stop and clean node

TEXT
# On node to rebuild (e.g., node3)
sudo systemctl stop patroni
sudo systemctl stop postgresql

# Remove old data directory
sudo rm -rf /var/lib/postgresql/18/data/*

Step 2: Take base backup from primary

TEXT
# On node3: Backup from current primary (node2)
sudo -u postgres pg_basebackup \
  -h 10.0.1.12 \
  -p 5432 \
  -U replicator \
  -D /var/lib/postgresql/18/data \
  -Fp \
  -Xs \
  -P \
  -R

# Flags:
# -h: Host (primary)
# -U: Replication user
# -D: Target data directory
# -Fp: Plain format (not tar)
# -Xs: Stream WAL during backup
# -P: Show progress
# -R: Create standby.signal and replication config

Output:

TEXT
Password: [enter replicator password]
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000000 on timeline 2
pg_basebackup: starting background WAL receiver
24567/24567 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/4000168
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

Step 3: Verify configuration

TEXT
# On node3: Check standby.signal created
ls /var/lib/postgresql/18/data/standby.signal

# Check primary_conninfo
cat /var/lib/postgresql/18/data/postgresql.auto.conf | grep primary_conninfo

Step 4: Start node

TEXT
# On node3
sudo systemctl start patroni

# Node will rejoin as replica

Step 5: Verify

TEXT
patronictl list postgres

# node3 should be streaming from primary ✅

Time: ~30min-2hrs (depends on database size)
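
Besides patronictl list, you can confirm on the rebuilt replica itself that the WAL receiver is streaming from the expected primary:

TEXT
# On node3: WAL receiver status (expect status = 'streaming', sender_host = 10.0.1.12)
sudo -u postgres psql -c "
  SELECT status, sender_host, sender_port
  FROM pg_stat_wal_receiver;
"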

4.3. Patroni automatic reinit

Enable auto-reinit:

TEXT
# In patroni.yml
postgresql:
  use_pg_rewind: true
  
  # If pg_rewind fails, auto-reinit
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: true

# WARNING: Data directory will be DELETED and recreated
# Only enable if you trust automation!

Behavior:

TEXT
When node rejoins:
  1. Try auto-rejoin → FAILED (diverged)
  2. Try pg_rewind → FAILED (corruption)
  3. Automatically remove data directory
  4. Run pg_basebackup from current primary
  5. Rejoin as replica

Fully automated! But destructive! ⚠️

4.4. Patroni reinit command

Manual trigger:

TEXT
# Force reinit on node3
patronictl reinit postgres node3

# Patroni will:
# 1. Stop PostgreSQL on node3
# 2. Remove data directory
# 3. Run pg_basebackup from leader
# 4. Start as replica

# Prompt:
# Are you sure you want to reinitialize members node3? [y/N]: y

Monitor progress:

TEXT
# On node3: Watch logs
sudo journalctl -u patroni -f

# Expected:
# INFO: Removing data directory...
# INFO: Running pg_basebackup...
# INFO: Backup completed (24 GB in 15 minutes)
# INFO: Starting PostgreSQL...
# INFO: Rejoined cluster ✅

5. Timeline Divergence Resolution

5.1. Understanding timelines

Timeline = History branch counter

TEXT
Initial:
  Timeline 1 (all nodes)

After first failover:
  Old primary: Timeline 1
  New primary: Timeline 2 ← Incremented

After second failover:
  Timeline 3 ← Incremented again

Why timelines exist:

TEXT
Prevent data conflict:
  If two nodes both think they're primary,
  they write on different timelines.
  → Conflict detected
  → Manual intervention required

5.2. Detecting timeline divergence

Check local timeline:

TEXT
# On any node
sudo -u postgres psql -c "
  SELECT timeline_id 
  FROM pg_control_checkpoint();
"

# Example:
# timeline_id
# ------------
#           2

Check cluster timeline:

TEXT
# Via Patroni: the TL column shows each member's timeline
patronictl list postgres

# | Member | Host        | Role    | State   | TL | Lag in MB |
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |

# Or via the REST API
curl -s http://10.0.1.12:8008/patroni | jq '.timeline'
# Output: 2

Compare:

TEXT
# If node timeline ≠ cluster timeline
# → Node needs pg_rewind or reinit

5.3. Scenario: Timeline divergence after split-brain

Setup:

TEXT
T+0: 3-node cluster, node1 = primary (timeline 2)
T+1: Network partition splits node1 from node2/node3
T+2: node1 thinks it's still primary (timeline 2)
T+3: node2/node3 elect node2 as primary (timeline 3)
T+4: Both node1 and node2 accept writes!
  - node1: timeline 2, accepting writes ❌
  - node2: timeline 3, accepting writes ✅
  - Split-brain! ⚠️
T+5: Network restored
T+6: Conflict detected

Resolution:

TEXT
# Step 1: Verify which timeline is "correct"
patronictl list postgres

# + Cluster: postgres ----+----+-----------+
# | Member | Host        | Role    | State   | TL | Lag in MB |
# +--------+-------------+---------+---------+----+-----------+
# | node1  | 10.0.1.11   | -       | stopped |  2 |           | ← WRONG timeline
# | node2  | 10.0.1.12   | Leader  | running |  3 |           | ← CORRECT
# | node3  | 10.0.1.13   | Replica | running |  3 |         0 |
# +--------+-------------+---------+---------+----+-----------+

# Step 2: Save diverged data from node1 (if needed)
# (if PostgreSQL on node1 is stopped, start it standalone first, without Patroni)
sudo -u postgres pg_dumpall -h 10.0.1.11 > /backup/node1-diverged-data.sql

# Step 3: Rewind node1 to match timeline 3
# If pg_rewind works:
patronictl reinit postgres node1

# If pg_rewind fails (likely due to significant divergence):
# Manual pg_basebackup required
sudo systemctl stop patroni  # On node1
sudo rm -rf /var/lib/postgresql/18/data/*
sudo -u postgres pg_basebackup -h 10.0.1.12 -D /var/lib/postgresql/18/data -U replicator -R -P
sudo systemctl start patroni

# Step 4: Manually reconcile diverged data (if important)
# Review /backup/node1-diverged-data.sql
# Manually merge important transactions into node2

Prevention:

TEXT
# Configure Patroni to reduce the split-brain window
bootstrap:
  dcs:
    # Primary demotes itself once it can no longer renew the leader lock (TTL expiry)
    ttl: 30
    retry_timeout: 10
    # Let Patroni manage a synchronous standby (required for remote_apply to have effect)
    synchronous_mode: true
    postgresql:
      parameters:
        # Commits wait until the synchronous replica has applied them
        synchronous_commit: 'remote_apply'

6. Split-Brain Prevention and Recovery

6.1. How Patroni prevents split-brain

Mechanism: DCS Leader Lock

TEXT
Primary MUST hold leader lock in DCS:

If primary loses DCS connection:
  1. Cannot renew leader lock
  2. TTL expires (e.g., 30 seconds)
  3. Primary DEMOTES itself (becomes read-only)
  4. Replicas detect no leader
  5. Election begins

Key: Primary NEVER operates without DCS lock ✅

Code flow (pseudo):

TEXT
while True:
    if is_leader:
        if can_renew_leader_lock():
            # Still leader, continue
            accept_writes()
        else:
            # Lost DCS connection!
            log.error("Lost leader lock, DEMOTING!")
            demote_to_replica()
            reject_writes()
    
    sleep(loop_wait)
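
You can watch this loop from the outside by observing the leader key in etcd: while the primary is healthy it keeps renewing the key, and after a failure the key expires and reappears with the new leader's name. A sketch, assuming the default /service/postgres scope:

TEXT
# Current holder of the leader lock
etcdctl get /service/postgres/leader --print-value-only

# Watch the key change during failover and recovery
etcdctl watch /service/postgres/leader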

6.2. Fencing mechanisms

PostgreSQL-level fencing:

TEXT
-- When demoted, set read-only
ALTER SYSTEM SET default_transaction_read_only = 'on';
SELECT pg_reload_conf();

-- All new transactions will fail:
-- ERROR: cannot execute INSERT in a read-only transaction

OS-level fencing (advanced):

TEXT
# STONITH (Shoot The Other Node In The Head)
# Via callbacks in patroni.yml

callbacks:
  on_start: /var/lib/postgresql/callbacks/on_start.sh
  on_stop: /var/lib/postgresql/callbacks/on_stop.sh
  on_role_change: /var/lib/postgresql/callbacks/on_role_change.sh

# on_role_change.sh example:
#!/bin/bash
ROLE=$2  # Patroni passes: <action> <role> <cluster>; role is "master"/"primary" or "replica"

if [ "$ROLE" == "replica" ]; then
  # Lost leadership, ensure NO writes possible
  sudo iptables -A INPUT -p tcp --dport 5432 -j REJECT
  # Block incoming connections to PostgreSQL
fi

if [ "$ROLE" == "master" ]; then
  # Gained leadership, allow writes
  sudo iptables -D INPUT -p tcp --dport 5432 -j REJECT
fi
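
To sanity-check the script before trusting it in an incident, it can be invoked by hand with the argument order Patroni uses (action, role, cluster name); treat the exact role strings as version-dependent:

TEXT
sudo chmod +x /var/lib/postgresql/callbacks/on_role_change.sh

# Simulate losing leadership
sudo -u postgres /var/lib/postgresql/callbacks/on_role_change.sh on_role_change replica postgres

# Confirm connections to port 5432 are now rejected
sudo iptables -L INPUT -n | grep 5432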

6.3. Scenario: Recover from split-brain

Detection:

TEXT
# Symptoms:
# - Multiple nodes claim to be primary
# - Patroni shows errors
# - Applications seeing inconsistent data

# Check cluster state
patronictl list postgres

# If you see multiple "Leader" or conflicts:
# SPLIT-BRAIN DETECTED! ⚠️

Recovery steps:

TEXT
# Step 1: STOP ALL NODES immediately
for node in node1 node2 node3; do
  ssh $node "sudo systemctl stop patroni"
done

# Step 2: Determine "source of truth"
# Usually: Node with most recent data / highest timeline
for node in node1 node2 node3; do
  echo "=== $node ==="
  ssh $node "sudo -u postgres psql -c \"
    SELECT timeline_id, pg_last_wal_receive_lsn()
    FROM pg_control_checkpoint();
  \""
done

# Step 3: Choose winner (e.g., node2 has highest timeline)
WINNER="node2"

# Step 4: Back up diverged data from the losing nodes
# (start PostgreSQL standalone with pg_ctl first if it is not running)
ssh node1 "sudo -u postgres pg_dumpall > /backup/node1-diverged.sql"
ssh node3 "sudo -u postgres pg_dumpall > /backup/node3-diverged.sql"

# Step 5: Wipe the losers' data directories
for node in node1 node3; do
  ssh $node "sudo rm -rf /var/lib/postgresql/18/data/*"
done

# Step 6: Clear DCS state (fresh start)
etcdctl del --prefix /service/postgres/

# Step 7: Start the winner first (pg_basebackup needs a running primary)
ssh $WINNER "sudo systemctl start patroni"

# Wait for the winner to become leader
sleep 10

# Step 8: Rebuild the losers from the winner and start them
for node in node1 node3; do
  ssh $node "sudo -u postgres pg_basebackup \
    -h $WINNER \
    -D /var/lib/postgresql/18/data \
    -U replicator -R -P"
  ssh $node "sudo systemctl start patroni"
done

# Step 9: Verify cluster
patronictl list postgres

# Should show:
# node2: Leader
# node1: Replica (following node2)
# node3: Replica (following node2)
# All same timeline ✅

# Step 10: Reconcile diverged data manually
# Review /backup/*-diverged.sql files
# Merge critical transactions if needed

7. Monitoring Node Recovery

7.1. Key metrics

TEXT
-- Replication status
SELECT application_name, 
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
       replay_lag,
       sync_state
FROM pg_stat_replication;

-- Timeline check
SELECT timeline_id FROM pg_control_checkpoint();

-- Recovery status (on replica)
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_lag_bytes;

7.2. Patroni REST API monitoring

TEXT
# Check node status
curl -s http://10.0.1.11:8008/patroni | jq

# Key fields:
# {
#   "state": "running",
#   "role": "replica",
#   "timeline": 3,
#   "replication": [
#     {
#       "usename": "replicator",
#       "application_name": "node1",
#       "state": "streaming",
#       "sync_state": "async",
#       "replay_lsn": "0/5000000"
#     }
#   ]
# }
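
Besides /patroni, the REST API exposes role-specific health endpoints that return HTTP 200 only when the node is in the corresponding state, which is handy for load balancers and post-recovery checks:

TEXT
# 200 if this node is the leader, 503 otherwise
curl -s -o /dev/null -w "%{http_code}\n" http://10.0.1.11:8008/primary

# 200 if this node is a healthy replica
curl -s -o /dev/null -w "%{http_code}\n" http://10.0.1.11:8008/replica

# 200 if PostgreSQL is running, regardless of role
curl -s -o /dev/null -w "%{http_code}\n" http://10.0.1.11:8008/health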

7.3. Alerting on recovery issues

TEXT
# Prometheus alert
groups:
  - name: node_recovery
    rules:
      - alert: PatroniNodeDown
        expr: up{job="patroni"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Patroni node {{ $labels.instance }} is down"
      
      - alert: PatroniTimelineMismatch
        # Fires when members of one cluster report more than one distinct timeline
        expr: |
          count by (cluster) (
            count by (cluster, timeline) (patroni_timeline)
          ) > 1
        labels:
          severity: critical
        annotations:
          summary: "Timeline mismatch detected - possible split-brain"
      
      - alert: PatroniReplicationLagHigh
        expr: patroni_replication_lag_bytes > 104857600  # 100MB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag > 100MB on {{ $labels.instance }}"

8. Best Practices

✅ DO

  1. Enable wal_log_hints - Required for pg_rewind
  2. Test recovery regularly - Monthly drills
  3. Monitor timelines - Alert on divergence
  4. Have backups - Before risky operations
  5. Document procedures - Recovery runbooks
  6. Use Patroni auto-recovery - Less manual intervention
  7. Verify after recovery - Test replication, queries
  8. Keep DCS healthy - etcd cluster critical
  9. Log everything - Audit trail for incidents
  10. Practice split-brain recovery - Hope never needed, but be ready

❌ DON'T

  1. Don't skip wal_log_hints - pg_rewind will fail
  2. Don't assume auto-recovery works - Test it!
  3. Don't ignore timeline mismatches - Critical issue
  4. Don't manually promote during recovery - Let Patroni handle
  5. Don't delete data without backup - Diverged data may be important
  6. Don't run split-brain clusters - Fix immediately
  7. Don't forget callbacks - Fencing prevents split-brain
  8. Don't over-automate reinit - Risk data loss

9. Lab Exercises

Lab 1: Auto-rejoin after clean shutdown

Tasks:

  1. Stop one replica: sudo systemctl stop patroni
  2. Make changes on primary
  3. Start replica: sudo systemctl start patroni
  4. Verify auto-rejoin and lag catch-up
  5. Time the recovery

Lab 2: pg_rewind after simulated failover

Tasks:

  1. Record current primary
  2. Manually stop primary: sudo systemctl stop patroni
  3. Wait for failover to complete
  4. Start old primary (should auto-rewind)
  5. Verify old primary rejoined as replica
  6. Check timeline increment

Lab 3: Full rebuild with pg_basebackup

Tasks:

  1. Stop a replica
  2. Delete data directory: sudo rm -rf /var/lib/postgresql/18/data/*
  3. Manually run pg_basebackup from primary
  4. Start replica
  5. Verify replication restored
  6. Measure rebuild time

Lab 4: Patroni reinit command

Tasks:

  1. Use patronictl reinit postgres node3
  2. Monitor logs during process
  3. Verify automated rebuild
  4. Compare time vs manual pg_basebackup

Lab 5: Timeline divergence simulation

Tasks:

  1. Create a network partition with iptables (see the sketch after this list)
  2. Wait for failover
  3. Manually promote old primary (force split-brain)
  4. Write different data to both "primaries"
  5. Restore network
  6. Observe conflict detection
  7. Practice recovery procedure
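
A minimal way to create the partition for task 1, assuming the node IPs used throughout this lesson (remember to remove the rules in task 5):

TEXT
# On node1: drop all traffic to and from node2 and node3
sudo iptables -A INPUT  -s 10.0.1.12 -j DROP
sudo iptables -A INPUT  -s 10.0.1.13 -j DROP
sudo iptables -A OUTPUT -d 10.0.1.12 -j DROP
sudo iptables -A OUTPUT -d 10.0.1.13 -j DROP

# Restore the network (task 5)
sudo iptables -D INPUT  -s 10.0.1.12 -j DROP
sudo iptables -D INPUT  -s 10.0.1.13 -j DROP
sudo iptables -D OUTPUT -d 10.0.1.12 -j DROP
sudo iptables -D OUTPUT -d 10.0.1.13 -j DROP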

10. Troubleshooting

Issue: pg_rewind fails

Error: pg_rewind: fatal: could not find common ancestor

Cause: wal_log_hints not enabled or data too diverged.

Solution:

TEXT
# Check wal_log_hints
sudo -u postgres psql -c "SHOW wal_log_hints;"

# If off, enable it and restart through Patroni (which manages the PostgreSQL service):
sudo -u postgres psql -c "ALTER SYSTEM SET wal_log_hints = on;"
patronictl restart postgres node1

# If still fails, use pg_basebackup instead
patronictl reinit postgres node1

Issue: Replica stuck in recovery

Symptoms: Replica shows "running" but high lag.

Diagnosis:

TEXT
# Check replication status
sudo -u postgres psql -h 10.0.1.11 -c "
  SELECT * FROM pg_stat_replication;
"

# Check replica logs
sudo journalctl -u postgresql -n 100

Common causes:

  • WAL receiver crashed
  • Network issues
  • Disk full on replica
  • Archive restore errors
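
A few quick checks that cover these causes, assuming the hosts used in this lesson:

TEXT
# Disk space on the replica
df -h /var/lib/postgresql

# WAL receiver state on the replica (no rows → receiver not running)
sudo -u postgres psql -c "SELECT status, sender_host FROM pg_stat_wal_receiver;"

# Can the replica still reach the primary?
pg_isready -h 10.0.1.12 -p 5432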

Solution:

TEXT
# Restart replication
sudo systemctl restart patroni

# If persists, reinit
patronictl reinit postgres node3

Issue: Cannot connect after recovery

Error: FATAL: the database system is starting up

Cause: PostgreSQL still replaying WAL.

Solution: Wait for recovery to complete, or check logs for errors.

TEXT
# Check recovery progress
sudo -u postgres psql -h 10.0.1.13 -c "
  SELECT pg_is_in_recovery(),
         pg_last_wal_receive_lsn(),
         pg_last_wal_replay_lsn();
"

11. Summary

Recovery Methods Summary

Method            Speed     Data Loss   Use Case
Auto-rejoin       Fastest   None        Clean shutdown / restart
pg_rewind         Fast      None        Timeline divergence
pg_basebackup     Slow      None        Corruption, major divergence
Manual recovery   Varies    Possible    Split-brain, complex issues

Key Concepts

✅ Auto-rejoin - Patroni handles clean recovery automatically

✅ pg_rewind - Resync after timeline divergence (requires wal_log_hints)

✅ pg_basebackup - Full rebuild from primary (slow but safe)

✅ Timeline - History branch, increments on failover

✅ Split-brain - Multiple primaries (prevented by DCS leader lock)

Recovery Checklist

  •  Node failure detected
  •  Determine recovery method needed
  •  Backup diverged data (if any)
  •  Execute recovery (auto or manual)
  •  Verify timeline matches cluster
  •  Verify replication streaming
  •  Test read/write operations
  •  Check replication lag
  •  Update monitoring/documentation

Next Steps

Lesson 16 will cover Backup and Point-in-Time Recovery:

  • pg_basebackup strategies
  • WAL archiving configuration
  • Point-in-Time Recovery (PITR) procedures
  • Backup automation and scheduling
  • Disaster recovery planning
