Lesson 24: Upgrade Strategies

After this lesson, you will be able to:

  • Plan and execute PostgreSQL major version upgrades.
  • Upgrade Patroni with zero downtime.
  • Use pg_upgrade for in-place upgrades.
  • Implement logical replication for upgrades.
  • Roll back failed upgrades safely.

1. Upgrade Planning

1.1. Pre-upgrade checklist

TEXT
☐ Review PostgreSQL release notes
☐ Check extension compatibility
☐ Test upgrade in staging environment
☐ Backup all data (full + WAL archive)
☐ Document current versions
☐ Schedule maintenance window
☐ Notify stakeholders
☐ Prepare rollback plan
☐ Verify disk space (need ~2x current data size; see the check below)
☐ Check for deprecated features in new version
☐ Update monitoring/alerting
☐ Prepare downtime communication
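
A quick way to sanity-check the disk-space item before committing to a window, assuming the current data directory lives at /var/lib/postgresql/17/data (adjust the path for your layout):

BASH
# Compare the current data size with free space on the same filesystem
DATA_DIR=/var/lib/postgresql/17/data
du -sh "$DATA_DIR"
df -h "$DATA_DIR"
# Rule of thumb: free space should be at least 2x the data size
# (pg_upgrade --link needs far less, but plan for the worst case)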

1.2. Version compatibility matrix

From → To  | Method              | Downtime | Risk
17 → 18    | pg_upgrade          | Minutes  | Low
15 → 18    | pg_upgrade          | Minutes  | Medium
12 → 18    | Logical replication | None     | Medium
9.6 → 18   | Dump/restore        | Hours    | High

1.3. Document current state

BASH
# PostgreSQL version
psql -c "SELECT version();"

# Installed extensions
psql -c "\dx"

# Database sizes
psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;"

# Patroni version
patronictl version

# etcd version
etcdctl version
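
To keep this snapshot with the upgrade runbook, one option is to capture it all into a dated file (an illustrative helper; adjust paths and connection options to your environment):

BASH
report=/var/tmp/pre-upgrade-state-$(date +%F).txt
{
  psql -c "SELECT version();"
  psql -c "\dx"
  psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;"
  patronictl version
  etcdctl version
} > "$report" 2>&1
echo "State captured in $report"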

2. PostgreSQL Minor Version Upgrade

2.1. Minor upgrade process (e.g., 18.0 → 18.1)

BASH
# Minor upgrades are easy - just update packages

# On each node (one at a time):

# 1. Update packages
sudo apt-get update
sudo apt-get install --only-upgrade postgresql-18

# 2. Restart Patroni (will restart PostgreSQL)
sudo systemctl restart patroni

# 3. Verify new version
psql -c "SELECT version();"

# Patroni handles failover automatically during restart

2.2. Rolling minor upgrade

BASH
# Upgrade replicas first
for node in node2 node3; do
  echo "Upgrading $node..."
  ssh $node "sudo apt-get update && sudo apt-get install -y --only-upgrade postgresql-18"
  ssh $node "sudo systemctl restart patroni"
  sleep 30  # Wait for replica to catch up
done

# Switchover to upgraded replica
patronictl switchover postgres-cluster --leader node1 --candidate node2

# Upgrade old leader (now replica)
ssh node1 "sudo apt-get update && sudo apt-get install -y --only-upgrade postgresql-18"
ssh node1 "sudo systemctl restart patroni"

3. PostgreSQL Major Version Upgrade with pg_upgrade

3.1. Architecture

TEXT
Before (PostgreSQL 17):
  node1 (17, Leader)
  node2 (17, Replica)
  node3 (17, Replica)

During upgrade:
  node1 (17, Leader) ← Still serving traffic
  node2 (18, NEW) ← Upgrading
  node3 (17, Replica)

After upgrade:
  node1 (18, Leader) ← Upgraded
  node2 (18, Replica)
  node3 (18, Replica)

3.2. Install new PostgreSQL version

BASH
# Install PostgreSQL 18 alongside 17
sudo apt-get install -y postgresql-18 postgresql-18-contrib

# Both versions now installed:
# /usr/lib/postgresql/17/
# /usr/lib/postgresql/18/

3.3. Prepare for upgrade (node2 - first replica)

BASH
# 1. Stop Patroni on node2
sudo systemctl stop patroni

# 2. Create new data directory for v18
sudo mkdir -p /var/lib/postgresql/18/data
sudo chown postgres:postgres /var/lib/postgresql/18/data

# 3. Initialize new cluster
sudo -u postgres /usr/lib/postgresql/18/bin/initdb \
  -D /var/lib/postgresql/18/data \
  --encoding=UTF8 \
  --data-checksums

# 4. Run pg_upgrade
sudo -u postgres /usr/lib/postgresql/18/bin/pg_upgrade \
  --old-datadir=/var/lib/postgresql/17/data \
  --new-datadir=/var/lib/postgresql/18/data \
  --old-bindir=/usr/lib/postgresql/17/bin \
  --new-bindir=/usr/lib/postgresql/18/bin \
  --check  # Dry run first!

# If check passes, run actual upgrade:
sudo -u postgres /usr/lib/postgresql/18/bin/pg_upgrade \
  --old-datadir=/var/lib/postgresql/17/data \
  --new-datadir=/var/lib/postgresql/18/data \
  --old-bindir=/usr/lib/postgresql/17/bin \
  --new-bindir=/usr/lib/postgresql/18/bin \
  --link  # Use hard links (faster)

# Expected output:
# Performing Consistency Checks
# -----------------------------
# ...
# Upgrade Complete
# ----------------
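
If the --check pass reports problems, recent pg_upgrade releases (PostgreSQL 15 and later) leave the details under pg_upgrade_output.d inside the new data directory; it is worth reviewing before the real run:

BASH
# Inspect any reports left behind by a failed --check run
sudo -u postgres ls -R /var/lib/postgresql/18/data/pg_upgrade_output.d/ 2>/dev/null \
  || echo "No pg_upgrade reports found"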

3.4. Update Patroni configuration for v18

YAML
# /etc/patroni/patroni.yml
postgresql:
  bin_dir: /usr/lib/postgresql/18/bin  # Changed from 17
  data_dir: /var/lib/postgresql/18/data  # Changed from 17
  # ... rest of config
BASH
# Start Patroni with new version
sudo systemctl start patroni

# Verify node2 is now running v18
psql -h node2 -U postgres -c "SELECT version();"

3.5. Upgrade remaining nodes

BASH
# Repeat process for node3
ssh node3 "sudo systemctl stop patroni"
ssh node3 "# ... same pg_upgrade steps ..."
ssh node3 "sudo systemctl start patroni"

# Finally, upgrade node1 (current leader)
# Switchover to node2 first
patronictl switchover postgres-cluster --leader node1 --candidate node2

# Now upgrade node1
ssh node1 "sudo systemctl stop patroni"
ssh node1 "# ... same pg_upgrade steps ..."
ssh node1 "sudo systemctl start patroni"

# All nodes now on v18!
patronictl list

3.6. Post-upgrade tasks

BASH
# Run any scripts generated by pg_upgrade (newer releases no longer emit
# analyze_new_cluster.sh and simply recommend vacuumdb, used below)
sudo -u postgres ./analyze_new_cluster.sh   # If generated
sudo -u postgres ./reindex_hash.sh          # Only when upgrading from < 10

# Update extensions
psql -c "ALTER EXTENSION pg_stat_statements UPDATE;"

# Vacuum analyze all databases
vacuumdb --all --analyze-in-stages

# Remove old cluster (after verifying everything works!)
# sudo -u postgres ./delete_old_cluster.sh
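
ALTER EXTENSION ... UPDATE has to run in every database that uses the extension. A small loop like the following handles all of them (a sketch, assuming local passwordless access as the postgres user):

BASH
for db in $(psql -Atc "SELECT datname FROM pg_database WHERE datallowconn"); do
  echo "Updating extensions in $db"
  psql -d "$db" -Atc \
    "SELECT 'ALTER EXTENSION ' || quote_ident(extname) || ' UPDATE;' FROM pg_extension" \
    | psql -d "$db"
done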

4. Zero-Downtime Upgrade with Logical Replication

4.1. Architecture

TEXT
Production (v17):
  node1 (v17, Leader) ← Serving traffic
    ↓ Logical replication
New cluster (v18):
  node4 (v18, Leader) ← Receiving changes
  node5 (v18, Replica)
  
After cutover:
  Application → node4 (v18) ← New primary

4.2. Setup new v18 cluster

BASH
# Install PostgreSQL 18 on new servers
# Setup Patroni cluster (node4, node5, node6)
# See previous lessons for installation

# Verify new cluster
patronictl -c /etc/patroni/patroni-v18.yml list

4.3. Create publication on v17 (source)

SQL
-- On node1 (v17 leader)
CREATE PUBLICATION pg17_to_pg18 FOR ALL TABLES;

-- Or specific tables:
-- CREATE PUBLICATION pg17_to_pg18 FOR TABLE users, orders, products;

-- Verify
SELECT * FROM pg_publication;

4.4. Create subscription on v18 (target)

SQL
-- On node4 (v18 leader)
CREATE SUBSCRIPTION pg18_from_pg17
CONNECTION 'host=node1 port=5432 dbname=myapp user=replicator password=rep_pass'
PUBLICATION pg17_to_pg18
WITH (copy_data = true, create_slot = true);

-- Monitor initial sync
SELECT * FROM pg_stat_subscription;

-- Wait for initial data copy to complete
-- subname     | pg18_from_pg17
-- received_lsn | 0/3000000
-- ...

4.5. Monitor replication lag

SQL
-- On v17 (source)
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%pg18%';

-- On v18 (target)
SELECT subname, 
       received_lsn, 
       latest_end_lsn,
       pg_size_pretty(pg_wal_lsn_diff(latest_end_lsn, received_lsn)) AS lag
FROM pg_stat_subscription;
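
A complementary check on the source compares the slot's confirmed position with the current WAL write position, which is a more direct measure of how far the subscriber is behind (a sketch):

BASH
psql -h node1 -U postgres -Atc "
  SELECT slot_name,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS behind
  FROM pg_replication_slots
  WHERE slot_type = 'logical';"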

4.6. Cutover procedure

BASH
# 1. Stop writes to v17 (put app in maintenance mode)
# Or set database to read-only:
psql -h node1 -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = on;"
psql -h node1 -U postgres -c "SELECT pg_reload_conf();"

# 2. Wait for replication to catch up
psql -h node4 -U postgres -c "SELECT pg_size_pretty(pg_wal_lsn_diff(latest_end_lsn, received_lsn)) FROM pg_stat_subscription;"
# Should be 0 bytes
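
# 2b. Copy sequence values - logical replication does not replicate sequences,
#     so bring the v18 sequences up to the v17 values before allowing writes
#     (a sketch; assumes the application database is named myapp)
psql -h node1 -U postgres -d myapp -Atc \
  "SELECT 'SELECT setval(' || quote_literal(schemaname || '.' || sequencename) || ', ' || last_value || ');'
   FROM pg_sequences WHERE last_value IS NOT NULL" \
  | psql -h node4 -U postgres -d myapp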

# 3. Disable subscription on v18
psql -h node4 -U postgres -c "ALTER SUBSCRIPTION pg18_from_pg17 DISABLE;"

# 4. Drop subscription (optional, after confirming everything works)
# psql -h node4 -U postgres -c "DROP SUBSCRIPTION pg18_from_pg17;"

# 5. Update application connection strings to point to node4

# 6. Enable writes on v18
psql -h node4 -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = off;"
psql -h node4 -U postgres -c "SELECT pg_reload_conf();"

# 7. Verify application works on v18

# 8. Keep v17 cluster running for rollback (1-2 weeks)

5. Patroni Version Upgrade

5.1. Check compatibility

BASH
# Check current Patroni version
patronictl version

# Check PostgreSQL compatibility
# Confirm that your Patroni release supports PostgreSQL 18 before upgrading
# See: https://github.com/zalando/patroni/releases

5.2. Upgrade Patroni (Python package)

BASH
# On each node:

# 1. Upgrade via pip
sudo pip3 install --upgrade patroni[etcd]

# Or specific version:
# sudo pip3 install patroni[etcd]==3.2.2

# 2. Verify new version
patronictl version

# 3. Restart Patroni service
sudo systemctl restart patroni

# No downtime - Patroni handles failover automatically

5.3. Rolling Patroni upgrade

BASH
# Upgrade replicas first
for node in node2 node3; do
  echo "Upgrading Patroni on $node..."
  ssh $node "sudo pip3 install --upgrade patroni[etcd]"
  ssh $node "sudo systemctl restart patroni"
  sleep 10
done

# Switchover to upgraded replica
patronictl switchover postgres-cluster --leader node1 --candidate node2

# Upgrade old leader
ssh node1 "sudo pip3 install --upgrade patroni[etcd]"
ssh node1 "sudo systemctl restart patroni"

6. etcd Upgrade

6.1. etcd minor upgrade

BASH
# On each etcd node:
sudo systemctl stop etcd
sudo apt-get update
sudo apt-get install --only-upgrade etcd
sudo systemctl start etcd

# Verify cluster health
etcdctl endpoint health

6.2. etcd major upgrade (e.g., 3.4 → 3.5)

BASH
# Follow official etcd upgrade guide
# https://etcd.io/docs/latest/upgrades/

# Key steps:
# 1. Backup etcd data
etcdctl snapshot save backup.db

# 2. Upgrade one member at a time
# 3. Verify cluster health after each upgrade
# 4. Update Patroni to use new etcd API version
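
# Example per-member loop (a sketch - assumes three members named etcd1..etcd3,
# managed by systemd, with the v3 API reachable on port 2379):
for member in etcd1 etcd2 etcd3; do
  ssh $member "sudo systemctl stop etcd"
  ssh $member "sudo apt-get update && sudo apt-get install -y --only-upgrade etcd"
  ssh $member "sudo systemctl start etcd"
  etcdctl --endpoints=http://$member:2379 endpoint health
  etcdctl --endpoints=http://etcd1:2379 endpoint status --cluster -w table
done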

7. Rollback Strategies

7.1. Rollback pg_upgrade

BASH
# Before deleting the old cluster, you can roll back.
# Caveat: if pg_upgrade ran with --link and the new (v18) cluster has already
# been started, the old data directory is no longer safe to start - restore
# from a backup instead.

# 1. Stop Patroni
sudo systemctl stop patroni

# 2. Restore the old configuration in /etc/patroni/patroni.yml:
#    postgresql:
#      bin_dir: /usr/lib/postgresql/17/bin   # Back to 17
#      data_dir: /var/lib/postgresql/17/data

# 3. Start Patroni
sudo systemctl start patroni

# Old cluster resumes operation

7.2. Rollback logical replication

BASH
# If cutover to v18 fails, rollback to v17

# 1. Stop application

# 2. Set v17 to read-write
psql -h node1 -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = off;"
psql -h node1 -U postgres -c "SELECT pg_reload_conf();"

# 3. Update application connection strings to v17

# 4. Resume normal operations

# Note: any writes accepted on v18 after the cutover will be LOST when you
# roll back to v17. Consider setting up reverse replication v18 → v17 if
# needed, as sketched below.
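
# Sketch of reverse replication (writes made on v18 flow back to v17);
# requires wal_level = logical on the v18 cluster and assumes the same
# replicator account exists there:

# On node4 (v18) - publish all tables
psql -h node4 -U postgres -d myapp -c "CREATE PUBLICATION pg18_to_pg17 FOR ALL TABLES;"

# On node1 (v17) - subscribe without the initial copy (data already matches at cutover)
psql -h node1 -U postgres -d myapp -c "CREATE SUBSCRIPTION pg17_from_pg18 \
  CONNECTION 'host=node4 port=5432 dbname=myapp user=replicator password=rep_pass' \
  PUBLICATION pg18_to_pg17 WITH (copy_data = false);"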

7.3. Rollback Patroni upgrade

BASH
# Downgrade Patroni if upgrade causes issues

sudo pip3 install patroni[etcd]==3.1.2  # Previous version
sudo systemctl restart patroni

8. Testing Upgrades

8.1. Staging environment test

BASH
# 1. Clone production to staging
pg_basebackup -h prod-leader -D /var/lib/postgresql/staging -X stream

# 2. Perform upgrade in staging
# ... follow upgrade procedures ...

# 3. Run application tests
# ... smoke tests, integration tests ...

# 4. Benchmark performance
pgbench -i -s 100 myapp
pgbench -c 10 -j 2 -t 1000 myapp

# 5. Document issues and timings

8.2. Upgrade rehearsal

BASH
# Practice upgrade multiple times
# Time each step
# Identify bottlenecks
# Refine procedures

# Example timing log:
# Step 1: Stop Patroni - 5 seconds
# Step 2: pg_upgrade --check - 30 seconds
# Step 3: pg_upgrade - 10 minutes
# Step 4: Start Patroni - 15 seconds
# Step 5: Replication catchup - 2 minutes
# Total: ~13 minutes
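
# One way to capture the timings automatically (illustrative log_step helper):
log_step() {
  local label="$1"; shift
  local start=$(date +%s)
  "$@"
  echo "$label: $(( $(date +%s) - start ))s" | tee -a /var/tmp/upgrade-timings.log
}

log_step "Stop Patroni" sudo systemctl stop patroni
log_step "Start Patroni" sudo systemctl start patroni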

9. Best Practices

✅ DO

  1. Test in staging first: Multiple times.
  2. Backup everything: Full backup + WAL archive.
  3. Use pg_upgrade --check: Catch issues early.
  4. Document procedures: Step-by-step runbook.
  5. Schedule maintenance window: Off-peak hours.
  6. Monitor closely: During and after upgrade.
  7. Keep old version: Don't delete for 1-2 weeks.
  8. Use logical replication: For zero-downtime.
  9. Upgrade extensions: After PostgreSQL upgrade.
  10. Vacuum analyze: After major upgrade.

❌ DON'T

  1. Don't skip backups: Critical safety net.
  2. Don't upgrade all at once: Use rolling upgrades.
  3. Don't delete old cluster immediately: Keep for rollback.
  4. Don't ignore release notes: Breaking changes.
  5. Don't skip testing: Staging is essential.
  6. Don't upgrade during peak hours: Plan maintenance window.
  7. Don't forget about extensions: May need updates.

10. Lab Exercises

Lab 1: Minor version upgrade

Tasks:

  1. Check current PostgreSQL version.
  2. Update packages on replica.
  3. Restart Patroni on replica.
  4. Switchover to upgraded replica.
  5. Upgrade old leader.

Lab 2: Major version upgrade with pg_upgrade

Tasks:

  1. Install PostgreSQL 18 on all nodes.
  2. Run pg_upgrade --check on replica.
  3. Perform pg_upgrade on replica.
  4. Update Patroni configuration.
  5. Complete rolling upgrade.

Lab 3: Zero-downtime upgrade with logical replication

Tasks:

  1. Setup new v18 cluster.
  2. Create publication on v17.
  3. Create subscription on v18.
  4. Monitor replication lag.
  5. Perform cutover.
  6. Verify application functionality.

Lab 4: Rollback procedure

Tasks:

  1. Simulate failed upgrade.
  2. Stop Patroni on all nodes.
  3. Restore old configuration.
  4. Restart old cluster.
  5. Verify rollback successful.

11. Summary

Upgrade Methods Comparison

Method              | Downtime | Complexity | Risk   | Use Case
pg_upgrade          | Minutes  | Low        | Low    | Standard major upgrades
Logical replication | None     | High       | Medium | Zero-downtime required
Dump/restore        | Hours    | Low        | Low    | Ancient versions
pg_upgrade --link   | Seconds  | Medium     | Medium | Same-server upgrade

Typical Timeline

TEXT
Week 1-2: Planning and preparation
  - Review release notes
  - Test in staging
  - Document procedures

Week 3: Dress rehearsal
  - Full upgrade test in staging
  - Time each step
  - Identify issues

Week 4: Production upgrade
  - Execute during maintenance window
  - Monitor closely
  - Be ready to rollback

Week 5-6: Stabilization
  - Monitor for issues
  - Performance tuning
  - Keep old cluster for safety

Week 7+: Cleanup
  - Delete old cluster
  - Update documentation
  - Post-mortem review

Upgrade Checklist

TEXT
Pre-upgrade:
☐ Full backup completed
☐ Staging test successful
☐ Release notes reviewed
☐ Maintenance window scheduled
☐ Rollback plan documented
☐ Stakeholders notified

During upgrade:
☐ Backups verified
☐ pg_upgrade --check passed
☐ Upgrade completed
☐ Replication working
☐ Application connectivity verified

Post-upgrade:
☐ Extensions updated
☐ Vacuum analyze completed
☐ Performance validated
☐ Monitoring updated
☐ Documentation updated

Next Steps

Lesson 25 will cover Real-world Case Studies:

  • Production architecture examples
  • Scaling to 1000+ queries/second
  • Cost optimization techniques
  • Lessons learned from failures
  • Industry-specific implementations
