
Lesson 1: Overview of PostgreSQL High Availability


Learning Objectives

After this lesson, you will:

  • Understand why High Availability (HA) is important for database systems
  • Master different HA deployment methods for PostgreSQL
  • Compare the advantages and disadvantages of Patroni, Repmgr and Pacemaker
  • Understand the overall architecture of PostgreSQL HA systems

1. Why do we need High Availability?

1.1. Problems with Single Point of Failure (SPOF)

In a traditional single-server database system, the database server itself is a Single Point of Failure (SPOF).

Consequences when database server fails:

  • Downtime: Applications cannot access data
  • Revenue loss: Each minute of downtime can result in significant revenue loss
  • Loss of reputation: Users cannot use the service
  • Data loss: If there is no timely backup

1.2. Common causes of downtime

| Cause | Percentage | Impact |
|---|---|---|
| Hardware failures (disk, RAM, CPU) | 30% | High |
| Network failures | 20% | Medium |
| Software bugs | 25% | High |
| Planned maintenance | 15% | Controllable |
| Human error | 10% | High |

1.3. What is High Availability?

High Availability (HA) is the ability of a system to maintain continuous operation even when one or more components fail.

HA measurement metrics:

| Availability | Downtime/year | Downtime/month | Level |
|---|---|---|---|
| 99% (2 nines) | 3.65 days | 7.2 hours | Low |
| 99.9% (3 nines) | 8.76 hours | 43.2 minutes | Medium |
| 99.99% (4 nines) | 52.56 minutes | 4.32 minutes | High |
| 99.999% (5 nines) | 5.26 minutes | 25.9 seconds | Very High |
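
These downtime figures follow directly from the availability percentage; for example:

TEXT
downtime_per_year = (1 − availability) × 365 × 24 × 60 minutes

99.9%  → 0.001  × 525,600 minutes = 525.6 minutes ≈ 8.76 hours
99.99% → 0.0001 × 525,600 minutes ≈ 52.6 minutes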

1.4. Benefits of HA

Business Benefits:

  • Minimize downtime and revenue loss
  • Increase system reliability
  • Improve user experience
  • Meet SLA (Service Level Agreement)

Technical Benefits:

  • Automatic failover when primary server fails
  • Zero-downtime maintenance
  • Load balancing for read queries
  • Disaster recovery
  • Data protection

2. PostgreSQL HA Methods

2.1. Log-Shipping (WAL Shipping)


How it works:

  • Primary server writes WAL (Write-Ahead Log) files
  • WAL files are copied to standby server
  • Standby server replays WAL to synchronize data

Advantages:

  • Simple, easy setup
  • Low resource consumption

Disadvantages:

  • High Recovery Time Objective (RTO): minutes to hours
  • No automatic failover
  • Data loss may occur
  • Standby cannot serve queries (warm standby)
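
For reference, the mechanism above maps to only a few settings. A minimal sketch, assuming a shared archive directory at /mnt/wal_archive and PostgreSQL 12+ (where the standby is marked by an empty standby.signal file):

TEXT
# Primary: postgresql.conf
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'    # copy each completed WAL segment to the archive

# Standby: postgresql.conf (plus an empty standby.signal file in the data directory)
restore_command = 'cp /mnt/wal_archive/%f %p'    # replay WAL segments from the archive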

2.2. Streaming Replication

How it works:

  • Primary streams WAL records in real-time to standby
  • Standby applies changes immediately
  • Standby can serve read queries (hot standby)


Advantages:

  • Low latency (< 1 second)
  • Hot standby can serve read queries
  • Synchronous mode avoids data loss on failover (at the cost of write latency)

Disadvantages:

  • Still requires manual failover
  • Needs external tools for automation
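
A minimal sketch of a manual streaming-replication setup (hostnames and the replication role are placeholders; Lesson 2 walks through this in detail):

TEXT
# On the standby: clone the primary; -R writes primary_conninfo and creates standby.signal
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data -X stream -R

# Resulting setting in postgresql.auto.conf on the standby
primary_conninfo = 'host=primary-host user=replicator'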

2.3. Logical Replication

How it works:

  • Replicate at logical level (tables, rows)
  • Allows selective data replication
  • Publisher → Subscriber model

Advantages:

  • Replication between different PostgreSQL versions
  • Selective replication (only certain tables)
  • Multi-master possible (with BDR)

Disadvantages:

  • Higher overhead than physical replication
  • Not the main HA solution (usually used for data distribution)
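
A minimal publisher/subscriber sketch (database, table, and role names are placeholders; the publisher needs wal_level = logical):

TEXT
-- On the publisher
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (the table must already exist with a matching schema)
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=publisher-host dbname=shop user=replicator'
  PUBLICATION orders_pub;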

2.4. Shared Storage (SAN)

How it works:

  • Two or more nodes attach to the same SAN volume
  • Only one node runs PostgreSQL at a time
  • On failure, another node mounts the shared storage and starts PostgreSQL

Advantages:

  • Fast failover (just start PostgreSQL)
  • No data loss

Disadvantages:

  • Expensive (needs SAN infrastructure)
  • SAN becomes single point of failure
  • Complex to maintain

3. Comparison: Patroni vs Repmgr vs Pacemaker

3.1. Patroni

Characteristics:

  • Python-based
  • Uses DCS (etcd, Consul, ZooKeeper) to store cluster state
  • REST API for management
  • Intelligent automatic failover
  • Template-based configuration

Advantages:

  • ✅ Easy to install and configure
  • ✅ Powerful REST API
  • ✅ Good Kubernetes integration
  • ✅ Active development, large community
  • ✅ Automatic leader election
  • ✅ Rolling restart, zero-downtime updates

Disadvantages:

  • ❌ Depends on DCS (adds component)
  • ❌ Need to learn DCS (etcd/Consul)

Suitable use cases:

  • Cloud-native applications
  • Kubernetes deployments
  • Microservices architecture
  • High automation needs
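
A quick sketch of the management interfaces (the node address is a placeholder; Patroni's REST API listens on port 8008 by default):

TEXT
# Cluster overview via patronictl
patronictl -c /etc/patroni.yml list

# The same information via the REST API
curl -s http://10.0.0.11:8008/cluster

# Health endpoints used by load balancers:
# /primary returns 200 only on the leader, /replica only on healthy replicas
curl -s -o /dev/null -w "%{http_code}\n" http://10.0.0.11:8008/primary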

3.2. Repmgr

Characteristics:

  • Open-source tool from 2ndQuadrant (now part of EnterpriseDB)
  • Standalone tool, no DCS required
  • Witness node for quorum voting
  • Command-line based management

Advantages:

  • ✅ No external DCS needed
  • ✅ Simpler than Patroni
  • ✅ Good documentation
  • ✅ Mature and stable

Disadvantages:

  • ❌ Fewer automation features than Patroni
  • ❌ No REST API
  • ❌ Smaller community
  • ❌ Automatic failover (via the repmgrd daemon) requires more manual configuration

Suitable use cases:

  • Traditional infrastructure
  • Simple, few nodes
  • Don't want to add DCS
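
A sketch of typical repmgr operations (the config path, hostnames, and replication role are placeholders):

TEXT
# Register the primary
repmgr -f /etc/repmgr.conf primary register

# Clone a standby from the primary, then register it
repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
repmgr -f /etc/repmgr.conf standby register

# Show cluster status; automatic failover requires the repmgrd daemon on each node
repmgr -f /etc/repmgr.conf cluster show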

3.3. Pacemaker + Corosync

Characteristics:

  • High Availability cluster framework (Linux-HA)
  • Manages multiple types of resources, not just PostgreSQL
  • Voting quorum mechanism
  • Fencing/STONITH to avoid split-brain

Advantages:

  • ✅ Mature, production-proven (20+ years)
  • ✅ Manages multiple services (PostgreSQL, web server, etc.)
  • ✅ Strong fencing mechanism
  • ✅ Supports shared storage

Disadvantages:

  • ❌ Very complex to setup and maintain
  • ❌ High learning curve
  • ❌ XML configuration difficult to read
  • ❌ Difficult debugging

Suitable use cases:

  • Enterprise environment
  • Need to manage multiple services
  • Has shared storage (SAN)
  • Team has Pacemaker experience

3.4. Overall comparison table

| Criteria | Patroni | Repmgr | Pacemaker |
|---|---|---|---|
| Complexity | Medium | Low | High |
| Learning curve | Medium | Low | Very High |
| Setup time | Fast | Fast | Slow |
| Automatic failover | ✅ Excellent | ✅ Good | ✅ Excellent |
| REST API | ✅ Yes | ❌ No | ❌ No |
| Kubernetes support | ✅ Excellent | ⚠️ Limited | ❌ No |
| Community | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Documentation | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Dependencies | DCS (etcd/Consul) | None | None |
| Best for | Modern/Cloud | Simple setups | Enterprise/Complex |

3.5. Recommendations

Choose Patroni if:

  • Deploying on cloud or Kubernetes
  • Need automation and REST API
  • Team has experience with modern DevOps tools
  • This is the most popular choice today

Choose Repmgr if:

  • Simple setup, few nodes (2-3)
  • Don't want DCS dependency
  • Team familiar with traditional PostgreSQL tools

Choose Pacemaker if:

  • Complex enterprise environment
  • Already has Pacemaker infrastructure
  • Need to manage multiple services at once
  • Has shared storage (SAN)

4. Overall System Architecture: Patroni + etcd

4.1. 3-node cluster architecture

The cluster consists of three nodes, each running PostgreSQL, Patroni, and an etcd member, with HAProxy in front routing application traffic: writes to the Leader, reads to the Replicas.

4.2. Main components

PostgreSQL

  • Main database engine
  • One node is Leader (read/write)
  • Other nodes are Replicas (read-only)
  • Uses Streaming Replication to synchronize

Patroni

  • Manages PostgreSQL lifecycle
  • Monitors node health
  • Performs automatic failover
  • Exposes a REST API to query cluster state
  • Reads/writes configuration to DCS

etcd (DCS - Distributed Configuration Store)

  • Stores cluster state and configuration
  • Leader election (decides which node is Leader)
  • Distributed lock mechanism
  • 3 etcd nodes form a quorum (majority voting)

HAProxy (Load Balancer)

  • Routes write traffic → Leader
  • Routes read traffic → Replicas (round-robin)
  • Health checks and automatic re-routing when a failover occurs
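
A minimal haproxy.cfg sketch for this routing, assuming Patroni's REST API on port 8008 and placeholder node addresses; the health checks use the /primary and /replica endpoints:

TEXT
listen postgres_write
    bind *:5000
    mode tcp
    option httpchk GET /primary              # only the leader answers 200
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server node1 10.0.0.11:5432 check port 8008
    server node2 10.0.0.12:5432 check port 8008
    server node3 10.0.0.13:5432 check port 8008

listen postgres_read
    bind *:5001
    mode tcp
    balance roundrobin
    option httpchk GET /replica              # only healthy replicas answer 200
    http-check expect status 200
    server node1 10.0.0.11:5432 check port 8008
    server node2 10.0.0.12:5432 check port 8008
    server node3 10.0.0.13:5432 check port 8008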

4.3. Operation flow

1. Normal Operations (Normal state)

TEXT
1. Application sends query → HAProxy
2. HAProxy checks health check
3. Route write → Leader, read → Replicas
4. Patroni on each node:
   - Send heartbeat to etcd every 10s
   - Update health status
   - Maintain leader lease

2. Leader Failure Detection

TEXT
1. Node 1 (Leader) fails → heartbeats stop
2. etcd detects: leader lease expired (30s)
3. Patroni on Node 2 and Node 3 notice the expired leader key
4. Leader election is triggered

3. Automatic Failover Process

TEXT
Timeline: 0s  ──────────► 30s ──────► 45s ──────► 60s
          │              │           │            │
      Leader dies    etcd detects  New leader  Applications
                     lease expire   elected     reconnect
                                   (Node 2)
                                   
Node 1:   LEADER ──────► DOWN ──────────────────► STANDBY (after recovery)
Node 2:   REPLICA ─────────────────► LEADER ────► LEADER
Node 3:   REPLICA ──────────────────────────────► REPLICA

4. After Failover

TEXT
- Node 2 becomes new Leader
- Node 3 remains Replica, changes replication source to Node 2
- HAProxy automatically detects and routes traffic to Node 2
- Node 1 (when recovered) will rejoin as Replica

4.4. Important scenarios

Scenario 1: Planned Switchover

TEXT
# Admin wants maintenance on Node 1 (Leader)
$ patronictl switchover postgres-cluster

# Patroni will:
1. Pause writing to current Leader
2. Wait for Replica to sync completely (zero lag)
3. Promote Replica → Leader
4. Demote old Leader → Replica
5. Zero data loss, downtime < 5s

Scenario 2: Split-brain Prevention

TEXT
Situation: Network partition between nodes

etcd quorum (3 nodes):
- Partition A: Node 1, Node 2 (2 nodes = majority)
- Partition B: Node 3 (1 node = minority)

Result:
✅ Partition A: Continues operation, can elect leader
❌ Partition B: Cannot elect leader (insufficient quorum)

→ Prevents 2 leaders existing simultaneously!

Scenario 3: Node Recovery

TEXT
Node 1 recovers after dying:

1. Patroni starts and reads cluster state from etcd
2. Realizes Node 2 is currently Leader
3. Automatically rejoins as Replica
4. Uses pg_rewind to sync data if there's divergence
5. Begins streaming replication from Node 2
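
Patroni performs this rewind itself when use_pg_rewind is enabled in its postgresql section; done by hand, the equivalent step looks roughly like this (paths and connection details are placeholders):

TEXT
# On the recovered Node 1, with its PostgreSQL instance stopped
pg_rewind \
  --target-pgdata=/var/lib/postgresql/data \
  --source-server='host=node2 user=postgres dbname=postgres'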

4.5. Configuration timeline (important parameters)

TEXT
# patroni.yml
bootstrap:
  dcs:
    ttl: 30                    # Leader lease time (30s)
    loop_wait: 10              # Check interval (10s)
    retry_timeout: 10          # Retry time
    maximum_lag_on_failover: 1048576  # Max lag for failover candidate (1MB)

Explanation:

  • ttl: 30: Leader must renew lease every 30s, otherwise considered dead
  • loop_wait: 10: Patroni checks health every 10s
  • Failover trigger: the leader key must be renewed within ttl; with a 10s loop_wait, a failed Leader is typically detected in ~20-30s
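
For context, a trimmed patroni.yml sketch showing where these dcs settings sit relative to the other sections (addresses and paths are placeholders):

TEXT
scope: postgres-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008

etcd3:
  hosts: 10.0.0.11:2379,10.0.0.12:2379,10.0.0.13:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

postgresql:
  data_dir: /var/lib/postgresql/data
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  use_pg_rewind: true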

5. Summary

Key Takeaways

  1. High Availability is mandatory for production systems to reduce downtime and data loss
  2. Streaming Replication + Automatic Failover is the most popular HA method for PostgreSQL
  3. Patroni is the best choice for most modern use cases:
    • Easy setup and maintenance
    • Intelligent automatic failover
    • Powerful REST API
    • Good cloud/K8s integration
  4. 3-node architecture with Patroni + etcd provides:
    • Automatic failover (RTO < 30s)
    • Zero data loss with sync replication
    • Split-brain prevention
    • Scalability for read workloads

Homework

  1. Calculate downtime for your system with different availability levels (99%, 99.9%, 99.99%)
  2. Draw HA architecture for your specific use case (number of nodes, data centers, RTO/RPO requirements)
  3. Compare costs between using HA and accepting downtime for your business

Preparation for next lesson

Lesson 2 will dive deep into Streaming Replication - the foundation of PostgreSQL HA:

  • Detailed mechanism of WAL operation
  • Synchronous vs Asynchronous replication
  • Replication slots
  • Lab: Manual replication setup
