
Automating PostgreSQL High Availability with Ansible, Patroni, and etcd

Deploying a PostgreSQL High Availability cluster shouldn't be a manual, error-prone process. In this guide, we'll leverage Ansible automation to deploy a robust HA cluster using Patroni, etcd, and PgBouncer.

This setup provides automatic failover, distributed configuration, and high-performance connection pooling, all optimized for production-grade environments.

Architecture Overview

Our cluster consists of three nodes, each running PostgreSQL, Patroni, and PgBouncer, with a distributed etcd cluster for consensus.

TEXT
┌────────────────────────────────────────────────────────────────┐
│                      Application Layer                         │
│  (Spring Boot / Python / Node.js / Java / .NET / etc.)         │
└──────────────┬──────────────┬──────────────┬───────────────────┘
               │              │              │
          Port 6432      Port 6432      Port 6432
               │              │              │
┌──────────────▼──────┐ ┌─────▼──────┐ ┌────▼───────────────────┐
│   PgBouncer (Node1) │ │ PgBouncer  │ │  PgBouncer (Node3)     │
│   Connection Pool   │ │   (Node2)  │ │  Connection Pool       │
│   Max: 1000 clients │ │            │ │  Max: 1000 clients     │
└──────────┬──────────┘ └─────┬──────┘ └────┬───────────────────┘
           │                  │              │
      Port 5432          Port 5432      Port 5432
           │                  │              │
┌──────────▼──────────┐ ┌─────▼──────┐ ┌────▼───────────────────┐
│  PostgreSQL 18.1    │ │ PostgreSQL │ │  PostgreSQL 18.1       │
│  Primary (Leader)   │ │  Replica   │ │  Replica               │
│  Read/Write         │ │  Read Only │ │  Read Only             │
└──────────┬──────────┘ └─────┬──────┘ └────┬───────────────────┘
           │                  │              │
      Port 8008          Port 8008      Port 8008
           │                  │              │
┌──────────▼──────────┐ ┌─────▼──────┐ ┌────▼───────────────────┐
│   Patroni 4.1.0     │ │  Patroni   │ │  Patroni 4.1.0         │
│   HA Manager        │ │ HA Manager │ │  HA Manager            │
└──────────┬──────────┘ └─────┬──────┘ └────┬───────────────────┘
           │                  │              │
      Port 2379          Port 2379      Port 2379
           │                  │              │
┌──────────▼──────────────────▼──────────────▼───────────────────┐
│                etcd 3.5.25 Cluster                             │
│          Distributed Configuration & Leader Election           │
│   Node1 (etcd1)  │  Node2 (etcd2)  │  Node3 (etcd3)            │
└────────────────────────────────────────────────────────────────┘

Key Features:

  • High Availability: Automatic failover (30-45s) managed by Patroni.
  • Connection Pooling: PgBouncer for efficient resource utilization.
  • Data Integrity: Optimized for SSDs with reliable streaming replication.
  • Production Ready: Full configuration externalized to a single .env file.

Step 1. Requirements & Preparation

Before running the playbooks, ensure your environment meets the following specifications.

Hardware (per node)

  • Minimum: 2 Cores, 4GB RAM, 20GB Disk.
  • Recommended: 4-8 Cores, 16-32GB RAM, 100GB+ NVMe SSD.

Network Configuration

The following ports must be accessible internally within the cluster:

Service       Port   Protocol   Purpose
-----------   ----   --------   -----------------------------------------
PgBouncer     6432   TCP        Connection pooling (primary access point)
PostgreSQL    5432   TCP        Direct DB connections (admin)
Patroni API   8008   TCP        Health checks & cluster management
etcd Client   2379   TCP        Patroni-to-etcd communication
etcd Peer     2380   TCP        etcd intra-cluster replication
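
To sanity-check connectivity before deployment, a small probe script can confirm that each required port answers from a peer node. This is a hypothetical helper, not part of the repository:

```python
import socket

# Ports required for intra-cluster communication (see table above).
REQUIRED_PORTS = {
    6432: "PgBouncer",
    5432: "PostgreSQL",
    8008: "Patroni API",
    2379: "etcd client",
    2380: "etcd peer",
}

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_node(host: str) -> dict:
    """Probe every required port on one node and report the results."""
    return {f"{name} ({port})": port_open(host, port)
            for port, name in REQUIRED_PORTS.items()}
```

Run `check_node("10.0.0.12")` from another cluster member; any `False` entry points at a firewall rule or a service that is not yet listening.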

Step 2. Project Configuration

All cluster settings are managed through a central .env file.

1. Clone the Automation Repository

BASH
git clone https://github.com/xdev-asia-labs/postgres-patroni-etcd-install.git
cd postgres-patroni-etcd-install

2. Configure Environment Variables

Copy the template and update with your node IPs and strong passwords.

BASH
cp .env.example .env
nano .env

Critical Variables to Set:

BASH
# Node IP Addresses
NODE1_IP=10.0.0.11
NODE2_IP=10.0.0.12
NODE3_IP=10.0.0.13

# PostgreSQL Passwords
POSTGRESQL_SUPERUSER_PASSWORD=your_strong_password
POSTGRESQL_REPLICATION_PASSWORD=your_strong_password
POSTGRESQL_ADMIN_PASSWORD=your_strong_password

# Performance tuning for 16GB RAM
POSTGRESQL_SHARED_BUFFERS=4GB
POSTGRESQL_EFFECTIVE_CACHE_SIZE=12GB
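
The 16GB values above follow a common heuristic: shared_buffers at roughly 25% of RAM and effective_cache_size at roughly 75%. A tiny helper makes the rule explicit (illustrative only; tune for your actual workload):

```python
def tuned_settings(ram_gb: int) -> dict:
    """Memory settings per the 25%/75% heuristic used in this guide."""
    return {
        "POSTGRESQL_SHARED_BUFFERS": f"{int(ram_gb * 0.25)}GB",
        "POSTGRESQL_EFFECTIVE_CACHE_SIZE": f"{int(ram_gb * 0.75)}GB",
    }

print(tuned_settings(16))
# {'POSTGRESQL_SHARED_BUFFERS': '4GB', 'POSTGRESQL_EFFECTIVE_CACHE_SIZE': '12GB'}
```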

3. Update Ansible Inventory

Edit inventory/hosts.yml to reflect your node infrastructure:

YAML
all:
  children:
    postgres:
      hosts:
        pg-node1:
          ansible_host: 10.0.0.11
          patroni_name: node1
          etcd_name: etcd1
        pg-node2:
          ansible_host: 10.0.0.12
          patroni_name: node2
          etcd_name: etcd2
        pg-node3:
          ansible_host: 10.0.0.13
          patroni_name: node3
          etcd_name: etcd3

Step 3. Deploying the Cluster

Load your environment variables and execute the main site playbook.

BASH
# Load environment variables
set -a && source .env && set +a

# Run the deployment
ansible-playbook playbooks/site.yml -i inventory/hosts.yml

You can also deploy specific components using tags: --tags postgresql, --tags etcd, --tags patroni, or --tags pgbouncer.

Step 4. Verification & Management

Once the deployment completes, use patronictl to verify the cluster state from any node.

BASH
ssh root@${NODE1_IP} "patronictl -c /etc/patroni/patroni.yml list"

Expected Cluster Output:

+ Cluster: postgres (7441307089994301601) -----+----+-----------+
| Member   | Host      | Role    | State   | TL | Lag in MB |
+----------+-----------+---------+---------+----+-----------+
| pg-node1 | 10.0.0.11 | Leader  | running |  2 |           |
| pg-node2 | 10.0.0.12 | Replica | running |  2 |         0 |
| pg-node3 | 10.0.0.13 | Replica | running |  2 |         0 |
+----------+-----------+---------+---------+----+-----------+

Common Management Tasks

  • Check etcd health: etcdctl --endpoints=http://10.0.0.11:2379 endpoint health
  • Manual Switchover: patronictl -c /etc/patroni/patroni.yml switchover
  • Check PgBouncer Stats: Connect to port 6432 and run SHOW POOLS;

Step 5. Application Connection

Important: Applications should always connect to PgBouncer (port 6432) rather than PostgreSQL (port 5432) for better resource management.

JDBC (Java/Spring Boot)

YAML
spring:
  datasource:
    url: jdbc:postgresql://10.0.0.11:6432,10.0.0.12:6432,10.0.0.13:6432/postgres?targetServerType=primary&loadBalanceHosts=true
    username: postgres
    password: ${POSTGRESQL_SUPERUSER_PASSWORD}

Python (psycopg2)

PYTHON
import os

import psycopg2

# libpq accepts a comma-separated host list and connects to the first
# node that satisfies target_session_attrs (here: the primary).
conn = psycopg2.connect(
    host="10.0.0.11,10.0.0.12,10.0.0.13",
    port=6432,
    dbname="postgres",
    user="postgres",
    password=os.environ["POSTGRESQL_SUPERUSER_PASSWORD"],
    target_session_attrs="read-write",
)

Transaction vs Session Pooling

Choosing the right pool mode in PgBouncer is critical for application performance:

  • Transaction Mode (Recommended for Microservices): Each transaction borrows a server connection and returns it at commit or rollback. This is the default in our setup (3,000 clients → 225 backend connections).
  • Session Mode (Recommended for Interactive Tools): A client keeps its server connection for the life of the session. Required for tools like DBeaver or pgAdmin that rely on session state such as temporary tables or prepared statements.
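
As a reference point, a minimal pgbouncer.ini for transaction pooling might look like this. The values are illustrative; the playbook templates the actual configuration:

```ini
[databases]
; Route everything to the local PostgreSQL instance.
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
pool_mode = transaction   ; switch to "session" for DBeaver/pgAdmin
max_client_conn = 1000    ; per node, matching the architecture above
default_pool_size = 75    ; 75 backends x 3 nodes = 225 total
```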

Troubleshooting Common Issues

1. Environment Variables Not Loading

Symptom: Ansible uses default values instead of .env settings. Solution: Always ensure you load the environment before running playbooks:

BASH
set -a && source .env && set +a

2. Patroni Fails to Start

Symptom: systemctl status patroni shows a failed state. Solution: Check the logs using journalctl -u patroni -n 100. Common causes include:

  • Incorrect bin_dir in .env (ensure it matches your PostgreSQL version).
  • Conflict with a manually started PostgreSQL instance. Patroni must be the one managing the startup.

3. etcd Cluster Unhealthy

Symptom: Endpoint health check fails. Solution: Verify connectivity between nodes on ports 2379 and 2380. Check individual node status:

BASH
ssh root@${NODE1_IP} "systemctl status etcd"

Security & Performance Characteristics

This deployment includes production-grade hardening:

  • SCRAM-SHA-256 Authentication: Replaces insecure MD5.
  • Firewall Isolation: Automated UFW/firewalld rules for cluster communication.
  • Scalable Performance: Optimized for 16GB RAM with shared_buffers at 25% and effective_cache_size at 75%.
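
For reference, the SCRAM hardening translates into pg_hba.conf entries along these lines. This is a sketch only (the playbook generates the real file), and the replication user name here is an assumption:

```
# TYPE  DATABASE     USER         ADDRESS        METHOD
host    all          all          10.0.0.0/24    scram-sha-256
host    replication  replicator   10.0.0.0/24    scram-sha-256
```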

Conclusion

Automating your PostgreSQL HA deployment with Ansible removes the complexity of manual configurations. By combining Patroni's cluster management with etcd's consensus and PgBouncer's connection pooling, you create a resilient, high-performance database infrastructure that is truly production-ready.
