Deploying a PostgreSQL High Availability cluster shouldn't be a manual, error-prone process. In this guide, we'll leverage Ansible automation to deploy a robust HA cluster using Patroni, etcd, and PgBouncer.
This setup provides automatic failover, distributed configuration, and high-performance connection pooling, all optimized for production-grade environments.
Architecture Overview
Our cluster consists of three nodes, each running PostgreSQL, Patroni, and PgBouncer, with a distributed etcd cluster for consensus.
Key Features:
- High Availability: Automatic failover (30-45s) managed by Patroni.
- Connection Pooling: PgBouncer for efficient resource utilization.
- Data Integrity: Optimized for SSDs with reliable streaming replication.
- Production Ready: Full configuration externalized to a single
.envfile.
Step 1. Requirements & Preparation
Before running the playbooks, ensure your environment meets the following specifications.
Hardware (per node)
- Minimum: 2 Cores, 4GB RAM, 20GB Disk.
- Recommended: 4-8 Cores, 16-32GB RAM, 100GB+ NVMe SSD.
Network Configuration
The following ports must be accessible internally within the cluster:
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| PgBouncer | 6432 | TCP | Connection pooling (PRIMARY ACCESS) |
| PostgreSQL | 5432 | TCP | Direct DB connections (Admin) |
| Patroni API | 8008 | TCP | Health checks & Cluster management |
| etcd Client | 2379 | TCP | etcd-to-Patroni communication |
| etcd Peer | 2380 | TCP | etcd intra-cluster replication |
Step 2. Project Configuration
All cluster settings are managed through a central .env file.
1. Clone the Automation Repository
2. Configure Environment Variables
Copy the template and update with your node IPs and strong passwords.
Critical Variables to Set:
3. Update Ansible Inventory
Edit inventory/hosts.yml to reflect your node infrastructure:
Step 3. Deploying the Cluster
Load your environment variables and execute the main site playbook.
You can also deploy specific components using tags: --tags postgresql, --tags etcd, --tags patroni, or --tags pgbouncer.
Step 4. Verification & Management
Once the deployment completes, use patronictl to verify the cluster state from any node.
Expected Cluster Output:
Cluster: postgres (7441307089994301601)
| Member | Host | Role | State | TL | Lag in MB |
|---|---|---|---|---|---|
| pg-node1 | 10.0.0.11 | Leader | running | 2 | |
| pg-node2 | 10.0.0.12 | Replica | running | 2 | 0 |
| pg-node3 | 10.0.0.13 | Replica | running | 2 | 0 |
Common Management Tasks
- Check etcd health:
etcdctl --endpoints=http://10.0.0.11:2379 endpoint health - Manual Switchover:
patronictl -c /etc/patroni/patroni.yml switchover - Check PgBouncer Stats: Connect to port 6432 and run
SHOW POOLS;
Step 5. Application Connection
Important: Applications should always connect to PgBouncer (port 6432) rather than PostgreSQL (port 5432) for better resource management.
JDBC (Java/Spring Boot)
Python (psycopg2)
Transaction vs Session Pooling
Choosing the right pool mode in PgBouncer is critical for application performance:
- Transaction Mode (Recommended for Microservices): Each query borrows a connection and releases it immediately. This is the default in our setup (3000 client → 225 backends).
- Session Mode (Recommended for Interactive Tools): Required for tools like DBeaver or pgAdmin that maintain temporary tables or prepared statements.
Troubleshooting Common Issues
1. Environment Variables Not Loading
Symptom: Ansible uses default values instead of .env settings.
Solution: Always ensure you load the environment before running playbooks:
2. Patroni Fails to Start
Symptom: systemctl status patroni shows a failed state.
Solution: Check the logs using journalctl -u patroni -n 100. Common causes include:
- Incorrect
bin_dirin.env(ensure it matches your PostgreSQL version). - Conflict with a manually started PostgreSQL instance. Patroni must be the one managing the startup.
3. etcd Cluster Unhealthy
Symptom: Endpoint health check fails. Solution: Verify connectivity between nodes on ports 2379 and 2380. Check individual node status:
Security & Performance Characteristics
This deployment includes production-grade hardening:
- SCRAM-SHA-256 Authentication: Replaces insecure MD5.
- Firewall Isolation: Automated UFW/firewalld rules for cluster communication.
- Scalable Performance: Optimized for 16GB RAM with
shared_buffersat 25% andeffective_cache_sizeat 75%.
Conclusion
Automating your PostgreSQL HA deployment with Ansible removes the complexity of manual configurations. By combining Patroni's cluster management with etcd's consensus and PgBouncer's connection pooling, you create a resilient, high-performance database infrastructure that is truly production-ready.