PostgreSQL High Availability with Patroni & etcd
Lesson 1: Overview of PostgreSQL High Availability
- Why do we need HA?
- HA methods for PostgreSQL
- Comparison: Patroni vs Repmgr vs Pacemaker
- Overview architecture of the system
Lesson 2: Streaming Replication in PostgreSQL
- How Streaming Replication works
- WAL (Write-Ahead Logging)
- Synchronous vs Asynchronous Replication
- Replication Slots
- Lab: Setup simple replication (Primary-Standby)
Lesson 3: Introduction to Patroni and etcd
- What is Patroni? How it works
- DCS (Distributed Configuration Store) - etcd/Consul/ZooKeeper
- Consensus algorithm (Raft)
- Leader election & Failover mechanism
- Split-brain problem and solutions
Lesson 4: Infrastructure Preparation
- Hardware & software requirements
- Network and firewall configuration
- Setup 3 VMs/Servers (VirtualBox/VMware/Cloud)
- SSH key-based authentication
- Time synchronization (NTP/chrony)
Lesson 5: PostgreSQL Installation
- Install PostgreSQL from package repository
- Install from source (optional)
- Basic postgresql.conf configuration
- Understanding pg_hba.conf
- Lab: Install PostgreSQL on 3 nodes
Lesson 6: etcd Cluster Installation and Configuration
- Download and install etcd
- Configure 3-node etcd cluster
- Create systemd service
- Check etcd cluster health
- Basic etcdctl commands
- Lab: Complete etcd cluster setup
Lesson 7: Patroni Installation
- Install Python dependencies
- Install Patroni via pip
- patroni.yml file structure
- Create systemd service for Patroni
- Lab: Install Patroni on 3 nodes
Lesson 8: Detailed Patroni Configuration
- Analyze each section of patroni.yml file
- Bootstrap configuration
- PostgreSQL parameters tuning
- Authentication setup
- Tags and constraints
Lesson 9: Bootstrap PostgreSQL Cluster
- First-time Patroni startup
- Automatic bootstrap process
- Check cluster status with patronictl
- Troubleshooting common issues
- Lab: Successful cluster bootstrap
Lesson 10: Replication Management
- Synchronous vs Asynchronous replicas
- Configure synchronous_mode
- synchronous_node_count
- Monitor replication lag
- Lab: Configure sync replication
Lesson 11: Callback Scripts and Hooks
- on_start, on_stop, on_role_change callbacks
- Custom scripts for notifications
- Integration with monitoring systems
- Lab: Write script to send alerts during failover
Lesson 12: REST API and patronictl
- Patroni REST API endpoints
- Using patronictl commands
- Automation with API
- Lab: Manage cluster via CLI and API
Failover & Recovery** (4 lessons)
Lesson 13: Automatic Failover
- Failure detection mechanism
- Leader election process
- Failover timeline
- Testing automatic failover
- Lab: Simulate primary node failure
Lesson 14: Planned Switchover
- When is switchover needed?
- Planned switchover vs Failover
- Zero-downtime maintenance
- Lab: Perform switchover
Lesson 15: Recovering Failed Nodes
- Rejoin failed primary
- pg_rewind mechanism
- Rebuild replica from backup
- Lab: Recovery scenarios
Lesson 16: Backup and Point-in-Time Recovery
- pg_basebackup
- WAL archiving
- Configure continuous archiving
- PITR (Point-in-Time Recovery)
- Lab: Restore database from backup
Lesson 17: Monitoring Patroni Cluster
- Metrics to monitor
- Integration with Prometheus + Grafana
- postgres_exporter
- Alerting rules
- Lab: Setup monitoring stack
Lesson 18: Performance Tuning
- PostgreSQL configuration tuning
- Connection pooling (PgBouncer)
- Load balancing (HAProxy/pgpool)
- Read replicas scaling
- Lab: Optimize cluster performance
Lesson 19: Logging and Troubleshooting
- PostgreSQL logs
- Patroni logs
- etcd logs
- Common issues and fixes
- Debug techniques
Lesson 20: Security Best Practices
- SSL/TLS configuration
- Authentication methods
- Network security
- Encryption at rest
- Audit logging
- Lab: Hardening cluster security
Lesson 21: Multi-datacenter Setup
- Cross-DC replication strategies
- Asynchronous cascading replication
- Disaster recovery planning
- Geographic load balancing
Lesson 22: Patroni with Kubernetes
- Patroni operator
- StatefulSets
- Persistent Volumes
- Helm charts
- Lab: Deploy on K8s
Lesson 23: Patroni Configuration Management
- Dynamic configuration changes
- DCS-based configuration
- patronictl edit-config
- Config validation
- Lab: Update config without downtime
Lesson 24: Upgrade Strategies
- PostgreSQL major version upgrade
- Patroni version upgrade
- Zero-downtime upgrade techniques
- Rollback procedures
- Lab: Upgrade cluster from PG 14 to 15
Lesson 25: Real-world Case Studies
- Production architecture examples
- Scaling strategies
- Cost optimization
- Lessons learned
Lesson 26: Automation with Ansible
- Ansible playbooks for deployment
- Configuration management
- Automated testing
- CI/CD integration
- Lab: Automated deployment
Lesson 27: Disaster Recovery Drills
- DR planning
- Testing procedures
- Incident response
- Post-mortem analysis
- Lab: Full DR simulation
Lesson 28: HA Architecture Design
- Requirements gathering
- Architecture design document
- Capacity planning
- Cost estimation
Lesson 29: Deploy Production-ready Cluster
- Complete deployment from scratch
- Documentation
- Runbook creation
- Knowledge transfer
- Final assessment
Prerequisites:
- Basic Linux
- PostgreSQL fundamentals
- Networking basics
- Shell scripting
Course objectives:
- Deploy production-ready PostgreSQL HA cluster
- Manage and operate cluster
- Handle incidents and troubleshooting
- Optimize performance and security