Lesson 29: Deploy Production-ready Cluster

After this lesson, you will be able to:

  • Deploy a complete production-ready PostgreSQL HA cluster.
  • Implement all best practices learned in this course.
  • Create comprehensive operational documentation.
  • Perform final validation and handoff.
  • Complete the capstone assessment.

1. Pre-Deployment Checklist

1.1. Infrastructure readiness

TEXT
☐ Hardware/VMs provisioned
  ☐ 3+ PostgreSQL nodes (Leader + 2 Replicas minimum)
  ☐ 3 etcd nodes (can co-locate with PostgreSQL)
  ☐ 2 HAProxy/Load balancer nodes
  ☐ 1 Monitoring server (Prometheus + Grafana)
  ☐ 1 Bastion host (for secure access)

☐ Network configuration
  ☐ VPC/VLAN created with appropriate CIDR
  ☐ Subnets configured (public + private)
  ☐ Security groups/firewall rules defined
  ☐ NAT gateway for internet access
  ☐ VPN for remote access (optional)

☐ Storage provisioned
  ☐ Data volumes (SSD, appropriate IOPS)
  ☐ WAL archive storage (S3/NFS)
  ☐ Backup storage (S3/GCS/tape)
  ☐ Log storage (centralized logging)

☐ DNS configuration
  ☐ postgres-master.example.com → HAProxy master
  ☐ postgres-replica.example.com → HAProxy replicas
  ☐ postgres-admin.example.com → Direct access (VPN only)

☐ Security
  ☐ SSL certificates generated and installed
  ☐ SSH keys distributed
  ☐ Secrets management (Vault/AWS Secrets Manager)
  ☐ Audit logging configured

☐ Monitoring
  ☐ Prometheus installed and configured
  ☐ Grafana dashboards imported
  ☐ Alert rules configured
  ☐ PagerDuty/Slack integration tested

☐ Documentation
  ☐ Architecture diagrams updated
  ☐ Runbooks created
  ☐ Contact list (on-call rotation)
  ☐ Escalation procedures

2. Step-by-Step Deployment
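
All phases below drive the installation through Ansible with -i inventory.ini. The playbooks themselves are not reproduced in this lesson; as a minimal sketch, an inventory matching the host names and addresses used here might look like the following (the group names, the ubuntu user, and the key path are assumptions you should adapt to your own playbooks):

INI
# inventory.ini - illustrative only; group names must match your playbooks
[postgres_nodes]
pg-node1 ansible_host=10.0.1.11
pg-node2 ansible_host=10.0.1.12
pg-node3 ansible_host=10.0.1.13

[etcd_nodes]
pg-node1
pg-node2
pg-node3

[haproxy_nodes]
haproxy1
haproxy2

[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/id_ed25519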

2.1. Phase 1: Base system setup (Day 1)

BASH
#!/bin/bash
# deploy_phase1.sh - Base system setup

set -e

NODES=("pg-node1" "pg-node2" "pg-node3")

echo "=== Phase 1: Base System Setup ==="

for node in "${NODES[@]}"; do
  echo "Configuring $node..."
  
  # Update system
  ssh $node "sudo apt-get update && sudo apt-get upgrade -y"
  
  # Install required packages
  ssh $node "sudo apt-get install -y \
    curl wget vim git htop \
    net-tools python3 python3-pip \
    postgresql-common"
  
  # Configure system limits
  ssh $node "sudo tee /etc/security/limits.d/postgres.conf" <<EOF
postgres soft nofile 65536
postgres hard nofile 65536
postgres soft nproc 8192
postgres hard nproc 8192
EOF
  
  # Configure sysctl
  ssh $node "sudo tee /etc/sysctl.d/99-postgres.conf" <<EOF
vm.swappiness = 1
vm.overcommit_memory = 2
vm.dirty_ratio = 10
vm.dirty_background_ratio = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6
EOF
  
  ssh $node "sudo sysctl -p /etc/sysctl.d/99-postgres.conf"
  
  # Create directories
  ssh $node "sudo mkdir -p /var/lib/postgresql/wal_archive"
  ssh $node "sudo mkdir -p /var/lib/postgresql/backups"
  
  echo "$node configured ✓"
done

echo "Phase 1 complete! ✅"

2.2. Phase 2: etcd cluster (Day 1)

BASH
#!/bin/bash
# deploy_phase2.sh - Deploy etcd cluster

echo "=== Phase 2: etcd Cluster Setup ==="

# Using Ansible for etcd deployment
ansible-playbook -i inventory.ini etcd-playbook.yml

# Verify etcd cluster
echo "Verifying etcd cluster health..."
ssh pg-node1 "etcdctl endpoint health --cluster"

# Expected output:
# http://10.0.1.11:2379 is healthy: successfully committed proposal: took = 1.234ms
# http://10.0.1.12:2379 is healthy: successfully committed proposal: took = 1.456ms
# http://10.0.1.13:2379 is healthy: successfully committed proposal: took = 1.678ms

echo "Phase 2 complete! ✅"

2.3. Phase 3: PostgreSQL + Patroni (Day 2)

BASH
#!/bin/bash
# deploy_phase3.sh - Deploy PostgreSQL + Patroni

echo "=== Phase 3: PostgreSQL + Patroni Setup ==="

# Deploy with Ansible
ansible-playbook -i inventory.ini postgresql-patroni-playbook.yml

# Wait for cluster initialization
echo "Waiting for Patroni cluster to initialize..."
sleep 60

# Verify cluster
ssh pg-node1 "patronictl -c /etc/patroni/patroni.yml list"

# Expected output:
# + Cluster: postgres-cluster -------+----+-----------+
# | Member   | Host       | Role    | State     | TL | Lag in MB |
# +----------+------------+---------+-----------+----+-----------+
# | pg-node1 | 10.0.1.11  | Leader  | running   |  1 |           |
# | pg-node2 | 10.0.1.12  | Replica | streaming |  1 |         0 |
# | pg-node3 | 10.0.1.13  | Replica | streaming |  1 |         0 |
# +----------+------------+---------+-----------+----+-----------+

echo "Phase 3 complete! ✅"

2.4. Phase 4: Connection pooling (Day 2)

BASH
#!/bin/bash
# deploy_phase4.sh - Deploy PgBouncer

echo "=== Phase 4: PgBouncer Setup ==="

ansible-playbook -i inventory.ini pgbouncer-playbook.yml

# Test the PgBouncer admin console (SHOW POOLS only works against the special "pgbouncer" database)
psql -h pg-node1 -p 6432 -U postgres -d pgbouncer -c "SHOW POOLS;"

echo "Phase 4 complete! ✅"

2.5. Phase 5: Load balancing (Day 3)

BASH
#!/bin/bash
# deploy_phase5.sh - Deploy HAProxy

echo "=== Phase 5: HAProxy Setup ==="

ansible-playbook -i inventory.ini haproxy-playbook.yml

# Test connections
echo "Testing master connection..."
psql -h postgres-master.example.com -U postgres -c "SELECT inet_server_addr();"

echo "Testing replica connection..."
psql -h postgres-replica.example.com -U postgres -c "SELECT inet_server_addr();"

echo "Phase 5 complete! ✅"

2.6. Phase 6: Monitoring (Day 3)

BASH
#!/bin/bash
# deploy_phase6.sh - Deploy monitoring stack

echo "=== Phase 6: Monitoring Setup ==="

ansible-playbook -i inventory.ini monitoring-playbook.yml

# Verify Prometheus targets
curl -s 'http://prometheus.example.com:9090/api/v1/targets' | jq '.data.activeTargets[] | select(.labels.job=="postgres") | {instance: .labels.instance, health: .health}'

# Import Grafana dashboards
for dashboard in dashboards/*.json; do
  curl -X POST \
    http://admin:admin@grafana.example.com:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -d @"$dashboard"
done

echo "Phase 6 complete! ✅"
echo "Grafana: http://grafana.example.com:3000"

2.7. Phase 7: Backup configuration (Day 4)

BASH
#!/bin/bash
# deploy_phase7.sh - Configure backups

echo "=== Phase 7: Backup Configuration ==="

# Deploy pgBackRest or WAL-G
ansible-playbook -i inventory.ini backup-playbook.yml

# Schedule backup cron jobs
ssh pg-node1 "sudo -u postgres crontab -l" <<EOF
# Daily full backup at 2 AM
0 2 * * * /usr/local/bin/pg_backup.sh full

# Hourly incremental backup
0 * * * * /usr/local/bin/pg_backup.sh incremental

# Continuous WAL archiving (handled by PostgreSQL archive_command)
EOF

# Test backup
echo "Testing backup..."
ssh pg-node1 "sudo -u postgres /usr/local/bin/pg_backup.sh full --test"

# Test restore (to separate directory)
echo "Testing restore..."
ssh pg-node1 "sudo -u postgres /usr/local/bin/pg_restore.sh /var/lib/postgresql/restore_test"

echo "Phase 7 complete! ✅"

3. Post-Deployment Validation

3.1. Functional testing

BASH
#!/bin/bash
# validate_deployment.sh

echo "=== Deployment Validation ==="

# Test 1: Cluster health
echo "Test 1: Cluster health"
patronictl -c /etc/patroni/patroni.yml list
if [ $? -eq 0 ]; then
  echo "✅ Cluster is healthy"
else
  echo "❌ Cluster health check failed"
  exit 1
fi

# Test 2: Replication lag
echo "Test 2: Replication lag"
LAG=$(psql -h pg-node2 -U postgres -Atc "
  SELECT pg_wal_lsn_diff(
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn()
  );
")
if [ "${LAG:-0}" -lt 1048576 ]; then  # < 1MB
  echo "✅ Replication lag acceptable: $LAG bytes"
else
  echo "⚠️  High replication lag: $LAG bytes"
fi

# Test 3: Write operations
echo "Test 3: Write operations"
psql -v ON_ERROR_STOP=1 -h postgres-master.example.com -U postgres <<EOF
CREATE TABLE validation_test (id serial primary key, data text, created_at timestamp default now());
INSERT INTO validation_test (data) VALUES ('Test data 1'), ('Test data 2'), ('Test data 3');
SELECT * FROM validation_test;
EOF
if [ $? -eq 0 ]; then
  echo "✅ Write operations successful"
else
  echo "❌ Write operations failed"
  exit 1
fi

# Test 4: Read from replica
echo "Test 4: Read from replica"
psql -h postgres-replica.example.com -U postgres -c "SELECT * FROM validation_test;"
if [ $? -eq 0 ]; then
  echo "✅ Read from replica successful"
else
  echo "❌ Read from replica failed"
  exit 1
fi

# Test 5: Automatic failover
echo "Test 5: Automatic failover (simulation)"
read -p "Press Enter to simulate leader failure..."
CURRENT_LEADER=$(patronictl -c /etc/patroni/patroni.yml list | grep Leader | awk '{print $2}')
ssh $CURRENT_LEADER "sudo systemctl stop patroni"
echo "Waiting 30 seconds for failover..."
sleep 30
patronictl -c /etc/patroni/patroni.yml list
NEW_LEADER=$(patronictl -c /etc/patroni/patroni.yml list | grep Leader | awk '{print $2}')
if [ "$CURRENT_LEADER" != "$NEW_LEADER" ]; then
  echo "✅ Automatic failover successful: $CURRENT_LEADER → $NEW_LEADER"
  # Restore old leader
  ssh $CURRENT_LEADER "sudo systemctl start patroni"
else
  echo "❌ Failover did not occur"
  exit 1
fi

# Test 6: Backup and restore
echo "Test 6: Backup and restore"
sudo -u postgres /usr/local/bin/pg_backup.sh full
if [ $? -eq 0 ]; then
  echo "✅ Backup successful"
else
  echo "❌ Backup failed"
  exit 1
fi

# Test 7: Monitoring
echo "Test 7: Monitoring"
# jq -e exits non-zero if no postgres targets are reported, making the check below meaningful
curl -s 'http://prometheus.example.com:9090/api/v1/query?query=up' | jq -e '.data.result[] | select(.metric.job=="postgres")'
if [ $? -eq 0 ]; then
  echo "✅ Monitoring operational"
else
  echo "❌ Monitoring check failed"
  exit 1
fi

echo ""
echo "🎉 All validation tests passed!"
echo "Production cluster is ready! ✅"

3.2. Performance testing

BASH
#!/bin/bash
# performance_test.sh

echo "=== Performance Testing ==="

# Test 1: Single connection throughput
echo "Test 1: Single connection throughput"
pgbench -i -s 100 testdb
pgbench -c 1 -j 1 -t 10000 testdb
# Expected: > 500 TPS

# Test 2: Multi-connection throughput
echo "Test 2: Multi-connection throughput (10 connections)"
pgbench -c 10 -j 2 -t 10000 testdb
# Expected: > 3,000 TPS

# Test 3: Read-only workload
echo "Test 3: Read-only workload on replica"
pgbench -c 10 -j 2 -S -t 10000 -h postgres-replica.example.com testdb
# Expected: > 5,000 TPS

# Test 4: Connection pooling efficiency
echo "Test 4: Connection pooling (100 connections)"
pgbench -c 100 -j 4 -t 1000 -h pg-node1 -p 6432 testdb
# Should handle without errors

# Test 5: Replication lag under load
echo "Test 5: Replication lag under load"
pgbench -c 50 -j 4 -T 60 testdb &
PGBENCH_PID=$!
while kill -0 $PGBENCH_PID 2>/dev/null; do
  LAG=$(psql -h pg-node2 -U postgres -Atc "SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn());")
  echo "Current replication lag: $LAG bytes"
  sleep 5
done
# Expected: Lag < 10MB throughout test

echo "Performance testing complete!"

4. Operational Documentation

4.1. Runbook structure

MARKDOWN
# PostgreSQL HA Cluster Runbook

## 1. Cluster Overview
- Architecture: 3-node Patroni cluster with HAProxy
- Leader: pg-node1 (10.0.1.11)
- Replicas: pg-node2 (10.0.1.12), pg-node3 (10.0.1.13)
- Load balancers: haproxy1, haproxy2
- Monitoring: Prometheus + Grafana
- Backup: Daily full to S3, continuous WAL archiving

## 2. Common Tasks

### 2.1. Check cluster status
patronictl -c /etc/patroni/patroni.yml list

### 2.2. Check replication lag
# pg_stat_replication is only populated on the node with attached standbys (the leader)
psql -h postgres-master.example.com -U postgres -c "
  SELECT client_addr,
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
  FROM pg_stat_replication;
"

### 2.3. Perform planned switchover
patronictl -c /etc/patroni/patroni.yml switchover postgres-cluster \
  --leader pg-node1 \
  --candidate pg-node2 \
  --scheduled 'now'

### 2.4. Add new replica
# See detailed procedure in Section 8

### 2.5. Manual backup
sudo -u postgres /usr/local/bin/pg_backup.sh full

## 3. Troubleshooting

### 3.1. Cluster split-brain
**Symptoms**: Multiple leaders reported
**Resolution**: See Section 9.1

### 3.2. High replication lag
**Symptoms**: Lag > 100MB for > 5 minutes
**Resolution**: See Section 9.2

### 3.3. Disk space exhaustion
**Symptoms**: Disk usage > 90%
**Resolution**: See Section 9.3

## 4. Emergency Procedures

### 4.1. Complete cluster failure
1. Check etcd cluster health
2. If etcd down, restore from backup
3. Reinitialize Patroni cluster
4. Restore data from backup if needed

### 4.2. Data corruption
1. Stop writes (set read-only)
2. Identify corruption extent
3. Perform PITR to point before corruption
4. Validate restored data
5. Resume normal operations

## 5. Escalation
- L1 Support: DevOps on-call (PagerDuty)
- L2 Support: DBA team (Slack: #dba-oncall)
- L3 Support: Senior DBA (Phone: xxx-xxx-xxxx)

4.2. Monitoring dashboard guide

MARKDOWN
# Grafana Dashboard Guide

## Primary Dashboard: PostgreSQL Cluster Overview

### Panels:

1. **Cluster Health**
   - Shows current leader
   - Replica count
   - Failed/stopped nodes
   - Alert: Any node down for > 1 minute

2. **Query Performance**
   - Queries per second (QPS)
   - Average query duration
   - 95th percentile latency
   - Alert: p95 latency > 100ms

3. **Replication Lag**
   - Lag in bytes for each replica
   - Lag in seconds
   - Alert: Lag > 10MB or > 10 seconds

4. **Resource Usage**
   - CPU usage per node
   - Memory usage
   - Disk I/O
   - Alert: CPU > 80%, Memory > 90%, Disk > 85%

5. **Connections**
   - Active connections
   - Idle connections
   - PgBouncer pool usage
   - Alert: Connections > 90% of max_connections

6. **Disk Space**
   - Data directory usage
   - WAL directory usage
   - Backup storage usage
   - Alert: Any filesystem > 85%

7. **Backup Status**
   - Last backup time
   - Backup size
   - WAL archiving status
   - Alert: No backup in 25 hours

## How to Use:
- Access: http://grafana.example.com:3000
- Username: admin (password stored in 1Password)
- Time range: Last 1 hour (default), adjustable
- Refresh: 10 seconds auto-refresh
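
The alert thresholds listed above map onto Prometheus alerting rules. A sketch of two of them follows; the replication-lag metric name depends on your exporter's custom queries, so treat it as a placeholder:

YAML
# postgres-alerts.yml - illustrative rules mirroring the dashboard thresholds
groups:
  - name: postgres-ha
    rules:
      - alert: PostgresNodeDown
        expr: up{job="postgres"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL target {{ $labels.instance }} down for more than 1 minute"
      - alert: HighReplicationLag
        expr: pg_replication_lag_bytes > 10485760   # 10MB; placeholder metric name
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag on {{ $labels.instance }} above 10MB"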

5. Knowledge Transfer

5.1. Training checklist

TEXT
☐ PostgreSQL fundamentals
  ☐ Architecture (processes, memory, storage)
  ☐ Replication (streaming, logical)
  ☐ Backup and recovery (PITR)

☐ Patroni operations
  ☐ Cluster management (patronictl commands)
  ☐ Configuration management (edit-config)
  ☐ Failover and switchover
  ☐ Troubleshooting common issues

☐ Monitoring and alerting
  ☐ Grafana dashboards interpretation
  ☐ Prometheus queries
  ☐ Alert handling procedures
  ☐ PagerDuty escalation

☐ Backup and restore
  ☐ Manual backup execution
  ☐ Restore procedures (full and PITR)
  ☐ Backup validation

☐ Incident response
  ☐ Runbook navigation
  ☐ Communication protocols
  ☐ Post-mortem process

☐ Maintenance tasks
  ☐ Vacuum and analyze
  ☐ Index maintenance
  ☐ Configuration changes
  ☐ Version upgrades

5.2. Handoff meeting agenda

MARKDOWN
# Production Cluster Handoff Meeting

Date: [Date]
Duration: 2 hours
Attendees: Project team, Operations team, Management

## Agenda:

1. **Introduction** (10 min)
   - Project overview
   - Architecture summary

2. **Live Demo** (30 min)
   - Cluster status check
   - Query execution
   - Monitoring dashboards
   - Simulate failover
   - Restore from backup

3. **Documentation Review** (20 min)
   - Architecture diagrams
   - Runbooks
   - Monitoring guide
   - Backup procedures

4. **Handoff Materials** (15 min)
   - Access credentials (1Password)
   - Git repository access
   - Monitoring URL and credentials
   - PagerDuty integration
   - Contact list

5. **Q&A** (30 min)
   - Open questions from operations team
   - Clarifications

6. **Action Items** (10 min)
   - Shadow period: 2 weeks
   - First on-call rotation
   - Knowledge assessment date

7. **Sign-off** (5 min)
   - Formal handoff acceptance
   - Support plan for first 30 days

## Deliverables:
- [ ] Architecture documentation (Confluence)
- [ ] Runbooks (GitHub)
- [ ] Monitoring dashboards (Grafana)
- [ ] Access credentials (1Password)
- [ ] Contact list (PagerDuty)
- [ ] Training materials (Google Drive)

6. Production Go-Live Checklist

TEXT
D-7 (One week before):
☐ All validation tests passed
☐ Performance benchmarks met
☐ Monitoring and alerting verified
☐ Backup and restore tested
☐ Runbooks reviewed and approved
☐ Operations team trained
☐ Stakeholders notified of go-live date
☐ Rollback plan documented

D-1 (Day before):
☐ Final smoke tests passed
☐ All data migrated (if applicable)
☐ DNS records prepared (not yet applied)
☐ Load balancer configured
☐ On-call rotation confirmed
☐ War room scheduled (Zoom/Slack)
☐ Communication plan ready

D-Day (Go-live):
☐ 08:00: Final system check
☐ 09:00: Enable monitoring alerts
☐ 10:00: Update DNS to point to new cluster
☐ 10:15: Verify application connectivity
☐ 10:30: Monitor for errors (30 min)
☐ 11:00: Declare success or rollback
☐ 12:00: Post go-live review meeting
☐ EOD: Document any issues and resolutions

D+1 (Day after):
☐ Review monitoring data (full 24 hours)
☐ Check backup completed successfully
☐ Verify replication lag within targets
☐ Confirm no alerts or incidents
☐ Operations team debrief

D+7 (One week after):
☐ Performance review against baselines
☐ Cost analysis (actual vs estimated)
☐ Lessons learned session
☐ Update documentation with findings
☐ Formal project closure

7. Final Assessment

7.1. Capstone project requirements

MARKDOWN
# Capstone Project: Deploy Production-Ready PostgreSQL HA Cluster

## Objective:
Deploy a fully functional, production-ready PostgreSQL High Availability cluster that meets all requirements specified in Lesson 28.

## Requirements:

1. **Architecture** (20 points)
   - [ ] 3-node Patroni cluster deployed
   - [ ] etcd cluster configured
   - [ ] HAProxy load balancing implemented
   - [ ] Network properly segmented

2. **High Availability** (20 points)
   - [ ] Automatic failover functional
   - [ ] RTO < 30 seconds demonstrated
   - [ ] RPO = 0 (synchronous replication)
   - [ ] No single point of failure

3. **Backup & Recovery** (15 points)
   - [ ] Automated daily backups configured
   - [ ] WAL archiving functional
   - [ ] PITR tested successfully
   - [ ] Backup retention policy implemented

4. **Monitoring & Alerting** (15 points)
   - [ ] Prometheus monitoring deployed
   - [ ] Grafana dashboards configured
   - [ ] Alert rules defined
   - [ ] PagerDuty/Slack integration working

5. **Security** (10 points)
   - [ ] SSL/TLS encryption enabled
   - [ ] Network firewall rules configured
   - [ ] Audit logging enabled
   - [ ] Secrets properly managed

6. **Documentation** (10 points)
   - [ ] Architecture diagram created
   - [ ] Runbooks written
   - [ ] Monitoring guide documented
   - [ ] Handoff materials prepared

7. **Testing** (10 points)
   - [ ] All functional tests passed
   - [ ] Performance benchmarks met
   - [ ] Failover drill successful
   - [ ] PITR restore validated

## Deliverables:

1. Working PostgreSQL HA cluster (accessible for validation)
2. Architecture documentation (Markdown/Confluence)
3. Runbooks (GitHub repository)
4. Monitoring dashboards (Grafana export)
5. Test results and evidence (screenshots/logs)
6. Video presentation (15 minutes)

## Grading:
- Total: 100 points
- Pass: 70+ points
- Excellence: 90+ points

## Submission:
- Due: [Date]
- Format: GitHub repository + video link
- Presentation: Live demo + Q&A (30 minutes)

7.2. Assessment rubric

| Criteria | Excellent (9-10) | Good (7-8) | Satisfactory (5-6) | Needs Improvement (0-4) |
|---|---|---|---|---|
| Architecture | All components deployed, well-designed, scalable | Most components present, minor issues | Basic setup, some components missing | Incomplete or non-functional |
| HA | RTO < 30s, RPO = 0, no downtime | RTO < 60s, minimal RPO, brief downtime | RTO > 60s, some data loss possible | Frequent failures, unacceptable RTO/RPO |
| Backup | Automated, tested, documented | Automated, tested | Manual process, untested | Not implemented |
| Monitoring | Comprehensive, automated alerts | Basic monitoring, some alerts | Manual checks only | No monitoring |
| Security | All best practices implemented | Most security measures in place | Basic security | Insecure configuration |
| Documentation | Comprehensive, clear, actionable | Good documentation, minor gaps | Basic docs, some missing info | Poor or missing docs |
| Testing | All tests passed, thorough | Most tests passed | Some tests passed | Testing incomplete |

8. Summary

Key Achievements

Congratulations! You have completed the PostgreSQL High Availability course.

You have learned:

  • ✅ PostgreSQL replication and HA concepts
  • ✅ Patroni cluster deployment and management
  • ✅ etcd distributed configuration store
  • ✅ Monitoring with Prometheus and Grafana
  • ✅ Backup and recovery (PITR)
  • ✅ Failover and switchover procedures
  • ✅ Security best practices
  • ✅ Multi-datacenter setups
  • ✅ Kubernetes deployment
  • ✅ Configuration management
  • ✅ Upgrade strategies
  • ✅ Real-world case studies
  • ✅ Automation with Ansible
  • ✅ Disaster recovery drills
  • ✅ Architecture design
  • ✅ Production deployment

You are now ready to:

  • Deploy and manage production PostgreSQL HA clusters.
  • Design high-availability database architectures.
  • Troubleshoot and resolve HA issues.
  • Implement best practices for database reliability.
  • Train and mentor others on PostgreSQL HA.

Next Steps

Continue your learning:

  1. Advanced Topics:
    • PostgreSQL internals and performance tuning
    • Logical replication and multi-master setups
    • Sharding and horizontal scaling (Citus)
    • PostgreSQL on Kubernetes at scale
  2. Certifications:
    • PostgreSQL Certified Professional (PGCP)
    • AWS Database Specialty
    • Kubernetes Administrator (CKA)
  3. Community:
    • Join PostgreSQL mailing lists
    • Contribute to Patroni/PostgreSQL projects
    • Attend PostgreSQL conferences (PGConf)
    • Share knowledge through blog posts/talks
  4. Practice:
    • Build personal projects with HA
    • Contribute to open-source databases
    • Participate in chaos engineering experiments
    • Mentor junior DBAs

Resources

Documentation:

Community:

Training:

  • Percona PostgreSQL Training
  • 2ndQuadrant PostgreSQL Courses
  • Crunchy Data PostgreSQL Training

Conferences:

  • PGConf.US (annual)
  • PostgreSQL Conference Europe
  • FOSDEM PostgreSQL DevRoom

Final Words

Thank you for completing this course!

Remember:

  • High availability is a journey, not a destination.
  • Always test your failover procedures.
  • Document everything.
  • Automate where possible.
  • Monitor relentlessly.
  • Learn from failures.
  • Share your knowledge.

Good luck with your PostgreSQL HA deployments!

Feel free to reach out with questions or feedback.

Happy hacking! 🚀🐘

#Database#PostgreSQL#Patroni