CloudTadaInsights

Lesson 28: HA Architecture Design


After this lesson, you will be able to:

  • Gather requirements for HA cluster design.
  • Create comprehensive architecture documents.
  • Perform capacity planning calculations.
  • Estimate infrastructure costs.
  • Conduct design reviews effectively.

1. Requirements Gathering

1.1. Business requirements template

TEXT
# PostgreSQL HA Requirements

## Business Context
- Application: [E-commerce platform]
- Users: [50M registered, 2M DAU]
- Business hours: [24/7 global]
- Peak traffic: [Black Friday, 10x normal]

## Availability Requirements
- Target uptime: [99.99% ≈ 52.6 min downtime/year]
- Planned maintenance windows: [None - zero-downtime required]
- Acceptable downtime per incident: [< 5 minutes]

## Performance Requirements
- Expected QPS: [5,000 average, 15,000 peak]
- Query latency target: [p50: 10ms, p95: 50ms, p99: 200ms]
- Write throughput: [1,000 TPS average, 3,000 TPS peak]

## Data Requirements
- Current data size: [2TB]
- Growth rate: [20% per year]
- Retention period: [7 years for compliance]
- Backup frequency: [Daily full, continuous WAL archiving]

## Disaster Recovery
- RPO (Recovery Point Objective): [< 5 minutes]
- RTO (Recovery Time Objective): [< 15 minutes]
- Geographic redundancy: [Required - multi-region]

## Security & Compliance
- Data encryption: [At rest and in transit]
- Audit logging: [All queries must be logged]
- Compliance: [PCI DSS, GDPR, SOC 2]

## Budget Constraints
- Infrastructure budget: [$10K-15K/month]
- Staffing: [1 Senior DBA, 1 Junior DBA]
- Acceptable cost per transaction: [< $0.0001]
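An availability target only becomes actionable once it is converted into a concrete downtime budget. A minimal sketch of that conversion (illustrative helper, not part of the template itself):

```python
# Downtime budget implied by an availability target, in minutes per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_min(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for target in (99.9, 99.95, 99.99):
    print(f"{target}%: {downtime_budget_min(target):.1f} min/year")
# 99.9%: 525.6 min/year
# 99.95%: 262.8 min/year
# 99.99%: 52.6 min/year
```

The 99.99% target in this template therefore leaves under an hour per year for all incidents combined, which is why the requirements also rule out planned maintenance windows.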

1.2. Technical requirements

TEXT
# Technical Requirements

## Workload Characteristics
- Read/Write ratio: [80% reads, 20% writes]
- Transaction types:
  * OLTP (transactional): 70%
  * OLAP (analytical): 30%
- Query complexity: [Mix of simple and complex joins]
- Largest tables: [orders: 500M rows, users: 50M rows]

## Integration Requirements
- Application stack: [Node.js, Python, Java]
- Connection pooling: [Required - PgBouncer]
- Load balancing: [HAProxy or cloud LB]
- Monitoring: [Prometheus + Grafana]
- CI/CD: [GitLab CI]

## Operational Requirements
- Deployment method: [Kubernetes preferred, VMs acceptable]
- Backup storage: [S3-compatible object storage]
- Log aggregation: [ELK or Loki]
- Alerting: [PagerDuty integration]
- Documentation: [Confluence/GitHub Wiki]

2. Architecture Design Document

2.1. High-level architecture

TEXT
Internet
                                       ↓
                              [CloudFlare CDN]
                                       ↓
                          [AWS Application Load Balancer]
                                       ↓
                    ┌──────────────────┴──────────────────┐
                    ↓                                      ↓
            [us-east-1a]                            [us-east-1b]
        ┌─────────────────┐                    ┌─────────────────┐
        │ Application     │                    │ Application     │
        │ Servers (ECS)   │                    │ Servers (ECS)   │
        └────────┬────────┘                    └────────┬────────┘
                 ↓                                       ↓
              [HAProxy]                              [HAProxy]
           10.0.1.100:5432                       10.0.2.100:5432
                 ↓                                       ↓
    ┌────────────┴───────────┬──────────────────────────┴────────┐
    ↓                        ↓                                    ↓
[PostgreSQL Leader]  [PostgreSQL Replica]              [PostgreSQL Replica]
  10.0.10.11             10.0.10.12                        10.0.20.11
  r6g.4xlarge            r6g.2xlarge                       r6g.2xlarge
  16 vCPU, 128GB         8 vCPU, 64GB                      8 vCPU, 64GB
  2TB io2 SSD            1TB io2 SSD                       1TB io2 SSD
  20K IOPS               10K IOPS                          10K IOPS
  Patroni + etcd         Patroni + etcd                    Patroni + etcd
  
Supporting Services:
├─ PgBouncer (connection pooling) - Co-located with PostgreSQL
├─ etcd cluster (3 nodes) - Co-located with PostgreSQL
├─ Prometheus (monitoring) - Separate t3.large
├─ Grafana (visualization) - Separate t3.small
├─ Backup storage - S3 bucket (s3://pg-backups)
└─ WAL archive - S3 bucket (s3://pg-wal-archive)

DR Site (us-west-2):
└─ [PostgreSQL Replica] - Async replication from us-east-1
   10.100.1.11
   r6g.xlarge (standby, can scale up on DR activation)
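HAProxy finds the current leader through Patroni's REST API (port 8008, as in the security groups below): `GET /primary` returns HTTP 200 only on the leader and 503 everywhere else, and HAProxy probes this with `option httpchk GET /primary`. A pure-function sketch of that routing decision, using example IPs from this design:

```python
# How HAProxy picks the write target: Patroni's REST API answers GET /primary
# with HTTP 200 only on the current leader, 503 elsewhere. Illustrative
# sketch of the decision HAProxy makes via its httpchk health probes.

def pick_write_target(health: dict) -> str:
    """Return the first node whose /primary check returned 200, else None."""
    for node, status in health.items():
        if status == 200:
            return node
    return None

# Example: leader on 10.0.10.11, two replicas answering 503
checks = {"10.0.10.11": 200, "10.0.10.12": 503, "10.0.20.11": 503}
print(pick_write_target(checks))  # 10.0.10.11
```

After a failover Patroni flips which node answers 200, so HAProxy redirects writes without any application change.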

2.2. Network design

TEXT
VPC: 10.0.0.0/16

Subnets:
├─ Public subnet (us-east-1a): 10.0.1.0/24
│  └─ NAT Gateway, Bastion host
├─ Private subnet (us-east-1a): 10.0.10.0/24
│  └─ PostgreSQL leader, replica1, etcd1
├─ Public subnet (us-east-1b): 10.0.2.0/24
│  └─ NAT Gateway
├─ Private subnet (us-east-1b): 10.0.20.0/24
│  └─ PostgreSQL replica2, etcd2
└─ Private subnet (us-east-1c): 10.0.30.0/24
   └─ etcd3

Security Groups:
├─ PostgreSQL SG:
│  ├─ Inbound: 5432 from Application SG
│  ├─ Inbound: 8008 from Monitoring SG (Patroni API)
│  ├─ Inbound: 5432 from PostgreSQL SG (replication)
│  └─ Outbound: All
├─ Application SG:
│  ├─ Inbound: 443 from ALB
│  └─ Outbound: 5432 to PostgreSQL SG
└─ Monitoring SG:
   ├─ Inbound: 9090 from VPN
   └─ Outbound: 8008 to PostgreSQL SG
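A subnet plan like this is easy to sanity-check with the standard library before any Terraform is written. A small sketch (the etcd3 address is a hypothetical example; the other IPs follow the plan above):

```python
# Verify that each planned host IP falls inside its intended subnet,
# using only the stdlib ipaddress module.
from ipaddress import ip_address, ip_network

placement = {
    "pg-leader":   ("10.0.10.11", "10.0.10.0/24"),  # private, us-east-1a
    "pg-replica2": ("10.0.20.11", "10.0.20.0/24"),  # private, us-east-1b
    "etcd3":       ("10.0.30.5",  "10.0.30.0/24"),  # private, us-east-1c
}

for host, (ip, subnet) in placement.items():
    ok = ip_address(ip) in ip_network(subnet)
    print(f"{host}: {ip} in {subnet} -> {ok}")
# all three print -> True
```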

2.3. Data flow diagram

TEXT
Write Path:
Client → ALB → Application → HAProxy (leader port) → PostgreSQL Leader
                                                    ↓
                                         [Synchronous replication]
                                                    ↓
                                            PostgreSQL Replica1
                                                    ↓
                                         [Asynchronous replication]
                                                    ↓
                                            PostgreSQL Replica2 (AZ2)
                                                    ↓
                                         [Asynchronous replication]
                                                    ↓
                                            PostgreSQL DR (us-west-2)

Read Path (80% of traffic):
Client → ALB → Application → HAProxy (replicas) → Round-robin:
                                                    ├─ PostgreSQL Replica1
                                                    ├─ PostgreSQL Replica2
                                                    └─ PostgreSQL Replica (DR, optional)

Backup Path:
PostgreSQL Leader → WAL archiving → S3 (wal-archive)
                  ↓
            Daily pg_basebackup → S3 (backups)
                  ↓
            Monthly full backup → S3 Glacier
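The backup path determines the achievable RPO: a region failure recovered via the streaming DR replica loses at most the replication lag, while recovery from the S3 WAL archive loses at most one `archive_timeout` interval. A rough estimate under assumed values (both settings below are illustrative, not taken from this design):

```python
# Worst-case RPO per recovery path (illustrative sketch).
archive_timeout_sec = 60   # assumed archive_timeout setting
observed_dr_lag_sec = 45   # assumed steady-state async lag to us-west-2

rpo_via_streaming_sec = observed_dr_lag_sec    # promote the DR replica
rpo_via_wal_archive_sec = archive_timeout_sec  # restore base backup + archived WAL

for path, rpo in [("streaming replica", rpo_via_streaming_sec),
                  ("WAL archive", rpo_via_wal_archive_sec)]:
    verdict = "OK" if rpo < 300 else "MISS"
    print(f"{path}: ~{rpo}s worst case (target < 300s: {verdict})")
```

Under these assumptions both paths sit comfortably inside the < 5 minute RPO from the requirements; measured lag should be monitored to confirm it in production.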

3. Capacity Planning

3.1. Compute capacity

PYTHON
# Capacity planning calculator
# (Little's Law: concurrency = arrival rate x service time)

# Given requirements:
avg_qps = 5000    # queries per second
peak_qps = 15000  # peak queries per second
avg_query_time_ms = 10  # milliseconds

# Concurrent connections needed
avg_connections = avg_qps * avg_query_time_ms / 1000
peak_connections = peak_qps * avg_query_time_ms / 1000

print(f"Average concurrent connections: {avg_connections:.0f}")
# Output: 50

print(f"Peak concurrent connections: {peak_connections:.0f}")
# Output: 150

# With connection pooling (transaction mode):
pooler_multiplier = 10  # each DB connection serves ~10 app connections
app_connections = peak_connections * pooler_multiplier  # what the apps open
db_connections = peak_connections                       # what PostgreSQL sees

print(f"Application connections: {app_connections:.0f}")
# Output: 1500

print(f"Database connections (with pooler): {db_connections:.0f}")
# Output: 150

# PostgreSQL configuration:
max_connections = 200  # 150 peak + 50 overhead (superuser, maintenance, spikes)

3.2. Memory capacity

PYTHON
# PostgreSQL memory calculation

total_ram_gb = 128
max_connections = 200  # from the compute sizing above

# Rule of thumb: 25% of RAM for shared_buffers
shared_buffers_gb = total_ram_gb * 0.25
print(f"shared_buffers: {shared_buffers_gb:.0f}GB")
# Output: 32GB

# effective_cache_size: 50-75% of RAM (planner hint, not an allocation)
effective_cache_size_gb = total_ram_gb * 0.75
print(f"effective_cache_size: {effective_cache_size_gb:.0f}GB")
# Output: 96GB

# work_mem per connection
# Formula: (RAM - shared_buffers) / max_connections / 2
available_ram_gb = total_ram_gb - shared_buffers_gb
work_mem_mb = available_ram_gb * 1024 / max_connections / 2
print(f"work_mem: {work_mem_mb:.0f}MB per connection")
# Output: 246MB per connection

# Validate worst-case memory usage
max_memory_usage_gb = shared_buffers_gb + (max_connections * work_mem_mb / 1024)
print(f"Maximum memory usage: {max_memory_usage_gb:.1f}GB")
# Output: 80.0GB (within 128GB limit ✅)

3.3. Storage capacity

PYTHON
# Storage planning

# Current data size
current_data_tb = 2  # TB

# Growth rate
annual_growth_rate = 0.20  # 20% per year
years_to_plan = 3

# Projected data size
projected_data_tb = current_data_tb * (1 + annual_growth_rate) ** years_to_plan
print(f"Data size in {years_to_plan} years: {projected_data_tb:.2f}TB")
# Output: 3.46TB

# WAL volume
# Estimate: 10% of data size per day
wal_per_day_gb = (current_data_tb * 1024) * 0.10
wal_retention_days = 7
total_wal_gb = wal_per_day_gb * wal_retention_days
print(f"WAL storage needed (7 days): {total_wal_gb:.0f}GB")
# Output: 1434GB ~ 1.4TB

# Backup storage
# Full backup + 7 days of WAL
backup_storage_tb = projected_data_tb + (total_wal_gb / 1024)
print(f"Backup storage needed: {backup_storage_tb:.2f}TB")
# Output: 4.86TB

# Total storage per instance
# Data + WAL + temp + overhead (20%)
storage_per_instance_tb = (projected_data_tb + (total_wal_gb / 1024)) * 1.20
print(f"Storage per instance: {storage_per_instance_tb:.2f}TB")
# Output: 5.83TB ~ 6TB

# Recommend: Provision 8TB for growth buffer

3.4. IOPS calculation

PYTHON
# IOPS requirements

# Given:
write_tps = 3000  # transactions per second (peak)
reads_per_write = 4  # average read I/Os generated per write transaction
checkpoint_interval_sec = 300  # 5 minutes

# Write IOPS
# Each transaction writes WAL immediately; dirty data pages are flushed
# later, spread across the checkpoint interval (rough amortization heuristic).
wal_iops = write_tps  # one WAL write per transaction
checkpoint_iops = write_tps / (checkpoint_interval_sec / 5)  # amortized data-page writes
total_write_iops = wal_iops + checkpoint_iops
print(f"Write IOPS: {total_write_iops:.0f}")
# Output: 3050

# Read IOPS
# Workload is 80% reads, 20% writes
total_tps = write_tps / 0.20  # total transactions = write TPS / write share
read_tps = total_tps * 0.80
read_iops = read_tps * reads_per_write
print(f"Read IOPS: {read_iops:.0f}")
# Output: 48000

# Total IOPS
total_iops = total_write_iops + read_iops
print(f"Total IOPS required: {total_iops:.0f}")
# Output: 51050

# Recommendation: provision 60K IOPS for headroom
# AWS io2 SSD: $0.065/IOPS/month
iops_provisioned = 60000

4. Cost Estimation

4.1. AWS infrastructure costs

TEXT
Compute (EC2):
├─ Leader: r6g.4xlarge (16 vCPU, 128GB RAM)
│  └─ $0.672/hour x 730 hours = $490/month
├─ Replica1: r6g.2xlarge (8 vCPU, 64GB RAM)
│  └─ $0.336/hour x 730 hours = $245/month
├─ Replica2: r6g.2xlarge (8 vCPU, 64GB RAM)
│  └─ $0.336/hour x 730 hours = $245/month
├─ DR Replica: r6g.xlarge (4 vCPU, 32GB RAM)
│  └─ $0.168/hour x 730 hours = $123/month
└─ Total compute: $1,103/month

Storage (EBS io2):
├─ Leader: 8TB @ $0.125/GB = $1,000/month
│  └─ IOPS: 20K @ $0.065/IOPS = $1,300/month
├─ Replica1: 6TB @ $0.125/GB = $750/month
│  └─ IOPS: 10K @ $0.065/IOPS = $650/month
├─ Replica2: 6TB @ $0.125/GB = $750/month
│  └─ IOPS: 10K @ $0.065/IOPS = $650/month
├─ DR: 6TB @ $0.125/GB = $750/month
│  └─ IOPS: 5K @ $0.065/IOPS = $325/month
└─ Total storage: $6,175/month

Backup Storage (S3):
├─ S3 Standard (30 days): 5TB @ $0.023/GB = $115/month
├─ S3 Glacier (7 years): 50TB @ $0.004/GB = $200/month
└─ Total backup storage: $315/month

Network:
├─ Data transfer out: 2TB @ $0.09/GB = $180/month
├─ Inter-AZ transfer: 5TB @ $0.01/GB = $50/month
└─ Total network: $230/month

Supporting Services:
├─ Application Load Balancer: $23/month
├─ NAT Gateway (2): $65/month
├─ Monitoring (Prometheus, Grafana): $100/month
└─ Total supporting: $188/month

Grand Total: $8,011/month

Reserved Instance Savings (1-year):
- Compute: $1,103 → $770 (30% savings)
- Revised total: $7,678/month
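The table above rolls up cleanly into a small cost model, which makes it easy to re-total after tweaking individual line items (figures copied from the estimate, in USD/month):

```python
# Roll-up of the monthly cost estimate above.
costs = {
    "compute":    {"leader": 490, "replica1": 245, "replica2": 245, "dr": 123},
    "storage":    {"leader_ebs": 1000, "leader_iops": 1300,
                   "replica1_ebs": 750, "replica1_iops": 650,
                   "replica2_ebs": 750, "replica2_iops": 650,
                   "dr_ebs": 750, "dr_iops": 325},
    "backup":     {"s3_standard": 115, "s3_glacier": 200},
    "network":    {"egress": 180, "inter_az": 50},
    "supporting": {"alb": 23, "nat_x2": 65, "monitoring": 100},
}

subtotals = {cat: sum(items.values()) for cat, items in costs.items()}
grand_total = sum(subtotals.values())

for cat, sub in subtotals.items():
    print(f"{cat:>10}: ${sub:,}/month")
print(f"Grand total: ${grand_total:,}/month")
# Grand total: $8,011/month
```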

4.2. Cost optimization opportunities

TEXT
Potential Savings:
1. Use gp3 instead of io2 for non-critical replicas
   Savings: ~$2,000/month

2. Use Spot instances for DR replica
   Savings: ~$90/month

3. Lifecycle S3 backups to Glacier faster (7 days vs 30 days)
   Savings: ~$50/month

4. Reduce DR replica size (only scale up during DR)
   Savings: ~$60/month

5. Use AWS Savings Plans
   Additional 10-15% savings: ~$750/month

Optimized Total: ~$4,728/month (savings applied to the $7,678 reserved-instance total; ~41% below the original $8,011/month)

5. Design Review Process

5.1. Design review checklist

TEXT
☐ Requirements Review
  ☐ Business requirements documented
  ☐ Technical requirements clear
  ☐ Non-functional requirements (performance, security)
  ☐ Constraints identified (budget, timeline)

☐ Architecture Review
  ☐ High-level diagram created
  ☐ Component responsibilities defined
  ☐ Data flow documented
  ☐ Network topology validated
  ☐ Security controls identified

☐ Capacity Planning
  ☐ Compute resources sized appropriately
  ☐ Storage capacity calculated
  ☐ IOPS requirements met
  ☐ Network bandwidth sufficient
  ☐ Growth projections considered

☐ High Availability
  ☐ SPoF (Single Points of Failure) eliminated
  ☐ Failover mechanisms tested
  ☐ RTO/RPO targets achievable
  ☐ DR plan documented
  ☐ Backup/restore validated

☐ Performance
  ☐ Query performance tested
  ☐ Load testing completed
  ☐ Bottlenecks identified
  ☐ Tuning recommendations documented
  ☐ Monitoring and alerting in place

☐ Security
  ☐ Encryption at rest and in transit
  ☐ Network segmentation
  ☐ Access controls (IAM, RBAC)
  ☐ Audit logging enabled
  ☐ Compliance requirements met

☐ Operational Readiness
  ☐ Runbooks created
  ☐ Monitoring dashboards configured
  ☐ Alerting rules defined
  ☐ On-call rotation established
  ☐ Training completed

☐ Cost Management
  ☐ Cost estimation completed
  ☐ Budget approved
  ☐ Cost optimization opportunities identified
  ☐ Ongoing cost monitoring plan

5.2. Review meeting agenda

TEXT
# Architecture Design Review Meeting

Date: 2024-11-25
Duration: 90 minutes
Attendees: DBA team, SRE, Dev team, Product manager, Security

Agenda:
1. Introduction (5 min)
   - Project overview
   - Review objectives

2. Requirements Review (10 min)
   - Business requirements
   - Q&A

3. Architecture Presentation (30 min)
   - High-level design
   - Component details
   - Data flow
   - Network design

4. Capacity Planning (15 min)
   - Compute, storage, IOPS calculations
   - Growth projections

5. Cost Estimation (10 min)
   - Infrastructure costs
   - Optimization opportunities

6. Security & Compliance (10 min)
   - Security controls
   - Compliance mapping

7. Open Discussion (10 min)
   - Concerns and risks
   - Alternative approaches

8. Action Items & Next Steps (5 min)
   - Assign owners
   - Set deadlines

Follow-up:
- Circulate meeting notes within 24 hours
- Address action items within 1 week
- Final approval from stakeholders

6. Risk Assessment

6.1. Risk matrix

Risk                 | Likelihood | Impact   | Severity | Mitigation
---------------------|------------|----------|----------|----------------------------------------------
Leader node failure  | Medium     | Low      | Medium   | Automatic failover with Patroni
Datacenter outage    | Low        | High     | Medium   | Multi-AZ deployment + DR site
Data corruption      | Low        | High     | Medium   | PITR backups, checksums enabled
Capacity exhaustion  | Medium     | Medium   | Medium   | Monitoring + auto-scaling
Security breach      | Low        | Critical | High     | Encryption, network segmentation, audit logs
Cost overrun         | Medium     | Medium   | Medium   | Budget alerts, cost optimization
Staff turnover       | High       | Medium   | Medium   | Documentation, cross-training
Vendor lock-in       | Low        | Medium   | Low      | Open-source tooling (Patroni) over managed RDS
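One common way to derive the severity column is to score likelihood x impact and bucket the product. A sketch of that scheme (the thresholds are an assumption, and a matrix like the one above also applies judgment, so individual cells can deliberately differ from any pure formula):

```python
# Severity derived from likelihood x impact scoring (illustrative buckets).
LIKELIHOOD = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

def severity(likelihood: str, impact: str) -> str:
    if impact == "Critical":
        return "High"  # critical impact dominates regardless of likelihood
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"

print(severity("Low", "Critical"))   # High
print(severity("Medium", "Medium"))  # Medium
print(severity("Low", "Medium"))     # Low
```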

6.2. Mitigation strategies

  1. Leader node failure
    • Patroni automatic failover (RTO < 30s)
    • Health checks every 10s
    • Synchronous replication to 1 replica
    • Runbook for manual intervention
  2. Datacenter outage
    • Multi-AZ deployment (2 AZs in primary region)
    • DR site in different region (us-west-2)
    • Quarterly DR drills
    • Documented failover procedures
  3. Data corruption
    • pg_checksums enabled
    • Daily full backups + continuous WAL archiving
    • PITR tested monthly
    • Backup retention: 30 days hot, 7 years cold
  4. Capacity exhaustion
    • Prometheus alerts at 70% CPU/memory/disk
    • PgBouncer for connection management
    • Read replicas for horizontal scaling
    • Annual capacity planning review
  5. Security breach
    • Encryption at rest (LUKS) and in transit (SSL/TLS)
    • Network segmentation (private subnets)
    • MFA for all admin access
    • Quarterly security audits
    • Intrusion detection system (IDS)
  6. Cost overrun
    • AWS Budgets with alerts at 80%, 100%, 120%
    • Monthly cost review meetings
    • Reserved instances for predictable workloads
    • Automatic shutdown of non-production environments
  7. Staff turnover
    • Comprehensive documentation (Confluence)
    • Runbooks for common tasks
    • Cross-training program
    • Bus factor > 2 for critical knowledge

7. Lab Exercises

Lab 1: Requirements gathering

Tasks:

  1. Interview stakeholders (role-play).
  2. Document business requirements.
  3. Define technical requirements.
  4. Identify constraints.
  5. Create requirements document.

Lab 2: Architecture design

Tasks:

  1. Create high-level architecture diagram.
  2. Design network topology.
  3. Document data flow.
  4. Define security controls.
  5. Present to team for review.

Lab 3: Capacity planning

Tasks:

  1. Calculate compute requirements.
  2. Estimate storage needs.
  3. Determine IOPS requirements.
  4. Plan for 3-year growth.
  5. Document assumptions.

Lab 4: Cost estimation

Tasks:

  1. Price out infrastructure on AWS/GCP/Azure.
  2. Compare managed vs self-hosted options.
  3. Identify cost optimization opportunities.
  4. Create budget proposal.
  5. Present to management.

8. Summary

Design Principles

  1. Simplicity: Start simple, add complexity as needed.
  2. Resilience: Eliminate single points of failure.
  3. Scalability: Plan for 3x growth.
  4. Security: Defense in depth.
  5. Observability: Monitor everything.
  6. Cost-effectiveness: Optimize for cost/performance ratio.
  7. Maintainability: Document and automate.

Key Deliverables

  1. Requirements document.
  2. Architecture diagrams.
  3. Capacity planning spreadsheet.
  4. Cost estimation.
  5. Risk assessment.
  6. Design review presentation.
  7. Runbooks and documentation.

Next Steps

Lesson 29 will cover Deploy Production-Ready Cluster:

  • Complete end-to-end deployment guide
  • Production deployment checklist
  • Operational runbooks
  • Knowledge transfer
  • Final assessment
