CloudTadaInsights

Lesson 28: HA Architecture Design


After this lesson, you will be able to:

  • Gather requirements for HA cluster design.
  • Create comprehensive architecture documents.
  • Perform capacity planning calculations.
  • Estimate infrastructure costs.
  • Conduct design reviews effectively.

1. Requirements Gathering

1.1. Business requirements template

TEXT
# PostgreSQL HA Requirements

## Business Context
- Application: [E-commerce platform]
- Users: [50M registered, 2M DAU]
- Business hours: [24/7 global]
- Peak traffic: [Black Friday, 10x normal]

## Availability Requirements
- Target uptime: [99.99% ≈ 52.6 min downtime/year]
- Planned maintenance windows: [None - zero-downtime required]
- Acceptable downtime per incident: [< 5 minutes]

## Performance Requirements
- Expected QPS: [5,000 average, 15,000 peak]
- Query latency target: [p50: 10ms, p95: 50ms, p99: 200ms]
- Write throughput: [1,000 TPS average, 3,000 TPS peak]

## Data Requirements
- Current data size: [2TB]
- Growth rate: [20% per year]
- Retention period: [7 years for compliance]
- Backup frequency: [Daily full, continuous WAL archiving]

## Disaster Recovery
- RPO (Recovery Point Objective): [< 5 minutes]
- RTO (Recovery Time Objective): [< 15 minutes]
- Geographic redundancy: [Required - multi-region]

## Security & Compliance
- Data encryption: [At rest and in transit]
- Audit logging: [All queries must be logged]
- Compliance: [PCI DSS, GDPR, SOC 2]

## Budget Constraints
- Infrastructure budget: [$10K-15K/month]
- Staffing: [1 Senior DBA, 1 Junior DBA]
- Acceptable cost per transaction: [< $0.0001]
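An availability target only becomes actionable once it is converted into a concrete downtime budget. A minimal sketch of that conversion (illustrative helper, not part of the template itself):

```python
# Downtime budget implied by an availability target, in minutes per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_min(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for target in (99.9, 99.95, 99.99):
    print(f"{target}%: {downtime_budget_min(target):.1f} min/year")
# 99.9%: 525.6 min/year
# 99.95%: 262.8 min/year
# 99.99%: 52.6 min/year
```

The 99.99% target in this template therefore leaves under an hour per year for all incidents combined, which is why the requirements also rule out planned maintenance windows.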

1.2. Technical requirements

TEXT
# Technical Requirements

## Workload Characteristics
- Read/Write ratio: [80% reads, 20% writes]
- Transaction types:
  * OLTP (transactional): 70%
  * OLAP (analytical): 30%
- Query complexity: [Mix of simple and complex joins]
- Largest tables: [orders: 500M rows, users: 50M rows]

## Integration Requirements
- Application stack: [Node.js, Python, Java]
- Connection pooling: [Required - PgBouncer]
- Load balancing: [HAProxy or cloud LB]
- Monitoring: [Prometheus + Grafana]
- CI/CD: [GitLab CI]

## Operational Requirements
- Deployment method: [Kubernetes preferred, VMs acceptable]
- Backup storage: [S3-compatible object storage]
- Log aggregation: [ELK or Loki]
- Alerting: [PagerDuty integration]
- Documentation: [Confluence/GitHub Wiki]

2. Architecture Design Document

2.1. High-level architecture

TEXT
Internet
                                       ↓
                              [CloudFlare CDN]
                                       ↓
                          [AWS Application Load Balancer]
                                       ↓
                    ┌──────────────────┴──────────────────┐
                    ↓                                      ↓
            [us-east-1a]                            [us-east-1b]
        ┌─────────────────┐                    ┌─────────────────┐
        │ Application     │                    │ Application     │
        │ Servers (ECS)   │                    │ Servers (ECS)   │
        └────────┬────────┘                    └────────┬────────┘
                 ↓                                       ↓
              [HAProxy]                              [HAProxy]
           10.0.1.100:5432                       10.0.2.100:5432
                 ↓                                       ↓
    ┌────────────┴───────────┬──────────────────────────┴────────┐
    ↓                        ↓                                    ↓
[PostgreSQL Leader]  [PostgreSQL Replica]              [PostgreSQL Replica]
  10.0.10.11             10.0.10.12                        10.0.20.11
  r6g.4xlarge            r6g.2xlarge                       r6g.2xlarge
  16 vCPU, 128GB         8 vCPU, 64GB                      8 vCPU, 64GB
  2TB io2 SSD            1TB io2 SSD                       1TB io2 SSD
  20K IOPS               10K IOPS                          10K IOPS
  Patroni + etcd         Patroni + etcd                    Patroni + etcd
  
Supporting Services:
├─ PgBouncer (connection pooling) - Co-located with PostgreSQL
├─ etcd cluster (3 nodes) - Co-located with PostgreSQL
├─ Prometheus (monitoring) - Separate t3.large
├─ Grafana (visualization) - Separate t3.small
├─ Backup storage - S3 bucket (s3://pg-backups)
└─ WAL archive - S3 bucket (s3://pg-wal-archive)

DR Site (us-west-2):
└─ [PostgreSQL Replica] - Async replication from us-east-1
   10.100.1.11
   r6g.xlarge (standby, can scale up on DR activation)
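HAProxy finds the current leader through Patroni's REST API (port 8008, as in the security groups below): `GET /primary` returns HTTP 200 only on the leader and 503 everywhere else, and HAProxy probes this with `option httpchk GET /primary`. A pure-function sketch of that routing decision, using example IPs from this design:

```python
# How HAProxy picks the write target: Patroni's REST API answers GET /primary
# with HTTP 200 only on the current leader, 503 elsewhere. Illustrative
# sketch of the decision HAProxy makes via its httpchk health probes.

def pick_write_target(health: dict) -> str:
    """Return the first node whose /primary check returned 200, else None."""
    for node, status in health.items():
        if status == 200:
            return node
    return None

# Example: leader on 10.0.10.11, two replicas answering 503
checks = {"10.0.10.11": 200, "10.0.10.12": 503, "10.0.20.11": 503}
print(pick_write_target(checks))  # 10.0.10.11
```

After a failover Patroni flips which node answers 200, so HAProxy redirects writes without any application change.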

2.2. Network design

TEXT
VPC: 10.0.0.0/16

Subnets:
├─ Public subnet (us-east-1a): 10.0.1.0/24
│  └─ NAT Gateway, Bastion host
├─ Private subnet (us-east-1a): 10.0.10.0/24
│  └─ PostgreSQL leader, replica1, etcd1
├─ Public subnet (us-east-1b): 10.0.2.0/24
│  └─ NAT Gateway
├─ Private subnet (us-east-1b): 10.0.20.0/24
│  └─ PostgreSQL replica2, etcd2
└─ Private subnet (us-east-1c): 10.0.30.0/24
   └─ etcd3

Security Groups:
├─ PostgreSQL SG:
│  ├─ Inbound: 5432 from Application SG
│  ├─ Inbound: 8008 from Monitoring SG (Patroni API)
│  ├─ Inbound: 5432 from PostgreSQL SG (replication)
│  └─ Outbound: All
├─ Application SG:
│  ├─ Inbound: 443 from ALB
│  └─ Outbound: 5432 to PostgreSQL SG
└─ Monitoring SG:
   ├─ Inbound: 9090 from VPN
   └─ Outbound: 8008 to PostgreSQL SG
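A subnet plan like this is easy to sanity-check with the standard library before any Terraform is written. A small sketch (the etcd3 address is a hypothetical example; the other IPs follow the plan above):

```python
# Verify that each planned host IP falls inside its intended subnet,
# using only the stdlib ipaddress module.
from ipaddress import ip_address, ip_network

placement = {
    "pg-leader":   ("10.0.10.11", "10.0.10.0/24"),  # private, us-east-1a
    "pg-replica2": ("10.0.20.11", "10.0.20.0/24"),  # private, us-east-1b
    "etcd3":       ("10.0.30.5",  "10.0.30.0/24"),  # private, us-east-1c
}

for host, (ip, subnet) in placement.items():
    ok = ip_address(ip) in ip_network(subnet)
    print(f"{host}: {ip} in {subnet} -> {ok}")
# all three print -> True
```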

2.3. Data flow diagram

TEXT
Write Path:
Client → ALB → Application → HAProxy (leader port) → PostgreSQL Leader
                                                    ↓
                                         [Synchronous replication]
                                                    ↓
                                            PostgreSQL Replica1
                                                    ↓
                                         [Asynchronous replication]
                                                    ↓
                                            PostgreSQL Replica2 (AZ2)
                                                    ↓
                                         [Asynchronous replication]
                                                    ↓
                                            PostgreSQL DR (us-west-2)

Read Path (80% of traffic):
Client → ALB → Application → HAProxy (replicas) → Round-robin:
                                                    ├─ PostgreSQL Replica1
                                                    ├─ PostgreSQL Replica2
                                                    └─ PostgreSQL Replica (DR, optional)

Backup Path:
PostgreSQL Leader → WAL archiving → S3 (wal-archive)
                  ↓
            Daily pg_basebackup → S3 (backups)
                  ↓
            Monthly full backup → S3 Glacier
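The backup path determines the achievable RPO: a region failure recovered via the streaming DR replica loses at most the replication lag, while recovery from the S3 WAL archive loses at most one `archive_timeout` interval. A rough estimate under assumed values (both settings below are illustrative, not taken from this design):

```python
# Worst-case RPO per recovery path (illustrative sketch).
archive_timeout_sec = 60   # assumed archive_timeout setting
observed_dr_lag_sec = 45   # assumed steady-state async lag to us-west-2

rpo_via_streaming_sec = observed_dr_lag_sec    # promote the DR replica
rpo_via_wal_archive_sec = archive_timeout_sec  # restore base backup + archived WAL

for path, rpo in [("streaming replica", rpo_via_streaming_sec),
                  ("WAL archive", rpo_via_wal_archive_sec)]:
    verdict = "OK" if rpo < 300 else "MISS"
    print(f"{path}: ~{rpo}s worst case (target < 300s: {verdict})")
```

Under these assumptions both paths sit comfortably inside the < 5 minute RPO from the requirements; measured lag should be monitored to confirm it in production.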

3. Capacity Planning

3.1. Compute capacity

PYTHON
# Capacity planning calculator
# (Little's Law: concurrency = arrival rate x service time)

# Given requirements:
avg_qps = 5000    # queries per second
peak_qps = 15000  # peak queries per second
avg_query_time_ms = 10  # milliseconds

# Concurrent connections needed
avg_connections = avg_qps * avg_query_time_ms / 1000
peak_connections = peak_qps * avg_query_time_ms / 1000

print(f"Average concurrent connections: {avg_connections:.0f}")
# Output: 50

print(f"Peak concurrent connections: {peak_connections:.0f}")
# Output: 150

# With connection pooling (transaction mode):
pooler_multiplier = 10  # each DB connection serves ~10 app connections
app_connections = peak_connections * pooler_multiplier  # what the apps open
db_connections = peak_connections                       # what PostgreSQL sees

print(f"Application connections: {app_connections:.0f}")
# Output: 1500

print(f"Database connections (with pooler): {db_connections:.0f}")
# Output: 150

# PostgreSQL configuration:
max_connections = 200  # 150 peak + 50 overhead (superuser, maintenance, spikes)

3.2. Memory capacity

PYTHON
# PostgreSQL memory calculation

total_ram_gb = 128
max_connections = 200  # from the compute sizing above

# Rule of thumb: 25% of RAM for shared_buffers
shared_buffers_gb = total_ram_gb * 0.25
print(f"shared_buffers: {shared_buffers_gb:.0f}GB")
# Output: 32GB

# effective_cache_size: 50-75% of RAM (planner hint, not an allocation)
effective_cache_size_gb = total_ram_gb * 0.75
print(f"effective_cache_size: {effective_cache_size_gb:.0f}GB")
# Output: 96GB

# work_mem per connection
# Formula: (RAM - shared_buffers) / max_connections / 2
available_ram_gb = total_ram_gb - shared_buffers_gb
work_mem_mb = available_ram_gb * 1024 / max_connections / 2
print(f"work_mem: {work_mem_mb:.0f}MB per connection")
# Output: 246MB per connection

# Validate worst-case memory usage
max_memory_usage_gb = shared_buffers_gb + (max_connections * work_mem_mb / 1024)
print(f"Maximum memory usage: {max_memory_usage_gb:.1f}GB")
# Output: 80.0GB (within 128GB limit ✅)

3.3. Storage capacity

PYTHON
# Storage planning

# Current data size
current_data_tb = 2  # TB

# Growth rate
annual_growth_rate = 0.20  # 20% per year
years_to_plan = 3

# Projected data size
projected_data_tb = current_data_tb * (1 + annual_growth_rate) ** years_to_plan
print(f"Data size in {years_to_plan} years: {projected_data_tb:.2f}TB")
# Output: 3.46TB

# WAL volume
# Estimate: 10% of data size per day
wal_per_day_gb = (current_data_tb * 1024) * 0.10
wal_retention_days = 7
total_wal_gb = wal_per_day_gb * wal_retention_days
print(f"WAL storage needed (7 days): {total_wal_gb:.0f}GB")
# Output: 1434GB ~ 1.4TB

# Backup storage
# Full backup + 7 days of WAL
backup_storage_tb = projected_data_tb + (total_wal_gb / 1024)
print(f"Backup storage needed: {backup_storage_tb:.2f}TB")
# Output: 4.86TB

# Total storage per instance
# Data + WAL + temp + overhead (20%)
storage_per_instance_tb = (projected_data_tb + (total_wal_gb / 1024)) * 1.20
print(f"Storage per instance: {storage_per_instance_tb:.2f}TB")
# Output: 5.83TB ~ 6TB

# Recommend: Provision 8TB for growth buffer

3.4. IOPS calculation

PYTHON
# IOPS requirements

# Given:
write_tps = 3000  # transactions per second (peak)
reads_per_write = 4  # average read I/Os generated per write transaction
checkpoint_interval_sec = 300  # 5 minutes

# Write IOPS
# Each transaction writes WAL immediately; dirty data pages are flushed
# later, spread across the checkpoint interval (rough amortization heuristic).
wal_iops = write_tps  # one WAL write per transaction
checkpoint_iops = write_tps / (checkpoint_interval_sec / 5)  # amortized data-page writes
total_write_iops = wal_iops + checkpoint_iops
print(f"Write IOPS: {total_write_iops:.0f}")
# Output: 3050

# Read IOPS
# Workload is 80% reads, 20% writes
total_tps = write_tps / 0.20  # total transactions = write TPS / write share
read_tps = total_tps * 0.80
read_iops = read_tps * reads_per_write
print(f"Read IOPS: {read_iops:.0f}")
# Output: 48000

# Total IOPS
total_iops = total_write_iops + read_iops
print(f"Total IOPS required: {total_iops:.0f}")
# Output: 51050

# Recommendation: provision 60K IOPS for headroom
# AWS io2 SSD: $0.065/IOPS/month
iops_provisioned = 60000

4. Cost Estimation

4.1. AWS infrastructure costs

TEXT
Compute (EC2):
├─ Leader: r6g.4xlarge (16 vCPU, 128GB RAM)
│  └─ $0.672/hour x 730 hours = $490/month
├─ Replica1: r6g.2xlarge (8 vCPU, 64GB RAM)
│  └─ $0.336/hour x 730 hours = $245/month
├─ Replica2: r6g.2xlarge (8 vCPU, 64GB RAM)
│  └─ $0.336/hour x 730 hours = $245/month
├─ DR Replica: r6g.xlarge (4 vCPU, 32GB RAM)
│  └─ $0.168/hour x 730 hours = $123/month
└─ Total compute: $1,103/month

Storage (EBS io2):
├─ Leader: 8TB @ $0.125/GB = $1,000/month
│  └─ IOPS: 20K @ $0.065/IOPS = $1,300/month
├─ Replica1: 6TB @ $0.125/GB = $750/month
│  └─ IOPS: 10K @ $0.065/IOPS = $650/month
├─ Replica2: 6TB @ $0.125/GB = $750/month
│  └─ IOPS: 10K @ $0.065/IOPS = $650/month
├─ DR: 6TB @ $0.125/GB = $750/month
│  └─ IOPS: 5K @ $0.065/IOPS = $325/month
└─ Total storage: $6,175/month

Backup Storage (S3):
├─ S3 Standard (30 days): 5TB @ $0.023/GB = $115/month
├─ S3 Glacier (7 years): 50TB @ $0.004/GB = $200/month
└─ Total backup storage: $315/month

Network:
├─ Data transfer out: 2TB @ $0.09/GB = $180/month
├─ Inter-AZ transfer: 5TB @ $0.01/GB = $50/month
└─ Total network: $230/month

Supporting Services:
├─ Application Load Balancer: $23/month
├─ NAT Gateway (2): $65/month
├─ Monitoring (Prometheus, Grafana): $100/month
└─ Total supporting: $188/month

Grand Total: $8,011/month

Reserved Instance Savings (1-year):
- Compute: $1,103 → $770 (30% savings)
- Revised total: $7,678/month
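The table above rolls up cleanly into a small cost model, which makes it easy to re-total after tweaking individual line items (figures copied from the estimate, in USD/month):

```python
# Roll-up of the monthly cost estimate above.
costs = {
    "compute":    {"leader": 490, "replica1": 245, "replica2": 245, "dr": 123},
    "storage":    {"leader_ebs": 1000, "leader_iops": 1300,
                   "replica1_ebs": 750, "replica1_iops": 650,
                   "replica2_ebs": 750, "replica2_iops": 650,
                   "dr_ebs": 750, "dr_iops": 325},
    "backup":     {"s3_standard": 115, "s3_glacier": 200},
    "network":    {"egress": 180, "inter_az": 50},
    "supporting": {"alb": 23, "nat_x2": 65, "monitoring": 100},
}

subtotals = {cat: sum(items.values()) for cat, items in costs.items()}
grand_total = sum(subtotals.values())

for cat, sub in subtotals.items():
    print(f"{cat:>10}: ${sub:,}/month")
print(f"Grand total: ${grand_total:,}/month")
# Grand total: $8,011/month
```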

4.2. Cost optimization opportunities

TEXT
Potential Savings:
1. Use gp3 instead of io2 for non-critical replicas
   Savings: ~$2,000/month

2. Use Spot instances for DR replica
   Savings: ~$90/month

3. Lifecycle S3 backups to Glacier faster (7 days vs 30 days)
   Savings: ~$50/month

4. Reduce DR replica size (only scale up during DR)
   Savings: ~$60/month

5. Use AWS Savings Plans
   Additional 10-15% savings: ~$750/month

Optimized Total: ~$4,728/month (savings applied to the $7,678 reserved-instance total; ~41% below the original $8,011/month)

5. Design Review Process

5.1. Design review checklist

TEXT
☐ Requirements Review
  ☐ Business requirements documented
  ☐ Technical requirements clear
  ☐ Non-functional requirements (performance, security)
  ☐ Constraints identified (budget, timeline)

☐ Architecture Review
  ☐ High-level diagram created
  ☐ Component responsibilities defined
  ☐ Data flow documented
  ☐ Network topology validated
  ☐ Security controls identified

☐ Capacity Planning
  ☐ Compute resources sized appropriately
  ☐ Storage capacity calculated
  ☐ IOPS requirements met
  ☐ Network bandwidth sufficient
  ☐ Growth projections considered

☐ High Availability
  ☐ SPoF (Single Points of Failure) eliminated
  ☐ Failover mechanisms tested
  ☐ RTO/RPO targets achievable
  ☐ DR plan documented
  ☐ Backup/restore validated

☐ Performance
  ☐ Query performance tested
  ☐ Load testing completed
  ☐ Bottlenecks identified
  ☐ Tuning recommendations documented
  ☐ Monitoring and alerting in place

☐ Security
  ☐ Encryption at rest and in transit
  ☐ Network segmentation
  ☐ Access controls (IAM, RBAC)
  ☐ Audit logging enabled
  ☐ Compliance requirements met

☐ Operational Readiness
  ☐ Runbooks created
  ☐ Monitoring dashboards configured
  ☐ Alerting rules defined
  ☐ On-call rotation established
  ☐ Training completed

☐ Cost Management
  ☐ Cost estimation completed
  ☐ Budget approved
  ☐ Cost optimization opportunities identified
  ☐ Ongoing cost monitoring plan

5.2. Review meeting agenda

TEXT
# Architecture Design Review Meeting

Date: 2024-11-25
Duration: 90 minutes
Attendees: DBA team, SRE, Dev team, Product manager, Security

Agenda:
1. Introduction (5 min)
   - Project overview
   - Review objectives

2. Requirements Review (10 min)
   - Business requirements
   - Q&A

3. Architecture Presentation (30 min)
   - High-level design
   - Component details
   - Data flow
   - Network design

4. Capacity Planning (15 min)
   - Compute, storage, IOPS calculations
   - Growth projections

5. Cost Estimation (10 min)
   - Infrastructure costs
   - Optimization opportunities

6. Security & Compliance (10 min)
   - Security controls
   - Compliance mapping

7. Open Discussion (10 min)
   - Concerns and risks
   - Alternative approaches

8. Action Items & Next Steps (5 min)
   - Assign owners
   - Set deadlines

Follow-up:
- Circulate meeting notes within 24 hours
- Address action items within 1 week
- Final approval from stakeholders

6. Risk Assessment

6.1. Risk matrix

Risk                 | Likelihood | Impact   | Severity | Mitigation
---------------------|------------|----------|----------|----------------------------------------------
Leader node failure  | Medium     | Low      | Medium   | Automatic failover with Patroni
Datacenter outage    | Low        | High     | Medium   | Multi-AZ deployment + DR site
Data corruption      | Low        | High     | Medium   | PITR backups, checksums enabled
Capacity exhaustion  | Medium     | Medium   | Medium   | Monitoring + auto-scaling
Security breach      | Low        | Critical | High     | Encryption, network segmentation, audit logs
Cost overrun         | Medium     | Medium   | Medium   | Budget alerts, cost optimization
Staff turnover       | High       | Medium   | Medium   | Documentation, cross-training
Vendor lock-in       | Low        | Medium   | Low      | Open-source tooling (Patroni) over managed RDS
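One common way to derive the severity column is to score likelihood x impact and bucket the product. A sketch of that scheme (the thresholds are an assumption, and a matrix like the one above also applies judgment, so individual cells can deliberately differ from any pure formula):

```python
# Severity derived from likelihood x impact scoring (illustrative buckets).
LIKELIHOOD = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

def severity(likelihood: str, impact: str) -> str:
    if impact == "Critical":
        return "High"  # critical impact dominates regardless of likelihood
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"

print(severity("Low", "Critical"))   # High
print(severity("Medium", "Medium"))  # Medium
print(severity("Low", "Medium"))     # Low
```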

6.2. Mitigation strategies

  1. Leader node failure
    • Patroni automatic failover (RTO < 30s)
    • Health checks every 10s
    • Synchronous replication to 1 replica
    • Runbook for manual intervention
  2. Datacenter outage
    • Multi-AZ deployment (2 AZs in primary region)
    • DR site in different region (us-west-2)
    • Quarterly DR drills
    • Documented failover procedures
  3. Data corruption
    • pg_checksums enabled
    • Daily full backups + continuous WAL archiving
    • PITR tested monthly
    • Backup retention: 30 days hot, 7 years cold
  4. Capacity exhaustion
    • Prometheus alerts at 70% CPU/memory/disk
    • PgBouncer for connection management
    • Read replicas for horizontal scaling
    • Annual capacity planning review
  5. Security breach
    • Encryption at rest (LUKS) and in transit (SSL/TLS)
    • Network segmentation (private subnets)
    • MFA for all admin access
    • Quarterly security audits
    • Intrusion detection system (IDS)
  6. Cost overrun
    • AWS Budgets with alerts at 80%, 100%, 120%
    • Monthly cost review meetings
    • Reserved instances for predictable workloads
    • Automatic shutdown of non-production environments
  7. Staff turnover
    • Comprehensive documentation (Confluence)
    • Runbooks for common tasks
    • Cross-training program
    • Bus factor > 2 for critical knowledge

7. Lab Exercises

Lab 1: Requirements gathering

Tasks:

  1. Interview stakeholders (role-play).
  2. Document business requirements.
  3. Define technical requirements.
  4. Identify constraints.
  5. Create requirements document.

Lab 2: Architecture design

Tasks:

  1. Create high-level architecture diagram.
  2. Design network topology.
  3. Document data flow.
  4. Define security controls.
  5. Present to team for review.

Lab 3: Capacity planning

Tasks:

  1. Calculate compute requirements.
  2. Estimate storage needs.
  3. Determine IOPS requirements.
  4. Plan for 3-year growth.
  5. Document assumptions.

Lab 4: Cost estimation

Tasks:

  1. Price out infrastructure on AWS/GCP/Azure.
  2. Compare managed vs self-hosted options.
  3. Identify cost optimization opportunities.
  4. Create budget proposal.
  5. Present to management.

8. Summary

Design Principles

  1. Simplicity: Start simple, add complexity as needed.
  2. Resilience: Eliminate single points of failure.
  3. Scalability: Plan for 3x growth.
  4. Security: Defense in depth.
  5. Observability: Monitor everything.
  6. Cost-effectiveness: Optimize for cost/performance ratio.
  7. Maintainability: Document and automate.

Key Deliverables

  1. Requirements document.
  2. Architecture diagrams.
  3. Capacity planning spreadsheet.
  4. Cost estimation.
  5. Risk assessment.
  6. Design review presentation.
  7. Runbooks and documentation.

Next Steps

Lesson 29 will cover Deploy Production-Ready Cluster:

  • Complete end-to-end deployment guide
  • Production deployment checklist
  • Operational runbooks
  • Knowledge transfer
  • Final assessment
