Planned Switchover
Learning Objectives
After this lesson, you will:
- Distinguish between switchover and failover
- Perform planned switchover safely
- Understand graceful vs immediate switchover
- Minimize downtime during maintenance
- Automate switchover for rolling updates
- Handle switchover in production
1. Switchover Overview
1.1. What is switchover?
Switchover = a planned promotion of a replica to primary.
Comparison with failover:
| Aspect | Failover | Switchover |
|---|---|---|
| Trigger | Primary failure (unplanned) | Manual/scheduled (planned) |
| Downtime | 30-60 seconds | 0-10 seconds |
| Data loss | Possible (if async) | Zero (controlled) |
| Control | Automatic | Manual/scripted |
| Timing | Unpredictable | Scheduled |
1.2. When is switchover needed?
Common scenarios:
A. Hardware maintenance
B. Software upgrades
C. Database migration
D. Datacenter migration
E. Testing
1.3. Switchover Benefits
✅ Zero data loss - All transactions committed before switch
✅ Controlled timing - During maintenance window
✅ Lower risk - Coordinated, tested process
✅ Minimal downtime - 0-10 seconds vs 30-60 for failover
✅ Reversible - Can switchover back if issues
2. Types of Switchover
2.1. Graceful Switchover (Default)
Process:
Command:
2.2. Immediate Switchover
Process:
Command:
2.3. Scheduled Switchover
Process:
Command:
3. Switchover Prerequisites
3.1. Cluster health check
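A minimal sketch of the health check as a script. It assumes the member records come from `patronictl list -f json` and carry `Member`, `Role` and `State` keys, with `Leader` as the leader role and `running`/`streaming` as healthy states. Verify those field names and values against your Patroni version. `cluster_health_problems` is a hypothetical helper.

```python
from __future__ import annotations

import json
from typing import Any

# Assumed healthy State values; check what your Patroni version prints.
HEALTHY_STATES = {"running", "streaming"}


def cluster_health_problems(members: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means 'looks healthy'."""
    problems: list[str] = []
    leaders = [m for m in members if m.get("Role") == "Leader"]
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}")
    if len(members) < 2:
        problems.append("no replica available to switch over to")
    for m in members:
        if m.get("State") not in HEALTHY_STATES:
            problems.append(f"{m.get('Member')}: state is {m.get('State')!r}")
    return problems


if __name__ == "__main__":
    # Input shaped like `patronictl list -f json` output (hypothetical values).
    sample = json.loads("""[
      {"Member": "node1", "Role": "Leader", "State": "running"},
      {"Member": "node2", "Role": "Replica", "State": "streaming"},
      {"Member": "node3", "Role": "Replica", "State": "stopped"}
    ]""")
    for problem in cluster_health_problems(sample):
        print(problem)
```

An empty result is the go signal. Any printed line is a reason to stop before switching over.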
3.2. Replication lag check
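Replication lag can be computed from two LSN readings: `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the candidate. An LSN's text form (`X/Y`) holds the high and low 32 bits of a 64-bit WAL byte position, so subtracting the two gives the bytes still to be replayed. This is the same value `pg_wal_lsn_diff` returns in SQL. The helper names are hypothetical. The two readings are taken at slightly different moments, so on a busy primary the lag rarely reads exactly 0 unless writes are paused.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/3000060' into an absolute byte position."""
    high, low = lsn.split("/")
    # High 32 bits / low 32 bits of a 64-bit WAL position, both in hex.
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (same idea as pg_wal_lsn_diff)."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_replay_lsn)


def ready_for_switchover(primary_lsn: str, replica_replay_lsn: str,
                         max_lag_bytes: int = 0) -> bool:
    """True when the candidate has replayed everything up to the threshold."""
    return replay_lag_bytes(primary_lsn, replica_replay_lsn) <= max_lag_bytes


if __name__ == "__main__":
    # Hypothetical readings from the primary and the candidate replica.
    print(replay_lag_bytes("0/5000A28", "0/5000000"))  # 0xA28 = 2600 bytes
```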
3.3. Target candidate check
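The candidate check can also be scripted. This sketch assumes member records shaped like Patroni's `GET /cluster` response, with `name`, `role`, `state`, `lag` (bytes) and an optional `tags` dict. Those field names and units are assumptions. It encodes the conditions that should rule a node out: it is already the leader, it is not running or streaming, it is tagged `nofailover` (which stops Patroni from ever promoting it), or its lag is above a threshold. `candidate_problems` is hypothetical.

```python
from __future__ import annotations

from typing import Any

HEALTHY_STATES = {"running", "streaming"}  # assumed state strings


def candidate_problems(
    members: list[dict[str, Any]], candidate: str, max_lag_bytes: int = 0
) -> list[str]:
    """Explain why `candidate` should not be promoted; empty list means eligible."""
    member = next((m for m in members if m.get("name") == candidate), None)
    if member is None:
        return [f"{candidate}: not a member of this cluster"]
    problems: list[str] = []
    if member.get("role") == "leader":
        problems.append(f"{candidate}: is already the leader")
    if member.get("state") not in HEALTHY_STATES:
        problems.append(f"{candidate}: state is {member.get('state')!r}")
    if (member.get("tags") or {}).get("nofailover"):
        problems.append(f"{candidate}: tagged nofailover, Patroni will not promote it")
    lag = member.get("lag")
    # Unknown or non-numeric lag is treated as "not proven caught up".
    if not isinstance(lag, int) or lag > max_lag_bytes:
        problems.append(f"{candidate}: lag {lag!r} exceeds {max_lag_bytes} bytes")
    return problems
```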
3.4. Connection availability
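A scriptable reachability check, as a sketch. It only proves that a TCP connection can be opened to a port, for example PostgreSQL (5432 by default) or the Patroni REST API (8008 by default). It does not prove that logins succeed, so `pg_isready` or a real `SELECT 1` remains a stronger check. `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Typical use is a loop over every node's database and REST API ports, refusing to continue if any check returns False.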
4. Performing Switchover
4.1. Interactive Switchover (Recommended)
Step-by-step:
Output:
4.2. Non-interactive Switchover
Direct command:
4.3. Scheduled Switchover
Schedule for maintenance window:
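The time for a scheduled switchover can be computed instead of typed by hand. patronictl's `--scheduled` option and the REST API's `scheduled_at` field take an ISO 8601 timestamp. This sketch emits one with an explicit UTC offset, which avoids confusion when the operator and the servers sit in different timezones. `switchover_time` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def switchover_time(minutes_from_now: int, now: datetime | None = None) -> str:
    """ISO 8601 UTC timestamp `minutes_from_now` minutes ahead of `now`."""
    base = now if now is not None else datetime.now(timezone.utc)
    if base.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    target = (base + timedelta(minutes=minutes_from_now)).astimezone(timezone.utc)
    # Second precision is enough for a maintenance window.
    return target.replace(microsecond=0).isoformat()


if __name__ == "__main__":
    print(switchover_time(2))  # Lab 2: two minutes from now
```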
Verify scheduled switchover:
Cancel scheduled switchover:
4.4. Switchover with REST API
Trigger via API:
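A standard-library sketch of the API call. It builds, but does not send, a `POST /switchover` request with a JSON body holding `leader`, `candidate` and an optional `scheduled_at`. The node address is hypothetical, and `build_switchover_request` is a hypothetical helper.

```python
from __future__ import annotations

import json
import urllib.request


def build_switchover_request(
    api_url: str, leader: str, candidate: str, scheduled_at: str | None = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /switchover request for Patroni's REST API."""
    body = {"leader": leader, "candidate": candidate}
    if scheduled_at is not None:
        body["scheduled_at"] = scheduled_at  # ISO 8601 with a UTC offset
    return urllib.request.Request(
        api_url.rstrip("/") + "/switchover",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical node address; 8008 is Patroni's default REST API port.
    req = build_switchover_request("http://10.0.0.11:8008", "node1", "node2")
    print(req.get_method(), req.full_url, req.data.decode())
    # To actually send it: urllib.request.urlopen(req, timeout=60)
```

If REST API authentication is enabled, the request also needs an `Authorization` header. Patroni documents `DELETE /switchover` for cancelling a scheduled switchover.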
5. Switchover Timeline
5.1. Detailed flow
5.2. What happens to active connections?
During switchover:
Application behavior:
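When the old primary is demoted, its sessions are typically terminated. Writes that still reach it after it becomes a replica are rejected until routing points at the new primary. Applications can treat those errors as transient by looking at the SQLSTATE; psycopg 3, for example, exposes it as `sqlstate` on exceptions. The set below is a judgment call for this sketch, not an official list.

```python
# SQLSTATEs treated as "retry on a fresh connection" during a switchover.
RETRYABLE_SQLSTATES = {
    "57P01",  # admin_shutdown: session terminated while the old primary stops
    "57P02",  # crash_shutdown
    "57P03",  # cannot_connect_now: server is starting up or shutting down
    "25006",  # read_only_sql_transaction: write reached a node that is now a replica
}


def is_retryable(sqlstate: str) -> bool:
    """Class 08 (connection exception) plus the codes above count as transient."""
    return sqlstate.startswith("08") or sqlstate in RETRYABLE_SQLSTATES
```

Errors outside this set, such as constraint violations, indicate a real application problem and should not be retried blindly.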
6. Verification After Switchover
6.1. Cluster status
6.2. Replication status
6.3. Write test
6.4. Timeline verification
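A promotion starts a new timeline. After one switchover, the expected leader should hold the leader role and every member should report the old timeline plus one once the replicas have reconnected. This sketch compares two snapshots, assuming records with `name`, `role` and `timeline` keys modelled on Patroni's `GET /cluster` output (an assumption). `verify_after_switchover` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def verify_after_switchover(
    before: list[dict[str, Any]], after: list[dict[str, Any]], expected_leader: str
) -> list[str]:
    """Compare cluster snapshots taken before and after a single switchover."""
    problems: list[str] = []
    leaders = [m["name"] for m in after if m.get("role") == "leader"]
    if leaders != [expected_leader]:
        problems.append(f"expected leader {expected_leader!r}, found {leaders}")
    # One promotion means exactly one timeline switch for the whole cluster.
    old_timeline = max(m["timeline"] for m in before)
    for m in after:
        if m.get("timeline") != old_timeline + 1:
            problems.append(
                f"{m['name']}: timeline {m.get('timeline')}, expected {old_timeline + 1}"
            )
    return problems
```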
7. Switchover Best Practices
7.1. Pre-switchover checklist
7.2. Minimize downtime strategies
A. Connection pooler
B. Read-replica routing
C. Application-level retry
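Application-level retry usually means capped exponential backoff with jitter, so that many clients do not reconnect in lockstep the moment the new primary appears. This is a generic sketch. `RetryableError` is a hypothetical marker that your data layer would raise for transient errors. The defaults allow about 16 seconds of sleeping in the worst case, which is meant to outlast a switchover window of a few seconds.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker raised by the data layer for transient failures."""


def with_retry(
    op: Callable[[], T],
    attempts: int = 8,
    base_delay: float = 0.25,
    max_delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying RetryableError with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter spreads reconnects out
    raise RuntimeError("unreachable")
```

Only retry idempotent operations or whole transactions. A COMMIT that was interrupted mid-flight may or may not have been applied on the old primary.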
7.3. Communication plan
Before switchover:
During switchover:
After switchover:
8. Troubleshooting Switchover
8.1. Issue: Switchover command hangs
Symptoms: patronictl switchover never completes.
Diagnosis:
Solution:
8.2. Issue: Candidate not eligible
Symptoms: Error "candidate is not eligible".
Diagnosis:
Solution:
8.3. Issue: Old primary won't demote
Symptoms: Switchover fails, old primary still leader.
Diagnosis:
Solution:
8.4. Issue: Replication broken after switchover
Symptoms: Old primary not replicating from new primary.
Diagnosis:
Solution:
9. Switchover Automation
9.1. Scripted switchover
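A scripted switchover can wrap patronictl in a timeout, so that a hanging command becomes a clear failure instead of a stuck job. This sketch assumes Patroni 3.x or later, where `switchover` accepts `--leader`, `--candidate`, `--scheduled` and `--force` (older releases spelled the first option `--master`). `--force` skips the interactive confirmation prompt. The config path, cluster and member names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def switchover_argv(
    config: str, cluster: str, leader: str, candidate: str, scheduled: str | None = None
) -> list[str]:
    """Build a non-interactive patronictl switchover command line."""
    argv = ["patronictl", "-c", config, "switchover",
            "--leader", leader, "--candidate", candidate,
            "--force"]  # skip the confirmation prompt
    if scheduled is not None:
        argv += ["--scheduled", scheduled]
    argv.append(cluster)
    return argv


def run_switchover(argv: list[str], timeout: float = 300) -> int:
    """Run the command and return its exit code; a hang becomes exit code 124."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"did not finish within {timeout}s: {shlex.join(argv)}", file=sys.stderr)
        return 124  # same code coreutils `timeout` uses
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    # Prints the command only; pass it to run_switchover() to execute it.
    print(shlex.join(switchover_argv("/etc/patroni/patroni.yml", "postgres", "node1", "node2")))
```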
9.2. Ansible playbook
Run:
9.3. CI/CD integration
10. Rolling Updates with Switchover
10.1. Update strategy
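Scenario: Apply a PostgreSQL minor update (for example, from one 17.x release to a newer 17.x release). A major-version upgrade such as 17 → 18 cannot be rolled through streaming replicas, because physical replication requires the same major version on every node. It needs pg_upgrade or logical replication instead.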
Steps:
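The rolling order is: update the replicas first, move leadership to an already-updated replica, then update the old leader while it runs as a replica. It can be expressed as a small planning function. This sketch uses a hypothetical `rolling_update_plan` that works on member names only. Replicas are visited in descending name order to match Lab 4. In practice, each step should wait until the updated node is streaming again before the next one starts.

```python
from __future__ import annotations


def rolling_update_plan(
    members: list[str], leader: str, candidate: str | None = None
) -> list[tuple[str, str]]:
    """Return (action, node) steps that update the current leader last."""
    replicas = sorted((m for m in members if m != leader), reverse=True)
    if not replicas:
        raise ValueError("a rolling update needs at least one replica")
    # Default target: the last replica updated, so it is fresh when promoted.
    target = candidate if candidate is not None else replicas[-1]
    if target not in replicas:
        raise ValueError(f"candidate {target!r} is not a replica")
    steps = [("update", r) for r in replicas]
    steps.append(("switchover", target))
    steps.append(("update", leader))
    return steps
```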
10.2. Kernel update example
11. Lab Exercises
Lab 1: Basic switchover
Tasks:
- Check current primary: patronictl list
- Perform switchover: patronictl switchover postgres
- Measure downtime with a continuous query loop
- Verify new topology
- Document observations
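For Lab 1, the probe loop (for example, `SELECT 1` every 0.5 s) produces `(timestamp, succeeded)` samples. The downtime is the longest stretch from the first failed probe to the next successful one. This sketch computes it from those samples. `longest_outage` is a hypothetical helper, and its result is only accurate to about one probe interval.

```python
from __future__ import annotations


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Longest gap from a failed probe to the next successful one, in seconds.

    `samples` are (timestamp_seconds, succeeded) pairs in time order. An outage
    still open at the end is measured up to the last sample.
    """
    longest = 0.0
    outage_start: float | None = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # first failure of a new outage
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest
```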
Lab 2: Scheduled switchover
Tasks:
- Schedule switchover for 2 minutes from now
- Monitor logs during wait period
- Observe automatic execution
- Cancel a scheduled switchover (repeat and test cancel)
Lab 3: Forced vs graceful
Tasks:
- Create a long-running query: SELECT pg_sleep(300);
- Attempt a graceful switchover (observe the wait)
- Cancel and retry with --force (which skips patronictl's confirmation prompt)
- Compare behavior and downtime
Lab 4: Rolling update simulation
Tasks:
- Start with 3-node cluster
- "Update" node3 (simulate by restarting)
- "Update" node2
- Switchover to node2
- "Update" node1
- Verify all nodes operational
Lab 5: Switchover under load
Tasks:
- Start pgbench: pgbench -c 10 -T 300
- During load, perform a switchover
- Analyze pgbench output for errors
- Calculate success rate
- Test with connection pooler (PgBouncer)
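For the success-rate step of Lab 5, this sketch reads pgbench's summary. It assumes that "number of transactions actually processed" counts successful transactions and that the "number of failed transactions" line (printed by PostgreSQL 15 and later) may be missing on older versions. Both are assumptions about the output format. Connection errors caused by the switchover usually abort the affected pgbench clients rather than appearing as failed transactions, so also check the output for aborted clients. `pgbench_success_rate` is hypothetical.

```python
import re


def pgbench_success_rate(output: str) -> float:
    """Fraction of transactions that succeeded, parsed from a pgbench summary."""
    processed_match = re.search(r"number of transactions actually processed: (\d+)", output)
    if processed_match is None:
        raise ValueError("no 'transactions actually processed' line found")
    processed = int(processed_match.group(1))
    # Line only exists in newer pgbench versions; treat absence as zero failures.
    failed_match = re.search(r"number of failed transactions: (\d+)", output)
    failed = int(failed_match.group(1)) if failed_match else 0
    total = processed + failed
    if total == 0:
        raise ValueError("pgbench reported no transactions")
    return processed / total
```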
12. Summary
Key Concepts
✅ Switchover = Planned, controlled role change
✅ Graceful = Wait for transactions (slower, safer)
✅ Immediate = Force termination (faster, riskier)
✅ Scheduled = Automated at specific time
✅ Near-zero downtime = Achievable with proper architecture (connection pooler, application-level retry)
Switchover vs Failover
| Aspect | Switchover | Failover |
|---|---|---|
| Planning | Scheduled | Unplanned |
| Control | Manual | Automatic |
| Downtime | 0-10s | 30-60s |
| Data loss | None | Possible |
| Reversible | Yes | No |
Best Practices
- ✅ Test in staging first
- ✅ Schedule during low-traffic windows
- ✅ Use graceful mode (default)
- ✅ Verify lag = 0 before switchover
- ✅ Monitor during process
- ✅ Have rollback plan
- ✅ Communicate with stakeholders
- ✅ Document procedure
Next Steps
Lesson 15 covers Recovering Failed Nodes:
- Rejoin old primary after failover
- pg_rewind usage and scenarios
- Full rebuild with pg_basebackup
- Timeline divergence resolution
- Split-brain recovery