Bootstrap PostgreSQL Cluster
Objectives
After this lesson, you will:
- Understand the Patroni cluster bootstrap process
- Start Patroni for the first time on 3 nodes
- Check cluster status with patronictl
- Verify replication is working
- Troubleshoot common issues
- Test basic failover
1. Pre-Bootstrap Checklist
1.1. Verify prerequisites
Before starting Patroni, verify all components are ready:
1.2. Network connectivity test
Verify connectivity between nodes:
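A minimal sketch of such a check, assuming hypothetical host names node1, node2, node3 and etcd listening on its default client port 2379. Before bootstrap only etcd is expected to answer; the PostgreSQL and Patroni REST ports start listening once Patroni is running.

```shell
NODES="${NODES:-node1 node2 node3}"   # hypothetical host names, replace with yours
ETCD_PORT="${ETCD_PORT:-2379}"        # etcd's default client port

# Return 0 if a TCP connection to host $1, port $2 opens within 3 seconds.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one OK/FAIL line per node for the etcd client port.
check_nodes() {
  for host in $NODES; do
    if check_port "$host" "$ETCD_PORT"; then
      echo "OK   $host:$ETCD_PORT"
    else
      echo "FAIL $host:$ETCD_PORT"
    fi
  done
}
```

Run check_nodes on each of the three machines; any FAIL line points at a firewall, DNS, or etcd problem to fix before starting Patroni.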
1.3. Clean data directories
If the data directory is not empty, delete its contents for a fresh start:
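One cautious way to do this is sketched below. It assumes Patroni runs as a systemd unit named patroni and that the data directory path you pass matches data_dir in your Patroni configuration; both are assumptions, not details taken from this lesson. The helper refuses to delete anything when it cannot read the directory.

```shell
# Return 0 if directory $1 is missing or empty, 1 if it has entries,
# 2 if it cannot be read (for example, wrong user).
dir_is_empty() {
  [ -d "$1" ] || return 0
  entries="$(ls -A "$1")" || return 2
  [ -z "$entries" ]
}

# Stop Patroni and remove old contents, but only when there is something to remove.
clean_data_dir() {
  dir_is_empty "$1"
  case $? in
    0) echo "$1 is already empty" ;;
    1) sudo systemctl stop patroni \
         && sudo find "$1" -mindepth 1 -delete \
         && echo "$1 cleaned" ;;
    *) echo "cannot read $1; rerun as postgres or root" >&2; return 1 ;;
  esac
}

# Example (hypothetical path):
# clean_data_dir /var/lib/postgresql/data
```

Deleting the data directory is irreversible, so only do this on nodes that hold no data you need.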
2. Understanding Bootstrap Process
2.1. Bootstrap flow
2.2. Race condition prevention
Patroni uses the DCS to prevent multiple nodes from initializing the cluster:
If 2 nodes start simultaneously:
- Faster node acquires the /initialize key
- Other node sees key already exists → waits and clones from leader
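The create-if-absent idea behind this can be illustrated locally. The sketch below is an analogy only, not Patroni's code: a file created with the shell's noclobber option stands in for the /initialize key, and try_initialize is a hypothetical helper name.

```shell
# Analogy: the file $1 plays the role of the /initialize key in the DCS.
# With noclobber (set -C), the redirect fails if the file already exists,
# so only the first caller succeeds.
try_initialize() {
  # $1 = file standing in for the key, $2 = node name
  if ( set -C; echo "$2" > "$1" ) 2>/dev/null; then
    echo "$2: acquired key, will run initdb"
  else
    echo "$2: key held by $(cat "$1"), will clone from leader"
  fi
}
```

Calling try_initialize twice with the same file and two different node names shows the first caller winning and the second falling back to cloning, which mirrors the behavior described above.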
3. Bootstrap Cluster - Step by Step
3.1. Start Patroni on Node 1
Terminal on Node 1:
Expected logs:
Verify Node 1:
3.2. Verify in etcd
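A sketch of how the keys could be inspected, assuming Patroni talks to etcd through the v3 API, uses its default namespace /service, and has a scope (cluster name) of postgres. The scope is a hypothetical value; use the one from your Patroni configuration.

```shell
# Build the key prefix /<namespace>/<scope>/ under which Patroni keeps cluster state.
# Defaults: namespace "/service" (Patroni default), scope "postgres" (hypothetical).
cluster_prefix() {
  echo "${1:-/service}/${2:-postgres}/"
}

# List the cluster's keys without their values (etcd v3 API).
list_cluster_keys() {
  ETCDCTL_API=3 etcdctl get --prefix --keys-only "$(cluster_prefix "$@")"
}

# Example:
# list_cluster_keys            # uses /service/postgres/
# list_cluster_keys /db demo   # uses /db/demo/
```

After Node 1 has bootstrapped, the listing should include keys such as initialize, leader, config, and members/<node name> under that prefix.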
3.3. Start Patroni on Node 2
Terminal on Node 2:
Expected logs:
Verify Node 2:
3.4. Start Patroni on Node 3
Terminal on Node 3:
Expected logs: Similar to Node 2.
Verify Node 3:
4. Verify Cluster Status
4.1. Using patronictl
Column meanings:
- Member: Node name
- Host: Connection address
- Role: Leader (primary) or Replica
- State: running, streaming, in archive recovery
- TL: Timeline (should be same for all)
- Lag in MB: Replication lag
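The columns above can also be checked from a script. The sketch below assumes the table layout listed here (Role in the third column, TL in the fifth) and a hypothetical config path /etc/patroni/patroni.yml; count_role and timelines are illustrative helper names.

```shell
# Count table rows whose Role column equals $1.
# Reads `patronictl list` output on stdin.
count_role() {
  awk -F'|' -v role="$1" '
    NF >= 7 {
      r = $4
      gsub(/^ +| +$/, "", r)   # trim padding around the cell
      if (r == role) n++
    }
    END { print n + 0 }'
}

# Print the distinct timeline values found in the TL column.
timelines() {
  awk -F'|' 'NF >= 7 { t = $6; gsub(/ /, "", t); if (t ~ /^[0-9]+$/) print t }' | sort -u
}

# Example (hypothetical config path):
# patronictl -c /etc/patroni/patroni.yml list | count_role Leader
# patronictl -c /etc/patroni/patroni.yml list | timelines
```

A healthy three-node cluster should report one Leader, two Replica rows, and a single timeline value.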
4.2. Check topology
4.3. Using REST API
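As a sketch, the role of a node can be derived from HTTP status codes alone: Patroni answers GET /primary with 200 only on the leader and GET /replica with 200 only on a running replica. The code assumes the default REST port 8008 and hypothetical host names; older Patroni releases may expose the leader check under a different path.

```shell
# Print the HTTP status code for endpoint $2 on host $1 (port 8008 assumed).
# curl prints 000 when the connection itself fails.
http_code() {
  curl -s -o /dev/null -w '%{http_code}' "http://$1:8008/$2"
}

# Map the status codes of /primary ($1) and /replica ($2) to a role name.
classify_role() {
  if [ "$1" = "200" ]; then
    echo "leader"
  elif [ "$2" = "200" ]; then
    echo "replica"
  else
    echo "unknown"
  fi
}

# Query both endpoints on host $1 and print its role.
node_role() {
  classify_role "$(http_code "$1" primary)" "$(http_code "$1" replica)"
}

# Example (hypothetical host names):
# for h in node1 node2 node3; do echo "$h: $(node_role "$h")"; done
```

Load balancers commonly use the same status-code convention for health checks, which is why checking codes rather than parsing the JSON body is often enough.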
4.4. Check replication from PostgreSQL
On primary (node1):
Output:
On replicas (node2, node3):
4.5. Verify replication lag
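Lag can be expressed in bytes as the difference between two LSNs. The sketch below does that arithmetic in the shell so the calculation is visible; it assumes you obtain the values with pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on a replica (PostgreSQL 10 or later). The helper names are illustrative.

```shell
# Convert a PostgreSQL LSN such as 0/3000060 to a byte position.
# The part before the slash is the high 32 bits, the part after is the low 32 bits.
lsn_to_bytes() {
  hi="${1%%/*}"
  lo="${1##*/}"
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# Byte lag between the primary's current LSN ($1) and a replica's replay LSN ($2).
lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example (hypothetical host names):
# primary_lsn="$(psql -h node1 -U postgres -Atc 'SELECT pg_current_wal_lsn()')"
# replica_lsn="$(psql -h node2 -U postgres -Atc 'SELECT pg_last_wal_replay_lsn()')"
# lag_bytes "$primary_lsn" "$replica_lsn"
```

On the server side, pg_wal_lsn_diff() performs the same subtraction, so in SQL the lag per replica can be read directly from pg_stat_replication.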
5. Test Basic Operations
5.1. Create test database and table
On the primary (you can connect from any node; patronictl will route to the primary):
5.2. Verify replication
On replica (node2 or node3):
5.3. Test continuous replication
Terminal 1 (primary - node1):
Terminal 2 (replica - node2):
The row count on the replica should increase every second → replication is working!
6. Common Bootstrap Issues
6.1. Issue: Patroni won't start
Symptoms:
Check logs:
Common causes & solutions:
A. Config file syntax error
Solution:
B. Cannot connect to etcd
Solution:
C. Permission denied on data directory
Solution:
D. Port already in use
Solution:
6.2. Issue: Cluster won't initialize
Symptoms: Patroni starts but does not initialize the cluster.
Check logs:
Common causes:
A. Data directory not empty
Solution:
B. Initialize key stuck in etcd
Solution:
6.3. Issue: Replica cannot clone from primary
Symptoms: Node 2 or 3 cannot complete pg_basebackup from the primary.
Check logs:
Common causes:
A. Network connectivity
Solution:
B. Authentication failed
Solution:
C. Insufficient space
Solution:
6.4. Issue: Nodes have different timelines
Symptoms:
Solution:
7. Enable Auto-start on Boot
8. Basic Cluster Management
8.1. Restart a node
8.2. Reload configuration
8.3. Pause/Resume auto-failover
8.4. Show configuration
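The operations in this section map onto patronictl subcommands. The sketch below wraps them in small functions, assuming a hypothetical config path /etc/patroni/patroni.yml and a hypothetical cluster name postgres. Setting DRY_RUN=1 prints the command instead of running it, which is handy for checking what would be executed.

```shell
PATRONI_CONFIG="${PATRONI_CONFIG:-/etc/patroni/patroni.yml}"  # hypothetical path
CLUSTER="${CLUSTER:-postgres}"                                # hypothetical cluster name

# Run patronictl with the config file; with DRY_RUN=1 only print the command.
pctl() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "patronictl -c $PATRONI_CONFIG $*"
  else
    patronictl -c "$PATRONI_CONFIG" "$@"
  fi
}

restart_node() { pctl restart "$CLUSTER" "$1"; }   # 8.1 restart one member
reload_node()  { pctl reload "$CLUSTER" "$1"; }    # 8.2 reload configuration on one member
pause_ha()     { pctl pause "$CLUSTER"; }          # 8.3 pause automatic failover
resume_ha()    { pctl resume "$CLUSTER"; }         # 8.3 resume automatic failover
show_config()  { pctl show-config "$CLUSTER"; }    # 8.4 show the dynamic configuration

# Example:
# DRY_RUN=1 restart_node node2
```

While the cluster is paused, Patroni does not fail over automatically, so remember to resume once maintenance is finished.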
9. Test Automatic Failover (Optional)
WARNING: Only test in non-production environments!
9.1. Simulate primary failure
9.2. Watch cluster failover
9.3. Verify new primary
Note: The timeline increased from 1 to 2, which indicates that a failover occurred.
9.4. Rejoin old primary
10. Lab Exercise
Lab 1: Bootstrap and verify
Tasks:
1. ✅ Start Patroni on 3 nodes in order
2. ✅ Verify cluster with patronictl list
3. ✅ Check replication status
4. ✅ Create test database and verify data replicates
Lab 2: Test replication lag
Tasks:
1. Insert 10,000 rows into primary
2. Measure replication lag on replicas
3. Monitor pg_stat_replication
Lab 3: Simulate node failure
Tasks:
1. Stop primary node
2. Watch automatic failover
3. Verify new primary elected
4. Rejoin old primary
5. Verify all nodes healthy
11. Summary
Key Takeaways
✅ Bootstrap: First node initializes, others clone
✅ Leader election: Automatic, DCS-based
✅ Replication: Automatic setup via pg_basebackup
✅ patronictl: Primary management tool
✅ Monitoring: Check via patronictl, REST API, pg_stat_replication
✅ Failover: Automatic when primary fails
Checklist after Bootstrap
- All 3 nodes showing in patronictl list
- 1 Leader, 2 Replicas
- All nodes same Timeline
- Replication lag = 0 MB
- Test data replicates to all nodes
- REST API responding on all nodes
- Patroni enabled for auto-start
- etcd cluster healthy
Current Architecture
Preparation for Lesson 10
Lesson 10 will go deeper into Replication Management:
- Synchronous vs Asynchronous replication
- Configure sync mode
- Monitor replication lag
- Handle replication issues