PostgreSQL Streaming Replication
Learning Objectives
After this lesson, you will:
- Deeply understand the Streaming Replication mechanism in PostgreSQL
- Master Write-Ahead Logging (WAL) and its role
- Distinguish between Synchronous and Asynchronous Replication
- Understand and use Replication Slots
- Practice manual replication setup (Primary-Standby)
1. Streaming Replication Operation Mechanism
1.1. Overview
Streaming Replication is the method PostgreSQL uses to replicate data from a Primary server to one or more Standby servers in real-time.
(Diagram: How Streaming Replication Works)
1.2. Main components
WAL Sender (on Primary)
- Process dedicated to sending WAL records to Standby
- One WAL sender per Standby connection
- Monitoring:
SELECT * FROM pg_stat_replication;
WAL Receiver (on Standby)
- Process receives WAL records from Primary
- Writes WAL to local WAL files
- Sends feedback to Primary (LSN position, status)
Startup Process (on Standby)
- Replays WAL records into data files
- Similar to recovery process
- Can serve read queries (Hot Standby)
1.3. Detailed data flow
(Diagram: Transaction Commit Flow)
Actual timing:
- Asynchronous: ~0-100ms lag
- Synchronous: ~1-10ms lag (depending on network latency)
2. Write-Ahead Logging (WAL)
2.1. What is WAL?
Write-Ahead Logging is a logging technique where:
"All changes must be written to the log BEFORE writing to data files"
WAL Principle:
(Diagram: Write-Ahead Logging principle)
2.2. WAL Files Structure
Location: $PGDATA/pg_wal/
Characteristics:
- Each file: 16MB (default)
- File name: Timeline ID + Segment Number
- Format: TTTTTTTTXXXXXXXXYYYYYYYY
  - TTTTTTTT: Timeline ID (8 hex digits)
  - XXXXXXXX: Log file number (8 hex digits)
  - YYYYYYYY: Segment number (8 hex digits)
2.3. LSN (Log Sequence Number)
LSN is a 64-bit byte position in the WAL stream, displayed as two hexadecimal numbers separated by a slash: X/Y
- X: high 32 bits of the position
- Y: low 32 bits (byte offset within the stream)
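LSN arithmetic is exposed through SQL functions. For instance, the byte distance between two positions (the LSN values below are examples):

```sql
-- Byte difference between two LSNs: 0x60 = 96 bytes
SELECT pg_wal_lsn_diff('0/3000060', '0/3000000');
```

The same function is commonly used against pg_stat_replication columns to measure replication lag in bytes.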
2.4. WAL Configuration Parameters
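Commonly tuned WAL parameters include the following; values shown are defaults or illustrative examples, not recommendations:

```ini
# postgresql.conf
wal_level = replica       # detail recorded in WAL: minimal | replica | logical
fsync = on                # flush WAL to durable storage at commit
synchronous_commit = on   # wait for the local WAL flush before reporting commit
wal_buffers = -1          # -1 = auto-size (1/32 of shared_buffers)
min_wal_size = 80MB
max_wal_size = 1GB
# wal_segment_size: 16MB by default; fixed at initdb time, not settable here
```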
2.5. WAL and Crash Recovery
When PostgreSQL crashes, the startup process reads the location of the last checkpoint (the redo point) from pg_control and replays every WAL record from there forward. Changes from committed transactions that had not yet reached the data files are reapplied, and the database returns to a consistent state before accepting connections.
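The redo start point can be inspected with pg_controldata; the paths below assume a Debian/Ubuntu PostgreSQL 16 install:

```shell
sudo -u postgres /usr/lib/postgresql/16/bin/pg_controldata \
    /var/lib/postgresql/16/main | grep -i redo
# e.g. "Latest checkpoint's REDO location: 0/3000028"
```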
3. Synchronous vs Asynchronous Replication
3.1. Asynchronous Replication (Default)
How it works:
(Diagram: Asynchronous Replication)
Characteristics:
- ✅ High Performance: Primary doesn't wait for Standby
- ✅ Low Latency: Commit time doesn't depend on network
- ❌ Data Loss Possible: If Primary crashes before Standby receives WAL
- ❌ RPO > 0: Recovery Point Objective is not zero
Configuration:
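Asynchronous replication is the default: no synchronous standbys are named. A minimal sketch:

```ini
# postgresql.conf on the Primary
synchronous_standby_names = ''   # empty = no synchronous standbys
synchronous_commit = on          # still waits for the *local* WAL flush
```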
Use cases:
- Standby in different datacenter (high latency)
- Prioritize performance over data safety
- Acceptable data loss (a few seconds)
3.2. Synchronous Replication
How it works:
(Diagram: Synchronous Replication)
Characteristics:
- ✅ Zero Data Loss: Transaction only commits when Standby confirms
- ✅ RPO = 0: Perfect for critical data
- ❌ Performance Impact: ~2-10ms overhead per commit
- ❌ Availability Risk: Primary blocks if Standby fails
Configuration:
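A minimal sketch; the standby name is an example and must match the standby's application_name in its primary_conninfo:

```ini
# postgresql.conf on the Primary
synchronous_standby_names = 'standby1'
synchronous_commit = on   # or remote_write / remote_apply, see the levels below
```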
Synchronous Commit Levels:
| Level | Meaning | Data Safety | Performance |
|---|---|---|---|
| off | Don't wait for Standby | Low | Highest |
| local | Wait only for local disk | Medium | High |
| remote_write | Wait for Standby to write to OS cache | Good | Medium |
| on | Wait for Standby to flush to disk | Good | Slower |
| remote_apply | Wait for Standby to apply changes | Best | Slowest |
3.3. Quorum-based Synchronous Replication
PostgreSQL 9.6+ supports flexible synchronous replication: synchronous_standby_names accepts FIRST n (priority-based) and ANY n (quorum-based) lists of standbys.
Example: ANY 1 (standby1, standby2) commits as soon as any one of the two listed standbys confirms.
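Both forms are set in the same parameter; standby names below are examples:

```ini
# postgresql.conf on the Primary
# Quorum: wait for ANY 1 of the two listed standbys
synchronous_standby_names = 'ANY 1 (standby1, standby2)'
# Priority: wait for the FIRST 1 available standby, in list order
# synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
```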
3.4. Sync vs Async Comparison
| Criteria | Async | Sync |
|---|---|---|
| Commit latency | ~1ms | ~5-10ms |
| Data loss risk | Yes (a few seconds) | No |
| RPO | Seconds | Zero |
| RTO | ~30-60s | ~30-60s |
| Primary performance | 100% | 95-98% |
| Network dependency | Low | High |
| Use case | Read replicas, Reporting | Critical data, Financial |
4. Replication Slots
4.1. Problem before Replication Slots
Scenario: a Standby disconnects (network outage, maintenance). The Primary keeps generating WAL and, after checkpoints, recycles old segments. When the Standby reconnects, the WAL it needs is gone ("requested WAL segment ... has already been removed"), and the Standby must be rebuilt from a fresh base backup. Workarounds such as wal_keep_segments or WAL archiving were imprecise or operationally heavy.
4.2. Replication Slots solve the problem
Replication Slot ensures Primary keeps WAL files until Standby consumes them.
(Diagram: Replication Slot)
4.3. Create and manage Replication Slots
Create slot on Primary:
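A physical slot is created with a built-in function; the slot name is an example:

```sql
-- On the Primary, e.g. via: sudo -u postgres psql
SELECT pg_create_physical_replication_slot('standby1_slot');
```

The slot appears in pg_replication_slots immediately, initially inactive.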
Use slot on Standby:
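The Standby attaches to the slot via its configuration; the slot name is an example:

```ini
# postgresql.auto.conf (or postgresql.conf) on the Standby
primary_slot_name = 'standby1_slot'
```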
Delete slot:
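A slot can only be dropped while no standby is attached to it:

```sql
-- On the Primary
SELECT pg_drop_replication_slot('standby1_slot');  -- slot name is an example
```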
4.4. Monitor Replication Slots
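A query along these lines shows each slot's state and how much WAL it is holding back:

```sql
SELECT slot_name,
       active,
       restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```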
4.5. Important notes
⚠️ Risks:
- If a Standby stays offline while its slot still exists, the Primary retains WAL indefinitely
- Can fill Primary's disk
- Need monitoring and alerts
Best practice:
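A periodic check for inactive slots is a reasonable sketch of such monitoring:

```sql
-- Slots that are not currently in use, and how much WAL they retain
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE NOT active;
```

On PostgreSQL 13+, max_slot_wal_keep_size can additionally cap how much WAL a slot may retain.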
5. Lab: Manual Streaming Replication Setup
5.1. Lab Objectives
Create a PostgreSQL cluster with:
- 1 Primary server
- 1 Standby server
- Streaming replication (asynchronous)
- Hot standby (read queries)
5.2. Environment
The steps below assume two Linux hosts running the same PostgreSQL major version: node1 (Primary, 192.168.1.10) and node2 (Standby, 192.168.1.11), reachable from each other on port 5432. The addresses are examples; paths and service names follow a typical Debian/Ubuntu layout, so adjust them for your distribution.
5.3. Step 1: Install PostgreSQL (both nodes)
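A minimal install on Debian/Ubuntu might look like this; package names vary by distribution, and a version-specific package such as postgresql-16 requires the PGDG repository:

```shell
# Run on both node1 and node2
sudo apt-get update
sudo apt-get install -y postgresql
```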
5.4. Step 2: Configure Primary (node1)
Create replication user:
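One way to create a dedicated replication role; the role name and password are examples:

```sql
-- Run as a superuser on the Primary, e.g. via: sudo -u postgres psql
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change_me';
```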
Configure postgresql.conf:
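Typical Primary-side settings; values are starting points, not recommendations:

```ini
# postgresql.conf on node1
listen_addresses = '*'          # accept connections from the Standby
wal_level = replica             # WAL detail sufficient for physical replication
max_wal_senders = 10            # max concurrent WAL sender processes
max_replication_slots = 10
hot_standby = on                # allow read-only queries on the Standby
archive_mode = on               # optional WAL archiving
archive_command = 'cp %p /var/lib/postgresql/archive/%f'
```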
Create archive directory:
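The archive directory must exist and be writable by the postgres user; the path is an example:

```shell
sudo mkdir -p /var/lib/postgresql/archive
sudo chown postgres:postgres /var/lib/postgresql/archive
```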
Configure pg_hba.conf:
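Replication connections authenticate against the special "replication" keyword in pg_hba.conf; the Standby address is an example:

```ini
# pg_hba.conf on node1
host  replication  replicator  192.168.1.11/32  scram-sha-256
```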
Start Primary:
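Restart so the configuration changes take effect; the service name assumes a systemd-based install:

```shell
sudo systemctl restart postgresql
```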
Create replication slot:
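On the Primary, create the slot the Standby will attach to; the name is an example:

```sql
-- e.g. via: sudo -u postgres psql
SELECT pg_create_physical_replication_slot('standby1_slot');
```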
5.5. Step 3: Setup Standby (node2)
Stop PostgreSQL and backup old data:
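A sketch of clearing the old data directory, assuming a Debian/Ubuntu layout (PGDATA may differ on your system):

```shell
# On node2
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/16/main /var/lib/postgresql/16/main.bak
```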
Base backup from Primary:
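pg_basebackup clones the Primary's data directory; -R writes the replication settings and creates standby.signal, and -S attaches to the slot. Host, paths, and slot name are examples:

```shell
sudo -u postgres pg_basebackup \
  -h 192.168.1.10 -U replicator \
  -D /var/lib/postgresql/16/main \
  -S standby1_slot -X stream -R -P
```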
Check that standby.signal was created:
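The path is an example; standby.signal makes PostgreSQL start in standby (recovery) mode:

```shell
ls -l /var/lib/postgresql/16/main/standby.signal
```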
Check postgresql.auto.conf:
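pg_basebackup can write these settings automatically; the file should contain entries similar to the following (values are examples):

```ini
# postgresql.auto.conf on node2
primary_conninfo = 'host=192.168.1.10 port=5432 user=replicator password=change_me'
primary_slot_name = 'standby1_slot'
```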
Start Standby:
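The service name assumes a systemd-based install:

```shell
sudo systemctl start postgresql
```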
5.6. Step 4: Verify Replication
On Primary (node1):
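pg_stat_replication shows one row per connected standby:

```sql
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_state
FROM pg_stat_replication;
-- state should be 'streaming'; sync_state is 'async' for asynchronous standbys
```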
On Standby (node2):
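These built-in functions confirm the node is replaying WAL:

```sql
SELECT pg_is_in_recovery();        -- true on a standby
SELECT pg_last_wal_receive_lsn(),  -- last WAL received from the Primary
       pg_last_wal_replay_lsn();   -- last WAL applied locally
```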
5.7. Step 5: Test Replication
On Primary - Create test data:
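A hypothetical test table:

```sql
CREATE TABLE replication_test (
    id         serial PRIMARY KEY,
    note       text,
    created_at timestamptz DEFAULT now()
);
INSERT INTO replication_test (note) VALUES ('hello from primary');
```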
On Standby - Verify data:
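Reading the same table on the Standby checks end-to-end replication:

```sql
SELECT * FROM replication_test;  -- the Primary's row should appear here
-- Writes are rejected on a hot standby: an INSERT fails with
-- "cannot execute INSERT in a read-only transaction"
```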
5.8. Step 6: Monitoring Queries
Replication delay monitoring:
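A query along these lines reports lag in bytes and, on PostgreSQL 10+, as time intervals:

```sql
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)   AS send_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```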
Alert if lag > 10MB:
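An alerting check might return rows only when replay lag exceeds 10 MB:

```sql
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) > 10 * 1024 * 1024;
```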
5.9. Troubleshooting Common Issues
Issue 1: Standby cannot connect to Primary
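Useful first checks from the Standby; the host and log path are examples:

```shell
pg_isready -h 192.168.1.10 -p 5432   # is the Primary accepting connections?
nc -vz 192.168.1.10 5432             # is the port reachable (firewall)?
# Then verify the replication line in pg_hba.conf and check the Primary's log:
sudo tail -n 50 /var/log/postgresql/postgresql-16-main.log
```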
Issue 2: High replication lag
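Start by locating where the lag accumulates: network (write_lag), standby disk (flush_lag), or replay (replay_lag):

```sql
SELECT application_name, state, sync_state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- On the Standby: check whether WAL replay has been paused
-- SELECT pg_is_wal_replay_paused();
```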
Issue 3: Slot fills up disk
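Identify the slot retaining WAL, then drop it if its standby is gone for good; the slot name is an example:

```sql
-- How much WAL each slot retains
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
-- Drop the stale slot
SELECT pg_drop_replication_slot('standby1_slot');
```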
6. Best Practices
6.1. Configuration Tuning
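The values below are starting points, not universal recommendations; tune per workload:

```ini
# postgresql.conf
wal_compression = on             # smaller WAL volume at some CPU cost
wal_keep_size = '1GB'            # PG 13+: extra WAL kept besides slots
max_slot_wal_keep_size = '10GB'  # PG 13+: cap WAL retained by slots
checkpoint_timeout = 15min
max_wal_size = 4GB
```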
6.2. Monitoring Checklist
- ✅ Replication lag (bytes and time)
- ✅ Standby connection status
- ✅ WAL sender processes
- ✅ Disk space (pg_wal/ and archive/)
- ✅ Replication slots (retained WAL)
- ✅ Checkpoint performance
6.3. Security Recommendations
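A sketch of hardening the replication channel; the address and certificate paths are examples:

```ini
# pg_hba.conf -- require TLS + SCRAM for replication connections
hostssl  replication  replicator  192.168.1.11/32  scram-sha-256

# postgresql.conf -- enable TLS and strong password hashing
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
password_encryption = scram-sha-256
```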
7. Summary
Key Takeaways
- Streaming Replication is the foundation of PostgreSQL HA:
- Real-time WAL streaming
- Hot Standby for read queries
- Basis for Patroni automated failover
- WAL (Write-Ahead Logging):
- Log before write to data
- Crash recovery mechanism
- Replication transport format
- Synchronous vs Asynchronous:
- Async: High performance, possible data loss
- Sync: Zero data loss, performance impact
- Quorum-based: Balance between both
- Replication Slots:
- Ensure WAL is not deleted too early
- Critical for standby stability
- Need monitoring to avoid disk full