Lesson 2: PostgreSQL Streaming Replication

Learning Objectives

After this lesson, you will:

  • Deeply understand the Streaming Replication mechanism in PostgreSQL
  • Master Write-Ahead Logging (WAL) and its role
  • Distinguish between Synchronous and Asynchronous Replication
  • Understand and use Replication Slots
  • Practice manual replication setup (Primary-Standby)

1. Streaming Replication Operation Mechanism

1.1. Overview

Streaming Replication is the mechanism PostgreSQL uses to replicate data from a Primary server to one or more Standby servers in real time by continuously shipping WAL records.

(Diagram: How Streaming Replication Works)

1.2. Main components

WAL Sender (on Primary)

  • Process dedicated to sending WAL records to Standby
  • One WAL sender per Standby connection
  • Monitoring: SELECT * FROM pg_stat_replication;

WAL Receiver (on Standby)

  • Process receives WAL records from Primary
  • Writes WAL to local WAL files
  • Sends feedback to Primary (LSN position, status)

Startup Process (on Standby)

  • Replays WAL records into data files
  • Similar to recovery process
  • Can serve read queries (Hot Standby)
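
These processes can be inspected directly from SQL using standard monitoring views. A minimal check, assuming PostgreSQL 13 or later (older releases name some pg_stat_wal_receiver columns differently):

sql
-- On the Primary: one row per connected WAL sender (one per standby)
SELECT pid, application_name, state, sent_lsn, replay_lsn, sync_state
FROM pg_stat_replication;

-- On the Standby: the WAL receiver's view of the incoming stream
SELECT status, flushed_lsn, sender_host, sender_port
FROM pg_stat_wal_receiver;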

1.3. Detailed data flow

(Diagram: Transaction Commit Flow)

Typical timing:

  • Asynchronous: the Standby typically lags the Primary by ~0-100ms
  • Synchronous: each commit takes roughly 1-10ms longer, depending on network latency
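
To see the actual figures for your own cluster instead of these rough estimates, the lag columns that pg_stat_replication exposes on PostgreSQL 10+ report them per standby:

sql
-- On the Primary: per-standby lag measured as time rather than bytes
SELECT application_name, write_lag, flush_lag, replay_lag, sync_state
FROM pg_stat_replication;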

2. Write-Ahead Logging (WAL)

2.1. What is WAL?

Write-Ahead Logging is a logging technique where:

"All changes must be written to the log BEFORE writing to data files"

WAL Principle:

(Diagram: Write-Ahead Logging principle)
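
A quick way to see this principle in action: every committed change advances the WAL insert position immediately, long before the modified pages reach the data files at the next checkpoint. (wal_demo is just a throwaway table; the LSN values are illustrative.)

sql
SELECT pg_current_wal_insert_lsn();   -- e.g. 0/3000060
CREATE TABLE wal_demo (id int);
INSERT INTO wal_demo VALUES (1);
SELECT pg_current_wal_insert_lsn();   -- the position has moved forward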

2.2. WAL Files Structure

Location: $PGDATA/pg_wal/

TEXT
$ ls -lh $PGDATA/pg_wal/
-rw------- 1 postgres postgres 16M Nov 24 10:00 000000010000000000000001
-rw------- 1 postgres postgres 16M Nov 24 10:15 000000010000000000000002
-rw------- 1 postgres postgres 16M Nov 24 10:30 000000010000000000000003

Characteristics:

  • Each file: 16MB (default)
  • File name: Timeline ID + Segment Number
  • Format: TTTTTTTTXXXXXXXXYYYYYYYY
    • TTTTTTTT: Timeline (8 hex digits)
    • XXXXXXXX: Log file number (8 hex)
    • YYYYYYYY: Segment number (8 hex)
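
PostgreSQL can compute the segment file name for any LSN, which is a handy way to see the naming scheme with real values (output shown is illustrative):

sql
SELECT pg_current_wal_lsn() AS lsn,
       pg_walfile_name(pg_current_wal_lsn()) AS wal_file;
-- e.g. 0/3000060 → 000000010000000000000003 (timeline 1, segment 3)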

2.3. LSN (Log Sequence Number)

LSN is a position (byte offset) in the WAL stream, written as two hexadecimal numbers in the form X/Y:

  • X: high-order 32 bits of the position
  • Y: low-order 32 bits of the position
TEXT
-- Check current LSN
SELECT pg_current_wal_lsn();  -- Primary
-- Output: 0/3000060

SELECT pg_last_wal_receive_lsn();  -- Standby (received)
SELECT pg_last_wal_replay_lsn();   -- Standby (applied)
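
Because an LSN is just a byte position, the distance between two LSNs is a byte count. pg_wal_lsn_diff() does this arithmetic and is the basis of the lag queries used later in this lesson:

sql
-- Byte distance between two WAL positions (0x60 = 96 bytes)
SELECT pg_wal_lsn_diff('0/3000060', '0/3000000');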

2.4. WAL Configuration Parameters

TEXT
# postgresql.conf

# WAL Settings
wal_level = replica              # minimal, replica, or logical
                                 # replica: for streaming replication

wal_log_hints = on              # Needed for pg_rewind

# WAL Writing
wal_buffers = 16MB              # WAL buffer size in shared memory
wal_writer_delay = 200ms        # WAL writer sleep time

# WAL Files Management
min_wal_size = 80MB            # Don't shrink recycled WAL below this size
max_wal_size = 1GB             # Soft limit; triggers a checkpoint when exceeded

# Checkpoints
checkpoint_timeout = 5min       # Maximum time between two checkpoints
checkpoint_completion_target = 0.9  # Spread checkpoint writes
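
To check what a running server actually uses, read the values back from pg_settings (or use SHOW); the selection below simply mirrors the parameters above:

sql
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('wal_level', 'wal_buffers', 'max_wal_size',
               'checkpoint_timeout', 'checkpoint_completion_target');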

2.5. WAL and Crash Recovery

When PostgreSQL crashes:

TEXT
1. Server restart
2. PostgreSQL reads last checkpoint location
3. Replay all WAL records from checkpoint → crash point
4. Restore database to consistent state
5. Ready to accept connections

Example:

TEXT
Timeline:
10:00 ─── Checkpoint ─── 10:05 ─── 10:08 (CRASH)
          (LSN: 0/1000)          (LSN: 0/3000)
          
Recovery:
- Start from LSN 0/1000
- Replay WAL → LSN 0/3000
- Database consistent at 10:08
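
The "last checkpoint location" that recovery starts from is recorded in the control file and can be read from a running server with pg_control_checkpoint() (available since PostgreSQL 9.6); redo_lsn is where WAL replay would begin after a crash:

sql
SELECT checkpoint_lsn, redo_lsn, timeline_id
FROM pg_control_checkpoint();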

3. Synchronous vs Asynchronous Replication

3.1. Asynchronous Replication (Default)

How it works:

(Diagram: Asynchronous Replication)

Characteristics:

  • High Performance: Primary doesn't wait for Standby
  • Low Latency: Commit time doesn't depend on network
  • Data Loss Possible: If Primary crashes before Standby receives WAL
  • RPO > 0: Recovery Point Objective is not zero

Configuration:

TEXT
# postgresql.conf (Primary)
# Asynchronous replication is the default: leave synchronous_standby_names empty.
synchronous_commit = on   # 'local' or 'off' only relax the wait for the LOCAL WAL flush

Use cases:

  • Standby in different datacenter (high latency)
  • Prioritize performance over data safety
  • Acceptable data loss (a few seconds)
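
With this default setup you can confirm from the Primary that the standbys really are asynchronous; they report sync_state = 'async':

sql
SELECT application_name, client_addr, sync_state
FROM pg_stat_replication;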

3.2. Synchronous Replication

How it works:

(Diagram: Synchronous Replication)

Characteristics:

  • Zero Data Loss: Transaction only commits when Standby confirms
  • RPO = 0: Perfect for critical data
  • Performance Impact: ~2-10ms overhead per commit
  • Availability Risk: Primary blocks if Standby fails

Configuration:

TEXT
# postgresql.conf (Primary)
synchronous_commit = on              # on, remote_write, remote_apply
synchronous_standby_names = 'standby1,standby2'  # application_name values; a plain list means FIRST 1

# recovery.conf or postgresql.auto.conf (Standby)
primary_conninfo = 'host=primary port=5432 user=replicator application_name=standby1'

Synchronous Commit Levels:

Level        | Meaning                               | Data Safety | Performance
-------------|---------------------------------------|-------------|------------
off          | Don't wait for Standby                | Low         | Highest
local        | Wait only for local disk              | Medium      | High
remote_write | Wait for Standby to write to OS cache | Good        | Medium
on           | Wait for Standby to flush to disk     | Good        | Slower
remote_apply | Wait for Standby to apply changes     | Best        | Slowest
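
synchronous_commit can also be changed per session or even per transaction, so only critical writes pay the synchronous price while everything else stays fast. A sketch (the commented-out INSERT stands for any critical write; the table name is hypothetical):

sql
BEGIN;
SET LOCAL synchronous_commit = 'remote_apply';
-- INSERT INTO payments ...;   -- hypothetical critical write
COMMIT;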

3.3. Quorum-based Synchronous Replication

PostgreSQL 9.6+: Flexible synchronous replication

TEXT
# Wait for ANY 1 of 2 standbys
synchronous_standby_names = 'ANY 1 (standby1, standby2)'

# Wait for FIRST 2 of 3 standbys
synchronous_standby_names = 'FIRST 2 (standby1, standby2, standby3)'

# Plain list (pre-9.6 syntax): equivalent to FIRST 1
synchronous_standby_names = 'standby1, standby2'

Example: ANY 1

TEXT
3 Standbys: standby1 (DC1), standby2 (DC2), standby3 (DC3)

Transaction commits when:
✅ Primary committed + ANY 1 standby acknowledged

Scenario:
- standby1: ACK in 5ms
- standby2: ACK in 100ms (slow network)
- standby3: DOWN

→ Transaction commits after 5ms (waiting for standby1)
→ Good performance + Data safety
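
When a quorum rule such as ANY 1 (...) is active, the participating standbys report sync_state = 'quorum' on the Primary, which is an easy way to verify the configuration:

sql
SELECT application_name, sync_state, sync_priority
FROM pg_stat_replication;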

3.4. Sync vs Async Comparison

Criteria            | Async                     | Sync
--------------------|---------------------------|--------------------------
Commit latency      | ~1ms                      | ~5-10ms
Data loss risk      | Yes (a few seconds)       | No
RPO                 | Seconds                   | Zero
RTO                 | ~30-60s                   | ~30-60s
Primary performance | 100%                      | 95-98%
Network dependency  | Low                       | High
Use case            | Read replicas, Reporting  | Critical data, Financial

4. Replication Slots

4.1. Problem before Replication Slots

Scenario:

TEXT
1. Primary generates WAL files
2. Checkpoint happens → Old WAL cleaned up
3. Standby offline for hours
4. Standby comes back online
5. ❌ Needed WAL files have been deleted
6. ❌ Standby cannot catch up
7. ❌ Need to rebuild Standby from scratch

4.2. Replication Slots solve the problem

Replication Slot ensures Primary keeps WAL files until Standby consumes them.

(Diagram: Replication Slot)

4.3. Create and manage Replication Slots

Create slot on Primary:

TEXT
-- Physical replication slot
SELECT * FROM pg_create_physical_replication_slot('standby1_slot');

-- View slot list
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- Output:
 slot_name     | slot_type | active | restart_lsn | confirmed_flush_lsn
---------------+-----------+--------+-------------+--------------------
 standby1_slot | physical  | t      | 0/3000000   | NULL

Use slot on Standby:

ini
# postgresql.auto.conf (Standby)
primary_slot_name = 'standby1_slot'

Delete slot:

sql
SELECT pg_drop_replication_slot('standby1_slot');

4.4. Monitor Replication Slots

sql
-- Check slot status
SELECT 
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained_wal
FROM pg_replication_slots;

-- Alert if retained_wal is too large (>10GB)
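
On PostgreSQL 13 and later, pg_replication_slots also exposes wal_status and safe_wal_size, which tell you directly whether a slot's retained WAL is still safe:

sql
-- wal_status: reserved / extended / unreserved / lost
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots;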

4.5. Important notes

⚠️ Risks:

  • If Standby is offline for long with slot → Primary keeps WAL forever
  • Can fill Primary's disk
  • Need monitoring and alerts

Best practice:

sql
-- Cap the WAL that slots may retain so a dead standby cannot fill the disk
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';  -- PostgreSQL 13+

-- Or periodically drop inactive slots that retain too much WAL (e.g. > 100GB)
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE NOT active
  AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 100::bigint * 1024*1024*1024;

5. Lab: Manual Streaming Replication Setup

5.1. Lab Objectives

Create a PostgreSQL cluster with:

  • 1 Primary server
  • 1 Standby server
  • Streaming replication (asynchronous)
  • Hot standby (read queries)

5.2. Environment

TEXT
Primary:  192.168.1.101 (node1)
Standby:  192.168.1.102 (node2)
PostgreSQL: 14
OS: Ubuntu 22.04

5.3. Step 1: Install PostgreSQL (both nodes)

bash
# Install PostgreSQL 14
sudo apt update
sudo apt install -y postgresql-14 postgresql-contrib-14

# Stop service
sudo systemctl stop postgresql

5.4. Step 2: Configure Primary (node1)

Create replication user:

bash
sudo -u postgres psql

sql
-- Create user for replication
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'repl_password';

-- Exit
\q

Configure postgresql.conf:

bash
sudo nano /etc/postgresql/14/main/postgresql.conf

ini
# Connection
listen_addresses = '*'
port = 5432

# Replication
wal_level = replica
max_wal_senders = 5
max_replication_slots = 5
wal_keep_size = 1GB

# Hot Standby (not needed for primary but good to have)
hot_standby = on

# Archive (optional, recommended)
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/14/archive/%f && cp %p /var/lib/postgresql/14/archive/%f'

Create archive directory:

bash
sudo mkdir -p /var/lib/postgresql/14/archive
sudo chown postgres:postgres /var/lib/postgresql/14/archive

Configure pg_hba.conf:

bash
sudo nano /etc/postgresql/14/main/pg_hba.conf

ini
# Replication connections
host    replication     replicator      192.168.1.102/32        md5
host    replication     replicator      127.0.0.1/32            md5

Start Primary:

bash
sudo systemctl start postgresql
sudo systemctl status postgresql

Create replication slot:

bash
sudo -u postgres psql

sql
SELECT pg_create_physical_replication_slot('standby_slot');
SELECT * FROM pg_replication_slots;
\q

5.5. Step 3: Setup Standby (node2)

Stop PostgreSQL and backup old data:

bash
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main.bak

Base backup from Primary:

bash
# Use pg_basebackup
sudo -u postgres pg_basebackup \
    -h 192.168.1.101 \
    -D /var/lib/postgresql/14/main \
    -U replicator \
    -P \
    -v \
    -R \
    -X stream \
    -C -S standby_slot

# Options explanation:
# -h: Primary host
# -D: Data directory
# -U: Replication user
# -P: Show progress
# -v: Verbose
# -R: Create standby.signal and postgresql.auto.conf
# -X stream: Stream WAL during backup
# -C: Create replication slot
# -S: Slot name

Sample output:

TEXT
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
24567/24567 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

Check that standby.signal was created:

bash
ls -l /var/lib/postgresql/14/main/standby.signal
# This file marks this as a standby server

Check postgresql.auto.conf:

bash
sudo cat /var/lib/postgresql/14/main/postgresql.auto.conf

ini
# Created automatically by pg_basebackup -R
primary_conninfo = 'user=replicator password=repl_password host=192.168.1.101 port=5432 sslmode=prefer sslcompression=0 krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'standby_slot'

Start Standby:

bash
sudo systemctl start postgresql
sudo systemctl status postgresql

5.6. Step 4: Verify Replication

On Primary (node1):

sql
sudo -u postgres psql

-- Check replication status
SELECT 
    client_addr,
    state,
    sync_state,
    replay_lsn,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as lag
FROM pg_stat_replication;

-- Output:
 client_addr   |   state   | sync_state | replay_lsn  |  lag
---------------+-----------+------------+-------------+-------
 192.168.1.102 | streaming | async      | 0/3000060   | 0 bytes

On Standby (node2):

sql
sudo -u postgres psql

-- Check standby status
SELECT pg_is_in_recovery();  -- Should return 't' (true)

-- Check replication lag
SELECT 
    pg_last_wal_receive_lsn() AS receive,
    pg_last_wal_replay_lsn() AS replay,
    pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())) AS lag;

-- Output:
   receive   |   replay    |  lag
-------------+-------------+--------
 0/3000060   | 0/3000060   | 0 bytes
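
Lag can also be expressed as time on the Standby. Note that this value is only meaningful while the Primary is generating writes; on an idle Primary it keeps growing:

sql
-- Time since the last replayed transaction committed on the Primary
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;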

5.7. Step 5: Test Replication

On Primary - Create test data:

sql
-- Create database and table
CREATE DATABASE testdb;
\c testdb

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO users (name) VALUES 
    ('Alice'),
    ('Bob'),
    ('Charlie');

SELECT * FROM users;

On Standby - Verify data:

sql
\c testdb

-- Read queries work
SELECT * FROM users;

-- Output:
 id |  name   |       created_at
----+---------+------------------------
  1 | Alice   | 2024-11-24 10:30:15
  2 | Bob     | 2024-11-24 10:30:15
  3 | Charlie | 2024-11-24 10:30:15

-- Write queries are rejected
INSERT INTO users (name) VALUES ('David');
-- ERROR: cannot execute INSERT in a read-only transaction

5.8. Step 6: Monitoring Queries

Replication delay monitoring:

sql
-- On Primary
CREATE OR REPLACE FUNCTION replication_lag_bytes()
RETURNS TABLE(client_addr INET, lag_bytes BIGINT) AS $$
BEGIN
    RETURN QUERY
    SELECT 
        c.client_addr,
        pg_wal_lsn_diff(pg_current_wal_lsn(), c.replay_lsn)::BIGINT
    FROM pg_stat_replication c;
END;
$$ LANGUAGE plpgsql;

-- Use
SELECT * FROM replication_lag_bytes();

Alert if lag > 10MB:

sql
SELECT client_addr, 
       pg_size_pretty(lag_bytes) as lag
FROM replication_lag_bytes()
WHERE lag_bytes > 10*1024*1024;

5.9. Troubleshooting Common Issues

Issue 1: Standby cannot connect to Primary

bash
# Check logs (Ubuntu/Debian package default location)
sudo tail -f /var/log/postgresql/postgresql-14-main.log

# Common errors:
# - "FATAL: password authentication failed"
#   → Check pg_hba.conf and password

# - "FATAL: no pg_hba.conf entry for replication"
#   → Add replication entry to pg_hba.conf

# - Connection refused
#   → Check firewall, listen_addresses

Issue 2: High replication lag

sql
-- Check WAL sender busy
SELECT * FROM pg_stat_activity 
WHERE backend_type = 'walsender';

-- Check I/O on Standby
SELECT * FROM pg_stat_bgwriter;

Issue 3: Slot fills up disk

sql
-- Check retained WAL
SELECT 
    slot_name,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained
FROM pg_replication_slots;

-- Drop inactive slot if needed
SELECT pg_drop_replication_slot('standby_slot');

6. Best Practices

6.1. Configuration Tuning

ini
# Primary - postgresql.conf

# WAL senders: number of standbys plus a couple of spares for pg_basebackup
max_wal_senders = 10

# WAL retention
wal_keep_size = 2GB  # Keep enough WAL for standby to catch up
max_slot_wal_keep_size = 10GB  # Limit slot retention (PG 13+)

# Archive (backup strategy)
archive_mode = on
archive_command = 'cp %p /backup/archive/%f'

# Checkpoint tuning
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
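
After editing postgresql.conf, remember that some of these parameters (for example max_wal_senders) require a restart, while others such as wal_keep_size and archive_command only need a reload; pg_settings shows which is which:

sql
SELECT pg_reload_conf();                 -- apply reload-only changes
SELECT name, setting, context, pending_restart
FROM pg_settings
WHERE name IN ('max_wal_senders', 'wal_keep_size', 'archive_command');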

6.2. Monitoring Checklist

✅ Replication lag (bytes and time)
✅ Standby connection status
✅ WAL sender processes
✅ Disk space (pg_wal/ and archive/)
✅ Replication slots (retained WAL)
✅ Checkpoint performance

6.3. Security Recommendations

ini
# Use SSL for replication
ssl = on
ssl_cert_file = '/path/to/server.crt'
ssl_key_file = '/path/to/server.key'

# Standby connection string
primary_conninfo = '... sslmode=require'   # sslcompression is deprecated and ignored on recent versions

ini
# pg_hba.conf - Use hostssl
hostssl replication replicator 192.168.1.0/24 md5

7. Summary

Key Takeaways

  1. Streaming Replication is the foundation of PostgreSQL HA:
    • Real-time WAL streaming
    • Hot Standby for read queries
    • Basis for Patroni automated failover
  2. WAL (Write-Ahead Logging):
    • Changes are logged before they are written to data files
    • Crash recovery mechanism
    • Replication transport format
  3. Synchronous vs Asynchronous:
    • Async: High performance, possible data loss
    • Sync: Zero data loss, performance impact
    • Quorum-based: Balance between both
  4. Replication Slots:
    • Ensure WAL is not deleted too early
    • Critical for standby stability
    • Need monitoring to avoid disk full
