Detailed Patroni Configuration
Objectives
After this lesson, you will:
- Understand each section of the patroni.yml file in depth
- Configure bootstrap options
- Tune PostgreSQL parameters for HA
- Configure authentication and security
- Use tags and constraints
- Optimize timing parameters
1. Overview of Patroni Configuration
1.1. Configuration layers
Patroni has multiple configuration layers:
Priority order: Command line > Environment > Config file > DCS
1.2. Static vs Dynamic configuration
Static configuration (in patroni.yml):
- Node-specific settings (name, addresses)
- etcd connection info
- Data directory, bin directory
- Restart required to apply changes
Dynamic configuration (in DCS):
- PostgreSQL parameters
- Bootstrap settings
- TTL, loop_wait, retry_timeout
- Can be updated at runtime:
patronictl edit-config
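For example, assuming the config file lives at /etc/patroni/patroni.yml:

```shell
# Show the current dynamic configuration stored in the DCS
patronictl -c /etc/patroni/patroni.yml show-config

# Edit it in $EDITOR; on save, changes are written to the DCS and
# picked up by every node without restarting Patroni
patronictl -c /etc/patroni/patroni.yml edit-config
```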
2. Section: Scope and Namespace
2.1. Scope (Cluster name)
Scope is the unique name of the cluster in DCS.
Meaning:
- All nodes in the same cluster must have the same scope
- DCS keys are prefixed with the scope: /service/postgres/...
- Allows multiple clusters on the same etcd cluster
Best practices: use a short, lowercase, descriptive name (e.g. postgres-prod), and never change it after bootstrap, because all cluster state in the DCS is keyed by the scope.
2.2. Namespace
Namespace is the prefix for all keys in DCS.
Full DCS key structure:
Multiple clusters example:
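With the default namespace /service, the key layout looks roughly like this (cluster names are illustrative):

```
/service/postgres-prod/config        # dynamic configuration
/service/postgres-prod/initialize    # cluster system identifier
/service/postgres-prod/leader        # leader lock
/service/postgres-prod/members/pg-node1
/service/postgres-prod/members/pg-node2
/service/postgres-staging/...        # a second cluster sharing the same etcd
```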
3. Section: Node Information
3.1. Node name
Requirements:
- Unique in cluster
- Does not change after bootstrap
- Should use hostname or FQDN
Example naming conventions:
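A few common schemes (names are illustrative; any stable, unique string works):

```yaml
name: pg-node1            # simple indexed name
# name: pg-dc1-node1      # encodes the datacenter
# name: db01.example.com  # FQDN
```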
3.2. Host information
Patroni automatically detects hostname, but can be overridden if needed.
4. Section: REST API
4.1. Basic configuration
Parameters:
- listen: Interface and port to bind (0.0.0.0 = all interfaces)
- connect_address: Address that other nodes use to connect
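A minimal restapi section, with a placeholder address:

```yaml
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.11:8008   # reachable address of this node
```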
4.2. Authentication
When is authentication needed?:
- REST API exposed to internet
- Compliance requirements
- Multi-tenant environments
Use with curl:
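Assuming basic auth is configured under restapi.authentication (credentials below are placeholders):

```shell
# Read-only endpoints are typically open:
curl http://192.168.1.11:8008/patroni

# State-changing endpoints require the configured credentials:
curl -u patroni:secret-password -X POST http://192.168.1.11:8008/reload
```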
4.3. SSL/TLS
Generate self-signed certificates:
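A sketch using openssl (paths and CN are placeholders; use CA-signed certificates in production):

```shell
# Create a self-signed certificate and private key valid for one year
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/patroni.key -out /tmp/patroni.crt \
  -subj "/CN=pg-node1.example.com"
```

The files are then referenced from patroni.yml as restapi.certfile and restapi.keyfile.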
4.4. REST API endpoints
Health check endpoints:
Management endpoints:
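The most commonly used endpoints (the role is signalled by the HTTP status code, 200 vs 503):

```
GET  /health      200 if PostgreSQL is up
GET  /primary     200 only on the primary
GET  /replica     200 only on a healthy replica
GET  /leader      200 on the node holding the leader lock
GET  /liveness    Kubernetes liveness probe
GET  /readiness   Kubernetes readiness probe

POST /reload      reload configuration
POST /restart     restart PostgreSQL
POST /switchover  planned switchover
POST /failover    manual failover
```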
5. Section: Bootstrap
5.1. DCS settings
TTL (Time To Live):
- Leader lock expiration time
- If leader does not renew within TTL → lock expires
- Tradeoff:
- Low TTL (10s): Fast failover, but risk of false positives
- High TTL (60s): More stable, but longer downtime
- Recommended: 30 seconds
loop_wait:
- Interval between health checks
- Leader renews the lock every loop_wait seconds
- Recommended: 10 seconds (1/3 of TTL)
retry_timeout:
- Timeout for DCS operations
- If DCS does not respond within timeout → consider failed
- Recommended: 10 seconds
maximum_lag_on_failover:
- Max replication lag to be eligible for promotion
- Replica with lag > threshold will not be chosen as primary
- 0 = no limit (any replica can be promoted)
- Recommended: 1MB when minimizing data loss is the priority
synchronous_mode:
- false: Asynchronous replication (default)
- true: Enable synchronous replication
- synchronous_mode_strict: Strict sync mode (no writes if no sync standby)
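Putting the recommended values together, the bootstrap DCS section might look like:

```yaml
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576   # bytes (1MB)
    synchronous_mode: false
```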
5.2. PostgreSQL bootstrap parameters
use_pg_rewind:
- Enable automatic recovery with pg_rewind
- Faster recovery when rejoining cluster
- Requires wal_log_hints = on or data checksums
use_slots:
- Create replication slots automatically
- Prevent WAL deletion when replica lags
- Recommended: true
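Both switches live under the dynamic postgresql section; a sketch:

```yaml
bootstrap:
  dcs:
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_log_hints: "on"   # needed by pg_rewind unless data checksums are enabled
```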
5.3. initdb options
Common options:
- encoding: Character encoding (UTF8 recommended)
- locale: System locale
- data-checksums: Enable page checksums (detect corruption)
- auth-host: Default authentication method for host connections
- auth-local: Default authentication method for local connections
Note: initdb only runs when bootstrapping cluster for the first time.
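For example (the locale value is an assumption):

```yaml
bootstrap:
  initdb:
    - encoding: UTF8
    - locale: en_US.UTF-8
    - data-checksums
    - auth-host: scram-sha-256
    - auth-local: peer
```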
5.4. pg_hba configuration
Best practices:
- ✅ Use scram-sha-256 (most secure)
- ✅ Specific IP addresses/subnets
- ✅ Separate users for different purposes
- ❌ Avoid the trust method
- ❌ Avoid 0.0.0.0/0 unless necessary
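An example following these practices (the subnet and the replicator user are placeholders):

```yaml
bootstrap:
  pg_hba:
    - local all all peer
    - host all all 192.168.1.0/24 scram-sha-256
    - host replication replicator 192.168.1.0/24 scram-sha-256
```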
5.5. Bootstrap users
User types:
- admin: Administrative tasks
- application: Application database user
- monitoring: Prometheus exporter, etc.
- replication: Already handled by Patroni
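A sketch of the bootstrap users section (usernames are illustrative):

```yaml
bootstrap:
  users:
    admin:
      password: change-me   # placeholder; prefer env vars or a secrets manager
      options:
        - createrole
        - createdb
    app_user:
      password: change-me
    monitoring:
      password: change-me
```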
5.6. Post-bootstrap scripts
- post_bootstrap: Runs after bootstrapping the cluster (only on the primary)
- post_init: Runs after initializing the database
Example script (/etc/patroni/scripts/post_bootstrap.sh):
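A minimal sketch of such a script (database and extension names are illustrative; the script must be executable and referenced as bootstrap.post_bootstrap):

```shell
#!/bin/bash
# Runs once, on the primary, right after the cluster is bootstrapped.
set -euo pipefail

psql -U postgres <<'SQL'
CREATE DATABASE appdb;
\c appdb
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SQL
```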
6. Section: PostgreSQL
6.1. Connection settings
- listen: Interface for PostgreSQL to listen on
- connect_address: Address for replication connections
- proxy_address: Virtual IP (HAProxy, pgBouncer)
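For example (addresses are placeholders):

```yaml
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.11:5432   # address replicas connect to
  # proxy_address: 192.168.1.100:5000  # optional, if clients go through a proxy
```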
6.2. Data and binary directories
Notes:
- data_dir: Where database files are stored
- bin_dir: Where PostgreSQL binaries are located (psql, pg_ctl, etc.)
- config_dir: If config files are in a different location from data_dir
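Debian-style paths for PostgreSQL 18, as an example (adjust for your distribution):

```yaml
postgresql:
  data_dir: /var/lib/postgresql/18/main
  bin_dir: /usr/lib/postgresql/18/bin
  # config_dir: /etc/postgresql/18/main   # only if kept outside data_dir
```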
6.3. Authentication
- replication: User for streaming replication
- superuser: User Patroni uses to manage PostgreSQL
- rewind: User for pg_rewind (optional, can use superuser)
Security best practice: Store passwords in environment variables or secrets manager.
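A sketch with placeholder usernames; the passwords can instead be supplied through Patroni's PATRONI_SUPERUSER_PASSWORD and PATRONI_REPLICATION_PASSWORD environment variables:

```yaml
postgresql:
  authentication:
    superuser:
      username: postgres
      password: change-me   # or set PATRONI_SUPERUSER_PASSWORD
    replication:
      username: replicator
      password: change-me   # or set PATRONI_REPLICATION_PASSWORD
    rewind:
      username: rewind_user
      password: change-me
```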
6.4. Runtime parameters
Memory sizing guide:
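As a rough starting point for a dedicated 16 GB server (tune for your workload):

```yaml
postgresql:
  parameters:
    shared_buffers: 4GB          # ~25% of RAM
    effective_cache_size: 12GB   # ~75% of RAM
    work_mem: 16MB               # per sort/hash operation, per connection
    maintenance_work_mem: 1GB
    max_connections: 200
```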
6.5. Additional pg_hba entries
Merge with entries from bootstrap.pg_hba.
6.6. Callback scripts
on_role_change example:
6.7. Custom configuration files
Include custom configuration file.
Example (custom.conf):
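An illustrative file, referenced via the postgresql.custom_conf option:

```
# custom.conf (settings are examples)
log_min_duration_statement = 250
shared_preload_libraries = 'pg_stat_statements'
```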
6.8. Remove data directory on failover
Be careful: Deletes data directory if recovery fails.
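The relevant switches, shown with their safe defaults; setting them to true lets Patroni wipe the data directory and re-clone from the primary automatically:

```yaml
postgresql:
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false
```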
7. Section: Tags
7.1. Failover tags
nofailover:
Use case: Replica only used for reporting, analytics.
noloadbalance:
Use case: Node under maintenance or has issues.
clonefrom:
Use case: Designated backup node.
nosync:
Use case: Async replica in different datacenter.
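The four tags together, shown with their defaults:

```yaml
tags:
  nofailover: false      # true: never promote this node
  noloadbalance: false   # true: /replica returns 503, excluding it from LB pools
  clonefrom: false       # true: prefer this node as the source for new replicas
  nosync: false          # true: never select as a synchronous standby
```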
7.2. Custom tags
Use cases:
- Monitoring and labeling
- Custom failover logic
- Geographic routing
- Multi-tenant identification
7.3. Priority tag
Example cluster:
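Newer Patroni releases also support a failover_priority tag; a sketch (verify availability in your Patroni version):

```yaml
# pg-node1 (preferred primary)
tags:
  failover_priority: 2
# pg-node2
tags:
  failover_priority: 1
# pg-node3 (reporting only)
tags:
  nofailover: true
```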
8. Section: Watchdog
8.1. Basic watchdog configuration
Modes:
- off: Disable watchdog
- automatic: Use watchdog if available
- required: Fail to start if watchdog is not available
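For example:

```yaml
watchdog:
  mode: automatic        # off | automatic | required
  device: /dev/watchdog
  safety_margin: 5       # trigger this many seconds before the leader TTL expires
```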
8.2. Hardware watchdog
Check watchdog availability:
Load watchdog module:
Grant access to postgres user:
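The three steps above, as root (softdog is the software fallback when no hardware watchdog exists):

```shell
# Check for a watchdog device
ls -l /dev/watchdog*

# Load the software watchdog module now and on every boot
modprobe softdog
echo softdog | tee /etc/modules-load.d/softdog.conf

# Let the postgres user open the device (use a udev rule to persist this)
chown postgres:postgres /dev/watchdog
```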
8.3. Why use watchdog?
Split-brain prevention:
- Patroni hangs but PostgreSQL still running
- Network issue: Patroni loses DCS but node alive
- Watchdog reboots node → Prevent zombie primary
Flow: Patroni keeps the kernel watchdog timer refreshed on every HA loop; if Patroni hangs or loses the DCS, the timer stops being refreshed and the node reboots before the leader TTL expires, so a stale primary cannot keep accepting writes.
9. Section: Synchronous Replication
9.1. Enable synchronous mode
- synchronous_mode: Enable sync replication
- synchronous_mode_strict: Primary refuses writes if no sync standby
- synchronous_node_count: Number of sync standbys (≥1)
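For example, in the dynamic configuration:

```yaml
bootstrap:
  dcs:
    synchronous_mode: true
    synchronous_mode_strict: false
    synchronous_node_count: 1
```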
9.2. Synchronous mode variants
Async (default):
- Fast writes
- Risk data loss if primary fails
Synchronous:
- Wait for 1 standby confirmation
- Degrade to async if no standbys
Strict synchronous:
- REFUSE writes if no sync standby
- Zero data loss guarantee
- Risk availability impact
9.3. Multiple synchronous standbys
PostgreSQL has long supported multiple synchronous standbys via synchronous_standby_names; Patroni exposes this through synchronous_node_count.
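For example, to require confirmation from two standbys:

```yaml
bootstrap:
  dcs:
    synchronous_mode: true
    synchronous_node_count: 2   # each commit waits for two standbys
```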
10. Complete Configuration Example
10.1. Production-grade patroni.yml
10.2. Environment variables
Load in systemd:
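For example, secrets can live in an environment file referenced by the unit (paths and values are placeholders):

```
# /etc/patroni/patroni.env
PATRONI_SUPERUSER_PASSWORD=change-me
PATRONI_REPLICATION_PASSWORD=change-me

# systemd drop-in for patroni.service
[Service]
EnvironmentFile=/etc/patroni/patroni.env
```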
11. Summary
Key Takeaways
✅ Configuration layers: Command line > Env > Config file > DCS
✅ Static config: Node-specific, requires restart
✅ Dynamic config: Cluster-wide, update via patronictl edit-config
✅ Bootstrap: One-time initialization settings
✅ Tags: Control failover behavior and node roles
✅ Sync replication: Balance between durability and availability
Best Practices Checklist
- Use environment variables for passwords
- Enable use_pg_rewind with wal_log_hints: on
- Set appropriate ttl, loop_wait, retry_timeout
- Configure maximum_lag_on_failover to limit data loss
- Use data-checksums in initdb
- Set up callback scripts for notifications
- Configure watchdog for split-brain prevention
- Use scram-sha-256 authentication
- Document custom tags and their meanings
- Regularly back up configuration files
Preparation for Lesson 9
Lesson 9 will bootstrap the cluster for the first time with the prepared configuration:
- Start Patroni on 3 nodes
- Verify cluster formation
- Test basic operations
- Troubleshoot common issues