Patroni Callbacks
Learning Objectives
After this lesson, you will:
- Understand what Patroni callbacks are and when they are triggered
- Implement custom scripts for lifecycle events
- Configure callbacks for automation tasks
- Handle role changes (primary ↔ replica)
- Set up notifications and monitoring hooks
- Troubleshoot callback failures
1. Callbacks Overview
1.1. What are Callbacks?
Callbacks = Custom scripts executed by Patroni at cluster lifecycle events.
Use cases:
- Notifications: Alert team when failover occurs
- Automation: Update DNS, load balancer configs
- Monitoring: Push metrics to monitoring system
- Traffic management: Redirect application traffic
- Security: Rotate credentials, update firewall rules
- Logging: Custom audit logs
1.2. Available callbacks
Patroni provides the following callback events:
| Callback | Trigger | Use Case |
|---|---|---|
| on_start | Before PostgreSQL starts | Pre-start checks, mount volumes |
| on_stop | Before PostgreSQL stops | Cleanup, notify applications |
| on_restart | Before PostgreSQL restarts | Log restart event |
| on_reload | After PostgreSQL config reload | Verify config changes |
| on_role_change | Role changes (primary ↔ replica) | Most important - update DNS, LB |
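| pre_promote | Before replica promoted to primary | Final checks before promotion |
| post_promote | After replica promoted to primary | Update monitoring, send alerts |
Note: upstream Patroni configures pre_promote directly under the postgresql section rather than under callbacks, and a non-zero exit code from it cancels the promotion. post_promote does not appear among the upstream configuration settings, so verify it against your Patroni version before relying on it.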
1.3. Callback execution flow
1.4. Callback environment variables
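Patroni passes context to callback scripts. Upstream documentation describes this context as three positional arguments (action, role, cluster name), so confirm that your Patroni version and service environment actually export the variables below before relying on them: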
| Variable | Description | Example |
|---|---|---|
| PATRONI_ROLE | Current role after change | master, replica |
| PATRONI_SCOPE | Cluster name | postgres |
| PATRONI_NAME | Node name | node1 |
| PATRONI_CLUSTER_NAME | Cluster name (alias) | postgres |
| PATRONI_VERSION | Patroni version | 3.2.0 |
For on_role_change:
| Variable | Value |
|---|---|
| PATRONI_NEW_ROLE | New role: master or replica |
| PATRONI_OLD_ROLE | Previous role |
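Because upstream Patroni documentation describes callbacks as being invoked with the action, the role, and the cluster name as positional arguments, a script can read those first and only fall back to the variables in the tables above. The following is a minimal sketch under that assumption; the function name `read_callback_context` and the `CB_*` variables are illustrative and not part of Patroni.

```bash
#!/usr/bin/env bash
# Sketch: collect the callback context into CB_* variables (hypothetical names).
# Assumes Patroni calls the script as: <script> <action> <role> <cluster>.
read_callback_context() {
    CB_ACTION="${1:-unknown}"
    # Fall back to the variables from the tables above if no argument is given.
    CB_ROLE="${2:-${PATRONI_NEW_ROLE:-${PATRONI_ROLE:-unknown}}}"
    CB_CLUSTER="${3:-${PATRONI_SCOPE:-unknown}}"
    # Some releases report "master"; normalise it to "primary".
    if [ "$CB_ROLE" = "master" ]; then
        CB_ROLE="primary"
    fi
}

# Typical use at the top of a callback script:
read_callback_context "$@"
echo "action=$CB_ACTION role=$CB_ROLE cluster=$CB_CLUSTER"
```

Normalising the role in one place keeps the rest of the script independent of which role name a given Patroni release uses.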
2. Configure Callbacks in Patroni
2.1. Basic configuration
In patroni.yml:
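As a rough illustration, the callbacks are listed under `postgresql.callbacks`, one absolute script path per event. The helper below only prints such a fragment so it can be reviewed before being merged into the configuration by hand; the directory matches the script paths used later in this lesson, while `on_restart.sh`, `on_reload.sh`, and the function name are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: print a callbacks fragment for patroni.yml.
# The default directory is taken from the script paths used in this lesson.
print_callbacks_yaml() {
    local dir="${1:-/var/lib/postgresql/callbacks}"
    cat <<EOF
postgresql:
  callbacks:
    on_start: ${dir}/on_start.sh
    on_stop: ${dir}/on_stop.sh
    on_restart: ${dir}/on_restart.sh
    on_reload: ${dir}/on_reload.sh
    on_role_change: ${dir}/on_role_change.sh
EOF
}

print_callbacks_yaml
```

If the file already has a `postgresql:` section, only the `callbacks:` mapping should be added beneath it.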
Key points:
- Paths must be absolute
- Scripts must be executable (`chmod +x`)
- Owned by postgres user
- Should complete quickly (<30 seconds)
- Non-zero exit code = callback failed (logged but doesn't block operation)
2.2. Create callback directory
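One possible way to prepare the directory is sketched below. The function name is hypothetical, the `postgres` account is assumed to exist on real nodes, and the ownership change is skipped when the script is not run as root.

```bash
#!/usr/bin/env bash
# Sketch: create the callback directory with restrictive permissions.
create_callback_dir() {
    local dir="$1" owner="${2:-postgres}"
    mkdir -p "$dir" || return 1
    chmod 750 "$dir" || return 1
    # Changing ownership needs root and an existing account; skip otherwise.
    if [ "$(id -u)" -eq 0 ] && id "$owner" >/dev/null 2>&1; then
        chown "$owner" "$dir"
    fi
}

# On a real node (as root):
# create_callback_dir /var/lib/postgresql/callbacks
```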
3. Implement Callback Scripts
3.1. on_start callback
Use case: Pre-start validation, mount checks.
Script: /var/lib/postgresql/callbacks/on_start.sh
Create script:
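A minimal sketch of what such a script could contain is shown here. The log path follows the one used in the labs, while the `CALLBACK_LOG` override, the function names, and the example data directory are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch of on_start.sh checks. The log path override is an assumption.
LOG_FILE="${CALLBACK_LOG:-/var/log/patroni/callbacks.log}"

log() {
    local line
    line="$(date '+%Y-%m-%d %H:%M:%S') [on_start] $*"
    # Fall back to stderr when the log file is not writable.
    { echo "$line" >>"$LOG_FILE"; } 2>/dev/null || echo "$line" >&2
}

# Returns the number of failed checks (0 means everything looked fine).
preflight_checks() {
    local data_dir="$1" failures=0
    if [ ! -d "$data_dir" ]; then
        log "WARN: data directory $data_dir is missing"
        failures=$((failures + 1))
    elif [ ! -f "$data_dir/PG_VERSION" ]; then
        log "WARN: $data_dir does not look like a PostgreSQL data directory"
        failures=$((failures + 1))
    fi
    return "$failures"
}

# In the real script (hypothetical data directory), then always exit 0:
# preflight_checks /var/lib/postgresql/data || log "pre-start checks reported problems"
```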
3.2. on_stop callback
Use case: Graceful shutdown notifications.
Script: /var/lib/postgresql/callbacks/on_stop.sh
Create script:
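The sketch below records the shutdown and clears role marker files. The labs in this lesson only refer to markers as /tmp/postgres_is_*, so the two concrete file names used here are hypothetical, as is the function name.

```bash
#!/usr/bin/env bash
# Sketch of on_stop.sh: record the shutdown and clear role marker files.
# The marker file names are hypothetical; adjust them to your own scripts.
on_stop_cleanup() {
    local marker_dir="${1:-/tmp}"
    local log_file="${2:-/var/log/patroni/callbacks.log}"
    rm -f "$marker_dir/postgres_is_primary" "$marker_dir/postgres_is_replica"
    # Logging is best effort: an unwritable log must not fail the callback.
    { echo "$(date '+%Y-%m-%d %H:%M:%S') [on_stop] PostgreSQL stopping on $(uname -n)" \
        >>"$log_file"; } 2>/dev/null || true
}

# In the real script:
# on_stop_cleanup; exit 0
```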
3.3. on_role_change callback (Most Important!)
Use case: Update DNS, load balancers, send notifications.
Script: /var/lib/postgresql/callbacks/on_role_change.sh
Create production-ready script:
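The core of such a script is a dispatch on the new role. The sketch below assumes the role arrives as the second positional argument and only maintains marker files; `MARKER_DIR`, the marker names, and the function name are hypothetical, and the DNS or load balancer steps are left as a comment.

```bash
#!/usr/bin/env bash
# Sketch of on_role_change.sh. Marker names and MARKER_DIR are hypothetical.
MARKER_DIR="${MARKER_DIR:-/tmp}"

handle_role_change() {
    local role="$1"
    case "$role" in
        master|primary)
            rm -f "$MARKER_DIR/postgres_is_replica"
            touch "$MARKER_DIR/postgres_is_primary"
            # Place DNS / load balancer updates for the new primary here.
            ;;
        replica|standby_leader)
            # A standby leader is read-only, so it is treated like a replica.
            rm -f "$MARKER_DIR/postgres_is_primary"
            touch "$MARKER_DIR/postgres_is_replica"
            ;;
        *)
            echo "on_role_change: unexpected role '$role'" >&2
            ;;
    esac
    return 0   # never fail the callback because of a side task
}

# Patroni passes <action> <role> <cluster>; the role is the second argument:
# handle_role_change "${2:-unknown}"
```

Removing the stale marker before creating the new one keeps the function safe to run repeatedly for the same role.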
3.4. on_restart callback
Use case: Log restarts, notify about planned maintenance.
3.5. on_reload callback
Use case: Verify configuration changes were applied.
3.6. Create log directory
4. Update Patroni Configuration
4.1. Add callbacks to patroni.yml
On all 3 nodes, edit /etc/patroni/patroni.yml and add the callbacks block under the postgresql section.
4.2. Reload Patroni configuration
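One way to apply the change is `patronictl reload`, which asks the members to re-read their local configuration; `--force` skips the confirmation prompt. The wrapper below is a sketch: the function name is hypothetical and the cluster name is whatever `scope` is set to in patroni.yml.

```bash
#!/usr/bin/env bash
# Sketch: ask Patroni to reload its configuration on all members.
reload_patroni() {
    local cluster="$1" config="${2:-/etc/patroni/patroni.yml}"
    if ! command -v patronictl >/dev/null 2>&1; then
        echo "patronictl not found in PATH" >&2
        return 127
    fi
    patronictl -c "$config" reload "$cluster" --force
}

# Example, assuming the cluster (scope) is named "postgres":
# reload_patroni postgres
```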
5. Test Callbacks
5.1. Test on_restart
5.2. Test on_reload
5.3. Test on_role_change (Failover)
⚠️ IMPORTANT: Test in non-production!
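After triggering a role change in a test cluster (for example with `patronictl switchover`), it helps to wait for the callback's log entry instead of guessing when it ran. The polling helper below is a sketch; its name and the log pattern are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: wait until a pattern shows up in the callback log, or time out.
wait_for_log_line() {
    local file="$1" pattern="$2" timeout="${3:-30}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -q -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# After triggering a switchover in a test cluster, for example:
# wait_for_log_line /var/log/patroni/callbacks.log 'on_role_change' 60
```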
6. Advanced Callback Examples
6.1. DNS update using nsupdate
Prerequisites: BIND DNS server with DDNS enabled.
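A dynamic update can be expressed as a small batch of `nsupdate` commands. The sketch below only builds that batch, so it can be inspected before being piped to `nsupdate`; the function name, server, zone, record, address, and key path are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: build an nsupdate batch that repoints an A record.
# All names and addresses used with it are placeholders.
build_nsupdate_batch() {
    local server="$1" zone="$2" record="$3" ip="$4" ttl="${5:-60}"
    printf 'server %s\n' "$server"
    printf 'zone %s\n' "$zone"
    printf 'update delete %s A\n' "$record"
    printf 'update add %s %s A %s\n' "$record" "$ttl" "$ip"
    printf 'send\n'
}

# Sending it requires a key accepted by the DNS server, for example:
# build_nsupdate_batch 10.0.0.53 example.internal db-primary.example.internal 10.0.0.11 \
#   | nsupdate -k /etc/patroni/ddns.key
```

A short TTL keeps clients from caching the old primary's address for long after a failover.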
6.2. HAProxy backend update
Via stats socket:
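The HAProxy runtime API accepts `set server <backend>/<server> state ready|drain|maint` on an admin-level stats socket. The sketch below assumes `socat` is installed; the function name, the socket path default, and the backend and server names in the example are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: switch a server's state through the HAProxy runtime API.
# Socket path and names are placeholders; socat is assumed to be installed.
haproxy_set_state() {
    local backend="$1" server="$2" state="$3"
    local socket="${HAPROXY_SOCKET:-/run/haproxy/admin.sock}"
    case "$state" in
        ready|drain|maint) ;;
        *) echo "invalid state: $state" >&2; return 2 ;;
    esac
    echo "set server ${backend}/${server} state ${state}" \
        | socat stdio "unix-connect:${socket}"
}

# Example (hypothetical backend and server names):
# haproxy_set_state pg_primary node1 ready
```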
6.3. Consul service registration
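A callback can register the node with the local Consul agent through its HTTP API (`PUT /v1/agent/service/register`), tagging the service with the current role. This is a sketch only: the service name, ID scheme, port, `CONSUL_ADDR`, and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: register this node in the local Consul agent, tagged with its role.
# Service name, port and agent address are assumptions.
consul_payload() {
    local node="$1" role="$2" port="${3:-5432}"
    printf '{"ID":"postgres-%s","Name":"postgres","Tags":["%s"],"Port":%s}\n' \
        "$node" "$role" "$port"
}

register_in_consul() {
    consul_payload "$@" | curl --silent --fail --max-time 5 -X PUT \
        --data @- "${CONSUL_ADDR:-http://127.0.0.1:8500}/v1/agent/service/register"
}

# Example: register_in_consul node1 primary
```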
6.4. Email notification
6.5. Slack/Teams webhook
Detailed Slack notification:
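A Slack incoming webhook accepts a JSON body with a `text` field. The sketch below builds that body and posts it with a time limit, returning success even if the post fails; `SLACK_WEBHOOK_URL`, the message wording, and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: post a role-change message to a Slack incoming webhook.
# SLACK_WEBHOOK_URL should come from the environment or a secrets manager.
json_escape() {
    # Escape backslashes and double quotes for use inside a JSON string.
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

slack_payload() {
    local cluster="$1" node="$2" role="$3"
    printf '{"text":"Patroni cluster %s: node %s is now %s"}' \
        "$(json_escape "$cluster")" "$(json_escape "$node")" "$(json_escape "$role")"
}

notify_slack() {
    [ -n "${SLACK_WEBHOOK_URL:-}" ] || return 0   # nothing configured, skip quietly
    curl --silent --max-time 10 -X POST -H 'Content-Type: application/json' \
        --data "$(slack_payload "$@")" "$SLACK_WEBHOOK_URL" >/dev/null || true
}

# Example: notify_slack postgres node1 primary
```

The same payload shape can be adapted for other webhook receivers, although their expected fields may differ.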
6.6. Metrics push to monitoring
Push to Prometheus Pushgateway:
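The Pushgateway accepts metrics in the text exposition format on `/metrics/job/<job>/instance/<instance>`. The sketch below pushes a role gauge and a timestamp; the metric names, the job name, `PUSHGATEWAY_URL`, and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: push role and timestamp gauges to a Prometheus Pushgateway.
# Metric names, job name and PUSHGATEWAY_URL are assumptions.
role_metrics() {
    local role="$1" is_primary=0
    case "$role" in
        master|primary) is_primary=1 ;;
    esac
    printf '# TYPE patroni_callback_is_primary gauge\n'
    printf 'patroni_callback_is_primary %s\n' "$is_primary"
    printf '# TYPE patroni_callback_last_role_change_seconds gauge\n'
    printf 'patroni_callback_last_role_change_seconds %s\n' "$(date +%s)"
}

push_metrics() {
    local role="$1" node="$2"
    role_metrics "$role" | curl --silent --max-time 5 --data-binary @- \
        "${PUSHGATEWAY_URL:-http://localhost:9091}/metrics/job/patroni_callbacks/instance/${node}" \
        || true
}

# Example: push_metrics primary node1
```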
7. Callback Best Practices
DO
- Keep callbacks fast
  - Complete within 10-30 seconds
  - Long tasks → background jobs
- Use proper logging
  - Log all actions
  - Include timestamps
  - Rotate logs
- Handle errors gracefully
  - Use `set -e` carefully
  - Catch errors, log, continue
  - Non-zero exit = warning, not failure
- Test thoroughly
  - Test in staging
  - Simulate all scenarios
  - Verify idempotency
- Make scripts idempotent
  - Can run multiple times safely
  - Check before modify
- Use absolute paths
  - Don't rely on PATH
  - Specify full paths
- Secure credentials
  - Don't hardcode passwords
  - Use environment variables or secrets manager
- Monitor callback execution
  - Alert on failures
  - Track execution time
DON'T
- Don't block for a long time
  - Patroni waits for callbacks
  - Long delays → slower failover
- Don't rely on network during failover
  - Network may be partitioned
  - Have fallback mechanisms
- Don't fail the callback unnecessarily
  - Exit 0 even if notification fails
  - Log errors but continue
- Don't run database queries in callbacks
  - PostgreSQL may not be ready
  - Can cause deadlocks
- Don't modify PostgreSQL configuration
  - Let Patroni manage config
  - Use Patroni's parameters
- Don't use interactive commands
  - No user input
  - Must run unattended
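Several of these points (time limits, logging failures, exiting 0 anyway) can be combined in one small wrapper. The sketch below relies on the coreutils `timeout` command, which only runs external programs, not shell functions; the function name and the script path in the trailing comment are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run a side task with a time limit and never propagate its failure.
# Relies on the coreutils `timeout` command (external commands only).
run_side_task() {
    local limit="$1"
    shift
    if ! timeout "$limit" "$@"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') WARN: side task failed or timed out: $*" >&2
    fi
    return 0
}

# Long-running work can be detached so the callback returns immediately:
# nohup /usr/local/bin/rebuild_reports.sh >/dev/null 2>&1 &
```

For example, `run_side_task 10 curl --silent https://example.invalid/hook` would give the request at most ten seconds and log a warning if it does not succeed.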
8. Troubleshoot Callback Issues
8.1. Callback not executing
Check:
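The usual suspects are the ones listed under the configuration key points: a relative path, a missing file, or a missing execute bit. The checker below is a sketch that tests those conditions plus a syntax check, assuming the callback is a bash script; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that a callback script can be run by Patroni.
check_callback_script() {
    local path="$1"
    case "$path" in
        /*) ;;
        *) echo "FAIL: $path is not an absolute path"; return 1 ;;
    esac
    [ -f "$path" ] || { echo "FAIL: $path does not exist"; return 1; }
    [ -x "$path" ] || { echo "FAIL: $path is not executable"; return 1; }
    bash -n "$path" || { echo "FAIL: $path has a syntax error"; return 1; }
    echo "OK: $path"
}

# Run it as the postgres user so permission problems show up, for example:
# sudo -u postgres bash -c 'source ./check.sh; check_callback_script /var/lib/postgresql/callbacks/on_role_change.sh'
```

It is also worth confirming that the path in patroni.yml matches the file on disk exactly and that the configuration was reloaded after the edit.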
8.2. Callback failing
Check logs:
Common issues:
- Syntax error: Run `bash -n script.sh` to check
- Missing dependency: Install required tools (curl, nc, etc.)
- Permission denied: Check file/directory permissions
- Timeout: Script taking too long
8.3. Callback causing slow failover
Measure callback execution time:
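A simple approach is to wrap each step of a callback and log how long it took, which shows where the time goes. The helper below is a sketch with one-second resolution; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: time a command and report the elapsed seconds and exit code.
time_step() {
    local label="$1"
    shift
    local start end status
    start="$(date +%s)"
    "$@"
    status=$?
    end="$(date +%s)"
    echo "$label took $((end - start))s (exit $status)"
    return "$status"
}

# Inside a callback, wrap the slow candidates, for example:
# time_step "dns update" nsupdate -k /etc/patroni/ddns.key /tmp/batch.txt
```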
9. Production Callback Template
Complete production-ready template:
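The sketch below shows one possible shape for such a template: a single script shared by all events that dispatches on the action, logs with timestamps, records its own duration, and always returns 0. It assumes the positional `<action> <role> <cluster>` convention; the `CALLBACK_LOG` override and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch of a single callback script shared by all events.
# Assumes Patroni passes <action> <role> <cluster>; paths are assumptions.
LOG_FILE="${CALLBACK_LOG:-/var/log/patroni/callbacks.log}"

log() {
    local line
    line="$(date '+%Y-%m-%d %H:%M:%S') [$$] $*"
    # Fall back to stderr when the log file is not writable.
    { echo "$line" >>"$LOG_FILE"; } 2>/dev/null || echo "$line" >&2
}

on_role_change() { log "role changed to $1 in cluster $2"; }   # add DNS/LB updates here
on_lifecycle()   { log "event $1 (role $2, cluster $3)"; }

main() {
    local action="${1:-unknown}" role="${2:-unknown}" cluster="${3:-unknown}"
    local started=$SECONDS
    case "$action" in
        on_role_change) on_role_change "$role" "$cluster" ;;
        on_start|on_stop|on_restart|on_reload) on_lifecycle "$action" "$role" "$cluster" ;;
        *) log "WARN: unknown action '$action'" ;;
    esac
    log "$action finished in $((SECONDS - started))s"
    return 0   # report problems in the log, never through the exit code
}

main "$@"
```

Pointing every entry in the callbacks configuration at the same file keeps logging and error handling in one place, at the cost of a larger script.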
10. Lab Exercises
Lab 1: Setup basic callbacks
Tasks:
- Create callback directory and scripts
- Add callbacks to patroni.yml
- Reload Patroni
- Test with `patronictl restart`
Lab 2: Test failover callbacks
Tasks:
- Monitor callback logs: `tail -f /var/log/patroni/callbacks.log`
- Stop primary: `sudo systemctl stop patroni`
- Verify on_role_change executed on new primary
- Check marker files: `/tmp/postgres_is_*`
- Restart old primary, verify it rejoins as replica
Lab 3: Implement Slack notifications
Tasks:
- Get Slack webhook URL
- Add notification to on_role_change.sh
- Test by triggering failover
- Verify message received in Slack
Lab 4: Measure callback performance
Tasks:
- Add timing to all callbacks
- Trigger various events (restart, reload, failover)
- Analyze callback execution times
- Optimize slow callbacks
11. Summary
Key Takeaways
- Callbacks = Custom automation at lifecycle events
- on_role_change = Most critical callback for failover automation
- Keep callbacks fast (<30s) for quick failover
- Log everything for debugging
- Test thoroughly before production
- Handle errors gracefully - don't block operations
Common Use Cases
| Callback | Common Actions |
|---|---|
| on_start | Pre-flight checks, mount verification |
| on_stop | Cleanup, notifications |
| on_role_change | Update DNS, LB, send alerts |
| on_restart | Log maintenance events |
| on_reload | Verify config changes |
Current Architecture
Preparing for Lesson 12
Lesson 12 will cover Patroni REST API:
- Health check endpoints
- Cluster status queries
- Configuration management via API
- Integration with load balancers
- Monitoring and metrics