Installing and Configuring etcd Cluster
Objectives
After this lesson, you will:
- Understand the role of etcd in the Patroni architecture
- Download and install etcd on 3 nodes
- Configure an etcd cluster with Raft consensus
- Create a systemd service for etcd
- Check the health of the etcd cluster
- Use basic etcdctl commands
1. Introduction to etcd
1.1. What is etcd?
etcd is a distributed, reliable key-value store built on the Raft consensus algorithm. It was developed by CoreOS and is now a CNCF (Cloud Native Computing Foundation) project.
Key features:
- **Strongly consistent**: Raft keeps all members in agreement
- **Fast**: Sub-millisecond latency for reads
- **Distributed**: Runs as a multi-node cluster with quorum
- **Watch mechanism**: Real-time notifications on changes
- **TTL support**: Automatic key expiration (used for leader locks)
- **gRPC + HTTP API**: Easy integration
1.2. etcd in Patroni Architecture
etcd stores:
- `/service/postgres/leader`: Leader lock (TTL 30s)
- `/service/postgres/members/`: Node information
- `/service/postgres/config`: Cluster configuration
- `/service/postgres/initialize`: Bootstrap state
- `/service/postgres/failover`: Failover instructions
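As a concrete example, once Patroni is running against this cluster (Lesson 7), these keys can be read directly with etcdctl (introduced in Section 7):

```bash
# Inspect Patroni's keys; the /service/postgres prefix matches the list above
etcdctl get /service/postgres/leader
etcdctl get /service/postgres/members/ --prefix
```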
2. Download and install etcd
2.1. Architecture considerations
Cluster size recommendations (quorum = ⌊n/2⌋ + 1):
- **3 nodes**: Recommended for production; tolerates 1 failure
- **5 nodes**: Higher availability; tolerates 2 failures
- **7+ nodes**: Rarely needed; every write must still reach a majority, so larger clusters add latency
Deployment topology:
- Option 1 (dedicated): etcd runs on its own machines, separate from PostgreSQL
- Option 2 (co-located): etcd runs on the same three nodes as PostgreSQL and Patroni
The lab uses Option 2 (co-located) to save resources.
2.2. Installing etcd on Ubuntu/Debian
Perform on ALL 3 nodes.
Step 1: Download etcd binary
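A minimal sketch, assuming etcd v3.5.9 and the official GitHub release tarball (check the releases page for the current 3.5.x version):

```bash
ETCD_VER=v3.5.9   # assumed version; see https://github.com/etcd-io/etcd/releases

# Download and unpack the release tarball
wget -q https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzf etcd-${ETCD_VER}-linux-amd64.tar.gz

# Install the server and client binaries
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/

# Verify the installation
etcd --version
```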
Output:
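Something like the following (build details vary with the version installed):

```
etcd Version: 3.5.9
Git SHA: <build-specific>
Go Version: go1.x
Go OS/Arch: linux/amd64
```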
Step 2: Create etcd user and directories
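A sketch of the usual user and directory setup; the paths match the `ETCD_DATA_DIR` and config location used later in this lesson:

```bash
# Dedicated system user with no login shell
sudo useradd --system --home /var/lib/etcd --shell /bin/false etcd

# Data and configuration directories
sudo mkdir -p /var/lib/etcd /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd   # etcd warns if the data dir is group/world-accessible
```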
2.3. Installing on CentOS/RHEL
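The release tarball contains self-contained Go binaries, so the installation from 2.2 works unchanged on CentOS/RHEL; only the prerequisite packages differ. A sketch:

```bash
sudo dnf install -y wget tar   # use yum on CentOS 7 and older
# Then repeat the download, install, and user/directory steps from section 2.2
```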
3. Configure etcd 3-node cluster
3.1. Network topology
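The three members and their ports (the member names etcd1-etcd3 are illustrative and are carried through the sample configs below):

| Node | IP address | Client URL (2379) | Peer URL (2380) |
|---|---|---|---|
| etcd1 (node1) | 10.0.1.11 | http://10.0.1.11:2379 | http://10.0.1.11:2380 |
| etcd2 (node2) | 10.0.1.12 | http://10.0.1.12:2379 | http://10.0.1.12:2380 |
| etcd3 (node3) | 10.0.1.13 | http://10.0.1.13:2379 | http://10.0.1.13:2380 |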
3.2. Create configuration file
Node 1 (10.0.1.11) - /etc/etcd/etcd.conf
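A sketch of the file, consistent with the parameter table in 3.3 (the member names and the cluster token are illustrative choices; URLs use plain HTTP, so add TLS for production):

```ini
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.11:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.11:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.11:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-postgres"
```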
Node 2 (10.0.1.12) - /etc/etcd/etcd.conf
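The same file with only the member name and local IP changed:

```ini
ETCD_NAME="etcd2"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.12:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.12:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.12:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-postgres"
```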
Node 3 (10.0.1.13) - /etc/etcd/etcd.conf
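Again, only the member name and local IP differ:

```ini
ETCD_NAME="etcd3"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.13:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.13:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.13:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-postgres"
```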
3.3. Explanation of parameters
| Parameter | Meaning |
|---|---|
| ETCD_NAME | Unique name of member in cluster |
| ETCD_DATA_DIR | Directory to store data |
| ETCD_LISTEN_PEER_URLS | URL to listen for peer communication (port 2380) |
| ETCD_LISTEN_CLIENT_URLS | URL to listen for client connections (port 2379) |
| ETCD_INITIAL_ADVERTISE_PEER_URLS | URL for other peers to connect to |
| ETCD_ADVERTISE_CLIENT_URLS | URL for clients to connect to |
| ETCD_INITIAL_CLUSTER | List of all members at bootstrap |
| ETCD_INITIAL_CLUSTER_STATE | `new` (first bootstrap) or `existing` (joining a running cluster) |
| ETCD_INITIAL_CLUSTER_TOKEN | Unique token for the cluster (prevents cross-cluster confusion) |
4. Create systemd service
Create file /etc/systemd/system/etcd.service on ALL 3 nodes:
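A minimal sketch of the unit, matching the user, binary path, and config file created in the earlier sections:

```ini
[Unit]
Description=etcd distributed key-value store
After=network-online.target
Wants=network-online.target

[Service]
# etcd supports sd_notify, so systemd knows when it is actually ready
Type=notify
User=etcd
EnvironmentFile=/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```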
Reload systemd and enable service:
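Register the new unit and enable it so etcd starts on boot:

```bash
sudo systemctl daemon-reload
sudo systemctl enable etcd
```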
5. Start etcd cluster
5.1. Start etcd on nodes
Important: start the three members at roughly the same time (within about 30 seconds). During bootstrap each member waits for its peers, and the cluster forms only once a majority is up.
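On node1, node2, and node3 (one terminal each), run the same command, assuming the unit from Section 4 is installed everywhere:

```bash
sudo systemctl start etcd
```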
5.2. Check logs
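Follow the service log on each node:

```bash
sudo journalctl -u etcd -f
```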
Successful startup logs:
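Lines like the following indicate success (etcd ≥ 3.5 logs in JSON; exact wording varies by version, so treat these as indicative):

```
{"level":"info","msg":"ready to serve client requests"}
{"level":"info","msg":"published local member to cluster through raft"}
```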
6. Check etcd cluster health
6.1. Check cluster members
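Point etcdctl at any one member; the table output format is easier to read:

```bash
etcdctl --endpoints=http://10.0.1.11:2379 member list --write-out=table
```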
6.2. Check cluster health
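The `--cluster` flag discovers all members from the one endpoint given and checks each of them:

```bash
etcdctl --endpoints=http://10.0.1.11:2379 endpoint health --cluster
```

Healthy output resembles (timings will differ):

```
http://10.0.1.11:2379 is healthy: successfully committed proposal: took = 2.1ms
http://10.0.1.12:2379 is healthy: successfully committed proposal: took = 2.4ms
http://10.0.1.13:2379 is healthy: successfully committed proposal: took = 2.6ms
```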
6.3. Check endpoint status
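The status table shows per-member details, including the leader flag and Raft state referenced below:

```bash
etcdctl --endpoints=http://10.0.1.11:2379 endpoint status --cluster --write-out=table
```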
Explanation of output:
- `IS LEADER`: etcd1 is currently the leader
- `RAFT TERM`: Election term (increases with each election)
- `RAFT INDEX`: Number of log entries
7. Basic etcdctl commands
7.1. Set environment (optional)
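Export the endpoints once so every etcdctl call in this section targets the whole cluster (etcdctl v3 reads these environment variables):

```bash
export ETCDCTL_API=3   # default since etcd 3.4; explicit for older versions
export ETCDCTL_ENDPOINTS="http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379"
```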
7.2. Basic operations
Put/Get/Delete keys
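A sketch using a throwaway `/demo` prefix:

```bash
etcdctl put /demo/key1 "hello"   # write
etcdctl get /demo/key1           # read (prints key and value)
etcdctl del /demo/key1           # delete (prints number of keys removed)
```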
List keys with prefix
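Once Patroni is running (Lesson 7), this lists everything under the prefix from Section 1.2:

```bash
etcdctl get /service/postgres/ --prefix              # keys and values
etcdctl get /service/postgres/ --prefix --keys-only  # keys only
```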
Watch for changes
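Watching is how Patroni reacts to leader-key changes in real time; a sketch:

```bash
# Blocks and prints every change under the prefix until interrupted
etcdctl watch /service/postgres/ --prefix
```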
TTL keys (used for leader locks)
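In etcd v3, TTLs are implemented with leases: a key attached to a lease disappears when the lease expires, unless it is refreshed. A sketch mirroring the 30-second leader lock from Section 1.2 (the awk extraction assumes the standard `lease <id> granted with TTL(30s)` output; `/demo/leader` is a throwaway key):

```bash
# Grant a 30-second lease and capture its ID
LEASE_ID=$(etcdctl lease grant 30 | awk '{print $2}')

# Attach a key to the lease; it is deleted automatically when the lease expires
etcdctl put /demo/leader "node1" --lease=$LEASE_ID

# Keep the lease (and therefore the key) alive, as Patroni's leader does
etcdctl lease keep-alive $LEASE_ID
```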
7.3. Advanced operations
Transaction (atomic operations)
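etcdctl reads a transaction from stdin as three stanzas separated by blank lines: compares, then success operations, then failure operations. A sketch of a compare-and-swap on the hypothetical `/demo/leader` key from above:

```bash
etcdctl txn <<'EOF'
value("/demo/leader") = "node1"

put /demo/leader "node2"

put /demo/conflict "leader was not node1"

EOF
```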
Snapshot backup
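A sketch (in etcd ≥ 3.5 the `status` subcommand is being migrated to `etcdutl`, but the etcdctl form still works):

```bash
# A snapshot is taken against a single endpoint, not the whole cluster
ETCDCTL_ENDPOINTS=http://10.0.1.11:2379 etcdctl snapshot save /tmp/etcd-backup.db

# Inspect the snapshot (hash, revision, total keys, size)
etcdctl snapshot status /tmp/etcd-backup.db --write-out=table
```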
8. Lab: Complete etcd cluster setup
8.1. Lab objectives
- Install etcd on 3 nodes
- Configure the cluster
- Verify cluster health
- Test basic operations
- Simulate node failure
8.2. Step-by-step lab guide
1. Install etcd on all nodes
Completed in Section 2.
2. Create config files
Completed in Section 3.
3. Create systemd service
Completed in Section 4.
4. Start cluster
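Assuming sections 2-4 are complete on every node:

```bash
# On node1, node2, node3, within ~30 seconds of each other
sudo systemctl start etcd
sudo systemctl status etcd --no-pager
```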
5. Verify cluster
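All three members should be listed and healthy, with exactly one leader:

```bash
etcdctl member list --write-out=table
etcdctl endpoint health --cluster
etcdctl endpoint status --cluster --write-out=table
```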
6. Test write/read
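Write via one node and read back via another to confirm replication (a throwaway key):

```bash
# Write via node1
etcdctl --endpoints=http://10.0.1.11:2379 put /lab/test "hello etcd"

# Read via node3: the value has replicated through Raft
etcdctl --endpoints=http://10.0.1.13:2379 get /lab/test
etcdctl --endpoints=http://10.0.1.13:2379 del /lab/test
```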
7. Test leader election
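A sketch; with one of three members down the cluster keeps quorum, so commands from the surviving nodes still succeed (the stopped endpoint will show an error row):

```bash
# 1. Identify the current leader (IS LEADER column)
etcdctl endpoint status --cluster --write-out=table

# 2. On the leader node, stop etcd
sudo systemctl stop etcd

# 3. From a surviving node: a new leader appears and RAFT TERM has increased
etcdctl endpoint status --cluster --write-out=table

# 4. Restart the stopped member; it rejoins as a follower
sudo systemctl start etcd
```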
8. Test data persistence
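Data written before a restart must still be there afterwards:

```bash
etcdctl put /lab/persist "survives restarts"

# Restart etcd on one node, then read the key back from that node
sudo systemctl restart etcd
etcdctl --endpoints=http://127.0.0.1:2379 get /lab/persist
```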
8.3. Troubleshooting common issues
Issue 1: Cluster won't form
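Usual causes: `ETCD_INITIAL_CLUSTER` differs between nodes, port 2380 is blocked, or the members were started too far apart. A sketch of the checks:

```bash
sudo journalctl -u etcd --no-pager | tail -n 50   # look for peer connection errors
nc -zv 10.0.1.12 2380                             # verify the peer port is reachable
grep ETCD_INITIAL_CLUSTER /etc/etcd/etcd.conf     # must be identical on all nodes
```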
Issue 2: Cannot connect to etcd
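Usual causes: wrong endpoints, etcd listening only on localhost, or port 2379 blocked. A sketch:

```bash
curl http://10.0.1.11:2379/health   # expect {"health":"true"}
ss -tlnp | grep 2379                # confirm etcd listens on the advertised address
echo $ETCDCTL_ENDPOINTS             # confirm etcdctl targets the right endpoints
```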
Issue 3: Node won't join cluster
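Most often this is stale state in the data directory. The usual recovery is to remove the member, re-add it, wipe its data directory, and start it with `ETCD_INITIAL_CLUSTER_STATE="existing"`. A sketch, using etcd3 as the broken member (take the member ID from `etcdctl member list`):

```bash
# From a healthy node: remove, then re-add, the broken member
etcdctl member remove <member-id>
etcdctl member add etcd3 --peer-urls=http://10.0.1.13:2380

# On the broken node: wipe state and restart as an "existing" member
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/*
sudo sed -i 's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE="existing"/' /etc/etcd/etcd.conf
sudo systemctl start etcd
```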
Issue 4: Split-brain or multiple leaders
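Raft prevents two leaders within a single cluster, so "two leaders" almost always means two separate clusters were bootstrapped (for example, a mismatched `ETCD_INITIAL_CLUSTER_TOKEN`, or a node re-bootstrapped with state `new` instead of `existing`). A sketch of the check:

```bash
# Every member must report the same cluster ID; differing IDs mean separate clusters
etcdctl endpoint status --cluster --write-out=json | grep -o '"cluster_id":[0-9]*' | sort -u
```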
9. Performance tuning
9.1. etcd tuning parameters
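A sketch of commonly tuned settings in /etc/etcd/etcd.conf; the values are illustrative starting points, not prescriptions, and should be adjusted to your network and disk latency:

```ini
# Heartbeat interval in ms; set near the round-trip time between members
ETCD_HEARTBEAT_INTERVAL="100"
# Election timeout in ms; roughly 10x the heartbeat interval
ETCD_ELECTION_TIMEOUT="1000"
# Committed transactions between on-disk snapshots
ETCD_SNAPSHOT_COUNT="10000"
# Backend size limit: 8 GiB, the practical maximum etcd recommends
ETCD_QUOTA_BACKEND_BYTES="8589934592"
# Compact key history automatically, keeping the last 1000 revisions
ETCD_AUTO_COMPACTION_MODE="revision"
ETCD_AUTO_COMPACTION_RETENTION="1000"
```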
9.2. Monitoring etcd
Key metrics to monitor:
- Latency (99th percentile < 50ms)
- Disk fsync duration (< 10ms)
- Leader changes (should be rare)
- Database size
- Failed proposals
Check metrics:
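etcd exposes Prometheus metrics on the client port; a sketch that pulls the counters matching the list above:

```bash
curl -s http://10.0.1.11:2379/metrics | grep -E \
  'etcd_server_leader_changes_seen_total|etcd_disk_wal_fsync_duration_seconds|etcd_server_proposals_failed_total|etcd_mvcc_db_total_size_in_bytes'
```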
10. Summary
Key Takeaways
- **etcd cluster**: 3-node cluster for production HA
- **Ports**: 2379 (client), 2380 (peer)
- **Raft consensus**: Automatic leader election and data replication
- **Quorum**: A majority of nodes (2 of 3) must be up for the cluster to operate
- **TTL keys**: Used for Patroni leader locks
- **etcdctl**: CLI tool for management and troubleshooting
Checklist after Lab
- [ ] 3-node etcd cluster running
- [ ] `etcdctl member list` displays all 3 members
- [ ] `etcdctl endpoint health --cluster` reports all members healthy
- [ ] 1 leader and 2 followers
- [ ] etcd service enabled (auto-starts on reboot)
- [ ] Firewall allows ports 2379 and 2380
Current Architecture
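A sketch of what is in place after this lesson (Patroni and PostgreSQL are added in Lesson 7):

```
   node1 (10.0.1.11)      node2 (10.0.1.12)      node3 (10.0.1.13)
  +---------------+      +---------------+      +---------------+
  |     etcd1     |<---->|     etcd2     |<---->|     etcd3     |
  +---------------+ 2380 +---------------+ 2380 +---------------+
          ^                      ^                      ^
          |                      |                      |
          +--- clients (etcdctl, Patroni in Lesson 7) on 2379
```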
Preparation for Lesson 7
The next lesson will install Patroni and integrate with the etcd cluster already set up.