Installing and Configuring etcd Cluster
Objectives
After this lesson, you will:
- Understand the role of etcd in the Patroni architecture
- Download and install etcd on 3 nodes
- Configure an etcd cluster with Raft consensus
- Create a systemd service for etcd
- Check the health of the etcd cluster
- Use basic etcdctl commands
1. Introduction to etcd
1.1. What is etcd?
etcd is a distributed, reliable key-value store built on the Raft consensus algorithm. It was developed by CoreOS and is now a CNCF (Cloud Native Computing Foundation) project.
Key features:
- **Strongly consistent**: Raft keeps all members in agreement
- **Fast**: Sub-millisecond latency for reads
- **Distributed**: Runs as a multi-node cluster with quorum
- **Watch mechanism**: Real-time notifications on changes
- **TTL support**: Automatic key expiration (used for leader locks)
- **gRPC + HTTP API**: Easy integration
1.2. etcd in Patroni Architecture
etcd stores:
- `/service/postgres/leader`: Leader lock (TTL 30s)
- `/service/postgres/members/`: Node information
- `/service/postgres/config`: Cluster configuration
- `/service/postgres/initialize`: Bootstrap state
- `/service/postgres/failover`: Failover instructions
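As a concrete example, once Patroni is running against this cluster (Lesson 7), these keys can be read directly with etcdctl (introduced in Section 7):

```bash
# Inspect Patroni's keys; the /service/postgres prefix matches the list above
etcdctl get /service/postgres/leader
etcdctl get /service/postgres/members/ --prefix
```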
2. Download and install etcd
2.1. Architecture considerations
Cluster size recommendations (quorum = ⌊n/2⌋ + 1):
- **3 nodes**: Recommended for production; tolerates 1 failure
- **5 nodes**: Higher availability; tolerates 2 failures
- **7+ nodes**: Rarely needed; every write must still reach a majority, so larger clusters add latency
Deployment topology:
- Option 1 (dedicated): etcd runs on its own machines, separate from PostgreSQL
- Option 2 (co-located): etcd runs on the same three nodes as PostgreSQL and Patroni
The lab uses Option 2 (co-located) to save resources.
2.2. Installing etcd on Ubuntu/Debian
Perform on ALL 3 nodes.
Step 1: Download etcd binary
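A minimal sketch, assuming etcd v3.5.9 and the official GitHub release tarball (check the releases page for the current 3.5.x version):

```bash
ETCD_VER=v3.5.9   # assumed version; see https://github.com/etcd-io/etcd/releases

# Download and unpack the release tarball
wget -q https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzf etcd-${ETCD_VER}-linux-amd64.tar.gz

# Install the server and client binaries
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/

# Verify the installation
etcd --version
```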
Output:
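Something like the following (build details vary with the version installed):

```
etcd Version: 3.5.9
Git SHA: <build-specific>
Go Version: go1.x
Go OS/Arch: linux/amd64
```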
Step 2: Create etcd user and directories
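A sketch of the usual user and directory setup; the paths match the `ETCD_DATA_DIR` and config location used later in this lesson:

```bash
# Dedicated system user with no login shell
sudo useradd --system --home /var/lib/etcd --shell /bin/false etcd

# Data and configuration directories
sudo mkdir -p /var/lib/etcd /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd   # etcd warns if the data dir is group/world-accessible
```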
2.3. Installing on CentOS/RHEL
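The release tarball contains self-contained Go binaries, so the installation from 2.2 works unchanged on CentOS/RHEL; only the prerequisite packages differ. A sketch:

```bash
sudo dnf install -y wget tar   # use yum on CentOS 7 and older
# Then repeat the download, install, and user/directory steps from section 2.2
```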
3. Configure etcd 3-node cluster
3.1. Network topology
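The three members and their ports (the member names etcd1-etcd3 are illustrative and are carried through the sample configs below):

| Node | IP address | Client URL (2379) | Peer URL (2380) |
|---|---|---|---|
| etcd1 (node1) | 10.0.1.11 | http://10.0.1.11:2379 | http://10.0.1.11:2380 |
| etcd2 (node2) | 10.0.1.12 | http://10.0.1.12:2379 | http://10.0.1.12:2380 |
| etcd3 (node3) | 10.0.1.13 | http://10.0.1.13:2379 | http://10.0.1.13:2380 |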
3.2. Create configuration file
Node 1 (10.0.1.11) - /etc/etcd/etcd.conf
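A sketch of the file, consistent with the parameter table in 3.3 (the member names and the cluster token are illustrative choices; URLs use plain HTTP, so add TLS for production):

```ini
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.11:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.11:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.11:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-postgres"
```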
Node 2 (10.0.1.12) - /etc/etcd/etcd.conf
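The same file with only the member name and local IP changed:

```ini
ETCD_NAME="etcd2"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.12:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.12:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.12:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-postgres"
```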
Node 3 (10.0.1.13) - /etc/etcd/etcd.conf
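Again, only the member name and local IP differ:

```ini
ETCD_NAME="etcd3"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.13:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.13:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.13:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-postgres"
```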
3.3. Explanation of parameters
| Parameter | Meaning |
|---|---|
| ETCD_NAME | Unique name of member in cluster |
| ETCD_DATA_DIR | Directory to store data |
| ETCD_LISTEN_PEER_URLS | URL to listen for peer communication (port 2380) |
| ETCD_LISTEN_CLIENT_URLS | URL to listen for client connections (port 2379) |
| ETCD_INITIAL_ADVERTISE_PEER_URLS | URL for other peers to connect to |
| ETCD_ADVERTISE_CLIENT_URLS | URL for clients to connect to |
| ETCD_INITIAL_CLUSTER | List of all members at bootstrap |
| ETCD_INITIAL_CLUSTER_STATE | `new` (first bootstrap) or `existing` (joining a running cluster) |
| ETCD_INITIAL_CLUSTER_TOKEN | Unique token for the cluster (prevents cross-cluster confusion) |
4. Create systemd service
Create file /etc/systemd/system/etcd.service on ALL 3 nodes:
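A minimal sketch of the unit, matching the user, binary path, and config file created in the earlier sections:

```ini
[Unit]
Description=etcd distributed key-value store
After=network-online.target
Wants=network-online.target

[Service]
# etcd supports sd_notify, so systemd knows when it is actually ready
Type=notify
User=etcd
EnvironmentFile=/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```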
Reload systemd and enable service:
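Register the new unit and enable it so etcd starts on boot:

```bash
sudo systemctl daemon-reload
sudo systemctl enable etcd
```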
5. Start etcd cluster
5.1. Start etcd on nodes
Important: start the three members at roughly the same time (within about 30 seconds). During bootstrap each member waits for its peers, and the cluster forms only once a majority is up.
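On node1, node2, and node3 (one terminal each), run the same command, assuming the unit from Section 4 is installed everywhere:

```bash
sudo systemctl start etcd
```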
5.2. Check logs
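Follow the service log on each node:

```bash
sudo journalctl -u etcd -f
```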
Successful startup logs:
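Lines like the following indicate success (etcd ≥ 3.5 logs in JSON; exact wording varies by version, so treat these as indicative):

```
{"level":"info","msg":"ready to serve client requests"}
{"level":"info","msg":"published local member to cluster through raft"}
```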
6. Check etcd cluster health
6.1. Check cluster members
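Point etcdctl at any one member; the table output format is easier to read:

```bash
etcdctl --endpoints=http://10.0.1.11:2379 member list --write-out=table
```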
6.2. Check cluster health
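The `--cluster` flag discovers all members from the one endpoint given and checks each of them:

```bash
etcdctl --endpoints=http://10.0.1.11:2379 endpoint health --cluster
```

Healthy output resembles (timings will differ):

```
http://10.0.1.11:2379 is healthy: successfully committed proposal: took = 2.1ms
http://10.0.1.12:2379 is healthy: successfully committed proposal: took = 2.4ms
http://10.0.1.13:2379 is healthy: successfully committed proposal: took = 2.6ms
```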
6.3. Check endpoint status
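The status table shows per-member details, including the leader flag and Raft state referenced below:

```bash
etcdctl --endpoints=http://10.0.1.11:2379 endpoint status --cluster --write-out=table
```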
Explanation of output:
- `IS LEADER`: etcd1 is currently the leader
- `RAFT TERM`: Election term (increases with each election)
- `RAFT INDEX`: Number of log entries
7. Basic etcdctl commands
7.1. Set environment (optional)
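Export the endpoints once so every etcdctl call in this section targets the whole cluster (etcdctl v3 reads these environment variables):

```bash
export ETCDCTL_API=3   # default since etcd 3.4; explicit for older versions
export ETCDCTL_ENDPOINTS="http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379"
```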
7.2. Basic operations
Put/Get/Delete keys
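A sketch using a throwaway `/demo` prefix:

```bash
etcdctl put /demo/key1 "hello"   # write
etcdctl get /demo/key1           # read (prints key and value)
etcdctl del /demo/key1           # delete (prints number of keys removed)
```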
List keys with prefix
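Once Patroni is running (Lesson 7), this lists everything under the prefix from Section 1.2:

```bash
etcdctl get /service/postgres/ --prefix              # keys and values
etcdctl get /service/postgres/ --prefix --keys-only  # keys only
```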
Watch for changes
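Watching is how Patroni reacts to leader-key changes in real time; a sketch:

```bash
# Blocks and prints every change under the prefix until interrupted
etcdctl watch /service/postgres/ --prefix
```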
TTL keys (used for leader locks)
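In etcd v3, TTLs are implemented with leases: a key attached to a lease disappears when the lease expires, unless it is refreshed. A sketch mirroring the 30-second leader lock from Section 1.2 (the awk extraction assumes the standard `lease <id> granted with TTL(30s)` output; `/demo/leader` is a throwaway key):

```bash
# Grant a 30-second lease and capture its ID
LEASE_ID=$(etcdctl lease grant 30 | awk '{print $2}')

# Attach a key to the lease; it is deleted automatically when the lease expires
etcdctl put /demo/leader "node1" --lease=$LEASE_ID

# Keep the lease (and therefore the key) alive, as Patroni's leader does
etcdctl lease keep-alive $LEASE_ID
```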
7.3. Advanced operations
Transaction (atomic operations)
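etcdctl reads a transaction from stdin as three stanzas separated by blank lines: compares, then success operations, then failure operations. A sketch of a compare-and-swap on the hypothetical `/demo/leader` key from above:

```bash
etcdctl txn <<'EOF'
value("/demo/leader") = "node1"

put /demo/leader "node2"

put /demo/conflict "leader was not node1"

EOF
```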
Snapshot backup
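A sketch (in etcd ≥ 3.5 the `status` subcommand is being migrated to `etcdutl`, but the etcdctl form still works):

```bash
# A snapshot is taken against a single endpoint, not the whole cluster
ETCDCTL_ENDPOINTS=http://10.0.1.11:2379 etcdctl snapshot save /tmp/etcd-backup.db

# Inspect the snapshot (hash, revision, total keys, size)
etcdctl snapshot status /tmp/etcd-backup.db --write-out=table
```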
8. Lab: Complete etcd cluster setup
8.1. Lab objectives
- Install etcd on 3 nodes
- Configure the cluster
- Verify cluster health
- Test basic operations
- Simulate node failure
8.2. Step-by-step lab guide
1. Install etcd on all nodes
Completed in Section 2.
2. Create config files
Completed in Section 3.
3. Create systemd service
Completed in Section 4.
4. Start cluster
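Assuming sections 2-4 are complete on every node:

```bash
# On node1, node2, node3, within ~30 seconds of each other
sudo systemctl start etcd
sudo systemctl status etcd --no-pager
```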
5. Verify cluster
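All three members should be listed and healthy, with exactly one leader:

```bash
etcdctl member list --write-out=table
etcdctl endpoint health --cluster
etcdctl endpoint status --cluster --write-out=table
```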
6. Test write/read
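Write via one node and read back via another to confirm replication (a throwaway key):

```bash
# Write via node1
etcdctl --endpoints=http://10.0.1.11:2379 put /lab/test "hello etcd"

# Read via node3: the value has replicated through Raft
etcdctl --endpoints=http://10.0.1.13:2379 get /lab/test
etcdctl --endpoints=http://10.0.1.13:2379 del /lab/test
```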
7. Test leader election
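A sketch; with one of three members down the cluster keeps quorum, so commands from the surviving nodes still succeed (the stopped endpoint will show an error row):

```bash
# 1. Identify the current leader (IS LEADER column)
etcdctl endpoint status --cluster --write-out=table

# 2. On the leader node, stop etcd
sudo systemctl stop etcd

# 3. From a surviving node: a new leader appears and RAFT TERM has increased
etcdctl endpoint status --cluster --write-out=table

# 4. Restart the stopped member; it rejoins as a follower
sudo systemctl start etcd
```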
8. Test data persistence
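Data written before a restart must still be there afterwards:

```bash
etcdctl put /lab/persist "survives restarts"

# Restart etcd on one node, then read the key back from that node
sudo systemctl restart etcd
etcdctl --endpoints=http://127.0.0.1:2379 get /lab/persist
```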
8.3. Troubleshooting common issues
Issue 1: Cluster won't form
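Usual causes: `ETCD_INITIAL_CLUSTER` differs between nodes, port 2380 is blocked, or the members were started too far apart. A sketch of the checks:

```bash
sudo journalctl -u etcd --no-pager | tail -n 50   # look for peer connection errors
nc -zv 10.0.1.12 2380                             # verify the peer port is reachable
grep ETCD_INITIAL_CLUSTER /etc/etcd/etcd.conf     # must be identical on all nodes
```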
Issue 2: Cannot connect to etcd
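Usual causes: wrong endpoints, etcd listening only on localhost, or port 2379 blocked. A sketch:

```bash
curl http://10.0.1.11:2379/health   # expect {"health":"true"}
ss -tlnp | grep 2379                # confirm etcd listens on the advertised address
echo $ETCDCTL_ENDPOINTS             # confirm etcdctl targets the right endpoints
```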
Issue 3: Node won't join cluster
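Most often this is stale state in the data directory. The usual recovery is to remove the member, re-add it, wipe its data directory, and start it with `ETCD_INITIAL_CLUSTER_STATE="existing"`. A sketch, using etcd3 as the broken member (take the member ID from `etcdctl member list`):

```bash
# From a healthy node: remove, then re-add, the broken member
etcdctl member remove <member-id>
etcdctl member add etcd3 --peer-urls=http://10.0.1.13:2380

# On the broken node: wipe state and restart as an "existing" member
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/*
sudo sed -i 's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE="existing"/' /etc/etcd/etcd.conf
sudo systemctl start etcd
```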
Issue 4: Split-brain or multiple leaders
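Raft prevents two leaders within a single cluster, so "two leaders" almost always means two separate clusters were bootstrapped (for example, a mismatched `ETCD_INITIAL_CLUSTER_TOKEN`, or a node re-bootstrapped with state `new` instead of `existing`). A sketch of the check:

```bash
# Every member must report the same cluster ID; differing IDs mean separate clusters
etcdctl endpoint status --cluster --write-out=json | grep -o '"cluster_id":[0-9]*' | sort -u
```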
9. Performance tuning
9.1. etcd tuning parameters
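A sketch of commonly tuned settings in /etc/etcd/etcd.conf; the values are illustrative starting points, not prescriptions, and should be adjusted to your network and disk latency:

```ini
# Heartbeat interval in ms; set near the round-trip time between members
ETCD_HEARTBEAT_INTERVAL="100"
# Election timeout in ms; roughly 10x the heartbeat interval
ETCD_ELECTION_TIMEOUT="1000"
# Committed transactions between on-disk snapshots
ETCD_SNAPSHOT_COUNT="10000"
# Backend size limit: 8 GiB, the practical maximum etcd recommends
ETCD_QUOTA_BACKEND_BYTES="8589934592"
# Compact key history automatically, keeping the last 1000 revisions
ETCD_AUTO_COMPACTION_MODE="revision"
ETCD_AUTO_COMPACTION_RETENTION="1000"
```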
9.2. Monitoring etcd
Key metrics to monitor:
- Latency (99th percentile < 50ms)
- Disk fsync duration (< 10ms)
- Leader changes (should be rare)
- Database size
- Failed proposals
Check metrics:
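etcd exposes Prometheus metrics on the client port; a sketch that pulls the counters matching the list above:

```bash
curl -s http://10.0.1.11:2379/metrics | grep -E \
  'etcd_server_leader_changes_seen_total|etcd_disk_wal_fsync_duration_seconds|etcd_server_proposals_failed_total|etcd_mvcc_db_total_size_in_bytes'
```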
10. Summary
Key Takeaways
- **etcd cluster**: 3-node cluster for production HA
- **Ports**: 2379 (client), 2380 (peer)
- **Raft consensus**: Automatic leader election and data replication
- **Quorum**: A majority of nodes (2 of 3) must be up for the cluster to operate
- **TTL keys**: Used for Patroni leader locks
- **etcdctl**: CLI tool for management and troubleshooting
Checklist after Lab
- [ ] 3-node etcd cluster running
- [ ] `etcdctl member list` displays all 3 members
- [ ] `etcdctl endpoint health --cluster` reports all members healthy
- [ ] 1 leader and 2 followers
- [ ] etcd service enabled (auto-starts on reboot)
- [ ] Firewall allows ports 2379 and 2380
Current Architecture
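A sketch of what is in place after this lesson (Patroni and PostgreSQL are added in Lesson 7):

```
   node1 (10.0.1.11)      node2 (10.0.1.12)      node3 (10.0.1.13)
  +---------------+      +---------------+      +---------------+
  |     etcd1     |<---->|     etcd2     |<---->|     etcd3     |
  +---------------+ 2380 +---------------+ 2380 +---------------+
          ^                      ^                      ^
          |                      |                      |
          +--- clients (etcdctl, Patroni in Lesson 7) on 2379
```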
Preparation for Lesson 7
The next lesson will install Patroni and integrate with the etcd cluster already set up.