Testing Etcd Failure
A high-availability etcd cluster is built from an odd number of nodes. So how does the cluster behave as more and more of those nodes fail? It is well known that once quorum is lost, that is, once fewer than (n/2 + 1) nodes remain available, the cluster can no longer accept write requests under the Raft consensus algorithm; for a 5-node cluster the quorum is 3, so the cluster tolerates at most 2 failed nodes. But can the cluster still serve read requests?
Testing is the best way to find out. With tools like Multipass, setting up an etcd cluster on a laptop is no longer a luxury.
Set up a 5-node etcd cluster
I am not going to test this on K3s with the etcd operator, because I need to start and stop the etcd nodes manually for this test. Let's create 5 VMs.
multipass launch --name etcd0 --cpus 1 --mem 1G --disk 5G
multipass launch --name etcd1 --cpus 1 --mem 1G --disk 5G
multipass launch --name etcd2 --cpus 1 --mem 1G --disk 5G
multipass launch --name etcd3 --cpus 1 --mem 1G --disk 5G
multipass launch --name etcd4 --cpus 1 --mem 1G --disk 5G
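The five launches can also be scripted as a single loop from the host; a minimal sketch using the same flags as above:
for i in 0 1 2 3 4; do
  multipass launch --name etcd$i --cpus 1 --mem 1G --disk 5G
done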
Then, on each VM, download and install etcd:
curl -LO https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
tar zxvf etcd-v3.3.13-linux-amd64.tar.gz
cd etcd-v3.3.13-linux-amd64
sudo cp etcd etcdctl /usr/local/bin/
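Rather than shelling into each VM one by one, the same install steps can be pushed to all five nodes from the host. A sketch using multipass exec (it assumes the VMs have outbound internet access):
for i in 0 1 2 3 4; do
  multipass exec etcd$i -- bash -c '
    curl -LO https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
    tar zxvf etcd-v3.3.13-linux-amd64.tar.gz
    sudo cp etcd-v3.3.13-linux-amd64/etcd etcd-v3.3.13-linux-amd64/etcdctl /usr/local/bin/
  '
done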
Create a systemd service file (e.g. /etc/systemd/system/etcd.service) as below:
[Unit]
Description=etcd

[Service]
ExecStart=/usr/local/bin/etcd \
--name {{ .name }} \
--data-dir /var/lib/etcd \
--initial-advertise-peer-urls http://{{ .ip }}:2380 \
--listen-peer-urls http://{{ .ip }}:2380 \
--listen-client-urls http://{{ .ip }}:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://{{ .ip }}:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster {{ .members }} \
--initial-cluster-state new
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
Replace the name and IP address accordingly for each node. (I have automated this as a Magefile task.)
{{ .members }} is the list of etcd cluster members, for example etcd0=http://192.168.64.8:2380,etcd1=http://192.168.64.9:2380,etcd2=http://192.168.64.10:2380,etcd3=http://192.168.64.11:2380,etcd4=http://192.168.64.12:2380
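For reference, here is a minimal shell sketch of that substitution for a single node. It assumes the template above is saved as etcd.service.tmpl; NAME, IP, and MEMBERS are placeholders for the node in question (my actual automation is a Magefile task):
NAME=etcd0
IP=192.168.64.8
MEMBERS=etcd0=http://192.168.64.8:2380,etcd1=http://192.168.64.9:2380,etcd2=http://192.168.64.10:2380,etcd3=http://192.168.64.11:2380,etcd4=http://192.168.64.12:2380
# Render the template and install it as the systemd unit
sed -e "s|{{ .name }}|$NAME|g" \
    -e "s|{{ .ip }}|$IP|g" \
    -e "s|{{ .members }}|$MEMBERS|g" \
    etcd.service.tmpl | sudo tee /etc/systemd/system/etcd.service > /dev/null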
Start the service on each node:
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd
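To confirm the service came up on every node without shelling into each one, a quick check from the host (a sketch using multipass exec and systemctl is-active):
for i in 0 1 2 3 4; do
  echo -n "etcd$i: "
  multipass exec etcd$i -- systemctl is-active etcd
done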
Check the etcd status
multipass shell etcd0
multipass@etcd0:~$ export ETCDCTL_API=3 && export ETCDCTL_ENDPOINTS=http://192.168.64.8:2379,http://192.168.64.9:2379,http://192.168.64.10:2379,http://192.168.64.11:2379,http://192.168.64.12:2379
multipass@etcd0:~$ etcdctl endpoint status -w table
The result shows that the 5-member cluster is healthy, with the first node as the leader.
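As an extra sanity check, etcdctl also has an endpoint health subcommand that checks each configured endpoint:
multipass@etcd0:~$ etcdctl endpoint health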
Testing node failure
Before we start the testing, let's add a key to etcd:
etcdctl put /clock "$(date)"
etcdctl get /clock
/clock
Sun May 12 21:38:58 +08 2019
Stop etcd on the last node, etcd4:
multipass exec etcd4 -- sudo systemctl stop etcd
Check the status again; the leader is still there.
Continue by stopping the etcd service on nodes etcd3 and etcd2, checking the status after each stop (a scripted version follows).
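The shutdown sequence can also be driven entirely from the host. A minimal sketch that stops etcd4, etcd3, and etcd2 in turn and re-checks the cluster through etcd0 and etcd1, the two nodes that stay up throughout:
for node in etcd4 etcd3 etcd2; do
  # Stop one more member, then ask the surviving nodes for the cluster status
  multipass exec $node -- sudo systemctl stop etcd
  multipass exec etcd0 -- env ETCDCTL_API=3 etcdctl \
    --endpoints=http://192.168.64.8:2379,http://192.168.64.9:2379 \
    endpoint status -w table
done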
When only 2 nodes are left, quorum is lost and no leader is present. The write then fails as expected.
multipass@etcd0:~$ etcdctl put /clock "$(date)"
Error: context deadline exceeded
multipass@etcd0:~$ etcdctl get /clock
Error: context deadline exceeded
Reads cannot proceed either.
Linearizable Reads in etcd Raft
According to the Raft section of the etcd documentation, reads are linearizable by default, and the implementation goes through the leader to make sure the data returned is the most recent.
Therefore, when quorum is lost, neither writes nor the default linearizable reads can be served by etcd.
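For completeness, etcdctl's get command also accepts a --consistency flag ("l" for linearizable, the default, and "s" for serializable). A serializable read is answered from the local member's store without consulting the leader, so it should still return a value, possibly a stale one, on a surviving node even after quorum is lost:
multipass@etcd0:~$ etcdctl get /clock --consistency=s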