TL;DR
This issue is acknowledged and work is in progress to resolve it:
- Create a design document for the controller #181
- Extended status check for reconciliation #207
- Add a helper set library to facilitate implementation of the new health check procedure #252
- Separate ConfigMap templating and creation #259
- Add utility functions for new status check #260
- Scale Up / Scale Down functionality #265
Please read further to understand the background reasoning.
Summary
The status field of the EtcdCluster is self-referential and is not determined by the status of an actual etcd cluster.
Background
Kubernetes controllers are level-based. This generally means that a controller should be able to determine its status from the state of the surrounding environment. It is generally considered a bad idea to read the status of the root object (in this case, the EtcdCluster CR).
Remember, status should be able to be reconstituted from the state of the world, so it’s generally not a good idea to read from the status of the root object. Instead, you should reconstruct it every run.
Kubebuilder book — Implementing a controller
Controllers also do not distinguish between the types of events that trigger reconciliation (whether a resource was created, updated, deleted, and so on). Trying to fight this restriction may cause complications further down the line. For example, implementing statuses that describe the lifecycle of a resource and then relying on them to select the controller's mode of operation can lead to new failure modes which need to be handled in different ways, growing the complexity of the codebase.
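To make the distinction concrete, here is a minimal sketch of a level-based reconcile loop that recomputes status from observed state on every run instead of branching on the previously stored status. It is illustrative only: the API import path, the Status.Conditions and Spec.Replicas fields, and the assumption that the StatefulSet shares the EtcdCluster's name are not taken from the operator's actual code.

```go
package controller

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	etcdv1alpha1 "github.com/aenix-io/etcd-operator/api/v1alpha1" // assumed import path
)

type EtcdClusterReconciler struct {
	client.Client
}

// Reconcile rebuilds the EtcdCluster status from what it observes on every run,
// never from the status it wrote previously.
func (r *EtcdClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cluster etcdv1alpha1.EtcdCluster
	if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Observe the world: here only the StatefulSet; ideally also the etcd members themselves.
	var sts appsv1.StatefulSet
	err := r.Get(ctx, req.NamespacedName, &sts)
	if err != nil && !apierrors.IsNotFound(err) {
		return ctrl.Result{}, err
	}
	stsReady := err == nil && sts.Status.ReadyReplicas == *cluster.Spec.Replicas

	// ... create or update owned resources from spec + observations,
	// without consulting the previously written cluster.Status ...

	// Recompute status from scratch instead of reading it back.
	ready := metav1.ConditionFalse
	if stsReady {
		ready = metav1.ConditionTrue
	}
	meta.SetStatusCondition(&cluster.Status.Conditions, metav1.Condition{
		Type:               "Ready",
		Status:             ready,
		Reason:             "ObservedFromStatefulSet",
		ObservedGeneration: cluster.Generation,
	})
	return ctrl.Result{}, r.Status().Update(ctx, &cluster)
}
```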
Issue
In its current state, the controller relies on an EtcdCluster's status field to determine the next steps in the reconciliation logic. If the controller reads an incorrect status without actually validating it, the etcd cluster will fail to launch. A contrived example can reproduce the issue as follows.
First, apply the following YAML:
```yaml
---
apiVersion: etcd.aenix.io/v1alpha1
kind: EtcdCluster
metadata:
  name: test
  namespace: default
spec:
  replicas: 3
  podTemplate:
    spec:
      containers:
        - name: etcd
          image: "nginx"
          args: ["-c", "sleep 3600"]
          command: ["/bin/sh"]
          livenessProbe:
            httpGet:
              host: ifconfig.me
              path: /
              port: 80
              scheme: HTTP
          readinessProbe:
            httpGet:
              host: ifconfig.me
              path: /
              port: 80
              scheme: HTTP
          startupProbe:
            httpGet:
              host: ifconfig.me
              path: /
              port: 80
              scheme: HTTP
```
This object will generate a StatefulSet that launches and passes all readiness checks, but it will not in fact build a working cluster; the three containers will simply run sleep 3600. The cluster-state ConfigMap will nevertheless set the value of ETCD_INITIAL_CLUSTER_STATE to existing.
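For illustration, the resulting cluster-state ConfigMap then looks roughly like the sketch below; the ConfigMap name and the exact set of keys are assumptions, not copied from the controller's output.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-cluster-state   # illustrative name
  namespace: default
data:
  # Should still be "new": no etcd cluster was ever actually bootstrapped.
  ETCD_INITIAL_CLUSTER_STATE: existing
```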
Next, attempt to fix the problem by applying a correct manifest:
```yaml
---
apiVersion: etcd.aenix.io/v1alpha1
kind: EtcdCluster
metadata:
  name: test
  namespace: default
spec:
  replicas: 3
```
The StatefulSet's pod spec will be updated to values that can launch an etcd cluster, but since the initial cluster state is erroneously set to "existing", a cluster will never be bootstrapped.
Analysis
In my opinion, the root cause of this problem is that no attempt is made to verify the "actual" status of the etcd cluster. Although the logic of the controller is currently quite simple and one must craft rather contrived bad inputs to break it, the examples do show that a failed transaction here or there, such as a dropped packet or a crashed node, can leave an EtcdCluster (the custom resource) in an unrecoverable state.
Since communication with the etcd cluster will have to be implemented in any case, for instance to support scaling up and down, I think it is prudent to start work on such a feature now and use it to determine the status of the cluster.
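As a rough sketch of what such a check could look like, the snippet below uses the official etcd Go client (go.etcd.io/etcd/client/v3) to probe the cluster before any status is written. The helper name, the endpoints argument, and the timeouts are assumptions for illustration, not part of the operator today.

```go
package controller

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// probeEtcdCluster is a hypothetical helper: it connects to the cluster's client
// endpoints and reports whether etcd is actually serving requests, so the
// controller can derive the EtcdCluster status from the real cluster rather
// than from its own previously written status field.
func probeEtcdCluster(ctx context.Context, endpoints []string) (bool, error) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return false, fmt.Errorf("connecting to etcd: %w", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	// A linearizable read only succeeds when the cluster has quorum, which
	// distinguishes "the pods are Ready" from "etcd is actually working".
	if _, err := cli.Get(ctx, "health-probe"); err != nil {
		return false, nil
	}

	// The member list shows how many members have actually joined the cluster.
	members, err := cli.MemberList(ctx)
	if err != nil {
		return false, nil
	}
	return len(members.Members) > 0, nil
}
```

The reconciler could then run such a probe on every pass and derive both the EtcdCluster status and the value written into the cluster-state ConfigMap from the result.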