Kubernetes application availability and recoverability are the result of an intricate interplay between several complex systems: the application itself, network management subsystems, Docker, Kubernetes, the infrastructure platform (cloud or vSphere), and Kublr as the Kubernetes and infrastructure management and automation platform.
While many of the capabilities in this area are provided out of the box by Kublr and Kubernetes, ensuring that the whole stack is reliable requires attention at the application and integration levels.
The following failure scenarios describe how the stack behaves and how it recovers.
There are two reasons why kubelet may become unreachable while the physical node is still reachable: either the kubelet process itself fails or stops, or kubelet stops responding for other reasons (for example, a network issue) while the node remains up.
In the first case, the Kublr agent will restart the kubelet service and the node will recover. Application containers will not be affected.
The second case is indistinguishable (from a Kubernetes standpoint) from the situation when the node becomes unreachable, and it will be handled in the same manner.
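As an illustration of the distinction, a local health check can tell whether the kubelet process itself is still responding on an otherwise reachable node. A minimal sketch in Python, assuming the default kubelet health endpoint on 127.0.0.1:10248:

```python
import requests

# Probe the kubelet's local health endpoint (default: 127.0.0.1:10248/healthz).
# A failure here, while the node itself is still reachable, points to case one
# (a failed or stopped kubelet process that the Kublr agent will restart).
try:
    resp = requests.get("http://127.0.0.1:10248/healthz", timeout=5)
    print("kubelet healthy" if resp.ok else f"kubelet unhealthy: {resp.status_code}")
except requests.RequestException:
    print("kubelet is not responding on this node")
```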
When a node (whether master or worker) is completely unreachable over the network (while still present in the vSphere inventory), Kubernetes will change its status to Unknown, and all Pods running on the node will change status to Ready=false.
Recovery operations for this situation are normally application-specific, but here is a possible “cover-all worst case” scenario:
force-detach volumes from the node via the vSphere API
force-delete pods on the failed node via the Kubernetes API or kubectl (see the sketch below)
Now the pods will be automatically recovered on other nodes by Kubernetes.
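A minimal sketch of the force-delete step using the Python Kubernetes client; the node name is a hypothetical example, and the decision to force-delete must come from the administrator or automation:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

FAILED_NODE = "worker-node-1"  # hypothetical name of the unreachable node

# List all pods scheduled on the failed node.
pods = core.list_pod_for_all_namespaces(
    field_selector=f"spec.nodeName={FAILED_NODE}").items

for pod in pods:
    # grace_period_seconds=0 forces immediate deletion, the API-level
    # counterpart of `kubectl delete pod <name> --grace-period=0 --force`.
    core.delete_namespaced_pod(
        name=pod.metadata.name,
        namespace=pod.metadata.namespace,
        grace_period_seconds=0)
```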
This does not happen automatically out of the box because node unavailability does not necessarily mean application failure.
The decision on whether to try to recover/reconnect the node or to reinitialize it can only be made using higher-level knowledge of the system, by either an administrator or operations automation scripts. Kublr and Kubernetes support both scenarios (see items 7 and 8 below).
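A sketch of the kind of check such an automation script might start from, using the Python Kubernetes client to see how long a node's Ready condition has been Unknown; the 10-minute threshold is an arbitrary example:

```python
from datetime import datetime, timezone
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

THRESHOLD_MINUTES = 10  # arbitrary example threshold

for node in core.list_node().items:
    ready = next((c for c in node.status.conditions if c.type == "Ready"), None)
    if ready is None or ready.status != "Unknown":
        continue  # node is reporting normally (or has no Ready condition yet)
    # "Unknown" means the kubelet has stopped reporting to the API server.
    minutes = (datetime.now(timezone.utc) - ready.last_transition_time).total_seconds() / 60
    if minutes > THRESHOLD_MINUTES:
        # This is where higher-level logic decides whether to reconnect/recover
        # the node or to reinitialize it (items 7 and 8).
        print(f"{node.metadata.name}: Ready=Unknown for {minutes:.0f} minutes")
```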
The Kublr agent does not have to be reachable to function correctly.
It is designed to maintain and recover a cluster instance relying on locally stored configuration, even when the Kublr Control Plane (KCP) is not reachable or is down.
Handling of this situation depends significantly on the application container's deployment configuration.
The very first effect of storage becoming unreachable will most likely be a failure of the application in the container using this storage.
If the application/container is configured to report the failure to Kubernetes (e.g. via Kubernetes liveness and readiness probes), Kubernetes will mark the Pod as failed and will terminate it. This triggers resource cleanup, in particular storage detachment.
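For illustration, a container spec with liveness and readiness probes, sketched with the Python Kubernetes client; the health endpoint path, port, image, and thresholds are assumptions:

```python
from kubernetes import client

# Probes through which the container reports storage-related failures back to
# Kubernetes; endpoint, port, and thresholds are illustrative only.
probe = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
    initial_delay_seconds=15,
    period_seconds=10,
    failure_threshold=3)

container = client.V1Container(
    name="app",
    image="example/app:1.0",   # hypothetical image
    liveness_probe=probe,      # failure -> Kubernetes terminates/restarts the pod
    readiness_probe=probe)     # failure -> pod removed from service endpoints
```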
After the pod is terminated, Kubernetes (or, more specifically, the Deployment, StatefulSet, or other controller used for the application) will try to restart the Pod and reschedule it to the same or a different node according to the application manifest, resource availability, and the many other factors Kubernetes uses for pod scheduling.
As part of the scheduling process, Kubernetes will try to reattach the storage to a node. If this is not possible or fails, it will continuously retry until the issue with the storage is fixed or the application configuration is updated.
This is a general scenario, which may vary depending on how the storage is represented in Kubernetes metadata: for example, whether the storage is dynamically provisioned or not, what reclaim policy is used on the PVs and PVCs (persistent volumes and persistent volume claims), and whether a StatefulSet or a Deployment is used for the application.
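As one concrete variant, a dynamically provisioned claim sketched with the Python Kubernetes client; note that the reclaim policy is a property of the bound PV or its StorageClass, not of the claim itself. The names, storage class, and size below are assumptions:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# A dynamically provisioned claim; the matching PV's reclaim policy
# ("Retain" vs "Delete") determines what happens to the volume when
# the claim is released.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),            # hypothetical name
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="vsphere-standard",                # hypothetical class
        resources=client.V1ResourceRequirements(
            requests={"storage": "10Gi"})))

core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```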
Kubernetes will identify the Pod as terminated, and the application controller (Deployment or StatefulSet) will try to restart/reschedule the pod.
The recovery process is the same as described in item 4.
The application controller (Deployment or StatefulSet) will try to restart/reschedule the pod.
The recovery process is the same as described in item 4.
When the node is restarted, the Kublr agent will reconfigure and restart the Kubernetes components on that node, and the node will reconnect to the cluster.
From a Kubernetes standpoint, this means that the node will stop responding for some time, temporarily switch to the “Unknown” state, and then return to the “Ready” state.
Pods running on the node will go to the “Unknown” state as well (while the node is in the “Unknown” state). Once the node reconnects, the pods will be identified as “Terminated” and Kubernetes will proceed to recover them (including reattaching the volumes).
In most cases the pods will be recovered on the same node.
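To verify the recovery after the node rejoins, a small sketch with the Python Kubernetes client that lists the pods scheduled on that node and their readiness; the node name is a hypothetical example:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

NODE = "worker-node-1"  # hypothetical name of the restarted node

# After the node rejoins the cluster, confirm its pods are Running and Ready.
for pod in core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={NODE}").items:
    conds = {c.type: c.status for c in (pod.status.conditions or [])}
    print(pod.metadata.namespace, pod.metadata.name,
          pod.status.phase, "Ready=" + conds.get("Ready", "Unknown"))
```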
When a node is irrecoverably corrupt, Kubernetes first handles it the same way as an unreachable node.
If the administrator (or an operations automation script) decides that the node cannot be recovered, the corresponding VM may be removed from vSphere; Kubernetes will identify the node as deleted and remove it from its database.
Pods registered on that node will be removed as well, and Kubernetes will proceed to restart them on other nodes as usual (including storage volume reconnection).
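If the stale Node object needs to be removed explicitly, rather than waiting for the cloud-provider integration to reconcile it, a minimal sketch with the Python Kubernetes client; the node name is a hypothetical example:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Remove the Node object for the VM that was deleted from vSphere;
# equivalent to `kubectl delete node worker-node-1`.
core.delete_node(name="worker-node-1")  # hypothetical node name
```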
Kublr will recreate the VM after it is removed from the vSphere inventory so that the node can reinitialize and re-register in the Kubernetes cluster.