Justin's Linklog

workaround for istio's graceful-shutdown lifecycle bug

The istio Kubernetes service mesh operates using a "sidecar" container, but because the k8s pod spec has no way to order container shutdown within a pod, it's liable to cause problems when a pod is shut down or terminated. tl;dr: the "main" container running your application code is SIGTERM'd at the same time as the istio container, which sets up a race between your app's shutdown code and the network access that the istio proxy provides. Some apps will survive this, but stateful apps may need to perform cleanup on termination to avoid data loss, and if that cleanup involves network access, it won't happen reliably. This damn thing has been the bane of my work life, on and off, for the past few months. Here's a slightly hacky script which works around the issue by hooking into the "pid 1" lifecycle inside both the main and istio containers. Blech.
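To make the "pid 1" idea concrete, here's a minimal sketch of the general approach. It is not the linked script. It assumes a hypothetical sentinel file on a volume shared by both containers (e.g. an emptyDir mounted at /var/run/shared), and the function names and path are made up for illustration. In the main container, `run_app_as_pid1` runs as pid 1, forwards SIGTERM/SIGINT to the app, and touches the sentinel only after the app (and its cleanup) has exited. In the istio container, a similar wrapper would call `wait_for_app_exit` from its own SIGTERM handler before stopping the proxy, so the network stays up while the app cleans up.

```python
import os
import signal
import subprocess
import sys
import time
from pathlib import Path

# Hypothetical sentinel path on a volume shared by both containers (an assumption).
DEFAULT_SENTINEL = "/var/run/shared/app-exited"


def run_app_as_pid1(cmd, sentinel=DEFAULT_SENTINEL):
    """Main-container entrypoint: run the app, forward termination signals to it,
    and only tell the sidecar it may exit once the app has finished."""
    sentinel_path = Path(sentinel)
    # Clear any stale sentinel left over from an earlier container restart.
    sentinel_path.unlink(missing_ok=True)

    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Pass the signal on so the app can run its cleanup while the proxy is still up.
        if child.poll() is None:
            child.send_signal(signum)

    previous = {sig: signal.signal(sig, forward) for sig in (signal.SIGTERM, signal.SIGINT)}
    try:
        returncode = child.wait()
    finally:
        # Restore the original handlers once the child is gone.
        for sig, handler in previous.items():
            signal.signal(sig, handler)

    # Signal the sidecar side that the app is done with the network.
    sentinel_path.parent.mkdir(parents=True, exist_ok=True)
    sentinel_path.touch()
    return returncode


def wait_for_app_exit(sentinel=DEFAULT_SENTINEL, timeout=300.0, poll=0.5):
    """Sidecar-side helper: poll for the sentinel; True if it appeared before the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(sentinel):
            return True
        time.sleep(poll)
    return False


if __name__ == "__main__":
    rc = run_app_as_pid1(sys.argv[1:])
    # Popen reports death-by-signal as a negative code; map it to the usual 128+N.
    sys.exit(128 - rc if rc < 0 else rc)
```

In the main container this would run as something like `python wrapper.py /path/to/app --args`. The sidecar-side timeout should stay below the pod's `terminationGracePeriodSeconds`, since the kubelet SIGKILLs all containers once that grace period expires regardless.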

(tags: istio fail bugs k8s sidecars work service-meshes)
