Kubernetes Vertical Pod Autoscaler


I had the opportunity to start working with the Vertical Pod Autoscaler recently. Although the implementation is still in beta, the idea behind it and what it does (and plans to do) are compelling, and genuinely useful for managing container and cluster resources more effectively.

A brief overview of what VPA does:

VPA has three components:

  • the admission-controller, which sets the resource requests on newly created pods
  • the updater, which decides when running pods should be evicted so they can be recreated with updated requests
  • the recommender, which computes the resources that a pod’s containers should request
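
For context, a complete VPA object is quite small. A minimal sketch against the v1beta2 API (the current one while VPA is in beta) could look like this; the names are placeholders:

```yaml
# Minimal VerticalPodAutoscaler: the recommender computes recommendations for
# the target's containers, and (in the default "Auto" update mode) the updater
# and admission-controller apply them when pods are recreated.
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical workload to autoscale
```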

There are various other posts out there about exactly how VPA works, and I suggest you read those, as well as the resources in the GitHub repository. It is essential that you understand what triggers a scaling event, whether for memory or for CPU.

For example, take these lines from the design proposal:

Recommendation model

For CPU the objective is to keep the fraction of time when the container usage exceeds a high percentage (e.g. 95%) of request below a certain threshold (e.g. 1% of time). In this model the “CPU usage” is defined as mean usage measured over a short interval. The shorter the measurement interval, the better the quality of recommendations for spiky, latency sensitive workloads. Minimum reasonable resolution is 1/min, recommended is 1/sec.

For memory the objective is to keep the probability of the container usage exceeding the request in a specific time window below a certain threshold (e.g. below 1% in 24h). The window must be long (≥ 24h) to ensure that evictions caused by OOM do not visibly affect (a) availability of serving applications (b) progress of batch computations (a more advanced model could allow user to specify SLO to control this).
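
If it helps, the two objectives can be restated compactly (my notation, not the proposal's), writing u(t) for measured usage and R for the configured request:

```latex
% CPU: the fraction of time during which short-interval mean usage exceeds
% 95% of the request should stay below 1%.
\frac{1}{T}\int_0^T \mathbf{1}\!\left[\, u_{\mathrm{cpu}}(t) > 0.95\, R_{\mathrm{cpu}} \,\right] dt \;<\; 0.01

% Memory: the probability of usage exceeding the request anywhere within a
% window W of at least 24h should stay below 1%.
\Pr\!\left[\, \max_{t \in W} u_{\mathrm{mem}}(t) > R_{\mathrm{mem}} \,\right] \;<\; 0.01,
\qquad \lvert W \rvert \ge 24\,\mathrm{h}
```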

Make sure you reread these until you understand what is being proposed here, and why it suits the idea of vertical scaling in Kubernetes.

Keep in mind that not everything is implemented yet, so you may run into some walls and encounter bugs. For instance, I have had a few issues with OOMKilled events: the recommender does not take them into account the way it is supposed to, which leaves the pod stuck in CrashLoopBackOff with continuous OOM events. Despite this, I went ahead and adjusted the minAllowed and maxAllowed values for the memory recommendation on my VPA objects (which, one could argue, is pretty much the idea behind VPA), and set the Update Mode to “Auto”, since I have confidence in its abilities.
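
For reference, here is roughly what those adjustments look like on a VPA object’s spec; the bounds below are placeholders, not recommendations:

```yaml
# Sketch: constrain the memory recommendation between a floor and a ceiling,
# and let VPA apply recommendations automatically (by evicting pods so they
# are recreated with the new requests).
spec:
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: my-app   # hypothetical container name
        minAllowed:
          memory: 256Mi         # hypothetical floor
        maxAllowed:
          memory: 2Gi           # hypothetical ceiling
```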

 

Use a Grafana dashboard:

Also, I highly recommend running any VPA objects you create in the "Off" Update Mode for a few days, so that you can clearly see and better understand exactly how VPA reacts to your workload (see the status sketch below). You can also export the logs from the three components via EFK/ELK, to keep better track of the proposed scaling events and any issues they may hit. I recommend using a Grafana dashboard for that.
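
In "Off" mode, VPA only publishes its recommendations in the object’s status, which you can inspect with kubectl describe vpa. The status looks roughly like this (the values are made up):

```yaml
# Sketch of a VPA status in "Off" mode: recommendations are computed and
# published here, but never applied to running pods.
status:
  recommendation:
    containerRecommendations:
      - containerName: my-app   # hypothetical container
        lowerBound:             # the minimum the recommender considers sane
          cpu: 100m
          memory: 256Mi
        target:                 # what VPA would set the requests to
          cpu: 250m
          memory: 512Mi
        upperBound:             # the maximum the recommender considers sane
          cpu: "1"
          memory: 1Gi
```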

With the dashboard, you can quickly check what is currently happening with your VPA objects, or drill down into a past scaling event. I created this dashboard for my own needs, as there was nothing out there already, and I will keep it up to date with future VPA releases.

Just remember to enable the kube-state-metrics collector for verticalpodautoscalers, as it is not on by default.
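
The exact way to enable it depends on your kube-state-metrics version; in the v1.x series it is a matter of adding the collector to the --collectors flag, roughly like this (note that passing the flag replaces the default collector set, so the list here is trimmed for illustration):

```yaml
# Sketch: opting in to the verticalpodautoscalers collector in a v1.x
# kube-state-metrics Deployment; image tag and collector list are examples.
containers:
  - name: kube-state-metrics
    image: quay.io/coreos/kube-state-metrics:v1.9.7
    args:
      - --collectors=pods,deployments,verticalpodautoscalers
```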