If you would like to implement autoscaling in your Kubernetes cluster, you are in the right place, so read on.
I’ve explored the Kubernetes object called HorizontalPodAutoscaler (HPA for short) in order to autoscale a deployment up or down according to the memory usage of its pods. The idea is to have a deployment with up to 10 replicas, but only deploy the number of pods actually required by their memory usage. At a busy point of the day we could run all 10 pods to process the traffic, then drop down to 4, for example, during a quieter night. I only have to set a few parameters and HPA takes care of the rest based on the metrics it collects. The algorithm used for this autoscaling is described in the Kubernetes documentation here.
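In a nutshell, the documented algorithm scales the current replica count by the ratio of the observed metric to the target metric, rounding up. Here is a minimal sketch in Python (my own illustration of the formula, not Kubernetes code):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float) -> int:
    """HPA scaling rule: scale the replica count by the ratio of the
    observed metric to the target metric, rounding up."""
    return math.ceil(current_replicas * (current_metric / desired_metric))

# 4 pods averaging 90% memory utilization against a 60% target -> scale up to 6
print(desired_replicas(4, 90, 60))   # 6
# 10 pods averaging 20% against a 60% target -> scale down to 4
print(desired_replicas(10, 20, 60))  # 4
```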
Before diving into HPA, you must ensure a metrics server is installed in your cluster, as the HPA algorithm relies on its metrics for autoscaling. A quick check is to run the kubectl top command and observe the results:
$ kubectl top pod
NAME                       CPU(cores)   MEMORY(bytes)
busybox-5cfd866f57-4l95f   0m           0Mi
busybox-5cfd866f57-b9h45   0m           0Mi
If the output displays the CPU and MEMORY for each of your pods, then metrics are installed and you can move forward. If, like me, you are using Minikube to test this first, installing the metrics server is as easy as the command below:
$ minikube addons enable metrics-server
API autoscaling v1
Let’s move on to HPA. I’ve configured a HorizontalPodAutoscaler through a YAML file to experiment with it. Below I’ll describe what you need to know to make it work. The first step is to check which API version your cluster serves for the hpa object:
$ kubectl api-resources | grep hpa
horizontalpodautoscalers   hpa   autoscaling/v1   true   HorizontalPodAutoscaler
On this old cluster I’ve got, the API version of hpa is autoscaling/v1. Here is the bad news: in this version it is not possible to use the pods’ memory metrics for autoscaling. There is just one metric parameter available, and it only covers the CPU usage of the pods:
$ kubectl explain hpa.spec
KIND:     HorizontalPodAutoscaler
VERSION:  autoscaling/v1

RESOURCE: spec <Object>

DESCRIPTION:
     behaviour of autoscaler. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.

     specification of a horizontal pod autoscaler.

FIELDS:
   maxReplicas  <integer> -required-
     upper limit for the number of pods that can be set by the autoscaler;
     cannot be smaller than MinReplicas.

   minReplicas  <integer>
     lower limit for the number of pods that can be set by the autoscaler,
     default 1.

   scaleTargetRef       <Object> -required-
     reference to scaled resource; horizontal pod autoscaler will learn the
     current resource consumption and will set the desired number of pods by
     using its Scale subresource.

   targetCPUUtilizationPercentage       <integer>
     target average CPU utilization (represented as a percentage of requested
     CPU) over all the pods; if not specified the default autoscaling policy
     will be used.
In this API version, the possibilities are thus limited to the single parameter targetCPUUtilizationPercentage, which evaluates the average CPU utilization of the pods.
API autoscaling v2
To have more options, your cluster needs to serve autoscaling/v2 for HPA (this is what a recent version of Minikube uses automatically). An hpa YAML file could then look like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-busybox
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busybox
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60
In version 2 the syntax has changed: targetCPUUtilizationPercentage has been replaced by metrics, a list which allows a more flexible way to configure metrics.
To leverage HPA, the resources section has to be configured on the containers of the deployment’s pods, for example:
resources:
  limits:
    memory: "1Gi"
  requests:
    memory: "1Ki"
Once the deployment is set and your hpa configuration is running, you’ll be able to check its status:
$ kubectl get hpa
NAME          REFERENCE            TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
hpa-busybox   Deployment/busybox   16200%/60%   1         2         2         121m
The TARGETS column shows the average percentage of memory used by the pods (the value is not meaningful here, as my pods are not running anything and are just sleeping) against the 60% average utilization configured in our hpa metrics. That works, and I now have a solution to adapt and deploy in the wild. If the memory metrics were not collected properly, I would have seen <unknown>/60% instead.
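For the curious, the eye-popping 16200% is a consequence of the deliberately tiny 1Ki memory request: utilization is computed against what a pod requests, not against its limits. A quick back-of-the-envelope check (the usage figure below is illustrative, not measured):

```python
request_bytes = 1 * 1024      # requests.memory: "1Ki"
usage_bytes = 162 * 1024      # ~166 kB, a plausible footprint for a sleeping busybox (assumed)
utilization = usage_bytes / request_bytes * 100
print(f"{utilization:.0f}%")  # 16200%, the kind of value kubectl reported
```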
Add CPU Autoscaling
If you also want to use HPA with the CPU metrics of your pods, here is how to proceed. Update the hpa yaml file as follows:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-busybox
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busybox
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60
Just add a resource block for cpu with its metric and target value. You can see how flexible this new API version 2 is compared with the single CPU parameter of version 1.
Add a cpu resource to the deployment’s pod containers:
resources:
  limits:
    cpu: "1"
    memory: "1Gi"
  requests:
    cpu: "0.5"
    memory: "1Ki"
When all is set, you’ll now have the following output:
$ kubectl get hpa
NAME          REFERENCE            TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
hpa-busybox   Deployment/busybox   16000%/60%, 0%/60%   1         2         2         6m49s
In addition to the memory, hpa is now also monitoring the CPU (0%/60%) and uses both metrics to autoscale the pods of our deployment.
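When several metrics are configured, the Kubernetes documentation states that the autoscaler computes a desired replica count for each metric separately and then applies the largest one. A small sketch with illustrative numbers:

```python
import math

def desired_for(current_replicas: int, current: float, target: float) -> int:
    """Desired replica count for one metric, per the HPA formula."""
    return math.ceil(current_replicas * current / target)

# With 2 replicas, CPU is well under its 60% target but memory is over it:
cpu = desired_for(2, 10, 60)      # -> 1 replica would suffice for CPU
memory = desired_for(2, 90, 60)   # -> 3 replicas needed for memory
print(max(cpu, memory))           # 3: the largest desired count wins
```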
I hope this post will help you quickly configure HPA in your Kubernetes cluster. If you want to learn more about Kubernetes, check out our Training course given by our Kubernetes wizard!