If you would like to implement autoscaling in your Kubernetes cluster, you are in the right place to get started, so read on.

I’ve explored the Kubernetes object called HorizontalPodAutoscaler (HPA for short) to autoscale a deployment up or down according to the memory usage of its pods. The idea is to have a deployment that can grow to 10 replicas, but only run the number of pods its memory usage actually requires. So, at busy times of the day, we could scale up to 10 pods when there is a lot of traffic to process, and during the night scale down to 4, for example, when it is quieter. I only have to set a few parameters and HPA takes care of the rest based on the metrics it collects. The algorithm used for this autoscaling is described in the Kubernetes documentation.
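The core of that algorithm is the formula desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It can be sketched in a few lines of shell arithmetic (the numbers below are hypothetical, just to illustrate the formula):

```shell
# Hypothetical inputs: 2 replicas currently running, current average
# utilization 162% of requests, target average utilization 60%.
current_replicas=2
current_util=162
target_util=60

# desiredReplicas = ceil(currentReplicas * currentUtil / targetUtil),
# computed here with integer arithmetic (add divisor-1 before dividing).
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "$desired"   # prints 6, before clamping between minReplicas and maxReplicas
```

HPA then clamps this value between the minReplicas and maxReplicas you configure.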

Metrics prerequisite

Before diving into HPA, you must ensure metrics are collected in your cluster, as the HPA algorithm uses them for autoscaling. A quick check is to run the kubectl top command and observe the results:

$ kubectl top pod
NAME                       CPU(cores)   MEMORY(bytes)
busybox-5cfd866f57-4l95f   0m           0Mi
busybox-5cfd866f57-b9h45   0m           0Mi

If the output displays the CPU and MEMORY for each of your pods, then metrics are collected and you can move forward. If, like me, you are testing this on Minikube first, installing metrics is as easy as the command below:

$ minikube addons enable metrics-server

API autoscaling v1

Let’s move on to HPA. I’ve configured HorizontalPodAutoscaler through a yaml file to experiment with it. Below I’ll describe what you need to know to make it work. The first step is to check which API version your cluster exposes for the hpa object:

$ kubectl api-resources|grep hpa
horizontalpodautoscalers          hpa                                        autoscaling/v1                          true         HorizontalPodAutoscaler

On this old cluster I’ve got, the API version of hpa is autoscaling/v1. Here is the bad news: in this version it is not possible to autoscale on pod memory metrics. There is just one target parameter available, and it can only use the CPU metrics of the pods:

$ kubectl explain hpa.spec
KIND:     HorizontalPodAutoscaler
VERSION:  autoscaling/v1

RESOURCE: spec <Object>

DESCRIPTION:
     behaviour of autoscaler. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.

     specification of a horizontal pod autoscaler.

FIELDS:
   maxReplicas  <integer> -required-
     upper limit for the number of pods that can be set by the autoscaler;
     cannot be smaller than MinReplicas.

   minReplicas  <integer>
     lower limit for the number of pods that can be set by the autoscaler,
     default 1.

   scaleTargetRef       <Object> -required-
     reference to scaled resource; horizontal pod autoscaler will learn the
     current resource consumption and will set the desired number of pods by
     using its Scale subresource.

   targetCPUUtilizationPercentage       <integer>
     target average CPU utilization (represented as a percentage of requested
     CPU) over all the pods; if not specified the default autoscaling policy
     will be used.

In this API version, the possibilities are therefore limited to the single parameter targetCPUUtilizationPercentage, which evaluates the average CPU utilization of the pods.
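For completeness, a v1 HPA manifest is limited to something like the sketch below (the hpa-busybox and busybox names are hypothetical, matching the deployment used later; only the CPU target is configurable):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-busybox
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busybox
  # The only metric knob in v1: average CPU as a percentage of requests.
  targetCPUUtilizationPercentage: 60
```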

API autoscaling v2

To have more options, your cluster needs to serve autoscaling/v2 for HPA (which a recent version of Minikube uses automatically). An HPA yaml file could then be, for example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-busybox
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busybox
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60

In version 2 the syntax has changed: targetCPUUtilizationPercentage has been replaced by metrics, which allows a more flexible way to configure them.

To leverage HPA, the resources section has to be configured on the containers of the deployment’s pods, for example:

resources:
  limits:
    memory: "1Gi"
  requests:
    memory: "1Ki"
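If you want to reproduce this, a minimal deployment matching the manifests above could look like the sketch below (the busybox image and the sleep command are assumptions, just to keep the pods idle):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        resources:
          limits:
            memory: "1Gi"
          requests:
            memory: "1Ki"
```

The HPA spec’s scaleTargetRef must point at this deployment’s name for autoscaling to kick in.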

When the deployment is set and your hpa configuration is running, you’ll then be able to check its status:

$ kubectl get hpa
NAME          REFERENCE            TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
hpa-busybox   Deployment/busybox   16200%/60%   1         2         2          121m

The TARGETS column shows the current average memory utilization of the pods (not meaningful here, as my pods are not running anything and are just sleeping) against the average utilization configured in the HPA metrics (60%). If the memory metrics were not collected properly, it would show <unknown>/60% instead. That works, and I now have a solution to adapt and deploy in the wild.
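The huge 16200% comes from the tiny memory request of 1Ki: utilization is reported as actual usage divided by the requested amount. Assuming a hypothetical actual usage of 162Ki, the arithmetic works out as:

```shell
request_kib=1     # memory request configured on the container: 1Ki
usage_kib=162     # hypothetical usage reported by metrics-server
# Utilization as a percentage of the request.
utilization=$(( usage_kib * 100 / request_kib ))
echo "${utilization}%"   # prints 16200%
```

This is why realistic requests matter: with a 1Ki request, almost any usage puts the pods far above the 60% target.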

Add CPU Autoscaling

If you also want to use HPA with the CPU metrics of your pods, here is how to proceed. Update the hpa yaml file as follows:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-busybox
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busybox
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60

Just add a second Resource block for cpu with its target type and value. You can see how much more flexible API version 2 is compared with the single CPU parameter of version 1.

Add cpu resources to the deployment’s pod containers:

resources:
  limits:
    cpu: "1"
    memory: "1Gi"
  requests:
    cpu: "0.5"
    memory: "1Ki"

When all is set, you’ll now have the following output:

$ kubectl get hpa
NAME          REFERENCE            TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
hpa-busybox   Deployment/busybox   16000%/60%, 0%/60%   1         2         2          6m49s

In addition to the memory, HPA is now also monitoring the CPU (0%/60%) and uses both metrics to autoscale the pods of our deployment. When several metrics are configured, HPA computes a desired replica count for each one and scales to the highest.

I hope this post will help you quickly configure HPA in your Kubernetes cluster. If you want to learn more about Kubernetes, check out our Training course given by our Kubernetes wizard!

If you need to design HPA and understand in detail how the targets calculation above is done, check out my blog post on this topic.