HorizontalPodAutoscaler

A HPA controls the scale of a Deployment and its ReplicaSet

  • The HPA will also instruct the workload resource to scale back down when load decreases.

A HPA automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.

Horizontal scaling here means that the strategy is to deploy more pods (scale out), rather than adding more computational resources like CPU and memory to existing pods.

How it works

The HPA runs within the Control Plane (master node)

HPA is implemented as a control loop that runs intermittently (ie. it's not a continuous process).

  • default is every 15s

Once each period, the controller manager queries for information about how resources are being utilized, and compares it against the metric specified in the HPA spec definition.

  • the target resource is found by scaleTargetRef.
  • it then selects the pods based on the target's resources .spec.selector labels and obtains the metrics.

Example:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: $APP_NAME
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: $APP_NAME
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 100