# Scaling Kubernetes for High Traffic Events

When dealing with sudden spikes in traffic, traditional auto-scaling strategies often fall short. The reaction time of the HPA (Horizontal Pod Autoscaler) combined with the Cluster Autoscaler can lead to a gap where requests are dropped or latency skyrockets.

## The Problem with Reactive Scaling

The HPA controller's sync period defaults to 15 seconds, and the metrics pipeline adds its own collection delay on top of that. By the time the controller reacts and calculates the desired replica count, you are already behind. If your cluster lacks capacity, the Cluster Autoscaler must then provision new nodes, and cloud providers can take several minutes to spin up a new VM and join it to the cluster.

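To see why this lag matters, recall the formula the HPA uses, as documented by Kubernetes:

```
desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )
```

For example, 4 replicas running at 90% CPU against a 60% target yields ceil(4 × 90 / 60) = 6 replicas — but only once the collected metrics actually reflect the spike.
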
## Solutions

### 1. Overprovisioning with Pause Pods

By deploying low-priority "pause" pods that do nothing but reserve compute resources, you force the cluster to scale up nodes in advance. When real pods need to be scheduled, they preempt the pause pods, instantly securing resources without waiting for a new node.

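A minimal sketch of this pattern, following the Cluster Autoscaler's overprovisioning approach. The names, replica count, and resource sizes here are illustrative placeholders — tune them to the headroom you want to keep free:

```yaml
# A negative-priority class so any default-priority workload preempts these pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-pause
spec:
  replicas: 3
  selector:
    matchLabels:
      app: overprovisioning-pause
  template:
    metadata:
      labels:
        app: overprovisioning-pause
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "1"       # each replica reserves 1 CPU and 1Gi of headroom
            memory: 1Gi
```

Each reserved slot equals one pause pod's requests, so sizing the requests to match your real workload's pods makes the preemption math predictable.
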
### 2. KEDA (Kubernetes Event-driven Autoscaling)

Instead of scaling on CPU/memory, scale directly on the event source (e.g., the length of an SQS queue or Kafka consumer lag). This allows scaling from zero and reacts much faster to incoming backlogs.

Example `ScaledObject`:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaling
spec:
  scaleTargetRef:
    name: my-app
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/...
      queueLength: "5"
      awsRegion: eu-west-1
```

By combining these patterns, we can achieve high reliability even during marketing pushes or sudden viral traffic.