# Scaling Kubernetes for High Traffic Events

When dealing with sudden spikes in traffic, traditional auto-scaling strategies often fall short. The reaction time of the HPA (Horizontal Pod Autoscaler) combined with the Cluster Autoscaler can lead to a gap where requests are dropped or latency skyrockets.

## The Problem with Reactive Scaling

The HPA controller's sync period defaults to 15 seconds, and the metrics pipeline adds its own collection delay on top of that. By the time the controller reacts and calculates the desired replica count, you are already behind. If your cluster lacks capacity, the Cluster Autoscaler must then provision new nodes, and cloud providers can take several minutes to spin up a new VM and join it to the cluster.

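To see why this lag matters, recall the formula the HPA uses, as documented by Kubernetes:

```
desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )
```

For example, 4 replicas running at 90% CPU against a 60% target yields ceil(4 × 90 / 60) = 6 replicas — but only once the collected metrics actually reflect the spike.
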
## Solutions

### 1. Overprovisioning with Pause Pods

By deploying low-priority "pause" pods that do nothing but reserve compute resources, you force the cluster to scale up nodes in advance. When real pods need to be scheduled, they preempt the pause pods, instantly securing resources without waiting for a new node.

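A minimal sketch of this pattern, following the Cluster Autoscaler's overprovisioning approach. The names, replica count, and resource sizes here are illustrative placeholders — tune them to the headroom you want to keep free:

```yaml
# A negative-priority class so any default-priority workload preempts these pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-pause
spec:
  replicas: 3
  selector:
    matchLabels:
      app: overprovisioning-pause
  template:
    metadata:
      labels:
        app: overprovisioning-pause
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "1"       # each replica reserves 1 CPU and 1Gi of headroom
            memory: 1Gi
```

Each reserved slot equals one pause pod's requests, so sizing the requests to match your real workload's pods makes the preemption math predictable.
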
### 2. KEDA (Kubernetes Event-driven Autoscaling)

Instead of scaling on CPU/memory, scale directly on the event source (e.g., the length of an SQS queue or Kafka consumer lag). This allows scaling from zero and reacts much faster to incoming backlogs.

Example `ScaledObject`:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaling
spec:
  scaleTargetRef:
    name: my-app
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/...
      queueLength: "5"
      awsRegion: eu-west-1
```

By combining these patterns, we can achieve high reliability even during marketing pushes or sudden viral traffic.