In our previous post, we covered how to build a scalable Kubernetes cluster on ARM VPS servers. In this post, we turn to the reason we use Kubernetes in the first place: scalability.
Scaling Workloads with Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a deployment based on observed CPU or memory utilization. This prevents overloading pods during peak traffic and reduces costs during low demand.
📌 What HPA Does:
- Monitors resource usage (CPU/memory) of running pods.
- Increases or decreases replicas based on predefined thresholds.
- Works at the Deployment level, adjusting only the number of pods (not node count).
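At its core, HPA applies a proportional scaling rule (this is the documented autoscaler algorithm, simplified here by ignoring tolerances and stabilization windows):
desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)
For example, if 2 replicas average 100% CPU utilization against a 50% target, HPA scales to ceil(2 × 100 / 50) = 4 replicas.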
Installing the Metrics Server
HPA needs live usage data, and that data comes from the Metrics Server. The Metrics Server isn’t installed on a specific node like a traditional application; it is deployed as a cluster-wide resource using a Deployment manifest and runs as a pod managed by Kubernetes itself. It collects resource usage data (like CPU and memory) from the kubelets on all nodes and aggregates it for use via the Kubernetes API.
Here’s a breakdown of where and how it fits:
1. Deployment Location:
- You don’t manually “install” the Metrics Server on a specific node. Instead, you apply its configuration (usually a YAML file) to the cluster using kubectl. Kubernetes then schedules the Metrics Server pod(s) to run on one or more nodes based on resource availability and scheduling rules.
- By default, it’s deployed as a single pod in the kube-system namespace, and Kubernetes decides where to place it (often on a worker node, but this is managed automatically).
2. Prerequisites:
- The Metrics Server relies on the kubelet running on each node (both control plane and worker nodes) to gather metrics. Ensure the kubelet is properly configured and accessible.
- The Kubernetes API server must be running and reachable, as Metrics Server registers itself with the API to provide metrics.
3. How to Install It:
The easiest way to install the Metrics Server is by applying the official YAML manifest from the Kubernetes project. For example:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
This deploys the Metrics Server into the kube-system namespace. You can verify it’s running with:
kubectl get pods -n kube-system | grep metrics-server
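Note: on bare-metal and VPS clusters the kubelets often use self-signed certificates, and the Metrics Server pod may stay unready because it can’t verify them. A common workaround (adjust to your cluster; this appends a flag to the container deployed by the official manifest) is to allow insecure kubelet TLS:
kubectl patch deployment metrics-server -n kube-system --type=json -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'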
Test whether metrics are being collected:
kubectl top nodes
kubectl top pods
If metrics are returned, the server is functioning correctly.
Deploying an Application with Resource Limits
HPA’s CPU utilization target is calculated relative to each container’s CPU request, so HPA only works if pods declare resource requests (setting limits alongside them is good practice). We’ll deploy an Nginx application with CPU constraints.
Create a file nginx-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "250m"
            memory: "256Mi"
Now you can deploy it:
kubectl apply -f nginx-deployment.yaml
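You can confirm the pod is running before moving on (the -l flag filters by the app: nginx label from the manifest):
kubectl get pods -l app=nginx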
Configuring HPA for Autoscaling
Now, set up HPA for Nginx:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10
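The kubectl autoscale command is the quick imperative route. If you prefer to keep configuration in version control, here is a sketch of the equivalent declarative manifest using the autoscaling/v2 API (the file name nginx-hpa.yaml is our choice):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Apply it with kubectl apply -f nginx-hpa.yaml.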
Check the HPA status to see if it is active:
kubectl get hpa
Simulating Load to Trigger Autoscaling
To verify autoscaling, we’ll generate artificial CPU load with a busybox pod. The load generator targets the cluster DNS name of a Service, and we haven’t created one for the deployment yet, so first expose it internally as a ClusterIP service:
kubectl expose deployment nginx-deployment --port=80
Then start the load generator:
kubectl run -it --rm load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://nginx-deployment.default.svc.cluster.local; done"
This command creates a temporary pod in your Kubernetes cluster that generates continuous HTTP requests to a service named nginx-deployment in the default namespace. Here’s how it works:
- kubectl run:
- This kubectl subcommand runs a pod in a Kubernetes cluster. While kubectl run was historically used for creating deployments, in newer versions of Kubernetes (post-1.18) it’s primarily used to create a single pod.
- -it:
- The -i (interactive) and -t (tty) flags together allocate a terminal session to the pod, allowing you to interact with it if needed. In this case, since the command runs a non-interactive loop, you won’t see much output unless something goes wrong (and you’re attached to the session).
- --rm:
- This flag ensures the pod is automatically deleted as soon as it stops running or you exit the session. It’s useful for temporary, ephemeral tasks like this load generation, so no cleanup is needed.
- load-generator:
- This is the name assigned to the pod. It’s a descriptive name indicating its purpose: generating load.
- --image=busybox:
- Specifies the container image to use for the pod. busybox is a lightweight image that includes a minimal set of Unix utilities, including wget and /bin/sh, which are needed for the command.
- -- /bin/sh -c "while true; do wget -q -O- http://nginx-deployment.default.svc.cluster.local; done":
- This is the command that the container executes when it starts.
- /bin/sh -c: Runs a shell (sh) and executes the string that follows as a shell command.
- "while true; do …; done": An infinite loop that keeps running indefinitely until the pod is stopped or deleted.
- wget -q -O- http://nginx-deployment.default.svc.cluster.local:
- wget: A command-line tool to fetch content over HTTP.
- -q: Quiet mode, suppressing wget’s output (so it doesn’t clutter logs or the terminal).
- -O-: Sends the output to stdout (standard output) rather than saving it to a file, effectively discarding it in this case.
- http://nginx-deployment.default.svc.cluster.local: The target URL, which is a Kubernetes service named nginx-deployment in the default namespace. This assumes there’s an nginx-deployment service running in your cluster that exposes an HTTP endpoint.
Overall Effect:
- The command launches a single pod named load-generator using the busybox image.
- Inside the pod, it runs an infinite loop that continuously sends HTTP GET requests to the nginx-deployment service (the ClusterIP service we just created in front of the Nginx pods) in the default namespace.
- The requests are silent (-q) and their responses are discarded (-O-), so the pod’s sole purpose is to generate load on the target service.
- Because of --rm, the pod is automatically deleted when it stops (e.g., if you Ctrl+C the terminal or the pod crashes).
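If a single generator doesn’t push average CPU utilization past the 50% target, one option (a sketch using standard kubectl commands; the load-generator name is just our choice) is to run the same loop as a small deployment and scale it up:
kubectl create deployment load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://nginx-deployment.default.svc.cluster.local; done"
kubectl scale deployment load-generator --replicas=5
Delete it when you’re done, since --rm doesn’t apply to a deployment:
kubectl delete deployment load-generator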
We can now monitor pod scaling with these commands:
kubectl get pods
kubectl get hpa
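To watch the scale-up happen live, you can add the watch flag (an optional convenience):
kubectl get hpa -w
kubectl get pods -l app=nginx -w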
Once your test is complete, press Ctrl+C to stop the load generator.
Why Ctrl+C Works:
-it Flags:
- The -i (interactive) and -t (tty) flags attach your terminal to the pod’s stdin and stdout. This means when the pod starts, your terminal is directly connected to the /bin/sh process running the while true loop.
- When you press Ctrl+C in an interactive terminal session, it sends a SIGINT (interrupt signal) to the process attached to the terminal—in this case, the sh shell running the loop.
--rm Flag:
- The --rm flag ensures the pod is automatically deleted when its process exits. So, when Ctrl+C interrupts the shell process, the pod terminates, and Kubernetes cleans it up immediately.

Load Balancing with MetalLB
By default, Kubernetes Services use the ClusterIP type, which is reachable only from inside the cluster. To expose services externally to the internet, we need a LoadBalancer.
📌 Why MetalLB?
- Enables external IP assignment to Kubernetes services.
- Provides Layer 2 (ARP-based) or BGP routing.
Installing MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml
This command deploys MetalLB, a load-balancer implementation for bare-metal Kubernetes clusters, into your cluster. It applies a predefined YAML manifest directly from the MetalLB GitHub repository, specifically version v0.13.7, using the “native” installation method. Here’s a detailed explanation:
- kubectl apply:
- This command is used to create or update resources in a Kubernetes cluster based on a configuration file (in YAML or JSON format). If the resources don’t exist, they’re created; if they do, they’re updated to match the desired state.
- -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml:
- The -f flag specifies the file containing the resource definitions. In this case, it’s a URL pointing to a raw YAML file hosted on GitHub.
- The URL (https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml) is from the MetalLB project’s official repository, specifically version v0.13.7, and it provides the “native” manifest for installing MetalLB.
- What is MetalLB?:
- MetalLB is an open-source project that provides a LoadBalancer service type implementation for Kubernetes clusters running on bare-metal hardware (or environments where cloud-provider load balancers, like AWS ELB or GCP Load Balancer, aren’t available).
- In cloud-hosted Kubernetes clusters (e.g., EKS, GKE), the LoadBalancer service type automatically provisions an external load balancer. On bare-metal clusters, Kubernetes doesn’t have a native way to do this—MetalLB fills that gap.
- What the Manifest Does: The metallb-native.yaml file contains a collection of Kubernetes resources that set up MetalLB. When you apply it, it typically includes:
- Namespace: Creates a metallb-system namespace (if not already present) to house MetalLB components.
- Deployments/DaemonSets: Deploys the MetalLB controller (a single pod) and speaker pods (one per node via a DaemonSet).
- Controller: Manages IP address assignment for LoadBalancer services.
- Speaker: Announces the assigned IPs to the network (via ARP or BGP).
- RBAC Resources: Sets up Roles, RoleBindings, ClusterRoles, and ClusterRoleBindings to grant MetalLB the necessary permissions (e.g., to watch services and update their status).
- Service Accounts: Creates service accounts for the controller and speaker components.
- Custom Resource Definitions: Installs the CRDs MetalLB is configured through (IPAddressPool, L2Advertisement, and others). The base manifest only sets up the components; it doesn’t configure MetalLB’s behavior (e.g., IP pools or protocol). That requires a separate configuration step.
- Outcome:
- After running this command, MetalLB is installed in your cluster in the metallb-system namespace.
- It doesn’t immediately assign IPs to services. You need to configure MetalLB with additional resources (an IPAddressPool and, for Layer 2 mode, an L2Advertisement) to define the IP range it can use and how addresses are announced (Layer 2 with ARP, or BGP).
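Before configuring MetalLB, you can confirm its controller and speaker pods came up:
kubectl get pods -n metallb-system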
Assigning an IP Pool
Create a file metallb-config.yaml defining the address range MetalLB may hand out (adjust the range to unused IPs on your network):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: my-ip-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.100-192.168.1.200
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: my-l2-advertisement
  namespace: metallb-system
Apply the configuration:
kubectl apply -f metallb-config.yaml
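If you want to double-check that the configuration landed, the resources can be listed by the CRD names the manifest installed:
kubectl get ipaddresspools.metallb.io -n metallb-system
kubectl get l2advertisements.metallb.io -n metallb-system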
Expose an application via LoadBalancer:
kubectl expose deployment nginx-deployment --type=LoadBalancer --name=nginx-service
We can now check the IP address assigned to the load balancer:
kubectl get svc nginx-service
The EXTERNAL-IP column shows the address assigned to your Nginx application; traffic to it passes through the load balancer to the pods, which continue to auto-scale as required.
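Once an EXTERNAL-IP appears, a quick test from a machine on the same network segment should return the Nginx welcome page (replace the placeholder with the address kubectl reported):
curl http://<EXTERNAL-IP>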
Conclusion
By implementing autoscaling and load balancing, your Kubernetes cluster is now:
✔ Capable of handling high traffic automatically
✔ Efficient in resource allocation, reducing costs
✔ Optimized for external access with MetalLB
With this setup, your ARM-based Kubernetes cluster is ready for real-world production workloads!
Power Your Projects with vpszen.com VPS Solutions
Looking for reliable hosting to run your Linux servers and host your next big project? VpsZen.com has you covered with top-tier VPS options tailored to your needs.
Choose from ARM64 VPS Servers for energy-efficient performance, or Root VPS Servers for virtual servers with dedicated resources.