Scaling & Monitoring
Learn how to scale Kubernetes applications manually and automatically, and how to monitor them with metrics, logging, and alerting.
Manual Scaling
Scale Deployment
# Scale to specific number of replicas
kubectl scale deployment/myapp --replicas=5
# Scale all deployments in namespace
kubectl scale deployment --all --replicas=3 -n production
# Scale ReplicaSet
kubectl scale rs/myapp-rs --replicas=5
# Scale StatefulSet
kubectl scale statefulset/myapp-sts --replicas=3
Scale ReplicationController (legacy)
# ReplicationControllers are largely superseded by Deployments/ReplicaSets
kubectl scale rc/myapp-rc --replicas=5
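kubectl can also scale conditionally: with --current-replicas, the command only proceeds if the live replica count matches, which guards against racing another controller or teammate.
# Scale only if the current replica count is 2 (fails otherwise)
kubectl scale deployment/myapp --current-replicas=2 --replicas=3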
Horizontal Pod Autoscaler (HPA)
What is HPA?
HPA automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory utilization, or custom metrics.
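Under the hood, the HPA controller computes the desired replica count from the ratio of the observed metric to its target:
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Example: 4 replicas averaging 90% CPU against a 70% target
# ceil(4 * 90 / 70) = ceil(5.14) = 6 replicas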
Prerequisites
Install Metrics Server:
# Install metrics server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify installation
kubectl get deployment metrics-server -n kube-system
# Test metrics
kubectl top nodes
kubectl top pods
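On local clusters (kind, minikube) the kubelet often serves a self-signed certificate and `kubectl top` stays empty. A common workaround, for non-production use only, is to let metrics-server skip kubelet TLS verification:
# Dev clusters only: skip kubelet certificate verification
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'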
Basic HPA Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Max
HPA Commands
# Create HPA
kubectl apply -f hpa.yaml
# Create HPA from command line
kubectl autoscale deployment/myapp-deployment \
--min=2 \
--max=10 \
--cpu-percent=70
# Get HPA status
kubectl get hpa
# Describe HPA
kubectl describe hpa myapp-hpa
# Delete HPA
kubectl delete hpa myapp-hpa
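To watch the HPA react, generate some load and observe the replica count. This sketch assumes a Service named `myapp` fronts the Deployment; adjust the URL to your setup:
# Generate load from a throwaway pod
kubectl run load-generator --rm -it --image=busybox --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp; done"
# In another terminal, watch the HPA adjust replicas
kubectl get hpa myapp-hpa --watch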
Vertical Pod Autoscaler (VPA)
What is VPA?
VPA automatically adjusts CPU and memory requests/limits for containers in Pods based on historical and current resource usage.
VPA Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Auto"  # Auto, Initial, Recreate, Off
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
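Note that VPA is not part of core Kubernetes; it is installed from the kubernetes/autoscaler repository. Avoid combining VPA in Auto mode with an HPA that scales on the same CPU/memory metrics for one workload, as the two will fight each other. Once VPA has gathered usage data, its recommendations appear in the object's status:
# Inspect VPA recommendations (target, lower/upper bounds)
kubectl describe vpa myapp-vpa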
Cluster Autoscaler
What is Cluster Autoscaler?
Cluster Autoscaler automatically adjusts the number of nodes in your cluster when:
- Pods fail to schedule due to insufficient resources
- Nodes are underutilized for extended periods
Cluster Autoscaler Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        # match the image tag to your cluster's Kubernetes minor version
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.24.0
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
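The autoscaler reports its view of each node group in a status ConfigMap, which is the first place to look when a scale-up or scale-down isn't happening:
# Check Cluster Autoscaler decisions and node-group status
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
# Or follow its logs
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=50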
Monitoring Stack
Prometheus Setup
# Install the Prometheus Operator
# (use kubectl create; the bundle's CRDs are too large for client-side kubectl apply)
kubectl create namespace monitoring
kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
Then create a Prometheus instance:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 2Gi
      cpu: 1000m
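The `prometheus` ServiceAccount referenced above needs RBAC permissions to discover targets (the operator's example manifests include a suitable ClusterRole). To reach the UI without exposing a Service, port-forward to the operator-created `prometheus-operated` Service:
# Access the Prometheus UI at http://localhost:9090
kubectl -n monitoring port-forward svc/prometheus-operated 9090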
Grafana Setup
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest  # pin a specific version in production
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-secret
              key: admin-password
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/lib/grafana
        resources:
          requests:
            memory: 256Mi
            cpu: 100m
          limits:
            memory: 512Mi
            cpu: 500m
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: LoadBalancer
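The Deployment above references a `grafana-pvc` claim and a `grafana-secret` that must exist first. A minimal sketch of both (storage size and password are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: grafana-secret
  namespace: monitoring
stringData:
  admin-password: change-me  # example only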
Application Metrics
Exposing Metrics from an Application
// Example: Node.js application using prom-client
const express = require('express');
const client = require('prom-client');

const app = express();
const register = new client.Registry();

// Collect default metrics (CPU, memory, event loop lag, ...)
client.collectDefaultMetrics({ register });

// Custom metric: request duration histogram
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});
register.registerMetric(httpRequestDuration);

// Metrics endpoint scraped by Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(8080);
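Assuming the dependencies are installed (`npm install express prom-client`), you can verify the endpoint locally:
# Scrape the endpoint by hand; output is Prometheus text format
curl http://localhost:8080/metrics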
ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
  labels:
    team: frontend  # must match the Prometheus serviceMonitorSelector above
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
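A ServiceMonitor selects Services, not Pods, so the app needs a Service whose port name matches the `port: http` endpoint above. Also note that with a nil `serviceMonitorNamespaceSelector`, the Prometheus resource only picks up ServiceMonitors in its own namespace, so a cross-namespace setup like this one needs that selector set as well. A matching Service might look like:
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
spec:
  selector:
    app: myapp
  ports:
  - name: http  # must match the ServiceMonitor's endpoint port
    port: 8080
    targetPort: 8080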
Logging
Fluentd DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
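Two caveats: the `/var/lib/docker/containers` mount assumes a Docker runtime (on containerd clusters, container logs live under `/var/log/pods`), and Fluentd needs a ServiceAccount with read access to pods plus tolerations if it should also run on control-plane nodes. An illustrative fragment for the DaemonSet's pod spec:
# Additions to the pod spec above (illustrative)
tolerations:
- key: node-role.kubernetes.io/control-plane
  effect: NoSchedule
serviceAccountName: fluentd  # bound to a ClusterRole that can get/list pods and namespaces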
ELK Stack Setup
# Elasticsearch (single-node demo; for a multi-node cluster, drop
# discovery.type and configure discovery/master settings instead)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
        env:
        - name: discovery.type
          value: single-node
        - name: xpack.security.enabled  # demo only; keep security on in production
          value: "false"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        resources:
          requests:
            memory: 512Mi
            cpu: 500m
          limits:
            memory: 1Gi
            cpu: 1000m
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi
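`serviceName: elasticsearch` refers to a headless Service that must exist for the StatefulSet's stable per-pod DNS names. A minimal sketch:
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  clusterIP: None  # headless: gives each pod a stable DNS name
  selector:
    app: elasticsearch
  ports:
  - name: http
    port: 9200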
Alerting
Prometheus Alertmanager
apiVersion: v1
kind: Secret  # the operator's configSecret field expects a Secret, not a ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://alert-receiver:5001/'
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 2
  configSecret: alertmanager-config
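The operator creates an `alertmanager-operated` Service you can port-forward to in order to inspect the loaded configuration and any active silences:
# Access the Alertmanager UI at http://localhost:9093
kubectl -n monitoring port-forward svc/alertmanager-operated 9093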
PrometheusRule Example
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
spec:
  groups:
  - name: myapp.rules
    rules:
    - alert: HighCPUUsage
      expr: rate(container_cpu_usage_seconds_total[5m]) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "CPU usage is above 80% for 5 minutes"
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod is crash looping"
        description: "Pod {{ $labels.pod }} is restarting frequently"
Observability Commands
Resource Monitoring
# Get node metrics
kubectl top nodes
# Get pod metrics
kubectl top pods
# Get pod metrics in namespace
kubectl top pods -n production
# Get pod metrics with labels
kubectl top pods -l app=myapp
# Get pod metrics sorted by CPU
kubectl top pods --sort-by=cpu
# Get pod metrics sorted by memory
kubectl top pods --sort-by=memory
Log Commands
# Get pod logs
kubectl logs <pod-name>
# Follow logs
kubectl logs -f <pod-name>
# Get logs from all containers in pod
kubectl logs <pod-name> --all-containers=true
# Get logs from specific container
kubectl logs <pod-name> -c <container-name>
# Get logs with timestamp
kubectl logs <pod-name> --timestamps
# Get logs since specific time
kubectl logs <pod-name> --since=1h
# Get logs from previous container
kubectl logs <pod-name> --previous
# Get logs from all pods with label
kubectl logs -l app=myapp --tail=100
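Logs tell you what the application did; Events tell you what the cluster did to it (scheduling, image pulls, OOM kills, probe failures):
# Recent events, most recent last
kubectl get events --sort-by=.lastTimestamp
# Events for a single object, shown under "Events" in the output
kubectl describe pod <pod-name>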
Best Practices
Scaling Best Practices
- ✅ Set appropriate min/max replicas for HPA
- ✅ Use resource requests and limits
- ✅ Monitor and tune HPA metrics thresholds
- ✅ Consider scaling policies for stability
- ✅ Test scaling behavior under load
Monitoring Best Practices
- ✅ Monitor resource utilization (CPU, memory)
- ✅ Monitor application metrics (requests, latency)
- ✅ Set up alerting for critical issues
- ✅ Use distributed tracing for microservices
- ✅ Centralize logging with ELK or Loki
- ✅ Monitor cluster health and capacity
Next Steps
- Deployment Guide - Production deployment strategies
- Core Concepts - Deep dive into Kubernetes objects
- kubectl Cheatsheet - Quick reference