Scaling & Monitoring

Learn how to scale your Kubernetes applications automatically and monitor them effectively using various scaling strategies and monitoring tools.

Manual Scaling

Scale Deployment

# Scale to specific number of replicas
kubectl scale deployment/myapp --replicas=5

# Scale all deployments in namespace
kubectl scale deployment --all --replicas=3 -n production

# Scale ReplicaSet
kubectl scale rs/myapp-rs --replicas=5

# Scale StatefulSet
kubectl scale statefulset/myapp-sts --replicas=3

Scale ReplicationController

kubectl scale rc/myapp-rc --replicas=5

Horizontal Pod Autoscaler (HPA)

What is HPA?

HPA automatically scales the number of Pods in a deployment, replica set, or stateful set based on observed CPU, memory utilization, or custom metrics.

Prerequisites

Install Metrics Server:

# Install metrics server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system

# Test metrics
kubectl top nodes
kubectl top pods

Basic HPA Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Max

HPA Commands

# Create HPA
kubectl apply -f hpa.yaml

# Create HPA from command line
kubectl autoscale deployment/myapp-deployment \
  --min=2 \
  --max=10 \
  --cpu-percent=70

# Get HPA status
kubectl get hpa

# Describe HPA
kubectl describe hpa myapp-hpa

# Delete HPA
kubectl delete hpa myapp-hpa

Vertical Pod Autoscaler (VPA)

What is VPA?

VPA automatically adjusts CPU and memory requests/limits for containers in Pods based on historical and current resource usage.

VPA Example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Auto"  # Auto, Initial, Recreate, Off
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

Cluster Autoscaler

What is Cluster Autoscaler?

Cluster Autoscaler automatically adjusts the number of nodes in your cluster when:

  • Pods fail to schedule due to insufficient resources
  • Nodes are underutilized for extended periods

Cluster Autoscaler Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.24.0
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi

Monitoring Stack

Prometheus Setup

# Install Prometheus Operator
kubectl create namespace monitoring
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml

# Create Prometheus instance
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 2Gi
      cpu: 1000m

Grafana Setup

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-secret
              key: admin-password
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/lib/grafana
        resources:
          requests:
            memory: 256Mi
            cpu: 100m
          limits:
            memory: 512Mi
            cpu: 500m
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc

---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: LoadBalancer

Application Metrics

Exposing Metrics in Application

# Example: Node.js application
const express = require('express');
const client = require('prom-client');

const app = express();
const register = new client.Registry();

// Collect default metrics
client.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

register.registerMetric(httpRequestDuration);

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(8080);

ServiceMonitor for Prometheus

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
  labels:
    team: backend
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s

Logging

Fluentd DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

ELK Stack Setup

# Elasticsearch
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
        env:
        - name: discovery.type
          value: single-node
        - name: "ES_JAVA_OPTS"
          value: "-Xms512m -Xmx512m"
        resources:
          requests:
            memory: 512Mi
            cpu: 500m
          limits:
            memory: 1Gi
            cpu: 1000m
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

Alerting

Prometheus Alertmanager

apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitoring
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://alert-receiver:5001/'

---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 2
  configSecret: alertmanager-config

PrometheusRule Example

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
spec:
  groups:
  - name: myapp.rules
    rules:
    - alert: HighCPUUsage
      expr: rate(container_cpu_usage_seconds_total[5m]) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "CPU usage is above 80% for 5 minutes"
    
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod is crash looping"
        description: "Pod {{ $labels.pod }} is restarting frequently"

Observability Commands

Resource Monitoring

# Get node metrics
kubectl top nodes

# Get pod metrics
kubectl top pods

# Get pod metrics in namespace
kubectl top pods -n production

# Get pod metrics with labels
kubectl top pods -l app=myapp

# Get pod metrics sorted by CPU
kubectl top pods --sort-by=cpu

# Get pod metrics sorted by memory
kubectl top pods --sort-by=memory

Log Commands

# Get pod logs
kubectl logs <pod-name>

# Follow logs
kubectl logs -f <pod-name>

# Get logs from all containers in pod
kubectl logs <pod-name> --all-containers=true

# Get logs from specific container
kubectl logs <pod-name> -c <container-name>

# Get logs with timestamp
kubectl logs <pod-name> --timestamps

# Get logs since specific time
kubectl logs <pod-name> --since=1h

# Get logs from previous container
kubectl logs <pod-name> --previous

# Get logs from all pods with label
kubectl logs -l app=myapp --tail=100

Best Practices

Scaling Best Practices

  • ✅ Set appropriate min/max replicas for HPA
  • ✅ Use resource requests and limits
  • ✅ Monitor and tune HPA metrics thresholds
  • ✅ Consider scaling policies for stability
  • ✅ Test scaling behavior under load

Monitoring Best Practices

  • ✅ Monitor resource utilization (CPU, memory)
  • ✅ Monitor application metrics (requests, latency)
  • ✅ Set up alerting for critical issues
  • ✅ Use distributed tracing for microservices
  • ✅ Centralize logging with ELK or Loki
  • ✅ Monitor cluster health and capacity

Next Steps