Secrets Management & PKI

Secrets management and PKI are foundational platform capabilities. This page covers HashiCorp Vault in a production deployment, Kubernetes-native certificate management with cert-manager, internal PKI built on Vault's PKI engine, and TLS best practices.

HashiCorp Vault — Architecture

Vault is a secrets management and data protection platform. It provides a unified interface for secret storage, dynamic credentials, encryption-as-a-service, and PKI — all with fine-grained access control and full audit logging.

Core Components

Storage Backends

Vault persists encrypted data to a backend. Only Vault can decrypt it — the backend sees ciphertext only.

  • Integrated Raft (recommended) — built-in consensus storage, no external dependency, supports HA natively
  • Consul — HashiCorp Consul for storage, requires running Consul cluster
  • DynamoDB — AWS DynamoDB for cloud-native storage, supports HA
  • GCS / S3 — object storage backends; GCS supports HA, S3 does not

Auth Methods

How clients prove their identity to Vault. Vault validates credentials via the auth method and issues a token with associated policies. Methods: Kubernetes, AWS IAM, GCP IAM, AppRole, LDAP, OIDC, GitHub, TLS certificates, Userpass.

Secret Engines

Plugins that store, generate, or encrypt data. Mounted at paths within Vault. Types: KV (v1/v2), database (dynamic credentials), PKI, AWS, GCP, Azure, SSH, Transit (encryption-as-a-service), TOTP.

Audit Devices

Every Vault operation — including failed auth attempts — is written to all enabled audit devices before the operation completes. If no audit device can be written to, Vault blocks the operation. Types: file, syslog, socket.
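
As a quick sanity check, failed operations can be grepped straight out of the file audit stream. The entries below are abridged, hypothetical samples — real audit records carry many more fields:

```shell
# Two abridged, illustrative audit entries (real entries contain many more fields)
cat > /tmp/vault-audit-sample.log <<'EOF'
{"type":"response","error":"permission denied","request":{"operation":"read","path":"secret/data/production/payment-service"}}
{"type":"response","error":"","request":{"operation":"read","path":"secret/data/production/payment-service"}}
EOF

# Count denied operations — a cheap first pass before proper log aggregation
grep -c '"error":"permission denied"' /tmp/vault-audit-sample.log   # → 1
```

In production, ship the audit stream to a log platform instead and alert on denial spikes.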

High Availability with Integrated Raft

Raft integrated storage provides HA without an external dependency. One Vault node is the active leader; the remaining nodes are standbys that forward requests to the leader. Vault Enterprise additionally supports performance standbys, which handle read requests locally to reduce leader load.

  • Leader election: Raft consensus with write quorum (N/2 + 1 nodes)
  • Disaster Recovery replication (Vault Enterprise): replicates all data to a secondary cluster in a separate region for disaster recovery — the secondary is a read-only warm standby
  • Performance replication (Vault Enterprise): replicates data to secondary clusters that serve read traffic, reducing latency for geographically distributed clients
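
The quorum rule above translates directly into failure tolerance — a sketch of the arithmetic for common cluster sizes:

```shell
# Quorum = floor(N/2) + 1; a cluster of N voters tolerates N - quorum failures
for n in 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "nodes=$n quorum=$quorum tolerated_failures=$tolerated"
done
# nodes=3 quorum=2 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
# nodes=7 quorum=4 tolerated_failures=3
```

This is why production clusters run 3 or 5 nodes — even node counts add cost without adding failure tolerance.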

Vault on Kubernetes — Helm Install

Production HA Values

# vault-values.yaml — production HA with Raft and AWS KMS auto-unseal
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  replicas: 2
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

server:
  enabled: true
  image:
    repository: hashicorp/vault
    tag: "1.17.0"

  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 512Mi
      cpu: 500m

  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: vault
              app.kubernetes.io/instance: vault
              component: server
          topologyKey: kubernetes.io/hostname

  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-ha-tls/vault.ca
    VAULT_TLSCERT: /vault/userconfig/vault-ha-tls/vault.crt
    VAULT_TLSKEY: /vault/userconfig/vault-ha-tls/vault.key

  volumes:
    - name: userconfig-vault-ha-tls
      secret:
        secretName: vault-ha-tls

  volumeMounts:
    - mountPath: /vault/userconfig/vault-ha-tls
      name: userconfig-vault-ha-tls
      readOnly: true

  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        cluster_name = "vault-production"

        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-ha-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-ha-tls/vault.ca"
        }

        storage "raft" {
          path    = "/vault/data"
          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
          }
        }

        # AWS KMS auto-unseal
        seal "awskms" {
          region     = "us-east-1"
          kms_key_id = "alias/vault-unseal-key"
        }

        service_registration "kubernetes" {}

ui:
  enabled: true
  serviceType: ClusterIP

# Install Vault via Helm
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

helm install vault hashicorp/vault \
  --namespace vault \
  --create-namespace \
  -f vault-values.yaml

# Initialize Vault (first time only — outputs root token + recovery keys)
# With the awskms seal configured above, use recovery shares instead of key shares
kubectl exec -n vault vault-0 -- vault operator init \
  -recovery-shares=5 \
  -recovery-threshold=3 \
  -format=json > vault-init.json

# With auto-unseal via AWS KMS, Vault unseals itself after pod restarts;
# the recovery keys are only needed for privileged operations like root token generation

# Join Raft peers (with retry_join configured above this happens automatically; shown for reference)
kubectl exec -n vault vault-1 -- vault operator raft join https://vault-0.vault-internal:8200
kubectl exec -n vault vault-2 -- vault operator raft join https://vault-0.vault-internal:8200

# Verify cluster status
kubectl exec -n vault vault-0 -- vault operator raft list-peers

Auto-Unseal — GCP Cloud KMS

# GCP Cloud KMS auto-unseal config in Vault HCL
seal "gcpckms" {
  credentials = "/vault/userconfig/gcp-creds/credentials.json"
  project     = "my-project"
  region      = "global"
  key_ring    = "vault-keyring"
  crypto_key  = "vault-unseal-key"
}

Auth Methods — Deep Dive

Kubernetes Auth

The most common auth method for workloads running in Kubernetes. Vault validates the pod's service account JWT against the Kubernetes API.

# Enable Kubernetes auth
vault auth enable kubernetes

# Configure Kubernetes auth
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  token_reviewer_jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token \
  issuer="https://kubernetes.default.svc.cluster.local"

# Create a role binding a service account to a Vault policy
vault write auth/kubernetes/role/payment-service \
  bound_service_account_names="payment-service" \
  bound_service_account_namespaces="production" \
  policies="payment-service-policy" \
  ttl="1h"

# The application authenticates:
JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
vault write auth/kubernetes/login \
  role="payment-service" \
  jwt="${JWT}"

AppRole Auth

AppRole is designed for machine-to-machine auth where Kubernetes auth is not available (CI systems, VMs, external services). It uses two credentials: a role_id (non-secret, like a username) and a secret_id (secret, like a password):

# Enable AppRole
vault auth enable approle

# Create role
vault write auth/approle/role/ci-pipeline \
  secret_id_ttl="10m" \
  token_num_uses="1" \
  token_ttl="20m" \
  token_max_ttl="30m" \
  secret_id_num_uses="1" \
  policies="ci-pipeline-policy"

# Fetch role_id (can be stored in CI config — it's not secret)
vault read auth/approle/role/ci-pipeline/role-id

# Generate a response-wrapped secret_id (the wrapper token expires in 120s)
# The delivery mechanism never sees the actual secret_id — only the short-lived wrapping token
vault write -wrap-ttl=120s -f auth/approle/role/ci-pipeline/secret-id

# CI system unwraps to get the actual secret_id
vault unwrap <wrapping-token>

# Login
vault write auth/approle/login \
  role_id="<role-id>" \
  secret_id="<secret-id>"

AWS IAM Auth

# Enable AWS auth
vault auth enable aws

# Configure AWS auth (Vault calls AWS STS to verify caller identity)
vault write auth/aws/config/client \
  access_key="<aws-access-key>" \
  secret_key="<aws-secret-key>" \
  region="us-east-1"

# Bind an IAM role ARN to a Vault policy
vault write auth/aws/role/production-ec2 \
  auth_type="iam" \
  bound_iam_principal_arn="arn:aws:iam::123456789:role/production-app-role" \
  policies="production-app-policy" \
  ttl="1h"

Secret Engines

KV v2 — Versioned Key-Value

# Enable KV v2 at path "secret/"
vault secrets enable -path=secret kv-v2

# Write a secret
vault kv put secret/production/payment-service \
  db_password="super-secret-password" \
  api_key="stripe-live-key-abc123"

# Read a secret
vault kv get secret/production/payment-service

# Read only specific fields
vault kv get -field=db_password secret/production/payment-service

# Read a specific version
vault kv get -version=3 secret/production/payment-service

# List all secrets at a path
vault kv list secret/production/

# Soft-delete version (data recoverable)
vault kv delete secret/production/payment-service

# Destroy versions permanently
vault kv destroy -versions=1,2 secret/production/payment-service

# Check-and-set (CAS) — only write if current version matches
vault kv put -cas=4 secret/production/payment-service db_password="new-password"

# Read metadata (versions, deletion times, custom metadata)
vault kv metadata get secret/production/payment-service

Dynamic Secrets — PostgreSQL

Dynamic credentials are generated on-demand and expire automatically, eliminating long-lived database passwords:

# Enable database secret engine
vault secrets enable database

# Configure PostgreSQL connection
vault write database/config/production-postgres \
  plugin_name="postgresql-database-plugin" \
  allowed_roles="payment-service,order-service" \
  connection_url="postgresql://{{username}}:{{password}}@postgres.production:5432/appdb?sslmode=require" \
  username="vault-admin" \
  password="vault-admin-password" \
  password_authentication="scram-sha-256"

# Create a role (Vault will execute this SQL when generating credentials)
vault write database/roles/payment-service \
  db_name="production-postgres" \
  creation_statements="
    CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA payments TO \"{{name}}\";
    GRANT USAGE ON ALL SEQUENCES IN SCHEMA payments TO \"{{name}}\";
  " \
  default_ttl="1h" \
  max_ttl="24h"

# Request credentials
vault read database/creds/payment-service
# Returns: username=v-kubernetes-payment-abc123, password=A1b2C3d4..., lease_duration=1h

# Revoke credentials early (e.g., on incident)
vault lease revoke database/creds/payment-service/<lease-id>

Dynamic Secrets — AWS

# Enable AWS secret engine
vault secrets enable aws

# Configure with credentials that can create/delete IAM users or assume roles
vault write aws/config/root \
  access_key="AKIAIOSFODNN7EXAMPLE" \
  secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  region="us-east-1"

# Role using IAM policy inline (creates a temp IAM user)
vault write aws/roles/s3-backup-writer \
  credential_type="iam_user" \
  policy_document='{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:PutObject","s3:GetObject"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }]
  }'

# Role using assumed_role (preferred — no IAM user created)
vault write aws/roles/ec2-deployer \
  credential_type="assumed_role" \
  role_arns="arn:aws:iam::123456789:role/DeployerRole" \
  default_ttl="15m" \
  max_ttl="1h"

# Request temporary AWS credentials
vault read aws/creds/s3-backup-writer

Vault PKI Engine — Internal CA

Full CA Setup Flow

# Step 1: Enable PKI engine for root CA (long TTL — offline, rarely rotated)
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki    # 10 years

# Step 2: Generate root CA (keep the root key inside Vault — never export it)
vault write -field=certificate pki/root/generate/internal \
  common_name="Acme Corp Root CA" \
  organization="Acme Corp" \
  country="US" \
  ttl="87600h" \
  key_type="rsa" \
  key_bits=4096 \
  > root-ca.crt

# Step 3: Configure CA URLs (used in issued certificates)
vault write pki/config/urls \
  issuing_certificates="https://vault.internal:8200/v1/pki/ca" \
  crl_distribution_points="https://vault.internal:8200/v1/pki/crl" \
  ocsp_servers="https://vault.internal:8200/v1/pki/ocsp"

# Step 4: Enable intermediate CA engine
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int   # 5 years

# Step 5: Generate intermediate CSR
vault write -format=json pki_int/intermediate/generate/internal \
  common_name="Acme Corp Intermediate CA" \
  organization="Acme Corp" \
  key_type="rsa" \
  key_bits=4096 \
  | jq -r '.data.csr' > intermediate.csr

# Step 6: Sign intermediate with root CA
vault write -format=json pki/root/sign-intermediate \
  [email protected] \
  common_name="Acme Corp Intermediate CA" \
  ttl="43800h" \
  | jq -r '.data.certificate' > intermediate.crt

# Step 7: Import signed intermediate into pki_int
vault write pki_int/intermediate/set-signed \
  [email protected]

# Step 8: Configure intermediate CA URLs
vault write pki_int/config/urls \
  issuing_certificates="https://vault.internal:8200/v1/pki_int/ca" \
  crl_distribution_points="https://vault.internal:8200/v1/pki_int/crl" \
  ocsp_servers="https://vault.internal:8200/v1/pki_int/ocsp"

# Step 9: Create a role for issuing leaf certificates (max_ttl of 720h caps leaves at 30 days)
vault write pki_int/roles/internal-services \
  allowed_domains="svc.cluster.local,internal,example.com" \
  allow_subdomains=true \
  allow_glob_domains=false \
  max_ttl="720h" \
  key_type="rsa" \
  key_bits=2048 \
  require_cn=false \
  allow_any_name=false

# Step 10: Issue a certificate
vault write pki_int/issue/internal-services \
  common_name="payment-service.production.svc.cluster.local" \
  alt_names="payment-service.production,payment-service" \
  ttl="24h"
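
The root → intermediate → leaf chain that Vault builds can be reproduced locally with openssl to sanity-check how chain verification works; a minimal sketch with throwaway names, for testing only:

```shell
set -e
workdir=$(mktemp -d) && cd "$workdir"

# Self-signed root CA
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.crt \
  -subj "/CN=Demo Root CA" -days 3650

# Intermediate CA: CSR signed by the root, with CA:TRUE
openssl req -newkey rsa:2048 -nodes -keyout int.key -out int.csr \
  -subj "/CN=Demo Intermediate CA"
printf 'basicConstraints=critical,CA:TRUE\nkeyUsage=keyCertSign,cRLSign\n' > ca.ext
openssl x509 -req -in int.csr -CA root.crt -CAkey root.key -CAcreateserial \
  -out int.crt -days 1825 -extfile ca.ext

# Leaf: CSR signed by the intermediate
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=payment-service.production.svc.cluster.local"
openssl x509 -req -in leaf.csr -CA int.crt -CAkey int.key -CAcreateserial \
  -out leaf.crt -days 1

# Verify the leaf against the full chain — prints "leaf.crt: OK"
cat root.crt int.crt > chain.crt
openssl verify -CAfile chain.crt leaf.crt
```

The same `openssl verify -CAfile` invocation works against certificates issued by Vault once you download the CA chain from `pki_int/ca`.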

Vault Policies

Policies are written in HCL and attached to tokens via auth method roles. They follow a deny-by-default model.

# payment-service-policy.hcl — production read-only for its own secrets
path "secret/data/production/payment-service" {
  capabilities = ["read"]
}

path "secret/data/production/payment-service/*" {
  capabilities = ["read"]
}

# Allow reading dynamic database credentials
path "database/creds/payment-service" {
  capabilities = ["read"]
}

# Allow issuing certificates for this service
path "pki_int/issue/internal-services" {
  capabilities = ["create", "update"]
  allowed_parameters = {
    "common_name" = ["payment-service.production.svc.cluster.local"]
    "ttl" = []
  }
}

# Allow renewing own token
path "auth/token/renew-self" {
  capabilities = ["update"]
}

# Allow looking up own token
path "auth/token/lookup-self" {
  capabilities = ["read"]
}

# Write the policy to Vault
vault policy write payment-service payment-service-policy.hcl

# platform-admin-policy.hcl — read/write for platform team
path "secret/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

path "sys/mounts/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

path "auth/*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "pki*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "sys/policies/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

path "sys/audit*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

Vault Agent & Kubernetes Injector

Vault Agent Injector — Pod Annotations

The Vault Agent Injector mutating webhook intercepts pod creation and injects a Vault Agent init container and sidecar. Applications read secrets as files — no SDK changes needed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  template:
    metadata:
      annotations:
        # Enable Vault Agent injection
        vault.hashicorp.com/agent-inject: "true"

        # Vault role for Kubernetes auth
        vault.hashicorp.com/role: "payment-service"

        # Inject KV secret at /vault/secrets/config.env
        vault.hashicorp.com/agent-inject-secret-config.env: "secret/data/production/payment-service"
        vault.hashicorp.com/agent-inject-template-config.env: |
          {{- with secret "secret/data/production/payment-service" -}}
          export DB_PASSWORD="{{ .Data.data.db_password }}"
          export API_KEY="{{ .Data.data.api_key }}"
          {{- end }}

        # Inject database credentials
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/payment-service"
        vault.hashicorp.com/agent-inject-template-db-creds: |
          {{- with secret "database/creds/payment-service" -}}
          export DB_USER="{{ .Data.username }}"
          export DB_PASS="{{ .Data.password }}"
          {{- end }}

        # Run the agent as the same user as the application container
        vault.hashicorp.com/agent-run-as-same-user: "true"

        # Keep the sidecar running (not init-only) so secrets stay refreshed
        vault.hashicorp.com/agent-pre-populate-only: "false"

        # Resource limits for the Vault Agent sidecar
        vault.hashicorp.com/agent-requests-cpu: "50m"
        vault.hashicorp.com/agent-requests-mem: "64Mi"
        vault.hashicorp.com/agent-limits-cpu: "100m"
        vault.hashicorp.com/agent-limits-mem: "128Mi"
    spec:
      serviceAccountName: payment-service
      containers:
        - name: payment-service
          image: acme/payment-service:v1.2.0
          command: ["/bin/sh", "-c"]
          args:
            - |
              source /vault/secrets/config.env
              source /vault/secrets/db-creds
              exec /app/payment-service

Vault Audit Logging

# Enable file audit device (JSON lines; use file_path=stdout in containers so the log aggregator captures entries)
vault audit enable file file_path=/vault/logs/audit.log

# Enable syslog audit
vault audit enable syslog tag="vault" facility="AUTH"

# Verify audit devices
vault audit list -detailed

# Sample audit log entry (all operations logged including auth failures)
# {
#   "time": "2025-03-28T10:00:00Z",
#   "type": "request",
#   "auth": {"token_type": "service", "policies": ["payment-service-policy"]},
#   "request": {"operation": "read", "path": "secret/data/production/payment-service"},
#   "response": {"data": {"metadata": {"version": 4}}}
# }

cert-manager — Kubernetes Certificate Management

cert-manager automates the issuance and renewal of TLS certificates in Kubernetes. It supports Let's Encrypt, Vault PKI, self-signed, and any ACME-compatible CA.

Installation

helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.14.0 \
  --set installCRDs=true \
  --set global.leaderElection.namespace=cert-manager \
  --set prometheus.enabled=true \
  --set prometheus.servicemonitor.enabled=true

# Verify
kubectl get pods -n cert-manager
kubectl get crds | grep cert-manager

ClusterIssuer — Let's Encrypt HTTP01

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx    # or "haproxy", "traefik", etc.
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging   # use for testing — avoids rate limits
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: nginx

ClusterIssuer — Let's Encrypt DNS01 (Route 53)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890ABC
            # If using IAM role for service account (IRSA):
            role: arn:aws:iam::123456789:role/cert-manager-route53

Certificate Resource

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-example-com
  namespace: production
spec:
  secretName: api-example-com-tls    # K8s Secret created with tls.crt/tls.key/ca.crt
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  commonName: api.example.com
  dnsNames:
    - api.example.com
    - api-v2.example.com
  duration: 2160h        # 90 days (Let's Encrypt maximum)
  renewBefore: 720h      # renew 30 days before expiry
  privateKey:
    algorithm: RSA
    size: 2048
    rotationPolicy: Always   # generate new key on each renewal

Ingress Annotation for Automatic TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - api.example.com
      secretName: api-example-com-tls    # cert-manager creates this secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80

VaultIssuer — Internal PKI via cert-manager

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-internal-ca
spec:
  vault:
    server: https://vault.vault.svc.cluster.local:8200
    path: pki_int/sign/internal-services
    caBundle: <base64-encoded-vault-ca-cert>
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: cert-manager

---
# Certificate issued by Vault internal CA
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payment-service-mtls
  namespace: production
spec:
  secretName: payment-service-mtls-tls
  issuerRef:
    name: vault-internal-ca
    kind: ClusterIssuer
  commonName: payment-service.production.svc.cluster.local
  dnsNames:
    - payment-service
    - payment-service.production
    - payment-service.production.svc
    - payment-service.production.svc.cluster.local
  duration: 24h
  renewBefore: 8h

openssl — Common Operations

# Generate a 4096-bit RSA private key
openssl genrsa -out server.key 4096

# Generate a Certificate Signing Request (CSR)
openssl req -new -key server.key -out server.csr \
  -subj "/CN=api.example.com/O=Acme Corp/C=US" \
  -addext "subjectAltName=DNS:api.example.com,DNS:api-v2.example.com"

# Self-sign a certificate (for testing only)
openssl x509 -req -in server.csr -signkey server.key \
  -out server.crt -days 365 -sha256

# Sign a CSR with a CA
openssl x509 -req -in server.csr \
  -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out server.crt -days 365 -sha256 \
  -extfile san.ext   # san.ext: subjectAltName=DNS:api.example.com

# Inspect a certificate (human-readable)
openssl x509 -in server.crt -noout -text

# Check certificate expiry
openssl x509 -in server.crt -noout -dates
openssl x509 -enddate -noout -in server.crt

# Check expiry of a live server certificate
echo | openssl s_client -servername api.example.com \
  -connect api.example.com:443 2>/dev/null \
  | openssl x509 -noout -dates

# Verify certificate chain
openssl verify -CAfile ca-chain.crt server.crt

# Convert PEM to PKCS12 (for Java keystores, browser import)
openssl pkcs12 -export -out server.p12 \
  -inkey server.key -in server.crt -certfile ca-chain.crt

# Extract certificate from PKCS12
openssl pkcs12 -in server.p12 -nokeys -out server.crt

# Check if a key and certificate match (fingerprints must match)
openssl x509 -noout -modulus -in server.crt | openssl md5
openssl rsa  -noout -modulus -in server.key | openssl md5

# Generate ECDSA key (P-256 — faster and smaller than RSA-2048)
openssl ecparam -name prime256v1 -genkey -noout -out ec.key
openssl req -new -key ec.key -out ec.csr -subj "/CN=api.example.com"

TLS Best Practices

Protocol Versions

Require TLS 1.2 minimum; prefer TLS 1.3. Disable SSL 3.0, TLS 1.0, TLS 1.1. NGINX example:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;   # let the client's preference win — safe when the server list contains only strong ciphers

Cipher Suites

Prefer AEAD ciphers with forward secrecy. Avoid RC4, 3DES, NULL, EXPORT, ANON. Recommended NGINX config:

ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;

HSTS

HTTP Strict Transport Security prevents protocol downgrade attacks. Submit to the HSTS preload list for maximum protection:

add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

OCSP Stapling

Staple the OCSP response to the TLS handshake so clients don't need to query the CA's OCSP server (privacy + performance):

ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/nginx/ca-chain.crt;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;

Certificate Monitoring

Expired certificates cause outages. Monitor expiry proactively with at least 30 days warning.
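
The expiry math itself is simple; a hedged shell sketch that generates a throwaway self-signed certificate and computes the days remaining (GNU `date -d` assumed; BSD/macOS `date` uses different flags):

```shell
# Throwaway certificate valid for 45 days (testing only)
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=demo.example.test" -days 45

# Parse notAfter and convert to days remaining
not_after=$(openssl x509 -enddate -noout -in /tmp/demo.crt | cut -d= -f2)
expiry_epoch=$(date -d "$not_after" +%s)   # GNU date
days_left=$(( (expiry_epoch - $(date +%s)) / 86400 ))

echo "days until expiry: $days_left"
if [ "$days_left" -lt 30 ]; then echo "WARNING: renew soon"; fi
```

Point the same parsing at a live endpoint by replacing the first step with the `openssl s_client` one-liner shown earlier; in production, prefer exporter-based monitoring over ad-hoc scripts.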

Prometheus — ssl_exporter

# Deploy ssl_exporter to scrape certificate expiry from live endpoints
# prometheus-ssl-exporter/values.yaml
config:
  modules:
    https:
      prober: https
      timeout: 10s
      tls_config:
        insecure_skip_verify: false

# Add targets to scrape config
scrape_configs:
  - job_name: 'ssl-expiry'
    metrics_path: /probe
    params:
      module: [https]
    static_configs:
      - targets:
          - api.example.com:443
          - admin.example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ssl-exporter:9219

Prometheus Alerting Rules

groups:
  - name: certificate-expiry
    rules:
      # Alert 30 days before expiry
      - alert: CertificateExpiringSoon
        expr: ssl_cert_not_after - time() < 86400 * 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring in less than 30 days"
          description: "Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"

      # Alert 7 days before expiry — urgent
      - alert: CertificateExpiringCritical
        expr: ssl_cert_not_after - time() < 86400 * 7
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Certificate expiring in less than 7 days — action required"
          description: "Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"

      # Alert if cert-manager Certificate resource is not Ready
      - alert: CertManagerCertificateNotReady
        expr: certmanager_certificate_ready_status{condition="False"} == 1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "cert-manager Certificate not ready"
          description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} is not ready"

      # Alert on cert-manager renewal failure
      - alert: CertManagerCertificateRenewalFailure
        expr: increase(certmanager_certificate_renewal_timestamp_seconds[1h]) == 0
          and certmanager_certificate_expiration_timestamp_seconds - time() < 86400 * 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "cert-manager certificate has not been renewed"
          description: "Certificate {{ $labels.name }} is within 14 days of expiry but no renewal has occurred in the past hour"

Defense in depth: Combine automated cert-manager renewal with Prometheus alerting. If cert-manager fails silently (ACME challenge fails, DNS propagation issue), the alert fires with 30 days remaining — enough time to intervene manually before the certificate expires.

Useful Certificate Monitoring Commands

# List all cert-manager Certificates and their ready status
kubectl get certificates -A

# Describe a specific certificate (shows renewal events)
kubectl describe certificate api-example-com -n production

# List cert-manager CertificateRequests (one per issuance attempt)
kubectl get certificaterequests -A

# Watch cert-manager logs for ACME challenge activity
kubectl logs -n cert-manager -l app.kubernetes.io/name=cert-manager -f

# Check Vault PKI CRL and OCSP
vault read pki_int/crl/rotate   # force CRL rotation
vault read pki_int/config/crl