Security Playbooks
Security Incident Response — Seven security playbooks covering the most critical cloud security incident types. All playbooks follow the NIST SP 800-61 incident response lifecycle: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity.
Legal Notice: Before taking any forensic action, consult your organisation's legal and compliance team. Preserve all evidence. Do not destroy logs or snapshots. In regulated industries (PCI-DSS, HIPAA, GDPR), breach notification obligations may be triggered within 24–72 hours of discovery.
SEC-0001 — Exposed Credentials / Secret Leak
Symptom / Detection
- GitHub Advanced Security / GitGuardian / truffleHog alert fires on a commit or PR
- AWS IAM Access Analyzer or GCP SCC reports credential in public repository
- Developer accidentally commits `.env`, `credentials.json`, or `terraform.tfvars` to a public repo
- Third-party bug bounty report or responsible disclosure
Critical: Assume the credentials have already been harvested by automated scanners within minutes of the public push. Act immediately — do not wait to investigate whether they were actually used before revoking.
Immediate Actions
# Step 1: Revoke / invalidate the exposed credential IMMEDIATELY
# AWS — deactivate then delete the access key
aws iam update-access-key --access-key-id <AKIA...> --status Inactive --user-name <username>
aws iam delete-access-key --access-key-id <AKIA...> --user-name <username>
# GCP — revoke a service account key
gcloud iam service-accounts keys disable <key-id> \
--iam-account=<sa-name>@<project>.iam.gserviceaccount.com
gcloud iam service-accounts keys delete <key-id> \
--iam-account=<sa-name>@<project>.iam.gserviceaccount.com
# GitHub — revoke a GitHub PAT
# Navigate to: GitHub Settings → Developer Settings → Personal Access Tokens → Revoke
# For OAuth/GitHub App tokens, use the API (basic auth with the app's client_id and client_secret):
curl -X DELETE \
  -u "<client_id>:<client_secret>" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/applications/<client_id>/token \
  -d '{"access_token":"<token-to-revoke>"}'
# Step 2: Notify the security team via #security-incidents channel
# Step 3: Declare incident SEV1 if production credentials; SEV2 for dev/test
Audit — Check for Unauthorised Usage
# AWS CloudTrail — search for API calls by the compromised key
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=<AKIA...> \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
--output json | jq '.Events[] | {time: .EventTime, name: .EventName, source: .EventSource, ip: .SourceIPAddress}'
# Filter for high-risk API calls (IAM changes, data access, new resources)
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=<AKIA...> \
--query 'Events[?contains(EventName, `Create`) || contains(EventName, `Put`) || contains(EventName, `Delete`) || contains(EventName, `Attach`)]'
# GCP — audit log for service account activity
gcloud logging read \
'protoPayload.authenticationInfo.serviceAccountKeyName:"<key-id>"' \
--limit=100 --format=json
Eradication
# Remove the secret from git history using git-filter-repo (preferred over BFG)
pip install git-filter-repo
git filter-repo --path <file-with-secret> --invert-paths
# If only removing a specific string (e.g., a key value):
git filter-repo --replace-text <(echo '<SECRET_VALUE>==>[REDACTED]')
# Force push cleaned history (coordinate with all team members first)
git push origin --force --all
git push origin --force --tags
# Invalidate GitHub's cached copy by contacting GitHub Support with the repo URL
# GitHub has a process to remove cached views of deleted content
Rotate All Credentials
# Rotate ALL credentials that may have been in scope (not just the exposed one)
# Because if the repo/system was compromised, other secrets may also be at risk
# Generate new AWS access key
aws iam create-access-key --user-name <username>
# Generate new GCP service account key
gcloud iam service-accounts keys create new-key.json \
--iam-account=<sa-name>@<project>.iam.gserviceaccount.com
# Update secrets in vault / K8s secrets / CI/CD environment variables
kubectl create secret generic <secret-name> \
--from-literal=API_KEY='<new-value>' \
-n <namespace> --dry-run=client -o yaml | kubectl apply -f -
# Trigger rolling restart of pods that use the secret
kubectl rollout restart deployment/<deployment-name> -n <namespace>
Prevention: Install pre-commit hooks using detect-secrets or gitleaks. Require secret scanning in CI pipelines. Use OIDC / Workload Identity Federation to eliminate static credentials entirely. Store all secrets in HashiCorp Vault or cloud-native secret managers.
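As an illustration of the pre-commit hook approach, here is a minimal sketch using gitleaks (installation method and flags may vary by gitleaks version — verify against the current docs):

```shell
# Install gitleaks (macOS example; on Linux, download a release binary instead)
brew install gitleaks

# One-off scan of the working tree and full git history
gitleaks detect --source . --verbose

# Wire it in as a local pre-commit hook that scans only staged changes
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Block the commit if gitleaks finds a potential secret in staged files
gitleaks protect --staged --verbose || {
  echo "gitleaks: potential secret detected - commit blocked" >&2
  exit 1
}
EOF
chmod +x .git/hooks/pre-commit
```

In practice, teams usually manage this via the pre-commit framework or a shared hooks directory so the hook cannot be skipped silently on individual clones.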
SEC-0002 — Suspicious API Activity / Possible Account Compromise
Symptom / Detection
- AWS GuardDuty finding: `UnauthorizedAccess:IAMUser/MaliciousIPCaller` or `Recon:IAMUser/TorIPCaller`
- API calls from unusual geographies or IP ranges outside the normal baseline
- Spike in `AssumeRole`, `CreateUser`, `AttachRolePolicy` API calls
- Instance launched in an unusual region or with a non-standard AMI
Immediate Actions
# Step 1: Acknowledge the GuardDuty finding
aws guardduty list-findings \
--detector-id <detector-id> \
--finding-criteria '{"Criterion":{"service.archived":{"Eq":["false"]}}}'
# Get finding details
aws guardduty get-findings \
--detector-id <detector-id> \
--finding-ids <finding-id> \
| jq '.Findings[0] | {type: .Type, severity: .Severity, ip: .Service.Action.NetworkConnectionAction.RemoteIpDetails.IpAddressV4, account: .AccountId}'
# Step 2: Revoke all active sessions for the compromised IAM user/role
# Attach an explicit Deny policy with current time condition to force session invalidation
aws iam put-user-policy --user-name <username> \
--policy-name DenyAllImmediately \
--policy-document '{
"Version":"2012-10-17",
"Statement":[{
"Effect":"Deny",
"Action":"*",
"Resource":"*",
"Condition":{
"DateLessThan":{"aws:TokenIssueTime":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}
}
}]
}'
# Step 3: Disable the IAM user's access keys
aws iam list-access-keys --user-name <username>
aws iam update-access-key --access-key-id <key-id> --status Inactive --user-name <username>
Isolate Compromised Instance
# Create an isolation security group with no ingress/egress
aws ec2 create-security-group \
--group-name "ISOLATION-DO-NOT-USE" \
--description "Forensic isolation - no traffic allowed" \
--vpc-id <vpc-id>
# Apply isolation security group to suspected instance (REMOVES existing SGs)
aws ec2 modify-instance-attribute \
--instance-id <instance-id> \
--groups <isolation-sg-id>
# Capture forensic EBS snapshot BEFORE taking any other action
aws ec2 create-snapshot \
--volume-id <root-volume-id> \
--description "FORENSIC SNAPSHOT - Incident INC-NNNN - $(date -u +%Y%m%dT%H%M%SZ)"
# Tag the instance and snapshot clearly
aws ec2 create-tags \
--resources <instance-id> \
--tags Key=SecurityStatus,Value=COMPROMISED Key=IncidentID,Value=INC-NNNN
Prevention: Enable GuardDuty in all regions and all accounts. Configure EventBridge rules to auto-notify and auto-remediate critical findings. Implement IAM permission boundaries. Use AWS Organizations SCPs to restrict actions to approved regions only.
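The EventBridge auto-notification mentioned above can be sketched as follows (the rule name, severity threshold, and SNS topic ARN are illustrative assumptions):

```shell
# Route high-severity GuardDuty findings (severity >= 7) to EventBridge
aws events put-rule \
  --name guardduty-high-severity-findings \
  --event-pattern '{
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]}
  }'

# Deliver matching findings to the security team's SNS topic
aws events put-targets \
  --rule guardduty-high-severity-findings \
  --targets 'Id=security-sns,Arn=arn:aws:sns:<region>:<account-id>:security-alerts'
```

The same rule can target a Lambda function for auto-remediation (e.g., attaching the isolation security group) instead of, or in addition to, SNS.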
SEC-0003 — DDoS Attack
Symptom / Detection
- Traffic volume spike of 10x or more over the normal baseline within minutes
- AWS Shield Advanced or GCP Cloud Armor generates a DDoS alert
- Load balancer reports a high rate of HTTP 503 errors; target response time exceeds SLO
- Origin instances hit CPU/network saturation without corresponding legitimate traffic
Immediate Actions — AWS
# Step 1: Contact AWS Shield Response Team (SRT) immediately for SEV1 DDoS
# Requires AWS Shield Advanced subscription
aws shield describe-subscription
aws shield create-protection-group \
--protection-group-id "ddos-response-$(date +%Y%m%d)" \
--aggregation SUM \
--pattern ALL
# Step 2: Apply a WAF rate-limiting rule via the AWS Console or CLI
# Note: rate-based statements can only be used directly in a web ACL, not in a rule group
# Create a web ACL with a rate-based rule (max 2000 requests per 5 minutes per IP)
aws wafv2 create-web-acl \
--name "DDoSRateLimit" \
--scope CLOUDFRONT \
--default-action Allow={} \
--rules '[
{
"Name": "RateLimitPerIP",
"Priority": 1,
"Statement": {
"RateBasedStatement": {
"Limit": 2000,
"AggregateKeyType": "IP"
}
},
"Action": {"Block": {}},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "RateLimitPerIP"
}
}
]' \
--visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=DDoSRateLimitACL \
--region us-east-1
# Then associate the web ACL with the CloudFront distribution by setting its ARN
# as the distribution's WebACLId (via update-distribution or the console)
CloudFront Geo-Blocking
# Step 3: Block attacking countries via CloudFront geo restriction
# (Use only if attack is clearly originating from specific countries)
aws cloudfront update-distribution \
--id <distribution-id> \
--distribution-config file://distribution-config-with-geo-block.json
# distribution-config-with-geo-block.json snippet:
# "Restrictions": {
# "GeoRestriction": {
# "RestrictionType": "blacklist",
# "Quantity": 2,
# "Items": ["RU", "CN"]
# }
# }
# Step 4: Scale out origin infrastructure to absorb traffic while mitigation takes effect
aws autoscaling set-desired-capacity \
--auto-scaling-group-name <asg-name> \
--desired-capacity <2x-normal>
# Step 5: Monitor WAF metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/WAFV2 \
--metric-name BlockedRequests \
--dimensions Name=Rule,Value=RateLimitPerIP Name=WebACL,Value=<webacl-name> \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Sum
Prevention: Subscribe to AWS Shield Advanced. Pre-configure WAF rate limiting rules before an attack. Use CloudFront for all public endpoints. Define attack response runbooks in advance. Have AWS SRT contact information readily accessible.
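The early-warning side of this can also be pre-wired as a CloudWatch alarm on request volume — a sketch, where the threshold, alarm name, and SNS topic ARN are illustrative assumptions to calibrate against your own baseline:

```shell
# Alarm when CloudFront request volume exceeds roughly 10x the normal baseline
# (100000 requests per 5-minute period is an illustrative threshold)
aws cloudwatch put-metric-alarm \
  --alarm-name cloudfront-request-spike \
  --namespace AWS/CloudFront \
  --metric-name Requests \
  --dimensions Name=DistributionId,Value=<distribution-id> Name=Region,Value=Global \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 100000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:<account-id>:security-alerts \
  --region us-east-1
```

CloudFront metrics live in us-east-1 regardless of where the origin runs, hence the explicit region flag.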
SEC-0004 — Unauthorised Privilege Escalation
Symptom / Detection
- CloudTrail alert on `AttachRolePolicy`, `PutUserPolicy`, or `CreatePolicyVersion` from an unexpected principal
- IAM Access Analyzer finding on an overly permissive new policy
- User or role suddenly has `AdministratorAccess` or wildcard actions without approval
- Kubernetes: `kubectl auth can-i --list` reveals unexpected permissions for a service account
Immediate Actions
# Step 1: Detect the privilege escalation via CloudTrail
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=AttachRolePolicy \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
--query 'Events[*].{Time:EventTime,User:Username,Role:RequestParameters}' \
--output table
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=PutUserPolicy \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
# Step 2: Immediately detach the unauthorised policy
aws iam detach-role-policy \
--role-name <role-name> \
--policy-arn <policy-arn>
aws iam detach-user-policy \
--user-name <username> \
--policy-arn <policy-arn>
# Step 3: Delete any inline policies added without authorisation
aws iam list-user-policies --user-name <username>
aws iam delete-user-policy --user-name <username> --policy-name <policy-name>
# Step 4: Check for new IAM users or roles created during the window
aws iam list-users --query 'Users[?CreateDate>=`2024-01-01`]'
aws iam list-roles --query 'Roles[?CreateDate>=`2024-01-01`]'
Kubernetes RBAC Audit
# List all ClusterRoleBindings that grant cluster-admin
kubectl get clusterrolebindings -o json \
| jq '.items[] | select(.roleRef.name=="cluster-admin") | {name: .metadata.name, subjects: .subjects}'
# Remove an unauthorised ClusterRoleBinding
kubectl delete clusterrolebinding <binding-name>
# Audit all permissions for a specific service account
kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<sa-name>
# Check for new RBAC resources created recently
kubectl get clusterrolebindings,rolebindings -A \
--sort-by='.metadata.creationTimestamp' | tail -20
Prevention: Implement IAM permission boundaries on all users and roles. Use AWS Organizations SCPs to prevent privilege escalation at the org level. Enable CloudTrail Insights for anomaly detection on IAM events. Enforce approval workflows for any IAM policy changes via IaC (Terraform) pull requests. Perform quarterly IAM access reviews.
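The permission-boundary control mentioned above can be sketched like this (the boundary policy name is an assumption — it would be a pre-created managed policy capping the maximum allowed actions):

```shell
# Attach a permissions boundary that caps what the user can ever do,
# even if an attacker later attaches AdministratorAccess to the user
aws iam put-user-permissions-boundary \
  --user-name <username> \
  --permissions-boundary arn:aws:iam::<account-id>:policy/DeveloperBoundary

# Verify the boundary is in place
aws iam get-user --user-name <username> \
  --query 'User.PermissionsBoundary'
```

Effective permissions are the intersection of the identity policies and the boundary, so a newly attached admin policy grants nothing beyond what the boundary allows.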
SEC-0005 — Data Exfiltration Detected
Symptom / Detection
- GuardDuty finding: `Exfiltration:S3/AnomalousBehavior` or `UnauthorizedAccess:S3/TorIPCaller`
- Unusual outbound data transfer spike in VPC Flow Logs
- CloudTrail shows bulk `GetObject` or `ListObjects` API calls from an unexpected source
- DLP tool triggers on sensitive data patterns leaving the perimeter
GDPR / HIPAA / PCI-DSS: If personal data, health records, or cardholder data may be involved, notify your Data Protection Officer and Legal team immediately. Regulatory breach notification timelines begin at discovery, not at confirmation.
Immediate Actions
# Step 1: Isolate the workload — cut off egress immediately
# Update the Security Group of the affected instance to block all egress
aws ec2 revoke-security-group-egress \
--group-id <sg-id> \
--ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
# Kubernetes: apply a NetworkPolicy to block all egress from the namespace
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: block-all-egress-emergency
namespace: <namespace>
spec:
podSelector: {}
policyTypes:
- Egress
EOF
# Step 2: Revoke credentials associated with the exfiltrating process
# (Refer to SEC-0001 for credential revocation steps)
# Step 3: Preserve logs — DO NOT delete or modify any logs
# Enable S3 Object Lock on CloudTrail bucket if not already enabled
aws s3api put-object-lock-configuration \
--bucket <cloudtrail-bucket> \
--object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":365}}}'
# Step 4: Quantify the blast radius — identify what data was accessed
# Note: lookup-events returns management events only; S3 object-level calls such as
# GetObject are data events and are recorded only if CloudTrail data event logging
# is enabled — query those from the CloudTrail S3 log bucket (e.g., with Athena)
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventSource,AttributeValue=s3.amazonaws.com \
--start-time <incident-start-time> \
--query 'Events[?EventName==`GetObject` || EventName==`ListObjects`]' \
| jq '.[] | {time: .EventTime, bucket: .Resources[0].ResourceName, user: .Username, ip: .SourceIPAddress}'
Legal Notification Checklist
# Documentation and notification checklist for data breach
# [ ] Record time of discovery: YYYY-MM-DD HH:MM UTC
# [ ] Identify data categories involved:
# - Personal data (names, emails, phone numbers)
# - Sensitive personal data (health, financial, biometric)
# - Cardholder data (PAN, CVV, expiry)
# [ ] Estimate number of data subjects affected
# [ ] Identify geographic regions of affected subjects (determines applicable law)
# Regulatory notification timelines:
# GDPR (EU): 72 hours from discovery to Supervisory Authority
# HIPAA (US): 60 days to HHS OCR; 60 days to affected individuals
# PCI-DSS: Immediately to card brands (Visa, Mastercard)
# PDPA (Thailand): 72 hours to PDPC; 30 days to data subjects
# [ ] Engage external legal counsel
# [ ] Engage forensic incident response firm (if internal capacity insufficient)
# [ ] Prepare breach notification letter for affected individuals
# [ ] Update incident log with all actions taken (timestamped)
Prevention: Implement VPC endpoints for S3 and other AWS services to prevent data leaving the VPC via internet. Enable S3 Access Logging and Macie for sensitive data discovery. Use network egress filtering (AWS Network Firewall, GCP Cloud Armor). Enforce data classification tags and bucket policies that restrict access to classified data.
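The VPC endpoint control mentioned above can be sketched as follows (VPC, region, and route table IDs are placeholders):

```shell
# Create a gateway VPC endpoint for S3 so bucket traffic never transits the internet
aws ec2 create-vpc-endpoint \
  --vpc-id <vpc-id> \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.<region>.s3 \
  --route-table-ids <route-table-id>
```

To make this an exfiltration control rather than just a network optimisation, pair it with a bucket policy that denies access unless the request arrives via the endpoint (a condition on `aws:SourceVpce`).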
SEC-0006 — Ransomware / Crypto-Miner Detected
Symptom / Detection
- GuardDuty finding: `CryptoCurrency:EC2/BitcoinTool.B!DNS` or `Trojan:EC2/BlackholeTraffic`
- Sustained 100% CPU on an EC2 instance or K8s node with no corresponding application load
- Outbound connections to known mining pool addresses (pool.minexmr.com, etc.)
- Files with `.locked` or `.encrypted` extensions, or ransom notes in the filesystem
- Unusual processes: `xmrig`, `minerd`, or `kworker` with unusual flags
Do NOT attempt to pay any ransom without first consulting legal counsel and law enforcement. Paying does not guarantee data recovery and may violate OFAC sanctions regulations.
Immediate Actions
# Step 1: ISOLATE IMMEDIATELY — cut off network access
aws ec2 modify-instance-attribute \
--instance-id <instance-id> \
--groups <isolation-sg-id> # Empty security group with no rules
# For Kubernetes — cordon the node and drain workloads to unaffected nodes
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
# Step 2: Take a forensic snapshot BEFORE terminating the instance
aws ec2 create-snapshot \
--volume-id <volume-id> \
--description "FORENSIC - Ransomware/Miner - INC-NNNN - $(date -u +%Y%m%dT%H%M%SZ)"
# Step 3: Identify the blast radius — check other instances for similar IOCs
# Search VPC Flow Logs for rejected outbound connections on port 443
# (space-delimited filter syntax matching the default flow log record format)
aws logs filter-log-events \
--log-group-name /aws/vpc/flowlogs \
--filter-pattern '[version, account, eni, src, dst, srcport, dstport=443, protocol, packets, bytes, start, end, action=REJECT, status]' \
--start-time $(date -d '24 hours ago' +%s000)
# Step 4: Identify patient zero — how did the attacker get in?
# Check for recent SSH logins from unusual IPs
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=ConsoleLogin \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
# Check IAM for new credentials created
aws iam list-users --query 'Users[?CreateDate>=`2024-01-01`]'
Recovery — Restore from Backup
# Step 1: Verify backup integrity BEFORE restoring
# Do NOT restore to the same network segment as the infected instance
# AWS: Launch a new instance from a clean AMI and restore data from backup
aws ec2 run-instances \
--image-id <clean-ami-id> \
--instance-type <type> \
--subnet-id <clean-subnet-id> \
--security-group-ids <clean-sg-id> \
--key-name <keypair>
# Restore data from S3 backup (verify backup predates infection)
aws s3 sync s3://<backup-bucket>/<path> /data/restored/ \
--exclude "*.locked" --exclude "*.encrypted"
# Kubernetes: redeploy from clean container images
kubectl set image deployment/<deployment> \
<container>=<registry>/<image>:<known-clean-version> \
-n <namespace>
# Step 2: Rotate ALL credentials — assume full compromise
# (Follow SEC-0001 rotation steps for all service accounts and API keys)
Prevention: Enable GuardDuty in all accounts and regions. Use immutable infrastructure — replace instances rather than patching. Enforce regular, tested, air-gapped backups (3-2-1 rule). Implement egress filtering to block connections to known crypto-mining pools. Use AWS Inspector / GCP Security Command Center for vulnerability scanning.
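Enabling GuardDuty across every region can be scripted rather than clicked through — a sketch for a single account (in an AWS Organizations setup you would instead use delegated-administrator auto-enable):

```shell
# Enable a GuardDuty detector in every region of the current account
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  # create-detector fails if a detector already exists in the region, which is fine
  aws guardduty create-detector --enable --region "$region" \
    || echo "detector may already exist in $region"
done
```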
SEC-0007 — Certificate / TLS Vulnerability
Symptom / Detection
- Vulnerability scanner (Qualys, Nessus, testssl.sh) reports weak cipher suites (RC4, DES, 3DES, EXPORT)
- SSL Labs scan returns a grade lower than A (e.g., B, C, or F due to POODLE, BEAST, Heartbleed, ROBOT)
- TLS 1.0 or TLS 1.1 still enabled — deprecated protocols detected
- Certificate uses SHA-1 signing algorithm or weak key size (<2048-bit RSA / <256-bit EC)
Diagnose Cipher Suites and Protocol Support
# Test full SSL/TLS configuration against a live endpoint
# Install testssl.sh
git clone --depth 1 https://github.com/drwetter/testssl.sh.git
cd testssl.sh
# Full scan with severity grading
./testssl.sh --severity HIGH https://<domain>:443
# Quick check — just protocols
./testssl.sh --protocols <domain>:443
# Using openssl — test specific cipher suites manually
openssl s_client -cipher RC4-SHA -connect <domain>:443 2>&1 | grep "Cipher is"
# If output shows "Cipher is RC4-SHA" — RC4 is enabled (VULNERABILITY)
# Test for TLS 1.0 support
openssl s_client -tls1 -connect <domain>:443 2>&1 | grep -E "Protocol|Cipher"
# Test for TLS 1.1 support
openssl s_client -tls1_1 -connect <domain>:443 2>&1 | grep -E "Protocol|Cipher"
# Test for TLS 1.2 support (should be supported)
openssl s_client -tls1_2 -connect <domain>:443 2>&1 | grep -E "Protocol|Cipher"
# Test for TLS 1.3 support (should be supported)
openssl s_client -tls1_3 -connect <domain>:443 2>&1 | grep -E "Protocol|Cipher"
Disable Weak Ciphers — Nginx
# Recommended secure Nginx SSL configuration
# Edit /etc/nginx/sites-available/<site> or /etc/nginx/nginx.conf
server {
listen 443 ssl http2;
server_name <domain>;
ssl_certificate /etc/letsencrypt/live/<domain>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/<domain>/privkey.pem;
# Only TLS 1.2 and 1.3 (disable 1.0 and 1.1)
ssl_protocols TLSv1.2 TLSv1.3;
# Mozilla Intermediate compatibility cipher list (as of 2024)
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
# HSTS (only after verifying HTTPS works correctly)
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
# Disable weak headers
add_header X-Frame-Options SAMEORIGIN always;
add_header X-Content-Type-Options nosniff always;
}
# Test config and reload
nginx -t && systemctl reload nginx
Rotate Certificates with Weak Parameters
# Generate a new 4096-bit RSA private key and CSR
openssl req -newkey rsa:4096 -keyout new-private.key -out new-csr.csr \
-subj "/C=VN/ST=Hanoi/L=Hanoi/O=Company/CN=<domain>"
# Or generate an EC (P-384) key — preferred for modern deployments
openssl ecparam -name secp384r1 -genkey -noout -out ec-private.key
openssl req -new -key ec-private.key -out ec-csr.csr \
-subj "/C=VN/ST=Hanoi/L=Hanoi/O=Company/CN=<domain>"
# Check the new certificate parameters after issuance
openssl x509 -in new-certificate.crt -noout -text \
| grep -E "Public Key Algorithm|Public-Key:|Signature Algorithm:"
# Update Kubernetes TLS secret with the new certificate
kubectl create secret tls <tls-secret-name> \
--cert=new-certificate.crt \
--key=new-private.key \
-n <namespace> --dry-run=client -o yaml | kubectl apply -f -
Verification: Re-run testssl.sh and confirm no HIGH or CRITICAL severity findings. An SSL Labs scan should return an A or A+ grade. Verify that TLS 1.0 and 1.1 are rejected: `openssl s_client -tls1 -connect <domain>:443` should fail the handshake.
Prevention: Use the Mozilla SSL Configuration Generator for all web servers. Schedule quarterly SSL Labs scans via CI/CD. Enable automated certificate renewal with cert-manager. Pin cipher suite configuration in Terraform / Ansible so it cannot be changed without a code review.
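The quarterly scan can be gated in CI as well — a sketch, noting that testssl.sh's exit-code behaviour with --severity should be verified against the version you pin:

```shell
# Scan and keep machine-readable results as a build artifact
./testssl.sh --severity HIGH --jsonfile scan.json --quiet <domain>:443

# testssl.sh is intended to exit non-zero when findings at or above the given
# severity exist, failing the CI step; this count double-checks the JSON output
jq '[.[] | select(.severity=="HIGH" or .severity=="CRITICAL")] | length' scan.json
```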