Cloud Overview
What is Cloud Computing?
Cloud computing delivers on-demand IT resources over the internet with pay-as-you-go pricing. Instead of buying and maintaining physical data centers, you access technology services from a cloud provider. The primary service models differ in how much of the stack is managed for you.
| Model | Full Name | You Manage | Provider Manages | Examples |
|---|---|---|---|---|
| IaaS | Infrastructure as a Service | OS, runtime, data, apps | Servers, storage, networking, virtualization | AWS EC2, GCP Compute Engine, Azure VMs |
| PaaS | Platform as a Service | Data and apps | OS, runtime, middleware, servers, storage | AWS Elastic Beanstalk, GCP App Engine, Heroku |
| SaaS | Software as a Service | Configuration and data input | Everything | Gmail, Salesforce, Slack, Datadog |
| FaaS | Function as a Service | Code logic only | Infrastructure, scaling, OS, runtime | AWS Lambda, GCP Cloud Functions, Azure Functions |
Multi-Cloud vs Hybrid Cloud
Multi-Cloud Strategy
Multi-cloud uses services from two or more public cloud providers (e.g., AWS + GCP) simultaneously for different workloads or as redundancy.
Benefits:
- Avoid vendor lock-in
- Use best-of-breed services per cloud
- Geographic redundancy and compliance
- Leverage competitive pricing
- Resilience against cloud provider outages
Challenges:
- Operational complexity increases significantly
- Skill requirements multiply
- Data egress costs between clouds
- Inconsistent tooling and APIs
- Harder to optimize costs holistically
Hybrid Cloud Strategy
Hybrid cloud connects on-premises infrastructure with one or more public clouds, enabling data and applications to move between environments.
Use cases:
- Regulatory compliance (data residency)
- Legacy systems modernization
- Burst capacity for peak demand
- Disaster recovery to cloud
Anti-patterns:
- Treating cloud as "just another data center"
- Lift-and-shift without optimization
- Ignoring cloud-native services
- No unified identity management
AWS Well-Architected Framework — 6 Pillars
The AWS Well-Architected Framework provides architectural best practices across six pillars. Each pillar includes design principles, questions, and best practices to help evaluate and improve cloud architectures.
1. Operational Excellence
Focus on running and monitoring systems to deliver business value and continually improve processes. Key practices include performing operations as code (IaC), annotating documentation, making frequent small reversible changes, refining operations procedures frequently, and anticipating failure.
Key services: AWS CloudFormation, Systems Manager, CloudWatch, X-Ray, CodePipeline
2. Security
Protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies. Apply security at all layers, enable traceability, implement least-privilege access, secure data in transit and at rest, and automate security best practices.
Key services: IAM, KMS, CloudTrail, Config, GuardDuty, Security Hub, WAF, Shield
3. Reliability
Ensure a workload performs its intended function correctly and consistently when expected. Key practices include recovering from failures automatically, scaling horizontally to increase aggregate workload availability, not guessing capacity, and managing change through automation.
Key services: Route 53, ELB, Auto Scaling, CloudWatch, S3, Glacier
4. Performance Efficiency
Use IT and computing resources efficiently to meet system requirements, and maintain that efficiency as demand changes and technologies evolve. Key practices include democratizing advanced technologies, going global in minutes, using serverless architectures, and experimenting more often.
Key services: EC2 (right-sizing), ElastiCache, CloudFront, Lambda, EKS
5. Cost Optimization
Run systems to deliver business value at the lowest price point. Adopt a consumption model, measure overall efficiency, stop spending money on undifferentiated heavy lifting, and analyze and attribute expenditure.
Key services: Cost Explorer, Budgets, Trusted Advisor, Reserved Instances, Savings Plans, Spot Instances
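The trade-off behind commitment-based discounts (Reserved Instances, Savings Plans) can be sketched with simple arithmetic. The model below is a simplification, assuming a flat hourly rate and a commitment that is billed whether or not it is used; the rate, hours, and discount in the usage section are hypothetical values, not published prices.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of committed hours that must be used for a commitment to
    cost the same as paying on-demand for only the hours used."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    return 1.0 - discount


def period_cost(rate: float, hours_used: float,
                hours_committed: float = 0.0, discount: float = 0.0) -> float:
    """Cost for one billing period under the simplified model.

    Committed hours are paid at the discounted rate regardless of usage;
    usage beyond the commitment is paid at the on-demand rate.
    """
    committed = hours_committed * rate * (1.0 - discount)
    overflow = max(0.0, hours_used - hours_committed) * rate
    return committed + overflow


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    rate, hours_in_month, discount = 0.10, 730.0, 0.40
    for utilization in (0.4, 0.6, 0.9):
        used = hours_in_month * utilization
        on_demand = period_cost(rate, used)
        committed = period_cost(rate, used, hours_in_month, discount)
        print(f"utilization={utilization:.0%} "
              f"on-demand={on_demand:.2f} committed={committed:.2f}")
```

Under these assumptions a 40% discount only pays off when the committed capacity is in use more than 60% of the time, which is why commitments suit steady baseline load and Spot or on-demand capacity suits the variable remainder.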
6. Sustainability
Minimize the environmental impact of running cloud workloads. Understand your impact, establish sustainability goals, maximize utilization, anticipate and adopt new hardware/software, use managed services, and reduce the downstream impact of your cloud workloads.
Key services: Compute Optimizer, Graviton instances, Lambda, Fargate
GCP Architecture Framework — Pillar Mapping
Google Cloud's Architecture Framework covers much the same ground as the AWS model. The table below pairs each Google Cloud pillar with its closest AWS Well-Architected equivalent and lists representative GCP services.
| Pillar | AWS WA Equivalent | Key GCP Services |
|---|---|---|
| Operational Excellence | Operational Excellence | Cloud Deployment Manager, Cloud Build, Cloud Monitoring, Error Reporting |
| Security, Privacy, Compliance | Security | IAM, Cloud KMS, Security Command Center, Cloud Armor, VPC Service Controls |
| Reliability | Reliability | Cloud Load Balancing, Cloud DNS, Cloud Storage, Spanner (global), GKE Autopilot |
| Performance Optimization | Performance Efficiency | Cloud CDN, Memorystore, Bigtable, BigQuery, Cloud Trace, Profiler |
| Cost Optimization | Cost Optimization | Billing Reports, Recommender, Committed Use Discounts, Spot VMs, Budgets |
| Sustainability | Sustainability | Carbon Footprint dashboard, Tau T2A (Arm), serverless products, region selection |
Cloud Design Patterns
Active-Active Multi-AZ
Traffic is distributed across multiple availability zones simultaneously. All instances serve production traffic. Failure of one AZ does not cause downtime — load balancers automatically route to healthy AZs. Best for high-throughput, low-latency applications.
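The routing behaviour described above can be illustrated with a small simulation. This is a minimal sketch, not a load balancer implementation: the zone names and the round-robin policy are assumptions chosen for clarity.

```python
from itertools import cycle


def healthy_targets(zones: dict[str, bool]) -> list[str]:
    """Return the names of zones currently passing health checks."""
    return sorted(name for name, healthy in zones.items() if healthy)


def distribute(requests: int, zones: dict[str, bool]) -> dict[str, int]:
    """Spread requests round-robin across healthy zones only."""
    targets = healthy_targets(zones)
    if not targets:
        raise RuntimeError("no healthy availability zone available")
    counts = {name: 0 for name in targets}
    for _, zone in zip(range(requests), cycle(targets)):
        counts[zone] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical zone names.
    print(distribute(9, {"az-a": True, "az-b": True, "az-c": True}))
    print(distribute(9, {"az-a": True, "az-b": False, "az-c": True}))
```

Note that when one of three zones fails, each surviving zone absorbs roughly 50% more traffic, so every zone needs enough headroom to carry its share of a failed neighbour's load.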
Active-Passive Failover
The primary AZ or region handles all traffic while the secondary waits in warm or hot standby. On failure, DNS failover or a load balancer health check promotes the standby to active. This is simpler and cheaper than active-active, but failover is not instantaneous: with DNS-based failover, expect roughly 30–120 seconds of lag, driven mainly by the record TTL.
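The failover lag can be estimated with a rough model: time to detect the failure plus time for cached DNS answers to expire. The sketch below assumes detection takes `check_interval * failure_threshold` seconds and that resolvers honour the TTL; the `Endpoint` type and all parameter names are hypothetical, and standby promotion time is ignored.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool


def select_active(primary: Endpoint, secondary: Endpoint) -> Endpoint:
    """Prefer the primary; fall back to the standby only when the primary is down."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    raise RuntimeError("neither primary nor secondary is healthy")


def worst_case_failover_seconds(check_interval: int,
                                failure_threshold: int,
                                dns_ttl: int) -> int:
    """Upper bound under the simplified model: detection time plus DNS TTL."""
    return check_interval * failure_threshold + dns_ttl


if __name__ == "__main__":
    active = select_active(Endpoint("primary", False), Endpoint("standby", True))
    print(active.name)
    print(worst_case_failover_seconds(check_interval=10, failure_threshold=3, dns_ttl=60))
```

With a 10-second check interval, three consecutive failures required, and a 60-second TTL, the estimate is 90 seconds. Lowering the TTL shortens failover at the cost of more DNS queries.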
Multi-Region Deployment
Application and data are replicated across geographically separate cloud regions, and users are served from the nearest region to reduce latency. This pattern is critical for global compliance, data residency requirements, and very high availability targets (SLA above 99.99%).
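An availability target translates directly into a downtime budget, and redundancy across regions improves it multiplicatively. The sketch below assumes region failures are statistically independent, which real incidents (shared control planes, bad deployments) do not always respect, so treat the combined figure as an optimistic bound.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget implied by an availability target (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(single_region: float, regions: int) -> float:
    """Availability when the service is up if at least one region is up,
    assuming independent region failures."""
    return 1.0 - (1.0 - single_region) ** regions


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(0.9999):.2f} minutes/year at 99.99%")
    print(f"{combined_availability(0.999, 2):.6f} with two 99.9% regions")
```

A 99.99% target leaves about 52.6 minutes of downtime per year, and two independent regions at 99.9% each would combine to roughly 99.9999% under this model.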
Blue-Green Deployment
Two identical production environments run in parallel (blue = current, green = new). Deploy the new version to green, validate, then switch traffic via DNS or load balancer. Zero-downtime deployments with instant rollback by switching traffic back to blue.
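The cutover and rollback steps can be modelled as a pointer that flips between two environments. This is a minimal sketch: the `BlueGreenRouter` class and the `validate` callback are hypothetical stand-ins for a real DNS or load balancer update and a real smoke-test suite.

```python
from typing import Callable


class BlueGreenRouter:
    """Tracks which of two identical environments receives production traffic."""

    def __init__(self, live: str = "blue") -> None:
        if live not in ("blue", "green"):
            raise ValueError("live must be 'blue' or 'green'")
        self.live = live

    @property
    def idle(self) -> str:
        """The environment that is not currently serving traffic."""
        return "green" if self.live == "blue" else "blue"

    def cutover(self, validate: Callable[[str], bool]) -> bool:
        """Switch traffic to the idle environment only if validation passes."""
        candidate = self.idle
        if not validate(candidate):
            return False  # traffic stays on the current environment
        self.live = candidate
        return True

    def rollback(self) -> None:
        """Send traffic back to the previously live environment."""
        self.live = self.idle


if __name__ == "__main__":
    router = BlueGreenRouter()
    router.cutover(lambda env: True)  # pretend smoke tests passed on green
    print(router.live)
    router.rollback()
    print(router.live)
```

Rollback is only safe while the old environment is still running and both versions can work against the same data, so schema changes usually need to be backward compatible for the duration of the switch.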
Cloud Migration Strategies — The 6 Rs
When moving workloads to the cloud, choose the migration strategy based on business goals, time constraints, and technical complexity.
1. Rehost (Lift and Shift)
Move applications to the cloud without changes — same OS, same configuration. Fast and low-risk but misses cloud-native benefits. Use for quick migrations under deadline pressure or for legacy apps that can't be easily modified.
Example: Move an on-premises RHEL MySQL server to an AWS EC2 instance with the same configuration using VM Import/Export.
2. Replatform (Lift, Tinker, and Shift)
Make a few cloud optimizations without changing core architecture. Examples: migrate MySQL to Amazon RDS, move app server to Elastic Beanstalk, use managed container services instead of self-managed containers.
Example: Move a self-managed PostgreSQL server to Cloud SQL (GCP) with automated backups and HA.
3. Repurchase (Drop and Shop)
Replace the existing application with a different product, usually a SaaS solution. This is common for CRM, ERP, and email systems where the cloud SaaS equivalent is more cost-effective and feature-rich.
Example: Replace on-premises Exchange with Microsoft 365 or replace Jira Data Center with Jira Cloud.
4. Refactor / Re-architect
Redesign the application using cloud-native capabilities. Decompose monolith into microservices, adopt serverless, event-driven architecture, or containerization. Highest effort but greatest long-term benefit.
Example: Break a monolithic e-commerce app into services — order service (Lambda), product catalog (DynamoDB + API Gateway), authentication (Cognito).
5. Retire
Decommission applications that are no longer needed or are redundant after evaluating the portfolio. Reduces cost and complexity of managing unnecessary workloads.
Example: During assessment, identify the 20% of the application portfolio that is rarely used and decommission those applications instead of migrating them.
6. Retain
Keep applications on-premises or in their current state if migration doesn't make business sense yet. This usually applies to apps with upcoming major version updates, apps awaiting compliance approval, or apps that are too costly to migrate at this time.
Example: Retain a legacy COBOL mainframe application that will be replaced by a SaaS vendor in 18 months.
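The six strategies above can be read as a rough decision sequence. The sketch below encodes one possible ordering as a first-match rule list; the `Workload` fields and the order of the checks are illustrative assumptions, not an official AWS or GCP decision tree, and a real assessment would weigh cost, risk, and deadlines together.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical assessment attributes for one application."""
    name: str
    still_needed: bool = True
    migration_blocked: bool = False            # e.g. pending approval or planned replacement
    saas_replacement_available: bool = False
    needs_cloud_native_redesign: bool = False
    managed_service_swap_possible: bool = False


def choose_strategy(w: Workload) -> str:
    """Return the first strategy whose condition matches."""
    if not w.still_needed:
        return "Retire"
    if w.migration_blocked:
        return "Retain"
    if w.saas_replacement_available:
        return "Repurchase"
    if w.needs_cloud_native_redesign:
        return "Refactor"
    if w.managed_service_swap_possible:
        return "Replatform"
    return "Rehost"


if __name__ == "__main__":
    portfolio = [
        Workload("mainframe-billing", migration_blocked=True),
        Workload("exchange-mail", saas_replacement_available=True),
        Workload("postgres-reporting", managed_service_swap_possible=True),
        Workload("rhel-mysql"),
    ]
    for workload in portfolio:
        print(workload.name, "->", choose_strategy(workload))
```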
Landing Zone Concept
A Landing Zone is a pre-configured, secure, multi-account cloud environment based on best practices. It provides the foundation upon which all cloud workloads are deployed.
Account / Project Structure
# AWS Organization Structure
Root (Management Account)
├── Security OU
│   ├── Audit Account            # CloudTrail, Config aggregation
│   └── Log Archive Account      # Centralized logging (S3)
├── Infrastructure OU
│   ├── Network Account          # Transit Gateway, Direct Connect
│   └── Shared Services          # Active Directory, DNS, AMI catalog
├── Workloads OU
│   ├── Production OU
│   │   ├── Prod-AppA Account
│   │   └── Prod-AppB Account
│   ├── Staging OU
│   │   └── Staging-AppA Account
│   └── Dev OU
│       └── Dev-Sandbox Account
└── Sandbox OU
    └── Individual developer sandboxes
Network Topology
# Hub-and-Spoke with Transit Gateway (AWS)
Network Account (Hub)
├── Transit Gateway
├── Shared VPC (10.0.0.0/16)
│   ├── VPN / Direct Connect endpoints
│   └── Centralized Egress (NAT Gateway)
└── DNS VPC (Route 53 Resolver)

Spoke VPCs (per workload account)
├── Prod VPC:    10.10.0.0/16 → TGW attachment
├── Staging VPC: 10.20.0.0/16 → TGW attachment
└── Dev VPC:     10.30.0.0/16 → TGW attachment

# GCP Shared VPC equivalent
Host Project (VPC host)
└── Shared VPC → subnets delegated to service projects
Service Project A (prod workloads) → uses shared subnets
Service Project B (dev workloads) → uses shared subnets
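A hub-and-spoke design only routes cleanly when the VPC address ranges do not overlap. The check below is a minimal sketch using Python's standard `ipaddress` module against the CIDR blocks listed above; the dictionary keys are hypothetical labels.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR blocks whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    vpcs = {
        "shared": "10.0.0.0/16",
        "prod": "10.10.0.0/16",
        "staging": "10.20.0.0/16",
        "dev": "10.30.0.0/16",
    }
    print(find_overlaps(vpcs))  # no overlaps expected for the plan above
```

Running a check like this in CI before a new spoke VPC is created is cheaper than discovering a conflicting route after the Transit Gateway attachment is in place.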
IAM Hierarchy
# AWS IAM hierarchy via Organizations
- SCPs (Service Control Policies) at OU level: deny guardrails
- Permissions boundaries: cap the maximum permissions an IAM user or role can be granted
- Roles: task-specific roles (Developer, ReadOnly, Admin)
- IAM Identity Center (SSO): federated access across accounts
# GCP IAM hierarchy
Organization → Folders → Projects → Resources
- Org Policies: constraints applied top-down
- IAM bindings at each level (inherited downward)
- Service Accounts: workload identity at resource level
- Workload Identity Federation: keyless auth for CI/CD
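The two hierarchies combine permissions differently: on AWS, guardrails narrow what an identity policy grants, while on GCP, allow bindings accumulate as they are inherited downward. The sketch below models this with plain sets. It is deliberately simplified and ignores conditions, resource-based policies, session policies, and GCP deny policies; the function names are hypothetical.

```python
def aws_effective_actions(scp_allowed: set[str],
                          permissions_boundary: set[str],
                          identity_policy: set[str],
                          explicit_denies: frozenset[str] = frozenset()) -> set[str]:
    """An action is allowed only if every layer allows it and nothing denies it."""
    return (scp_allowed & permissions_boundary & identity_policy) - explicit_denies


def gcp_effective_roles(bindings_by_level: list[set[str]]) -> set[str]:
    """Roles granted at organization, folder, and project level add up."""
    return set().union(*bindings_by_level)


if __name__ == "__main__":
    print(aws_effective_actions(
        scp_allowed={"s3:GetObject", "s3:PutObject", "ec2:RunInstances"},
        permissions_boundary={"s3:GetObject", "s3:PutObject"},
        identity_policy={"s3:GetObject", "ec2:RunInstances", "iam:CreateUser"},
    ))
    print(gcp_effective_roles([
        {"roles/viewer"},         # bound at the organization
        set(),                    # nothing bound at the folder
        {"roles/storage.admin"},  # bound at the project
    ]))
```

In the AWS example only `s3:GetObject` survives all three layers, which shows why an SCP or permissions boundary never grants access by itself: it can only limit what the identity policy already allows.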
FinOps Integration — Cost from Day 1
Tagging Strategy
# Required tags for every resource (enforce via SCPs / Org Policies)
Environment: prod | staging | dev | sandbox
Owner: team-platform | team-backend | team-data
Project: project-name or cost-center-code
Application: app-name
ManagedBy: terraform | manual | cloudformation
CostCenter: CC-12345
# AWS Tag Policy example (JSON)
{
  "tags": {
    "Environment": {
      "tag_key": { "@@assign": "Environment" },
      "tag_value": {
        "@@assign": ["prod", "staging", "dev", "sandbox"]
      },
      "enforced_for": {
        "@@assign": ["ec2:instance", "rds:db", "s3:bucket"]
      }
    }
  }
}
# Terraform tagging (locals block)
locals {
  common_tags = {
    Environment = var.environment
    Owner       = var.team
    Project     = var.project
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }
}
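The required-tag list above can also be checked before deployment, for example in a CI step that inspects planned resources. The validator below is a minimal sketch: the tag keys and allowed values are taken from the list above, while the function name and the idea of running it in CI are assumptions.

```python
REQUIRED_TAGS = ("Environment", "Owner", "Project", "Application", "ManagedBy", "CostCenter")

# Only tags with a closed set of values in the list above are constrained here.
ALLOWED_VALUES = {
    "Environment": {"prod", "staging", "dev", "sandbox"},
    "ManagedBy": {"terraform", "manual", "cloudformation"},
}


def validate_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    for key in REQUIRED_TAGS:
        if not tags.get(key):
            problems.append(f"missing tag: {key}")
    for key, allowed in ALLOWED_VALUES.items():
        value = tags.get(key)
        if value and value not in allowed:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


if __name__ == "__main__":
    resource_tags = {
        "Environment": "production",  # not in the allowed set
        "Owner": "team-platform",
        "Project": "checkout",
        "ManagedBy": "terraform",
        "CostCenter": "CC-12345",
    }
    for problem in validate_tags(resource_tags):
        print(problem)
```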
Key Tooling Ecosystem
| Tool | Category | Purpose |
|---|---|---|
| Terraform | IaC | Provision and manage cloud infrastructure declaratively across AWS, GCP, Azure |
| Ansible | Configuration Management | Configure OS, install packages, manage files on cloud VMs and on-premises servers |
| Helm | Package Manager | Deploy and manage Kubernetes applications using versioned charts |
| ArgoCD | GitOps / CD | Continuous delivery for Kubernetes — sync cluster state with Git repository |
| Prometheus | Monitoring | Metrics collection and alerting for cloud-native and Kubernetes workloads |
| CloudWatch | Monitoring (AWS) | AWS-native metrics, logs, alarms, dashboards, and tracing (X-Ray integration) |
| Cloud Monitoring | Monitoring (GCP) | GCP-native metrics collection, alerting policies, and uptime checks |
| Grafana | Visualization | Unified dashboards for Prometheus, CloudWatch, and Cloud Monitoring data sources |