AWS DevOps Engineer Pro
Master the DOP-C02 exam — CI/CD automation, Infrastructure as Code, resilient architectures, monitoring, incident response, and security compliance. Built for engineers operating AWS at professional scale.
SDLC Automation
The largest domain on DOP-C02. Covers CI/CD pipeline design with AWS CodePipeline, CodeBuild, CodeCommit, and CodeDeploy — plus artifact management, testing strategies, and deployment patterns like blue/green and canary.
Software Development Lifecycle Automation
CI/CD pipelines · Deployment strategies · Artifact management · Testing
🚀 CodePipeline
- Orchestrates end-to-end CI/CD workflow
- Source → Build → Test → Deploy stages
- Manual approval action between stages
- Integrates with CodeCommit, GitHub, S3, ECR
- Cross-account & cross-region pipeline actions
- EventBridge triggers on state changes
🔨 CodeBuild
- Fully managed build service (no servers)
- buildspec.yml defines build phases
- Phases: install → pre_build → build → post_build
- Artifacts uploaded to S3 after build
- Environment variables & SSM Parameter Store
- VPC config for private resource access
🚢 CodeDeploy
- In-place: stop app, deploy, restart (EC2/On-prem)
- Blue/Green: new fleet, traffic shift, old decommission
- appspec.yml defines lifecycle hooks
- EC2, Lambda, ECS deployment types
- Rollback on alarm or manual trigger
- Deployment groups control target scope
📦 CodeArtifact
- Managed artifact repository (npm, Maven, pip, NuGet)
- Upstream repositories: pull from npmjs, Maven Central
- Cross-account sharing via resource policies
- Integrates with CodeBuild buildspec.yml
- Domain contains multiple repositories
🔀 Deployment Strategies
- All-at-once: fast but max downtime risk
- Rolling: partial fleet at a time
- Rolling with additional batch: adds new before removing
- Immutable: new ASG, then swap; safest for EB
- Blue/Green: full new env, DNS switch, instant rollback
- Canary: shift 10% traffic, then 100% after bake
🧪 Testing in Pipelines
- Unit tests in CodeBuild post_build phase
- Integration tests via Lambda test actions
- Load testing with AWS Distributed Load Testing
- Selenium UI tests via CodeBuild + EC2
- Test reports API: JUnit XML, Cucumber JSON, and other supported report formats
- Gate on test pass rate before promote
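The pass-rate gate in the last bullet can be sketched as a small script that a build step might run against a JUnit XML report. This is a minimal illustration, not CodeBuild's own logic: the function names, the 0.95 default threshold, and the sample report are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def pass_rate(junit_xml: str) -> float:
    """Return the fraction of executed tests that passed in a JUnit XML report."""
    root = ET.fromstring(junit_xml)
    # A report may use a <testsuites> wrapper or a single <testsuite> root.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = failed = skipped = 0
    for suite in suites:
        total += int(suite.get("tests", 0))
        failed += int(suite.get("failures", 0)) + int(suite.get("errors", 0))
        skipped += int(suite.get("skipped", 0))
    executed = total - skipped
    if executed <= 0:
        return 0.0  # nothing ran, so treat the gate as failed
    return (executed - failed) / executed


def should_promote(junit_xml: str, threshold: float = 0.95) -> bool:
    """Hypothetical gate: promote only if the pass rate meets the threshold."""
    return pass_rate(junit_xml) >= threshold


# Hypothetical report with two suites and two failing tests out of 40.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="unit" tests="20" failures="1" errors="0" skipped="0"/>
  <testsuite name="api" tests="20" failures="0" errors="1" skipped="0"/>
</testsuites>"""
```

A build step could exit non-zero when `should_promote` returns False, which fails the stage and stops the pipeline before the deploy stage.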
Lambda & ECS Deployment Patterns
| Strategy | Lambda Traffic Shift | ECS/Fargate | Rollback Trigger |
|---|---|---|---|
| Canary10Percent5Minutes | 10% → wait 5 min → 100% | Canary shift via ALB weighted target groups | CloudWatch alarm |
| Linear10PercentEvery1Minute | Add 10% each minute | Incremental weighted rule update | CloudWatch alarm |
| AllAtOnce | Instant shift | Full task replacement | Deployment failure |
| Blue/Green | N/A — Lambda aliases | New task set, swap ALB listener | Manual or alarm |
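The canary and linear rows of the table can be read as schedules of (minute, percent of traffic on the new version). The sketch below only does the arithmetic; it assumes the first increment is applied at minute 0, and the function names are hypothetical rather than part of CodeDeploy.

```python
def canary_schedule(first_percent: int, wait_minutes: int):
    """Canary: shift first_percent, wait, then shift the remainder in one step."""
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent: int, interval_minutes: int):
    """Linear: add step_percent every interval until 100% is reached."""
    schedule = []
    minute, percent = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


# Canary10Percent5Minutes   -> canary_schedule(10, 5)
# Linear10PercentEvery1Minute -> linear_schedule(10, 1)
```

Under these assumptions the linear configuration reaches full traffic after ten steps, while the canary reaches it after a single bake period, which is why the alarm window matters more for canary deployments.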
Configuration Management & IaC
CloudFormation, AWS Systems Manager, AWS Config, Elastic Beanstalk, and OpsWorks. Emphasis on automating infrastructure lifecycle, drift detection, and configuration enforcement at scale.
Infrastructure as Code & Configuration
CloudFormation · SSM · AWS Config · Elastic Beanstalk · OpsWorks
📋 CloudFormation Advanced
- StackSets: deploy stacks to multiple accounts/regions
- Nested Stacks: reuse templates via Stack resource
- Drift Detection: identify manual changes to stack resources
- Change Sets: preview changes before applying
- cfn-init & cfn-signal: bootstrap EC2 instances
- DeletionPolicy: Retain / Snapshot / Delete
🖥️ Systems Manager (SSM)
- Run Command: execute scripts on managed instances
- Session Manager: shell access without SSH/bastion
- Patch Manager: automate OS patching with baselines
- Parameter Store: config data; String / StringList / SecureString
- Automation: runbooks for complex multi-step ops
- Inventory: collect metadata from managed instances
⚖️ AWS Config
- Records resource configuration history
- Managed rules: required-tags, restricted-ssh, etc.
- Custom rules: Lambda-backed for complex logic
- Conformance Packs: group of rules + remediation actions
- Auto-remediation: SSM Automation document per rule
- Multi-account: Config Aggregator across org
🌱 Elastic Beanstalk
- PaaS: manages EC2, ASG, ELB, RDS automatically
- .ebextensions: customize EB environment via YAML
- Deployment modes: all-at-once, rolling, rolling with additional batch, immutable, traffic-splitting
- Worker tier: SQS + daemon for background processing
- Custom AMI & platform via Packer/EC2 Image Builder
🍽️ OpsWorks
- Managed Chef / Puppet configuration management
- OpsWorks Stacks: layers for each app tier
- Recipes triggered on lifecycle events
- Use when already invested in Chef/Puppet tooling
- Exam: less focus — know when to use vs SSM/CFn
🏗️ EC2 Image Builder
- Automates AMI/container image creation and patching
- Pipeline: base image → build → test → distribute
- Components: install/configure scripts per phase
- Integrates with SSM Patch Manager for OS updates
- Distributes images to multiple regions/accounts
SSM Parameter Store vs Secrets Manager
| Feature | SSM Parameter Store | Secrets Manager |
|---|---|---|
| Cost | Free (standard tier) / $0.05/param/month (advanced) | $0.40/secret/month + $0.05/10k API calls |
| Automatic Rotation | No (manual Lambda) | Yes — built-in for RDS, Redshift, DocumentDB |
| Cross-account | Advanced-tier parameters via AWS RAM sharing, or a custom cross-account role | Native resource-based policy |
| Best For | Config values, non-sensitive strings, app config | DB credentials, API keys, secrets needing rotation |
| KMS Integration | SecureString type | Always encrypted with KMS |
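The cost row can be turned into a quick comparison. This sketch uses only the prices quoted in the table, treats the advanced-parameter charge as monthly, and ignores any Parameter Store API charges; the function names are hypothetical, and current pricing should be checked before relying on the numbers.

```python
from decimal import Decimal

# Prices taken from the table above (assumed to be per month where applicable).
ADVANCED_PARAM_PER_MONTH = Decimal("0.05")
SECRET_PER_MONTH = Decimal("0.40")
SECRET_API_PER_10K_CALLS = Decimal("0.05")


def parameter_store_monthly(advanced_params: int) -> Decimal:
    """Monthly storage cost for advanced-tier parameters (standard tier is free)."""
    return advanced_params * ADVANCED_PARAM_PER_MONTH


def secrets_manager_monthly(secrets: int, api_calls: int) -> Decimal:
    """Monthly cost for stored secrets plus API calls."""
    storage = secrets * SECRET_PER_MONTH
    api = (Decimal(api_calls) / 10_000) * SECRET_API_PER_10K_CALLS
    return storage + api
```

For example, `secrets_manager_monthly(10, 100_000)` combines 10 secrets of storage with 100,000 API calls, while `parameter_store_monthly(10)` covers ten advanced parameters.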
Resilient Cloud Solutions
Designing and operating highly available, fault-tolerant, and self-healing architectures. Covers Auto Scaling strategies, multi-AZ and multi-region patterns, decoupling with SQS/SNS, and disaster recovery.
High Availability & Fault Tolerance
Auto Scaling · Multi-AZ · Multi-region DR · Decoupling patterns
📈 Auto Scaling Groups
- Target Tracking: maintain metric at target (simplest)
- Step Scaling: tiered responses to alarm thresholds
- Scheduled: predictable load patterns
- Predictive Scaling: ML-based capacity planning
- Lifecycle Hooks: pause before/after scale events
- Warm Pools: pre-initialized instances for fast scale-out
⚖️ Load Balancing HA
- ALB: multi-AZ; health checks per target group
- Connection draining: graceful deregistration
- Cross-zone load balancing: distribute across all AZ instances
- NLB: ultra-low latency, static IPs, TCP/UDP
- GWLB (Gateway Load Balancer): third-party appliance traffic inspection
- Route 53 health checks + failover routing
📨 Decoupling with SQS
- Standard: at-least-once, best-effort order
- FIFO: exactly-once processing, strict order, up to 3,000 messages/s with batching (300 API calls/s)
- DLQ: captures failed messages after maxReceiveCount
- Visibility timeout: hides message during processing
- Long polling: reduces API calls (WaitTimeSeconds 1-20)
- Message retention: up to 14 days
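How maxReceiveCount and the DLQ interact can be sketched with an in-memory simulation. This is a toy model, not the SQS API: the queue is a local deque, a failed handler call stands in for a visibility timeout expiring, and the function name is hypothetical.

```python
from collections import deque


def process_with_dlq(messages, handler, max_receive_count):
    """Simulate a consumer loop where repeatedly failing messages move to a DLQ."""
    queue = deque((body, 0) for body in messages)  # (body, receive count so far)
    processed, dead_letters = [], []
    while queue:
        body, receive_count = queue.popleft()
        if receive_count >= max_receive_count:
            # One more receive would exceed maxReceiveCount: move to the DLQ.
            dead_letters.append(body)
            continue
        receive_count += 1
        try:
            handler(body)
            processed.append(body)  # success: the consumer deletes the message
        except Exception:
            # Failure: the message is not deleted and becomes visible again
            # once the visibility timeout expires.
            queue.append((body, receive_count))
    return processed, dead_letters
```

With `max_receive_count=3`, a message whose handler always raises is attempted three times and then lands in the dead-letter list, while healthy messages are processed once.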
🌐 Multi-Region DR
- Backup & Restore: hours RTO/RPO — cheapest
- Pilot Light: core always-on; scale on disaster
- Warm Standby: scaled-down replica running
- Multi-Site Active-Active: full capacity both regions
- Route 53 failover: active-passive with health checks
- Aurora Global DB: <1s RPO, ~1min RTO
🔄 Self-Healing Patterns
- EC2 Auto Recovery: recover on hardware failure
- ASG health check replacement: instance health monitoring
- EventBridge + Lambda: automated remediation workflows
- Step Functions: orchestrate complex recovery logic
- SSM Automation runbooks: pre-built remediation scripts
🏷️ Spot & Mixed Fleets
- Spot Instances: 2-min interruption notice
- Mixed Instance Policy: On-Demand base + Spot savings
- Capacity Rebalancing: proactive instance replacement
- Spot Fleet: multiple pools for interruption resilience
- Interruption handling: checkpointing & stateless design
Lifecycle hooks put an instance into Pending:Wait (scale-out) or Terminating:Wait (scale-in), pausing the ASG action. Use them to run bootstrap scripts, drain connections, or send notifications before proceeding. The action resumes when CompleteLifecycleAction is called or the heartbeat timeout expires.
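The outcome of a lifecycle hook wait can be sketched as a small decision function. It assumes the hook's default result is ABANDON when nothing else is configured; the function name is hypothetical, and the sketch models only the final state, not the intermediate Proceed states.

```python
def resolve_hook(wait_state: str, result: str = None, default_result: str = "ABANDON") -> str:
    """Return where an instance ends up after a lifecycle hook wait.

    result is the value passed to CompleteLifecycleAction; None means the
    heartbeat timeout elapsed, so the hook's default result applies.
    """
    outcome = result if result is not None else default_result
    if outcome not in ("CONTINUE", "ABANDON"):
        raise ValueError(f"unknown lifecycle action result: {outcome}")
    if wait_state == "Pending:Wait":
        # On launch, CONTINUE puts the instance in service; ABANDON terminates it.
        return "InService" if outcome == "CONTINUE" else "Terminated"
    if wait_state == "Terminating:Wait":
        # On scale-in, the instance is terminated with either result.
        return "Terminated"
    raise ValueError(f"not a lifecycle wait state: {wait_state}")
```

The asymmetry is the point to remember: a bootstrap script that never signals completion causes the launch to be abandoned under this default, whereas a scale-in hook only delays termination.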
Monitoring & Logging
Comprehensive observability using CloudWatch (metrics, logs, alarms, dashboards), CloudTrail, AWS X-Ray, and third-party integrations. Emphasis on automated alerting and log-driven insights.
Observability & Operational Intelligence
CloudWatch · CloudTrail · X-Ray · OpenSearch · Kinesis Data Streams
📊 CloudWatch Metrics
- Custom metrics: PutMetricData API (1-second resolution)
- High-resolution alarms: 10 or 30-second periods
- Metric Math: calculate derived metrics
- CloudWatch Agent: collect memory, disk, process metrics
- Embedded Metric Format (EMF): structured log events that CloudWatch extracts as metrics
- Cross-account cross-region dashboards
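An EMF log line is a JSON object with an `_aws` metadata block that tells CloudWatch which fields to extract as metrics. The sketch below builds one such line with the standard library; the namespace, dimension, and metric names in the usage note are hypothetical, and the helper covers only a single metric with one dimension set.

```python
import json
import time


def emf_record(namespace, dimensions, metric_name, value, unit="None", timestamp_ms=None):
    """Build one Embedded Metric Format log line as a JSON string.

    dimensions is a dict of dimension name -> value.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set made of all provided dimension names.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        metric_name: value,
    }
    # Dimension values live at the top level, next to the metric value.
    record.update(dimensions)
    return json.dumps(record)
```

Printing `emf_record("MyApp", {"Service": "checkout"}, "Latency", 123, "Milliseconds")` from a Lambda function would write the line to CloudWatch Logs, where it can be extracted as a metric without a PutMetricData call.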
📜 CloudWatch Logs
- Log Groups → Log Streams → Log Events
- Metric Filters: extract values from log patterns
- CloudWatch Logs Insights: interactive SQL-like queries
- Subscription Filters: stream logs to Lambda/Kinesis/OpenSearch
- Log retention: 1 day to indefinite (default: never expire)
- Cross-account log shipping via subscription
🔔 CloudWatch Alarms
- States: OK / ALARM / INSUFFICIENT_DATA
- Actions: SNS, Auto Scaling, EC2 actions
- Composite Alarms: AND/OR logic on multiple alarms
- Anomaly Detection: ML baseline for dynamic thresholds
- Treat missing data: breaching / notBreaching / ignore / missing
🕵️ CloudTrail
- Records API calls (who, what, when, from where)
- Management events: control plane (default on)
- Data events: S3 object ops, Lambda invocations (opt-in)
- Insights: detect unusual API activity patterns
- Org trail: single trail covering all member accounts
- Log file integrity validation: SHA-256 hash chain
🔍 AWS X-Ray
- Distributed tracing across microservices
- Trace: end-to-end request path (segments + subsegments)
- Service Map: visual dependency graph
- Sampling rules: reduce cost while capturing anomalies
- X-Ray Daemon: UDP port 2000, batches & sends traces
- Annotations: indexed for filtering; metadata: not indexed
📡 Centralized Logging
- Kinesis Data Streams: real-time log ingestion
- Kinesis Data Firehose: S3/OpenSearch/Redshift delivery
- OpenSearch Service: full-text search + OpenSearch Dashboards (formerly Kibana)
- CloudWatch → Firehose → S3 → Athena: log analytics
- Log archival: S3 + Glacier for compliance retention
Incident & Event Response
Automated remediation using EventBridge rules, SSM Automation runbooks, AWS Health, and Config auto-remediation. Building event-driven, self-healing operations workflows.
Automated Remediation & Event-Driven Ops
EventBridge · SSM Automation · AWS Health · Config Remediation
🎯 EventBridge Rules
- Event pattern rules: match specific service events
- Schedule rules: cron/rate expressions
- Targets: Lambda, SQS, SNS, Step Functions, CodePipeline
- Cross-account event routing via event buses
- Archive & replay: reprocess past events
- Schema Registry: auto-discovers event schemas
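Event pattern rules match an event when every field named in the pattern holds one of the listed values. The matcher below is a deliberately simplified sketch of that idea: it handles only exact-value lists and nested objects, not prefix, numeric, or anything-but operators, and it is not EventBridge's implementation.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if the event satisfies a simplified EventBridge-style pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of acceptable values.
            return False
    return True


# Pattern for failed pipeline executions (the field values are from CodePipeline events).
FAILED_PIPELINE = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}
```

Fields that the pattern does not mention are ignored, so a narrow pattern like `FAILED_PIPELINE` still matches events that carry extra detail such as the pipeline name.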
🤖 SSM Automation
- Runbooks (documents) define automation steps
- AWS-managed: AWS-RestartEC2Instance, AWS-CreateSnapshot
- Custom runbooks: Bash/Python scripts in steps
- Triggered by: Config, EventBridge, Schedule, manually
- Rate control: concurrency + error thresholds
- Cross-account automation with assumed roles
🏥 AWS Health
- Personal Health Dashboard: account-specific events
- Service Health Dashboard: global service status
- Health events in EventBridge: automate response
- Org-level health: view all member account events
- Advance notice for scheduled maintenance (EC2, RDS)
🔁 Config Auto-Remediation
- Remediation action linked to SSM Automation document
- Auto-remediation: triggers on NON_COMPLIANT evaluation
- Retry on failure with attempt count
- Parameter mapping from Config rule to SSM params
- Example: restricted-ssh → AWS-DisablePublicAccessForSecurityGroup
📱 Incident Response Patterns
- CloudWatch Alarm → SNS → Lambda → remediation
- EventBridge → Step Functions → multi-step workflow
- GuardDuty finding → EventBridge → quarantine Lambda
- Health event → EventBridge → migrate workloads
- OpsCenter: operational work items (OpsItems) tracking
🔧 AWS Systems Manager OpsCenter
- Centralizes OpsItems (operational issues)
- EventBridge rules auto-create OpsItems from events
- Associate runbooks with OpsItems for remediation
- Integration with ITSM tools (Jira, ServiceNow)
- Aggregates findings from Security Hub, Config, CloudTrail
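Example: when the restricted-ssh rule marks a security group NON_COMPLIANT, Config auto-remediation runs the SSM Automation document AWS-DisablePublicAccessForSecurityGroup. Config passes the noncompliant security group's ID as a parameter. This is the standard detect-and-remediate pattern.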
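The parameter mapping can be sketched as the request body one might pass to Config's PutRemediationConfigurations API. The helper below only builds the dictionary and makes no AWS call. The role ARN, retry values, and function name are assumptions, and the runbook parameter name `GroupId` should be checked against the runbook's documentation before use.

```python
def build_remediation_config(rule_name: str, automation_role_arn: str) -> dict:
    """Build one remediation configuration linking a Config rule to a runbook."""
    return {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": "AWS-DisablePublicAccessForSecurityGroup",
        "Parameters": {
            # Static value: the role the automation assumes (hypothetical ARN).
            "AutomationAssumeRole": {
                "StaticValue": {"Values": [automation_role_arn]}
            },
            # Resource value: Config substitutes the noncompliant resource's ID.
            "GroupId": {"ResourceValue": {"Value": "RESOURCE_ID"}},
        },
        "Automatic": True,              # run on NON_COMPLIANT without manual action
        "MaximumAutomaticAttempts": 5,  # assumed retry settings for illustration
        "RetryAttemptSeconds": 60,
    }


config = build_remediation_config(
    "restricted-ssh",
    "arn:aws:iam::111122223333:role/ConfigRemediationRole",  # hypothetical role
)
```

The `RESOURCE_ID` placeholder is what carries the security group ID from the rule evaluation into the runbook, so no Lambda glue is needed between detection and remediation.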
Security & Compliance Automation
Automating security controls, secret management, compliance posture, and threat detection. Covers IAM policies, KMS, Secrets Manager, GuardDuty, Security Hub, Macie, and AWS Config conformance packs.
Security Automation & Compliance
IAM · KMS · Secrets Manager · GuardDuty · Security Hub · Macie · Config
🔐 IAM Advanced Patterns
- Permission Boundaries: cap max permissions for roles
- SCPs: max permissions for all accounts in OU
- Attribute-Based Access Control (ABAC): tag conditions
- PassRole: required to assign roles to services
- Service-linked roles: pre-defined per-service roles
- Condition keys: aws:RequestedRegion, aws:PrincipalTag
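Permission boundaries and SCPs cap permissions rather than grant them, so an action is effectively allowed only where all layers overlap. The sketch below models that intersection with plain sets of action names; it ignores wildcards, conditions, and resource-based policies, and the function name is hypothetical.

```python
def is_allowed(action, identity_allow, boundary_allow, scp_allow, explicit_deny=frozenset()):
    """Simplified evaluation: explicit deny wins, otherwise every layer must allow."""
    if action in explicit_deny:
        return False
    return (
        action in identity_allow      # granted by the identity-based policy
        and action in boundary_allow  # within the permission boundary
        and action in scp_allow       # within the organization's SCPs
    )


identity = {"s3:GetObject", "ec2:RunInstances", "iam:CreateUser"}
boundary = {"s3:GetObject", "ec2:RunInstances"}
scp = {"s3:GetObject", "iam:CreateUser"}
```

With these sample sets only `s3:GetObject` is allowed: `ec2:RunInstances` is blocked by the SCP and `iam:CreateUser` by the boundary, even though the identity policy grants both.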
🗝️ KMS & Encryption
- CMK (Customer Managed Key): full control, auditable
- AWS Managed Key: per-service, automatic rotation every year
- Envelope encryption: data key encrypted with CMK
- Key policies: resource-based; grants: temporary access
- KMS multi-region keys: replicate for cross-region decrypt
- CloudHSM: FIPS 140-2 Level 3; single-tenant HSM
🔑 Secrets Manager
- Automatic rotation: Lambda rotates on schedule
- Built-in rotators: RDS, Aurora, Redshift, DocumentDB
- Cross-account access via resource-based policy
- Versioning: AWSCURRENT / AWSPREVIOUS / AWSPENDING
- Always KMS-encrypted; no plaintext storage
- SDK: GetSecretValue; use the client-side caching libraries for TTL-based refresh
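A rotation Lambda is invoked once per rotation step, with the step name carried in the event. The skeleton below sketches that dispatch using the four step names createSecret, setSecret, testSecret, and finishSecret. The per-step functions are left to the caller, and `make_rotation_handler` is a hypothetical helper rather than part of any AWS library.

```python
ROTATION_STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")


def make_rotation_handler(step_functions: dict):
    """Return a Lambda-style handler that dispatches on the rotation step.

    step_functions maps each step name to a callable taking
    (secret_id, client_request_token).
    """
    missing = [step for step in ROTATION_STEPS if step not in step_functions]
    if missing:
        raise ValueError(f"missing rotation steps: {missing}")

    def handler(event, context=None):
        step = event["Step"]
        if step not in ROTATION_STEPS:
            raise ValueError(f"unknown rotation step: {step}")
        # The token identifies the secret version being staged as AWSPENDING.
        return step_functions[step](event["SecretId"], event["ClientRequestToken"])

    return handler
```

In a real rotation function, createSecret would stage a new value under AWSPENDING, setSecret would apply it to the database, testSecret would verify it, and finishSecret would move the AWSCURRENT label to the new version.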
🛡️ Threat Detection
- GuardDuty: ML-based threat detection (VPC Flow, DNS, CloudTrail)
- Amazon Inspector: vulnerability scanning (EC2, Lambda, ECR)
- Amazon Macie: PII/sensitive data discovery in S3
- Security Hub: aggregates findings, CSPM posture score
- Detective: investigate security findings root cause
- All integrate with EventBridge for automated response
📋 Compliance Automation
- AWS Config Conformance Packs: CIS, PCI-DSS, HIPAA rulesets
- Security Hub standards: AWS Foundational, CIS, PCI
- AWS Audit Manager: continuous evidence collection
- AWS Artifact: download compliance reports (SOC 2, ISO)
- Service Control Policies: preventive compliance guardrails
🔒 Network Security
- AWS WAF: Layer 7 rules, rate limiting, bot control
- AWS Shield Standard: always-on DDoS (free)
- Shield Advanced: $3k/month; 24/7 SRT, cost protection
- VPC Security Groups: stateful; NACLs: stateless
- VPC Flow Logs: IP traffic to CW Logs or S3
- AWS Network Firewall: centralized stateful/stateless IDS/IPS
Master Quick Reference
| Service / Concept | Dom | The Key Fact to Remember |
|---|---|---|
| CodeDeploy Blue/Green (ECS) | D1 | ALB listener shifts traffic between task sets; appspec defines traffic hooks |
| CodeBuild buildspec.yml | D1 | Phases: install → pre_build → build → post_build; artifacts block uploads to S3 |
| Lambda Canary Deployment | D1 | Canary10Percent5Minutes shifts 10%, waits, then 100%. CloudWatch alarm triggers rollback |
| CloudFormation StackSets | D2 | SERVICE_MANAGED = auto-deploy with Organizations; SELF_MANAGED = manual target accounts |
| SSM Patch Manager | D2 | Patch baselines define rules; Maintenance Windows schedule patching; State Manager enforces |
| AWS Config Conformance Pack | D2 | Bundle of Config rules + remediation; deploy via StackSets for org-wide compliance |
| ASG Warm Pools | D3 | Pre-initialized instances kept stopped, hibernated, or running; much faster scale-out than launching new; stopped instances incur EBS charges but no compute charges |
| SQS DLQ | D3 | After maxReceiveCount failures, messages go to DLQ. Set retention period long enough to inspect |
| CloudWatch Anomaly Detection | D4 | ML baseline trained on up to 2 weeks of metric data; dynamic upper/lower bands replace static thresholds |
| X-Ray Annotations vs Metadata | D4 | Annotations: indexed, filterable. Metadata: not indexed, arbitrary data. Use annotations for search |
| Config Auto-Remediation | D5 | NON_COMPLIANT → SSM Automation doc triggered automatically. Parameter mapping from rule |
| EventBridge Targets | D5 | Lambda, SQS, SNS, Step Functions, CodePipeline, Kinesis, ECS Task, SSM Automation |
| GuardDuty → Remediation | D6 | Finding → EventBridge → Lambda → isolate instance. No agent needed; uses existing logs |
| Secrets Manager Rotation | D6 | 4-step Lambda: createSecret → setSecret → testSecret → finishSecret. Versions: AWSPENDING → AWSCURRENT |
| KMS Envelope Encryption | D6 | Generate data key → encrypt data with data key → encrypt data key with CMK → store both |
DOP-C02 Mock Exam — 100 Questions
Professional-level scenario questions across all 6 domains.