⚙️ DevOps / Platform Engineer Roadmap
Learn to ship software the way the best teams do. You'll go from Linux basics to running a Kubernetes cluster on AWS — building the entire CI/CD pipeline, infrastructure-as-code, and monitoring stack yourself, one layer at a time.
Adjust pace, depth, and focus based on your experience.
Personalized setup
Choose your experience level and goals before beginning.
Module 1
Linux: The Foundation Under Everything
Module 2
CI/CD Pipelines with GitHub Actions
Module 3
Docker: From "Works on My Machine" to Portable Artifacts
Module 4
AWS Core Services: Your Production Environment
Module 5
Terraform: Infrastructure That Fits in a Git Repo
Module 6
Kubernetes: Deploying & Running Your Application
Module 7
Helm: Packaging Kubernetes for Real Teams
Module 8
Argo CD: GitOps-Driven Delivery
Module 9
Monitoring & Observability: Know Before Your Users Do
Capstone Project
What you'll build by the end
You'll build the deployment platform for DeployBot — a sample Node.js + PostgreSQL application. Starting from a blank terminal, you'll containerize the app, set up automated CI/CD with GitHub Actions, provision a full AWS environment with Terraform (VPC, EKS, RDS, ECR), package everything into Helm charts, wire up Argo CD for GitOps delivery, and build a Prometheus + Grafana monitoring stack with real alerting. By the end, pushing to main triggers a fully automated build → test → deploy → monitor pipeline — the same workflow used at companies like Spotify and Shopify.
Full Curriculum
9 modules · 39 topics · 10-14 weeks
Module 1
Linux: The Foundation Under Everything
Every tool in this roadmap runs on Linux. Get confident with the command line, file system, networking, and shell scripting — the skills that separate people who use DevOps tools from people who understand them.
- 1. The Shell: Navigation, Pipes, Redirection & the Commands You'll Use Daily
- 2. Users, Groups & File Permissions: Who Can Do What
- 3. Processes & Systemd: How Linux Runs (and Restarts) Your Software
- 4. Networking Fundamentals: Ports, DNS, Firewalls & What Happens When You curl
- 5. Shell Scripting: Variables, Loops, Exit Codes & Writing Scripts That Don't Break
Project: Spin up an Ubuntu instance, configure SSH key-based login, create a deploy user with sudo access, write a bash script that checks disk usage, memory, and running services — then schedule it with cron to run every 5 minutes and log output to a file.
Module 2
CI/CD Pipelines with GitHub Actions
Automate everything that happens between a git push and a running deployment. You'll build real pipelines that lint, test, build images, and trigger deployments — not toy examples.
- 1. Workflow Anatomy: Triggers, Jobs, Steps & the YAML You'll Write a Lot
- 2. Building a Real CI Pipeline: Lint, Test & Fail Fast
- 3. Secrets, Environment Variables & OIDC: No More Hardcoded Credentials
- 4. Reusable Workflows & Custom Actions: Don't Repeat Yourself Across Repos
Project: Build a GitHub Actions pipeline for DeployBot: on every push, run linting and tests in parallel, build a Docker image with a git SHA tag, push it to Amazon ECR, and post a Slack notification on success or failure. Add a manual approval gate for production deploys.
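As a rough illustration of the git SHA tagging step in this project, the sketch below builds the full image reference a pipeline would push to Amazon ECR. It uses Python only to make the string logic explicit. The helper name `ecr_image_uri`, the account ID, and the region are hypothetical, and the 7-character short SHA is an assumed convention, not a requirement.

```python
def ecr_image_uri(account_id: str, region: str, repository: str,
                  git_sha: str, short_len: int = 7) -> str:
    """Build an ECR image reference tagged with a shortened git commit SHA."""
    sha = git_sha.strip().lower()
    # Reject anything that is too short or is not a hexadecimal commit hash.
    if len(sha) < short_len or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError(f"not a usable git SHA: {git_sha!r}")
    # ECR registries follow <account>.dkr.ecr.<region>.amazonaws.com
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{sha[:short_len]}"


if __name__ == "__main__":
    # Hypothetical account, region, and commit, for illustration only.
    print(ecr_image_uri("123456789012", "us-east-1", "deploybot",
                        "9fceb02d0ae598e95dc970b74767f19372d61af8"))
```

In a GitHub Actions job the full commit SHA is available as the `GITHUB_SHA` environment variable, so a workflow step could pass that value in directly.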
Module 3
Docker: From "Works on My Machine" to Portable Artifacts
Containers are the unit of deployment in modern infrastructure. Learn to build small, secure, reproducible images — and understand what Docker is actually doing under the hood.
- 1. How Containers Work: Namespaces, Cgroups & Why It's Not a VM
- 2. Dockerfiles That Don't Suck: Layer Caching, Multi-Stage Builds & .dockerignore
- 3. Image Security: Non-Root Users, Distroless Bases & Vulnerability Scanning
- 4. Docker Compose: Multi-Service Local Dev That Mirrors Production
Project: Containerize DeployBot: write a multi-stage Dockerfile that builds the Node.js app in one stage and runs it in a distroless image (~50MB). Set up Docker Compose with the app, PostgreSQL, and Redis for local development. Run Trivy to scan the image for vulnerabilities.
Module 4
AWS Core Services: Your Production Environment
Before you can deploy to Kubernetes, you need infrastructure. Understand the AWS building blocks — networking, compute, storage, and IAM — that everything else sits on top of.
- 1. VPC Design: Subnets, Route Tables, NAT Gateways & Why Networking Matters
- 2. IAM: Roles, Policies, Trust Relationships & the Principle of Least Privilege
- 3. ECR & S3: Where Your Images and State Files Live
- 4. EKS Overview: How AWS Runs Kubernetes (So You Don't Have To)
Project: Manually set up the DeployBot staging environment in AWS: create a VPC with public and private subnets across two AZs, configure a NAT gateway, set up security groups, create an ECR repository for container images, and create an S3 bucket for Terraform state. Document every step — you'll automate it all with Terraform next.
Module 5
Terraform: Infrastructure That Fits in a Git Repo
Clicking through the AWS console doesn't scale. Learn to define your entire infrastructure as code — version it, review it in PRs, and apply it safely with terraform plan.
- 1. Terraform 101: Providers, Resources, State & the Plan/Apply Loop
- 2. Variables, Locals, Outputs & Data Sources: Making Config Flexible
- 3. Modules: Reusable Infrastructure You'd Actually Share With Your Team
- 4. Remote State & Locking: Why terraform.tfstate Should Never Be Local
- 5. Terraform in CI: Plan on PR, Apply on Merge, Drift Detection on Schedule
Project: Rewrite everything you built manually in Module 4 as Terraform: create reusable modules for VPC, EKS, and ECR. Use remote state in S3 with DynamoDB locking. Add separate tfvars files for staging and production environments. Run terraform plan in your CI pipeline as a PR check.
Module 6
Kubernetes: Deploying & Running Your Application
Deploy DeployBot to the EKS cluster you provisioned. Learn how Kubernetes schedules, networks, scales, and self-heals your containers — and the resource types that make it all work.
- 1. Kubernetes Architecture: What Each Component Does & How Scheduling Works
- 2. Pods, Deployments & ReplicaSets: The Core Resource Model
- 3. Services & Ingress: Routing Traffic Into Your Cluster
- 4. ConfigMaps, Secrets & Resource Limits: Configuring Apps for Production
- 5. Rolling Updates, Readiness Probes & Zero-Downtime Deploys
Project: Deploy DeployBot to your EKS cluster: create a Deployment with resource requests/limits, a Service, and an Ingress with TLS via cert-manager. Use ConfigMaps for app config, Secrets for database credentials, and a HorizontalPodAutoscaler that scales based on CPU. Verify a rolling update completes with zero downtime.
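To give a feel for how a CPU-based HorizontalPodAutoscaler picks a replica count, here is a minimal Python sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0. The function name, the min/max bounds, and the sample numbers are assumptions for illustration. The real controller also handles stabilization windows, missing metrics, and not-yet-ready pods, none of which are modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single CPU metric."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    ratio = current_cpu_percent / target_cpu_percent
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Clamp to the configured bounds (minReplicas / maxReplicas on the HPA).
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

With these sample numbers the ratio is 1.5, so four pods become six. The same function scales down when average utilization drops well below the target.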
Module 7
Helm: Packaging Kubernetes for Real Teams
Raw YAML doesn't scale past one environment. Helm lets you template, version, and share Kubernetes manifests — so staging and production use the same chart with different values.
- 1. Why Helm Exists: The Problem with Managing Raw YAML at Scale
- 2. Chart Anatomy: Templates, Values, Helpers & the _helpers.tpl Pattern
- 3. Templating in Practice: Conditionals, Loops & Per-Environment Overrides
- 4. Chart Dependencies & Hooks: Compose Charts and Run Migrations Safely
Project: Convert all DeployBot Kubernetes manifests into a Helm chart. Use templates with conditionals for staging vs. production (e.g., replica count, resource limits, ingress hostname). Add chart dependencies for PostgreSQL using the Bitnami subchart. Run helm template to verify the output and helm test to validate the deployed release.
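The per-environment overrides in this project rely on how Helm layers values: nested maps are merged key by key, while scalars and lists in a later values file replace the earlier ones. The Python sketch below imitates that layering so the result is easy to reason about. It is a simplified model and not Helm's own code; special cases such as null values are ignored, and the hostnames and resource figures are made-up examples.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Recursively layer override values on top of base values."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults (what values.yaml might hold for staging).
DEFAULTS = {
    "replicaCount": 1,
    "resources": {"limits": {"cpu": "250m", "memory": "256Mi"}},
    "ingress": {"enabled": True, "host": "staging.deploybot.example.com"},
}

# Hypothetical production overrides (a separate production values file).
PRODUCTION = {
    "replicaCount": 3,
    "resources": {"limits": {"memory": "512Mi"}},
    "ingress": {"host": "deploybot.example.com"},
}

if __name__ == "__main__":
    print(merge_values(DEFAULTS, PRODUCTION))
```

Production only states what differs: the CPU limit and the `ingress.enabled` flag carry over from the defaults, which is what keeps a single chart usable across environments.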
Module 8
Argo CD: GitOps-Driven Delivery
Stop running kubectl apply from your laptop. Argo CD watches your Git repo and automatically reconciles your cluster to match — if it drifts, it self-heals. This is how mature teams ship.
- 1. GitOps Principles: Why the Git Repo Is the Source of Truth
- 2. Argo CD Setup: Applications, Projects & Repository Connections
- 3. Sync Policies: Auto-Sync, Self-Heal, Prune & Manual Gates
- 4. Image Updater & Notifications: Close the Loop from CI to CD
Project: Install Argo CD on your EKS cluster. Create Application resources for DeployBot's staging and production environments pointing at different branches. Configure auto-sync with self-heal and pruning on staging, manual sync with approval on production. Set up an automated image updater so pushing a new image tag triggers a deployment without changing any manifests.
Module 9
Monitoring & Observability: Know Before Your Users Do
A pipeline that deploys without visibility is a liability. Build the monitoring, logging, and alerting stack that lets you sleep at night — and actually debug problems when they happen.
- 1. The Three Pillars: Metrics, Logs & Traces, and When You Need Each
- 2. Prometheus: Scraping, PromQL & the Queries That Actually Matter
- 3. Grafana Dashboards: The RED Method & Building Views Your Team Will Use
- 4. Alerting Done Right: Alertmanager, Routing & Writing Alerts That Don't Cry Wolf
Project: Deploy the Prometheus + Grafana stack via Helm to your cluster. Instrument DeployBot with a /metrics endpoint, then create a Grafana dashboard showing request rate, error rate, and latency (the RED method) plus pod resource usage. Configure alerting rules for an error rate above 1%, p95 latency above 500 ms, and pod restarts. Route alerts to a Slack channel via Alertmanager.
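The three alert conditions in this project come down to simple threshold arithmetic. The Python sketch below works through it on raw sample data, assuming a nearest-rank p95 and hypothetical alert names. In the actual stack these checks would be written as Prometheus alerting rules in PromQL (typically estimating p95 from histogram buckets), so treat this only as a way to sanity-check the numbers.

```python
def error_rate(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if total_requests <= 0:
        return 0.0
    return failed_requests / total_requests


def p95_latency_ms(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        return 0.0
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def firing_alerts(total_requests: int, failed_requests: int,
                  latencies_ms: list, pod_restarts: int) -> list:
    """Return the (hypothetical) names of alerts whose thresholds are exceeded."""
    alerts = []
    if error_rate(total_requests, failed_requests) > 0.01:  # above 1% errors
        alerts.append("HighErrorRate")
    if p95_latency_ms(latencies_ms) > 500:                  # above 500 ms p95
        alerts.append("HighP95Latency")
    if pod_restarts > 0:                                    # any pod restart
        alerts.append("PodRestarts")
    return alerts


if __name__ == "__main__":
    sample_latencies = list(range(10, 1010, 10))  # 10 ms, 20 ms, ..., 1000 ms
    print(firing_alerts(200, 3, sample_latencies, pod_restarts=0))
```

Note that the comparisons are strict: exactly 1% errors or exactly 500 ms does not fire, matching the "greater than" wording of the thresholds.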