Zero to Production: Complete AKS Setup with Terraform and ArgoCD

If you ask ten DevOps engineers to set up a Kubernetes cluster, you’ll get ten different answers. Some will click through the Azure Portal (please don't), some will run a massive bash script, and the brave ones will use Infrastructure as Code (IaC).
But provisioning the cluster is only half the battle. A "production-ready" cluster isn't just a running control plane; it’s a platform that can self-heal, manage its own configurations, and scale without manual intervention.
In this guide, we're building a complete Azure Kubernetes Service (AKS) environment from scratch using Terraform, then implementing GitOps with ArgoCD. You'll walk away with:
- A production-ready AKS cluster provisioned through modular Terraform
- ArgoCD managing your application deployments
- A GitOps workflow that scales from development to production
- Practical patterns you can adapt for real-world projects
This is for mid-level engineers ready to think like senior engineers—where "getting it working" becomes "building systems that teams can operate reliably."
Architecture Overview
Before touching code, let's understand what we're building and why each piece matters.
The Two Layers
Our architecture separates concerns into two distinct layers:
Infrastructure Layer (Terraform): Everything required to run applications—the AKS cluster, networking, node pools, container registry, and identity management. Terraform manages this layer because infrastructure changes less frequently and requires careful change management.
Application Layer (ArgoCD): Your workloads, services, and application configurations. ArgoCD manages this layer because applications change constantly, and you need fast, reliable deployments with easy rollbacks.
Why This Separation Matters
When infrastructure and applications live in the same deployment pipeline, you risk cascading failures. A bad application deploy shouldn't endanger your cluster, and a cluster upgrade shouldn't redeploy every application. Separation gives you:
- Independent scaling: Upgrade infrastructure on a quarterly cycle, deploy applications dozens of times daily
- Clear ownership: Platform team owns Terraform, product teams own ArgoCD applications
- Blast radius control: Changes affect only their layer

Prerequisites & Setup
Azure Resources:
- Active Azure subscription
- Service Principal with Contributor access (for Terraform)
- Resource group for Terraform state storage
Part 1: Infrastructure with Terraform
Modular Terraform: The Production Approach
We're building modules because that's how production Terraform works. Modules give you:
- Reusability: Write once, use across dev/staging/prod
- Testing: Test modules independently
- Abstraction: Hide complexity, expose only what matters
Our modules encapsulate best practices so consuming teams don't need to know every AKS detail.
The Networking Module
First, networking. AKS needs a VNet with proper subnet segmentation:
For brevity, the full Terraform code for these modules lives in the companion GitHub repo (linked at the end of this article) rather than inline here.
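As a rough sketch of the networking module's shape (resource and variable names here are assumptions aligned with the inputs used in the dev environment below; the repo is authoritative):

```hcl
# modules/networking/main.tf -- illustrative sketch; see the companion repo for the real module.
# (variable declarations omitted for brevity)

resource "azurerm_virtual_network" "this" {
  name                = var.vnet_name
  location            = var.location
  resource_group_name = var.resource_group_name
  address_space       = var.vnet_address_space
  tags                = var.tags
}

# Dedicated subnet for AKS nodes (with Azure CNI, pod IPs come from this range)
resource "azurerm_subnet" "aks" {
  name                 = "aks-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = var.aks_subnet_address_prefix
}

# Separate subnet reserved for an Application Gateway, if you add one later
resource "azurerm_subnet" "appgw" {
  name                 = "appgw-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = var.appgw_subnet_address_prefix
}

# Output consumed by the AKS module
output "aks_subnet_id" {
  value = azurerm_subnet.aks.id
}
```

Keeping the Application Gateway range in its own subnet now costs nothing and avoids re-carving the VNet later.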
Consuming the Modules
Now we compose these modules into an environment. Here's the dev environment:
environments/dev/main.tf:

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.75.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstatedevaks"
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}

provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = true
    }
  }
}

# Resource group
resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = local.common_tags
}

# Networking
module "networking" {
  source = "../../modules/networking"

  vnet_name                   = "${var.environment}-vnet"
  location                    = azurerm_resource_group.main.location
  resource_group_name         = azurerm_resource_group.main.name
  vnet_address_space          = ["10.0.0.0/16"]
  aks_subnet_address_prefix   = ["10.0.1.0/24"]
  appgw_subnet_address_prefix = ["10.0.2.0/24"]
  tags                        = local.common_tags
}

# Container Registry
module "acr" {
  source = "../../modules/acr"

  acr_name            = "${var.environment}aksacr"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "Standard"
  tags                = local.common_tags
}

# AKS Cluster
module "aks" {
  source = "../../modules/aks"

  cluster_name        = "${var.environment}-aks-cluster"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "${var.environment}-aks"
  kubernetes_version  = var.kubernetes_version
  subnet_id           = module.networking.aks_subnet_id

  system_node_pool_vm_size   = "Standard_D2s_v3"
  system_node_pool_min_count = 1
  system_node_pool_max_count = 3

  user_node_pool_vm_size   = "Standard_D4s_v3"
  user_node_pool_min_count = 2
  user_node_pool_max_count = 10

  admin_group_object_ids = var.admin_group_object_ids
  tags                   = local.common_tags
}

# Grant AKS access to ACR
resource "azurerm_role_assignment" "aks_acr_pull" {
  principal_id                     = module.aks.kubelet_identity_object_id
  role_definition_name             = "AcrPull"
  scope                            = module.acr.acr_id
  skip_service_principal_aad_check = true
}

locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Project     = "AKS-GitOps-Demo"
  }
}
```

environments/dev/terraform.tfvars:

```hcl
environment            = "dev"
location               = "eastus"
resource_group_name    = "dev-aks-rg"
kubernetes_version     = "1.34.0"
admin_group_object_ids = ["your-azure-ad-group-id"]
```

Deploying the Infrastructure
```shell
cd terraform/environments/dev

# Initialize Terraform
terraform init

# Review the plan
terraform plan

# Apply (creates ~15-20 resources)
terraform apply

# Get AKS credentials
az aks get-credentials \
  --resource-group dev-aks-rg \
  --name dev-aks-cluster

# Verify
kubectl get nodes
```

Part 2: Bootstrapping ArgoCD via Terraform

The Bootstrap Strategy
Here's the production approach:
Terraform manages:
- AKS cluster
- ArgoCD installation (via Helm provider)
- Initial ArgoCD root application (App of Apps)
ArgoCD manages everything else:
- cert-manager
- nginx-ingress-controller
- external-secrets-operator
- All application deployments
This gives you:
- Repeatable infrastructure: terraform apply creates a fully functional cluster
- GitOps from day one: Everything after bootstrap is Git-driven
- No manual kubectl: Ever.
Installing ArgoCD via Terraform
Add to your AKS module or create a separate bootstrap module:
(Linked in the GitHub repo)
Now terraform apply installs ArgoCD with zero manual commands.
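The bootstrap module does roughly the following. This is a hedged sketch: the AKS module output names (`module.aks.host`, etc.) and the pinned chart version are assumptions, not the repo's exact code.

```hcl
# Wire the Helm and Kubernetes providers to the cluster Terraform just created.
# Output names are assumptions -- adjust to your AKS module's actual outputs.
provider "helm" {
  kubernetes {
    host                   = module.aks.host
    client_certificate     = base64decode(module.aks.client_certificate)
    client_key             = base64decode(module.aks.client_key)
    cluster_ca_certificate = base64decode(module.aks.cluster_ca_certificate)
  }
}

provider "kubernetes" {
  host                   = module.aks.host
  client_certificate     = base64decode(module.aks.client_certificate)
  client_key             = base64decode(module.aks.client_key)
  cluster_ca_certificate = base64decode(module.aks.cluster_ca_certificate)
}

# Install ArgoCD from the official Helm chart
resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  version          = "5.51.4" # example pin -- use the current stable chart version
  namespace        = "argocd"
  create_namespace = true

  depends_on = [module.aks]
}

# Apply the root "App of Apps" Application after ArgoCD is up.
# (kubernetes_manifest needs the cluster reachable at plan time;
# some teams use the kubectl provider or a Helm extraObjects value instead.)
resource "kubernetes_manifest" "root_app" {
  manifest   = yamldecode(file("${path.module}/root-app.yaml"))
  depends_on = [helm_release.argocd]
}
```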
Part 3: App of Apps Pattern - ArgoCD Manages Infrastructure

The App of Apps Pattern
The root application we created points to kubernetes/bootstrap/dev. This directory contains ArgoCD Applications that install cluster infrastructure.
```
kubernetes/
├── bootstrap/
│   ├── dev/
│   │   ├── infrastructure.yaml   # App that manages infra apps
│   │   └── applications.yaml     # App that manages user apps
│   └── prod/
│       ├── infrastructure.yaml
│       └── applications.yaml
├── infrastructure/
│   ├── cert-manager/
│   │   └── application.yaml
│   ├── nginx-ingress/
│   │   └── application.yaml
│   ├── external-secrets/
│   │   └── application.yaml
│   └── monitoring/
│       └── application.yaml
└── applications/
    ├── dev/
    └── prod/
```

Bootstrap Configuration
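For reference, the root Application that Terraform creates could look like this (a sketch; the repoURL is a placeholder):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo
    targetRevision: main
    path: kubernetes/bootstrap/dev   # everything below is discovered from Git
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```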
kubernetes/bootstrap/dev/infrastructure.yaml:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo
    targetRevision: main
    path: kubernetes/infrastructure
    directory:
      recurse: true
      include: '*/application.yaml'
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

This Application watches the infrastructure directory and deploys every application.yaml it finds.
Infrastructure Components
1. cert-manager
kubernetes/infrastructure/cert-manager/application.yaml:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.13.2
    helm:
      releaseName: cert-manager
      values: |
        installCRDs: true
        global:
          leaderElection:
            namespace: cert-manager
        prometheus:
          enabled: true
          servicemonitor:
            enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        maxDuration: 3m
```

kubernetes/infrastructure/cert-manager/cluster-issuer.yaml:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: devops@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: devops@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - http01:
          ingress:
            class: nginx
```

2. NGINX Ingress Controller
kubernetes/infrastructure/nginx-ingress/application.yaml:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-ingress
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://kubernetes.github.io/ingress-nginx
    chart: ingress-nginx
    targetRevision: 4.8.3
    helm:
      releaseName: ingress-nginx
      values: |
        controller:
          replicaCount: 2
          service:
            type: LoadBalancer
            annotations:
              service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          autoscaling:
            enabled: true
            minReplicas: 2
            maxReplicas: 10
            targetCPUUtilizationPercentage: 80
          config:
            use-forwarded-headers: "true"
            compute-full-forwarded-for: "true"
            use-proxy-protocol: "false"
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

3. External Secrets Operator
kubernetes/infrastructure/external-secrets/application.yaml:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-secrets
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://charts.external-secrets.io
    chart: external-secrets
    targetRevision: 0.9.11
    helm:
      releaseName: external-secrets
      values: |
        installCRDs: true
        webhook:
          port: 9443
        certController:
          requeueInterval: 5m
        serviceMonitor:
          enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: external-secrets-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

kubernetes/infrastructure/external-secrets/cluster-secret-store.yaml:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://your-keyvault.vault.azure.net/
      serviceAccountRef:
        name: external-secrets-sa
        namespace: external-secrets-system
```

How This Works
1. Terraform applies: creates AKS, installs ArgoCD, creates the root app
2. Root app syncs: deploys infrastructure.yaml and applications.yaml
3. Infrastructure app syncs: deploys cert-manager, nginx-ingress, external-secrets
4. Everything self-heals: ArgoCD watches Git and auto-syncs changes
One terraform apply, fully functional cluster with:
- SSL certificates (cert-manager)
- Ingress routing (nginx)
- Secrets management (external-secrets)
- GitOps (ArgoCD managing itself)
Part 4: External Secrets Operator - Production Secret Management
Why External Secrets Operator?
External Secrets Operator (ESO) pulls secrets from external secret stores (Azure Key Vault, AWS Secrets Manager, HashiCorp Vault) and creates Kubernetes Secrets. This means:
- Single source of truth: Secrets live in Key Vault, not Git
- Rotation: Update in Key Vault, ESO syncs automatically
- Audit: Key Vault logs all secret access
- No sealed-secrets complexity: No encryption/decryption dance
Azure Key Vault Integration
First, ensure your AKS cluster can access Key Vault. We'll use Workload Identity (the modern, production way).
Update your Terraform AKS module to enable Workload Identity:
modules/aks/main.tf - Add to the cluster config:
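The relevant arguments (available on azurerm_kubernetes_cluster in recent azurerm 3.x provider versions) look like this inside the cluster resource:

```hcl
resource "azurerm_kubernetes_cluster" "this" {
  # ... existing cluster configuration ...

  # Publish an OIDC issuer URL for the cluster and enable the
  # Workload Identity mutating webhook on it.
  oidc_issuer_enabled       = true
  workload_identity_enabled = true
}

# The issuer URL is needed later for the federated identity credential.
output "oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.this.oidc_issuer_url
}
```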
Create Key Vault via Terraform:
modules/key-vault/main.tf
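A sketch of what that module might contain. The variable names (`key_vault_name`, `aks_oidc_issuer_url`) are assumptions; the federated subject matches the ServiceAccount used later in this article.

```hcl
# modules/key-vault/main.tf -- illustrative sketch; see the repo for the full module.

data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "this" {
  name                      = var.key_vault_name
  location                  = var.location
  resource_group_name       = var.resource_group_name
  tenant_id                 = data.azurerm_client_config.current.tenant_id
  sku_name                  = "standard"
  enable_rbac_authorization = true
  tags                      = var.tags
}

# Identity that External Secrets Operator will run as
resource "azurerm_user_assigned_identity" "external_secrets" {
  name                = "external-secrets-identity"
  location            = var.location
  resource_group_name = var.resource_group_name
}

# Trust tokens issued by the cluster's OIDC issuer for the ESO service account
resource "azurerm_federated_identity_credential" "external_secrets" {
  name                = "external-secrets-federation"
  resource_group_name = var.resource_group_name
  parent_id           = azurerm_user_assigned_identity.external_secrets.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = var.aks_oidc_issuer_url
  subject             = "system:serviceaccount:external-secrets-system:external-secrets-sa"
}

# Let that identity read secrets from the vault
resource "azurerm_role_assignment" "kv_secrets_user" {
  scope                = azurerm_key_vault.this.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.external_secrets.principal_id
}
```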
Configuring External Secrets with Workload Identity
kubernetes/infrastructure/external-secrets/service-account.yaml:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets-sa
  namespace: external-secrets-system
  annotations:
    azure.workload.identity/client-id: "${EXTERNAL_SECRETS_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"
```

ArgoCD ApplicationSet (For Multi-Environment)
If you want to parameterize per environment:
kubernetes/bootstrap/applicationset-infrastructure.yaml:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: infrastructure
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/your-org/your-repo
        revision: main
        directories:
          - path: kubernetes/infrastructure/*
  template:
    metadata:
      name: '{{path.basename}}'
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/your-repo
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Ordering across components is handled with the `argocd.argoproj.io/sync-wave` annotation on each child Application; wave values must be integers, for example `"0"` for cert-manager, `"1"` for nginx-ingress, and `"2"` for external-secrets, so cert-manager's CRDs exist before anything depends on them.

Using External Secrets in Applications
Now that ESO is configured, let's use it. First, add secrets to Key Vault:
Azure CLI:
```shell
az keyvault secret set \
  --vault-name devakskv \
  --name database-password \
  --value "super-secret-password"
```

Create an ExternalSecret to sync it:
kubernetes/applications/dev/myapp/external-secret.yaml:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: dev-myapp
spec:
  refreshInterval: 1h            # Sync every hour
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  target:
    name: myapp-secrets          # Name of the K8s Secret to create
    creationPolicy: Owner
    template:
      type: Opaque
      metadata:
        labels:
          app: myapp
      data:
        # You can transform secrets here
        DATABASE_URL: "postgresql://user:{{ .password }}@db.example.com:5432/myapp"
  data:
    - secretKey: password            # Key in the K8s Secret
      remoteRef:
        key: database-password       # Key in Key Vault
    - secretKey: api-key
      remoteRef:
        key: external-api-key
```

Use it in your Deployment:
kubernetes/applications/dev/myapp/deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: dev-myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myacr.azurecr.io/myapp:v1.0.0
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: DATABASE_URL
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: api-key
          # Or mount as files
          volumeMounts:
            - name: secrets
              mountPath: /etc/secrets
              readOnly: true
      volumes:
        - name: secrets
          secret:
            secretName: myapp-secrets
```

ArgoCD Application for Your App
kubernetes/argocd/applications/myapp-dev.yaml:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-dev
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: dev-apps
  source:
    repoURL: https://github.com/your-org/your-repo
    targetRevision: main
    path: kubernetes/applications/dev/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: dev-myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  # Ignore fields managed by other controllers
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # Ignore if HPA manages this
```

Complete GitOps Workflow
1. Developer workflow:
```shell
# Make changes to application code

# Build and push the image
docker build -t myacr.azurecr.io/myapp:v1.2.4 .
docker push myacr.azurecr.io/myapp:v1.2.4

# Update the image tag in the Kustomize manifest in Git
# (change newTag: v1.2.4)
```
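The manifest being edited is typically a kustomization.yaml along these lines (file layout and names are illustrative):

```yaml
# kubernetes/applications/dev/myapp/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - external-secret.yaml

images:
  - name: myacr.azurecr.io/myapp
    newTag: v1.2.4   # bump this in Git to deploy a new version
```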
2. ArgoCD detects the change:
Within 3 minutes (configurable), ArgoCD:
- Polls the Git repo
- Detects the commit
- Runs kustomize build on the path
- Compares with cluster state
- Syncs the new image tag
- Kubernetes rolls out the new version
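The 3-minute default comes from ArgoCD's reconciliation timeout, which can be tuned in the argocd-cm ConfigMap; a sketch (the 60s value is only an example):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # How often ArgoCD polls Git for changes (default 180s)
  timeout.reconciliation: 60s
```

A Git webhook pointed at the ArgoCD server removes the polling delay entirely for most pushes.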
Conclusion: From Zero to Production-Ready
We've built a complete production-grade Kubernetes platform from scratch. Let's recap what you've accomplished:
Infrastructure as Code:
- Modular Terraform configuration for AKS
- Separate node pools for system and application workloads
- Network isolation with Azure CNI
- Azure Container Registry integration
- Key Vault for secrets management with Workload Identity
GitOps Foundation:
- ArgoCD bootstrapped entirely through Terraform
- Zero manual kubectl or helm commands required
- App of Apps pattern for managing cluster infrastructure
- Self-healing, automated sync policies
Production Infrastructure:
- cert-manager for automatic SSL certificate management
- NGINX Ingress Controller with autoscaling
- External Secrets Operator syncing from Azure Key Vault
- Secrets refreshed from Key Vault on an interval via ESO's refreshInterval
Application Deployment:
- Kustomize-based configuration management
- Environment-specific overlays (dev, staging, prod)
- Horizontal Pod Autoscaling
- Complete GitOps workflow from code commit to deployment
What makes this production-grade:
- Repeatability: terraform apply creates identical environments every time
- Auditability: every change is tracked in Git commits
- Self-service: developers deploy by pushing to Git
- No configuration drift: ArgoCD enforces the desired state
- Secure by default: Workload Identity and Azure AD-backed RBAC
The GitHub Repository
All the Terraform modules and Kubernetes manifests referenced in this article are available in the companion repository:
github.com/Ekene-Chris/aks-terraform-argocd
The repository includes:
- Complete Terraform modules (networking, AKS, ACR, Key Vault, ArgoCD bootstrap)
- ArgoCD App of Apps configurations
- External Secrets examples
What's Next?
This foundation gives you everything you need to run applications in production. From here, you can extend with:
Observability - Prometheus, Grafana, and Loki for metrics, dashboards, and logs
Progressive Delivery - Argo Rollouts for blue/green and canary deployments
Policy Enforcement - OPA Gatekeeper or Kyverno for security and compliance policies
Disaster Recovery - Velero for cluster backups and cross-region replication strategies
Cost Optimization - KEDA for event-driven autoscaling and spot instance strategies
We'll cover these in future articles. Each builds on this foundation without requiring architectural changes.
If you're a mid-level engineer looking to operate at a senior level, this is how you think. You build systems that teams can use reliably, not one-off deployments that work "for now." The difference between "it works on my machine" and "it works in production" isn't luck—it's architecture, automation, and discipline. You now have all three.