
Zero to Production: Complete AKS Setup with Terraform and ArgoCD

February 10, 2026
Ekene Chris

If you ask ten DevOps engineers to set up a Kubernetes cluster, you’ll get ten different answers. Some will click through the Azure Portal (please don't), some will run a massive bash script, and the brave ones will use Infrastructure as Code (IaC).

But provisioning the cluster is only half the battle. A "production-ready" cluster isn't just a running control plane; it’s a platform that can self-heal, manage its own configurations, and scale without manual intervention.

In this guide, we're building a complete Azure Kubernetes Service (AKS) environment from scratch using Terraform, then implementing GitOps with ArgoCD. You'll walk away with:

  • A production-ready AKS cluster provisioned through modular Terraform
  • ArgoCD managing your application deployments
  • A GitOps workflow that scales from development to production
  • Practical patterns you can adapt for real-world projects

This is for mid-level engineers ready to think like senior engineers—where "getting it working" becomes "building systems that teams can operate reliably."

Architecture Overview

Before touching code, let's understand what we're building and why each piece matters.

The Two Layers

Our architecture separates concerns into two distinct layers:

Infrastructure Layer (Terraform): Everything required to run applications—the AKS cluster, networking, node pools, container registry, and identity management. Terraform manages this layer because infrastructure changes less frequently and requires careful change management.

Application Layer (ArgoCD): Your workloads, services, and application configurations. ArgoCD manages this layer because applications change constantly, and you need fast, reliable deployments with easy rollbacks.

Why This Separation Matters

When infrastructure and applications live in the same deployment pipeline, you risk cascading failures. A bad application deploy shouldn't risk your cluster. A cluster upgrade shouldn't redeploy all applications. Separation gives you:

  • Independent scaling: Upgrade infrastructure on a quarterly cycle, deploy applications dozens of times daily
  • Clear ownership: Platform team owns Terraform, product teams own ArgoCD applications
  • Blast radius control: Changes affect only their layer

Prerequisites & Setup

Tools:

  • Terraform >= 1.5
  • Azure CLI (az)
  • kubectl
  • Helm (used indirectly via Terraform's helm provider)

Azure Resources:

  • Active Azure subscription
  • Service Principal with Contributor access (for Terraform)
  • Resource group and storage account for Terraform state

Part 1: Infrastructure with Terraform

Modular Terraform: The Production Approach

We're building modules because that's how production Terraform works. Modules give you:

  • Reusability: Write once, use across dev/staging/prod
  • Testing: Test modules independently
  • Abstraction: Hide complexity, expose only what matters

Our modules encapsulate best practices so consuming teams don't need to know every AKS detail.

The Networking Module

First, networking. AKS needs a VNet with proper subnet segmentation: one subnet for the cluster nodes and one reserved for an Application Gateway.

For brevity, the Terraform code for the modules lives in the linked GitHub repo rather than inline here.
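For orientation, here is a minimal sketch of what the networking module looks like. The variable and output names match how the module is consumed below; everything else is illustrative, not the repo's exact code.

```hcl
# modules/networking/main.tf (illustrative sketch)
resource "azurerm_virtual_network" "main" {
  name                = var.vnet_name
  location            = var.location
  resource_group_name = var.resource_group_name
  address_space       = var.vnet_address_space
  tags                = var.tags
}

# Dedicated subnet for AKS nodes
resource "azurerm_subnet" "aks" {
  name                 = "aks-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = var.aks_subnet_address_prefix
}

# Reserved subnet for an Application Gateway
resource "azurerm_subnet" "appgw" {
  name                 = "appgw-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = var.appgw_subnet_address_prefix
}

# modules/networking/outputs.tf
output "aks_subnet_id" {
  value = azurerm_subnet.aks.id
}
```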

Consuming the Modules

Now we compose these modules into an environment. Here's the dev environment:

terraform {
  required_version = ">= 1.5.0"
  
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.75.0"
    }
  }
  
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstatedevaks"
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}

provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = true
    }
  }
}

# Resource group
resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = local.common_tags
}

# Networking
module "networking" {
  source = "../../modules/networking"
  
  vnet_name           = "${var.environment}-vnet"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  
  vnet_address_space           = ["10.0.0.0/16"]
  aks_subnet_address_prefix    = ["10.0.1.0/24"]
  appgw_subnet_address_prefix  = ["10.0.2.0/24"]
  
  tags = local.common_tags
}

# Container Registry
module "acr" {
  source = "../../modules/acr"
  
  acr_name            = "${var.environment}aksacr"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "Standard"
  
  tags = local.common_tags
}

# AKS Cluster
module "aks" {
  source = "../../modules/aks"
  
  cluster_name        = "${var.environment}-aks-cluster"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "${var.environment}-aks"
  kubernetes_version  = var.kubernetes_version
  
  subnet_id = module.networking.aks_subnet_id
  
  system_node_pool_vm_size   = "Standard_D2s_v3"
  system_node_pool_min_count = 1
  system_node_pool_max_count = 3
  
  user_node_pool_vm_size   = "Standard_D4s_v3"
  user_node_pool_min_count = 2
  user_node_pool_max_count = 10
  
  admin_group_object_ids = var.admin_group_object_ids
  
  tags = local.common_tags
}

# Grant AKS access to ACR
resource "azurerm_role_assignment" "aks_acr_pull" {
  principal_id                     = module.aks.kubelet_identity_object_id
  role_definition_name             = "AcrPull"
  scope                            = module.acr.acr_id
  skip_service_principal_aad_check = true
}

locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Project     = "AKS-GitOps-Demo"
  }
}

environments/dev/terraform.tfvars

environment            = "dev"
location               = "eastus"
resource_group_name    = "dev-aks-rg"
kubernetes_version     = "1.34.0"
admin_group_object_ids = ["your-azure-ad-group-id"]
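Those values feed variable declarations along these lines (a sketch of environments/dev/variables.tf; descriptions and defaults are assumptions consistent with how the variables are used above):

```hcl
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "location" {
  description = "Azure region"
  type        = string
  default     = "eastus"
}

variable "resource_group_name" {
  description = "Name of the resource group to create"
  type        = string
}

variable "kubernetes_version" {
  description = "AKS Kubernetes version"
  type        = string
}

variable "admin_group_object_ids" {
  description = "Azure AD group object IDs granted cluster-admin access"
  type        = list(string)
}
```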

Deploying the Infrastructure

cd terraform/environments/dev

# Initialize Terraform
terraform init

# Review the plan
terraform plan

# Apply (creates ~15-20 resources)
terraform apply

# Get AKS credentials
az aks get-credentials \
  --resource-group dev-aks-rg \
  --name dev-aks-cluster

# Verify
kubectl get nodes

Part 2: Bootstrapping ArgoCD via Terraform


The Bootstrap Strategy

Here's the production approach:

Terraform manages:

  • AKS cluster
  • ArgoCD installation (via Helm provider)
  • Initial ArgoCD root application (App of Apps)

ArgoCD manages everything else:

  • cert-manager
  • nginx-ingress-controller
  • external-secrets-operator
  • All application deployments

This gives you:

  • Repeatable infrastructure: terraform apply creates a fully functional cluster
  • GitOps from day one: Everything after bootstrap is Git-driven
  • No manual kubectl: Ever.

Installing ArgoCD via Terraform

Add to your AKS module or create a separate bootstrap module:

(Linked in the GitHub repo)

Now terraform apply installs ArgoCD with zero manual commands.
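For orientation, a minimal sketch of such a bootstrap (the full module is in the repo). It assumes the helm and kubernetes providers are wired to the new cluster's credentials; the chart version is an assumption, and the repo URL matches the placeholders used later in this article:

```hcl
# Install ArgoCD from the official Argo Helm chart
resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  version          = "5.51.6" # assumed; pin to whatever you've tested
  namespace        = "argocd"
  create_namespace = true
}

# Root "App of Apps" pointing ArgoCD at the bootstrap directory
resource "kubernetes_manifest" "root_app" {
  depends_on = [helm_release.argocd]

  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "root"
      namespace = "argocd"
    }
    spec = {
      project = "default"
      source = {
        repoURL        = "https://github.com/your-org/your-repo"
        targetRevision = "main"
        path           = "kubernetes/bootstrap/dev"
      }
      destination = {
        server    = "https://kubernetes.default.svc"
        namespace = "argocd"
      }
      syncPolicy = {
        automated = { prune = true, selfHeal = true }
      }
    }
  }
}
```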

Part 3: App of Apps Pattern - ArgoCD Manages Infrastructure


The App of Apps Pattern

The root application we created points to kubernetes/bootstrap/dev. This directory contains ArgoCD Applications that install cluster infrastructure.

kubernetes/
├── bootstrap/
│   ├── dev/
│   │   ├── infrastructure.yaml      # App that manages infra apps
│   │   └── applications.yaml        # App that manages user apps
│   └── prod/
│       ├── infrastructure.yaml
│       └── applications.yaml
├── infrastructure/
│   ├── cert-manager/
│   │   └── application.yaml
│   ├── nginx-ingress/
│   │   └── application.yaml
│   ├── external-secrets/
│   │   └── application.yaml
│   └── monitoring/
│       └── application.yaml
└── applications/
    ├── dev/
    └── prod/

Bootstrap Configuration

kubernetes/bootstrap/dev/infrastructure.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  
  source:
    repoURL: https://github.com/your-org/your-repo
    targetRevision: main
    path: kubernetes/infrastructure
    directory:
      recurse: true
      include: '*/application.yaml'
  
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

This Application watches the infrastructure directory and deploys all application.yaml files it finds.

Infrastructure Components

1. cert-manager

kubernetes/infrastructure/cert-manager/application.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.13.2
    helm:
      releaseName: cert-manager
      values: |
        installCRDs: true
        global:
          leaderElection:
            namespace: cert-manager
        prometheus:
          enabled: true
          servicemonitor:
            enabled: true
  
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        maxDuration: 3m

kubernetes/infrastructure/cert-manager/cluster-issuer.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: devops@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: devops@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
    - http01:
        ingress:
          class: nginx

2. NGINX Ingress Controller

kubernetes/infrastructure/nginx-ingress/application.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-ingress
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  
  source:
    repoURL: https://kubernetes.github.io/ingress-nginx
    chart: ingress-nginx
    targetRevision: 4.8.3
    helm:
      releaseName: ingress-nginx
      values: |
        controller:
          replicaCount: 2
          
          service:
            type: LoadBalancer
            annotations:
              service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
          
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          
          autoscaling:
            enabled: true
            minReplicas: 2
            maxReplicas: 10
            targetCPUUtilizationPercentage: 80
          
          config:
            use-forwarded-headers: "true"
            compute-full-forwarded-for: "true"
            use-proxy-protocol: "false"
  
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
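With cert-manager and ingress-nginx both in place, an application can request a TLS certificate declaratively through its Ingress. A sketch (hostname, namespace, and service name are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: dev-myapp
  annotations:
    # cert-manager watches this annotation and provisions the certificate
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.yourdomain.com
      secretName: myapp-tls   # cert-manager stores the issued cert here
  rules:
    - host: myapp.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```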

3. External Secrets Operator

kubernetes/infrastructure/external-secrets/application.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-secrets
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  
  source:
    repoURL: https://charts.external-secrets.io
    chart: external-secrets
    targetRevision: 0.9.11
    helm:
      releaseName: external-secrets
      values: |
        installCRDs: true
        
        webhook:
          port: 9443
        
        certController:
          requeueInterval: 5m
        
        serviceMonitor:
          enabled: true
  
  destination:
    server: https://kubernetes.default.svc
    namespace: external-secrets-system
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

kubernetes/infrastructure/external-secrets/cluster-secret-store.yaml:

apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://your-keyvault.vault.azure.net/
      serviceAccountRef:
        name: external-secrets-sa
        namespace: external-secrets-system

How This Works

  1. Terraform applies: creates AKS, installs ArgoCD, creates the root app
  2. Root app syncs: deploys infrastructure.yaml and applications.yaml
  3. Infrastructure app syncs: deploys cert-manager, nginx-ingress, external-secrets
  4. Everything self-heals: ArgoCD watches Git and auto-syncs changes

One terraform apply, fully functional cluster with:

  • SSL certificates (cert-manager)
  • Ingress routing (nginx)
  • Secrets management (external-secrets)
  • GitOps (ArgoCD managing itself)

Part 4: External Secrets Operator - Production Secret Management

Why External Secrets Operator?

External Secrets Operator (ESO) pulls secrets from external secret stores (Azure Key Vault, AWS Secrets Manager, HashiCorp Vault) and creates Kubernetes Secrets. This means:

  • Single source of truth: Secrets live in Key Vault, not Git
  • Rotation: Update in Key Vault, ESO syncs automatically
  • Audit: Key Vault logs all secret access
  • No sealed-secrets complexity: No encryption/decryption dance

Azure Key Vault Integration

First, ensure your AKS cluster can access Key Vault. We'll use Workload Identity (the modern, production way).

Update your Terraform AKS module to enable Workload Identity:

modules/aks/main.tf - Add to the cluster config:
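The repo has the full module; the relevant fragment looks roughly like this (identity and credential names are assumptions):

```hcl
resource "azurerm_kubernetes_cluster" "main" {
  # ...existing cluster configuration...

  oidc_issuer_enabled       = true
  workload_identity_enabled = true
}

# Identity that External Secrets Operator will run as
resource "azurerm_user_assigned_identity" "external_secrets" {
  name                = "external-secrets-identity"
  location            = var.location
  resource_group_name = var.resource_group_name
}

# Trust tokens issued to the ESO service account by this cluster's OIDC issuer
resource "azurerm_federated_identity_credential" "external_secrets" {
  name                = "external-secrets"
  resource_group_name = var.resource_group_name
  parent_id           = azurerm_user_assigned_identity.external_secrets.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.main.oidc_issuer_url
  subject             = "system:serviceaccount:external-secrets-system:external-secrets-sa"
}
```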

Create Key Vault via Terraform:

modules/key-vault/main.tf
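Again, the repo carries the complete module; a minimal sketch (variable names are assumptions, and RBAC-based Key Vault access is assumed):

```hcl
data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "main" {
  name                      = var.key_vault_name
  location                  = var.location
  resource_group_name       = var.resource_group_name
  tenant_id                 = data.azurerm_client_config.current.tenant_id
  sku_name                  = "standard"
  enable_rbac_authorization = true
  purge_protection_enabled  = true
}

# Let the External Secrets identity read secrets from the vault
resource "azurerm_role_assignment" "eso_kv_reader" {
  scope                = azurerm_key_vault.main.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = var.external_secrets_principal_id
}
```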

Configuring External Secrets with Workload Identity

kubernetes/infrastructure/external-secrets/service-account.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets-sa
  namespace: external-secrets-system
  annotations:
    azure.workload.identity/client-id: "${EXTERNAL_SECRETS_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"

ArgoCD ApplicationSet (For Multi-Environment)

If you want to parameterize per environment:

kubernetes/bootstrap/applicationset-infrastructure.yaml:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: infrastructure
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/your-org/your-repo
        revision: main
        directories:
          - path: kubernetes/infrastructure/*
  
  template:
    metadata:
      name: '{{path.basename}}'
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/your-repo
        targetRevision: main
        path: '{{path}}'
      
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Note that there is no syncPolicy.syncWaves field in ArgoCD. Ordering between components comes from the argocd.argoproj.io/sync-wave annotation, which must be an integer set on each child Application's manifests: for example, wave "0" on cert-manager, "1" on nginx-ingress, and "2" on external-secrets.

Using External Secrets in Applications

Now that ESO is configured, let's use it. First, add secrets to Key Vault:

Azure CLI:

az keyvault secret set \
  --vault-name devakskv \
  --name database-password \
  --value "super-secret-password"

Create an ExternalSecret to sync it:

kubernetes/applications/dev/myapp/external-secret.yaml:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: dev-myapp
spec:
  refreshInterval: 1h  # Sync every hour
  
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  
  target:
    name: myapp-secrets  # Name of the K8s Secret to create
    creationPolicy: Owner
    template:
      type: Opaque
      metadata:
        labels:
          app: myapp
      data:
        # You can transform secrets here
        DATABASE_URL: "postgresql://user:{{ .password }}@db.example.com:5432/myapp"
  
  data:
    - secretKey: password  # Key in the K8s Secret
      remoteRef:
        key: database-password  # Key in Key Vault
    
    - secretKey: api-key
      remoteRef:
        key: external-api-key
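One subtlety worth flagging: when target.template is set, ESO renders the template into the Secret, and with the default mergePolicy: Replace the raw data keys (password, api-key) may not be emitted alongside the templated ones. Since the Deployment below also reads api-key directly, you likely want an explicit merge (this is my reading of ESO's behavior; verify against the version you run):

```yaml
  target:
    name: myapp-secrets
    creationPolicy: Owner
    template:
      mergePolicy: Merge  # keep raw data keys alongside templated ones
      type: Opaque
      data:
        DATABASE_URL: "postgresql://user:{{ .password }}@db.example.com:5432/myapp"
```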

Use it in your Deployment:

kubernetes/applications/dev/myapp/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: dev-myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myacr.azurecr.io/myapp:v1.0.0
        env:
          - name: DATABASE_URL
            valueFrom:
              secretKeyRef:
                name: myapp-secrets
                key: DATABASE_URL
          
          - name: API_KEY
            valueFrom:
              secretKeyRef:
                name: myapp-secrets
                key: api-key
        
        # Or mount as files
        volumeMounts:
          - name: secrets
            mountPath: /etc/secrets
            readOnly: true
      
      volumes:
        - name: secrets
          secret:
            secretName: myapp-secrets

ArgoCD Application for Your App

kubernetes/argocd/applications/myapp-dev.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-dev
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: dev-apps
  
  source:
    repoURL: https://github.com/your-org/your-repo
    targetRevision: main
    path: kubernetes/applications/dev/myapp
  
  destination:
    server: https://kubernetes.default.svc
    namespace: dev-myapp
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  
  # Health assessment
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # Ignore if HPA manages this

Complete GitOps Workflow

1. Developer workflow:

# Make changes to application code

# Build and push image
docker build -t myacr.azurecr.io/myapp:v1.2.4 .
docker push myacr.azurecr.io/myapp:v1.2.4

# Update the app's Kustomize manifest
# Change newTag: v1.2.4
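That manifest change is a one-line image-tag bump in the app's Kustomize file, assuming a layout like this (path and resource list are assumptions):

```yaml
# kubernetes/applications/dev/myapp/kustomization.yaml (assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - external-secret.yaml

images:
  - name: myacr.azurecr.io/myapp
    newTag: v1.2.4   # the only line that changes per release
```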

2. ArgoCD detects the change:

Within 3 minutes (configurable), ArgoCD:

  • Polls the Git repo
  • Detects the commit
  • Runs kustomize build on the path
  • Compares with cluster state
  • Syncs the new image tag
  • Kubernetes rolls out the new version

Conclusion: From Zero to Production-Ready

We've built a complete production-grade Kubernetes platform from scratch. Let's recap what you've accomplished:

Infrastructure as Code:

  • Modular Terraform configuration for AKS
  • Separate node pools for system and application workloads
  • Network isolation with Azure CNI
  • Azure Container Registry integration
  • Key Vault for secrets management with Workload Identity

GitOps Foundation:

  • ArgoCD bootstrapped entirely through Terraform
  • Zero manual kubectl or helm commands required
  • App of Apps pattern for managing cluster infrastructure
  • Self-healing, automated sync policies

Production Infrastructure:

  • cert-manager for automatic SSL certificate management
  • NGINX Ingress Controller with autoscaling
  • External Secrets Operator syncing from Azure Key Vault
  • Automatic pod restarts on secret rotation (for example, with Reloader)

Application Deployment:

  • Kustomize-based configuration management
  • Environment-specific overlays (dev, staging, prod)
  • Horizontal Pod Autoscaling
  • Complete GitOps workflow from code commit to deployment

What makes this production-grade:

Repeatability: terraform apply creates identical environments every time

Auditability: Every change tracked in Git commits

Self-service: Developers deploy by pushing to Git

No configuration drift: ArgoCD enforces desired state

Secure by default: Workload Identity, RBAC, network policies

The GitHub Repository

All the Terraform modules and Kubernetes manifests referenced in this article are available in the companion repository:

github.com/Ekene-Chris/aks-terraform-argocd

The repository includes:

  • Complete Terraform modules (networking, AKS, ACR, Key Vault, ArgoCD bootstrap)
  • ArgoCD App of Apps configurations
  • External Secrets examples

What's Next?

This foundation gives you everything you need to run applications in production. From here, you can extend with:

Observability - Prometheus, Grafana, and Loki for metrics, dashboards, and logs

Progressive Delivery - Argo Rollouts for blue/green and canary deployments

Policy Enforcement - OPA Gatekeeper or Kyverno for security and compliance policies

Disaster Recovery - Velero for cluster backups and cross-region replication strategies

Cost Optimization - KEDA for event-driven autoscaling and spot instance strategies

We'll cover these in future articles. Each builds on this foundation without requiring architectural changes.

If you're a mid-level engineer looking to operate at a senior level, this is how you think. You build systems that teams can use reliably, not one-off deployments that work "for now." The difference between "it works on my machine" and "it works in production" isn't luck—it's architecture, automation, and discipline. You now have all three.

You Might Also Like