
YAML-Driven Terraform: Building a Self-Service Infrastructure Catalog

By GenioCT | 12 min read
Terraform DevOps Azure IaC


A YAML-driven product catalog sits between application teams and the Terraform codebase, providing a self-service interface with guardrails.

Every platform team hits the same wall. Application teams want cloud resources. They don’t want to learn Terraform. They want to fill in a form (or better, commit a YAML file) and get a working environment. Meanwhile, the platform team wants consistency, governance, and the ability to sleep at night.

The usual answer is Terraform modules. You write reusable modules, document them, and ask teams to use them. This works for a while. Then you end up with 30 slightly different main.tf files across 30 repositories, each one a creative interpretation of your module documentation.

A pattern that works well in larger cloud environments is a catalog-driven provisioning model. Infrastructure intent is captured in YAML. Terraform reads the YAML, transforms it into resource maps, and applies the infrastructure. The platform team controls the provisioning engine. Application teams interact with a catalog contract, whether that’s through YAML files, a portal form, or an API.

The Problem with “Just Use Modules”

Terraform modules are the right building block, but they are the wrong interface for application teams.

Application developers think in YAML, JSON, or environment variables, not in resource blocks and data sources. When teams copy a Terraform root module and modify it, you get drift: different backend configurations, inconsistent naming, forgotten tags. If the platform team reviews every Terraform PR, you become the bottleneck. If you do not, you get surprises in production. And updating a module version across 30 consumers means 30 PRs, 30 plan reviews, and 30 deployment windows.

The YAML-driven approach solves all of these by separating what teams want from how it gets provisioned.

Related reading: Terraform module best practices · Azure CAF Terraform modules

Architecture: How It Works

At a high level, the pattern separates catalog data from provisioning logic, with a thin orchestration layer connecting the two.

The Catalog Layer

Infrastructure intent is defined here. A service registry maps resource types to their configuration, and each type has its own directory with YAML files per environment:

# services.yml - the service registry
networking:
  config_path: resources/networking
secrets:
  config_path: resources/secrets
storage:
  config_path: resources/storage
compute:
  config_path: resources/compute

The resulting repository layout, with one directory per resource type and per environment:

catalog/
├── services.yml                    # Service registry
└── resources/
    ├── networking/
    │   ├── dev/
    │   │   └── platform.yml        # Dev network config
    │   └── prd/
    │       └── platform.yml        # Prod network config
    ├── secrets/
    │   ├── dev/
    │   │   └── platform.yml
    │   └── prd/
    │       └── platform.yml
    └── compute/
        ├── dev/
        │   └── workstations.yml
        └── prd/
            └── app-servers.yml

The YAML files capture infrastructure intent at a level that is readable by platform consumers, while the provisioning layer maps that intent to provider-specific resources:

# resources/networking/dev/platform.yml
vnets:
  - name: vnet-app-dev-01
    resource_group: rg-network-dev-01
    location: westeurope
    address_spaces:
      - 10.20.0.0/16
    subnets:
      - name: snet-app-dev-01
        address_prefix: 10.20.1.0/24
        nsg: nsg-app-dev-01
        route_table: rt-app-dev-01
        service_endpoints:
          - Microsoft.KeyVault
          - Microsoft.Storage
      - name: snet-data-dev-01
        address_prefix: 10.20.2.0/24
        nsg: nsg-data-dev-01
        delegation:
          name: databricks
          service: Microsoft.Databricks/workspaces

nsgs:
  - name: nsg-app-dev-01
    resource_group: rg-network-dev-01
    location: westeurope
    rules:
      - name: AllowHTTPS
        priority: 100
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_prefix: VirtualNetwork
        destination_ports: ["443"]
      - name: DenyAllInbound
        priority: 4096
        direction: Inbound
        access: Deny
        protocol: "*"
        source_prefix: "*"
        destination_ports: ["*"]

route_tables:
  - name: rt-app-dev-01
    resource_group: rg-network-dev-01
    location: westeurope
    enable_bgp_route_propagation: false
    routes:
      - name: to-firewall
        address_prefix: "0.0.0.0/0"
        next_hop_type: VirtualAppliance
        next_hop_in_ip_address: "192.0.2.10"

Application teams submit this in a pull request. It is easy to read, easy to review, and the schema is well-understood. No HCL knowledge needed.

The Provisioning Layer

The provisioning layer contains one Terraform stack per resource type. Each stack follows the same structure: a locals.tf that loads and transforms the YAML, and a main.tf that creates resources with for_each.

The critical piece is the YAML-to-map transformation:

# Load the service registry and resolve the config file path
locals {
  services    = yamldecode(file("../../catalog/services.yml"))
  config_path = "../../catalog/${local.services[var.stack].config_path}/${var.env}/${var.file}.yml"
  config      = yamldecode(file(local.config_path))
}

# Transform YAML lists into keyed maps for for_each
locals {
  vnets = { for v in try(local.config.vnets, []) : v.name => v }
  nsgs  = { for n in try(local.config.nsgs, [])  : n.name => n }
}

# Flatten nested structures (subnets live inside vnets in YAML)
locals {
  subnets = {
    for s in flatten([
      for v in values(local.vnets) : [
        for sn in try(v.subnets, []) : merge(sn, {
          vnet_name      = v.name
          resource_group = v.resource_group
        })
      ]
    ]) : "${s.vnet_name}/${s.name}" => s
  }
}

Then main.tf uses for_each on these transformed maps:

resource "azurerm_virtual_network" "this" {
  for_each = local.vnets

  name                = each.value.name
  location            = each.value.location
  resource_group_name = each.value.resource_group
  address_space       = each.value.address_spaces

  tags = merge(
    data.azurerm_resource_group.rg[each.value.resource_group].tags,
    try(each.value.tags, {}),
    local.tags
  )
}

resource "azurerm_subnet" "this" {
  for_each = local.subnets

  name                 = each.value.name
  resource_group_name  = each.value.resource_group
  virtual_network_name = azurerm_virtual_network.this[each.value.vnet_name].name
  address_prefixes     = [each.value.address_prefix]

  service_endpoints = try(each.value.service_endpoints, [])
}

No child modules, no abstraction layers. The stack directly creates resources using for_each on the YAML-derived maps. The try() function handles optional fields. Deliberately simple.

Azure docs: Cloud Adoption Framework - Platform automation · Terraform on Azure best practices

The CI/CD Pipeline

The pipeline orchestrates stacks with explicit dependency ordering. A thin wrapper handles the plumbing: loading backend configuration, constructing the config path from the service registry, and calling Terraform with the right variables:

# azure-pipelines.yml (simplified)
parameters:
  - name: run_mode
    type: string
    default: plan
    values: [plan, apply, destroy]
  - name: env
    type: string
  - name: file
    type: string

stages:
  - template: templates/run_stack.yml
    parameters:
      stack: resource_group
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: []

  - template: templates/run_stack.yml
    parameters:
      stack: networking
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [resource_group]

  - template: templates/run_stack.yml
    parameters:
      stack: secrets
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [networking]

  - template: templates/run_stack.yml
    parameters:
      stack: compute
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [secrets]

Each stack stage runs terraform init with the right backend key, then plan or apply depending on the run mode. The state key is derived from the stack, environment, and file name, giving you isolated state per deployment unit.
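The wrapper's plumbing can be sketched in Python to make the path and key derivation concrete. This is a hypothetical sketch, not the article's actual wrapper: the registry is inlined here (in practice it comes from catalog/services.yml), the path layout follows the catalog structure above, and the Terraform invocation is shown only as a comment.

```python
# Hypothetical wrapper logic: resolve the config file and state key for a
# stack/env/file triple. The real wrapper might be Bash or a pipeline task.

# services.yml, parsed: resource type -> config directory (inlined here)
SERVICES = {
    "networking": {"config_path": "resources/networking"},
    "compute":    {"config_path": "resources/compute"},
}

def config_path(stack: str, env: str, file: str) -> str:
    """Resolve the YAML config file for a stack/env/file triple."""
    return f"catalog/{SERVICES[stack]['config_path']}/{env}/{file}.yml"

def state_key(stack: str, env: str, file: str) -> str:
    """Backend state key: one isolated state file per deployment unit."""
    return f"{stack}/{env}/{file}.tfstate"

# The wrapper would then call, for example:
# subprocess.run(["terraform", "init",
#                 f"-backend-config=key={state_key(stack, env, file)}"])

print(config_path("networking", "dev", "platform"))
# catalog/resources/networking/dev/platform.yml
print(state_key("compute", "prd", "app-servers"))
# compute/prd/app-servers.tfstate
```

Because the key is pure string derivation from the three inputs, any two deployments of the same stack in different environments (or from different catalog files) can never collide on state.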

Azure docs: Azure DevOps Pipelines with Terraform · Environment approvals and gates

From YAML Catalog to API-Backed Catalog

In smaller teams, a pull request with a YAML file is a perfectly good self-service interface. As the platform matures, many teams add an API layer in front of the catalog. Instead of committing configuration directly, users submit a request to a catalog API. The API validates the payload, applies policy checks, stores the approved request in the catalog, and triggers the provisioning workflow. The underlying Terraform stacks remain unchanged; only the intake experience evolves.

A conceptual request might look like:

POST /catalog/requests
{
  "serviceType": "networking",
  "environment": "dev",
  "spec": {
    "name": "app-network",
    "addressSpace": ["10.20.0.0/16"],
    "subnets": [
      { "name": "app", "prefix": "10.20.1.0/24" }
    ]
  }
}

The API validates the request against a schema, the policy layer enforces allowed regions, sizes, and tags, and approved requests are stored as catalog data. The provisioning pipeline reconciles desired state through Terraform, the same way it would with a YAML commit.
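The validation step can be sketched with hand-rolled checks (a real implementation would more likely use a JSON Schema validator; the field names follow the conceptual request above, and the allowed-value sets are illustrative):

```python
# Hypothetical intake validation for a catalog request.
ALLOWED_SERVICE_TYPES = {"networking", "secrets", "storage", "compute"}
ALLOWED_ENVIRONMENTS = {"dev", "prd"}

def validate_request(req: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the request passes."""
    errors = []
    for field in ("serviceType", "environment", "spec"):
        if field not in req:
            errors.append(f"missing field: {field}")
    if req.get("serviceType") not in ALLOWED_SERVICE_TYPES:
        errors.append(f"unknown serviceType: {req.get('serviceType')}")
    if req.get("environment") not in ALLOWED_ENVIRONMENTS:
        errors.append(f"unknown environment: {req.get('environment')}")
    return errors

request = {
    "serviceType": "networking",
    "environment": "dev",
    "spec": {"name": "app-network", "addressSpace": ["10.20.0.0/16"]},
}
assert validate_request(request) == []          # well-formed request passes
assert validate_request({"serviceType": "vpn"}) # malformed request: non-empty error list
```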

This hybrid model keeps YAML as the canonical declarative format, uses an API as the intake and validation layer, and retains Git as the audit trail and execution trigger. Whether your real setup is GitOps-first, API-first, or mixed, the architecture supports all three.

Tools like Backstage fit well as a developer portal front-end for this kind of catalog. Azure API Management can serve as the API gateway with schema validation built in.

Why This Works Better Than Terraform Modules Alone

The key differences with the YAML-driven approach:

The catalog layer is the interface. The provisioning layer is the engine. Different teams own different parts, with different review cycles and release cadences.

Teams that define infrastructure never see a .tf file. They describe what they want, not how to build it. Each stack uses for_each directly on resources, so there’s no module inception and no abstraction layers to debug through.

The YAML or API schema acts as the policy. You can’t order a VM size that isn’t in the catalog. You can’t skip tags. The shape of the request enforces the rules. And every infrastructure change is a diff in git: who changed what, when, and why.
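"The shape of the request enforces the rules" can be made concrete with a small policy check. The SKU names and tag keys below are examples, not a real catalog:

```python
# Hypothetical policy gate: only cataloged VM sizes are orderable, and
# cost-allocation tags cannot be skipped.
CATALOG_VM_SIZES = {"Standard_D2s_v5", "Standard_D4s_v5"}  # example SKUs
REQUIRED_TAGS = {"cost_center", "owner"}

def policy_errors(vm: dict) -> list[str]:
    errors = []
    if vm.get("size") not in CATALOG_VM_SIZES:
        errors.append(f"size not in catalog: {vm.get('size')}")
    missing = REQUIRED_TAGS - set(vm.get("tags", {}))
    if missing:
        errors.append(f"missing required tags: {sorted(missing)}")
    return errors

ok = {"size": "Standard_D2s_v5", "tags": {"cost_center": "1234", "owner": "team-a"}}
bad = {"size": "Standard_M128", "tags": {"owner": "team-a"}}
assert policy_errors(ok) == []
assert len(policy_errors(bad)) == 2  # uncataloged size + missing cost_center tag
```

Run in the pipeline rather than in review comments, this is what turns "you forgot the cost center tag" from a Slack message into an automatic rejection.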

Why Most Platform Catalogs Fail

Before building a catalog, it helps to understand why so many of them end up abandoned.

The most common failure: the catalog becomes a glorified ticketing form. Teams submit YAML or fill in a portal, but someone on the platform team still manually reviews every request, tweaks parameters, and runs terraform apply by hand. The self-service part is an illusion. Provisioning time stays the same, just with extra steps.

Module inconsistency kills trust early. When three teams write three VNet modules with different naming conventions, different tag structures, and different opinions on subnet sizing, application teams pick whichever module they find first. The catalog offers choices that produce inconsistent results.

Policy enforcement happens in Slack instead of in the pipeline. Someone reviews the PR and says “you can’t use that VM size in production” or “you forgot the cost center tag.” If the pipeline does not reject non-compliant requests automatically, the catalog is just a suggestion box.

Too many deployment paths undermine adoption. Some teams use the catalog. Others have their own Terraform repositories. A few still provision through the Azure portal. When three paths coexist, the catalog never reaches critical mass.

Finally, nobody measures whether the catalog actually works. No tracking of adoption rates, provisioning times, or how often teams bypass the catalog entirely. Without a feedback loop, the platform team has no signal on what to improve.

Why Not Terragrunt?

Terragrunt is the obvious question. It solves real problems: DRY backend configuration, dependency management between modules, and generating provider blocks. It works well when those are the main pain points.

But Terragrunt doesn’t solve the interface problem. Application teams still write HCL (or at least terragrunt.hcl). They still need to understand module inputs, variable types, and how include blocks work. Terragrunt makes Terraform easier for Terraform users. It doesn’t make infrastructure provisioning accessible to teams that don’t want to learn Terraform at all.

The same applies to Spacelift, Atlantis, and env0. They’re pipeline orchestrators. They solve “how do I run Terraform safely at scale” but not “how do I let application teams define infrastructure without writing HCL.” Those are different problems. Pipeline orchestrators work well alongside the catalog pattern, but they don’t replace it.

The YAML boundary is the differentiator. Application teams never touch HCL. They describe a VNet, a Key Vault, or a set of VMs. The platform team’s Terraform code reads that description and provisions the resources. If the provisioning engine switched to Pulumi or OpenTofu tomorrow, the catalog interface wouldn’t change.

Patterns That Matter

One stack per resource type

Don’t bundle unrelated resources. A networking stack handles VNets, subnets, NSGs, and route tables, which are naturally related. But compute and networking go in separate stacks with separate state files and separate pipeline stages.

Use try() liberally for optional fields

The YAML should only contain what needs to vary. Everything else gets a default via Terraform’s try() function:

dns_servers         = try(each.value.dns_servers, [])
enable_bgp          = try(each.value.enable_bgp_route_propagation, true)
private_ip_address  = try(each.value.private_ip_address, null)

This keeps the YAML configs clean. A simple subnet doesn’t need 20 fields.

Flatten nested structures into keyed maps

YAML is hierarchical. Terraform’s for_each wants flat maps. The transformation layer bridges this gap. This pattern works for any parent-child relationship: NSG rules inside NSGs, routes inside route tables, containers inside storage accounts.
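The same flatten-then-key transformation can be expressed in Python for illustration, here for the routes-inside-route-tables case (field names match the YAML earlier in the article):

```python
# Flatten routes nested inside route tables into a flat map keyed by
# "table/route", mirroring the Terraform locals shown above.
route_tables = [
    {"name": "rt-app-dev-01", "routes": [
        {"name": "to-firewall", "address_prefix": "0.0.0.0/0"},
    ]},
    {"name": "rt-data-dev-01", "routes": []},  # a table with no routes is fine
]

routes = {
    f"{rt['name']}/{r['name']}": {**r, "route_table": rt["name"]}
    for rt in route_tables
    for r in rt.get("routes", [])       # .get plays the role of Terraform's try()
}

assert list(routes) == ["rt-app-dev-01/to-firewall"]
assert routes["rt-app-dev-01/to-firewall"]["route_table"] == "rt-app-dev-01"
```

The composite "parent/child" key matters: it stays stable when routes are reordered in the YAML, so for_each does not destroy and recreate resources on a cosmetic diff.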

The product catalog flow: teams submit YAML orders, the engine validates and provisions, the platform team controls the catalog.

Merge tags from multiple sources

Tags should come from three layers: the resource group (inherited), the YAML config (custom per resource), and the platform (automatic metadata). Merge them in that order so platform tags always win.
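Merge precedence is easy to get wrong. A quick Python analogue of Terraform's merge(), where later sources win on key conflicts (tag names are illustrative):

```python
# Terraform's merge() takes the later map's value on key conflicts.
# Platform tags go last so they always win.
rg_tags       = {"env": "dev", "owner": "team-a"}         # inherited from resource group
yaml_tags     = {"owner": "team-b", "purpose": "demo"}    # custom, per resource
platform_tags = {"managed_by": "platform", "env": "dev"}  # automatic metadata

tags = {**rg_tags, **yaml_tags, **platform_tags}

assert tags["owner"] == "team-b"         # YAML overrides the inherited tag
assert tags["managed_by"] == "platform"  # platform tag is always present
```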

Separate state per stack, per environment

Each combination of stack + environment + file gets its own state file. This isolates blast radius. A failed compute deployment doesn’t lock or corrupt networking state.

Getting Started

You don’t need to build the full catalog on day one. Start with:

  1. Pick one resource type. Networking is usually the best first candidate because every environment needs it and it has natural sub-resources (VNets, subnets, NSGs).
  2. Define the schema. Write a sample YAML that describes your current environment. If the YAML is readable by a non-Terraform person, the schema is right.
  3. Write the stack. One transformation layer that converts YAML to maps. One resource file that creates resources with for_each. Keep it flat.
  4. Add a pipeline. Start with plan-only. Add apply after you trust the pattern.
  5. Expand. Add resource types as demand grows. Each new stack follows the same pattern.
  6. Add an API when Git PRs become a bottleneck. Not before.

The teams that succeed with this approach start small, prove the pattern works, and expand organically. The teams that fail try to build a platform for every possible use case before anyone has submitted a single request.

Azure docs: Azure landing zone Terraform modules · Terraform state management

Putting It All Together

A YAML-driven infrastructure catalog isn’t a framework or a product you install. It is a pattern: separating configuration from implementation, with a clear contract between the teams that define infrastructure and the teams that provision it.

The platform team owns the provisioning engine. Application teams own the catalog data. The YAML or API boundary between them is the contract. Whether that boundary is a Git pull request, an API call, or a portal form depends on team maturity and scale.

If your platform team spends more time reviewing copy-pasted Terraform modules than improving the platform, a catalog is the way out.

Related: Bicep vs Terraform: Why We Default to Terraform (and When Bicep Wins) explains our thinking on when to use Terraform and when Bicep is the better choice.

Ready to automate your infrastructure?

From Infrastructure as Code to CI/CD pipelines, we help teams ship faster with confidence and less manual overhead.

Typical engagement: 2-6 weeks depending on scope.
Discuss your automation goals

Start with a Platform Health Check

Not sure where to begin? A quick architecture review gives you a clear picture. No obligation.

  • Risk scorecard across identity, network, governance, and security
  • Top 10 issues ranked by impact and effort
  • 30-60-90 day roadmap with quick wins