YAML-Driven Terraform: Building a Self-Service Infrastructure Catalog
In this article
- The Problem with “Just Use Modules”
- Architecture: How It Works
- The Catalog Layer
- The Provisioning Layer
- The CI/CD Pipeline
- From YAML Catalog to API-Backed Catalog
- Why This Works Better Than Terraform Modules Alone
- Why Most Platform Catalogs Fail
- Why Not Terragrunt?
- Patterns That Matter
  - One stack per resource type
  - Use try() liberally for optional fields
  - Flatten nested structures into keyed maps
  - Merge tags from multiple sources
  - Separate state per stack, per environment
- Getting Started
- Putting It All Together

Every platform team hits the same wall. Application teams want cloud resources. They don’t want to learn Terraform. They want to fill in a form (or better, commit a YAML file) and get a working environment. Meanwhile, the platform team wants consistency, governance, and the ability to sleep at night.
The usual answer is Terraform modules. You write reusable modules, document them, and ask teams to use them. This works for a while. Then you end up with 30 slightly different main.tf files across 30 repositories, each one a creative interpretation of your module documentation.
A pattern that works well in larger cloud environments is a catalog-driven provisioning model. Infrastructure intent is captured in YAML. Terraform reads the YAML, transforms it into resource maps, and applies the infrastructure. The platform team controls the provisioning engine. Application teams interact with a catalog contract, whether that’s through YAML files, a portal form, or an API.
The Problem with “Just Use Modules”
Terraform modules are the right building block, but they are the wrong interface for application teams.
Application developers think in YAML, JSON, or environment variables, not in resource blocks and data sources. When teams copy a Terraform root module and modify it, you get drift: different backend configurations, inconsistent naming, forgotten tags. If the platform team reviews every Terraform PR, you become the bottleneck. If you do not, you get surprises in production. And updating a module version across 30 consumers means 30 PRs, 30 plan reviews, and 30 deployment windows.
The YAML-driven approach solves all of these by separating what teams want from how it gets provisioned.
Related reading: Terraform module best practices · Azure CAF Terraform modules
Architecture: How It Works
At a high level, the pattern separates catalog data from provisioning logic, with a thin orchestration layer connecting the two.
The Catalog Layer
Infrastructure intent is defined here. A service registry maps resource types to their configuration, and each type has its own directory with YAML files per environment:
# services.yml - the service registry
networking:
  config_path: resources/networking
secrets:
  config_path: resources/secrets
storage:
  config_path: resources/storage
compute:
  config_path: resources/compute
catalog/
├── services.yml              # Service registry
└── resources/
    ├── networking/
    │   ├── dev/
    │   │   └── platform.yml  # Dev network config
    │   └── prd/
    │       └── platform.yml  # Prod network config
    ├── secrets/
    │   ├── dev/
    │   │   └── platform.yml
    │   └── prd/
    │       └── platform.yml
    └── compute/
        ├── dev/
        │   └── workstations.yml
        └── prd/
            └── app-servers.yml
The YAML files capture infrastructure intent at a level that is readable by platform consumers, while the provisioning layer maps that intent to provider-specific resources:
# resources/networking/dev/platform.yml
vnets:
  - name: vnet-app-dev-01
    resource_group: rg-network-dev-01
    location: westeurope
    address_spaces:
      - 10.20.0.0/16
    subnets:
      - name: snet-app-dev-01
        address_prefix: 10.20.1.0/24
        nsg: nsg-app-dev-01
        route_table: rt-app-dev-01
        service_endpoints:
          - Microsoft.KeyVault
          - Microsoft.Storage
      - name: snet-data-dev-01
        address_prefix: 10.20.2.0/24
        nsg: nsg-data-dev-01
        delegation:
          name: databricks
          service: Microsoft.Databricks/workspaces

nsgs:
  - name: nsg-app-dev-01
    resource_group: rg-network-dev-01
    location: westeurope
    rules:
      - name: AllowHTTPS
        priority: 100
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_prefix: VirtualNetwork
        destination_ports: ["443"]
      - name: DenyAllInbound
        priority: 4096
        direction: Inbound
        access: Deny
        protocol: "*"
        source_prefix: "*"
        destination_ports: ["*"]

route_tables:
  - name: rt-app-dev-01
    resource_group: rg-network-dev-01
    location: westeurope
    enable_bgp_route_propagation: false
    routes:
      - name: to-firewall
        address_prefix: "0.0.0.0/0"
        next_hop_type: VirtualAppliance
        next_hop_in_ip_address: "192.0.2.10"
Application teams submit this in a pull request. It is easy to read, easy to review, and the schema is well-understood. No HCL knowledge needed.
The Provisioning Layer
The provisioning layer contains one Terraform stack per resource type. Each stack follows the same structure: a locals.tf that loads and transforms the YAML, and a main.tf that creates resources with for_each.
The critical piece is the YAML-to-map transformation:
# Load the service registry and resolve the config file path
locals {
  services    = yamldecode(file("../../catalog/services.yml"))
  config_path = "../../catalog/${local.services[var.stack].config_path}/${var.env}/${var.file}.yml"
  config      = yamldecode(file(local.config_path))
}
# Transform YAML lists into keyed maps for for_each
locals {
  vnets = { for v in try(local.config.vnets, []) : v.name => v }
  nsgs  = { for n in try(local.config.nsgs, []) : n.name => n }
}
# Flatten nested structures (subnets live inside vnets in YAML)
locals {
  subnets = {
    for s in flatten([
      for v in values(local.vnets) : [
        for sn in try(v.subnets, []) : merge(sn, {
          vnet_name      = v.name
          resource_group = v.resource_group
        })
      ]
    ]) : "${s.vnet_name}/${s.name}" => s
  }
}
Then main.tf uses for_each on these transformed maps:
resource "azurerm_virtual_network" "this" {
  for_each = local.vnets

  name                = each.value.name
  location            = each.value.location
  resource_group_name = each.value.resource_group
  address_space       = each.value.address_spaces

  tags = merge(
    data.azurerm_resource_group.rg[each.value.resource_group].tags,
    try(each.value.tags, {}),
    local.tags
  )
}

resource "azurerm_subnet" "this" {
  for_each = local.subnets

  name                 = each.value.name
  resource_group_name  = each.value.resource_group
  virtual_network_name = azurerm_virtual_network.this[each.value.vnet_name].name
  address_prefixes     = [each.value.address_prefix]
  service_endpoints    = try(each.value.service_endpoints, [])
}
No child modules, no abstraction layers. The stack directly creates resources using for_each on the YAML-derived maps. The try() function handles optional fields. Deliberately simple.
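The locals above assume only three inputs: the stack name, the environment, and the config file name. A minimal sketch of the matching variable declarations (the descriptions are illustrative; the pipeline wrapper passes the values via -var flags):

```hcl
# variables.tf - the only inputs a stack needs
variable "stack" {
  type        = string
  description = "Key into services.yml, e.g. \"networking\""
}

variable "env" {
  type        = string
  description = "Environment directory, e.g. \"dev\" or \"prd\""
}

variable "file" {
  type        = string
  description = "Config file name without the .yml extension, e.g. \"platform\""
}
```

Keeping the variable surface this small is what lets every stack share the same pipeline template.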
Azure docs: Cloud Adoption Framework - Platform automation · Terraform on Azure best practices
The CI/CD Pipeline
The pipeline orchestrates stacks with explicit dependency ordering. A thin wrapper handles the plumbing: loading backend configuration, constructing the config path from the service registry, and calling Terraform with the right variables:
# azure-pipelines.yml (simplified)
parameters:
  - name: run_mode
    type: string
    default: plan
    values: [plan, apply, destroy]
  - name: env
    type: string
  - name: file
    type: string

stages:
  - template: templates/run_stack.yml
    parameters:
      stack: resource_group
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: []
  - template: templates/run_stack.yml
    parameters:
      stack: networking
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [resource_group]
  - template: templates/run_stack.yml
    parameters:
      stack: secrets
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [networking]
  - template: templates/run_stack.yml
    parameters:
      stack: compute
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [secrets]
Each stack stage runs terraform init with the right backend key, then plan or apply depending on the run mode. The state key is derived from the stack, environment, and file name, giving you isolated state per deployment unit.
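One way to wire this up is Terraform's partial backend configuration: the backend block stays empty in the stack, and the wrapper injects the settings at init time. A sketch, with illustrative values (the storage account and container names are placeholders, not from the original setup):

```hcl
# backend.tf - intentionally empty; the pipeline wrapper supplies the
# settings at init time, something like:
#
#   terraform init \
#     -backend-config="key=networking/dev/platform.tfstate" \
#     -backend-config="storage_account_name=<state-account>" \
#     -backend-config="container_name=<state-container>"
terraform {
  backend "azurerm" {}
}
```

Because the key is built from stack, environment, and file, no two deployment units can ever share a state file.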
Azure docs: Azure DevOps Pipelines with Terraform · Environment approvals and gates
From YAML Catalog to API-Backed Catalog
In smaller teams, a pull request with a YAML file is a perfectly good self-service interface. As the platform matures, many teams add an API layer in front of the catalog. Instead of committing configuration directly, users submit a request to a catalog API. The API validates the payload, applies policy checks, stores the approved request in the catalog, and triggers the provisioning workflow. The underlying Terraform stacks remain unchanged; only the intake experience evolves.
A conceptual request might look like:
POST /catalog/requests
{
  "serviceType": "networking",
  "environment": "dev",
  "spec": {
    "name": "app-network",
    "addressSpace": ["10.20.0.0/16"],
    "subnets": [
      { "name": "app", "prefix": "10.20.1.0/24" }
    ]
  }
}
The API validates the request against a schema, the policy layer enforces allowed regions, sizes, and tags, and approved requests are stored as catalog data. The provisioning pipeline reconciles desired state through Terraform, the same way it would with a YAML commit.
This hybrid model keeps YAML as the canonical declarative format, uses an API as the intake and validation layer, and retains Git as the audit trail and execution trigger. Whether your real setup is GitOps-first, API-first, or mixed, the architecture supports all three.
Tools like Backstage fit well as a developer portal front-end for this kind of catalog. Azure API Management can serve as the API gateway with schema validation built in.
Why This Works Better Than Terraform Modules Alone
The key differences with the YAML-driven approach:
The catalog layer is the interface. The provisioning layer is the engine. Different teams own different parts, with different review cycles and release cadences.
Teams that define infrastructure never see a .tf file. They describe what they want, not how to build it. Each stack uses for_each directly on resources, so there’s no module inception and no abstraction layers to debug through.
The YAML or API schema acts as the policy. You can’t order a VM size that isn’t in the catalog. You can’t skip tags. The shape of the request enforces the rules. And every infrastructure change is a diff in git: who changed what, when, and why.
Why Most Platform Catalogs Fail
Before building a catalog, it helps to understand why so many of them end up abandoned.
The most common failure: the catalog becomes a glorified ticketing form. Teams submit YAML or fill in a portal, but someone on the platform team still manually reviews every request, tweaks parameters, and runs terraform apply by hand. The self-service part is an illusion. Provisioning time stays the same, just with extra steps.
Module inconsistency kills trust early. Three teams wrote three VNet modules with different naming conventions, different tag structures, and different opinions on subnet sizing. Application teams pick whichever module they find first. The catalog offers choices that produce inconsistent results.
Policy enforcement happens in Slack instead of in the pipeline. Someone reviews the PR and says “you can’t use that VM size in production” or “you forgot the cost center tag.” If the pipeline does not reject non-compliant requests automatically, the catalog is just a suggestion box.
Too many deployment paths undermine adoption. Some teams use the catalog. Others have their own Terraform repositories. A few still provision through the Azure portal. When three paths coexist, the catalog never reaches critical mass.
Finally, nobody measures whether the catalog actually works. No tracking of adoption rates, provisioning times, or how often teams bypass the catalog entirely. Without a feedback loop, the platform team has no signal on what to improve.
Why Not Terragrunt?
Terragrunt is the obvious question. It solves real problems: DRY backend configuration, dependency management between modules, and generating provider blocks. It works well when those are the main pain points.
But Terragrunt doesn’t solve the interface problem. Application teams still write HCL (or at least terragrunt.hcl). They still need to understand module inputs, variable types, and how include blocks work. Terragrunt makes Terraform easier for Terraform users. It doesn’t make infrastructure provisioning accessible to teams that don’t want to learn Terraform at all.
The same applies to Spacelift, Atlantis, and env0. They’re pipeline orchestrators. They solve “how do I run Terraform safely at scale” but not “how do I let application teams define infrastructure without writing HCL.” Those are different problems. Pipeline orchestrators work well alongside the catalog pattern, but they don’t replace it.
The YAML boundary is the differentiator. Application teams never touch HCL. They describe a VNet, a Key Vault, or a set of VMs. The platform team’s Terraform code reads that description and provisions the resources. If the provisioning engine switched to Pulumi or OpenTofu tomorrow, the catalog interface wouldn’t change.
Patterns That Matter
One stack per resource type
Don’t bundle unrelated resources. A networking stack handles VNets, subnets, NSGs, and route tables: resources that naturally belong together. But compute and networking go in separate stacks with separate state files and separate pipeline stages.
Use try() liberally for optional fields
The YAML should only contain what needs to vary. Everything else gets a default via Terraform’s try() function:
dns_servers        = try(each.value.dns_servers, [])
enable_bgp         = try(each.value.enable_bgp_route_propagation, true)
private_ip_address = try(each.value.private_ip_address, null)
This keeps the YAML configs clean. A simple subnet doesn’t need 20 fields.
Flatten nested structures into keyed maps
YAML is hierarchical. Terraform’s for_each wants flat maps. The transformation layer bridges this gap. This pattern works for any parent-child relationship: NSG rules inside NSGs, routes inside route tables, containers inside storage accounts.
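For instance, the same flatten-and-key recipe applied to NSG rules, mirroring the subnet transformation shown earlier (local.nsgs is the keyed map built in the transformation layer):

```hcl
# Flatten NSG rules (rules live inside nsgs in YAML)
locals {
  nsg_rules = {
    for r in flatten([
      for n in values(local.nsgs) : [
        for rule in try(n.rules, []) : merge(rule, { nsg_name = n.name })
      ]
    ]) : "${r.nsg_name}/${r.name}" => r
  }
}
```

The compound "parent/child" key guarantees uniqueness across parents, so two NSGs can both have an AllowHTTPS rule without colliding in for_each.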

Merge tags from multiple sources
Tags should come from three layers: the resource group (inherited), the YAML config (custom per resource), and the platform (automatic metadata). Merge them in that order so platform tags always win.
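This works because HCL's merge() keeps the last value for any duplicate key, so argument order encodes precedence. A toy illustration (the tag values here are made up):

```hcl
locals {
  effective_tags = merge(
    { env = "dev", owner = "app-team" },        # layer 1: inherited from the resource group
    { owner = "data-team", costcenter = "42" }, # layer 2: custom tags from the YAML config
    { managed_by = "platform-catalog" }         # layer 3: platform metadata, applied last
  )
  # owner resolves to "data-team" (layer 2 beats layer 1),
  # and managed_by can never be overridden by a consumer.
}
```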
Separate state per stack, per environment
Each combination of stack + environment + file gets its own state file. This isolates blast radius. A failed compute deployment doesn’t lock or corrupt networking state.
Getting Started
You don’t need to build the full catalog on day one. Start with:
- Pick one resource type. Networking is usually the best first candidate because every environment needs it and it has natural sub-resources (VNets, subnets, NSGs).
- Define the schema. Write a sample YAML that describes your current environment. If the YAML is readable by a non-Terraform person, the schema is right.
- Write the stack. One transformation layer that converts YAML to maps. One resource file that creates resources with for_each. Keep it flat.
- Add a pipeline. Start with plan-only. Add apply after you trust the pattern.
- Expand. Add resource types as demand grows. Each new stack follows the same pattern.
- Add an API when Git PRs become a bottleneck. Not before.
The teams that succeed with this approach start small, prove the pattern works, and expand organically. The teams that fail try to build a platform for every possible use case before anyone has submitted a single request.
Azure docs: Azure landing zone Terraform modules · Terraform state management
Putting It All Together
A YAML-driven infrastructure catalog isn’t a framework or a product you install. It is a pattern: separating configuration from implementation, with a clear contract between the teams that define infrastructure and the teams that provision it.
The platform team owns the provisioning engine. Application teams own the catalog data. The YAML or API boundary between them is the contract. Whether that boundary is a Git pull request, an API call, or a portal form depends on team maturity and scale.
If your platform team spends more time reviewing copy-pasted Terraform modules than improving the platform, a catalog is the way out.
Related: Bicep vs Terraform: Why We Default to Terraform (and When Bicep Wins) explains our thinking on when to use Terraform and when Bicep is the better choice.