ARCH: The NextGen IaC Generator

A Journey from Infrastructure Chaos to Organized Cloud Management

TL;DR: At Transmit Security, we developed ARCH (Automated Reusable Configuration Hierarchy), a comprehensive IaC management system that transforms how we handle infrastructure code across multiple products, environments, and cloud providers. By using a hierarchical YAML structure with templates to generate HCL files, we've reduced setup time by 85%, eliminated configuration drift, and enabled faster team mobility. This article details our journey, implementation, and the real-world impact of standardizing our infrastructure approach.


Let me tell you about the time we almost lost our minds trying to keep track of our cloud infrastructure at Transmit Security.

It started innocently enough. We had one product, a couple of environments, some AWS accounts, and a bit of GCP thrown in. Fast forward five years, and we were drowning in a sea of IaC configurations spread across dozens of product repositories with no clear standards.

"Which module version are we using in production again?" became a daily question. "Why does the staging environment have a different VPC setup than prod?" was another favorite. And my personal nightmare: "Why does this service have different permissions across different environments?"

Sound familiar? I thought so.

After countless conflicts that forced us to create new cloud resources in several environments and import them into our state, we knew something had to give. That's when we embarked on building ARCH, what I now consider our secret weapon.

ARCH, our Automated Reusable Configuration Hierarchy, is a comprehensive Infrastructure as Code management system that has completely transformed how we handle our infrastructure code.

As a fast-growing company with many products and environments running on all major clouds, we needed a better way to manage our infrastructure and to scale quickly. Our identity and security solutions were expanding rapidly across AWS, GCP, and Microsoft Azure, and the traditional approach to infrastructure management simply couldn't keep pace with our growth trajectory. We needed something that would eliminate bottlenecks, standardize deployments, and give our teams the agility to innovate without being bogged down by infrastructure complexities.

The Infrastructure Management Challenge

If you've worked with infrastructure at scale, you know the pain points:

  • Configuration drift between environments
  • Knowledge silos where only certain team members understand certain parts of the infrastructure
  • Inconsistent implementations of similar resources across different projects
  • Tedious, error-prone manual processes for creating new environments or services

Before our current system, our per-product repositories were filled with different standards, inconsistent naming conventions, and a significant amount of duplicated, unorganized code. Making a simple change across all environments often meant updating dozens of files, with the constant risk of missing something important.

Our Solution: A Hierarchical, Templated IaC System

After evaluating various approaches, we designed a system that leverages Terragrunt and Gruntwork's boilerplate to generate and maintain HCL files.

Core Concepts

  • Define infrastructure values in a hierarchical and composable YAML structure
  • Use templates to generate the actual HCL configuration files
  • Automate everything with GitHub Actions

The Building Blocks

Central Repository

  • HCL Templates: Organized by module and version
  • default.yaml & globals.yaml: Common configs and latest module versions
  • Reusable workflow: A GitHub Action for rendering templates

Product-Specific Infrastructure Repository

  • Values: YAML files structured by product → cloud (AWS/GCP/Azure) → environment (DEV/STG/PROD) → project (a main shared project, a specific customer's project, etc.) → region

Teams work in their own repos, with ARCH generating the right HCL structure behind the scenes.
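To give a feel for the wiring, here is a minimal sketch of how a product repository might call the central rendering workflow. The repository name, workflow file, and inputs are hypothetical placeholders rather than our exact setup:

name: generate-infrastructure

on:
  push:
    paths:
      - "values/**"                  # re-render whenever the YAML hierarchy changes

jobs:
  render:
    # hypothetical central repository and workflow file
    uses: <org>/arch-central/.github/workflows/render-templates.yaml@main
    with:
      values-path: values/
    secrets: inherit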

Managing Terraform Module Compatibility with Version-Aware Templates

Terraform modules evolve over time, sometimes with breaking changes that require different inputs, outputs, or structural definitions. To handle this without cluttering our logic or forcing "if-else" chaos in templates, we introduced version-aware HCL templates.

Directory Layout Based on Module Major Version

We structure our templates directory based on the major version of the Terraform module. Each versioned folder contains a complete, valid HCL template that maps to the expected structure of that module version.

templates/
└── bigtable/
    ├── v0.x/
    │   └── bigtable.hcl
    └── v1.x/
        └── bigtable.hcl

ARCH picks the right template automatically based on the module version in your YAML.

How It Works

Our YAML spec (the desired state) includes the desired module version (e.g., v1.2.3). During the rendering phase, the GitHub Action:

  1. Parses the module version.
  2. Extracts the major version (e.g., v1.x from v1.2.3).
  3. Selects the correct template path based on the major version.
  4. Renders the appropriate HCL using the matching template and YAML input.

This approach allows us to support multiple versions of the same module in parallel, which is critical when different environments or products are on different upgrade cycles.
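A minimal sketch of how steps 2 and 3 could look as a workflow step; the step name, the hard-coded example version, and the output variable are illustrative, not our exact implementation:

- name: Select template by major version
  id: select-template
  env:
    MODULE_VERSION: v1.2.3            # in practice, parsed from the merged YAML
  run: |
    MAJOR="$(echo "${MODULE_VERSION}" | cut -d. -f1)"          # v1.2.3 -> v1
    echo "template_dir=templates/bigtable/${MAJOR}.x" >> "$GITHUB_OUTPUT"

Later steps can then pick up the chosen directory from the step's template_dir output.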

How It All Works Together

The magic happens when we combine these components. Here's the flow:

  1. 🧩 Engineers update YAML values at the correct hierarchy level
  2. 🤖 GitHub Action merges configs (default → globals → product → cloud → environment → project → region)
  3. 🏗️ HCL templates are rendered using the final merged configuration
  4. 📤 Pull request with the generated HCL is created automatically

This approach means that each service's HCL file contains everything needed for deployment, making the process streamlined and less prone to errors.
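For reference, the body of the rendering job might look roughly like the sketch below. This is not our exact implementation: the file paths are abbreviated, the merge uses yq's multi-file deep-merge idiom, the render step assumes Gruntwork's boilerplate CLI is installed on the runner, and the PR step uses the peter-evans/create-pull-request action.

jobs:
  render:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # 1. Deep-merge the YAML hierarchy, most specific file last so it wins
      - name: Merge values
        run: |
          yq eval-all '. as $item ireduce ({}; . * $item)' \
            values/default.yaml values/globals.yaml values/product.yaml \
            values/gcp/cloud.yaml values/gcp/prod/environment.yaml \
            values/gcp/prod/main/project.yaml \
            values/gcp/prod/main/regions/us-east1/values.yaml > merged.yaml

      # 2. Render the versioned HCL template with the merged values
      - name: Render template
        run: |
          boilerplate --template-url templates/bigtable/v1.x \
            --output-folder live/gcp/prod/main/us-east1/bigtable \
            --var-file merged.yaml --non-interactive

      # 3. Open a pull request with the generated HCL
      - uses: peter-evans/create-pull-request@v6
        with:
          title: "chore: regenerate infrastructure HCL"
          branch: arch/generated-hcl
          commit-message: "Regenerate HCL from YAML values"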

Why This Changed Everything

1. Simplified Configuration Management

Define common settings once, override when needed.

Take a look at this folder tree:

values/
├── default.yaml                         # Default Terraform module versions (taken from the central repo)
├── globals.yaml                         # Organization-wide settings (taken from the central repo)
├── product.yaml                         # Product-specific settings
├── <cloud>/
│   ├── cloud.yaml                       # Cloud-specific configurations
│   ├── <environment>/
│   │   ├── environment.yaml             # Environment settings
│   │   ├── <project>/
│   │   │   ├── project.yaml             # Project-specific configuration
│   │   │   ├── regions/
│   │   │   │   ├── region.yaml          # Region-specific configuration
│   │   │   │   ├── <region>/
│   │   │   │   │   ├── values.yaml      # Specific configuration

This hierarchy allows us to make broad changes at higher levels while maintaining the flexibility to customize at more specific levels.
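As a tiny illustration using the BigTable keys from the example later in this post, a value defined once in globals.yaml applies everywhere until a deeper level overrides it (the override values below are hypothetical):

# globals.yaml - applies to every product, cloud and environment
gcpBigtable:
  storage_type: "SSD"

# values/<cloud>/<environment>/<project>/regions/<region>/values.yaml
# override for a single region only
gcpBigtable:
  storage_type: "HDD"
  fixed_node_num: 3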

2. Consistency Across All Environments

Our pre-built templates ensure that all services are defined consistently, regardless of which team or engineer set them up. This consistency extends across environments, regions, and cloud providers.

For instance, a PostgreSQL database in development will have exactly the configuration structure it needs and will be as close as possible to the one in production, with only the specific values differing based on environment needs.
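In YAML terms, that might look something like this (the gcpPostgres block and its keys are hypothetical, shown only to illustrate identical structure with different values):

# dev environment.yaml
gcpPostgres:
  tier: "db-custom-2-8192"
  high_availability: false

# prod environment.yaml - same structure, different values
gcpPostgres:
  tier: "db-custom-16-65536"
  high_availability: true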

3. Seamless Scaling

Want to provision a new environment or cloud? Just add a few YAML files. The system takes care of the rest in minutes.
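For example, bootstrapping a hypothetical "stg" environment could be as small as the two files below, with everything else inherited from higher levels of the hierarchy (contents are illustrative, not our exact schema):

# values/<cloud>/stg/environment.yaml
globalLocals:
  environmentName: "stg"

# values/<cloud>/stg/<project>/regions/us-east1/values.yaml
gcpBigtable:
  TFVersion: 1.2.3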

This scalability has been particularly valuable as we've expanded to new regions and added new products, allowing us to maintain a consistent infrastructure approach throughout our growth.

4. Reduced Error Rates

With automation handling the complex parts of infrastructure generation, we've seen a significant decrease in deployment errors. The system ensures that all dependencies are properly defined and that all required values are present before deployment.

5. Centralized Place for Common Configuration

Need to whitelist new IPs of a vendor? Add them once, and they propagate across all products.
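For instance, a single addition to globals.yaml (the key name is hypothetical) flows into every rendered environment on the next generation run:

# globals.yaml
vendorAllowedCidrs:
  - "203.0.113.0/24"      # documentation ranges used as placeholders, not real vendor IPs
  - "198.51.100.0/24"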

6. Eliminating Infrastructure Drift

To further reduce the risk of configuration drift and ensure our infrastructure remains up to date with our desired state, we implemented a periodic automation that executes the same infrastructure generation pipeline (generate-infrastructure action) for all relevant environments and opens a Pull Request only if there are changes.
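A sketch of what that periodic trigger might look like, assuming the generate-infrastructure pipeline is exposed as a reusable workflow (the schedule, repository name, and inputs are illustrative); the called workflow opens a pull request only when the rendered HCL actually differs:

name: detect-infrastructure-drift

on:
  schedule:
    - cron: "0 6 * * 1"              # e.g. every Monday morning

jobs:
  regenerate:
    strategy:
      matrix:
        environment: [dev, stg, prod]
    # hypothetical central repository and workflow file
    uses: <org>/arch-central/.github/workflows/render-templates.yaml@main
    with:
      environment: ${{ matrix.environment }}
    secrets: inherit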

Real Example

Let's take a BigTable template as an example. Its template looks like this:

terraform {
  source = "<TERRAFORM_MODULES_URL>?version={{ .gcpBigtable.TFVersion }}"
}

locals {
  {{- range .gcpLocals }}
  {{ .name }} = "{{ .value }}"
  {{- end }}
}

include "root" {
  path   = find_in_parent_folders()
  expose = true
}

inputs = {
  project_id   = {{ .gcpBigtable.project_id }}
  cluster_name = "{{ .gcpBigtable.cluster_name }}"
  zones_list   = {{ indent 2 .gcpBigtable.zones_list }}
  region       = "{{ .gcpBigtable.region }}"
  {{- if hasKey .gcpBigtable "storage_type" }}
  storage_type = "{{ .gcpBigtable.storage_type }}"
  {{- end }}
  {{- if hasKey .gcpBigtable "autoscaling_parameters" }}
  autoscaling_parameters = {{ indent 2 .gcpBigtable.autoscaling_parameters }}
  {{- end }}
  {{- if hasKey .gcpBigtable "bigtable_members_list" }}
  bigtable_members_list = {{ indent 2 .gcpBigtable.bigtable_members_list }}
  {{- end }}
  {{- if hasKey .gcpBigtable "fixed_node_num" }}
  fixed_node_num = {{ .gcpBigtable.fixed_node_num }}
  {{- end }}
  {{- if hasKey .gcpBigtable "kms_rotation_period" }}
  kms_rotation_period = {{ .gcpBigtable.kms_rotation_period }}
  {{- end }}
}

When we merge the YAML files in the hierarchy for a product named "product", in our "prod" environment, "main" project, on the "gcp" cloud, located in the "us-east1" region, we get a file containing the following:

globalLocals:
  productName: "product"
  environmentName: "prod"
  projectName: "main"
  regionName: "us-east1"

gcpBigtable:
  TFVersion: 1.2.3
  project_id: dependency.project.outputs.projects.project_id
  cluster_name: "{{ .globalLocals.productName }}-{{ .globalLocals.environmentName }}-{{ .globalLocals.projectName }}-{{ .globalLocals.regionName }}"
  region: "{{ .globalLocals.regionName }}"
  zones_list: |
    [
      "{{ .globalLocals.regionName }}-a",
      "{{ .globalLocals.regionName }}-b",
      "{{ .globalLocals.regionName }}-c"
    ]
  storage_type: "SSD"
  autoscaling_parameters: |
    {
      min_nodes_number       = 4
      max_nodes_number       = 500
      cpu_target_percent     = 35
      storage_target_percent = 8192
    }

The generated HCL file will look like this:

terraform {
  source = "<TERRAFORM_MODULES_URL>?version=1.2.3"
}

locals {
  product_name = "product"
  cloud_name   = "gcp"
  env_name     = "prod"
  region_name  = "us-east1"
  project_name = "main"
}

include "root" {
  path   = find_in_parent_folders()
  expose = true
}

inputs = {
  project_id   = dependency.project.outputs.projects.project_id
  cluster_name = "product-prod-main-us-east1"
  zones_list = [
    "us-east1-a",
    "us-east1-b",
    "us-east1-c"
  ]
  region       = "us-east1"
  storage_type = "SSD"
  autoscaling_parameters = {
    min_nodes_number       = 4
    max_nodes_number       = 500
    cpu_target_percent     = 35
    storage_target_percent = 8192
  }
}

Every other environment that uses BigTable gets an HCL file built with the same standardization. That's how we keep consistency while still preserving each environment's flexibility to set the configuration it needs.

Real-World Impact

We tested ARCH by migrating an old environment and creating a new one.

  • Initial migration? Took time, but it taught us how to structure everything.
  • Next environment? A single YAML file and we were done.

Results:

  • We cut the time for setting up infrastructure for new projects by roughly 85%. What used to take a working day can take a couple of minutes if we configure things correctly.
  • Configuration drift between environments? Pretty much eliminated. When we find a discrepancy, we fix it once at the appropriate level in the hierarchy, and boom, it propagates everywhere it needs to.
  • New team members are getting up to speed much faster. Previously, it would have taken a month just to understand our setup.
  • Our SecOps team is much happier now that security configurations are applied consistently across all environments.

Bonus Benefits

  1. Team mobility. Now we can move engineers between teams without a massive ramp-up period. Since the infrastructure follows the same patterns across all products, a developer who's been working on Product A can quickly contribute to Product B without learning a whole new infrastructure approach. This has been a game-changer for resource allocation.
  2. Automated module upgrades. We set up a periodic job that updates module versions in our hierarchy, generates the new configurations, and creates PRs. This means we can quickly roll out security patches or new features across our entire infrastructure with minimal manual work.
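The version bump itself is just a YAML change to default.yaml in the central repo, along these lines (module keys other than gcpBigtable are illustrative):

# default.yaml (central repository)
gcpBigtable:
  TFVersion: 1.2.3        # the periodic job bumps this and opens a PR
gcpCloudSql:
  TFVersion: 4.0.1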

When a critical vulnerability was announced in one of our cloud providers recently, we were able to patch all affected resources across our entire organization in under a day. Before this system? That would have been weeks of work and plenty of missed instances.

Lessons Learned & Best Practices

Building this system wasn't without challenges. Here are some key lessons we've learned:

The "Define Once" Philosophy

We quickly learned that the single most important practice is our "define once" philosophy. Every value belongs at exactly one level in the hierarchy: the highest level where it makes sense.

  • Need to change a default timeout that affects all products? Change it in globals.yaml.
  • Need to update something specific to AWS in all environments? That goes in the AWS cloud.yaml file.

This approach means that when we need to make a change that affects multiple services or environments, we make it in one place. Next month, we'll need to update our logging configuration across all products. One change to globals.yaml, one PR, and it will automatically be propagated everywhere. The old way? Probably 10+ PRs, a priority and context switch for members of every team, and days of work.

Self-Contained Definitions Save Sanity

  • No more jumping through files to debug a resource.
  • Everything is in one file.

Automate Everything (Seriously, Everything)

  • Let automation be your best team member.
  • Our GitHub Actions merge, render, validate, and open PRs, so we don't have to.

Conclusion

I won't pretend implementing this system was easy. We spent weeks designing the hierarchy, creating the initial templates, and building the automation. There were heated debates about where certain values belonged. And yes, there was pushback from teams comfortable with their existing approaches.

If you're struggling with similar challenges, I'd encourage you to consider a hierarchical approach like ours. Start small, perhaps with just one product or environment, and gradually expand. Focus on building good templates and automation from the beginning. And remember that the goal isn't just consistency for consistency's sake β€” it's about enabling your teams to move faster with confidence.

The cloud infrastructure world isn't getting any simpler. But with the right systems in place, it doesn't have to be chaotic.