
Accelerating Infrastructure as Code Optimization with AI: A Practitioner's Journey with Amazon Q Developer


Introduction

I've been working with Infrastructure as Code for the better part of eight years—starting with CloudFormation, migrating teams to Terraform, and lately exploring AWS CDK. Over that time, I've seen platforms grow from a handful of templates to hundreds of modules scattered across dozens of repositories. I've also watched technical debt accumulate: legacy EC2 instance types chosen three years ago, untagged resources, container images piling up in ECR, and NAT gateways draining budgets while sitting mostly idle.

The traditional FinOps workflow—reactive hunting for idle resources using cost optimization hubs and billing alerts—works, but it's exhausting and slow. I wanted to shift left: catch inefficiencies before they hit production, bake cost and security best practices into the templates themselves, and help platform engineers understand inherited code without spending days spelunking through thousands of lines.

This article documents how I've integrated Amazon Q Developer into my IaC workflow—not as a replacement for human judgment, but as a force multiplier. I'll walk through real scenarios, concrete examples, workflow integration, limitations I've encountered, and a practical framework for measuring impact.

The Pain Points of Traditional IaC Management

Before introducing AI assistance, my team faced several recurring bottlenecks:

Legacy comprehension: Inheriting a 2,000-line Terraform module written by someone who left the company two years ago. No README. Cryptic variable names. Comments? Optional, apparently. Understanding what it does, how components interact, and where optimization opportunities exist consumed days of calendar time.

Migration friction: Translating a CloudFormation template to CDK or Terraform—or vice versa—is tedious and error-prone. Even straightforward resources involve syntax mapping, API differences, and validation loops. Multiply that by dozens of modules, and migration projects drag on for quarters.

Review latency: Pull requests with IaC changes sat in queues waiting for someone with enough context to spot that the new RDS instance lacks encryption, or that the NAT gateway could be replaced with VPC endpoints, or that the instance type is three generations old.

Standardization gaps: Every engineer writes modules slightly differently. Some include lifecycle policies; others don't. Tagging strategies diverge. IAM policies are either too permissive or so locked down they break deployments.

Security and cost blind spots: Static analysis tools (tfsec, Checkov) catch obvious mistakes, but they don't suggest improvements. They tell you what's wrong, not what could be better. Cost estimation tools (Infracost) show projected spend, but they don't recommend Graviton instances or Spot for batch workloads.

Onboarding friction: New hires need weeks to become productive with our IaC codebase. The learning curve is steep, and tribal knowledge is poorly documented.

How Amazon Q Developer Fits In

Amazon Q Developer is an AI-powered coding assistant built on over 17 years of AWS cloud experience. It integrates directly into VS Code and JetBrains IDEs, and provides CLI capabilities for automated transformations. It generates deployment-ready infrastructure code for Terraform, AWS CDK, and CloudFormation.

I use it for:

  • Code comprehension: Summarizing what a template does, mapping resource dependencies, identifying entry points.

  • Optimization discovery: Scanning templates for cost, security, and performance improvements aligned with AWS Well-Architected Framework.

  • IaC transformation: Automated translation between IaC frameworks (Terraform ↔ CDK ↔ CloudFormation) using the four-step process: assess, translate, test and refine, deploy.

  • Module generation: Creating deployment-ready modules from natural language requirements with built-in AWS best practices.

  • Pull request reviews: Analyzing diffs, flagging risks, suggesting improvements based on AWS standards.

  • Custom rule enforcement: Using rule-based automation to encode team standards and ensure consistent, repeatable suggestions.

According to AWS internal testing, Amazon Q's agentic capabilities deliver 10x-50x time savings for legacy IaC remediation compared to manual processes. For VMware network migrations, AWS teams translated configurations for 500 VMs in 1 hour—80 times faster than the traditional 2-week manual approach.

I treat Q as a highly skilled junior engineer: fast, knowledgeable, but requiring validation and context.

End-to-End Workflow Integration

Here's how Amazon Q fits into my current IaC lifecycle:

1. Local Development (VS Code + Amazon Q)

  • Open a CDK stack or Terraform module

  • Prompt Q: "Review this file and identify opportunities to optimize for cost efficiency"

  • Q returns recommendations: instance type downsizing, ECR lifecycle policies, Graviton migration paths, NAT gateway elimination, subnet configuration changes

  • I validate recommendations against workload requirements, commitments, and architectural constraints

  • Implement approved changes with Q's assistance (it can write the code inline)
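One change that comes up repeatedly in this pass is replacing NAT gateway traffic to AWS services with VPC endpoints — Gateway endpoints for S3 and DynamoDB carry no hourly or data processing charge. A minimal Terraform sketch (the variable names are mine, not from a real module):

```hcl
# Route S3 traffic through a free Gateway endpoint instead of the NAT gateway.
# vpc_id, region, and route table IDs are placeholders for your network module outputs.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
```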

2. Static Analysis

  • Run Checkov, tfsec, or cfn-lint locally

  • If violations appear, I prompt Q: "Fix the security issues flagged by Checkov in this file"

  • Q suggests remediation (e.g., enable encryption, add bucket policies, restrict ingress rules)
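As an example of the kind of remediation Q drafts here, a hedged sketch of an ingress rule tightened from a wide-open CIDR to a known range (resource names and the CIDR variable are illustrative):

```hcl
# Restrict HTTPS ingress to a known CIDR instead of 0.0.0.0/0,
# the pattern tfsec/Checkov typically flag.
resource "aws_security_group_rule" "app_ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = [var.trusted_cidr] # was 0.0.0.0/0
  security_group_id = aws_security_group.app.id
}
```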

3. Policy-as-Code Validation

  • Apply OPA/Conftest or CloudFormation Guard policies

  • For failures, I ask Q to explain the policy intent and adjust the template accordingly

  • Example policy (Rego):

      package terraform.tags
      deny[msg] {
        input.resource_type == "aws_instance"
        not input.tags.Environment
        msg = "Missing required Environment tag"
      }
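A resource that satisfies this policy simply carries the required tag — a minimal Terraform sketch with placeholder values:

```hcl
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  tags = {
    Environment = "prod" # satisfies the deny rule above
  }
}
```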
    

4. Cost Estimation

  • Run Infracost to project monthly spend

  • If costs are higher than expected, I prompt Q: "Suggest ways to reduce cost for this infrastructure while maintaining performance"

  • Q might recommend reserved capacity, Savings Plans eligibility, or Graviton alternatives
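To keep Graviton evaluation low-friction, I sometimes parameterize the architecture choice so a workload can be flipped once ARM64 compatibility is verified. A hedged sketch (variable names are mine; AMI and binary compatibility must be checked per workload):

```hcl
variable "use_graviton" {
  description = "Switch to ARM64/Graviton where the workload supports it"
  type        = bool
  default     = false
}

resource "aws_instance" "app" {
  # m7g is the Graviton counterpart of m5; AMIs must match the architecture.
  ami           = var.use_graviton ? var.arm64_ami_id : var.x86_ami_id
  instance_type = var.use_graviton ? "m7g.large" : "m5.large"
}
```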

5. CI/CD Pipeline Gates

  • Pre-commit hooks run formatters (terraform fmt, prettier)

  • GitHub Actions execute tests, static analysis, policy checks, Infracost diff

  • If checks fail, the pipeline surfaces Q-generated suggestions in PR comments (I've scripted this using Q's API)

6. Pull Request Review

  • Reviewers use Q to summarize changes: "Explain what this PR changes and flag any cost, security, or operational risks"

  • Q highlights: "This PR introduces a new RDS instance without backup retention configured. Estimated monthly cost increase: $150. Consider enabling automated backups and Multi-AZ for production."

7. Post-Deployment

  • Monitor drift using AWS Config or Terraform state comparisons

  • Track cost variance via Cost Explorer

  • Feed learnings back into Q prompts for future iterations
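On the AWS Config side, a managed rule can flag out-of-band changes such as resources missing required tags. A minimal sketch (assumes a configuration recorder is already enabled in the account; the tag key is illustrative):

```hcl
resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"

  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS" # AWS managed rule
  }

  # Flag any in-scope resource missing the Environment tag
  input_parameters = jsonencode({ tag1Key = "Environment" })
}
```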

Practical Example 1: Generating a Terraform Module from Requirements

Scenario: I need a reusable Terraform module for an S3 bucket with versioning, lifecycle rules, encryption, and least-privilege IAM access.

Prompt:

Create a Terraform module for an S3 bucket with:
- Versioning enabled
- Lifecycle policy: expire non-current versions after 90 days
- AES-256 encryption at rest
- Block public access
- IAM policy allowing read-only access for a Lambda execution role

Q Output (trimmed):

resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_versioning" "main" {
  bucket = aws_s3_bucket.main.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.main.id
  rule {
    id     = "expire-noncurrent"
    status = "Enabled"
    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "main" {
  bucket = aws_s3_bucket.main.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "main" {
  bucket                  = aws_s3_bucket.main.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Outcome: In 30 seconds, I have a production-ready module skeleton. I add variables, outputs, and documentation, then commit. Time saved: 45 minutes.
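The variables and outputs I layer on top are roughly this shape (descriptions are mine, not Q's output):

```hcl
variable "bucket_name" {
  description = "Globally unique name for the S3 bucket"
  type        = string
}

output "bucket_arn" {
  description = "Bucket ARN, for wiring into consumer IAM policies"
  value       = aws_s3_bucket.main.arn
}

output "bucket_id" {
  description = "Bucket name/ID"
  value       = aws_s3_bucket.main.id
}
```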

Practical Example 2: Refactoring Terraform to CDK

Scenario: A team wants to migrate a Terraform module to CDK (TypeScript) for better IDE support and type safety.

Original Terraform (simplified):

resource "aws_lambda_function" "processor" {
  function_name = "data-processor"
  runtime       = "python3.9"
  handler       = "index.handler"
  role          = aws_iam_role.lambda_role.arn
  filename      = "function.zip"
}

resource "aws_iam_role" "lambda_role" {
  name = "lambda-exec-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

Prompt:

Translate this Terraform configuration to AWS CDK TypeScript.
Use best practices: managed policies, environment variables, and proper construct hierarchy.

Q Output (trimmed):

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as iam from 'aws-cdk-lib/aws-iam';

export class DataProcessorStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const lambdaRole = new iam.Role(this, 'LambdaRole', {
      assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSLambdaBasicExecutionRole')
      ]
    });

    const processor = new lambda.Function(this, 'Processor', {
      runtime: lambda.Runtime.PYTHON_3_9,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('function.zip'),
      role: lambdaRole,
      functionName: 'data-processor'
    });
  }
}

Outcome: Migration skeleton in under a minute. I adjust asset paths, add CloudWatch alarms, refine IAM permissions. Time saved per module: 1-2 hours.

Practical Example 3: Pull Request Review with Amazon Q

Scenario: A PR introduces a new ECS service with Fargate tasks. Reviewer uses Q to assess impact.

Prompt:

Review this CDK stack diff. Summarize changes, identify cost implications,
flag security or operational risks.

Q Analysis:

This PR creates:

  • ECS Fargate service with 2 tasks (2 vCPU, 4GB RAM each)

  • Application Load Balancer

  • CloudWatch log group (no retention configured)

Cost estimate: ~$95/month (Fargate) + ~$25/month (ALB) = $120/month

Risks:

  • Log group has no retention policy → unbounded storage costs

  • Tasks use x86_64 architecture; Graviton would reduce cost ~20%

  • No autoscaling configured; manual intervention required under load

  • ALB security group allows 0.0.0.0/0 ingress on port 443 (verify if intentional)

Recommendations:

  • Set log retention to 30 days

  • Evaluate ARM64/Graviton compatibility

  • Add autoscaling with target tracking

  • Restrict ALB ingress to known CIDR ranges or CloudFront

Outcome: Reviewer approves with conditions. Author updates the stack. Review cycle time: 15 minutes instead of 2 hours.
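The log retention fix is a one-liner. Shown here in Terraform for consistency with the earlier examples (the PR itself was CDK, and the log group name is illustrative):

```hcl
resource "aws_cloudwatch_log_group" "service" {
  name              = "/ecs/data-service" # illustrative
  retention_in_days = 30                  # caps the unbounded log storage cost
}
```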

Security, Compliance, and Quality Integration

Amazon Q doesn't replace security tooling—it augments it.

IAM Least Privilege

I prompt Q: "Review this IAM policy and restrict to least privilege for a Lambda reading from S3 and writing to DynamoDB."

Q tightens wildcards, removes unnecessary actions, adds conditions for resource tagging.
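The tightened result typically looks something like this — a hedged Terraform sketch in which the bucket and table references are placeholders for your own resources:

```hcl
data "aws_iam_policy_document" "lambda_data_access" {
  statement {
    sid       = "ReadSourceObjects"
    actions   = ["s3:GetObject"]                  # no wildcard s3:* actions
    resources = ["${aws_s3_bucket.source.arn}/*"] # objects only, not the whole bucket
  }

  statement {
    sid       = "WriteResults"
    actions   = ["dynamodb:PutItem", "dynamodb:UpdateItem"]
    resources = [aws_dynamodb_table.results.arn] # one table, not "*"
  }
}
```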

Secrets Hygiene

Q flags hardcoded credentials or API keys during reviews. I pair this with git-secrets and AWS Secrets Manager integration.

Drift Detection

After deployments, I compare actual infrastructure (via AWS Config or Terraform state) against source templates. If drift occurs, I ask Q: "Why might this resource configuration differ from the template?" It helps hypothesize causes (manual changes, out-of-band automation, CloudFormation stack updates).

Policy-as-Code

I maintain Conftest policies (OPA/Rego) for tagging, encryption, and network segmentation. When policies fail, Q explains the rule intent and suggests compliant configurations.

Cost Guardrails

I integrate Infracost in CI and set thresholds (e.g., no PR increasing monthly cost by >$500 without approval). Q helps identify cost drivers and alternatives.

Repository Improvement Plan (Prioritized)

If I were assessing a typical IaC codebase today, here's what I'd prioritize:

  1. Add (High Priority):

    • Pre-commit hooks: terraform fmt, tflint, Checkov

    • Infracost integration in CI

    • Basic Conftest policies (tagging, encryption)

    • ECR lifecycle policies across all container builds

    • Automated README generation (Q can draft from code)

  2. Refactor (Medium Priority):

    • Consolidate duplicate modules

    • Standardize naming conventions (use Q to generate renaming scripts)

    • Migrate legacy instance types to Graviton where compatible

    • Replace NAT gateways with VPC endpoints for AWS services

  3. Harden (Medium Priority):

    • IAM policy reviews (Q-assisted least privilege tightening)

    • Enable Terraform state locking (DynamoDB + S3)

    • Add drift detection automation (AWS Config or Terraform Cloud)

    • Implement environment-specific configurations (dev/stage/prod variants)

  4. Automate (Lower Priority, High Impact):

    • Q-generated PR comment summaries (cost/security/drift)

    • Automated documentation updates on merge

    • Ephemeral preview environments for PRs (using Terraform workspaces or CDK context)

  5. Measure (Ongoing):

    • Track PR review time and test coverage improvements

    • Iterate on Q prompts based on false positives/negatives

    • Refine policy-as-code rules based on team feedback
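Two of the higher-priority items above are quick Terraform wins — state locking and ECR lifecycle policies. A hedged sketch with placeholder names:

```hcl
# State locking: S3 backend with a DynamoDB lock table (backend blocks take literals only)
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket" # placeholder
    key            = "platform/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-state-locks" # enables state locking
    encrypt        = true
  }
}

# ECR lifecycle: stop untagged images piling up
resource "aws_ecr_lifecycle_policy" "cleanup" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Expire untagged images after 14 days"
      selection = {
        tagStatus   = "untagged"
        countType   = "sinceImagePushed"
        countUnit   = "days"
        countNumber = 14
      }
      action = { type = "expire" }
    }]
  })
}
```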

Skepticism and Limitations

I've hit real limitations with Q:

Hallucinations: Q occasionally invents AWS resource properties that don't exist (e.g., fictional CloudFormation parameters). Always validate against official docs.

Context window: For massive monorepo structures, Q loses context. I work around this by targeting specific files or summarizing first.

Organizational standards: Q doesn't know your company's naming conventions, approved instance families, or compliance requirements unless you explicitly provide them in prompts or customization files.

Noisy recommendations: Q sometimes suggests optimizations that conflict with architectural decisions (e.g., recommending smaller instances when you've standardized on m5.large for operational simplicity). Filtering signal from noise requires domain knowledge.

Overfitting to public examples: Q trained on public repos. If your IaC patterns are highly proprietary or unconventional, its suggestions may miss the mark.

Human validation is non-negotiable: I never merge Q-generated code without review, testing, and static analysis. Treat Q as a draft generator, not a replacement for engineering judgment.

Conclusion

Amazon Q Developer has changed how I work with infrastructure code. It doesn't replace engineering judgment, but it handles the tedious parts—reading legacy code, translating between IaC languages, spotting optimization opportunities, and catching security issues early.

The biggest wins for me have been:

  • Understanding inherited codebases in minutes instead of days

  • Generating module skeletons from plain English requirements

  • Cutting PR review time by helping reviewers quickly understand changes and impacts

  • Catching cost and security issues before they reach production

The key is treating Q as a tool, not a magic solution. I always validate its suggestions, test changes thoroughly, and integrate it with existing tooling like static analysis and policy checks.

If you're considering trying it, start small: pick one messy legacy file, ask Q to explain it, and see what optimization opportunities it finds. Install the VS Code extension (there's a free tier), experiment with prompts, and adjust based on what works for your workflow.

The goal isn't perfection—it's making IaC work less frustrating and more efficient, one template at a time.

Timur Galeev — AWS Community Builder, Cloud/Platform Architect with hands-on experience programming, supporting, automating and optimizing mission-critical deployments in the cloud