<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Timur Galeev Blog]]></title><description><![CDATA[Talks about AWS.]]></description><link>https://tgaleev.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1680641093406/m0oD1PbjB.png</url><title>Timur Galeev Blog</title><link>https://tgaleev.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 22:00:58 GMT</lastBuildDate><atom:link href="https://tgaleev.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Working with AWS European Sovereign Cloud (ESC): Terraform, IaC, and what's different]]></title><description><![CDATA[If you manage AWS infrastructure with code, the European Sovereign Cloud adds a new partition to think about. Different endpoints, separate IAM, its own console. This guide covers what works out of th]]></description><link>https://tgaleev.com/working-with-aws-european-sovereign-cloud-esc-terraform-iac-and-what-s-different</link><guid isPermaLink="true">https://tgaleev.com/working-with-aws-european-sovereign-cloud-esc-terraform-iac-and-what-s-different</guid><category><![CDATA[AWS]]></category><category><![CDATA[ESC]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[#IaC]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Wed, 28 Jan 2026 09:30:00 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/63d9868baa8c8258b0608ecf/e545b2c6-c9b1-4b1e-9005-d3a7faeae3b4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>If you manage AWS infrastructure with code, the European Sovereign Cloud adds a new partition to think about. Different endpoints, separate IAM, its own console. This guide covers what works out of the box, what needs changes, and the patterns that help when you deploy across both ESC and commercial AWS.</p>
<h2>Why This Exists</h2>
<p>AWS has had European regions since 2007. Ireland came first, then Frankfurt, London, Paris, Stockholm, Milan, Zurich, Spain. Eight regions across Europe. Data stays in Europe. GDPR compliant. Problem solved, right?</p>
<p>Not quite.</p>
<p>Here's the thing about <code>eu-central-1</code> (Frankfurt) — your data sits in Germany, sure. But AWS operations? Support tickets? Billing metadata? That stuff flows through global systems. American employees can access it. The control plane lives in the US. When you call support at 3am, someone in Seattle might answer.</p>
<p>For plenty of companies, that's fine. You're running a SaaS product, your customers don't care where the ops team sits. But for German government agencies processing citizen data? French hospitals handling patient records? Banks under BaFin scrutiny? They've been asking harder questions.</p>
<p>The US Cloud Act made it worse. Passed in 2018, it lets American authorities compel US companies to hand over data, even if that data sits on servers in Frankfurt. Doesn't matter where the bits are physically stored — if an American company controls them, American courts can demand them. AWS has always pushed back on these requests, but "trust us, we'll fight it" isn't the same as "technically impossible."</p>
<p>Then came Schrems II in 2020, when the EU Court of Justice invalidated Privacy Shield. Suddenly every European company using American cloud providers had to justify why their data transfers were legal. Standard contractual clauses helped, but the legal uncertainty never fully went away.</p>
<p>That's the gap ESC fills. Not just "data in Europe" but "everything in Europe" — operations, support, billing, leadership, legal jurisdiction.</p>
<h2>What's Actually Different</h2>
<p>The European Sovereign Cloud is a separate partition entirely. Not a region — a partition. Like how AWS GovCloud is separate from commercial AWS, or how China regions are isolated. Different domain (<code>amazonaws.eu</code> instead of <code>amazonaws.com</code>), different IAM system, different control plane.</p>
<p>The region code is <code>eusc-de-east-1</code>, sitting in Brandenburg, Germany. The partition identifier is <code>aws-eusc</code>. When you construct ARNs, it's <code>arn:aws-eusc:</code> not <code>arn:aws:</code>.</p>
<p>AWS set up a new German parent company to run it — AWS European Sovereign Cloud GmbH — with three subsidiaries handling infrastructure, certificates, and employment. The managing directors are Stéphane Israël (former CEO of Arianespace) and Stefan Hoechbauer (VP of AWS Germany), both EU citizens based in the EU. The board includes independent third-party representatives specifically for sovereignty oversight. Not Amazon employees — actual independent oversight.</p>
<p>Only EU residents work there. Not just "based in Europe" — actually residing in the EU with EU contracts. And going forward, they're only hiring EU citizens. The transition is gradual, but the end state is clear: EU citizens only, no exceptions. No "follow-the-sun" support routing your ticket to Virginia at 3am.</p>
<p>When AWS says the infrastructure has "no critical dependencies on non-EU infrastructure", they mean it literally. The system can keep running even if someone cuts the transatlantic cables. Billing systems, metering engines, security operations center — all contained within the EU. Metadata created in ESC stays in ESC. Your usage data doesn't flow to a US billing system.</p>
<h2>The Security Foundation</h2>
<p>This matters more than the org chart stuff, honestly. Legal structures can change. Technical architecture is harder to undo.</p>
<p>ESC runs on the Nitro System, same as regular AWS. But the Nitro architecture is what makes the sovereignty claims credible. It's not just policy — it's hardware design.</p>
<p>The Nitro System was built with zero operator access as a design goal. There's no SSH into the hypervisor. No console access. No mechanism for AWS employees — or anyone — to access EC2 instance memory or customer data on encrypted storage. When they say "no backdoors", it's not a policy promise, it's a constraint enforced by the silicon.</p>
<p>Administrative access happens through authenticated, authorized, and logged APIs that provide no path to customer data. You can audit operations without giving operators data access. These restrictions are built into the Nitro firmware itself. Not a software toggle someone can flip during an emergency or under legal pressure.</p>
<p>NCC Group, an independent security firm, validated these claims in an audit published May 2023. They specifically looked for gaps that would let someone access customer data or memory. Found none. That audit applies to Nitro everywhere, including ESC.</p>
<p>For ESC specifically, AWS added the Sovereignty Reference Framework (ESC-SRF). It's an independently validated framework with third-party auditor reports documenting the sovereignty controls. Your compliance team can hand these reports to regulators instead of trying to explain AWS architecture themselves.</p>
<h2>The Catch (There's Always a Catch)</h2>
<p>You can't just add ESC to your existing AWS Organization and call it a day. This is a separate cloud, and that separation creates friction.</p>
<p><strong>Separate console, separate login.</strong> ESC has its own management console on the <code>amazonaws.eu</code> domain, separate from <code>console.aws.amazon.com</code>. Different URL, different accounts, different credentials. You can't switch between ESC and commercial AWS with the account dropdown — they're completely separate consoles. Bookmark both if you work in both.</p>
<p><strong>No cross-partition IAM.</strong> Can't assume roles from your regular AWS account into ESC. If you have workloads in both places, you need separate identity management. Set up federation through a third-party IdP like Okta or Azure AD, maintain separate credentials, design your CI/CD to handle both partitions. Your developers need two sets of AWS credentials.</p>
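<p>One way to wire up the federation piece: register the IdP inside ESC and let federated users assume a partition-local role. A minimal sketch, with a hypothetical provider name and metadata file (your IdP's export will differ):</p>
<pre><code class="language-hcl"># Sketch: the IdP is registered inside the ESC partition itself,
# since nothing can be shared with commercial IAM.
resource "aws_iam_saml_provider" "idp_esc" {
  name                   = "idp-esc"                     # hypothetical name
  saml_metadata_document = file("idp-esc-metadata.xml")  # hypothetical file
}

resource "aws_iam_role" "esc_developers" {
  name = "esc-developers"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithSAML"
      Principal = { Federated = aws_iam_saml_provider.idp_esc.arn }
    }]
  })
}
</code></pre>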
<p><strong>No VPC peering.</strong> Want to connect <code>eu-central-1</code> to ESC? Treat it like connecting to on-premises infrastructure. VPN, Direct Connect, or application-level APIs. You're bridging two clouds, not two regions. Network architects used to multi-region deployments need to reset their mental model.</p>
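<p>As a sketch of the "treat it like on-premises" approach, here's a Site-to-Site VPN from the ESC side. The peer IP is a placeholder for whatever terminates the tunnel on the commercial side, and the VPC reference assumes one defined elsewhere:</p>
<pre><code class="language-hcl"># The commercial partition is just a remote network from ESC's point of view.
resource "aws_customer_gateway" "commercial_side" {
  bgp_asn    = 65000
  ip_address = "203.0.113.10" # placeholder peer address
  type       = "ipsec.1"
}

resource "aws_vpn_gateway" "esc" {
  vpc_id = aws_vpc.main.id # assumes an existing ESC VPC
}

resource "aws_vpn_connection" "to_commercial" {
  vpn_gateway_id      = aws_vpn_gateway.esc.id
  customer_gateway_id = aws_customer_gateway.commercial_side.id
  type                = "ipsec.1"
  static_routes_only  = true
}
</code></pre>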
<p><strong>Separate accounts entirely.</strong> Different accounts, different Organizations, different invoices, different cost allocation tags. If your finance team tracks cloud spend by AWS account ID, they need new processes. Your existing FinOps dashboards won't see ESC spend.</p>
<p><strong>ECR isolation.</strong> You can't pull container images from your existing ECR repos in <code>eu-central-1</code>. ESC's isolation means no cross-partition image pulls. Push your images to ECR in <code>eusc-de-east-1</code>, use a public registry, or set up replication through your CI/CD pipeline.</p>
<p><strong>Terraform works, but check your version.</strong> Terraform 1.14+ and AWS provider 6.x support ESC natively — endpoints resolve correctly without manual configuration. Just set the region:</p>
<pre><code class="language-hcl">provider "aws" {
  region = "eusc-de-east-1"
}
</code></pre>
<p>If you're on an older version, you'll need to upgrade or configure endpoints manually. The S3 backend for state storage also requires Terraform 1.14+.</p>
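<p>The manual fallback looks roughly like this. To be clear, the endpoint hostnames below are an assumption based on the <code>amazonaws.eu</code> domain and the usual <code>service.region.domain</code> pattern; verify them against the provider docs for your version:</p>
<pre><code class="language-hcl"># Older providers: override endpoints by hand. Hostnames are assumed,
# not taken from AWS documentation.
provider "aws" {
  region = "eusc-de-east-1"

  endpoints {
    s3  = "https://s3.eusc-de-east-1.amazonaws.eu"
    sts = "https://sts.eusc-de-east-1.amazonaws.eu"
  }
}
</code></pre>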
<h2>What Services Are Available</h2>
<p>AWS didn't launch this with five services and a "coming soon" page. You get 90+ services from day one. That matters because previous sovereign cloud offerings often meant accepting a skeleton service catalog.</p>
<p><strong>Containers:</strong> ECS, EKS, ECR. Full Fargate support. If you're running containers anywhere on AWS today, same capabilities.</p>
<p><strong>Compute:</strong> EC2 with multiple instance families, Lambda for serverless. Enough instance types for most workloads.</p>
<p><strong>AI/ML:</strong> Bedrock, SageMaker, Amazon Q. All available from day one.</p>
<p><strong>Database:</strong> Aurora (MySQL and PostgreSQL compatible), DynamoDB, RDS for managed databases. All the usual engines.</p>
<p><strong>Storage:</strong> S3 with full feature parity, EBS for block storage.</p>
<p><strong>Networking:</strong> VPC, Direct Connect, Route 53 for private hosted zones. Transit Gateway for complex topologies.</p>
<p><strong>Security:</strong> KMS for encryption keys, Secrets Manager, Private CA, IAM with all the normal features.</p>
<p>If you're running containers on Fargate in Frankfurt today, you can run the same workloads on ESC. Same task definitions, same service configs, just different region and endpoints.</p>
<h2>What's Missing</h2>
<p>90 services sounds good until you remember AWS has 240+. Some gaps matter more than others:</p>
<p><strong>CloudFront</strong> — No CDN at launch. If your architecture relies on edge caching, you'll need alternatives. Expected end of 2026.</p>
<p><strong>IAM Identity Center</strong> — The modern way to manage SSO across an Organization isn't there yet. You can still use IAM with external identity providers, but you'll configure it per-account instead of centrally. Expected Q1 2026.</p>
<p><strong>Shield Advanced &amp; Firewall Manager</strong> — DDoS protection and centralized firewall rules aren't available. Basic Shield is included, but advanced protections aren't.</p>
<p><strong>Amazon Inspector</strong> — No automated vulnerability scanning for workloads yet.</p>
<p><strong>GuardDuty</strong> — Available but limited. No Organization-level management, missing some newer detection capabilities.</p>
<p><strong>IoT Services</strong> — IoT Core, Greengrass, and related services aren't included. If you're running IoT workloads, ESC isn't ready for them.</p>
<p><strong>Organizations features</strong> — You get AWS Organizations, but delegated administration isn't supported. StackSets and other governance tools must run from the Management Account.</p>
<p>Also worth noting: S3 Block Public Access isn't enabled by default in ESC the way it is in commercial AWS. Enable it manually.</p>
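<p>In Terraform that's one resource per bucket. A minimal sketch, assuming a bucket resource defined elsewhere:</p>
<pre><code class="language-hcl"># ESC doesn't enable Block Public Access for you; do it explicitly.
resource "aws_s3_bucket_public_access_block" "sovereign_data" {
  bucket = aws_s3_bucket.sovereign_data.id # assumes this bucket exists

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
</code></pre>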
<p><strong>Pricing:</strong> Expect a 10-15% premium over Frankfurt (<code>eu-central-1</code>) for comparable services.</p>
<h2>Deploying Containers — The Practical Bits</h2>
<p>The patterns are identical to regular AWS. I'm not going to paste hundreds of lines of Terraform — you know how to deploy ECS. The differences are configuration, not architecture:</p>
<ol>
<li><p>Region: <code>eusc-de-east-1</code></p>
</li>
<li><p>ARNs use <code>aws-eusc</code> partition: <code>arn:aws-eusc:iam::aws:policy/...</code></p>
</li>
<li><p>ECR images must come from ESC or public registries</p>
</li>
<li><p>Tag resources with compliance markers for your auditors (a <code>default_tags</code> sketch follows this list)</p>
</li>
</ol>
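<p>For the tagging, one low-effort option is provider-level <code>default_tags</code>, so every resource this provider creates carries the markers. The tag keys here are examples, not any standard:</p>
<pre><code class="language-hcl"># Example compliance markers applied to everything this provider creates.
provider "aws" {
  region = "eusc-de-east-1"

  default_tags {
    tags = {
      DataClassification = "sovereign" # example key/value, not a standard
      Partition          = "aws-eusc"
    }
  }
}
</code></pre>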
<p>A minimal ECS task definition:</p>
<pre><code class="language-hcl">resource "aws_ecs_task_definition" "app" {
  family                   = "my-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.execution.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "your-ecr.eusc-de-east-1.amazonaws.eu/app:latest"
    portMappings = [{ containerPort = 80 }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"  = "/ecs/my-app"
        "awslogs-region" = "eusc-de-east-1"
      }
    }
  }])
}
</code></pre>
<p>VPC setup is standard — public subnets for load balancers, private subnets for tasks, NAT gateways for outbound traffic. Security groups, ALB config, service definitions — all identical to what you'd write for Frankfurt.</p>
<h2>Infrastructure as Code: The Real Story</h2>
<p>If you're managing infrastructure with code (and you should be), here's what actually works with ESC right now.</p>
<h3>Terraform and OpenTofu</h3>
<p>As mentioned, Terraform 1.14+ handles ESC out of the box. But there's more to it than just setting the region. The <code>aws_partition</code> data source correctly returns <code>aws-eusc</code>, which is useful when you're building partition-aware modules:</p>
<pre><code class="language-hcl">data "aws_partition" "current" {}

# Returns "aws-eusc" in ESC, "aws" in commercial
output "partition" {
  value = data.aws_partition.current.partition
}
</code></pre>
<p>For multi-partition deployments, use provider aliases:</p>
<pre><code class="language-hcl">provider "aws" {
  alias  = "esc"
  region = "eusc-de-east-1"
}

provider "aws" {
  alias  = "commercial"
  region = "eu-central-1"
}

# Deploy to ESC
resource "aws_s3_bucket" "sovereign_data" {
  provider = aws.esc
  bucket   = "my-sovereign-bucket"
}

# Deploy to commercial
resource "aws_s3_bucket" "public_assets" {
  provider = aws.commercial
  bucket   = "my-public-bucket"
}
</code></pre>
<p>OpenTofu 1.11+ also supports ESC natively, including the S3 backend in <code>eusc-de-east-1</code>. Confirmed working by community testing in <a href="https://github.com/opentofu/opentofu/issues/3312">December 2025</a>. If you've switched to OpenTofu, same patterns apply.</p>
<h3>AWS CDK</h3>
<p>CDK has supported ESC since August 2025. Region registration for <code>eusc-de-east-1</code> and VPC endpoint handling were added in <a href="https://github.com/aws/aws-cdk/issues/34318">PR #34860</a>. No workarounds needed — just set the region:</p>
<pre><code class="language-typescript">const app = new cdk.App();
const stack = new cdk.Stack(app, 'EscStack', {
  env: {
    account: '123456789012',
    region: 'eusc-de-east-1',
  },
});
</code></pre>
<p>ARNs, service endpoints, and partition references resolve correctly out of the box.</p>
<h3>CloudFormation</h3>
<p>Works as expected. CloudFormation is partition-aware by design, so templates deploy without modification. The <code>AWS::Partition</code> pseudo parameter returns <code>aws-eusc</code> automatically.</p>
<pre><code class="language-yaml">Resources:
  MyRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        # Automatically uses aws-eusc partition
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
</code></pre>
<p>One exception: <strong>Landing Zone Accelerator doesn't work</strong>. LZA maps to a single AWS Organization and can't span partitions. You'll need separate LZA deployments for ESC and commercial, with duplicated configurations.</p>
<h3>Multi-Partition Patterns</h3>
<p>Running workloads in both ESC and commercial AWS? Here are patterns that work:</p>
<p><strong>Shared modules with partition-aware variables:</strong></p>
<pre><code class="language-hcl">variable "partition" {
  description = "AWS partition (aws or aws-eusc)"
  type        = string
}

variable "region" {
  description = "AWS region"
  type        = string
}

locals {
  is_sovereign = var.partition == "aws-eusc"

  # Adjust for service availability
  enable_cloudfront    = !local.is_sovereign  # Not available in ESC yet
  enable_guardduty_org = !local.is_sovereign  # Limited in ESC
}
</code></pre>
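<p>A caller can feed that variable straight from the data source. The module path here is hypothetical:</p>
<pre><code class="language-hcl"># Pass the detected partition into the shared module.
module "platform" {
  source    = "./modules/platform" # hypothetical path
  partition = data.aws_partition.current.partition
  region    = "eusc-de-east-1"
}
</code></pre>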
<p><strong>Separate state files per partition:</strong></p>
<pre><code class="language-hcl"># ESC backend
terraform {
  backend "s3" {
    bucket = "my-tfstate-esc"
    key    = "infrastructure/terraform.tfstate"
    region = "eusc-de-east-1"
  }
}
</code></pre>
<p>Don't try to share state across partitions. The isolation is the point.</p>
<p><strong>CI/CD branching strategy:</strong></p>
<p>Some teams run completely separate pipelines per partition. Others use a single pipeline with partition as a variable. The right choice depends on how different your ESC and commercial configurations are. If they're mostly identical, one pipeline with environment variables works. If they diverge significantly, separate pipelines prevent accidents.</p>
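<p>If you go the single-pipeline route, the minimal version is one variable file per partition (file names here are hypothetical), selected with <code>terraform apply -var-file=...</code> per target:</p>
<pre><code class="language-hcl"># esc.tfvars
partition = "aws-eusc"
region    = "eusc-de-east-1"

# commercial.tfvars
partition = "aws"
region    = "eu-central-1"
</code></pre>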
<h2>Planning Your Architecture</h2>
<p>If you're considering ESC, think about workload segmentation early. Not everything needs sovereignty guarantees, and putting everything in ESC when it doesn't need to be there adds cost and complexity.</p>
<p><strong>Tier 0 — Sovereign (ESC):</strong> Sensitive data requiring sovereignty guarantees. Patient health records, citizen personal data, financial records under regulatory requirements, classified government workloads. This is your ESC tier.</p>
<p><strong>Tier 1 — Standard (Commercial AWS or ESC):</strong> Business data without special regulatory requirements. Internal tools, development environments, public-facing websites, marketing systems.</p>
<p>The hard part is the boundary. Your sovereign tier probably needs data from the standard tier sometimes. Options:</p>
<p><strong>API gateways at the boundary.</strong> ESC workloads call commercial AWS through a controlled API layer. Strict authentication, audit logging, minimal data exposure. The API becomes your compliance checkpoint.</p>
<p><strong>Data diodes for one-way flow.</strong> ESC can pull data from commercial AWS on a schedule. Commercial can't push to ESC. Useful for reference data that needs to be in ESC but originates elsewhere.</p>
<p><strong>Message queues with encryption.</strong> Async communication through something like SQS or external message brokers. Decouples the systems while maintaining the boundary.</p>
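<p>The queue side is plain Terraform. A sketch of an encrypted boundary queue, with a hypothetical name and a KMS key defined next to it:</p>
<pre><code class="language-hcl"># Encrypted boundary queue; the consumer on the other side polls it.
resource "aws_kms_key" "boundary" {
  description = "Key for cross-partition boundary messages"
}

resource "aws_sqs_queue" "boundary_events" {
  name              = "boundary-events" # hypothetical name
  kms_master_key_id = aws_kms_key.boundary.key_id
}
</code></pre>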
<p>Don't try to architect this like multi-region. It's multi-cloud, practically speaking. Your <code>eu-central-1</code> workloads can't directly call your ESC workloads over private networking. Plan for that from day one, not as an afterthought.</p>
<h2>Migration Path</h2>
<p>If you're moving existing workloads to ESC, here's a rough sequence:</p>
<p><strong>Phase 1: Assessment.</strong> Which workloads actually need sovereignty? Many teams discover only 20-30% of their infrastructure handles truly sensitive data. Don't move everything just because you can.</p>
<p><strong>Phase 2: Identity setup.</strong> Get your IAM structure in ESC before anything else. Set up federation, create roles, establish your permission model. Test authentication flows.</p>
<p><strong>Phase 3: Network foundation.</strong> VPC, subnets, NAT gateways, security groups. If you need connectivity back to commercial AWS, set up the VPN or Direct Connect tunnel.</p>
<p><strong>Phase 4: Container registry.</strong> Push your images to ECR in ESC. Update your CI/CD to build and push to both registries if you're running in both partitions.</p>
<p><strong>Phase 5: Workload deployment.</strong> Start with non-critical workloads to validate your Terraform and deployment pipelines. Work through the endpoint configuration issues before touching production.</p>
<p><strong>Phase 6: Data migration.</strong> This is usually the hardest part. How do you move data without downtime? Often involves running parallel systems temporarily, with replication from source.</p>
<p><strong>Phase 7: Cutover.</strong> Switch traffic to ESC workloads. Keep the old deployment running until you're confident, then decommission.</p>
<h2>Cost Reality</h2>
<p>ESC pricing follows standard AWS models — you pay for what you use. But the isolation adds costs:</p>
<p><strong>NAT Gateways:</strong> ~€0.045/hour each plus data processing. High availability means two gateways, roughly €65/month before data charges. You're paying this in Frankfurt too, but now you're paying it twice if you have workloads in both partitions.</p>
<p><strong>Data transfer between partitions:</strong> Not free internal transfer. Treat it like cross-region or internet egress. If your architecture involves heavy data movement between ESC and commercial AWS, model those costs.</p>
<p><strong>Operational overhead:</strong> Managing two partitions means duplicated effort. Two sets of IAM policies, two CI/CD pipelines, two monitoring dashboards, two on-call rotations if you have partition-specific issues. That's engineering time.</p>
<p><strong>Compliance tooling:</strong> You'll probably want separate security scanning, compliance monitoring, and audit tooling for ESC. Or tools that understand both partitions. Either way, cost.</p>
<p>AWS has confirmed a 10-15% pricing premium over Frankfurt for comparable services — what they call the "sovereignty premium." Combined with the hidden costs above, budget accordingly.</p>
<h2>Who Should Actually Use This</h2>
<p><strong>Move to ESC if:</strong></p>
<ul>
<li><p>You handle data under strict EU sovereignty requirements — not just GDPR, but sector-specific rules that mandate operational control</p>
</li>
<li><p>Regulators or auditors have specifically asked about US Cloud Act exposure</p>
</li>
<li><p>You're in public sector, healthcare (especially in Germany with patient data), or finance with explicit data residency mandates</p>
</li>
<li><p>Your contracts require EU-only operations and personnel — government contracts often do</p>
</li>
<li><p>You need to demonstrate sovereignty compliance with third-party validated reports</p>
</li>
</ul>
<p><strong>Stick with regular EU regions if:</strong></p>
<ul>
<li><p>Standard GDPR compliance is sufficient for your use case</p>
</li>
<li><p>You need services that haven't launched in ESC yet</p>
</li>
<li><p>Cost optimization is priority over sovereignty guarantees</p>
</li>
<li><p>You're already running multi-region and partition complexity doesn't fit your operating model</p>
</li>
<li><p>Your compliance requirements don't specifically call out operational sovereignty or personnel location</p>
</li>
</ul>
<p>ESC isn't "better" than Frankfurt. It solves a specific problem. If you don't have that problem, you're adding complexity and cost for no benefit. Frankfurt with proper encryption and access controls is fine for most workloads.</p>
<h2>The Competitive Landscape</h2>
<p>AWS isn't alone here. Microsoft announced sovereign cloud offerings for EU customers. Google has Sovereign Controls for GCP. But the approaches differ.</p>
<p>Microsoft's approach involves partnerships with local operators — like T-Systems in Germany running Azure infrastructure. Google focuses on software controls and key management.</p>
<p>AWS went further with complete partition isolation. New legal entities, new domain, separate IAM, the whole stack. Whether that matters depends on what your regulators care about.</p>
<p>The 90+ service catalog at launch also sets AWS apart. Competitors often launch sovereign offerings with limited services and catch up over time. ESC starts with a catalog broad enough to cover most common workloads.</p>
<h2>What's Coming</h2>
<p>AWS announced expansion plans. Local Zones in Belgium, Netherlands, and Portugal — same sovereignty model, lower latency for users in those countries. These extend ESC's footprint without requiring new full regions.</p>
<p>The workforce transition continues. Current staff are EU residents; future hires will be EU citizens only. Over time, the entire operation shifts to citizen-only. That's a commitment you can point to in RFPs.</p>
<p>More regions within ESC are likely but not announced. If demand justifies it, a second ESC region (France? Italy?) would add redundancy options.</p>
<p>The €7.8 billion investment through 2040 signals this isn't an experiment. Amazon is building parallel infrastructure for the next fifteen years.</p>
<h2>Bottom Line</h2>
<p>The European Sovereign Cloud answers three questions that every regulated European organization has been asking. Where exactly is my data? Who can access it? What happens when a foreign government asks for it?</p>
<p>For workloads where those questions have regulatory or contractual weight, ESC provides answers backed by legal structure, organizational isolation, and hardware-level security design. The ESC-SRF gives you auditor reports to prove it.</p>
<p>For everything else, <code>eu-central-1</code> works fine and doesn't require rethinking your account structure, identity model, and network architecture.</p>
<p>Just remember: ESC is a different cloud, not a different region. The isolation that provides sovereignty guarantees also creates operational boundaries. That's the point — but it's also the cost.</p>
<hr />
<p><strong>References</strong></p>
<ul>
<li><p><a href="https://www.tecracer.com/blog/2026/01/aws-european-sovereign-cloud-esc-launch-pricing-and-whats-next.html">tecRacer: AWS ESC Launch, Pricing, and What's Next</a></p>
</li>
<li><p><a href="https://aws.amazon.com/blogs/aws/opening-the-aws-european-sovereign-cloud/">AWS Blog: Opening the European Sovereign Cloud</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/whitepapers/latest/overview-aws-european-sovereign-cloud/introduction.html">AWS ESC Whitepaper</a></p>
</li>
<li><p><a href="https://aws.amazon.com/blogs/security/announcing-initial-services-available-in-the-aws-european-sovereign-cloud-backed-by-the-full-power-of-aws/">AWS Security Blog: ESC Services</a></p>
</li>
<li><p><a href="https://aws.amazon.com/blogs/security/exploring-the-new-aws-european-sovereign-cloud-sovereign-reference-framework/">AWS Security Blog: Sovereignty Reference Framework</a></p>
</li>
<li><p><a href="https://github.com/hashicorp/terraform-provider-aws/issues/44437">Terraform ESC Support (resolved)</a></p>
</li>
<li><p><a href="https://dev.to/alifunk/air-gap-for-the-cloud-why-the-aws-european-sovereign-cloud-changes-everything-3gfp">DEV: Why ESC Changes Everything</a></p>
</li>
<li><p><a href="https://dev.to/kazuya_dev/aws-reinvent-2025-aws-european-sovereign-cloud-your-20-minute-essential-guide-gbl101-2acj">DEV: AWS ESC Essential Guide</a></p>
</li>
<li><p><a href="https://press.aboutamazon.com/aws/2026/1/aws-launches-aws-european-sovereign-cloud-and-announces-expansion-across-europe">Amazon Press Release</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[ECS vs EKS: When You DON'T Need Kubernetes - A Practical Guide to Choosing AWS Container Services]]></title><description><![CDATA[Introduction
You know what? I see teams spinning up Kubernetes clusters for three microservices all the time. Then they spend two months figuring out pods, ingress controllers, and all that magic. And then they pay $70 per month just for three cluste...]]></description><link>https://tgaleev.com/ecs-vs-eks-when-you-dont-need-kubernetes-a-practical-guide-to-choosing-aws-container-services</link><guid isPermaLink="true">https://tgaleev.com/ecs-vs-eks-when-you-dont-need-kubernetes-a-practical-guide-to-choosing-aws-container-services</guid><category><![CDATA[AWS]]></category><category><![CDATA[ECS]]></category><category><![CDATA[EKS]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Sun, 04 Jan 2026 16:01:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767542056853/d344cf54-b595-4f93-a0fd-d8e3ee84d25c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>You know what? I see teams spinning up Kubernetes clusters for three microservices all the time. Then they spend two months figuring out pods, ingress controllers, and all that magic. And then they pay $70 per month per cluster, $210 total for three clusters in different regions, not counting the actual servers.</p>
<p><strong>Here's the honest truth</strong>: Kubernetes is a powerful tool but you don't always need it. Amazon ECS is a simpler alternative that handles most tasks faster and cheaper.</p>
<p>In this article I'll show you:</p>
<ul>
<li><p>When ECS beats EKS (and saves you tons of money)</p>
</li>
<li><p>Real scenarios with numbers and examples</p>
</li>
<li><p>Ready-to-use code snippets for deploying to both platforms</p>
</li>
<li><p>How to make the decision without headaches</p>
</li>
</ul>
<p>Let's dive in!</p>
<h2 id="heading-quick-comparison-ecs-vs-eks"><strong>Quick Comparison: ECS vs EKS</strong></h2>
<p>First let's look at the main differences in a simple table:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>AWS ECS</td><td>AWS EKS</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Cluster Cost</strong></td><td>$0</td><td>$0.10/hour (~$70/month)</td></tr>
<tr>
<td><strong>Setup Complexity</strong></td><td>Low (2-4 hours)</td><td>High (1-2 days)</td></tr>
<tr>
<td><strong>Learning Curve</strong></td><td>Few days</td><td>Several weeks</td></tr>
<tr>
<td><strong>Management</strong></td><td>AWS Console/CLI</td><td>kubectl + AWS Console</td></tr>
<tr>
<td><strong>Ecosystem</strong></td><td>AWS services</td><td>Entire Kubernetes world</td></tr>
<tr>
<td><strong>Portability</strong></td><td>AWS only</td><td>Any cloud/on-prem</td></tr>
<tr>
<td><strong>Updates</strong></td><td>Automatic</td><td>Manual (control plane)</td></tr>
<tr>
<td><strong>Best For</strong></td><td>1-10 services</td><td>10-100+ services</td></tr>
</tbody>
</table>
</div><h3 id="heading-architecture-how-it-works"><strong>Architecture: How It Works</strong></h3>
<p><strong>ECS Architecture</strong>:</p>
<pre><code class="lang-plaintext">Your Application
    ↓
Docker Image (you need this!)
    ↓
Task Definition (container description)
    ↓
ECS Service (manages launch)
    ↓
EC2 or Fargate (where it runs)
    ↓
Container running
</code></pre>
<p><strong>EKS Architecture</strong>:</p>
<pre><code class="lang-plaintext">Your Application
    ↓
Docker Image
    ↓
Kubernetes Pod specification
    ↓
Deployment/StatefulSet
    ↓
Kubernetes Control Plane ($$$)
    ↓
Worker Nodes
    ↓
Container in Pod
</code></pre>
<p>See the difference? The ECS chain is one step shorter and each step is easier to understand.</p>
<h2 id="heading-when-ecs-is-your-best-choice"><strong>When ECS is Your Best Choice</strong></h2>
<p>This is where it gets interesting. Many people think Kubernetes is always needed but that's not true. Let's break down real situations where ECS wins.</p>
<h3 id="heading-scenario-1-multi-regional-deployment-3-5-services"><strong>Scenario 1: Multi-Regional Deployment (3-5 Services)</strong></h3>
<p>Imagine: you have a simple API and a couple of supporting services. You need to deploy them in three regions - Europe, Asia, USA. For redundancy, you know.</p>
<p><strong>With EKS you pay:</strong></p>
<ul>
<li><p>Europe cluster: $70/month</p>
</li>
<li><p>Asia cluster: $70/month</p>
</li>
<li><p>USA cluster: $70/month</p>
</li>
<li><p><strong>Total: $210/month</strong> just for the right to run containers</p>
</li>
</ul>
<p><strong>With ECS you pay:</strong></p>
<ul>
<li><p>Cluster is free: $0</p>
</li>
<li><p><strong>Total: $0</strong> for management</p>
</li>
</ul>
<p>In other words, <strong>save $2,520 per year</strong> just on the control plane! You still pay for the actual servers either way.</p>
<h4 id="heading-real-example"><strong>Real Example</strong></h4>
<p>I had a project - e-commerce backend. Five services:</p>
<ol>
<li><p>API Gateway (Node.js)</p>
</li>
<li><p>Order Service (Python)</p>
</li>
<li><p>Payment Service (Go)</p>
</li>
<li><p>Notification Service (Node.js)</p>
</li>
<li><p>Analytics Worker (Python)</p>
</li>
</ol>
<p>Each service needed a Docker image. Here's a simple Dockerfile example for the Node.js API:</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Dockerfile for API Gateway</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>-alpine

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-keyword">COPY</span><span class="bash"> package*.json ./</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm install --production</span>

<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">3000</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"node"</span>, <span class="hljs-string">"server.js"</span>]</span>
</code></pre>
<p>We deployed across three regions using ECS Fargate. <strong>Setup time: 4 hours</strong> including Terraform code. If we'd done it with EKS - that's a week minimum with Helm charts, ingress controllers, and all the rest of that machinery.</p>
<p>Here's how we defined the task in ECS (simplified):</p>
<pre><code class="lang-plaintext"># ECS Task Definition - just the container part
resource "aws_ecs_task_definition" "api_gateway" {
  family                   = "api-gateway"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"

  container_definitions = jsonencode([{
    name      = "api"
    image     = "123456789.dkr.ecr.us-east-1.amazonaws.com/api-gateway:latest"
    essential = true

    portMappings = [{
      containerPort = 3000
      protocol      = "tcp"
    }]

    environment = [
      { name = "NODE_ENV", value = "production" },
      { name = "PORT", value = "3000" }
    ]
  }])
}
</code></pre>
<p>Compare this to Kubernetes - you'd need Deployment YAML, Service YAML, maybe Ingress, ConfigMaps... it adds up.</p>
<h3 id="heading-scenario-2-quick-start-and-simplicity"><strong>Scenario 2: Quick Start and Simplicity</strong></h3>
<p>You're a startup. You have an MVP that needs to ship yesterday. A team of three people, and nobody knows Kubernetes deeply.</p>
<p><strong>ECS gives you:</strong></p>
<ul>
<li><p>Launch in a couple of hours (not days!)</p>
</li>
<li><p>AWS integration out of the box</p>
</li>
<li><p>No need to hire Kubernetes expert</p>
</li>
<li><p>Fewer moving parts = fewer things to break</p>
</li>
</ul>
<p>Look, I'm not saying <em><s>Kubernetes is bad</s></em>. It's awesome! But do you need it when you just wanna run a container? It's like buying a truck to get bread from the store.</p>
<p><strong>Time to learn:</strong></p>
<ul>
<li><p>ECS: 2-3 days to work comfortably</p>
</li>
<li><p>EKS: 2-3 weeks minimum (or even a month)</p>
</li>
</ul>
<p>Here's a complete minimal ECS setup with Terraform:</p>
<pre><code class="lang-plaintext"># Minimal ECS cluster
resource "aws_ecs_cluster" "main" {
  name = "my-app-cluster"
}

# ECS Service - runs 2 copies of your container
resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api_gateway.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = ["subnet-xxx", "subnet-yyy"]
    security_groups  = ["sg-xxx"]
    assign_public_ip = true
  }
}
</code></pre>
<p>That's it! No Helm, no kubectl, no YAML soup.</p>
<h3 id="heading-scenario-3-aws-native-project"><strong>Scenario 3: AWS-Native Project</strong></h3>
<p>Your project is fully in AWS:</p>
<ul>
<li><p>Database - RDS</p>
</li>
<li><p>Files - S3</p>
</li>
<li><p>Queues - SQS</p>
</li>
<li><p>Cache - ElastiCache</p>
</li>
<li><p>Logs - CloudWatch</p>
</li>
</ul>
<p>Why Kubernetes here? ECS integrates with these services <strong>natively</strong> and more simply.</p>
<p><strong>Example - S3 access:</strong></p>
<p>ECS Task Role (simple):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [{
    <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
    <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:*"</span>,
    <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::my-bucket/*"</span>
  }]
}
</code></pre>
<p>Attach the role to Task Definition - done.</p>
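<p>In Terraform terms it's roughly this: attach the policy inline to the task role (created later in this article), and the task definition just references the role via <code>task_role_arn</code>:</p>
<pre><code class="lang-hcl"># Sketch: inline S3 policy on the task role. The JSON file is the
# policy document shown above, saved next to the Terraform code.
resource "aws_iam_role_policy" "task_s3" {
  name   = "s3-access"
  role   = aws_iam_role.ecs_task.id
  policy = file("s3-policy.json")
}
</code></pre>
<p>No OIDC provider, no ServiceAccounts; the role rides along with the task.</p>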
<p>In EKS you do the same through IRSA (IAM Roles for Service Accounts):</p>
<ul>
<li><p>Set up the OIDC provider</p>
</li>
<li><p>Create ServiceAccount in Kubernetes</p>
</li>
<li><p>Link with IAM role</p>
</li>
<li><p>Annotate the pod</p>
</li>
</ul>
<p>More steps = more places to mess up.</p>
<h2 id="heading-when-eks-becomes-necessary"><strong>When EKS Becomes Necessary</strong></h2>
<p>Alright, enough praising ECS. Let's be honest - there are situations where EKS is really better.</p>
<h3 id="heading-scenario-1-large-microservices-architecture-20-services"><strong>Scenario 1: Large Microservices Architecture (20+ Services)</strong></h3>
<p>When you have 20, 30, 50 microservices - that's different math.</p>
<p><strong>Why EKS wins:</strong></p>
<ul>
<li><p>$70 per cluster is a fixed price (whether 5 services or 50)</p>
</li>
<li><p>Kubernetes scales complexity better</p>
</li>
<li><p>Ecosystem: Helm, Operators, service mesh (Istio, Linkerd)</p>
</li>
<li><p>Centralized management of all services</p>
</li>
</ul>
<p><strong>Cost example:</strong></p>
<p>With 30 services in one region:</p>
<ul>
<li><p>ECS: 30 separate ECS Services = lots of config that's hard to manage</p>
</li>
<li><p>EKS: One cluster, all services in namespaces, managed through GitOps</p>
</li>
</ul>
<p>Here $70/month pays for convenience.</p>
<p>A typical Kubernetes deployment:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Kubernetes Deployment - simpler at scale</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api-gateway</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">production</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">api-gateway</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">api-gateway</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">my-registry/api-gateway:v1.2.3</span>
        <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">3000</span>
        <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">requests:</span>
            <span class="hljs-attr">memory:</span> <span class="hljs-string">"512Mi"</span>
            <span class="hljs-attr">cpu:</span> <span class="hljs-string">"250m"</span>
          <span class="hljs-attr">limits:</span>
            <span class="hljs-attr">memory:</span> <span class="hljs-string">"512Mi"</span>
            <span class="hljs-attr">cpu:</span> <span class="hljs-string">"250m"</span>
</code></pre>
<p>With Kubernetes you get built-in health checks, rolling updates, easy rollbacks.</p>
<h3 id="heading-scenario-2-multi-cloud-or-hybrid-infrastructure"><strong>Scenario 2: Multi-Cloud or Hybrid Infrastructure</strong></h3>
<p>Your company wants:</p>
<ul>
<li><p>Work in AWS and GCP simultaneously</p>
</li>
<li><p>Keep some workloads on-premise</p>
</li>
<li><p>Have ability to migrate between clouds</p>
</li>
</ul>
<p><strong>EKS (Kubernetes) gives portability:</strong></p>
<ul>
<li><p>Same YAML manifests work everywhere</p>
</li>
<li><p>Can move applications between clouds</p>
</li>
<li><p>Standardization across all infra</p>
</li>
</ul>
<p>ECS is AWS only. You can't move it anywhere. (Well, ECS Anywhere exists, but it still runs on the AWS control plane. :) )</p>
<h3 id="heading-scenario-3-advanced-features"><strong>Scenario 3: Advanced Features</strong></h3>
<p><strong>GPU workloads for ML/AI:</strong> EKS supports GPU nodes out of the box + all tooling like Kubeflow.</p>
<p><strong>Complex networking policies:</strong> Network Policies in Kubernetes give precise traffic control between pods.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Network Policy example</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api-policy</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">api-gateway</span>
  <span class="hljs-attr">ingress:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">from:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">podSelector:</span>
        <span class="hljs-attr">matchLabels:</span>
          <span class="hljs-attr">app:</span> <span class="hljs-string">frontend</span>
    <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">3000</span>
</code></pre>
<p><strong>Stateful applications:</strong> StatefulSets, Persistent Volumes - all of this works better in Kubernetes.</p>
<h2 id="heading-practical-deployment-examples"><strong>Practical Deployment Examples</strong></h2>
<p>Enough theory, let's get our hands dirty. I'll show how to deploy a simple application to both ECS and EKS. Same application, to compare.</p>
<p><strong>Our application:</strong> Nginx + simple Node.js API (both need Docker images)</p>
<h3 id="heading-building-docker-images-first"><strong>Building Docker Images First</strong></h3>
<p>Before deploying anywhere you need Docker images. Here's our setup:</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Dockerfile for our Node.js app</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>-alpine

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> package*.json ./</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm ci --production</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">3000</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"node"</span>, <span class="hljs-string">"index.js"</span>]</span>
</code></pre>
<p>Build and push:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Build image</span>
docker build -t my-app:latest .

<span class="hljs-comment"># Tag for ECR</span>
docker tag my-app:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest

<span class="hljs-comment"># Push to ECR</span>
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
</code></pre>
<h3 id="heading-ecs-deployment-with-terraform"><strong>ECS Deployment with Terraform</strong></h3>
<p>Let's start with the simpler one - ECS.</p>
<h4 id="heading-step-1-vpc-setup"><strong>Step 1: VPC Setup</strong></h4>
<pre><code class="lang-plaintext"># Create VPC for containers
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
}

# Public subnets
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}
</code></pre>
<h4 id="heading-step-2-ecs-cluster-and-service"><strong>Step 2: ECS Cluster and Service</strong></h4>
<pre><code class="lang-plaintext"># Create ECS cluster 
resource "aws_ecs_cluster" "main" {
  name = "my-app-cluster"
}

# Task Definition - describes your Docker container
resource "aws_ecs_task_definition" "app" {
  family                   = "my-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"

  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([{
    name      = "app"
    image     = "123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"
    essential = true

    portMappings = [{
      containerPort = 3000
      protocol      = "tcp"
    }]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/my-app"
        "awslogs-region"        = "us-east-1"
        "awslogs-stream-prefix" = "app"
      }
    }
  }])
}

# ECS Service - runs and maintains containers
resource "aws_ecs_service" "app" {
  name            = "my-app-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.public[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = true
  }
}
</code></pre>
<h4 id="heading-step-3-iam-roles"><strong>Step 3: IAM Roles</strong></h4>
<pre><code class="lang-plaintext"># Role for ECS to pull Docker images and write logs
resource "aws_iam_role" "ecs_execution" {
  name = "ecs-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution_policy" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Role for your application (e.g., S3 access)
resource "aws_iam_role" "ecs_task" {
  name = "ecs-task-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}
</code></pre>
<h4 id="heading-deploy-it"><strong>Deploy It</strong></h4>
<pre><code class="lang-bash">terraform init
terraform plan
terraform apply
</code></pre>
<p><strong>Done!</strong> Container is running.</p>
<h3 id="heading-eks-deployment-with-terraform"><strong>EKS Deployment with Terraform</strong></h3>
<p>Now the same thing but in EKS.</p>
<h4 id="heading-step-1-eks-cluster"><strong>Step 1: EKS Cluster</strong></h4>
<pre><code class="lang-plaintext"># EKS cluster 
resource "aws_eks_cluster" "main" {
  name     = "my-eks-cluster"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.28"

  vpc_config {
    subnet_ids = concat(aws_subnet.public[*].id, aws_subnet.private[*].id)
  }
}

# Worker nodes
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "main-nodes"
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = aws_subnet.private[*].id

  scaling_config {
    desired_size = 2
    max_size     = 4
    min_size     = 1
  }

  instance_types = ["t3.medium"]
}
</code></pre>
<h4 id="heading-step-2-iam-for-eks"><strong>Step 2: IAM for EKS</strong></h4>
<pre><code class="lang-plaintext"># Cluster role
resource "aws_iam_role" "eks_cluster" {
  name = "eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

# Node role
resource "aws_iam_role" "eks_node" {
  name = "eks-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_cni" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_node.name
}
</code></pre>
<h4 id="heading-step-3-kubernetes-manifests"><strong>Step 3: Kubernetes Manifests</strong></h4>
<p>After the cluster is created, deploy the application with kubectl:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># deployment.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">my-app</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">my-app</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
        <span class="hljs-attr">image:</span> <span class="hljs-number">123456789.</span><span class="hljs-string">dkr.ecr.us-east-1.amazonaws.com/my-app:latest</span>
        <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">3000</span>
        <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">requests:</span>
            <span class="hljs-attr">memory:</span> <span class="hljs-string">"512Mi"</span>
            <span class="hljs-attr">cpu:</span> <span class="hljs-string">"250m"</span>
          <span class="hljs-attr">limits:</span>
            <span class="hljs-attr">memory:</span> <span class="hljs-string">"512Mi"</span>
            <span class="hljs-attr">cpu:</span> <span class="hljs-string">"250m"</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-app-service</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">LoadBalancer</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">my-app</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
    <span class="hljs-attr">targetPort:</span> <span class="hljs-number">3000</span>
</code></pre>
<h4 id="heading-deploy-it-1"><strong>Deploy It</strong></h4>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Apply Terraform </span>
terraform init
terraform apply

<span class="hljs-comment"># 2. Configure kubectl</span>
aws eks update-kubeconfig --name my-eks-cluster --region us-east-1

<span class="hljs-comment"># 3. Check nodes</span>
kubectl get nodes

<span class="hljs-comment"># 4. Deploy application</span>
kubectl apply -f deployment.yaml

<span class="hljs-comment"># 5. Check status</span>
kubectl get pods
kubectl get svc
</code></pre>
<p><strong>Difference:</strong></p>
<ul>
<li><p>ECS: one terraform apply and done</p>
</li>
<li><p>EKS: terraform apply + kubectl commands + wait for everything to come up</p>
</li>
</ul>
<h3 id="heading-complexity-comparison"><strong>Complexity Comparison</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Action</td><td>ECS</td><td>EKS</td></tr>
</thead>
<tbody>
<tr>
<td>Config files</td><td>3-4 Terraform files</td><td>4-5 Terraform + YAML manifests</td></tr>
<tr>
<td>First deploy time</td><td>5-7 minutes</td><td>15-20 minutes</td></tr>
<tr>
<td>Commands to run</td><td>2 (init, apply)</td><td>5+ (terraform + kubectl)</td></tr>
<tr>
<td>Need to know</td><td>AWS, Terraform, Docker</td><td>AWS, Terraform, Kubernetes, kubectl, Docker</td></tr>
</tbody>
</table>
</div><h2 id="heading-real-cases-and-economics"><strong>Real Cases and Economics</strong></h2>
<p>Let's calculate concrete numbers for typical scenarios.</p>
<h3 id="heading-case-1-startup-with-5-microservices-in-3-regions"><strong>Case 1: Startup with 5 Microservices in 3 Regions</strong></h3>
<p><strong>Requirements:</strong></p>
<ul>
<li><p>5 services (API, workers, background jobs)</p>
</li>
<li><p>3 regions: US, EU, Asia</p>
</li>
<li><p>2 instances of each service</p>
</li>
<li><p>All need Docker images built and stored in ECR</p>
</li>
</ul>
<p><strong>ECS Fargate:</strong></p>
<pre><code class="lang-plaintext">Cluster cost: $0
ECR storage: ~$5/month (for Docker images)
Compute (Fargate):
  - 5 services × 2 instances × 3 regions = 30 tasks
  - Each task: 0.25 vCPU, 512 MB
  - $0.04048/hour per vCPU, $0.004445/hour per GB
  - (0.25 × $0.04048 + 0.5 × $0.004445) × 730 hours ≈ $9/task/month
  - 30 tasks × $9 = $270/month

Total: ~$275/month
</code></pre>
<p><strong>EKS:</strong></p>
<pre><code class="lang-plaintext">Cluster cost: $70 × 3 regions = $210/month
ECR storage: ~$5/month (same Docker images)
Compute (EC2 nodes):
  - Minimum 2× t3.medium per region = 6 instances
  - t3.medium = $0.0416/hour × 730 = ~$30/month
  - 6 × $30 = $180/month

Total: $210 + $5 + $180 = $395/month
</code></pre>
<p><strong>Savings with ECS: $120/month or $1440/year</strong></p>
<p>Plus with ECS you don't pay a DevOps engineer to manage Kubernetes :)</p>
<h3 id="heading-case-2-large-project-with-30-services-in-1-region"><strong>Case 2: Large Project with 30 Services in 1 Region</strong></h3>
<p><strong>ECS:</strong></p>
<pre><code class="lang-plaintext">Cluster: $0
Management: 30 separate ECS Services (hard to manage!)
Compute: depends on load
</code></pre>
<p><strong>EKS:</strong></p>
<pre><code class="lang-plaintext">Cluster: $70/month
Management: one namespace, GitOps, Helm (easier!)
Compute: same + better resource utilization
</code></pre>
<p>Here EKS wins on management convenience. $70 pays for itself.</p>
<h3 id="heading-time-for-setup-and-maintenance"><strong>Time for Setup and Maintenance</strong></h3>
<p>Real numbers from my experience:</p>
<p><strong>Initial setup:</strong></p>
<ul>
<li><p>ECS: 4 hours (Terraform + tests + Docker builds)</p>
</li>
<li><p>EKS: 2 days (cluster + addons + monitoring setup + Docker builds)</p>
</li>
</ul>
<p><strong>Weekly maintenance:</strong></p>
<ul>
<li><p>ECS: ~30 minutes (check logs, apply updates)</p>
</li>
<li><p>EKS: ~2 hours (cluster updates, checks, monitoring)</p>
</li>
</ul>
<p><strong>Platform updates:</strong></p>
<ul>
<li><p>ECS: automatic</p>
</li>
<li><p>EKS: need to update the control plane about once a year (takes half a day with tests)</p>
</li>
</ul>
<h2 id="heading-decision-checklist-what-to-choose"><strong>Decision Checklist: What to Choose?</strong></h2>
<p>Here's a simple checklist for decision making:</p>
<h3 id="heading-choose-ecs-if"><strong>Choose ECS if:</strong></h3>
<ul>
<li><p>✅ You have fewer than 10-15 microservices</p>
</li>
<li><p>✅ Project is AWS-only (no multi-cloud plans)</p>
</li>
<li><p>✅ Team doesn't know Kubernetes (and doesn't want to learn)</p>
</li>
<li><p>✅ Need to launch quickly (MVP, startup)</p>
</li>
<li><p>✅ Budget is limited</p>
</li>
<li><p>✅ Simple application without complex dependencies</p>
</li>
<li><p>✅ Multi-regional deployment (save on clusters)</p>
</li>
<li><p>✅ Comfortable with Docker basics</p>
</li>
</ul>
<h3 id="heading-choose-eks-if"><strong>Choose EKS if:</strong></h3>
<ul>
<li><p>✅ More than 20 microservices</p>
</li>
<li><p>✅ Need portability (multi-cloud, hybrid)</p>
</li>
<li><p>✅ Team knows Kubernetes</p>
</li>
<li><p>✅ Need advanced features (service mesh, operators)</p>
</li>
<li><p>✅ GPU workloads for ML/AI</p>
</li>
<li><p>✅ Already using Kubernetes elsewhere</p>
</li>
<li><p>✅ Complex microservices architecture</p>
</li>
<li><p>✅ Want access to the Kubernetes ecosystem</p>
</li>
</ul>
<h3 id="heading-middle-ground"><strong>Middle Ground</strong></h3>
<p><strong>You can start with ECS and migrate later!</strong></p>
<p>Many companies do this:</p>
<ol>
<li><p>Start on ECS (fast and cheap)</p>
</li>
<li><p>Grow to 15-20 services</p>
</li>
<li><p>Kubernetes developers join the team</p>
</li>
<li><p>Gradually migrate to EKS</p>
</li>
</ol>
<p>This is a normal evolution. Don't adopt Kubernetes just because it's cool.</p>
<h2 id="heading-conclusions"><strong>Conclusions</strong></h2>
<p>Here's what's important to remember:</p>
<p><strong>ECS is not a "second-rate" option.</strong> It's a full-fledged solution that handles many tasks excellently. Yes, EKS is more capable, but most projects simply don't need those capabilities.</p>
<p><strong>Main ECS advantages:</strong></p>
<ul>
<li><p>Free control plane (save $70-210+ per month)</p>
</li>
<li><p>Simplicity and launch speed</p>
</li>
<li><p>Less operational overhead</p>
</li>
<li><p>Native AWS integration</p>
</li>
<li><p>Perfect for multi-regional deployment of small services</p>
</li>
</ul>
<p><strong>When EKS is really needed:</strong></p>
<ul>
<li><p>Large scale (20+ services)</p>
</li>
<li><p>Code portability</p>
</li>
<li><p>Advanced Kubernetes features</p>
</li>
<li><p>Kubernetes expertise already on the team</p>
</li>
</ul>
<p><strong>My advice:</strong> Don't chase the hype. Start with ECS if the task allows. Save time, money, and nerves. And when you really grow into Kubernetes, then migrate.</p>
<p>Kubernetes is like a Ferrari: a cool car, but for a trip to the store a regular Toyota works fine. And it uses less gas 😄</p>
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>The choice between ECS and EKS isn't about "better" or "worse"; it's about the <strong>right tool for the job</strong>.</p>
<p>Start simple. ECS lets you ship fast without the Kubernetes learning curve. Your Docker skills transfer directly. AWS handles the orchestration.</p>
<p>As you grow, reassess. When you hit 15-20 services or need multi-cloud, EKS makes sense. But many successful companies run production on ECS for years.</p>
<p><strong>Remember:</strong> Complexity is a cost. Every abstraction layer you add costs time, money, and mental overhead. Sometimes the best architecture is the simplest one that works.</p>
<p>Both platforms use Docker. Both run containers. Both scale. The question is: how much complexity do you actually need?</p>
<p>Choose wisely!</p>
<hr />
<h2 id="heading-sources"><strong>Sources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://aws.amazon.com/blogs/containers/amazon-ecs-vs-amazon-eks-making-sense-of-aws-container-services/">AWS Official: ECS vs EKS</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ecs/">Amazon ECS Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/eks/">Amazon EKS Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://registry.terraform.io/modules/terraform-aws-modules/ecs/aws/latest">Terraform AWS ECS Module</a></p>
</li>
<li><p><a target="_blank" href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest">Terraform AWS EKS Module</a></p>
</li>
<li><p><a target="_blank" href="https://blog.1byte.com/ecs-vs-eks/">ECS vs EKS 2025 Comparison</a></p>
</li>
<li><p><a target="_blank" href="https://www.perfectscale.io/blog/eks-vs-ecs">AWS Container Services Guide</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[AWS ECS Evolution: Managed Instances and Advanced Deployment Strategies]]></title><description><![CDATA[The container orchestration landscape on AWS recently received significant enhancements with two major updates to Amazon Elastic Container Service (ECS): the introduction of ECS Managed Instances and built-in support for Linear and Canary deployment ...]]></description><link>https://tgaleev.com/aws-ecs-evolution-managed-instances-and-advanced-deployment-strategies</link><guid isPermaLink="true">https://tgaleev.com/aws-ecs-evolution-managed-instances-and-advanced-deployment-strategies</guid><category><![CDATA[AWS]]></category><category><![CDATA[ECS]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Mon, 13 Oct 2025 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763315436893/057c8f54-820e-41de-83f8-8fb91efab0d6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The container orchestration landscape on AWS recently received significant enhancements with two major updates to Amazon Elastic Container Service (ECS): the introduction of ECS Managed Instances and built-in support for Linear and Canary deployment strategies. These features address common operational challenges while providing more flexibility for teams running containerized workloads.</p>
<h2 id="heading-ecs-managed-instances-bridging-the-gap-between-control-and-simplicity">ECS Managed Instances: Bridging the Gap Between Control and Simplicity</h2>
<p>Amazon ECS Managed Instances represents a new compute option that aims to combine the operational simplicity of managed infrastructure with the flexibility of EC2. This offering positions itself between AWS Fargate and self-managed EC2 instances in the ECS ecosystem.</p>
<h3 id="heading-what-makes-it-different">What Makes It Different?</h3>
<p>The key differentiator lies in how it handles infrastructure management. Unlike Fargate, which abstracts away the underlying compute entirely, ECS Managed Instances gives you visibility and control over instance types while AWS handles the operational burden. Unlike traditional EC2-backed ECS clusters, you don't need to manage instance provisioning, scaling, or patching.</p>
<p><strong>Key capabilities include:</strong></p>
<ul>
<li><p><strong>Instance Selection Flexibility</strong>: By default, AWS automatically selects cost-optimized instance types based on your workload requirements. However, you can specify particular instance attributes when needed, including GPU acceleration, specific CPU architectures (ARM/x86), or enhanced networking capabilities (a sketch follows this list).</p>
</li>
<li><p><strong>Task Bin-Packing</strong>: Unlike Fargate's one-task-per-instance model, Managed Instances supports multiple tasks per instance, optimizing resource utilization and potentially reducing costs through better instance consolidation.</p>
</li>
<li><p><strong>Automated Maintenance</strong>: The service implements security patches every 14 days and handles instance lifecycle management. You can schedule maintenance windows using EC2 event windows to minimize application disruption during critical business hours.</p>
</li>
<li><p><strong>Bottlerocket OS</strong>: Instances run on Bottlerocket, AWS's purpose-built container operating system, which provides a minimal attack surface and improved security posture.</p>
</li>
</ul>
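<p>This attribute-based selection is conceptually the same mechanism EC2 already exposes as instance requirements. A minimal sketch of that general mechanism in Terraform (illustrative only; the Managed Instances configuration surface itself may use different names):</p>
<pre><code class="lang-plaintext"># Illustrative sketch: EC2 attribute-based instance selection in Terraform.
# Managed Instances applies the same idea (attributes in, instance types out),
# but its own API/Terraform schema may differ.
resource "aws_launch_template" "attribute_based" {
  name_prefix = "mi-style-selection-"

  instance_requirements {
    vcpu_count {
      min = 2
      max = 8
    }
    memory_mib {
      min = 4096
    }
    instance_generations = ["current"] # prefer current-generation instances
  }
}
</code></pre>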
<h3 id="heading-understanding-the-cost-model">Understanding the Cost Model</h3>
<p>It's important to note that ECS Managed Instances adds a management fee on top of EC2 instance costs. This charge varies by instance class and size and is billed at on-demand pricing (per second with a one-minute minimum), even if you're using EC2 Savings Plans for the underlying instances. Teams should evaluate whether the operational savings justify the additional cost for their specific workloads.</p>
<h3 id="heading-when-to-choose-ecs-managed-instances">When to Choose ECS Managed Instances</h3>
<p>This option makes sense when you need:</p>
<ul>
<li><p>Access to specific instance types (bare metal, GPU instances, or specialized compute)</p>
</li>
<li><p>Better cost optimization through task bin-packing</p>
</li>
<li><p>EC2-level control without operational overhead</p>
</li>
<li><p>Integration with existing EC2 pricing commitments</p>
</li>
</ul>
<h2 id="heading-advanced-deployment-strategies-linear-and-canary-deployments">Advanced Deployment Strategies: Linear and Canary Deployments</h2>
<p>Alongside Managed Instances, AWS introduced native support for Linear and Canary deployment strategies in ECS, expanding beyond the existing blue/green deployment option. These strategies are available for services using Application Load Balancer (ALB) or ECS Service Connect.</p>
<h3 id="heading-canary-deployments-controlled-risk-exposure">Canary Deployments: Controlled Risk Exposure</h3>
<p>Canary deployments allow you to validate new service revisions with minimal risk by routing a small percentage of production traffic to the new version first.</p>
<p><strong>The deployment process follows a two-step traffic shift:</strong></p>
<ol>
<li><p>Initially shift a configured percentage (e.g., 10%) to the new revision</p>
</li>
<li><p>After the canary bake time completes successfully, shift 100% of remaining traffic</p>
</li>
</ol>
<p>During the canary bake time, both versions run simultaneously, allowing you to monitor metrics, health checks, and application behavior. If issues are detected, you can quickly roll back by shifting traffic back to the original version.</p>
<h3 id="heading-linear-deployments-gradual-traffic-migration">Linear Deployments: Gradual Traffic Migration</h3>
<p>Linear deployments provide a more gradual approach, shifting traffic in equal percentage increments over a specified time period. You configure:</p>
<ul>
<li><p><strong>Step percentage</strong>: How much traffic shifts at each increment (e.g., 10%)</p>
</li>
<li><p><strong>Step bake time</strong>: The wait period between each increment for monitoring</p>
</li>
</ul>
<p>This strategy validates your application at multiple stages with progressively increasing production traffic, providing more data points for validation compared to canary deployments.</p>
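<p>To make the cadence concrete: with a 10% step and a 10-minute bake time, traffic reaches 100% in ten increments, roughly 100 minutes end to end. A trivial sketch of the arithmetic (plain Terraform locals; the numbers are illustrative):</p>
<pre><code class="lang-plaintext">locals {
  step_percent  = 10                                     # traffic shifted per increment
  bake_minutes  = 10                                     # monitoring window between increments
  increments    = 100 / local.step_percent               # = 10 steps to reach 100%
  total_minutes = local.increments * local.bake_minutes  # ~100 minutes of gradual shift
}
</code></pre>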
<h3 id="heading-deployment-lifecycle-and-monitoring">Deployment Lifecycle and Monitoring</h3>
<p>Both strategies support several critical features:</p>
<ul>
<li><p><strong>Deployment Bake Time</strong>: After all traffic has shifted to the new revision, AWS waits a configurable period before terminating the old revision, enabling quick rollback without downtime if issues emerge.</p>
</li>
<li><p><strong>Lifecycle Hooks</strong>: You can configure Lambda functions to execute at specific deployment stages for automated validation, custom health checks, or integration with external monitoring systems.</p>
</li>
<li><p><strong>CloudWatch Alarm Integration</strong>: Configure automatic rollback triggers based on CloudWatch alarms, enabling automated failure detection and recovery (see the sketch after this list).</p>
</li>
<li><p><strong>Lifecycle Stages</strong>: Each deployment progresses through distinct stages (SCALE_UP, TEST_TRAFFIC_SHIFT, PRODUCTION_TRAFFIC_SHIFT, BAKE_TIME, CLEAN_UP), with each stage lasting up to 24 hours. For CloudFormation deployments, the entire process must complete within 36 hours.</p>
</li>
</ul>
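<p>A minimal Terraform sketch of the rollback wiring (the service, cluster, and alarm names are hypothetical; the newer linear/canary strategy fields are omitted here since their exact schema may vary by provider version):</p>
<pre><code class="lang-plaintext"># Hypothetical alarm on the service's 5xx rate.
resource "aws_cloudwatch_metric_alarm" "http_5xx" {
  alarm_name          = "app-5xx-rate"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 3
  threshold           = 10
  comparison_operator = "GreaterThanThreshold"
}

resource "aws_ecs_service" "app" {
  name            = "my-app"                        # hypothetical service
  cluster         = aws_ecs_cluster.main.id         # assumes an existing cluster
  task_definition = aws_ecs_task_definition.app.arn # assumes an existing task definition
  desired_count   = 2

  # Roll back automatically when the new revision fails to stabilize...
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # ...or when the alarm above fires during a deployment.
  alarms {
    alarm_names = [aws_cloudwatch_metric_alarm.http_5xx.alarm_name]
    enable      = true
    rollback    = true
  }
}
</code></pre>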
<h3 id="heading-best-practices-for-production-use">Best Practices for Production Use</h3>
<p>When implementing these deployment strategies, consider:</p>
<ol>
<li><p><strong>Start Conservative</strong>: Begin with smaller percentages (5-10% for canary) to minimize impact if issues occur</p>
</li>
<li><p><strong>Sufficient Monitoring</strong>: Ensure your canary percentage generates enough traffic for meaningful validation</p>
</li>
<li><p><strong>Appropriate Bake Times</strong>: Set evaluation periods long enough to capture meaningful performance data (typically 10-30 minutes)</p>
</li>
<li><p><strong>Comprehensive Metrics</strong>: Monitor response time, error rates, throughput, and business-specific metrics</p>
</li>
<li><p><strong>Automated Rollback</strong>: Configure CloudWatch alarms to automatically trigger rollback when metrics exceed thresholds</p>
</li>
</ol>
<h2 id="heading-regional-availability">Regional Availability</h2>
<p>ECS Managed Instances launched in six AWS Regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), Africa (Cape Town), Asia Pacific (Singapore), and Asia Pacific (Tokyo).</p>
<p>Linear and Canary deployment strategies are available in all commercial AWS Regions where Amazon ECS is available and can be configured through the Console, SDK, CLI, CloudFormation, CDK, and Terraform.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>These enhancements demonstrate AWS's continued investment in making ECS more flexible and operationally efficient. ECS Managed Instances provides a middle ground between Fargate's simplicity and EC2's control, while the new deployment strategies offer production-grade deployment patterns that many organizations previously had to build themselves.</p>
<p>For teams running containerized workloads on AWS, these features warrant evaluation against existing deployment patterns and infrastructure management practices. The key is understanding your specific requirements around control, cost optimization, and operational complexity to determine which combination of ECS features best serves your needs.</p>
]]></content:encoded></item><item><title><![CDATA[Accelerating Infrastructure as Code Optimization with AI: A Practitioner's Journey with Amazon Q Developer]]></title><description><![CDATA[Introduction
I've been working with Infrastructure as Code for the better part of eight years—starting with CloudFormation, migrating teams to Terraform, and lately exploring AWS CDK. Over that time, I've seen platforms grow from a handful of templat...]]></description><link>https://tgaleev.com/accelerating-infrastructure-as-code-optimization-with-ai-a-practitioners-journey-with-amazon-q-developer</link><guid isPermaLink="true">https://tgaleev.com/accelerating-infrastructure-as-code-optimization-with-ai-a-practitioners-journey-with-amazon-q-developer</guid><category><![CDATA[AWS]]></category><category><![CDATA[Amazon Q]]></category><category><![CDATA[AI]]></category><category><![CDATA[genai]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Thu, 28 Aug 2025 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760653124101/988ff5a0-22af-49a6-a018-72e6acf73137.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>I've been working with Infrastructure as Code for the better part of eight years—starting with CloudFormation, migrating teams to Terraform, and lately exploring AWS CDK. Over that time, I've seen platforms grow from a handful of templates to hundreds of modules scattered across dozens of repositories. I've also watched technical debt accumulate: legacy EC2 instance types chosen three years ago, untagged resources, container images piling up in ECR, and NAT gateways draining budgets while sitting mostly idle.</p>
<p>The traditional FinOps workflow—reactive hunting for idle resources using cost optimization hubs and billing alerts—works, but it's exhausting and slow. I wanted to shift left: catch inefficiencies before they hit production, bake cost and security best practices into the templates themselves, and help platform engineers understand inherited code without spending days spelunking through thousands of lines.</p>
<p>This article documents how I've integrated Amazon Q Developer into my IaC workflow—not as a replacement for human judgment, but as a force multiplier. I'll walk through real scenarios, concrete examples, workflow integration, limitations I've encountered, and a practical framework for measuring impact.</p>
<h2 id="heading-the-pain-points-of-traditional-iac-management"><strong>The Pain Points of Traditional IaC Management</strong></h2>
<p>Before introducing AI assistance, my team faced several recurring bottlenecks:</p>
<p><strong>Legacy comprehension</strong>: Inheriting a 2,000-line Terraform module written by someone who left the company two years ago. No README. Cryptic variable names. Comments? Optional, apparently. Understanding what it does, how components interact, and where optimization opportunities exist consumed days of calendar time.</p>
<p><strong>Migration friction</strong>: Translating a CloudFormation template to CDK or Terraform—or vice versa—is tedious and error-prone. Even straightforward resources involve syntax mapping, API differences, and validation loops. Multiply that by dozens of modules, and migration projects drag on for quarters.</p>
<p><strong>Review latency</strong>: Pull requests with IaC changes sat in queues waiting for someone with enough context to spot that the new RDS instance lacks encryption, or that the NAT gateway could be replaced with VPC endpoints, or that the instance type is three generations old.</p>
<p><strong>Standardization gaps</strong>: Every engineer writes modules slightly differently. Some include lifecycle policies; others don't. Tagging strategies diverge. IAM policies are either too permissive or so locked down they break deployments.</p>
<p><strong>Security and cost blind spots</strong>: Static analysis tools (tfsec, Checkov) catch obvious mistakes, but they don't suggest improvements. They tell you what's wrong, not what could be better. Cost estimation tools (Infracost) show projected spend, but they don't recommend Graviton instances or Spot for batch workloads.</p>
<p><strong>Onboarding friction</strong>: New hires need weeks to become productive with our IaC codebase. The learning curve is steep, and tribal knowledge is poorly documented.</p>
<h2 id="heading-how-amazon-q-developer-fits-in"><strong>How Amazon Q Developer Fits In</strong></h2>
<p>Amazon Q Developer is an AI-powered coding assistant built on over 17 years of AWS cloud experience. It integrates directly into VS Code, JetBrains IDEs, and provides CLI capabilities for automated transformations. It generates deployment-ready infrastructure code for Terraform, AWS CDK, and CloudFormation.</p>
<p>I use it for:</p>
<ul>
<li><p><strong>Code comprehension</strong>: Summarizing what a template does, mapping resource dependencies, identifying entry points.</p>
</li>
<li><p><strong>Optimization discovery</strong>: Scanning templates for cost, security, and performance improvements aligned with AWS Well-Architected Framework.</p>
</li>
<li><p><strong>IaC transformation</strong>: Automated translation between IaC frameworks (Terraform ↔ CDK ↔ CloudFormation) using the four-step process: assess, translate, test and refine, deploy.</p>
</li>
<li><p><strong>Module generation</strong>: Creating deployment-ready modules from natural language requirements with built-in AWS best practices.</p>
</li>
<li><p><strong>Pull request reviews</strong>: Analyzing diffs, flagging risks, suggesting improvements based on AWS standards.</p>
</li>
<li><p><strong>Custom rule enforcement</strong>: Using rule-based automation to encode team standards and ensure consistent, repeatable suggestions.</p>
</li>
</ul>
<p>According to AWS internal testing, Amazon Q's agentic capabilities deliver <strong>10x-50x time savings</strong> for legacy IaC remediation compared to manual processes. For VMware network migrations, AWS teams translated configurations for 500 VMs in 1 hour—<strong>80 times faster</strong> than the traditional 2-week manual approach.</p>
<p>I treat Q as a highly skilled junior engineer: fast, knowledgeable, but requiring validation and context.</p>
<h2 id="heading-end-to-end-workflow-integration"><strong>End-to-End Workflow Integration</strong></h2>
<p>Here's how Amazon Q fits into my current IaC lifecycle:</p>
<h3 id="heading-1-local-development-vs-code-amazon-q"><strong>1. Local Development (VS Code + Amazon Q)</strong></h3>
<ul>
<li><p>Open a CDK stack or Terraform module</p>
</li>
<li><p>Prompt Q: <em>"Review this file and identify opportunities to optimize for cost efficiency"</em></p>
</li>
<li><p>Q returns recommendations: instance type downsizing, ECR lifecycle policies, Graviton migration paths, NAT gateway elimination, subnet configuration changes</p>
</li>
<li><p>I validate recommendations against workload requirements, commitments, and architectural constraints</p>
</li>
<li><p>Implement approved changes with Q's assistance (it can write the code inline)</p>
</li>
</ul>
<h3 id="heading-2-static-analysis"><strong>2. Static Analysis</strong></h3>
<ul>
<li><p>Run Checkov, tfsec, or cfn-lint locally</p>
</li>
<li><p>If violations appear, I prompt Q: <em>"Fix the security issues flagged by Checkov in this file"</em></p>
</li>
<li><p>Q suggests remediation (e.g., enable encryption, add bucket policies, restrict ingress rules); see the sketch after this list</p>
</li>
</ul>
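<p>A typical before/after, sketched on a hypothetical RDS resource (the fixes shown are responses to common Checkov findings, not an exhaustive list):</p>
<pre><code class="lang-plaintext">resource "aws_db_instance" "app" {
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = "app"
  password          = var.db_password  # hypothetical variable; better: Secrets Manager

  # Lines Q adds in response to typical Checkov findings:
  storage_encrypted       = true   # encrypt storage at rest
  backup_retention_period = 7      # enable automated backups
  publicly_accessible     = false  # keep the instance off the public internet
  deletion_protection     = true   # guard against accidental deletion
}
</code></pre>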
<h3 id="heading-3-policy-as-code-validation"><strong>3. Policy-as-Code Validation</strong></h3>
<ul>
<li><p>Apply OPA/Conftest or CloudFormation Guard policies</p>
</li>
<li><p>For failures, I ask Q to explain the policy intent and adjust the template accordingly</p>
</li>
<li><p>Example policy (Rego):</p>
<pre><code class="lang-rego">  <span class="hljs-keyword">package</span> <span class="hljs-variable">terraform</span>.tags
  deny<span class="hljs-punctuation">[msg]</span> <span class="hljs-punctuation">{
</span>    <span class="hljs-variable">input</span>.resource_type == <span class="hljs-string">"aws_instance"</span>
    <span class="hljs-keyword">not</span> <span class="hljs-variable">input</span>.<span class="hljs-variable">tags</span>.Environment
    msg = <span class="hljs-string">"Missing required Environment tag"</span>
  <span class="hljs-punctuation">}</span>
</code></pre>
</li>
</ul>
<h3 id="heading-4-cost-estimation"><strong>4. Cost Estimation</strong></h3>
<ul>
<li><p>Run Infracost to project monthly spend</p>
</li>
<li><p>If costs are higher than expected, I prompt Q: <em>"Suggest ways to reduce cost for this infrastructure while maintaining performance"</em></p>
</li>
<li><p>Q might recommend reserved capacity, Savings Plans eligibility, or Graviton alternatives</p>
</li>
</ul>
<h3 id="heading-5-cicd-pipeline-gates"><strong>5. CI/CD Pipeline Gates</strong></h3>
<ul>
<li><p>Pre-commit hooks run formatters (terraform fmt, prettier)</p>
</li>
<li><p>GitHub Actions execute tests, static analysis, policy checks, Infracost diff</p>
</li>
<li><p>If checks fail, the pipeline surfaces Q-generated suggestions in PR comments (I've scripted this using Q's API)</p>
</li>
</ul>
<h3 id="heading-6-pull-request-review"><strong>6. Pull Request Review</strong></h3>
<ul>
<li><p>Reviewers use Q to summarize changes: <em>"Explain what this PR changes and flag any cost, security, or operational risks"</em></p>
</li>
<li><p>Q highlights: <em>"This PR introduces a new RDS instance without backup retention configured. Estimated monthly cost increase: $150. Consider enabling automated backups and Multi-AZ for production."</em></p>
</li>
</ul>
<h3 id="heading-7-post-deployment"><strong>7. Post-Deployment</strong></h3>
<ul>
<li><p>Monitor drift using AWS Config or Terraform state comparisons</p>
</li>
<li><p>Track cost variance via Cost Explorer</p>
</li>
<li><p>Feed learnings back into Q prompts for future iterations</p>
</li>
</ul>
<h2 id="heading-practical-example-1-generating-a-terraform-module-from-requirements"><strong>Practical Example 1: Generating a Terraform Module from Requirements</strong></h2>
<p><strong>Scenario</strong>: I need a reusable Terraform module for an S3 bucket with versioning, lifecycle rules, encryption, and least-privilege IAM access.</p>
<p><strong>Prompt</strong>:</p>
<pre><code class="lang-plaintext">Create a Terraform module for an S3 bucket with:
- Versioning enabled
- Lifecycle policy: expire non-current versions after 90 days
- AES-256 encryption at rest
- Block public access
- IAM policy allowing read-only access for a Lambda execution role
</code></pre>
<p><strong>Q Output</strong> (trimmed):</p>
<pre><code class="lang-plaintext">resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_versioning" "main" {
  bucket = aws_s3_bucket.main.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.main.id
  rule {
    id     = "expire-noncurrent"
    status = "Enabled"
    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "main" {
  bucket = aws_s3_bucket.main.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "main" {
  bucket                  = aws_s3_bucket.main.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
</code></pre>
<p><strong>Outcome</strong>: In 30 seconds, I have a production-ready module skeleton. I add variables, outputs, and documentation, then commit. Time saved: 45 minutes.</p>
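<p>For completeness, a sketch of the variables and outputs I add (the module path and the calling example are hypothetical):</p>
<pre><code class="lang-plaintext"># variables.tf / outputs.tf inside the module
variable "bucket_name" {
  description = "Globally unique name for the bucket"
  type        = string
}

output "bucket_arn" {
  description = "ARN of the created bucket"
  value       = aws_s3_bucket.main.arn
}

# Consuming the module from a root configuration (hypothetical path/name):
module "app_data" {
  source      = "./modules/s3-bucket"
  bucket_name = "my-app-data-prod"
}
</code></pre>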
<h2 id="heading-practical-example-2-refactoring-terraform-to-cdk"><strong>Practical Example 2: Refactoring Terraform to CDK</strong></h2>
<p><strong>Scenario</strong>: A team wants to migrate a Terraform module to CDK (TypeScript) for better IDE support and type safety.</p>
<p><strong>Original Terraform</strong> (simplified):</p>
<pre><code class="lang-plaintext">resource "aws_lambda_function" "processor" {
  function_name = "data-processor"
  runtime       = "python3.9"
  handler       = "index.handler"
  role          = aws_iam_role.lambda_role.arn
  filename      = "function.zip"
}

resource "aws_iam_role" "lambda_role" {
  name = "lambda-exec-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}
</code></pre>
<p><strong>Prompt</strong>:</p>
<pre><code class="lang-plaintext">Translate this Terraform configuration to AWS CDK TypeScript.
Use best practices: managed policies, environment variables, and proper construct hierarchy.
</code></pre>
<p><strong>Q Output</strong> (trimmed):</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> cdk <span class="hljs-keyword">from</span> <span class="hljs-string">'aws-cdk-lib'</span>;
<span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> lambda <span class="hljs-keyword">from</span> <span class="hljs-string">'aws-cdk-lib/aws-lambda'</span>;
<span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> iam <span class="hljs-keyword">from</span> <span class="hljs-string">'aws-cdk-lib/aws-iam'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> DataProcessorStack <span class="hljs-keyword">extends</span> cdk.Stack {
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">scope: cdk.App, id: <span class="hljs-built_in">string</span>, props?: cdk.StackProps</span>) {
    <span class="hljs-built_in">super</span>(scope, id, props);

    <span class="hljs-keyword">const</span> lambdaRole = <span class="hljs-keyword">new</span> iam.Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'LambdaRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> iam.ServicePrincipal(<span class="hljs-string">'lambda.amazonaws.com'</span>),
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'service-role/AWSLambdaBasicExecutionRole'</span>)
      ]
    });

    <span class="hljs-keyword">const</span> processor = <span class="hljs-keyword">new</span> lambda.Function(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Processor'</span>, {
      runtime: lambda.Runtime.PYTHON_3_9,
      handler: <span class="hljs-string">'index.handler'</span>,
      code: lambda.Code.fromAsset(<span class="hljs-string">'function.zip'</span>),
      role: lambdaRole,
      functionName: <span class="hljs-string">'data-processor'</span>
    });
  }
}
</code></pre>
<p><strong>Outcome</strong>: Migration skeleton in under a minute. I adjust asset paths, add CloudWatch alarms, refine IAM permissions. Time saved per module: 1-2 hours.</p>
<h2 id="heading-practical-example-3-pull-request-review-with-amazon-q"><strong>Practical Example 3: Pull Request Review with Amazon Q</strong></h2>
<p><strong>Scenario</strong>: A PR introduces a new ECS service with Fargate tasks. Reviewer uses Q to assess impact.</p>
<p><strong>Prompt</strong>:</p>
<pre><code class="lang-plaintext">Review this CDK stack diff. Summarize changes, identify cost implications,
flag security or operational risks.
</code></pre>
<p><strong>Q Analysis</strong>:</p>
<blockquote>
<p>This PR creates:</p>
<ul>
<li><p>ECS Fargate service with 2 tasks (2 vCPU, 4GB RAM each)</p>
</li>
<li><p>Application Load Balancer</p>
</li>
<li><p>CloudWatch log group (no retention configured)</p>
</li>
</ul>
<p><strong>Cost estimate</strong>: ~$95/month (Fargate) + ~$25/month (ALB) = $120/month</p>
<p><strong>Risks</strong>:</p>
<ul>
<li><p>Log group has no retention policy → unbounded storage costs</p>
</li>
<li><p>Tasks use x86_64 architecture; Graviton would reduce cost ~20%</p>
</li>
<li><p>No autoscaling configured; manual intervention required under load</p>
</li>
<li><p>ALB security group allows 0.0.0.0/0 ingress on port 443 (verify if intentional)</p>
</li>
</ul>
<p><strong>Recommendations</strong>:</p>
<ul>
<li><p>Set log retention to 30 days</p>
</li>
<li><p>Evaluate ARM64/Graviton compatibility</p>
</li>
<li><p>Add autoscaling with target tracking</p>
</li>
<li><p>Restrict ALB ingress to known CIDR ranges or CloudFront</p>
</li>
</ul>
</blockquote>
<p><strong>Outcome</strong>: Reviewer approves with conditions. Author updates the stack. Review cycle time: 15 minutes instead of 2 hours.</p>
<h2 id="heading-security-compliance-and-quality-integration"><strong>Security, Compliance, and Quality Integration</strong></h2>
<p>Amazon Q doesn't replace security tooling—it augments it.</p>
<h3 id="heading-iam-least-privilege"><strong>IAM Least Privilege</strong></h3>
<p>I prompt Q: <em>"Review this IAM policy and restrict to least privilege for a Lambda reading from S3 and writing to DynamoDB."</em></p>
<p>Q tightens wildcards, removes unnecessary actions, adds conditions for resource tagging.</p>
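<p>A sketch of the shape of policy that comes back (bucket and table ARNs are placeholders):</p>
<pre><code class="lang-plaintext">data "aws_iam_policy_document" "lambda_data_access" {
  statement {
    sid       = "ReadSourceObjects"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-source-bucket/*"] # placeholder bucket
  }

  statement {
    sid       = "WriteResults"
    actions   = ["dynamodb:PutItem", "dynamodb:UpdateItem"]
    resources = ["arn:aws:dynamodb:us-east-1:123456789012:table/example-results"] # placeholder table
  }
}

resource "aws_iam_policy" "lambda_data_access" {
  name   = "lambda-data-access"
  policy = data.aws_iam_policy_document.lambda_data_access.json
}
</code></pre>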
<h3 id="heading-secrets-hygiene"><strong>Secrets Hygiene</strong></h3>
<p>Q flags hardcoded credentials or API keys during reviews. I pair this with git-secrets and AWS Secrets Manager integration.</p>
<h3 id="heading-drift-detection"><strong>Drift Detection</strong></h3>
<p>After deployments, I compare actual infrastructure (via AWS Config or Terraform state) against source templates. If drift occurs, I ask Q: <em>"Why might this resource configuration differ from the template?"</em> It helps hypothesize causes (manual changes, out-of-band automation, CloudFormation stack updates).</p>
<h3 id="heading-policy-as-code"><strong>Policy-as-Code</strong></h3>
<p>I maintain Conftest policies (OPA/Rego) for tagging, encryption, and network segmentation. When policies fail, Q explains the rule intent and suggests compliant configurations.</p>
<h3 id="heading-cost-guardrails"><strong>Cost Guardrails</strong></h3>
<p>I integrate Infracost in CI and set thresholds (e.g., no PR increasing monthly cost by &gt;$500 without approval). Q helps identify cost drivers and alternatives.</p>
<h2 id="heading-repository-improvement-plan-prioritized"><strong>Repository Improvement Plan (Prioritized)</strong></h2>
<p>If I were assessing a typical IaC codebase today, here's what I'd prioritize:</p>
<ol>
<li><p><strong>Add</strong> (High Priority):</p>
<ul>
<li><p>Pre-commit hooks: terraform fmt, tflint, Checkov</p>
</li>
<li><p>Infracost integration in CI</p>
</li>
<li><p>Basic Conftest policies (tagging, encryption)</p>
</li>
<li><p>ECR lifecycle policies across all container builds (see the sketch after this plan)</p>
</li>
<li><p>Automated README generation (Q can draft from code)</p>
</li>
</ul>
</li>
<li><p><strong>Refactor</strong> (Medium Priority):</p>
<ul>
<li><p>Consolidate duplicate modules</p>
</li>
<li><p>Standardize naming conventions (use Q to generate renaming scripts)</p>
</li>
<li><p>Migrate legacy instance types to Graviton where compatible</p>
</li>
<li><p>Replace NAT gateways with VPC endpoints for AWS services</p>
</li>
</ul>
</li>
<li><p><strong>Harden</strong> (Medium Priority):</p>
<ul>
<li><p>IAM policy reviews (Q-assisted least privilege tightening)</p>
</li>
<li><p>Enable Terraform state locking (DynamoDB + S3)</p>
</li>
<li><p>Add drift detection automation (aws-config or Terraform Cloud)</p>
</li>
<li><p>Implement environment-specific configurations (dev/stage/prod variants)</p>
</li>
</ul>
</li>
<li><p><strong>Automate</strong> (Lower Priority, High Impact):</p>
<ul>
<li><p>Q-generated PR comment summaries (cost/security/drift)</p>
</li>
<li><p>Automated documentation updates on merge</p>
</li>
<li><p>Ephemeral preview environments for PRs (using Terraform workspaces or CDK context)</p>
</li>
</ul>
</li>
<li><p><strong>Measure</strong> (Ongoing):</p>
<ul>
<li><p>Track PR review time and test coverage improvements</p>
</li>
<li><p>Iterate on Q prompts based on false positives/negatives</p>
</li>
<li><p>Refine policy-as-code rules based on team feedback</p>
</li>
</ul>
</li>
</ol>
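<p>For the ECR lifecycle policies item above, the fix is small enough to standardize everywhere; a minimal sketch (the repository reference and retention window are illustrative):</p>
<pre><code class="lang-plaintext">resource "aws_ecr_lifecycle_policy" "cleanup" {
  repository = aws_ecr_repository.app.name # hypothetical repository

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Expire untagged images after 14 days"
      selection = {
        tagStatus   = "untagged"
        countType   = "sinceImagePushed"
        countUnit   = "days"
        countNumber = 14
      }
      action = { type = "expire" }
    }]
  })
}
</code></pre>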
<h2 id="heading-skepticism-and-limitations"><strong>Skepticism and Limitations</strong></h2>
<p>I've hit real limitations with Q:</p>
<p><strong>Hallucinations</strong>: Q occasionally invents AWS resource properties that don't exist (e.g., fictional CloudFormation parameters). Always validate against official docs.</p>
<p><strong>Context window</strong>: For massive monorepo structures, Q loses context. I work around this by targeting specific files or summarizing first.</p>
<p><strong>Organizational standards</strong>: Q doesn't know your company's naming conventions, approved instance families, or compliance requirements unless you explicitly provide them in prompts or customization files.</p>
<p><strong>Noisy recommendations</strong>: Q sometimes suggests optimizations that conflict with architectural decisions (e.g., recommending smaller instances when you've standardized on m5.large for operational simplicity). Filtering signal from noise requires domain knowledge.</p>
<p><strong>Overfitting to public examples</strong>: Q was trained on public repos. If your IaC patterns are highly proprietary or unconventional, its suggestions may miss the mark.</p>
<p><strong>Human validation is non-negotiable</strong>: I never merge Q-generated code without review, testing, and static analysis. Treat Q as a draft generator, not a replacement for engineering judgment.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Amazon Q Developer has changed how I work with infrastructure code. It doesn't replace engineering judgment, but it handles the tedious parts—reading legacy code, translating between IaC languages, spotting optimization opportunities, and catching security issues early.</p>
<p>The biggest wins for me have been:</p>
<ul>
<li><p>Understanding inherited codebases in minutes instead of days</p>
</li>
<li><p>Generating module skeletons from plain English requirements</p>
</li>
<li><p>Cutting PR review time by helping reviewers quickly understand changes and impacts</p>
</li>
<li><p>Catching cost and security issues before they reach production</p>
</li>
</ul>
<p>The key is treating Q as a tool, not a magic solution. I always validate its suggestions, test changes thoroughly, and integrate it with existing tooling like static analysis and policy checks.</p>
<p>If you're considering trying it, start small: pick one messy legacy file, ask Q to explain it, and see what optimization opportunities it finds. Install the VS Code extension (there's a free tier), experiment with prompts, and adjust based on what works for your workflow.</p>
<p>The goal isn't perfection—it's making IaC work less frustrating and more efficient, one template at a time.</p>
]]></content:encoded></item><item><title><![CDATA[Kiro: Bridging the Gap Between AI Prototyping and Production-Ready Code]]></title><description><![CDATA[As DevOps engineers and Cloud Architects, we have all experienced the excitement of using AI coding assistants to rapidly prototype applications. A few prompts later, you have working code. But then reality hits: deploying to production requires docu...]]></description><link>https://tgaleev.com/kiro-bridging-the-gap-between-ai-prototyping-and-production-ready-code</link><guid isPermaLink="true">https://tgaleev.com/kiro-bridging-the-gap-between-ai-prototyping-and-production-ready-code</guid><category><![CDATA[AWS]]></category><category><![CDATA[Kiro]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Sun, 27 Jul 2025 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763316951248/97a7c8e9-d584-4e09-8a40-52826cbfbffb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As DevOps engineers and Cloud Architects, we have all experienced the excitement of using AI coding assistants to rapidly prototype applications. A few prompts later, you have working code. But then reality hits: deploying to production requires documentation, proper architecture decisions, testing strategies, and maintainability considerations that AI-generated prototypes often lack.</p>
<h2 id="heading-the-production-problem">The Production Problem</h2>
<p>AI coding tools excel at quick prototyping—what some call "vibe coding." However, getting these prototypes production-ready presents challenges:</p>
<ul>
<li><p><strong>Undocumented assumptions</strong>: The AI made decisions during development, but those choices aren't captured anywhere</p>
</li>
<li><p><strong>Missing requirements clarity</strong>: You guided the agent throughout, but fuzzy requirements mean you can't verify if the application truly meets needs</p>
</li>
<li><p><strong>Architecture blindspots</strong>: Understanding how the system design affects performance, scalability, and your infrastructure isn't immediately clear</p>
</li>
<li><p><strong>Maintenance difficulties</strong>: Without proper documentation and structure, future changes become increasingly complex</p>
</li>
</ul>
<h2 id="heading-enter-kiro-spec-driven-development-meets-ai">Enter Kiro: Spec-Driven Development Meets AI</h2>
<p>Kiro is a new agentic IDE :) that tackles these challenges through spec-driven development. Rather than jumping straight to code, Kiro helps you think through decisions systematically while maintaining the speed of AI-assisted development.</p>
<h3 id="heading-key-features">Key Features</h3>
<p><strong>1. Requirements Specification</strong></p>
<p>Kiro transforms a simple prompt like "Add a review system for products" into detailed user stories with EARS (Easy Approach to Requirements Syntax) acceptance criteria. This makes implicit assumptions explicit, ensuring the AI builds what you actually need—not what it thinks you need.</p>
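<p>For the review-system prompt above, the acceptance criteria come out looking roughly like this (my own EARS-style illustration, not verbatim Kiro output):</p>
<pre><code class="lang-plaintext">WHEN a signed-in customer submits a review with a rating and text,
THE SYSTEM SHALL save the review and display it on the product page.

IF the customer has not purchased the product,
THEN THE SYSTEM SHALL reject the review with an explanatory message.

WHILE a review is pending moderation,
THE SYSTEM SHALL exclude it from the product's average rating.
</code></pre>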
<p><strong>2. Technical Design Documentation</strong></p>
<p>After requirements approval, Kiro analyzes your codebase and generates comprehensive design documents including:</p>
<ul>
<li><p>Data flow diagrams</p>
</li>
<li><p>TypeScript interfaces and type definitions</p>
</li>
<li><p>Database schemas</p>
</li>
<li><p>API endpoint specifications</p>
</li>
</ul>
<p>This eliminates the typical back-and-forth on requirements clarity that slows down development cycles.</p>
<p><strong>3. Task Decomposition and Sequencing</strong></p>
<p>Kiro automatically generates implementation tasks with proper dependency ordering. Each task includes considerations often missed in quick prototypes:</p>
<ul>
<li><p>Unit and integration tests</p>
</li>
<li><p>Loading states and error handling</p>
</li>
<li><p>Mobile responsiveness</p>
</li>
<li><p>Accessibility requirements (WCAG compliance)</p>
</li>
</ul>
<p><strong>4. Agent Hooks for Automation</strong></p>
<p>Hooks are event-driven automations that act like an experienced team member catching issues in the background:</p>
<ul>
<li><p>Update tests automatically when components change</p>
</li>
<li><p>Refresh API documentation when endpoints are modified</p>
</li>
<li><p>Scan for security issues before commits</p>
</li>
<li><p>Enforce coding standards across the entire team</p>
</li>
</ul>
<p>These hook definitions are committed to Git, ensuring consistent quality checks across all developers.</p>
<h2 id="heading-why-this-matters-for-devops-and-cloud-architecture">Why This Matters for DevOps and Cloud Architecture</h2>
<p>For those of us managing infrastructure and deployment pipelines, Kiro addresses several pain points:</p>
<p><strong>Infrastructure as Code Compatibility</strong>: Spec-driven development aligns naturally with IaC practices. Design documents provide the clarity needed for proper resource planning and cost optimization.</p>
<p><strong>CI/CD Integration</strong>: Automated test generation and security scanning hooks integrate seamlessly into existing pipelines, reducing manual review overhead.</p>
<p><strong>Documentation Drift Prevention</strong>: Kiro keeps specs synchronized with code changes—solving the eternal problem of outdated documentation that complicates infrastructure modifications.</p>
<p><strong>Team Consistency</strong>: When managing multiple services or microservices architectures, enforcing standards through hooks ensures uniform code quality across repositories.</p>
<h2 id="heading-technical-details">Technical Details</h2>
<ul>
<li><p>Built on Code OSS (VS Code compatible)</p>
</li>
<li><p>Supports Model Context Protocol (MCP) for specialized tool integration</p>
</li>
<li><p>Works with Open VSX plugins</p>
</li>
<li><p>Available for Mac, Windows, and Linux</p>
</li>
<li><p>Supports most popular programming languages</p>
</li>
<li><p>Free during preview period</p>
</li>
</ul>
<h2 id="heading-the-bigger-picture">The Bigger Picture</h2>
<p>While Kiro isn't specifically an AWS or cloud tool, its approach to structured development, automated quality checks, and documentation maintenance addresses fundamental challenges in modern software delivery—challenges that become amplified when deploying to cloud environments where misconfigurations can have immediate cost and security implications.</p>
<p>For DevOps practitioners and cloud architects, Kiro represents a shift from treating AI coding assistants as simple code generators to treating them as collaborative partners in the entire development lifecycle—from requirements gathering through production deployment.</p>
]]></content:encoded></item><item><title><![CDATA[Building an AI-Optimized Platform on Amazon EKS with NVIDIA NIM and OpenAI Models]]></title><description><![CDATA[Introduction
The rise of artificial intelligence (AI) has brought about an unprecedented demand for infrastructure that can handle large-scale computations, support GPU acceleration, and provide scalable, flexible management of workloads. Kubernetes ...]]></description><link>https://tgaleev.com/building-an-ai-optimized-platform-on-amazon-eks-with-nvidia-nim-and-openai-models</link><guid isPermaLink="true">https://tgaleev.com/building-an-ai-optimized-platform-on-amazon-eks-with-nvidia-nim-and-openai-models</guid><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><category><![CDATA[NVIDIA]]></category><category><![CDATA[AI]]></category><category><![CDATA[genai]]></category><category><![CDATA[Nim]]></category><category><![CDATA[llm]]></category><category><![CDATA[models]]></category><category><![CDATA[vpc]]></category><category><![CDATA[EFS]]></category><category><![CDATA[ec2]]></category><category><![CDATA[karpenter]]></category><category><![CDATA[GPU, NVIDIA, AMD]]></category><category><![CDATA[GPU]]></category><category><![CDATA[Grafana]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Wed, 18 Dec 2024 21:08:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734555975245/1154ae81-3a3d-4ab8-99c4-97e73b91873d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>The rise of artificial intelligence (AI) has brought about an unprecedented demand for infrastructure that can handle large-scale computations, support GPU acceleration, and provide scalable, flexible management of workloads. Kubernetes has emerged as a leading platform for orchestrating these workloads, and Amazon Elastic Kubernetes Service (EKS) extends Kubernetes’ capabilities by simplifying deployment and scaling in the cloud.</p>
<p>NVIDIA NIM (NVIDIA Inference Microservices) complements Kubernetes by optimizing GPU workloads, a critical need for training large language models (LLMs), computer vision, and other computationally intensive AI tasks. Additionally, OpenAI models can be integrated into this ecosystem to unlock cutting-edge AI capabilities, such as text generation, image recognition, and decision-making systems.</p>
<p>This article provides an in-depth guide to building a complete AI platform using EKS, NVIDIA NIM, and OpenAI models, with Terraform automating the deployment. Whether you are an AI researcher or a business looking to adopt AI, this guide outlines how to build a robust and scalable platform. Complete code for this setup is available on <a target="_blank" href="https://github.com/timurgaleev/eks-nim-llm-openai">GitHub</a>.</p>
<hr />
<h2 id="heading-why-choose-nvidia-nim-and-eks-for-ai-workloads">Why Choose NVIDIA NIM and EKS for AI Workloads?</h2>
<h3 id="heading-challenges-of-ai-workloads">Challenges of AI Workloads</h3>
<p>AI applications, especially those involving LLMs, have unique challenges:</p>
<ul>
<li><p><strong>GPU Resource Management</strong>: Training and inference rely on GPUs, which are scarce and expensive resources. Efficient allocation and monitoring are crucial.</p>
</li>
<li><p><strong>Scalability</strong>: AI workloads often need to scale dynamically based on user demand or data processing requirements.</p>
</li>
<li><p><strong>Storage for Large Datasets</strong>: AI models and datasets can require hundreds of gigabytes, necessitating persistent, shared, and scalable storage.</p>
</li>
<li><p><strong>Observability</strong>: Monitoring system performance, especially GPU utilization and latency, is essential for optimizing workloads.</p>
</li>
</ul>
<h3 id="heading-nvidia-nim-a-solution-for-gpu-workloads">NVIDIA NIM: A Solution for GPU Workloads</h3>
<p>NVIDIA NIM addresses these challenges by providing:</p>
<ol>
<li><p><strong>GPU Scheduling</strong>: Maximizes GPU usage across workloads.</p>
</li>
<li><p><strong>Integration with Kubernetes</strong>: Leverages Kubernetes to manage pods, jobs, and resources efficiently.</p>
</li>
<li><p><strong>AI Model Management</strong>: Simplifies deployment and scaling of AI models with Helm charts and Kubernetes CRDs (Custom Resource Definitions).</p>
</li>
<li><p><strong>Support for Persistent Storage</strong>: Integrates with shared storage solutions like AWS EFS for storing datasets and models.</p>
</li>
</ol>
<h3 id="heading-amazon-eks-a-scalable-kubernetes-solution">Amazon EKS: A Scalable Kubernetes Solution</h3>
<p>Amazon EKS adds value by:</p>
<ol>
<li><p><strong>Managed Kubernetes</strong>: Reduces operational overhead by handling Kubernetes cluster setup, updates, and management.</p>
</li>
<li><p><strong>Elastic Compute Integration:</strong> Dynamically provisions GPU-enabled instances, such as g4dn and p4d, to handle AI workloads. Ensure that your AWS account has sufficient quotas and availability for these instance types to avoid provisioning issues.</p>
</li>
<li><p><strong>Built-in Security</strong>: Integrates with AWS IAM and VPC for secure access and network segmentation.</p>
</li>
</ol>
<p>Together, NVIDIA NIM and Amazon EKS create a powerful platform for AI model training, inference, and experimentation.</p>
<hr />
<h2 id="heading-architecture-overview">Architecture Overview</h2>
<p>The platform architecture integrates NVIDIA NIM and OpenAI models into an EKS cluster, combining compute, storage, and monitoring components.</p>
<h3 id="heading-key-components">Key Components</h3>
<ol>
<li><p><strong>EKS Cluster</strong>: Manages Kubernetes workloads and scales GPU-enabled nodes.</p>
</li>
<li><p><strong>Karpenter</strong>: Dynamically provisions and scales nodes (CPU and GPU) based on workload demands, optimizing resource utilization and cost.</p>
</li>
<li><p><strong>GPU Node Groups</strong>: Nodes equipped with NVIDIA GPUs for ML and AI inference tasks.</p>
</li>
<li><p><strong>NVIDIA NIM</strong>: Deploys GPU workloads, manages AI pipelines, and integrates with Kubernetes.</p>
</li>
<li><p><strong>OpenAI Web UI</strong>: Provides a user-friendly interface for interacting with AI models.</p>
</li>
<li><p><strong>Persistent Storage</strong>: AWS EFS supports shared storage for datasets and models.</p>
</li>
<li><p><strong>Observability Tools</strong>: Prometheus and Grafana offer real-time monitoring of system metrics, including GPU utilization and pod performance.</p>
</li>
</ol>
<hr />
<h2 id="heading-deployment-guide">Deployment Guide</h2>
<p>This guide provides step-by-step instructions to deploy the architecture using Terraform. While the focus is on essential components like EKS, GPU workloads, and observability, we skip detailed VPC configuration to allow flexibility based on your specific requirements.</p>
<p>For a VPC example that fits this deployment, refer to the repository: <a target="_blank" href="https://github.com/timurgaleev/eks-nim-llm-openai">https://github.com/timurgaleev/eks-nim-llm-openai</a>.</p>
<h3 id="heading-step-1-provisioning-the-eks-cluster">Step 1: Provisioning the EKS Cluster</h3>
<p>Provisioning an Amazon EKS cluster is the foundation for Kubernetes workloads. Below is the <strong>EKS Cluster Configuration</strong> with key highlights to focus on scalability, system add-ons, and Karpenter integration.</p>
<hr />
<h4 id="heading-eks-cluster-configuration"><strong>EKS Cluster Configuration</strong></h4>
<pre><code class="lang-plaintext">module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~&gt; 19.15"

  cluster_name                   = local.name
  cluster_version                = var.eks_cluster_version
  cluster_endpoint_public_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = compact([
    for subnet_id, cidr_block in zipmap(module.vpc.private_subnets, module.vpc.private_subnets_cidr_blocks) :
    substr(cidr_block, 0, 4) == "100." ? subnet_id : null
  ])

  manage_aws_auth_configmap = true
  aws_auth_roles = [
    {
      rolearn  = module.eks_blueprints_addons.karpenter.node_iam_role_arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups = [
        "system:bootstrappers",
        "system:nodes"
      ]
    }
  ]

  eks_managed_node_group_defaults = {
    iam_role_additional_policies = {
      AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    }
    ebs_optimized = true
    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size = 100
          volume_type = "gp3"
        }
      }
    }
  }

  eks_managed_node_groups = {
    core_node_group = {
      name            = "core-node-group"
      description     = "EKS Core node group for hosting system add-ons"
      subnet_ids      = compact([
        for subnet_id, cidr_block in zipmap(module.vpc.private_subnets, module.vpc.private_subnets_cidr_blocks) :
        substr(cidr_block, 0, 4) == "100." ? subnet_id : null
      ])
      ami_type        = "AL2_x86_64"
      instance_types  = ["m5.xlarge"]
      capacity_type   = "SPOT"
      desired_size    = 2
      min_size        = 2
      max_size        = 4
      labels = {
        WorkerType    = "SPOT"
        NodeGroupType = "core"
      }
      tags = merge(local.tags, { Name = "core-node-grp" })
    }
  }
}
</code></pre>
<hr />
<h3 id="heading-key-highlights"><strong>Key Highlights</strong></h3>
<ol>
<li><p><strong>Networking</strong>:</p>
<ul>
<li>Subnets are filtered to include only CIDR blocks starting with <code>100.</code> to ensure specific subnet assignment for nodes.</li>
</ul>
</li>
<li><p><strong>IAM and Auth</strong>:</p>
<ul>
<li>Integration with <strong>Karpenter</strong> is configured via the <code>aws_auth_roles</code> block, allowing Karpenter to dynamically provision nodes.</li>
</ul>
</li>
<li><p><strong>Managed Node Groups</strong>:</p>
<ul>
<li><p><strong>Core Node Group</strong>:</p>
<ul>
<li><p>Optimized for system-level workloads.</p>
</li>
<li><p>Configured with <code>m5.xlarge</code> spot instances for cost efficiency.</p>
</li>
<li><p>Labels such as <code>NodeGroupType: core</code> and taints can be used to restrict workloads to this node group (see the sketch after this list).</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Storage</strong>:</p>
<ul>
<li>Nodes are configured with <code>gp3</code> root volumes (100 GiB) for system usage. Additional storage for workloads should be configured separately.</li>
</ul>
</li>
<li><p><strong>Scaling</strong>:</p>
<ul>
<li>Use Karpenter for workload-based scaling instead of additional managed node groups. The <code>eks_managed_node_groups</code> block here is only for critical system workloads.</li>
</ul>
</li>
</ol>
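<p>As an example of the taint mentioned above, a minimal sketch using the same module syntax (the taint key is a common convention, not a requirement):</p>
<pre><code class="lang-plaintext">core_node_group = {
  # ...configuration as above...
  labels = {
    NodeGroupType = "core"
  }
  taints = {
    addons_only = {
      key    = "CriticalAddonsOnly" # conventional key for system add-ons
      value  = "true"
      effect = "NO_SCHEDULE"        # pods must tolerate this taint to schedule here
    }
  }
}
</code></pre>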
<hr />
<h3 id="heading-step-2-deploying-nvidia-nim-for-ai-workloads">Step 2: Deploying NVIDIA NIM for AI Workloads</h3>
<p>Deploying NVIDIA NIM requires configuring persistent storage for large datasets and allocating GPU resources for optimal performance. Here's an expanded guide breaking down the essential steps.</p>
<hr />
<h4 id="heading-1-persistent-storage-with-aws-efs"><strong>1. Persistent Storage with AWS EFS</strong></h4>
<p>AI workloads often require storage that exceeds local node capacity. <strong>AWS EFS (Elastic File System)</strong> provides a shared and scalable storage solution across multiple pods. Below is the configuration for creating a Persistent Volume Claim (PVC) backed by EFS:</p>
<h5 id="heading-code-persistent-volume-claim-pvc"><strong>Code: Persistent Volume Claim (PVC)</strong></h5>
<pre><code class="lang-plaintext">kubernetes_persistent_volume_claim_v1 "efs_pvc" {
  metadata {
    name      = "efs-storage"
    namespace = "nim"
  }
  spec {
    access_modes       = ["ReadWriteMany"] # Enables sharing storage across multiple pods.
    storage_class_name = "efs"             # Links the PVC to an EFS storage class.
    resources {
      requests = {
        storage = "200Gi" # Reserves 200 GiB of scalable storage.
      }
    }
  }
}
</code></pre>
<h5 id="heading-key-points"><strong>Key Points</strong>:</h5>
<ul>
<li><p><strong>Access Mode</strong>: <code>"ReadWriteMany"</code> allows simultaneous access by multiple pods, critical for parallel workloads.</p>
</li>
<li><p><strong>Storage Class</strong>: Must correspond to an EFS provisioner configured in the Kubernetes cluster (see the sketch after this list).</p>
</li>
<li><p><strong>Capacity</strong>: Start with 200 GiB and scale as per your dataset requirements.</p>
</li>
</ul>
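<p>The <code>efs</code> storage class referenced by the PVC is assumed to exist already. A minimal sketch of such a class, assuming the AWS EFS CSI driver is installed and an <code>aws_efs_file_system.this</code> resource is defined elsewhere:</p>
<pre><code class="lang-plaintext">resource "kubernetes_storage_class_v1" "efs" {
  metadata {
    name = "efs" # must match storage_class_name in the PVC above
  }
  storage_provisioner = "efs.csi.aws.com"
  parameters = {
    provisioningMode = "efs-ap"                    # dynamic provisioning via EFS access points
    fileSystemId     = aws_efs_file_system.this.id # assumed EFS file system resource
    directoryPerms   = "700"
  }
}
</code></pre>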
<hr />
<h4 id="heading-2-deploying-nvidia-nim-helm-chart"><strong>2. Deploying NVIDIA NIM Helm Chart</strong></h4>
<p>After configuring storage, deploy NVIDIA NIM using Helm. The Helm chart simplifies GPU allocation and links the persistent storage to NIM-managed workloads.</p>
<h3 id="heading-configure-the-ngc-api-key">Configure the NGC API Key</h3>
<p>Before deploying NVIDIA NIM, you need to retrieve your <strong>NGC API Key</strong> from NVIDIA’s cloud platform and set it as an environment variable. This key enables secure authentication with NVIDIA’s container registry and services.</p>
<h4 id="heading-steps-to-retrieve-the-ngc-api-key">Steps to Retrieve the NGC API Key:</h4>
<ol>
<li><p>Log in to your <a target="_blank" href="https://ngc.nvidia.com/">NGC account.</a></p>
</li>
<li><p>Navigate to <strong>Setup</strong> &gt; <strong>API Keys</strong>.</p>
</li>
<li><p>Click <strong>Generate API Key</strong> if you don’t already have one.</p>
</li>
<li><p>Copy the generated key to use in your deployment process.</p>
</li>
</ol>
<h4 id="heading-set-the-ngc-api-key-as-an-environment-variable">Set the NGC API Key as an Environment Variable:</h4>
<p>Run the following command in your terminal to make the key accessible to Terraform during deployment:</p>
<pre><code class="lang-plaintext">export TF_VAR_ngc_api_key=&lt;replace-me&gt;
</code></pre>
<p>Replace <code>&lt;replace-me&gt;</code> with your actual API key. This key will be passed to NVIDIA NIM to enable seamless model deployment.</p>
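<p>On the Terraform side, a matching variable declaration could look like the following minimal sketch; marking it <code>sensitive</code> keeps the key out of plan and apply output:</p>
<pre><code class="lang-plaintext">variable "ngc_api_key" {
  description = "NGC API key used by NVIDIA NIM to authenticate with NVIDIA's registry"
  type        = string
  sensitive   = true # redacts the value in plan/apply output
}
</code></pre>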
<h5 id="heading-code-helm-release-for-nvidia-nim"><strong>Code: Helm Release for NVIDIA NIM</strong></h5>
<pre><code class="lang-plaintext">helm_release "nim_llm" {
  name      = "nim-llm"
  chart     = "./nim-llm"                # Points to the NIM Helm chart location.
  namespace = "nim"
  values = [
    templatefile("nim-llm-values.yaml", {
      model_id    = var.model_id            # Specifies the LLM model (e.g., GPT-like models).
      num_gpu     = var.num_gpu             # Allocates GPU resources for inference tasks.
      ngc_api_key = var.ngc_api_key
      pvc_name    = kubernetes_persistent_volume_claim_v1.efs_pvc.metadata[0].name
    })
  ]
}
</code></pre>
<h5 id="heading-key-points-1"><strong>Key Points</strong>:</h5>
<ul>
<li><p><code>model_id</code>: The identifier of the model being deployed (e.g., GPT-3, BERT).</p>
</li>
<li><p><code>num_gpu</code>: Configures GPU resources for inference tasks. The value should align with the instance type used in your cluster (e.g., g4dn.xlarge for one GPU).</p>
</li>
<li><p><code>pvc_name</code>: Links the EFS-backed PVC to the workload for storing large datasets or models.</p>
</li>
</ul>
<hr />
<h4 id="heading-3-configuration-highlights"><strong>3. Configuration Highlights</strong></h4>
<p><strong>Why Persistent Storage?</strong></p>
<ul>
<li><p>AI models and datasets are often larger than the node's local storage. Using EFS ensures:</p>
<ul>
<li><p>Scalability: Adjust storage as required without downtime.</p>
</li>
<li><p>High Availability: Accessible across multiple Availability Zones.</p>
</li>
</ul>
</li>
</ul>
<p><strong>GPU Allocation</strong></p>
<ul>
<li>NVIDIA NIM optimizes GPU usage for inference. Use the <code>num_gpu</code> variable to specify the number of GPUs for your workload, ensuring efficient resource utilization.</li>
</ul>
<hr />
<h4 id="heading-summary"><strong>Summary</strong></h4>
<ol>
<li><p><strong>Storage Configuration</strong>: Use AWS EFS with Kubernetes PVC for shared, scalable storage across pods.</p>
</li>
<li><p><strong>GPU Allocation</strong>: NVIDIA NIM enables efficient GPU resource management for AI inference tasks.</p>
</li>
<li><p><strong>Helm Chart Deployment</strong>: Leverage Helm for streamlined deployment, linking GPU resources and persistent storage.</p>
</li>
</ol>
<hr />
<h3 id="heading-step-3-adding-openai-web-ui">Step 3: Adding OpenAI Web UI</h3>
<p>The OpenAI Web UI provides an interface for users to interact with deployed AI models.</p>
<pre><code class="lang-plaintext">"helm_release" "openai_webui" {
  name       = "openai-webui"
  chart      = "open-webui"
  repository = "https://helm.openwebui.com/"
  namespace  = "openai-webui"
  values = [
    jsonencode({
      replicaCount = 1,
      image = {
        repository = "ghcr.io/open-webui/open-webui"
        tag        = "main"
      }
    })
  ]
}
</code></pre>
<hr />
<h3 id="heading-step-4-observability-with-prometheus-grafana-and-custom-metrics">Step 4: Observability with Prometheus, Grafana, and Custom Metrics</h3>
<p>Prometheus and Grafana are essential tools for monitoring AI workloads. Prometheus collects resource metrics, including GPU-specific data, while Grafana visualizes these metrics through tailored dashboards. These tools help ensure that AI operations are running smoothly and efficiently.</p>
<p>To extend observability, the Prometheus Adapter is configured with custom rules for tracking AI-specific metrics. Key configurations include:</p>
<ul>
<li><p><strong>Tracking Active Requests</strong>: Using the <code>num_requests_running</code> metric, Prometheus monitors the number of ongoing requests, providing insights into workload intensity.</p>
</li>
<li><p><strong>Inference Queue Monitoring</strong>: The <code>nv_inference_queue_duration_us</code> metric tracks NVIDIA inference queue times, converted into milliseconds for enhanced readability.</p>
</li>
</ul>
<h3 id="heading-sample-configuration-for-prometheus-adapter">Sample Configuration for Prometheus Adapter:</h3>
<pre><code class="lang-plaintext">prometheus:
  url: http://kube-prometheus-stack-prometheus.${prometheus_namespace}
  port: 9090
rules:
  default: false
  custom:
  - seriesQuery: '{__name__=~"num_requests_running"}'
    resources:
      template: &lt;&lt;.Resource&gt;&gt;
    name:
      matches: "num_requests_running"
      as: ""
    metricsQuery: sum(&lt;&lt;.Series&gt;&gt;{&lt;&lt;.LabelMatchers&gt;&gt;}) by (&lt;&lt;.GroupBy&gt;&gt;)
  - seriesQuery: 'nv_inference_queue_duration_us{namespace!="", pod!=""}'
    resources:
      overrides:
        namespace:
          resource: "namespace"
        pod:
          resource: "pod"
    name:
      matches: "nv_inference_queue_duration_us"
      as: "nv_inference_queue_duration_ms"
    metricsQuery: 'avg(rate(nv_inference_queue_duration_us{&lt;&lt;.LabelMatchers&gt;&gt;}[1m])/1000) by (&lt;&lt;.GroupBy&gt;&gt;)'
</code></pre>
<p>These configurations enable Prometheus to expose meaningful custom metrics that are critical for scaling and optimizing AI workloads. By integrating these metrics into Grafana dashboards, users gain actionable insights into system performance and bottlenecks.</p>
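<p>Once the adapter exposes these metrics through the custom metrics API, they can drive a HorizontalPodAutoscaler. Below is a hedged sketch using the Terraform Kubernetes provider; the target deployment name and the per-pod threshold are assumptions, not values from the original configuration:</p>
<pre><code class="lang-plaintext">resource "kubernetes_horizontal_pod_autoscaler_v2" "nim_llm" {
  metadata {
    name      = "nim-llm"
    namespace = "nim"
  }
  spec {
    min_replicas = 1
    max_replicas = 4
    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "nim-llm" # assumed name of the NIM deployment
    }
    metric {
      type = "Pods"
      pods {
        metric {
          name = "num_requests_running" # exposed by the adapter rule above
        }
        target {
          type          = "AverageValue"
          average_value = "10" # hypothetical per-pod request threshold; tune for your model
        }
      }
    }
  }
}
</code></pre>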
<hr />
<h2 id="heading-step-5-scaling-and-optimization-with-karpenter">Step 5: Scaling and Optimization with Karpenter</h2>
<p>In large-scale AI deployments, workload demands fluctuate significantly. Dynamic scaling is essential for managing these workloads effectively while minimizing costs. <strong>Karpenter</strong>, a Kubernetes-native node autoscaler, provides powerful mechanisms for optimizing resource utilization. It dynamically provisions nodes tailored to the specific demands of applications, including GPU-heavy AI workloads.</p>
<p>This section integrates Karpenter into the EKS Blueprint framework, highlighting its configuration for both CPU and GPU workloads. The full implementation and configurations are available in the accompanying GitHub repository: <a target="_blank" href="https://github.com/timurgaleev/eks-nim-llm-openai">https://github.com/timurgaleev/eks-nim-llm-openai</a>.</p>
<hr />
<h4 id="heading-deploying-karpenter-with-eks-blueprints">Deploying Karpenter with EKS Blueprints</h4>
<p>Karpenter is added to the EKS cluster as a Blueprint add-on. Below is an example of the configuration block for enabling Karpenter, focusing on both CPU and GPU workload optimization:</p>
<pre><code class="lang-plaintext">module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~&gt; 1.2"

  enable_karpenter                  = true
  karpenter_enable_spot_termination = true
  karpenter_node = {
    iam_role_additional_policies = {
      AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    }
  }
  karpenter = {
    chart_version = "0.37.0"
  }
}
</code></pre>
<p>This configuration enables Karpenter with support for Spot instance termination handling and assigns additional IAM policies for managing nodes.</p>
<hr />
<h4 id="heading-configuring-karpenter-for-cpu-and-gpu-workloads">Configuring Karpenter for CPU and GPU Workloads</h4>
<p>For effective scaling, Karpenter relies on <strong>NodePool</strong> and <strong>EC2NodeClass</strong> configurations tailored to workload requirements. The following examples showcase how Karpenter dynamically provisions CPU and GPU nodes.</p>
<h5 id="heading-cpu-workloads">CPU Workloads</h5>
<pre><code class="lang-plaintext">name: cpu-karpenter
clusterName: ${module.eks.cluster_name}
ec2NodeClass:
  karpenterRole: ${split("/", module.eks_blueprints_addons.karpenter.node_iam_role_arn)[1]}
  subnetSelectorTerms:
    id: ${module.vpc.private_subnets[2]}
  securityGroupSelectorTerms:
    tags:
      Name: ${module.eks.cluster_name}-node
  instanceStorePolicy: RAID0

nodePool:
  labels:
    - type: karpenter
    - NodeGroupType: cpu-karpenter
  requirements:
    - key: "karpenter.k8s.aws/instance-family"
      operator: In
      values: ["m5"]
    - key: "karpenter.k8s.aws/instance-size"
      operator: In
      values: ["xlarge", "2xlarge", "4xlarge"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 180s
    expireAfter: 720h
  weight: 100
</code></pre>
<h5 id="heading-gpu-workloads">GPU Workloads</h5>
<pre><code class="lang-plaintext">name: gpu-workloads
clusterName: ${module.eks.cluster_name}
ec2NodeClass:
  karpenterRole: ${split("/", module.eks_blueprints_addons.karpenter.node_iam_role_arn)[1]}
  subnetSelectorTerms:
    id: ${module.vpc.private_subnets[1]}
  securityGroupSelectorTerms:
    tags:
      Name: ${module.eks.cluster_name}-node
  instanceStorePolicy: RAID0

nodePool:
  labels:
    - type: karpenter
    - NodeGroupType: gpu-workloads
  requirements:
    - key: "karpenter.k8s.aws/instance-family"
      operator: In
      values: ["g5", "p4", "p5"]  # GPU instances
    - key: "karpenter.k8s.aws/instance-size"
      operator: In
      values: ["2xlarge", "4xlarge", "8xlarge", "12xlarge"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 180s
    expireAfter: 720h
  weight: 100
</code></pre>
<hr />
<h3 id="heading-terraform-automation-scripts">Terraform Automation Scripts</h3>
<p>To streamline the deployment and teardown of resources, the project includes two utility scripts: <code>install.sh</code> and <code>cleanup.sh</code>.</p>
<ul>
<li><p><a target="_blank" href="http://install.sh"><code>install.sh</code></a>: Automates the deployment process. It initializes Terraform, applies modules sequentially (e.g., VPC and EKS), and ensures all resources are provisioned successfully. A final Terraform apply captures any remaining dependencies.</p>
</li>
<li><p><a target="_blank" href="http://cleanup.sh"><code>cleanup.sh</code></a>: Safely destroys the deployed infrastructure. It handles dependencies like Kubernetes services, Load Balancers, and Security Groups, ensuring proper teardown order. Each module is destroyed sequentially, with a final pass to catch residual resources.</p>
</li>
</ul>
<p>These scripts enhance operational efficiency and minimize errors during deployment and cleanup phases, making the workflow more robust and reproducible.</p>
<h3 id="heading-key-features-of-karpenter-in-ai-ecosystems">Key Features of Karpenter in AI Ecosystems</h3>
<ol>
<li><p><strong>Dynamic Node Provisioning</strong>: Automatically provisions CPU or GPU nodes based on real-time workload needs.</p>
</li>
<li><p><strong>Cost Optimization</strong>: Leverages Spot instances while ensuring reliable on-demand scaling for critical workloads.</p>
</li>
<li><p><strong>Enhanced Resource Utilization</strong>: Consolidates underutilized nodes and removes idle resources with disruption policies.</p>
</li>
<li><p><strong>Tailored Scaling Policies</strong>: Supports node pools for diverse workload types, such as inference tasks or data preprocessing.</p>
</li>
</ol>
<p>Karpenter’s integration with GPU-optimized workloads ensures that demanding AI models benefit from high-performance compute nodes while maintaining cost efficiency.</p>
<hr />
<h2 id="heading-use-cases">Use Cases</h2>
<h3 id="heading-1-ai-model-training">1. AI Model Training</h3>
<p>NVIDIA NIM itself is an inference stack, but the same GPU-optimized cluster also supports training and fine-tuning transformer models such as BERT- or GPT-class architectures, reducing runtime and costs.</p>
<h3 id="heading-2-real-time-inference">2. Real-Time Inference</h3>
<p>Deploy models for real-time applications such as fraud detection, image recognition, or natural language understanding.</p>
<h3 id="heading-3-experimentation-and-research">3. Experimentation and Research</h3>
<p>With Open WebUI, data scientists can quickly test and iterate on models.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>This platform enables the scalable and efficient deployment of AI workloads by integrating NVIDIA NIM with Amazon EKS. Terraform automates the process, ensuring repeatable and reliable setups. With GPU optimization, persistent storage, and observability tools, the platform is well-suited for businesses and researchers alike.</p>
<p>By following this guide, you can build a scalable and efficient AI platform. For detailed code and further exploration, visit the GitHub repository <a target="_blank" href="https://github.com/timurgaleev/eks-nim-llm-openai">https://github.com/timurgaleev/eks-nim-llm-openai</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Deploying AWS EKS with Terraform and Blueprints Addons]]></title><description><![CDATA[After a pause from covering AWS and infrastructure management, I’m back with insights for those looking to navigate the world of AWS containers and Kubernetes with ease. For anyone new to deploying Kubernetes in AWS, leveraging Terraform for setting ...]]></description><link>https://tgaleev.com/deploying-aws-eks-with-terraform-and-blueprints-addons</link><guid isPermaLink="true">https://tgaleev.com/deploying-aws-eks-with-terraform-and-blueprints-addons</guid><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[vpc]]></category><category><![CDATA[containers]]></category><category><![CDATA[containerization]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Thu, 07 Nov 2024 08:45:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1730968032791/35b49d03-2c53-4c75-8245-3e169b6f5644.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After a pause from covering AWS and infrastructure management, I’m back with insights for those looking to navigate the world of AWS containers and Kubernetes with ease. For anyone new to deploying Kubernetes in AWS, leveraging Terraform for setting up an EKS (Elastic Kubernetes Service) cluster can be a game-changer. By combining Terraform’s infrastructure-as-code capabilities with AWS’s EKS Blueprints Addons, users can create a scalable, production-ready Kubernetes environment without the usual complexity.</p>
<p>In this article, I'll guide you through using Terraform to deploy EKS with essential add-ons, which streamline the configuration and management of your Kubernetes clusters. With these modular add-ons, you can quickly incorporate features like CoreDNS, the AWS Load Balancer Controller, and other powerful tools to customize and enhance your setup. Whether you’re new to container orchestration or just seeking an efficient AWS solution, this guide will help you build a resilient EKS environment in a few straightforward steps.</p>
<h3 id="heading-so-lets-start">So let’s start.</h3>
<h2 id="heading-setting-up-the-vpc-for-eks">Setting Up the VPC for EKS</h2>
<p>The VPC configuration is foundational for your EKS cluster, establishing a secure, isolated environment with both public and private subnets. Private subnets are typically used to host your Kubernetes nodes, keeping them inaccessible from the internet. Here’s the configuration provided in the <code>vpc.tf</code> file, which sets up both public and private subnets along with NAT and Internet Gateway options for flexible networking.</p>
<pre><code class="lang-plaintext">module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~&gt; 5.0"

  name                 = local.name
  cidr                 = var.vpc_cidr
  azs                  = local.azs
  secondary_cidr_blocks = var.secondary_cidr_blocks
  private_subnets      = concat(local.private_subnets, local.secondary_ip_range_private_subnets)
  public_subnets       = local.public_subnets
  enable_nat_gateway   = true
  single_nat_gateway   = true
  public_subnet_tags   = {"kubernetes.io/role/elb" = 1}
  private_subnet_tags  = {
    "kubernetes.io/role/internal-elb" = 1
    "karpenter.sh/discovery" = local.name
  }
  tags = local.tags
}
</code></pre>
<p>This setup:</p>
<ul>
<li><p>Creates private and public subnets across multiple availability zones.</p>
</li>
<li><p>Configures a secondary CIDR block for the EKS data plane, which is crucial for large-scale deployments (see the locals sketch after this list).</p>
</li>
<li><p>Enables a NAT gateway for private subnets, ensuring secure internet access for internal resources.</p>
</li>
<li><p>Tags subnets for Kubernetes service and discovery, essential for integration with other AWS services like load balancers and Karpenter.</p>
</li>
</ul>
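<p>The locals referenced above (<code>local.private_subnets</code>, <code>local.secondary_ip_range_private_subnets</code>, and so on) are defined elsewhere in the repository. One plausible way they could be derived, assuming a /16 primary CIDR and a <code>100.64.0.0/16</code> secondary block:</p>
<pre><code class="lang-plaintext">locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 2)

  # /24 public and private subnets carved from the primary CIDR (assumed /16)
  public_subnets  = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 8, k)]
  private_subnets = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 8, k + 10)]

  # Large /18 subnets from the secondary range for the EKS data plane
  secondary_ip_range_private_subnets = [
    for k, v in local.azs : cidrsubnet(element(var.secondary_cidr_blocks, 0), 2, k)
  ]
}
</code></pre>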
<h2 id="heading-deploying-eks-with-managed-node-groups">Deploying EKS with Managed Node Groups</h2>
<p>Now that the VPC is configured, let’s move on to deploying the EKS cluster with the <code>eks.tf</code> file configuration. This setup includes defining managed node groups within the EKS cluster, specifying node configurations, security rules, and IAM roles.</p>
<pre><code class="lang-plaintext">module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~&gt; 19.15"

  cluster_name                   = local.name
  cluster_version                = var.eks_cluster_version
  cluster_endpoint_public_access = true
  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = compact([for subnet_id, cidr_block in zipmap(module.vpc.private_subnets, module.vpc.private_subnets_cidr_blocks) : substr(cidr_block, 0, 4) == "100." ? subnet_id : null])

  aws_auth_roles = [
    {
      rolearn  = module.eks_blueprints_addons.karpenter.node_iam_role_arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    }
  ]

  eks_managed_node_groups = {
    core_node_group = {
      name             = "core-node-group"
      ami_type         = "AL2_x86_64"
      min_size         = 2
      max_size         = 8
      desired_size     = 2
      instance_types   = ["m5.xlarge"]
      capacity_type    = "SPOT"
      labels           = { WorkerType = "SPOT", NodeGroupType = "core" }
      tags             = merge(local.tags, { Name = "core-node-grp" })
    }
  }
}
</code></pre>
<p>Key components:</p>
<ul>
<li><p><strong>VPC and Subnets</strong>: The <code>vpc_id</code> and <code>subnet_ids</code> reference the private subnets, providing a secure foundation for EKS nodes.</p>
</li>
<li><p><strong>Managed Node Groups</strong>: This setup defines a core node group with spot instances (<code>capacity_type = "SPOT"</code>) to optimize cost, with configurable instance types, sizes, and labels for workload placement.</p>
</li>
<li><p><strong>Security Rules and IAM Roles</strong>: Configures additional security rules to manage access between nodes and clusters, along with IAM roles to control permissions for Karpenter and node management.</p>
</li>
</ul>
<p>This configuration gives you a scalable and cost-effective EKS environment that is ready for production workloads, with the flexibility to adjust nodes and subnets as needed.</p>
<h2 id="heading-configuring-eks-add-ons">Configuring EKS Add-ons</h2>
<p>Add-ons enhance your EKS cluster by integrating additional AWS services and open-source tools. With the EKS Blueprints, you can easily set up these add-ons, which range from storage solutions to observability and monitoring tools.</p>
<h4 id="heading-setting-up-the-ebs-csi-driver-for-persistent-storage">Setting Up the EBS CSI Driver for Persistent Storage</h4>
<p>The Amazon EBS CSI Driver is essential for persistent storage on EKS. This module configures the necessary IAM roles for the driver, enabling it to provision and manage EBS volumes.</p>
<pre><code class="lang-plaintext">module "ebs_csi_driver_irsa" {
  source                = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version               = "~&gt; 5.20"
  role_name_prefix      = format("%s-%s-", local.name, "ebs-csi-driver")
  attach_ebs_csi_policy = true
  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
  tags = local.tags
}
</code></pre>
<p>This configuration creates an IAM role for the EBS CSI Driver using IAM Roles for Service Accounts (IRSA), which allows the driver to interact with EBS securely.</p>
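<p>The role created above is then handed to the EBS CSI managed add-on. A minimal sketch, assuming the add-on is declared through the blueprint's <code>eks_addons</code> map:</p>
<pre><code class="lang-plaintext">eks_addons = {
  aws-ebs-csi-driver = {
    # The driver's controller pods assume this IRSA role to call the EBS API
    service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
  }
}
</code></pre>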
<h4 id="heading-enabling-amazon-cloudwatch-observability">Enabling Amazon CloudWatch Observability</h4>
<p>The <code>amazon-cloudwatch-observability</code> add-on integrates CloudWatch for monitoring and logging, providing insights into your cluster’s performance.</p>
<pre><code class="lang-plaintext">eks_addons = {
  amazon-cloudwatch-observability = {
    preserve                 = true
    service_account_role_arn = aws_iam_role.cloudwatch_observability_role.arn
  }
}
</code></pre>
<p>This snippet specifies the IAM role required for CloudWatch, enabling detailed observability for your workloads.</p>
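<p>The snippet references <code>aws_iam_role.cloudwatch_observability_role</code>, which is defined elsewhere in the repository. As an alternative hedged sketch, an equivalent role could be built with the same IRSA module used for the EBS CSI driver, bound to the CloudWatch agent's service account:</p>
<pre><code class="lang-plaintext">module "cloudwatch_observability_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~&gt; 5.20"

  role_name_prefix = format("%s-%s-", local.name, "cw-agent")
  role_policy_arns = {
    cloudwatch = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  }
  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["amazon-cloudwatch:cloudwatch-agent"]
    }
  }
}
</code></pre>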
<h4 id="heading-integrating-aws-load-balancer-controller">Integrating AWS Load Balancer Controller</h4>
<p>The AWS Load Balancer Controller allows you to provision and manage Application Load Balancers (ALBs) for Kubernetes services. Here’s how it’s configured:</p>
<pre><code class="lang-plaintext">enable_aws_load_balancer_controller = true
aws_load_balancer_controller = {
  set = [{
    name  = "enableServiceMutatorWebhook"
    value = "false"
  }]
}
</code></pre>
<p>The <code>enableServiceMutatorWebhook</code> setting is disabled to avoid automatic modification of service annotations, making it ideal for custom configurations.</p>
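<p>With the controller installed, an ALB is provisioned by creating an Ingress with the <code>alb</code> class. The following is a hedged sketch via the Terraform Kubernetes provider; the backing service name is hypothetical:</p>
<pre><code class="lang-plaintext">resource "kubernetes_ingress_v1" "app" {
  metadata {
    name = "app"
    annotations = {
      "alb.ingress.kubernetes.io/scheme"      = "internet-facing"
      "alb.ingress.kubernetes.io/target-type" = "ip" # route directly to pod IPs
    }
  }
  spec {
    ingress_class_name = "alb" # handled by the AWS Load Balancer Controller
    rule {
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "my-service" # hypothetical Kubernetes service
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
</code></pre>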
<h4 id="heading-adding-karpenter-for-autoscaling">Adding Karpenter for Autoscaling</h4>
<p>Karpenter is an open-source autoscaler designed for Kubernetes, enabling efficient and dynamic scaling of EC2 instances based on workload requirements. This configuration sets up Karpenter with support for spot instances, reducing costs for non-critical workloads.</p>
<pre><code class="lang-plaintext">enable_karpenter                  = true
karpenter_enable_spot_termination = true
karpenter_node = {
  iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }
}
karpenter = {
  chart_version       = "0.37.0"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
}
</code></pre>
<p>This configuration includes additional IAM policies for Karpenter nodes, making it easier to integrate with AWS services like EC2 for flexible scaling.</p>
<p>These add-ons, configured through the AWS EKS Blueprints and Terraform, help streamline Kubernetes management on AWS while offering enhanced storage, observability, and autoscaling. ​</p>
<p>To explore the complete configuration, you can find the full code in the GitHub repository <a target="_blank" href="https://github.com/timurgaleev/aws-eks-terraform-addons">https://github.com/timurgaleev/aws-eks-terraform-addons</a>. The repository includes <code>install.sh</code> to deploy the EKS cluster and configure the add-ons seamlessly, along with <code>cleanup.sh</code> to tear down the environment when it’s no longer needed.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>This Terraform setup provides a powerful framework for deploying EKS with essential add-ons, such as storage, observability, and autoscaling, to support scalable applications. Specifically, <a target="_blank" href="https://github.com/timurgaleev/aws-eks-terraform-addons">this configuration</a> is designed to enable deployment of applications like OpenAI Chat, showcasing Kubernetes' flexibility for real-time, interactive workloads. With this setup, you’re ready to deploy and manage robust, production-grade EKS clusters in AWS.</p>
]]></content:encoded></item><item><title><![CDATA[Getting Started with Amazon EKS Anywhere: A Practical Guide for On-Premise Kubernetes Deployment]]></title><description><![CDATA[Introduction
As businesses increasingly move towards hybrid and multi-cloud environments, managing infrastructure across multiple platforms has become more complex. However, Amazon Web Services (AWS) has introduced a game-changer for organizations th...]]></description><link>https://tgaleev.com/getting-started-with-amazon-eks-anywhere-a-practical-guide-for-on-premise-kubernetes-deployment</link><guid isPermaLink="true">https://tgaleev.com/getting-started-with-amazon-eks-anywhere-a-practical-guide-for-on-premise-kubernetes-deployment</guid><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Wed, 25 Sep 2024 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733147262493/32adad5d-38b7-48a8-84ce-a536e5e0a0f6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction</h3>
<p>As businesses increasingly move towards hybrid and multi-cloud environments, managing infrastructure across multiple platforms has become more complex. However, Amazon Web Services (AWS) has introduced a game-changer for organizations that want the power and flexibility of Kubernetes on their on-premise infrastructure. This is where <strong>Amazon EKS Anywhere</strong> comes into play. In this article, we’ll explore what EKS Anywhere is, its benefits, and how you can set up and manage Kubernetes clusters on your own on-prem servers using VMware vSphere.</p>
<p>Having recently tested EKS Anywhere with my on-prem servers, I can confidently say that it streamlines the process of deploying and managing Kubernetes clusters without the need for complicated third-party tools. Let's walk through the process, from setup to deployment, with some real-world examples.</p>
<hr />
<h3 id="heading-what-is-amazon-eks-anywhere">What is Amazon EKS Anywhere?</h3>
<p>Amazon <strong>Elastic Kubernetes Service (EKS)</strong> is a managed Kubernetes service provided by AWS for running containerized applications. While it simplifies the Kubernetes management process, it traditionally required cloud infrastructure like AWS EC2 instances. However, with <strong>EKS Anywhere</strong>, AWS now offers a deployment option for customers to create and manage Kubernetes clusters <strong>on their on-premise hardware</strong>.</p>
<p>The key benefits of EKS Anywhere are:</p>
<ol>
<li><p><strong>Consistent Management Experience</strong>: It offers the same management tools and experience as Amazon EKS in the AWS Cloud.</p>
</li>
<li><p><strong>Open Source</strong>: Built on <strong>Amazon EKS Distro</strong>, an open-source Kubernetes distribution, it allows users to deploy Kubernetes clusters with minimal effort.</p>
</li>
<li><p><strong>Integration with AWS Tools</strong>: Seamlessly integrates with AWS services like <strong>AWS Systems Manager</strong> (SSM) for monitoring and operations.</p>
</li>
</ol>
<p>In essence, EKS Anywhere allows you to run Kubernetes on your existing infrastructure, ensuring you still benefit from the rich ecosystem and features AWS provides.</p>
<hr />
<h3 id="heading-key-features-of-eks-anywhere">Key Features of EKS Anywhere</h3>
<ul>
<li><p><strong>Hardware Support</strong>: It runs on your own hardware or on VMware vSphere, making it ideal for on-premise deployments.</p>
</li>
<li><p><strong>Control Plane</strong>: Unlike EKS, where the control plane is managed by AWS, with EKS Anywhere, you manage the control plane yourself.</p>
</li>
<li><p><strong>Cluster Lifecycle Management</strong>: EKS Anywhere includes tooling for automating cluster creation, scaling, updates, and even the destruction of Kubernetes clusters.</p>
</li>
<li><p><strong>AWS Integration</strong>: Easily view and manage your on-prem Kubernetes clusters using the EKS console, integrating seamlessly with AWS Cloud services.</p>
</li>
<li><p><strong>Support for Third-party Tools</strong>: EKS Anywhere supports integrations with tools like <strong>Flux</strong> for GitOps, <strong>eksctl</strong> for cluster management, and <strong>Cilium</strong> for networking.</p>
</li>
</ul>
<hr />
<h3 id="heading-setting-up-eks-anywhere-on-vmware-vsphere">Setting Up EKS Anywhere on VMware vSphere</h3>
<p>For this guide, I’ll walk you through setting up an EKS Anywhere cluster on your on-prem VMware vSphere infrastructure. While you can set up a test cluster on your desktop, here we focus on a more realistic production setup.</p>
<p><strong>Prerequisites:</strong></p>
<ul>
<li><p>VMware vSphere version 7.0 or higher.</p>
</li>
<li><p>EKS Anywhere tools installed on your machine.</p>
</li>
<li><p>At least three control plane nodes and three worker nodes for high availability.</p>
</li>
</ul>
<h4 id="heading-step-1-install-eks-anywhere-cli-tools">Step 1: Install EKS Anywhere CLI Tools</h4>
<p>Start by installing the necessary CLI tools. On a Mac, you can do this via <strong>Homebrew</strong>.</p>
<pre><code class="lang-plaintext">$ brew install aws/tap/eks-anywhere
$ eksctl anywhere version
v0.5.0
</code></pre>
<h4 id="heading-step-2-generate-cluster-config-and-create-a-cluster">Step 2: Generate Cluster Config and Create a Cluster</h4>
<p>Let’s create a Kubernetes cluster using <strong>eksctl</strong>. First, you need to generate a cluster configuration file.</p>
<pre><code class="lang-plaintext">$ CLUSTER_NAME=my-eks-cluster
$ eksctl anywhere generate clusterconfig $CLUSTER_NAME --provider vsphere &gt; $CLUSTER_NAME.yaml
</code></pre>
<p>Now that we have the configuration, we can create the cluster on vSphere.</p>
<pre><code class="lang-plaintext">$ eksctl anywhere create cluster -f $CLUSTER_NAME.yaml
</code></pre>
<p>The CLI will handle the setup of the control plane, the worker nodes, and the networking components for your cluster. Once the cluster is created, it will be fully operational, and you can use <strong>kubectl</strong> to interact with it.</p>
<h4 id="heading-step-3-export-kubeconfig-and-deploy-a-test-app">Step 3: Export Kubeconfig and Deploy a Test App</h4>
<p>Once the cluster is created, you'll have a <strong>kubeconfig</strong> file to connect to your Kubernetes cluster:</p>
<pre><code class="lang-plaintext">$ export KUBECONFIG=${PWD}/${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig
$ kubectl get ns
</code></pre>
<p>You can now deploy a simple test application to verify everything is working:</p>
<pre><code class="lang-plaintext">$ kubectl apply -f "https://anywhere.eks.amazonaws.com/manifests/hello-eks-a.yaml"
$ kubectl get pods -l app=hello-eks-a
</code></pre>
<p>This will deploy a basic pod that you can access locally:</p>
<pre><code class="lang-plaintext">$ kubectl port-forward deploy/hello-eks-a 8000:80
$ curl localhost:8000
</code></pre>
<p>You should see a simple “Hello from EKS Anywhere” message, confirming that the cluster is up and running.</p>
<hr />
<h3 id="heading-managing-your-cluster-high-availability-and-updates">Managing Your Cluster: High Availability and Updates</h3>
<p>In a production environment, you’ll want to ensure high availability and smooth updates for your clusters. EKS Anywhere allows you to scale your cluster as needed and manage rolling updates.</p>
<p>For high availability, it's recommended to have at least <strong>three control plane nodes</strong> and <strong>three worker nodes</strong>. You can scale the cluster using:</p>
<pre><code class="lang-plaintext">$ eksctl anywhere scale cluster --control-plane-nodes 3 --worker-nodes 3
</code></pre>
<p>To update the cluster, use the built-in update tools provided by EKS Anywhere, which work much like the updates on AWS-managed EKS clusters. The update process ensures that your cluster remains stable during the upgrade, even with multiple nodes.</p>
<hr />
<h3 id="heading-using-eks-connector-for-centralized-management">Using EKS Connector for Centralized Management</h3>
<p>One of the standout features of EKS Anywhere is <strong>EKS Connector</strong>, which allows you to manage your on-prem clusters directly from the EKS console. This makes it easy to view and monitor all your Kubernetes clusters, whether they’re running on AWS or on-prem.</p>
<p>To connect your EKS Anywhere cluster to the EKS console:</p>
<ol>
<li><p>Register the cluster through the EKS console.</p>
</li>
<li><p>Download and apply the necessary <strong>eks-connector.yaml</strong> configuration to your cluster.</p>
</li>
<li><p>Once applied, your cluster will be available in the AWS Management Console for monitoring and management.</p>
</li>
</ol>
<pre><code class="lang-plaintext">$ kubectl apply -f eks-connector.yaml
</code></pre>
<p>This allows you to manage your on-prem clusters alongside your AWS-based clusters in a single interface.</p>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>Amazon EKS Anywhere has made managing on-prem Kubernetes clusters much simpler by bringing AWS-level tools and support to local infrastructures. Whether you're running on VMware vSphere or other compatible environments, EKS Anywhere allows you to benefit from a consistent, simplified management experience, without the need for complex, third-party tools. It also integrates seamlessly with AWS services, making it easy to monitor and scale your infrastructure.</p>
<p>If you're looking to bring Kubernetes to your on-prem servers, EKS Anywhere is an excellent choice that I would highly recommend based on my recent hands-on testing.</p>
]]></content:encoded></item><item><title><![CDATA[AWS EKS vs. AWS ECS: Choosing the Right Container Service for Your Needs]]></title><description><![CDATA[IntroductionIn the world of cloud-based applications, containers have become a staple for deploying and scaling applications efficiently. AWS offers two primary container services: Elastic Kubernetes Service (EKS) and Elastic Container Service (ECS)....]]></description><link>https://tgaleev.com/aws-eks-vs-aws-ecs-choosing-the-right-container-service-for-your-needs</link><guid isPermaLink="true">https://tgaleev.com/aws-eks-vs-aws-ecs-choosing-the-right-container-service-for-your-needs</guid><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><category><![CDATA[ECS]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Tue, 06 Aug 2024 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1731416679385/a4ebd36a-40aa-4e8e-870e-6a2d21cbb100.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Introduction</strong><br />In the world of cloud-based applications, containers have become a staple for deploying and scaling applications efficiently. AWS offers two primary container services: <strong>Elastic Kubernetes Service (EKS)</strong> and <strong>Elastic Container Service (ECS)</strong>. Both help developers deploy and manage containers, but they have distinct architectures and best-use scenarios. Let’s explore the differences, pros, and cons of each to help you choose the best one for your needs.</p>
<hr />
<h3 id="heading-1-what-is-aws-ecs">1. What is AWS ECS?</h3>
<p>AWS <strong>Elastic Container Service (ECS)</strong> is a fully managed container service built by Amazon to deploy and manage containers. ECS is optimized for simplicity and efficiency, especially for AWS environments, making it a great choice if you want a straightforward, managed solution for running containers without the complexity of Kubernetes.</p>
<p><strong>Example Code for ECS (Using AWS CLI):</strong></p>
<pre><code class="lang-plaintext">bashCopy code# Create an ECS cluster
aws ecs create-cluster --cluster-name my-ecs-cluster

# Register a task definition
aws ecs register-task-definition --family my-task \
  --container-definitions '[{"name":"my-container","image":"nginx","memory":512,"cpu":256}]'

# Run a task in the ECS cluster
aws ecs run-task --cluster my-ecs-cluster --task-definition my-task
</code></pre>
<h3 id="heading-2-what-is-aws-eks">2. What is AWS EKS?</h3>
<p>AWS <strong>Elastic Kubernetes Service (EKS)</strong> is a managed Kubernetes service. It allows you to deploy, scale, and operate Kubernetes on AWS, fully aligned with Kubernetes standards. EKS provides flexibility and compatibility with the Kubernetes ecosystem, ideal for teams with experience in Kubernetes or those wanting more control over container orchestration.</p>
<p><strong>Example Code for EKS (Using kubectl and AWS CLI):</strong></p>
<pre><code class="lang-plaintext">bashCopy code# Create a Kubernetes deployment
kubectl create deployment nginx-deployment --image=nginx

# Expose the deployment as a service
kubectl expose deployment nginx-deployment --port=80 --target-port=80 --type=LoadBalancer
</code></pre>
<h3 id="heading-3-key-differences-between-ecs-and-eks">3. Key Differences Between ECS and EKS</h3>
<p>The primary difference between ECS and EKS lies in the underlying orchestration. ECS is AWS-native, offering tight integration and simplified operations but is exclusive to AWS. EKS, being Kubernetes-based, is portable and lets you run the same configurations across different cloud providers or on-premises if you use Kubernetes elsewhere.</p>
<h3 id="heading-4-advantages-of-aws-ecs">4. Advantages of AWS ECS</h3>
<ul>
<li><p><strong>Simplicity:</strong> ECS is straightforward to set up, especially if you’re already working with other AWS services.</p>
</li>
<li><p><strong>Tight AWS Integration:</strong> ECS has deep integration with AWS IAM, CloudWatch, and other services, making security and monitoring seamless.</p>
</li>
<li><p><strong>Lower Management Overhead:</strong> ECS manages most of the infrastructure, so you don’t have to worry about the underlying components like control planes or etcd clusters.</p>
</li>
</ul>
<h3 id="heading-5-advantages-of-aws-eks">5. Advantages of AWS EKS</h3>
<ul>
<li><p><strong>Kubernetes Compatibility:</strong> EKS is compatible with Kubernetes, making it ideal for teams familiar with Kubernetes and tools like Helm, kubectl, and Prometheus.</p>
</li>
<li><p><strong>Hybrid and Multi-Cloud Flexibility:</strong> Since it’s based on Kubernetes, EKS allows applications to be portable, ideal for multi-cloud or hybrid environments.</p>
</li>
<li><p><strong>Extensibility:</strong> EKS enables integration with a wide array of Kubernetes plugins and tools, giving developers more control and customization options.</p>
</li>
</ul>
<h3 id="heading-6-when-to-choose-ecs-over-eks">6. When to Choose ECS Over EKS</h3>
<p>If your team values simplicity and deep AWS integration, ECS can be an excellent choice. ECS is also ideal when running smaller applications or when your team prefers a managed service that takes care of infrastructure details. ECS may require less management and works well when you need to deploy on AWS alone without multi-cloud portability.</p>
<h3 id="heading-7-when-to-choose-eks-over-ecs">7. When to Choose EKS Over ECS</h3>
<p>EKS is a powerful choice if your team has Kubernetes experience or needs hybrid cloud deployment. EKS enables portability, so if there’s a need to run parts of your app on other clouds or on-premises, EKS is better. Kubernetes allows more control over networking, storage, and plugins—ideal for complex applications.</p>
<h3 id="heading-8-pros-and-cons-summary">8. Pros and Cons Summary</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>ECS</td><td>EKS</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Ease of Use</strong></td><td>Simplified, AWS-native</td><td>More complex, Kubernetes-native</td></tr>
<tr>
<td><strong>Multi-Cloud</strong></td><td>AWS-only</td><td>Multi-cloud flexibility</td></tr>
<tr>
<td><strong>Integrations</strong></td><td>Deeply integrated with AWS</td><td>Compatible with the Kubernetes ecosystem</td></tr>
<tr>
<td><strong>Management</strong></td><td>AWS handles most infrastructure details</td><td>More user control, but requires management</td></tr>
<tr>
<td><strong>Scalability</strong></td><td>Scalable within AWS environment</td><td>Scalable across clouds and on-premises</td></tr>
</tbody>
</table>
</div><h3 id="heading-9-which-to-use-practical-scenarios">9. Which to Use: Practical Scenarios</h3>
<p>For example, if you’re a small team running microservices exclusively on AWS, ECS will likely meet your needs with less management overhead. However, if you’re developing a complex, multi-tiered application that may need to scale across multiple clouds, EKS could be more suitable.</p>
<h3 id="heading-10-conclusion">10. Conclusion</h3>
<p>While both AWS ECS and EKS are strong options, the choice depends on your team’s needs, skill level, and deployment goals. ECS is straightforward and integrates deeply into AWS, making it perfect for teams focused on AWS-native applications. EKS, on the other hand, is ideal for those who want flexibility, Kubernetes compatibility, and multi-cloud options. For most straightforward applications, ECS is often the preferred choice, but EKS brings value for larger and more complex architectures. Choose wisely based on your priorities, but remember that both services are backed by AWS, ensuring scalability and reliability.</p>
]]></content:encoded></item><item><title><![CDATA[Leveraging GitLab and GitLab Self-Managed as Source Providers in AWS CodeBuild]]></title><description><![CDATA[In an exciting update, Amazon Web Services (AWS) has announced that GitLab and self-managed GitLab instances are now supported as source providers for AWS CodeBuild projects. This enhancement simplifies the continuous integration and continuous deliv...]]></description><link>https://tgaleev.com/leveraging-gitlab-and-gitlab-self-managed-as-source-providers-in-aws-codebuild</link><guid isPermaLink="true">https://tgaleev.com/leveraging-gitlab-and-gitlab-self-managed-as-source-providers-in-aws-codebuild</guid><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Tue, 16 Apr 2024 14:47:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713278850270/a4c16f69-5831-4156-a52d-0f565f6012e6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2024/03/aws-codebuild-gitlab-gitlab-self-managed/">In an exciting update, Amazon Web Services (AWS)</a> has announced that GitLab and self-managed GitLab instances are now supported as source providers for AWS CodeBuild projects. This enhancement simplifies the continuous integration and continuous delivery (CI/CD) process, allowing users to initiate builds directly from changes in their source code hosted in GitLab repositories.</p>
<p>AWS CodeBuild is a fully managed, scalable, and flexible build service that compiles your source code, runs tests, and produces software artifacts. With the addition of GitLab and GitLab Self-Managed as source providers, developers can now seamlessly connect their projects to AWS CodeBuild and automate build processes.</p>
<p>To set up a connection between your GitLab repository and AWS CodeBuild, follow these steps:</p>
<ol>
<li><p>Navigate to the AWS CodeBuild console in your AWS Management Console.</p>
</li>
<li><p>Create a new build project or select an existing one.</p>
</li>
<li><p>In the "Source" section, choose "GitLab" as your source provider</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713265679656/723bf122-b0fd-469a-ba3e-8e3093ae48e3.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Provide the necessary details about your GitLab repository, such as the project URL and branch name.</p>
</li>
<li><p>Create or select an existing IAM role with appropriate permissions for AWS CodeBuild to interact with your GitLab repository, AWS resources, and other required services.</p>
</li>
<li><p>Set up any necessary buildspec files or configurations for your project.</p>
<pre><code class="lang-plaintext"> version: 0.2

 phases:
   install:
     runtime-versions:
       nodejs: 14
   build:
     commands:
       - npm install
       - npm run build
</code></pre>
</li>
<li><p>Complete the setup process and start or schedule a new build.</p>
</li>
</ol>
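<p>The same setup can also be expressed in Terraform. Below is a minimal sketch, assuming an AWS provider version that supports the <code>GITLAB</code> source type, an existing service role, and a GitLab connection already authorized for the account; the repository URL is hypothetical:</p>
<pre><code class="lang-plaintext">resource "aws_codebuild_project" "gitlab_build" {
  name         = "gitlab-build"
  service_role = aws_iam_role.codebuild.arn # assumed pre-existing IAM role

  source {
    type      = "GITLAB"
    location  = "https://gitlab.com/my-group/my-project.git" # hypothetical repo
    buildspec = "buildspec.yml"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:7.0"
    type         = "LINUX_CONTAINER"
  }

  artifacts {
    type = "NO_ARTIFACTS"
  }
}
</code></pre>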
<p>The integration between GitLab and AWS CodeBuild enables developers to take advantage of the following benefits:</p>
<ul>
<li><p>Streamlined CI/CD processes: With direct access to your GitLab repositories, AWS CodeBuild can automatically initiate builds when changes are detected in the source code. This automation reduces manual intervention and accelerates development cycles.</p>
</li>
<li><p>Enhanced security: By establishing a secure connection using an access token, AWS CodeBuild can interact with your GitLab repository and other related resources while maintaining the necessary security measures.</p>
</li>
<li><p>Scalability: AWS CodeBuild offers a highly scalable build service, allowing you to handle multiple builds concurrently and efficiently. This capability is particularly valuable for large projects or teams that require parallel processing.</p>
</li>
<li><p>Flexibility: The integration supports both GitLab and self-managed GitLab instances, providing developers with the flexibility to choose their preferred source code management solution.</p>
</li>
</ul>
<p>In conclusion, the integration of GitLab and GitLab Self-Managed as source providers in AWS CodeBuild is a significant step forward for streamlined CI/CD processes. By enabling builds to be initiated directly from changes in GitLab repositories, developers can now enjoy an even more efficient and secure development experience when working with AWS services.</p>
]]></content:encoded></item><item><title><![CDATA[Beginning the Journey into ML, AI and GenAI on AWS]]></title><description><![CDATA[Machine Learning (ML), Artificial Intelligence (AI), and Generative Artificial Intelligence (GenAI) are transformative technologies that have the potential to revolutionize industries across the globe.
At the last AWS re:Invent, there were numerous u...]]></description><link>https://tgaleev.com/beginning-the-journey-into-ml-ai-and-genai-on-aws</link><guid isPermaLink="true">https://tgaleev.com/beginning-the-journey-into-ml-ai-and-genai-on-aws</guid><category><![CDATA[FMs]]></category><category><![CDATA[broadai]]></category><category><![CDATA[amazon recognition]]></category><category><![CDATA[AWS]]></category><category><![CDATA[ML]]></category><category><![CDATA[genai]]></category><category><![CDATA[AI]]></category><category><![CDATA[sagemaker ]]></category><category><![CDATA[llm]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[openai]]></category><category><![CDATA[Amazon Q]]></category><category><![CDATA[Amazon Bedrock]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Mon, 22 Jan 2024 21:50:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705674652170/b48dde44-a668-4e36-b510-fca57f7d8540.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Machine Learning (ML)</strong>, <strong>Artificial Intelligence (AI)</strong>, and <strong>Generative Artificial Intelligence (GenAI)</strong> are transformative technologies that have the potential to revolutionize industries across the globe.</p>
<p>At the last <a target="_blank" href="https://reinvent.awsevents.com/"><strong>AWS re:Invent</strong></a>, there were numerous updates related to ML/AI and everything associated with these technologies. I also decided to delve into these topics and immerse myself in this field.</p>
<p>I won't delve into explaining the meanings of ML, AI, DL (Deep Learning), and GenAI. However, I'd like to touch upon <strong>FMs</strong> and <strong>LLMs</strong>, as that is where we will focus our attention. I kept asking myself the same question whenever this topic came up in my reading or listening. :)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705934423311/1a74d120-7342-401d-b468-b52fff658c75.jpeg" alt class="image--center mx-auto" /></p>
<p><strong>Foundational Models (FMs)</strong> within the AWS ecosystem represent fundamental structures and algorithms essential for diverse AI applications. These models, often created by industry-leading AI companies, are integral to the development and functionality of AWS services, shaping the landscape of artificial intelligence on the platform. In the context of Amazon Bedrock, <strong>Language Models (LMs)</strong> play a pivotal role. These LMs contribute to the service's linguistic capabilities, facilitating advanced language understanding and content generation within the AWS environment.</p>
<p>AWS provides various services for Machine Learning and Artificial Intelligence, including  <a target="_blank" href="https://aws.amazon.com/sagemaker/">Amazon SageMaker</a>, <a target="_blank" href="https://aws.amazon.com/deeplens/">AWS DeepLens</a>, <a target="_blank" href="https://aws.amazon.com/deepcomposer/">AWS DeepComposer</a>, Amazon Forecast and more. Familiarize yourself with the services available to determine which ones suit your specific needs.</p>
<p><strong>Generative Artificial Intelligence</strong> <strong>(GenAI)</strong> is a type of artificial intelligence that can generate text, images, or other media using generative models. AWS offers a range of services for building and scaling generative AI applications, including <a target="_blank" href="https://aws.amazon.com/sagemaker/">Amazon SageMaker</a>, <a target="_blank" href="https://aws.amazon.com/de/rekognition/">Amazon Rekognition</a>, <a target="_blank" href="https://aws.amazon.com/deepracer/">AWS DeepRacer</a>, and <a target="_blank" href="https://aws.amazon.com/forecast/">Amazon Forecast</a>. AWS has also invested in developing foundation models (FMs) for generative AI, which are ultra-large machine learning models that generative AI relies on. AWS has also launched the Generative AI Innovation Center, which connects AWS AI and ML experts with customers around the world to help them envision, design, and launch new generative AI products and services. Generative AI has the potential to revolutionize the way we create and consume media, but it is important to use it responsibly and ethically.</p>
<p>Some examples of GenAI: one of the most well-known is <a target="_blank" href="https://chat.openai.com/">ChatGPT</a>, launched by <a target="_blank" href="https://openai.com/">OpenAI</a>, which became wildly popular overnight and galvanized public attention. Another model from OpenAI, text-embedding-ada-002, is designed to produce embeddings: vector representations that are typically stored in vector databases and used to feed data into large language models (LLMs). However, it’s important to note that generative AI creates artifacts that can be inaccurate or biased, making human validation essential and potentially limiting the time it saves workers. Therefore, end users should be realistic about the value they are looking to achieve, especially when using a service as is.</p>
<p>I've also delved a bit deeper into Broad AI when learning GenAI and I'd like to show this in the form of the following picture as it explains a lot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705960020199/bcb2ca7d-96a9-441b-bd62-ac1dc6737557.jpeg" alt="Layers from broad artificial Intelligence to generative AI" class="image--center mx-auto" /></p>
<p><strong>Broad AI</strong> includes task-specific algorithms, Machine Learning (ML), and Deep Learning. These layers enable AI to perform tasks like image recognition, natural language processing, and complex pattern modeling.</p>
<p>The <strong>transition to GenAI</strong> involves Transfer Learning, Reinforcement Learning, and Autonomous Learning. These layers allow AI to apply knowledge across contexts, learn from interactions, and independently gather and learn from information.</p>
<p>So, the journey from Broad AI to GenAI represents significant leaps in AI capabilities, moving towards AI systems that can truly understand, learn, and adapt like a human brain.</p>
<p>Let's explore a couple of AWS services that, from my perspective, are among the more popular today.</p>
<p><a target="_blank" href="https://aws.amazon.com/sagemaker/">Amazon SageMaker</a><strong>:</strong></p>
<p>Amazon SageMaker is a comprehensive platform that simplifies the machine learning workflow. It covers everything from data labeling and preparation to model training and deployment. Take advantage of SageMaker's Jupyter notebook integration for interactive data exploration and model development. The platform also supports popular ML frameworks like TensorFlow and PyTorch.</p>
<p><a target="_blank" href="https://aws.amazon.com/q/"><strong>Amazon Q</strong></a> is a groundbreaking Generative AI assistant crafted with a focus on security and privacy. Its purpose is to unleash the transformative capabilities of this technology for employees within organizations of varying sizes and across diverse industries.</p>
<p>AWS also introduced robust enhancements to its generative AI service, <strong>Amazon Bedrock</strong>.</p>
<p><a target="_blank" href="https://aws.amazon.com/de/bedrock/"><strong>Amazon Bedrock</strong></a>, an entirely managed service on AWS, provides access to extensive language models and other foundational models (FMs) from prominent artificial intelligence (AI) companies such as AI21, Anthropic, Cohere, Meta, and Stability AI, all consolidated through a unified API.</p>
<p>I would also like to share more information about Amazon Bedrock here about the innovations that were announced at the latest AWS re:Invent.</p>
<p>Fine-tuning for Amazon Bedrock:<br />Now, there are increased opportunities for model customization in Amazon Bedrock, featuring fine-tuning support for Cohere Command Lite, Meta Llama 2, and Amazon Titan Text models, with Anthropic Claude's support expected soon.</p>
<p>These recent enhancements to Amazon Bedrock significantly reshape how organizations, regardless of their size or industry, can leverage generative AI to drive innovation and redefine customer experiences.</p>
<p>AWS is compatible with all the leading deep-learning frameworks, facilitating their deployment. The deep-learning Amazon Machine Image, accessible on both Amazon Linux and Ubuntu, allows for the creation of managed, auto-scalable GPU clusters. This enables training and inference processes to be conducted at any scale. Also, AWS offers a range of AI services that allow you to integrate pre-trained models into your applications without the need for deep expertise in machine learning. Services like <strong>Amazon Rekognition</strong> for image and video analysis, <strong>Amazon Comprehend</strong> for natural language processing, and <strong>Amazon Polly</strong> for text-to-speech can enhance your applications with AI capabilities.</p>
<p>The best way to solidify your understanding of ML, AI, and GenAI on AWS is through hands-on projects. Start with simple projects and gradually increase complexity as you gain confidence. Use datasets available on platforms like Kaggle or create your own to train and test models.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Embarking on a journey into Machine Learning, Artificial Intelligence, and Generative Artificial Intelligence on AWS is an exciting endeavor. By following these steps, you can lay a solid foundation for your understanding and proficiency in leveraging AWS services for ML and AI applications. Remember, the key to success is a combination of hands-on experience, continuous learning, and active engagement with the AWS community. Happy training!</p>
]]></content:encoded></item><item><title><![CDATA[CloudFormation or Terraform or both :)]]></title><description><![CDATA[Both tools allow provisioning AWS infrastructure as code, but have key differences in approach and capabilities.
Infrastructure Modeling
CloudFormation uses YAML/JSON templates that define resources sequentially.
CloudFormation uses JSON/YAML templat...]]></description><link>https://tgaleev.com/cloudformation-or-terraform-or-both</link><guid isPermaLink="true">https://tgaleev.com/cloudformation-or-terraform-or-both</guid><category><![CDATA[Terraform]]></category><category><![CDATA[cloudformation]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Sat, 04 Nov 2023 23:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710415539172/85538b80-d74a-476c-811d-672d1a5a8ef5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Both tools allow provisioning AWS infrastructure as code, but have key differences in approach and capabilities.</p>
<h2 id="heading-infrastructure-modeling"><strong>Infrastructure Modeling</strong></h2>
<p><em>CloudFormation uses YAML/JSON templates with dependency-ordered resource creation.</em></p>
<p>CloudFormation uses JSON/YAML templates to define AWS resources and their properties. Resources are created in dependency order, which CloudFormation infers from references such as <code>!Ref</code> or from explicit <code>DependsOn</code> attributes.</p>
<p><em>Terraform uses declarative configuration files and references between resources.</em></p>
<p>Terraform uses declarative configuration files written in HCL to define resources. Resources can reference attributes of other resources to establish dependencies between them in a flexible way.</p>
<h3 id="heading-example"><strong>Example</strong></h3>
<pre><code class="lang-plaintext"># CloudFormation
Resources:
  VPC:
    Type: AWS::EC2::VPC

  Subnet:
    Type: AWS::EC2::Subnet 
    Properties: 
      VpcId: !Ref VPC
</code></pre>
<pre><code class="lang-plaintext"># Terraform
resource "aws_vpc" "main" {}

resource "aws_subnet" "example" {
  vpc_id = aws_vpc.main.id
}
</code></pre>
<h2 id="heading-state-management"><strong>State Management</strong></h2>
<p>CloudFormation manages state implicitly: each stack records the resources it created on the AWS side, so there is no separate state file for you to manage.</p>
<p>Terraform explicitly tracks the real-time state of all resources in a state file, usually stored locally or in remote storage like S3. This allows checking differences between the configuration and current state to maintain consistency.</p>
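<p>A few everyday commands make the state file tangible; this is a generic sketch, not tied to any particular project:</p>
<pre><code class="lang-bash"># List the resources Terraform is tracking in its state
terraform state list

# Show the recorded attributes of one resource
terraform state show aws_vpc.main

# Diff the configuration against the state and the real infrastructure
terraform plan</code></pre>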
<h2 id="heading-programming-interface"><strong>Programming Interface</strong></h2>
<p>CloudFormation provides a CLI and APIs.</p>
<p>CloudFormation provides a CLI and AWS APIs for managing templates and deployments. Custom logic can be added through custom resources.</p>
<p>Terraform offers a rich plugin ecosystem and an SDK for custom providers.</p>
<p>In addition to the CLI and APIs, Terraform has a rich plugin ecosystem and supports programming infrastructure with its own API and SDK. This allows writing custom providers, provisioners and other automation tools.</p>
<h2 id="heading-use-cases"><strong>Use Cases</strong></h2>
<ul>
<li><p>Simple, single-account AWS deployments are often well served by CloudFormation</p>
</li>
<li><p>Complex multi-account infrastructure tends to favor Terraform</p>
</li>
<li><p>Automating tasks beyond pure IaC usually calls for Terraform</p>
</li>
</ul>
<p>For example, a multi-tier app could use:</p>
<ul>
<li><p>CloudFormation for per-account VPCs and load balancers</p>
</li>
<li><p>Terraform for cross-account databases/queues</p>
</li>
<li><p>Custom Terraform provider to deploy containers</p>
</li>
</ul>
<h2 id="heading-other-considerations"><strong>Other Considerations</strong></h2>
<ul>
<li><p>Version control – both integrate naturally with Git-based workflows</p>
</li>
<li><p>Stack policies – CloudFormation-specific protection against accidental updates to critical resources</p>
</li>
<li><p>Change sets – CloudFormation's preview mechanism, comparable to <code>terraform plan</code></p>
</li>
<li><p>Target types</p>
</li>
<li><p>Modules – Terraform's reusable building blocks; CloudFormation offers nested stacks and registry modules</p>
</li>
<li><p>Automation – both fit into CI/CD pipelines</p>
</li>
<li><p>IDE integration – both have editor plugins for validation and completion</p>
</li>
</ul>
<p>In summary, while both serve IaC purposes, Terraform provides more flexibility, portability, and automation capability, especially for multi-account, hybrid infrastructure deployments at scale.</p>
]]></content:encoded></item><item><title><![CDATA[Using EKS with Lambda on AWS]]></title><description><![CDATA[AWS EKS (Elastic Kubernetes Service) allows you to easily run Kubernetes clusters in the AWS cloud. Lambda is AWS' serverless compute service that allows you to run code without provisioning or managing servers. This article discusses how you can int...]]></description><link>https://tgaleev.com/using-eks-with-lambda-on-aws</link><guid isPermaLink="true">https://tgaleev.com/using-eks-with-lambda-on-aws</guid><category><![CDATA[AWS]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[lambda]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Sun, 08 Oct 2023 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710413081999/0bbce727-f388-4997-a1d0-645efda78c58.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS EKS (Elastic Kubernetes Service) allows you to easily run Kubernetes clusters in the AWS cloud. Lambda is AWS' serverless compute service that allows you to run code without provisioning or managing servers. This article discusses how you can integrate EKS with Lambda to build serverless applications on Kubernetes.</p>
<h2 id="heading-deploying-lambda-functions-to-eks"><strong>Deploying Lambda functions to EKS</strong></h2>
<p>The main way to integrate EKS and Lambda is by deploying Lambda functions as Kubernetes deployments and services. This allows Kubernetes to manage and orchestrate the execution of Lambda code.</p>
<p>The steps to deploy a Lambda function to EKS are:</p>
<ol>
<li><p>Create a Lambda function using the AWS CLI or SDK. This deploys the code and configuration to Lambda.</p>
</li>
<li><p>Create a Kubernetes Deployment and Service that points to the Lambda function ARN (Amazon Resource Name). The service exposes the function through a ClusterIP.</p>
</li>
<li><p>Kubernetes will trigger invocations of the Lambda function through the service endpoint. It handles load balancing, auto-scaling and orchestration of the function.</p>
</li>
</ol>
<h2 id="heading-benefits-of-using-eks-and-lambda-together"><strong>Benefits of using EKS and Lambda together</strong></h2>
<ul>
<li><p>Leverage Kubernetes APIs and tools to deploy, manage and scale Lambda functions</p>
</li>
<li><p>Build serverless applications as Kubernetes workloads for portability across environments</p>
</li>
<li><p>Take advantage of Kubernetes features like auto-scaling, rolling updates, blue-green deployments etc. for Lambda code</p>
</li>
<li><p>Integrate Lambda functions into existing container-based applications on EKS</p>
</li>
</ul>
<p>This allows building fully serverless applications that leverage the power of Kubernetes for orchestration along with AWS Lambda's ease of use.</p>
<p>Here is a sample Python code for deploying a Lambda function as a Kubernetes workload:</p>
<pre><code class="lang-plaintext"># Create Lambda function
lambda_client.create_function(
   FunctionName='myfunction',
   Runtime='python3.8', 
   Handler='index.handler',
   Code={
      'ZipFile': bytecode
   }
)

# Create Kubernetes Deployment
api.create_namespaced_deployment(
   namespace='default',
   body={
      'apiVersion': 'apps/v1',
      'kind': 'Deployment',
      'metadata': {
         'name': 'lambda-deploy'
      },
      'spec': {
         'replicas': 1,
         'selector': {
            'matchLabels': {
               'app': 'lambda'
            }
         },
         'template': {
            'metadata': {
               'labels': {
                  'app': 'lambda'
               }
            },
            'spec': {
               'containers': [
                  {
                     'name': 'lambda-container',
                     'image': 'public.ecr.aws/lambda/python:3.8',
                     'env': [
                        {
                           'name': 'AWS_LAMBDA_FUNCTION_NAME', 
                           'value': 'myfunction'
                        },
                        {
                           'name': 'AWS_REGION',
                           'value': 'us-east-1' 
                        }
                     ]
                  }
               ]
            }
         }
      }
   }
)

# Expose Deployment as Kubernetes Service
api.create_namespaced_service(
   namespace='default',
   body={
      'apiVersion': 'v1', 
      'kind': 'Service',
      'metadata': {
         'name': 'lambda-service' 
      },
      'spec': {
         'ports': [
            {
               'port': 8080, 
               'targetPort': 8080
            }
         ],
         'selector': {
            'app': 'lambda'
         }
      }
   }
)
</code></pre>
<p>This demonstrates how to deploy a Lambda function as a Kubernetes workload and expose it through a service for invocation. EKS and Lambda provide a powerful way to build serverless applications on Kubernetes.</p>
]]></content:encoded></item><item><title><![CDATA[Security Group AWS NLB (AWS new feature)]]></title><description><![CDATA[You can now create security groups in AWS Network Load Balancer (AWS NLB)
With this update, you can configure rules to ensure that your NLB only accepts traffic from trusted IP addresses, and centrally enforce access control policies
If you are using...]]></description><link>https://tgaleev.com/security-group-aws-nlb-aws-new-feature</link><guid isPermaLink="true">https://tgaleev.com/security-group-aws-nlb-aws-new-feature</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Security Group]]></category><category><![CDATA[nlb]]></category><category><![CDATA[lb]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Wed, 09 Aug 2023 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1693124954704/b7a17822-012b-483b-86a1-7ecfe054c272.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can now create security groups in AWS Network Load Balancer (AWS NLB)</p>
<p>With this update, you can configure rules to ensure that your NLB only accepts traffic from trusted IP addresses, and centrally enforce access control policies.</p>
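<p>At the API level, the association happens when the load balancer is created; note that an NLB created without security groups cannot have them attached later. A hedged CLI sketch, with placeholder subnet and security group IDs:</p>
<pre><code class="lang-bash"># Create an NLB with a security group attached at creation time
aws elbv2 create-load-balancer \
  --name my-nlb \
  --type network \
  --subnets subnet-0123456789abcdef0 \
  --security-groups sg-0123456789abcdef0</code></pre>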
<p>If you are using EKS, just update your AWS Load Balancer Controller to version 2.6.0 and configure it🫡</p>
<p>Please check out more information here:</p>
<p><a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2023/08/network-load-balancer-supports-security-groups/">https://aws.amazon.com/about-aws/whats-new/2023/08/network-load-balancer-supports-security-groups/</a></p>
<p><a target="_blank" href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html">https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html</a></p>
<h2 id="heading-lets-jump-deep">Let's jump deep</h2>
<p><img src="https://d2908q01vomqb2.cloudfront.net/fe2ef495a1152561572949784c16bf23abb28057/2023/08/18/Picture1-5-1024x574.png" alt /></p>
<p>A crucial aspect of configuring an NLB is setting up security groups to control inbound and outbound traffic to and from the load balancer. A security group acts as a virtual firewall, allowing only specific traffic to reach the NLB based on predefined rules. This article will discuss how to configure security groups for an AWS NLB.</p>
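<p>The walkthrough below uses the console; for completeness, here is a rough CLI equivalent with placeholder IDs and an example trusted range:</p>
<pre><code class="lang-bash"># Create the security group in your VPC
aws ec2 create-security-group \
  --group-name nlb-frontend-sg \
  --description "Security group for the NLB" \
  --vpc-id vpc-0123456789abcdef0

# Allow inbound HTTP only from a trusted CIDR range
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 80 \
  --cidr 203.0.113.0/24</code></pre>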
<h3 id="heading-creating-a-security-group-for-nlb">Creating a Security Group for NLB</h3>
<p>To create a security group for an NLB, follow these steps:</p>
<ol>
<li><p>Log in to the AWS Management Console and navigate to the VPC dashboard.</p>
</li>
<li><p>Click on "Security Groups" in the left-hand menu, then click the "Create security group" button.</p>
</li>
<li><p>Enter a name and description for the security group, select the VPC in which to create it, and click "Create."</p>
</li>
</ol>
<h3 id="heading-configuring-inbound-rules">Configuring Inbound Rules</h3>
<p>Once the security group is created, you need to configure inbound rules to allow traffic to reach the NLB. To do this, follow these steps:</p>
<ol>
<li><p>Click on the security group you just created.</p>
</li>
<li><p>Click the "Edit inbound rules" button.</p>
</li>
<li><p>Add a rule for each type of traffic you want to allow, specifying the protocol, port range, and source. For example, if you want to allow HTTP traffic from anywhere, add a rule with the following settings:</p>
<ul>
<li><p>Type: Custom TCP Rule</p>
</li>
<li><p>Protocol: TCP</p>
</li>
<li><p>Port range: 80</p>
</li>
<li><p>Source: 0.0.0.0/0 (or a specific IP address or range)</p>
</li>
</ul>
</li>
<li><p>Click "Save rules" to apply the changes.</p>
</li>
</ol>
<h3 id="heading-configuring-outbound-rules">Configuring Outbound Rules</h3>
<p>By default, outbound traffic is allowed from an NLB. However, you can configure outbound rules to restrict the types of traffic that can leave the NLB. To do this, follow these steps:</p>
<ol>
<li><p>Click on the security group you just created.</p>
</li>
<li><p>Click the "Edit outbound rules" button.</p>
</li>
<li><p>Add a rule for each type of traffic you want to allow, specifying the protocol, port range, and destination. For example, if you want to allow all outbound traffic, add a rule with the following settings:</p>
<ul>
<li><p>Type: All traffic</p>
</li>
<li><p>Protocol: All</p>
</li>
<li><p>Port range: All</p>
</li>
<li><p>Destination: 0.0.0.0/0 (or a specific IP address or range)</p>
</li>
</ul>
</li>
<li><p>Click "Save rules" to apply the changes.</p>
</li>
</ol>
<h3 id="heading-best-practices-for-configuring-security-groups">Best Practices for Configuring Security Groups</h3>
<p>When configuring security groups for an AWS NLB, follow these best practices:</p>
<ul>
<li><p>Allow only the minimum necessary traffic: Only allow the specific types of traffic that your application requires. This reduces the attack surface and helps prevent unauthorized access.</p>
</li>
<li><p>Use specific sources and destinations: Instead of allowing all traffic from anywhere, specify a specific IP address or range. This provides an additional layer of security.</p>
</li>
<li><p>Use security groups in combination with network ACLs: Security groups and network access control lists (ACLs) work together to provide layered security. Security groups are stateful, meaning they track the state of connections and automatically allow return traffic; network ACLs are stateless and do not, so return traffic must be allowed explicitly (see the sketch after this list).</p>
</li>
<li><p>Regularly review security group rules: Regularly review your security group rules to ensure that they still meet your needs and are up-to-date with any changes in your application requirements.</p>
</li>
</ul>
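<p>To make the stateless point from the list above concrete, here is a hedged sketch with a placeholder ACL ID: with a network ACL, the return traffic needs its own explicit rule, typically on the ephemeral port range.</p>
<pre><code class="lang-bash"># Inbound NACL rule: allow HTTP into the subnet
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --rule-number 100 \
  --protocol tcp \
  --port-range From=80,To=80 \
  --cidr-block 0.0.0.0/0 \
  --rule-action allow \
  --ingress

# Because NACLs are stateless, the responses need a separate outbound rule
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --rule-number 100 \
  --protocol tcp \
  --port-range From=1024,To=65535 \
  --cidr-block 0.0.0.0/0 \
  --rule-action allow \
  --egress</code></pre>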
<h2 id="heading-conclusion">Conclusion</h2>
<p>Configuring security groups for an AWS NLB is a crucial aspect of setting up and securing your load balancer. By following the best practices outlined in this article, you can ensure that only the necessary traffic is allowed to reach your NLB and that your application remains secure.</p>
]]></content:encoded></item><item><title><![CDATA[🥳Mountpoint for AWS S3💥]]></title><description><![CDATA[This will make it easier for me.🫡🤗
With this update you can create a mount point and mount AWS S3 bucket (or a path within a bucket) at the mount point, and then access the bucket using shell commands (ls, cat, dd, find, and so forth), library func...]]></description><link>https://tgaleev.com/mountpoint-for-aws-s3</link><guid isPermaLink="true">https://tgaleev.com/mountpoint-for-aws-s3</guid><category><![CDATA[AWS]]></category><category><![CDATA[S3]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Tue, 08 Aug 2023 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1693125336625/9eb24c2e-e252-407c-bee6-caead58d7c9c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This will make it easier for me.🫡🤗</p>
<p>With this update you can create a mount point, mount an AWS S3 bucket (or a path within a bucket) at that mount point, and then access the bucket using shell commands (ls, cat, dd, find, and so forth), library functions (open, close, read, write, creat, opendir, and so forth), or equivalent commands and functions in the tools and languages that you already use.</p>
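<p>In practice it looks like this; the bucket name and mount directory are placeholders, and the <code>mount-s3</code> binary comes from the Mountpoint package:</p>
<pre><code class="lang-bash"># Mount a bucket at a local directory
mkdir -p /mnt/my-bucket
mount-s3 my-bucket /mnt/my-bucket

# Work with objects using ordinary file tools
ls /mnt/my-bucket
cat /mnt/my-bucket/reports/august.csv

# Unmount when finished
umount /mnt/my-bucket</code></pre>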
<p>Until now, many AWS users have relied on the S3 APIs and the AWS SDKs to build applications that list, access, and process the contents of an S3 bucket; now you have one more option 🫡🥳</p>
<p>Some information about this update:</p>
<ul>
<li><p><strong>Pricing</strong> – you pay only for the underlying S3 operations.</p>
</li>
<li><p><strong>Performance</strong> – Mountpoint is able to take advantage of the elastic throughput offered by S3, including data transfer at up to 100 Gb/second between each EC2 instance and S3.</p>
</li>
<li><p><strong>Credentials</strong> – Mountpoint accesses your S3 buckets using the AWS credentials that are in effect when you mount the bucket.</p>
</li>
<li><p><strong>Storage Classes</strong> – You can use Mountpoint to access S3 objects in all storage classes except S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, S3 Intelligent-Tiering Archive Access Tier, and S3 Intelligent-Tiering Deep Archive Access Tier.</p>
</li>
<li><p><strong>Open Source</strong> – Mountpoint is open source and has a public roadmap. Your contributions are welcome; be sure to read the project's Contributing Guidelines and Code of Conduct first.</p>
</li>
</ul>
<p>Some links:</p>
<p><a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2023/08/mountpoint-amazon-s3-generally-available/">https://aws.amazon.com/about-aws/whats-new/2023/08/mountpoint-amazon-s3-generally-available/</a></p>
<h2 id="heading-lets-jump-deep">Let's jump deep</h2>
<p>Mountpoint for AWS S3 is an open-source tool that enables you to mount an S3 bucket as a file system in your Linux environment, effectively bridging the gap between object storage and traditional file systems. Developed by AWS as an open-source project, Mountpoint for AWS S3 brings several benefits to the table, making it easier for businesses to manage their cloud storage and streamline operations.</p>
<h3 id="heading-ease-of-integration">Ease of Integration</h3>
<p>Mountpoint for AWS S3 allows you to mount an S3 bucket as a local file system, enabling seamless integration with existing applications and tools that rely on traditional file I/O operations. This eliminates the need for additional programming or customization efforts when working with object storage, saving both time and resources.</p>
<p>Under the hood, Mountpoint is implemented as a FUSE (Filesystem in Userspace) file system, so it runs on common Linux distributions without kernel modifications or special local file system requirements.</p>
<h3 id="heading-improved-performance">Improved Performance</h3>
<p>Mountpoint is designed for high-throughput workloads: it translates file operations into efficient S3 requests and takes advantage of the elastic throughput that S3 offers, which suits reading and writing large objects sequentially.</p>
<p>Additionally, Mountpoint issues requests to S3 concurrently, allowing it to use the available network bandwidth and multiple CPU cores effectively. This further enhances performance and enables you to handle large datasets more efficiently.</p>
<h3 id="heading-data-durability-and-security">Data Durability and Security</h3>
<p>Amazon S3 is designed for 99.999999999% durability and provides a range of security features, such as access control policies, encryption, and data integrity checks. Mountpoint for AWS S3 ensures that these benefits are passed on to the file system level, allowing you to maintain the same level of data protection and durability without additional configuration.</p>
<p>Furthermore, because your data continues to live in S3, bucket-level features such as S3 Object Lock remain available. Object Lock provides an additional layer of protection against accidental or malicious data modification and can be used to create write-once-read-many (WORM) workflows, ensuring that critical data remains immutable and cannot be altered or deleted for a specified retention period.</p>
<h3 id="heading-cost-effective-scalability">Cost-Effective Scalability</h3>
<p>As your storage needs grow, so does the cost of managing and maintaining on-premises infrastructure. AWS S3 offers a pay-as-you-go pricing model, allowing you to scale your storage capacity without the need for upfront investments or complex capacity planning.</p>
<p>Mountpoint for AWS S3 enables you to tap into this scalability while maintaining a familiar file system interface, making it an attractive option for businesses looking to optimize their cloud storage costs.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Mountpoint for AWS S3 is a powerful tool that simplifies the integration of Amazon S3 with your Linux environment, improves performance, and maintains data durability and security. By bridging the gap between object storage and traditional file systems, Mountpoint for AWS S3 offers a cost-effective and scalable solution for managing your cloud storage needs. Whether you're working with big data, media assets, or backup and archival data, Mountpoint for AWS S3 can help streamline your operations and improve overall efficiency.</p>
]]></content:encoded></item><item><title><![CDATA[Deploying Kubernetes Clusters to AWS with k8s-cdk]]></title><description><![CDATA[The Kubernetes CDK (k8s-cdk) is an open-source project that makes it easy to define and provision Kubernetes infrastructure on AWS using the AWS CDK. It provides constructs for core Kubernetes resources like Clusters, Nodes, and Services that simplif...]]></description><link>https://tgaleev.com/deploying-kubernetes-clusters-to-aws-with-k8s-cdk</link><guid isPermaLink="true">https://tgaleev.com/deploying-kubernetes-clusters-to-aws-with-k8s-cdk</guid><category><![CDATA[AWS]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Wed, 07 Jun 2023 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710410155409/40fdac3a-d8ed-47e6-848c-8bccf090bcfe.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Kubernetes CDK (k8s-cdk) is an open-source project that makes it easy to define and provision Kubernetes infrastructure on AWS using the AWS CDK. It provides constructs for core Kubernetes resources like Clusters, Nodes, and Services that simplify the deployment of Kubernetes applications to AWS.</p>
<p><strong>So, let's start building:</strong></p>
<p>To get started, install the k8s-cdk library:</p>
<pre><code class="lang-plaintext">npm install --save @kubernetes-cdk/cdk-core
</code></pre>
<p>Then create a new CDK app:</p>
<pre><code class="lang-plaintext">cdk init app --language=typescript
</code></pre>
<p>This will set up a basic CDK app structure with TypeScript support.</p>
<h2 id="heading-defining-a-cluster"><strong>Defining a Cluster</strong></h2>
<p>To define a Kubernetes cluster, import the Cluster construct and provide configuration:</p>
<pre><code class="lang-plaintext">import { Cluster } from '@kubernetes-cdk/cdk-core';

//...

new Cluster(this, 'MyCluster', {
  version: k8s.KubernetesVersion.V1_21,
  subnets: [subnet1, subnet2] 
});
</code></pre>
<p>This will provision a managed EKS cluster running Kubernetes across the specified subnets.</p>
<h2 id="heading-deploying-resources"><strong>Deploying Resources</strong></h2>
<p>Additional Kubernetes resources like Pods, Services, etc. can then be defined and added to the cluster:</p>
<pre><code class="lang-plaintext">const nginxDeployment = new k8s.Deployment(this, 'NginxDeployment', {
  cluster: cluster,
  spec: {
    selector: {
      matchLabels: {
        app: 'nginx',
      },
    },
    //...
  },
});
</code></pre>
<h2 id="heading-deploying"><strong>Deploying</strong></h2>
<p>Finally, synthesize and deploy the CDK app to provision the Kubernetes infrastructure and deploy the resources:</p>
<pre><code class="lang-plaintext">cdk deploy
</code></pre>
<p>The k8s-cdk makes it simple to define Kubernetes clusters and applications using familiar AWS CDK patterns. This allows for infrastructure as code deployments of Kubernetes on AWS.</p>
]]></content:encoded></item><item><title><![CDATA[Domain-Driven Design (DDD) in AWS. Find Your Business Domains.]]></title><description><![CDATA[This article is an introduction to Domain-Driven Design and how it can be used with AWS. I will provide guidance on how to define business domains in legacy monolithic applications and decompose them into a set of microservices step by step. By start...]]></description><link>https://tgaleev.com/domain-driven-design-ddd-in-aws-find-your-business-domains</link><guid isPermaLink="true">https://tgaleev.com/domain-driven-design-ddd-in-aws-find-your-business-domains</guid><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Wed, 29 Mar 2023 15:27:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1680105018922/99b1c0bc-5f47-41fa-8e3e-e2ce9b281cc5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article is an introduction to Domain-Driven Design and how it can be used with AWS. I will provide guidance on how to define business domains in legacy monolithic applications and decompose them into a set of microservices step by step. By starting with Domain-Driven Design for your microservices, you can get the benefits of cloud scaling in your new refactored application.</p>
<p>Is Domain-Driven Design useful for me?</p>
<p>The purpose of <a target="_blank" href="https://en.wikipedia.org/wiki/Domain-driven_design#:~:text=Domain%2Ddriven%20design%20(DDD),should%20match%20the%20business%20domain.">Domain-Driven Design</a> is to free the domain code from technical details so there is more room to deal with its complexity. It is well suited to very complex domains and to projects that are beginning to sink into legacy code.</p>
<p>Domain-Driven Design requires an understanding of the business idea or understanding of the final 'business product'. It requires time and commitment from both business experts and technical implementers. Domain-Driven Design should not be used in situations where you need 'quick solutions'. Instead, use Domain-Driven Design for software that supports the core business area rather than supporting areas. Running Domain-Driven Design can be achieved through an <strong>event-storming session</strong>. However, as mentioned above, this is a commitment worth making. It will allow you to develop software that is more tailored to the needs of your end clients. It will also help create decoupled services that are more scalable and maintainable. The combination will result in greater business agility.</p>
<h2 id="heading-event-storming">Event Storming</h2>
<p>Event Storming helps teams of business and technical people come to a consensus on what the solution should be, without being distracted by the specific details of how it will be implemented. This means it may take longer for the teams to start producing source code, but all teams will be better aligned on what each microservice should be responsible for. The event-storming workshop is a brainstorming session in which all stakeholders in the solution work together to define the business events that correspond to the domains. Suppose we have a commerce application where the business event is a customer applying for a new product. During the workshop, the group identifies the object that triggered the event, the processes that should occur as a result, and any subsequent events triggered by the original one.</p>
<p>To do this, a team brainstorming session takes place in which the group identifies areas of the business and then the contexts in which they operate. These can be used to define the usage of, and the relationships between, each microservice and its context. Once the domains have been defined with the help of the business experts, the technical implementers can start designing the solution.</p>
<p>The result of the event-storming session is a domain model for development. The domain model can be used to define a number of <strong>bounded contexts</strong>.</p>
<h2 id="heading-bounded-contexts">Bounded Contexts</h2>
<p>A bounded context is the boundary where each domain applies. The order contract opening example can be thought of as the 'Order Contract Opening Context' in a shop. In a complete system, there may be other contexts such as the product context, the description context and the manufacturing context. Identifying the business events that cause interactions between the different constrained contexts helps to determine how your microservices will interact with each other in the new architecture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1680103938717/a00a4095-ce52-4b80-bb70-2e2b200360fe.png" alt="Image description" /></p>
<p>The example context map is just a sample of the core domains. There are also a number of supporting and subsequent domains. Although it is necessary to have a service that manages these, this is not part of the core application domain for sending products.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1680103940853/a34c7ddb-6836-455f-b860-31ce1d8064b6.png" alt="Image description" /></p>
<h2 id="heading-core-domains-and-refactoring">Core Domains and refactoring</h2>
<p>When you start defining core domains and supporting domains, the question that usually arises is how to manage requests between domains. To answer it, we'll look at options for implementing domains with AWS services. A containerised or serverless diagram of such a solution reads more like a 'modern architecture' than a good old-fashioned network-and-virtual-machine deployment diagram. The advantage is that the diagram itself helps outline the logical functionality, since we can be more expressive and fine-grained with resource usage.</p>
<p>The undisputed king of serverless computing platforms has been <a target="_blank" href="https://aws.amazon.com/lambda/">AWS Lambda</a> for several years now. It satisfies all the aforementioned conditions and can be used with a number of languages, including TypeScript. Other viable options include the better-known container services such as <a target="_blank" href="https://aws.amazon.com/ecs/">AWS ECS</a> or <a target="_blank" href="https://aws.amazon.com/eks/">AWS EKS</a> with Fargate. However, they require considerably more setup and configuration, and they require that containerization actually takes place. That doesn't mean containerisation is bad; in general it can be a good fit, depending on whether you are refactoring into microservices or starting a new application.</p>
<p>If you need eventing, there is <a target="_blank" href="https://aws.amazon.com/sns/">Simple Notification Service (SNS)</a>. It is a push-based service: it automatically handles distribution of an event to its recipients. SNS uses a pay-per-use model and is essentially serverless, as the only infrastructure you need is the SNS topic.</p>
<p>The modern cloud approach is to use the platform's own API products to expose applications, rather than building something yourself with Fastify, Kong, or the like. The API gateway acts as the single public interface connected to the rest of the infrastructure, in our case primarily the Lambda compute functions, which respond to paths defined in the gateway. In AWS, the service of interest is, unsurprisingly, the <a target="_blank" href="https://aws.amazon.com/api-gateway/">AWS API Gateway</a>.</p>
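<p>To make the eventing part concrete, here is a hedged sketch of wiring a domain event topic to a Lambda consumer with the CLI; all names, regions, and ARNs are placeholders:</p>
<pre><code class="lang-bash"># A topic for events published by the ordering domain
aws sns create-topic --name order-contract-opened

# Allow SNS to invoke the consuming domain's Lambda function
aws lambda add-permission \
  --function-name product-shipping-handler \
  --statement-id allow-sns-invoke \
  --action lambda:InvokeFunction \
  --principal sns.amazonaws.com \
  --source-arn arn:aws:sns:eu-west-1:123456789012:order-contract-opened

# Subscribe the function to the topic
aws sns subscribe \
  --topic-arn arn:aws:sns:eu-west-1:123456789012:order-contract-opened \
  --protocol lambda \
  --notification-endpoint arn:aws:lambda:eu-west-1:123456789012:function:product-shipping-handler</code></pre>
<p>The publishing domain now only needs to know about the topic, not about who consumes its events.</p>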
<p>The following diagram shows how the monolith continues to receive some of the traffic while new microservices are gradually added in the example application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1680105796339/c519c53b-80bc-4c39-8f79-e47c43843bf8.png" alt class="image--center mx-auto" /></p>
<p>There is also <a target="_blank" href="https://aws.amazon.com/migration-hub/">AWS Migration Hub</a>, it will help you in finding your domains and even offer AWS services that you can implement. This will help you to plan a refactoring or plan for migrating from your old OnPrem solutions to AWS with all modern solutions.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>To summarise: people don't tend to talk about 'domains' all day, and most employees pay little attention to how domains are reflected in the organisation. It is also worth noting that dividing a system into domains after it has been fully designed is of limited use; DDD should be applied, at least approximately, at the initial design stage. In the AWS example, using DDD looks simple, but as you go deeper you find many services that all have to be interconnected, and with them come the methods and dependencies. That's why it is so important to establish the structure at the beginning of the work.</p>
]]></content:encoded></item><item><title><![CDATA[AWS, Terraform ,WordPress. Step-by-Step Guide Example]]></title><description><![CDATA[I've looked at many ways to run Wordpress on AWS but they are all expensive or unstable for me. So I decided to make a stable version of Wordpress installation that suited me.
Infrastructure management has changed a lot over the years. So much so tha...]]></description><link>https://tgaleev.com/aws-terraform-wordpress-step-by-step-guide-example</link><guid isPermaLink="true">https://tgaleev.com/aws-terraform-wordpress-step-by-step-guide-example</guid><category><![CDATA[AWS]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[WordPress]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Tue, 24 Jan 2023 12:52:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1675258745784/5b58e875-7f3f-48d9-be6a-8b75ae983bec.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've looked at many ways to run Wordpress on AWS but they are all expensive or unstable for me. So I decided to make a stable version of Wordpress installation that suited me.</p>
<p>Infrastructure management has changed a lot over the years. Gone are the days when a traditional system administrator managed a rack full of servers and the initial setup required manual intervention at the console.</p>
<p><strong>Why AWS for WordPress might be a great choice</strong> To begin with, AWS is a big deal. It's the cloud hosting provider with the largest market share. AWS is such a successful platform that it accounts for half of Amazon's operating income, which is in the billions of dollars.</p>
<p>AWS offers high scalability, making it ideal for sites with thousands of daily visitors. The platform also allows for any server configuration. This is ideal for high-performance sites such as online stores.</p>
<p><em>In this article, I will show you how to use Terraform to install Wordpress on AWS</em>.</p>
<p><strong>Our objectives</strong></p>
<ul>
<li><p>Create the ‘.tf’ files which will hold all of our relevant configuration information (main.tf, …)</p>
</li>
<li><p>Define which provider we will be using in the Terraform config. (aws, cloudflare)</p>
</li>
<li><p>Certificate handling (AWS Certificate Manager)</p>
</li>
<li><p>Define security group rules and names.</p>
</li>
<li><p>Define the EC2 instances we want to create. (AWS Auto Scaling Group)</p>
</li>
<li><p>Run Terraform to plan and apply our configuration.</p>
</li>
</ul>
<p>Before you get started, you’ll need to sign up for AWS. During the process, you’ll need to verify your account using a credit card – onto which they’ll charge $1 – and receive a verification code via SMS. When you’re ready, select the Free support plan and you’ll get access to your console, which is where the magic happens.</p>
<p><em>So, let's start creating our wordpress in AWS</em></p>
<p>I will assume that you have already installed and configured the necessary tools, such as <em>Terraform</em> and the <em>aws-cli</em>.</p>
<p>The first thing you need to do is clone the repository: <code>$ git clone https://github.com/timurgaleev/wordpress-ec2-rds-alb-vpc.git</code></p>
<p>Once you've cloned the templates from the repository, you need to configure the '.tf' files, which hold all of the relevant Terraform configuration. You could encapsulate everything in one file, but for simplicity and convenience we will work with multiple files.</p>
<p>The structure of the repository is as follows:</p>
<p><code>acm.tf</code> - AWS Certificate Manager Terraform module <code>alb.tf</code> - AWS Application and Network Load Balancer Terraform module <code>asg.tf</code> - AWS Auto Scaling Group (ASG) Terraform module <code>cloudflare.tf</code> - Cloudflare Provider <code>efs.tf</code> - Provides an Elastic File System (EFS) File System resource <code>output.tf</code> - Terraform Output Values <code>rds.tf</code> - AWS RDS Terraform module <code>security_group.tf</code> - AWS EC2-VPC Security Group Terraform module <code>vpc.tf</code> - AWS VPC Terraform module <code>variables.tf</code> - variables used in Terraform.</p>
<p>As a prerequisite for creating the environment, you will need to adjust the backend configuration in <code>provider.tf</code>:</p>
<pre><code class="lang-plaintext">backend "s3" {
    bucket         = "ecs-terraform-examplecom-state"
    key            = "example/com.tfstate"
    region         = "eu-west-1"
    encrypt        = "true"
    dynamodb_table = "ecs-terraform-remote-state-dynamodb"
  }
</code></pre>
<p>The backend uses an S3 bucket as a remote store for our Terraform state, together with a DynamoDB table for state locking. This allows multiple users to work with one set of Infrastructure as Code without causing conflicts.</p>
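<p>The bucket and the DynamoDB lock table must exist before you run <code>terraform init</code>. Here is a hedged sketch of creating them with the AWS CLI, reusing the names and region from the backend block above:</p>
<pre><code class="lang-bash"># Bucket that stores the Terraform state (versioning is a good safety net)
aws s3api create-bucket \
  --bucket ecs-terraform-examplecom-state \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
aws s3api put-bucket-versioning \
  --bucket ecs-terraform-examplecom-state \
  --versioning-configuration Status=Enabled

# DynamoDB table for state locking; Terraform requires a string
# partition key named LockID
aws dynamodb create-table \
  --table-name ecs-terraform-remote-state-dynamodb \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region eu-west-1</code></pre>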
<p>Then change the variables in <code>variables.tf</code> according to your needs: environment, domain, Cloudflare API token, and so on.</p>
<pre><code class="lang-plaintext">variable "asg_min_size" {
  description = "AutoScaling Group Min Size "
  default     = 1
}

variable "asg_max_size" {
  description = "AutoScaling Group Max Size "
  default     = 2
}

variable "asg_desired_capacity" {
  description = "AutoScaling Group Desired Capacity"
  default     = 1
}

variable "rds_engine" {
  description = "RDS engine"
  default     = "mariadb"
}

variable "rds_engine_version" {
  description = "RDS engine version"
  default     = "10.6.7"
}

variable "rds_instance_class" {
  description = "RDS instance class"
  default     = "db.t3.micro"
}

variable "site_domain" {
  description = "Domain"
  default     = "example.com"
}

variable "cloudflare_zone" {
  description = "cloudflare Zone Id"
}

variable "dns_ttl" {
  description = "cloudflare for dns = 1 is automatic."
  default     = 1
}

variable "dns_allow_overwrite_records" {
  description = "cloudflare allow overwrite records."
  default     = true
}

variable "cloudflare_api_token" {
  description = "cloudflare api token"
}

variable "ssh_key_name" {
  description = "SSH Key"
}
</code></pre>
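<p>Variables without defaults, such as the Cloudflare API token, don't have to live in the repository: Terraform also reads environment variables with the <code>TF_VAR_</code> prefix. The values below are placeholders:</p>
<pre><code class="lang-bash"># Supply sensitive or machine-specific values at runtime
export TF_VAR_cloudflare_api_token="your-cloudflare-api-token"
export TF_VAR_cloudflare_zone="your-cloudflare-zone-id"
export TF_VAR_ssh_key_name="your-ec2-key-pair-name"</code></pre>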
<p>Now we can run our WordPress.</p>
<ul>
<li><p>Run <code>terraform init</code></p>
</li>
<li><p>Run <code>terraform plan</code> and review</p>
</li>
<li><p>Run <code>terraform apply</code></p>
</li>
</ul>
<p>Now enter your server domain, and you'll see the following WordPress installation screen. If that's the case, great job! If you don't see that screen, you may need to double-check the steps you followed.</p>
<p>You can destroy this WordPress by running:</p>
<pre><code class="lang-bash">terraform plan -destroy
terraform destroy  --force
</code></pre>
<p><strong>Conclusion</strong> We've only had a superficial look at how Terraform can be used with AWS, but I think a simple introduction is the best place to start! We created a database in RDS and plugged it into WordPress, and we created and wired the necessary DNS records in Cloudflare.</p>
]]></content:encoded></item><item><title><![CDATA[AWS Organization: Best Practices]]></title><description><![CDATA[https://youtube.com/watch?v=DxGTWm8dFEo&feature=shares
 
In this video, we explained AWS organization's best practices.
Benefits of using AWS Control Tower and AWS Config.
How to connect Azure AD to AWS IAM Identity Center (AWS SSO) and how you can m...]]></description><link>https://tgaleev.com/aws-organization-best-practices</link><guid isPermaLink="true">https://tgaleev.com/aws-organization-best-practices</guid><category><![CDATA[AWS]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Thu, 06 Oct 2022 13:16:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1675260943637/4866dfae-358e-40fc-98b0-915215c8a99b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtube.com/watch?v=DxGTWm8dFEo&amp;feature=shares">https://youtube.com/watch?v=DxGTWm8dFEo&amp;feature=shares</a></div>
<h3 id="heading-in-this-video-we-explained-aws-organizations-best-practices">In this video, we explained AWS organization's best practices.</h3>
<h3 id="heading-benefits-of-using-aws-control-tower-and-aws-config">Benefits of using AWS Control Tower and AWS Config.</h3>
<h3 id="heading-how-to-connect-azure-ad-to-aws-iam-identity-center-aws-sso-and-how-you-can-manage-users-via-ad">How to connect Azure AD to AWS IAM Identity Center (AWS SSO) and how you can manage users via AD.</h3>
]]></content:encoded></item><item><title><![CDATA[Service Mesh Architecture in Action]]></title><description><![CDATA[Today, Service Meshes in the IT world are becoming an integral part of the cloud-native stack. A large cloud application may require hundreds of microservices and serve a million users concurrently. Service Mesh is a low-latency infrastructure layer ...]]></description><link>https://tgaleev.com/service-mesh-architecture-in-action-2719120f768d</link><guid isPermaLink="true">https://tgaleev.com/service-mesh-architecture-in-action-2719120f768d</guid><category><![CDATA[#ServiceMesh]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Timur Galeev]]></dc:creator><pubDate>Fri, 24 Jun 2022 07:26:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1675259085205/28b8e379-d00f-43a7-b9a0-d01cfc0561c8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, Service Meshes in the IT world are becoming an integral part of the cloud-native stack. A large cloud application may require hundreds of microservices and serve a million users concurrently. <strong>Service Mesh is a low-latency infrastructure layer</strong> that allows high traffic communication between different components of a cloud application like backend, frontend and database. Most of this is done via Application Programming Interfaces (APIs).</p>
<p>Some examples of open source Service Meshes like <a target="_blank" href="https://linkerd.io/">Linkerd</a>, <a target="_blank" href="https://istio.io/">Istio</a> and <a target="_blank" href="https://kuma.io/">Kuma</a> are ways to control how different parts of an application share data with each other. Service Meshes provide an overarching view of your service and aid with complex activities like tests, roll-outs, access restrictions and end-to-end authentication.</p>
<p><strong>A Service Mesh helps</strong> you push operational concerns into the infrastructure so the application code is easier to understand, maintain and adapt. <strong>Service Mesh integration</strong> covers traffic management, security and policy across a microservice architecture.</p>
<p>So what does a <strong>Service Mesh</strong> mean in a real case, and what does it solve? Let’s look at some examples:</p>
<p><strong>Applications are monolithic</strong></p>
<p>Typically, applications are monolithic, meaning the application is one program, built as one binary and run as one process.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1675259081775/5d35eaa8-9d32-4a66-b1ee-00c828f56b03.png" alt /></p>
<p><em>Example of a Monolithic application</em></p>
<p>With a monolith you get one big application, and the following problems to solve:</p>
<ul>
<li><strong>Scale and releases</strong>. Scaling a monolithic application is difficult, and autoscaling is hard to maintain. With lots of teams trying to make changes at the same time, each release accumulates everyone's bugs.</li>
<li><strong>Reduced flexibility of technology</strong>. Trying new technologies (e.g., a new programming language) without migrating the entire code base is often problematic.</li>
<li><strong>Strain on team dynamics</strong>. Last but not least, it’s harder to draw boundaries of responsibility, assign roles and grow a team when there is only one big deliverable.</li>
</ul>
<p>Many of these issues only become a problem at a certain scale. In most cases a monolithic application is a reasonable choice; again, this depends on requirements. However, once you have started to scale, it may no longer be the right choice for you.</p>
<p><strong>Microservices implementation</strong></p>
<p>Microservices architecture helps developers modify application services without redeploying the whole system. The key difference from other architectures is that individual microservices are created by small teams that choose their own tools and coding languages. In general, microservices are created independently of each other, communicate with each other, and can individually fail without causing disruption to the entire application.</p>
<p>Like a monolith, a typical application consists of several logical modules: frontend, backend, database and so on. Communication between services is what makes microservices distinctive. The logic that controls communication can be coded into each service without a service mesh layer, but as communication becomes more complex, a service mesh becomes more valuable. For cloud applications built in a microservices architecture, a service mesh is a way to combine a large number of individual services into a functional application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1675259083463/3879a398-1088-4108-bae4-2e7597bc88bb.png" alt /></p>
<p><em>Example microservices</em> architecture</p>
<p>Moving to microservices, you will get the following benefits:</p>
<ul>
<li>You can develop each service independently of the other.</li>
<li>You can scale each service independently.</li>
<li>You are free to write each service using the technology of your choice, as long as they use the same communication interface between them.</li>
</ul>
<p>Although the Service Mesh architecture is not limited to microservice-based systems, they provide a good example of how the service mesh is used in action.</p>
<p>You could shift the mentioned problems to the microservices by giving them additional, infrastructure-specific logic. This would then mean the following:</p>
<ul>
<li>You will be forced to recreate this logic for every technology stack you use.</li>
<li>The application would expand considerably beyond its business needs. So do developers want to work on the application or on the infrastructure?</li>
</ul>
<p>So you need some sort of software stack that modernizes your deployment and allows your services to discover each other, control traffic and policies, and provide observability, ideally without modifying the services themselves. This infrastructure solution is called the <strong>Service Mesh</strong>.</p>
<p><strong>How can the integration of the service mesh optimize communication?</strong></p>
<p>Each service that is added to an application, or a new instance of an existing service running in a container, complicates the communication environment and creates new points of potential failure. In a complex microservices architecture, it can become nearly impossible to identify where problems are occurring without the Service Mesh.</p>
<p>This happens because the Service Mesh also captures every aspect of communication between services in the form of performance metrics. For example, if a particular service fails, the Service Mesh can collect data on how long it took before a successful retry. As failure time data is accumulated for a given service, rules can be written to determine the optimal wait time before re-requesting that service, so that the system is not overwhelmed by unnecessary re-requests.</p>
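<p>As an illustration of such a rule, here is a hedged sketch using Istio (one of the meshes mentioned above); the service name, retry count, and timeout values are arbitrary assumptions rather than recommendations:</p>
<pre><code class="lang-bash"># Apply a retry policy for calls to a hypothetical 'orders' service
kubectl apply -f - &lt;&lt;EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders-retries
spec:
  hosts:
  - orders
  http:
  - route:
    - destination:
        host: orders
    retries:
      attempts: 3           # stop retrying after three attempts
      perTryTimeout: 2s     # wait at most 2s per attempt
      retryOn: 5xx,connect-failure
EOF</code></pre>
<p>With a rule like this in place, the mesh applies the retry policy uniformly, without any service changing its own application code.</p>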
<p><strong>Conclusion</strong></p>
<p>Implementing a Service Mesh to manage system communications can accelerate microservice deployment by providing a consistent approach that manages key network functions. The Service Mesh architecture is ideal for systems with a large number of services because it allows network issues to be separated from the application code and policies to be applied from a central source of truth, either universally or selectively based on specified criteria.</p>
<p>I hope this explanation can help you to choose the right mental model for understanding the source of the problems that <strong>Service Mesh</strong> solves. In the next article, I would like to explain how the <a target="_blank" href="https://aws.amazon.com/de/app-mesh/"><strong>AWS App Mesh</strong></a> can be used, as well as the pros and cons.</p>
]]></content:encoded></item></channel></rss>