AWS Cost Optimization: The Complete Guide

Everything you need to know about reducing your AWS bill by 20-40% — from quick wins to architectural changes.

Every company I’ve audited has the same story: AWS costs grew faster than expected.

It’s not because AWS is expensive. It’s because:

  • Defaults favor availability over cost — Multi-AZ, On-Demand, S3 Standard
  • Engineers don’t see the bill — no feedback loop between provisioning and spending
  • Growth happens faster than cleanup — old resources accumulate
  • Nobody owns cloud cost — it’s everyone’s job, so it’s nobody’s job
  • Fear of breaking things — “let’s just leave it running”

The good news: these problems are fixable. This guide shows you how.

The cost optimization framework

Think of optimization in three tiers:

[Figure: the three-tier cost optimization pyramid — Tier 1 Quick Wins at the base (hours of effort, 10-30% savings), Tier 2 Commitments (days, 20-40% savings), Tier 3 Architecture at the top (months, 30-50% savings)]

Always start at Tier 1. Work your way up.

Tier 1: Quick wins (do this week)

1.1 Find and terminate idle resources

Idle EC2 instances

#!/bin/bash
# find-idle-ec2.sh
# Find EC2 instances with <5% average CPU over 14 days
# Requires GNU date (Linux). On macOS, install coreutils and use gdate.

THRESHOLD=5
DAYS=14

aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`Name`]|[0].Value]' \
  --output text | while IFS=$'\t' read -r instance_id instance_type name; do

  avg_cpu=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=$instance_id \
    --start-time $(date -d "-${DAYS} days" -u +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 86400 \
    --statistics Average \
    --query 'Datapoints[].Average' \
    --output text | awk '{sum+=$1; count++} END {if(count>0) print sum/count; else print 0}')

  if (( $(echo "$avg_cpu < $THRESHOLD" | bc -l) )); then
    echo "IDLE: $instance_id ($name) - Type: $instance_type - Avg CPU: ${avg_cpu}%"
  fi
done
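
Once the list has been reviewed by the instance owners, stop (don't terminate) as a reversible first step: compute charges end immediately, and nothing is lost if someone objects. A minimal boto3 sketch; the script name and instance ID are placeholders:

#!/usr/bin/env python3
# stop-reviewed-idle.py (hypothetical helper)

import boto3

ec2 = boto3.client('ec2')

# Replace with IDs confirmed idle by their owners
idle_instance_ids = ['i-0123456789abcdef0']

# Stop, not terminate: reversible, and halts compute charges immediately.
# Attached EBS volumes continue to accrue storage charges until deleted.
ec2.stop_instances(InstanceIds=idle_instance_ids)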

Unattached EBS volumes

#!/bin/bash
# find-unattached-ebs.sh

echo "Unattached EBS Volumes:"
echo "========================"

aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' \
  --output table

# Calculate total cost
total_gb=$(aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'sum(Volumes[].Size)' \
  --output text)

echo ""
echo "Total unattached storage: ${total_gb} GB"
echo "Estimated monthly cost: \$$(echo "$total_gb * 0.10" | bc) (gp2/gp3)"

Old EBS snapshots

#!/usr/bin/env python3
# find-old-snapshots.py

import boto3
from datetime import datetime, timezone

ec2 = boto3.client('ec2')
MAX_AGE_DAYS = 90

# Get account ID
sts = boto3.client('sts')
account_id = sts.get_caller_identity()['Account']

# Get all snapshots owned by this account (describe_snapshots caps at 1,000 per call, so paginate)
snapshots = []
for page in ec2.get_paginator('describe_snapshots').paginate(OwnerIds=[account_id]):
    snapshots.extend(page['Snapshots'])

old_snapshots = []
total_size = 0

for snap in snapshots:
    age = datetime.now(timezone.utc) - snap['StartTime']
    if age.days > MAX_AGE_DAYS:
        old_snapshots.append({
            'SnapshotId': snap['SnapshotId'],
            'Size': snap['VolumeSize'],
            'Age': age.days,
            'Description': snap.get('Description', 'N/A')[:50]
        })
        total_size += snap['VolumeSize']

print(f"\nSnapshots older than {MAX_AGE_DAYS} days:")
print("=" * 80)
for snap in sorted(old_snapshots, key=lambda x: x['Age'], reverse=True)[:20]:
    print(f"{snap['SnapshotId']} | {snap['Size']:>5} GB | {snap['Age']:>4} days | {snap['Description']}")

print(f"\nTotal old snapshots: {len(old_snapshots)}")
print(f"Total size: {total_size} GB")
print(f"Estimated monthly cost: ${total_size * 0.05:.2f}")

1.2 Right-size over-provisioned instances

Check EC2 recommendations from Compute Optimizer

# Enable Compute Optimizer (one-time)
aws compute-optimizer update-enrollment-status --status Active

# Get EC2 recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[?finding==`OVER_PROVISIONED`].[
    instanceArn,
    currentInstanceType,
    recommendationOptions[0].instanceType,
    recommendationOptions[0].projectedUtilizationMetrics
  ]' \
  --output table

RDS right-sizing analysis

#!/usr/bin/env python3
# analyze-rds-utilization.py

import boto3
from datetime import datetime, timezone, timedelta

rds = boto3.client('rds')
cloudwatch = boto3.client('cloudwatch')

def get_rds_metrics(db_identifier, days=14):
    """Get CPU and memory metrics for RDS instance"""
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=days)

    metrics = {}

    for metric_name in ['CPUUtilization', 'FreeableMemory', 'DatabaseConnections']:
        response = cloudwatch.get_metric_statistics(
            Namespace='AWS/RDS',
            MetricName=metric_name,
            Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': db_identifier}],
            StartTime=start_time,
            EndTime=end_time,
            Period=3600,
            Statistics=['Average', 'Maximum']
        )

        if response['Datapoints']:
            avg = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
            max_val = max(d['Maximum'] for d in response['Datapoints'])
            metrics[metric_name] = {'average': avg, 'maximum': max_val}

    return metrics

# Analyze all RDS instances
instances = rds.describe_db_instances()['DBInstances']

print("RDS Instance Utilization Analysis")
print("=" * 80)

for instance in instances:
    db_id = instance['DBInstanceIdentifier']
    instance_class = instance['DBInstanceClass']

    metrics = get_rds_metrics(db_id)

    if metrics.get('CPUUtilization'):
        cpu_avg = metrics['CPUUtilization']['average']
        cpu_max = metrics['CPUUtilization']['maximum']

        status = "OK"
        if cpu_avg < 20 and cpu_max < 50:
            status = "OVER-PROVISIONED - Consider downsizing"
        elif cpu_avg > 80:
            status = "UNDER-PROVISIONED - Consider upsizing"

        print(f"\n{db_id} ({instance_class})")
        print(f"  CPU Avg: {cpu_avg:.1f}% | CPU Max: {cpu_max:.1f}%")
        print(f"  Status: {status}")

1.3 Storage class optimization

S3 bucket analysis

#!/usr/bin/env python3
# analyze-s3-storage.py

import boto3
from datetime import datetime, timezone, timedelta

s3 = boto3.client('s3')
cloudwatch = boto3.client('cloudwatch')

def get_bucket_size(bucket_name):
    """Get bucket size from CloudWatch metrics"""
    now = datetime.now(timezone.utc)
    start = now.replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=1)

    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket_name},
            {'Name': 'StorageType', 'Value': 'StandardStorage'}
        ],
        StartTime=start,
        EndTime=now,
        Period=86400,
        Statistics=['Average']
    )
    if response['Datapoints']:
        return response['Datapoints'][0]['Average']
    return 0

buckets = s3.list_buckets()['Buckets']

print("S3 Bucket Storage Analysis")
print("=" * 80)

total_standard = 0
recommendations = []

for bucket in buckets:
    name = bucket['Name']

    try:
        # Check if lifecycle policy exists
        try:
            s3.get_bucket_lifecycle_configuration(Bucket=name)
            has_lifecycle = True
        except s3.exceptions.ClientError:
            has_lifecycle = False

        size_bytes = get_bucket_size(name)
        size_gb = size_bytes / (1024 ** 3)

        if size_gb > 1:  # Only show buckets > 1 GB
            total_standard += size_gb

            if not has_lifecycle and size_gb > 10:
                recommendations.append({
                    'bucket': name,
                    'size_gb': size_gb,
                    'potential_savings': size_gb * 0.7 * (0.023 - 0.0125)  # ~70% to Standard-IA ($0.0125/GB-mo); Glacier saves more
                })

            print(f"{name}: {size_gb:.2f} GB | Lifecycle: {'Yes' if has_lifecycle else 'NO'}")

    except Exception as e:
        print(f"{name}: Error - {e}")

print(f"\n{'=' * 80}")
print(f"Total Standard Storage: {total_standard:.2f} GB")
print(f"Monthly cost (Standard): ${total_standard * 0.023:.2f}")

if recommendations:
    print("\nBuckets needing lifecycle policies:")
    for rec in sorted(recommendations, key=lambda x: x['size_gb'], reverse=True):
        print(f"  {rec['bucket']}: {rec['size_gb']:.2f} GB - Potential savings: ${rec['potential_savings']:.2f}/mo")

1.4 Set up cost alerts

# Create a budget with alerts
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Monthly-AWS-Budget",
    "BudgetLimit": {
      "Amount": "10000",
      "Unit": "USD"
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "SubscriptionType": "EMAIL",
          "Address": "your-email@company.com"
        }
      ]
    },
    {
      "Notification": {
        "NotificationType": "FORECASTED",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 100,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "SubscriptionType": "EMAIL",
          "Address": "your-email@company.com"
        }
      ]
    }
  ]'

Tier 2: Commitment discounts (do this month)

2.1 Reserved Instances vs Savings Plans

Feature          Reserved Instances               Savings Plans
Discount         Up to 72%                        Up to 66%
Flexibility      Instance-type specific           Any instance type
Region-locked    Yes                              Compute SP: No
Best for         Predictable, stable workloads    Variable workloads
Recommendation   RDS, ElastiCache                 EC2, Fargate, Lambda

2.2 RI / SP decision framework

#!/usr/bin/env python3
# ri-sp-analyzer.py

import boto3

ce = boto3.client('ce')

def get_ri_recommendations(service):
    """Get RI purchase recommendations"""
    response = ce.get_reservation_purchase_recommendation(
        Service=service,
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT',
        LookbackPeriodInDays='SIXTY_DAYS'
    )
    return response.get('Recommendations', [])

def get_sp_recommendations():
    """Get Savings Plans recommendations"""
    response = ce.get_savings_plans_purchase_recommendation(
        SavingsPlansType='COMPUTE_SP',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT',
        LookbackPeriodInDays='SIXTY_DAYS'
    )
    return response.get('SavingsPlansPurchaseRecommendation', {})

print("Reserved Instance Recommendations")
print("=" * 60)

for service in [
    'Amazon Elastic Compute Cloud - Compute',
    'Amazon Relational Database Service',
    'Amazon ElastiCache',
]:
    recs = get_ri_recommendations(service)
    if recs:
        for rec in recs:
            details = rec.get('RecommendationDetails', [])
            for detail in details[:3]:  # Top 3 recommendations
                print(f"\n{service}")
                print(f"  Instance: {detail.get('InstanceDetails', {})}")
                print(f"  Monthly savings: ${float(detail.get('EstimatedMonthlySavingsAmount', 0)):.2f}")
                print(f"  Upfront cost: ${float(detail.get('UpfrontCost', 0)):.2f}")

print("\n" + "=" * 60)
print("Savings Plans Recommendations")
print("=" * 60)

sp_rec = get_sp_recommendations()
if sp_rec:
    details = sp_rec.get('SavingsPlansPurchaseRecommendationDetails', [{}])[0]
    print(f"Recommended hourly commitment: ${details.get('HourlyCommitmentToPurchase', 'N/A')}")
    print(f"Estimated monthly savings: ${float(details.get('EstimatedMonthlySavingsAmount', 0)):.2f}")

2.3 Implementing commitments safely

Start-small strategy:

  • Week 1: Buy RIs/SPs for 50% of stable On-Demand usage
  • Month 1: Monitor utilization, ensure >80% RI/SP usage
  • Month 2: Increase to 70% coverage
  • Month 3: Reach target 80-90% coverage

# Monitor RI utilization
aws ce get-reservation-utilization \
  --time-period Start=$(date -d "-30 days" +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --query 'UtilizationsByTime[].Total.[UtilizationPercentage]'

# Monitor Savings Plans utilization
aws ce get-savings-plans-utilization \
  --time-period Start=$(date -d "-30 days" +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY
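
Utilization tells you whether what you bought is being used; coverage tells you how much of your eligible usage is commitment-backed, which is what the 50-90% targets above refer to. A sketch pulling both coverage figures from Cost Explorer:

#!/usr/bin/env python3
# commitment-coverage.py (hypothetical helper)

import boto3
from datetime import date, timedelta

ce = boto3.client('ce')
period = {
    'Start': (date.today() - timedelta(days=30)).isoformat(),
    'End': date.today().isoformat(),
}

# Reserved Instance coverage
for t in ce.get_reservation_coverage(TimePeriod=period, Granularity='MONTHLY')['CoveragesByTime']:
    print("RI coverage:", t['Total']['CoverageHours']['CoverageHoursPercentage'], "%")

# Savings Plans coverage
for t in ce.get_savings_plans_coverage(TimePeriod=period, Granularity='MONTHLY')['SavingsPlansCoverages']:
    print("SP coverage:", t['Coverage']['CoveragePercentage'], "%")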

Tier 3: Architectural optimization (strategic)

3.1 Spot instances for stateless workloads

EKS with Karpenter v1 spot configuration

# karpenter-spot-nodepool.yaml — Karpenter v1 (GA)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workloads
spec:
  template:
    metadata:
      labels:
        workload-type: spot
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: spot-template
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 500
    memory: 500Gi

---
# Deployment that uses Spot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: processor
          image: batch-processor:latest
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"

3.2 Graviton migration

Terraform for Graviton EKS node group

# Graviton node group — typically ~20% cheaper than comparable x86 instances
resource "aws_eks_node_group" "graviton" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "graviton-nodes"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.private[*].id

  # Graviton instance types
  instance_types = ["m6g.large", "m6g.xlarge", "c6g.large", "c6g.xlarge"]

  # ARM64 AMI
  ami_type = "AL2_ARM_64"

  scaling_config {
    desired_size = 3
    min_size     = 1
    max_size     = 10
  }

  labels = {
    "kubernetes.io/arch" = "arm64"
    "node-type"          = "graviton"
  }

  taint {
    key    = "arch"
    value  = "arm64"
    effect = "NO_SCHEDULE"
  }

  tags = {
    Name        = "graviton-node"
    Environment = var.environment
    CostCenter  = "platform"
  }
}

Multi-arch Docker build

# Dockerfile for multi-architecture support
FROM --platform=$BUILDPLATFORM golang:1.21-alpine AS builder

ARG TARGETARCH
ARG TARGETOS

WORKDIR /app
COPY . .

RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} \
    go build -o /app/server ./cmd/server

# Final image
FROM alpine:3.18
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

# Build and push multi-arch image
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t your-registry/app:latest \
  --push .

3.3 VPC endpoint optimization

Traffic from private subnets to AWS APIs normally rides through a NAT gateway at roughly $0.045/GB of data processing. Gateway endpoints for S3 and DynamoDB are free and bypass NAT entirely; interface endpoints cost about $0.01/hour per AZ plus $0.01/GB, which usually wins at volume.

# vpc-endpoints.tf

locals {
  gateway_endpoints = ["s3", "dynamodb"]

  interface_endpoints = [
    "ecr.api",
    "ecr.dkr",
    "logs",
    "monitoring",
    "secretsmanager",
    "ssm",
    "ssmmessages",
    "ec2messages",
    "sts"
  ]
}

# Gateway endpoints (free for S3/DynamoDB)
resource "aws_vpc_endpoint" "gateway" {
  for_each = toset(local.gateway_endpoints)

  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id

  tags = {
    Name = "${each.key}-endpoint"
  }
}

# Interface endpoints (charged per hour + data)
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_endpoints)

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${each.key}-endpoint"
  }
}

resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
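
To size the payoff before building, check what NAT gateway data processing actually costs you. A sketch that groups last month's spend by usage type and filters client-side (NAT usage types contain "NatGateway", e.g. USE1-NatGateway-Bytes):

#!/usr/bin/env python3
# nat-spend-check.py (hypothetical helper)

import boto3
from datetime import date, timedelta

ce = boto3.client('ce')

resp = ce.get_cost_and_usage(
    TimePeriod={
        'Start': (date.today() - timedelta(days=30)).isoformat(),
        'End': date.today().isoformat(),
    },
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'USAGE_TYPE'}],
)

for period in resp['ResultsByTime']:
    for group in period['Groups']:
        if 'NatGateway' in group['Keys'][0]:
            cost = float(group['Metrics']['UnblendedCost']['Amount'])
            print(f"{group['Keys'][0]}: ${cost:.2f}")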

Tools and resources

Free tools

Tool                    Purpose                         Link
AWS Cost Explorer       Primary cost analysis           Built into AWS
AWS Compute Optimizer   Right-sizing recommendations    Built into AWS
AWS Trusted Advisor     Best practice checks            Limited free tier
Infracost               Terraform cost estimation       infracost.io
Komiser                 Multi-cloud cost dashboard      github.com/tailwarden/komiser

Building a cost-conscious culture

Technical optimization is only half the battle. The other half is organizational.

1. Make costs visible

#!/usr/bin/env python3
# weekly-team-cost-report.py

import boto3
from datetime import datetime, timezone, timedelta

def generate_team_cost_report():
    ce = boto3.client('ce')

    end = datetime.now(timezone.utc).strftime('%Y-%m-%d')
    start = (datetime.now(timezone.utc) - timedelta(days=7)).strftime('%Y-%m-%d')

    response = ce.get_cost_and_usage(
        TimePeriod={'Start': start, 'End': end},
        Granularity='DAILY',
        Metrics=['UnblendedCost'],
        GroupBy=[{'Type': 'TAG', 'Key': 'Team'}]
    )

    # Format and send to Slack/email
    for day in response['ResultsByTime']:
        for group in day['Groups']:
            team = group['Keys'][0].replace('Team$', '') or 'Untagged'
            cost = float(group['Metrics']['UnblendedCost']['Amount'])
            print(f"{day['TimePeriod']['Start']} | {team}: ${cost:.2f}")

2. Set team budgets

Each team should have visibility into their own costs and accountability for staying within budget.
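
The budget CLI call from section 1.4 works per team too: scope it with a cost-allocation tag filter. A sketch assuming a "Team" tag has been activated as a cost allocation tag; the team name and limit are hypothetical:

import boto3

budgets = boto3.client('budgets')
account_id = boto3.client('sts').get_caller_identity()['Account']

budgets.create_budget(
    AccountId=account_id,
    Budget={
        'BudgetName': 'team-platform-monthly',  # hypothetical team
        'BudgetLimit': {'Amount': '2000', 'Unit': 'USD'},
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST',
        # Requires "Team" to be activated as a cost allocation tag first
        'CostFilters': {'TagKeyValue': ['user:Team$platform']},
    },
)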

3. Celebrate cost wins

Make cost optimization achievements as visible as feature launches.

Common mistakes to avoid

  • Optimizing too early — Don’t buy 3-year RIs for a startup you can’t predict
  • Over-committing — Start with 50% coverage, increase gradually
  • Ignoring data transfer — Often 5-15% of the bill, completely invisible
  • Forgetting about dev/staging — Often running 24/7 unnecessarily
  • Not tagging resources — You can’t optimize what you can’t see
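
The dev/staging item is the most automatable of the five: let teams opt in with a tag, then stop those instances outside working hours from a scheduled job (cron, EventBridge). A sketch; the Schedule tag convention is hypothetical:

#!/usr/bin/env python3
# stop-office-hours-instances.py (hypothetical helper, run on a schedule)

import boto3

ec2 = boto3.client('ec2')

# Running instances that opted in via a Schedule=office-hours tag
reservations = ec2.describe_instances(
    Filters=[
        {'Name': 'tag:Schedule', 'Values': ['office-hours']},
        {'Name': 'instance-state-name', 'Values': ['running']},
    ]
)['Reservations']

instance_ids = [i['InstanceId'] for r in reservations for i in r['Instances']]

if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print(f"Stopped {len(instance_ids)} instances until the next working window")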

When to get help

Consider getting professional help if:

  • AWS spend is >$10K/month and growing
  • No one has reviewed the bill in 6+ months
  • You lack in-house AWS expertise
  • Previous optimization attempts haven’t stuck
  • You need to show ROI quickly

Next steps

  • This week: Run the quick-win scripts, set up cost alerts
  • This month: Analyze RI/SP opportunities, implement storage lifecycle
  • This quarter: Evaluate architectural changes, build cost culture
