Over time, AWS accounts can accumulate resources that are no longer necessary but continue to incur costs. One common example is orphaned EBS snapshots left behind after volumes are deleted. Managing these snapshots manually can be tedious and costly.
This guide shows how to automate the cleanup of orphaned EBS snapshots with an AWS Lambda function written in Python (Boto3), deployed with Terraform, and triggered on a schedule by Amazon EventBridge.
By the end, you’ll have a complete serverless solution to keep your AWS environment clean and cost-effective.
First, let’s ensure the essential tools are installed.
AWS CLI
The AWS CLI allows command-line access to AWS services. Install it according to your operating system:
macOS: brew install awscli
Windows: Use the official AWS CLI MSI installer from the AWS documentation.
Linux: Use the package manager (e.g., sudo apt install awscli for Ubuntu).
Verify installation:
aws --version
Terraform
Terraform is a popular Infrastructure as Code (IaC) tool for defining and managing AWS resources.
macOS: brew install terraform
Windows: Use the official installer from the Terraform downloads page.
Linux: Download the binary and move it to /usr/local/bin.
Verify installation:
terraform -version
Configure your AWS CLI with access keys to allow Terraform and Lambda to authenticate with AWS services.
Get Access Keys from your AWS account (AWS IAM Console).
Configure AWS CLI:
aws configure
Follow the prompts to enter your Access Key, Secret Access Key, default region (e.g., us-east-1), and output format (e.g., json).
Step-by-step instructions for creating the Lambda function are provided below.
This Lambda function uses Boto3, AWS’s Python SDK, to list all EBS snapshots, check their associated volume status, and delete snapshots where the volume is no longer available. Here’s the complete function code:
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    ec2_cli = boto3.client("ec2")
    response = ec2_cli.describe_snapshots(OwnerIds=["self"], DryRun=False)
    snapshot_id = []
    for each_snapshot in response["Snapshots"]:
        try:
            # If the source volume still exists, this call succeeds and
            # the snapshot is kept.
            ec2_cli.describe_volume_status(
                VolumeIds=[each_snapshot["VolumeId"]], DryRun=False
            )
        except ec2_cli.exceptions.ClientError as e:
            # The volume no longer exists, so the snapshot is orphaned.
            if e.response["Error"]["Code"] == "InvalidVolume.NotFound":
                snapshot_id.append(each_snapshot["SnapshotId"])
            else:
                raise
    for each_snap in snapshot_id:
        try:
            ec2_cli.delete_snapshot(SnapshotId=each_snap)
            logger.info(f"Deleted SnapshotId {each_snap}")
        except ec2_cli.exceptions.ClientError as e:
            return {
                "statusCode": 500,
                "body": f"Error deleting snapshot {each_snap}: {e}",
            }
    return {"statusCode": 200}
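The deletion decision itself is easy to exercise locally if you factor it into a pure helper. Below is a sketch of that idea; `find_orphans` is a hypothetical name introduced here for illustration, not part of the original function:

```python
def find_orphans(snapshots, existing_volume_ids):
    """Return IDs of snapshots whose source volume no longer exists."""
    existing = set(existing_volume_ids)
    return [
        snap["SnapshotId"]
        for snap in snapshots
        if snap.get("VolumeId") not in existing
    ]


# Example: one snapshot's volume still exists, the other's is gone.
snapshots = [
    {"SnapshotId": "snap-1", "VolumeId": "vol-live"},
    {"SnapshotId": "snap-2", "VolumeId": "vol-gone"},
]
print(find_orphans(snapshots, ["vol-live"]))  # → ['snap-2']
```

Structuring the logic this way lets you unit-test the orphan check without AWS credentials or mocked API calls.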
Using Terraform, we’ll create a Lambda function, IAM role, and policy to deploy this script to AWS. Additionally, we’ll set up an EventBridge rule to trigger Lambda on a regular schedule.
Terraform Setup and Provider Configuration
This section configures Terraform, including setting up remote state management in S3.
Note: Change the required_version value as per the terraform -version output.
IAM Role and Policy for Lambda
This IAM configuration sets up permissions for Lambda to access EC2 and CloudWatch, enabling snapshot deletion and logging.
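The original code block for this section is not preserved in this extract. A minimal sketch of what it could look like follows; the resource and policy names (e.g., lambda_exec_role) are assumptions, not taken from the article:

```hcl
resource "aws_iam_role" "lambda_exec_role" {
  name = "delete-orphan-snapshots-role"

  # Allow the Lambda service to assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "lambda_policy" {
  name = "delete-orphan-snapshots-policy"
  role = aws_iam_role.lambda_exec_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Only the EC2 actions the function actually calls.
        Effect = "Allow"
        Action = [
          "ec2:DescribeSnapshots",
          "ec2:DescribeVolumeStatus",
          "ec2:DeleteSnapshot",
        ]
        Resource = "*"
      },
      {
        # CloudWatch Logs for the function's log output.
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
        ]
        Resource = "*"
      },
    ]
  })
}
```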
Packaging and Deploying the Lambda Function
Here, we package the Python code and deploy it as a Lambda function.
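The original code block for this section is not preserved in this extract. A hedged sketch follows, assuming the handler file is named lambda_function.py and the IAM role is the lambda_exec_role sketched earlier (both names are assumptions). It uses the hashicorp/archive provider's archive_file data source to zip the code:

```hcl
data "archive_file" "lambda_zip" {
  type        = "zip"
  source_file = "${path.module}/lambda_function.py"
  output_path = "${path.module}/lambda_function.zip"
}

resource "aws_lambda_function" "delete_orphan_snapshots" {
  function_name = "delete-orphan-snapshots"
  role          = aws_iam_role.lambda_exec_role.arn
  handler       = "lambda_function.lambda_handler"
  runtime       = "python3.12"
  timeout       = 60

  filename         = data.archive_file.lambda_zip.output_path
  # Redeploy only when the packaged code actually changes.
  source_code_hash = data.archive_file.lambda_zip.output_base64sha256
}
```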
EventBridge Rule for Lambda Invocation
AWS EventBridge allows you to create scheduled or event-based triggers for Lambda functions. Here, we'll configure EventBridge to invoke our Lambda function on a schedule, such as every 24 hours. You can learn more about EventBridge scheduled rules in the AWS documentation.
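The original code block for this section is not preserved in this extract. A sketch of a daily schedule follows, assuming the Lambda resource is named delete_orphan_snapshots as in the packaging sketch (the names are assumptions):

```hcl
# Fire once every 24 hours.
resource "aws_cloudwatch_event_rule" "daily_cleanup" {
  name                = "delete-orphan-snapshots-daily"
  schedule_expression = "rate(24 hours)"
}

# Point the rule at the Lambda function.
resource "aws_cloudwatch_event_target" "invoke_lambda" {
  rule = aws_cloudwatch_event_rule.daily_cleanup.name
  arn  = aws_lambda_function.delete_orphan_snapshots.arn
}

# Allow EventBridge to invoke the function.
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.delete_orphan_snapshots.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.daily_cleanup.arn
}
```

Without the aws_lambda_permission resource, the rule fires but EventBridge is not authorized to invoke the function, a common and easy-to-miss gap.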
The root Terraform configuration pins the Terraform and provider versions and stores state remotely in S3, with DynamoDB for state locking:

terraform {
  required_version = ">=1.5.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.72.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-files-0110"
    key            = "delete-orphan-snapshots/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf_state_file_locking"
  }
}

provider "aws" {
  region = "us-east-1"
}

After defining the infrastructure, initialize and apply the configuration:

terraform init
terraform apply
To verify that the solution works, invoke the Lambda function manually from the console (or wait for the scheduled run), check its CloudWatch logs for "Deleted SnapshotId" entries, and confirm the orphaned snapshots are gone:

aws ec2 describe-snapshots --owner-ids self
Wrapping Up
By combining Python (Boto3), Terraform, and AWS EventBridge, we’ve created a fully automated, serverless solution to clean up orphaned EBS snapshots. This setup not only reduces cloud costs but also promotes a tidy, efficient AWS environment. With scheduled invocations, you can rest assured that orphaned resources are consistently removed.
Try this solution in your own AWS account and experience the benefits of automation in cloud resource management!