Different Ways of Listing the Size of Each S3 Bucket

Andy Rea
3 min read · Nov 7, 2023

When it comes to managing AWS S3 storage, keeping track of the size of your buckets is crucial for maintaining cost efficiency and operational performance. With some handy scripting, you can easily keep tabs on your data storage. Here, we discuss three different approaches to listing the size of each S3 bucket.

Approach 1: The Simple ls and Recursive Technique

The first approach uses the AWS CLI’s ls command with the --recursive and --summarize flags. It is straightforward and effective for buckets with smaller numbers of objects. However, because S3 returns at most 1,000 keys per LIST request, it can be very slow for buckets that contain a large number of files: every object has to be listed before the total can be computed.

#!/usr/bin/env bash

set -e

# List every bucket name, then walk each bucket with a recursive listing.
# --summarize appends "Total Objects" and "Total Size" lines to the output;
# we keep only the size and emit "<bucket>,<size>" as CSV.
aws s3api list-buckets | jq -r '.Buckets[] | .Name' | \
while read -r bucketName; do
  echo "$bucketName,$(aws s3 ls "s3://$bucketName" --recursive --summarize --human-readable | grep "Total Size" | cut -d: -f2)"
done
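
As an aside, if you only need one bucket’s total, the same listing can be collapsed into a single s3api call with a JMESPath query. This is a minimal sketch assuming a hypothetical bucket name; it still lists every key under the hood, so it is no faster than the loop above on large buckets, and it reports raw bytes rather than human-readable units.

#!/usr/bin/env bash

set -e

bucketName="my-example-bucket" # hypothetical name for illustration

# The CLI auto-paginates list-objects-v2 and applies --query to the
# combined result. The || `[0]` fallback avoids a sum() error on an
# empty bucket. Output is the total size in bytes.
aws s3api list-objects-v2 \
  --bucket "$bucketName" \
  --query 'sum(Contents[].Size || `[0]`)' \
  --output text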

Approach 2: Fetching Average Size from CloudWatch Metric Statistics

The second approach taps into CloudWatch metric statistics to fetch the average size of each bucket over a period. CloudWatch publishes the BucketSizeBytes metric roughly once a day per storage class, so this method gives a much quicker overview and does not depend on the number of objects in the bucket.

#!/usr/bin/env bash

set -e

# A one-year period (31536000s) over a one-month window collapses the
# result to a single datapoint. Note: the '1 month ago' syntax needs
# GNU date, and -u keeps the timestamps in UTC to match the Z suffix.
aws s3api list-buckets | jq -r '.Buckets[] | .Name' | \
while read -r bucketName; do
  size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 \
    --start-time "$(date -u -d '1 month ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 31536000 \
    --statistics Average \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucketName" Name=StorageType,Value=StandardStorage \
    --output json)

  # Buckets with no STANDARD-class objects return an empty Datapoints array.
  size_in_bytes=$(jq 'if .Datapoints == [] then 0 else .Datapoints[0].Average end' <<< "$size")
  echo "$bucketName,$(echo "scale=2; $size_in_bytes / (1024 * 1024 * 1024)" | bc) GB"
done
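
One caveat worth flagging: the StandardStorage dimension above only counts objects in the STANDARD storage class. A hedged sketch of one way to cover other classes, looping over a handful of CloudWatch’s documented StorageType values (the list here is illustrative, not exhaustive, and the bucket name is hypothetical):

#!/usr/bin/env bash

set -e

bucketName="my-example-bucket" # hypothetical name for illustration

# A few of the StorageType dimension values CloudWatch reports for S3;
# extend this list to match the storage classes your buckets actually use.
for storageType in StandardStorage StandardIAStorage OneZoneIAStorage GlacierStorage DeepArchiveStorage; do
  size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 \
    --start-time "$(date -u -d '1 month ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 31536000 \
    --statistics Average \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucketName" Name=StorageType,Value="$storageType" \
    --output json \
    | jq 'if .Datapoints == [] then 0 else .Datapoints[0].Average end')
  echo "$bucketName,$storageType,$size bytes"
done

Summing the per-class figures then gives the true total for buckets that mix storage classes.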

Approach 3: Enhancing with GNU Parallel

Building on the second method, this approach incorporates GNU parallel to speed up the processing. It’s particularly useful for accounts with a large number of buckets. By parallelizing the requests, you can drastically reduce the time taken to get the size of each bucket.

#!/usr/bin/env bash

set -e

# Same CloudWatch lookup as Approach 2, wrapped in a function so
# GNU parallel can invoke it once per bucket.
get_average_bucket_size() {
  local bucketName=$1

  size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 \
    --start-time "$(date -u -d '1 month ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 31536000 \
    --statistics Average \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucketName" Name=StorageType,Value=StandardStorage \
    --output json)

  size_in_bytes=$(jq 'if .Datapoints == [] then 0 else .Datapoints[0].Average end' <<< "$size")
  echo "$bucketName,$(echo "scale=2; $size_in_bytes / (1024 * 1024 * 1024)" | bc) GB"
}

# Export the function so the bash subshells spawned by parallel can see it.
export -f get_average_bucket_size

# jq -r already emits unquoted names, so no extra sed pass is needed;
# each input line becomes one argument to the function.
aws s3api list-buckets | jq -r '.Buckets[] | .Name' | \
parallel --will-cite --jobs 10 get_average_bucket_size
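
As a hedged usage example (the script and file names are hypothetical), the parallel version drops in wherever the sequential one ran, and since the output is CSV it sorts cleanly by size:

# Hypothetical names; adjust to wherever you saved the script.
time ./bucket-sizes-parallel.sh > sizes.csv

# Largest buckets first: -t, splits on the comma, -k2 -rn sorts the
# numeric GB value in descending order.
sort -t, -k2 -rn sizes.csv | head

The --jobs 10 setting keeps the number of concurrent CloudWatch calls modest; raise it with API throttling limits in mind.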

Example Output

The exact units vary by approach: Approach 1 echoes the CLI’s human-readable sizes, while Approaches 2 and 3 always report GB.

bucket-name-1, 2.3 GB
bucket-name-2, 754.8 MB
bucket-name-3, 68.5 GB
...

Use Cases

  • Cost Management: By regularly monitoring bucket sizes, organizations can better predict and manage AWS costs.
  • Data Lifecycle: Understanding bucket sizes can help in applying data lifecycle policies more effectively.
  • Performance Optimization: very large buckets are slow to enumerate; knowing their sizes is the first step toward optimizing access patterns.
  • Compliance: For regulatory compliance, it’s essential to have an overview of the data footprint.

Conclusion

Monitoring the size of your S3 buckets is key to maintaining a cost-effective and efficient cloud storage solution. Whether you prefer a simple command-line request, a CloudWatch metric, or a parallel processing script, you have options to get the insights you need. By using one of these methods, you can easily integrate bucket size monitoring into your regular AWS maintenance routine, ensuring you stay on top of your storage needs.
