Creating a Text Histogram of S3 Bucket Object Sizes Using Bash

3 min read · Nov 6, 2023

In the world of cloud storage, understanding the distribution of file sizes can be crucial for optimizing performance and costs. For users of Amazon S3, there is no built-in tool to quickly visualize this distribution. However, with a simple Bash script, we can generate a histogram that provides insights into the size categories of the objects stored within an S3 bucket.

The Script

The script takes a single argument, which is the name of the S3 bucket you wish to analyze. By using a combination of AWS CLI commands and AWK, the script outputs a simple text-based histogram representing the distribution of object sizes within the specified bucket.

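#!/usr/bin/env bash

# Fail on errors, unset variables, and failures anywhere in the pipeline.
set -euo pipefail

# The bucket name is the only (required) argument.
bucket_name="${1:?Usage: $0 <bucket-name>}"

# Fetch every object size in bytes, put one size per line, then count per bin.
aws s3api list-objects-v2 --bucket "$bucket_name" --query 'Contents[].Size' --output text | tr '\t' '\n' | \
awk '/^[0-9]+$/ {   # skip non-numeric lines such as "None" from an empty bucket
  if ($1 < 1024) bin["0-1KB"]++;
  else if ($1 < 10240) bin["1KB-10KB"]++;
  else if ($1 < 102400) bin["10KB-100KB"]++;
  else if ($1 < 1048576) bin["100KB-1MB"]++;
  else if ($1 < 10485760) bin["1MB-10MB"]++;
  else if ($1 < 104857600) bin["10MB-100MB"]++;
  else bin["100MB+"]++;
}
END {
  # Iteration order over an AWK array is unspecified, so bins print unsorted.
  for (b in bin) {
    print b ": " bin[b]
  }
}'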

How It Works

1. Input: The script expects the bucket name as its first argument.

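2. Fetch Object Sizes: It uses the AWS CLI’s `list-objects-v2` command to fetch the size, in bytes, of every object in the bucket. The CLI follows pagination automatically, so buckets with more than 1,000 objects are covered, although listing a very large bucket can take a while.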

3. Text Processing: The output is piped through `tr` to replace tabs with newlines, creating a list where each size is on its own line.

4. AWK Script: The sizes are then piped into an AWK script, which groups them into predefined size bins. Each size is checked against a chain of if-else conditions, and the counter for the first matching bin is incremented.

5. Output: Once all sizes have been categorized, the AWK script prints the bin labels and counts in no particular order.
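The text-processing and AWK steps above can be tried without touching AWS by feeding them a tab-separated line that mimics the CLI's text output. This is only a minimal sketch: the function name `bin_sizes` and the sample sizes are made up for illustration, and the bins are trimmed to three to keep it short.

```shell
#!/usr/bin/env bash
# bin_sizes: hypothetical helper that reads tab- or newline-separated byte
# counts on stdin and prints one "label: count" line per non-empty bin.
bin_sizes() {
  tr '\t' '\n' | awk '
    /^[0-9]+$/ {                       # ignore anything that is not a size
      if ($1 < 1024) bin["0-1KB"]++
      else if ($1 < 10240) bin["1KB-10KB"]++
      else bin["10KB+"]++
    }
    END { for (b in bin) print b ": " bin[b] }'
}

# Made-up sizes standing in for the CLI output: 500 B, 2 KiB, 5 KiB, 50 KiB.
printf '500\t2048\t5120\t51200\n' | bin_sizes
```

With this input the 1KB-10KB bin receives two objects and the other two bins one each; the order of the printed lines may vary between AWK implementations.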

Example Output and Interpretation

The example output of the script is straightforward:


0-1KB: 2
10KB-100KB: 112
100KB-1MB: 109
1KB-10KB: 1
1MB-10MB: 12

This output tells us that the bucket contains:

- Two objects of size between 0 and 1KB.
- One object of size between 1KB and 10KB.
- 112 objects of size between 10KB and 100KB.
- 109 objects of size between 100KB and 1MB.
- Twelve objects of size between 1MB and 10MB.

No objects larger than 10MB were found in this particular case.
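Because the bins come out in arbitrary order and as bare counts, it can help to post-process them into a fixed order with a bar per bin. The following is a sketch rather than part of the original script: `print_histogram` is a hypothetical helper, and the scale of one `#` per ten objects (rounded up) is an arbitrary choice.

```shell
#!/usr/bin/env bash
# print_histogram: hypothetical helper; reads "label: count" lines on stdin
# and prints every bin in ascending size order with a bar of '#' characters.
print_histogram() {
  awk -F': ' '
    { count[$1] = $2 }
    END {
      # Fixed display order; bins missing from the input are shown as 0.
      n = split("0-1KB 1KB-10KB 10KB-100KB 100KB-1MB 1MB-10MB 10MB-100MB 100MB+", order, " ")
      for (i = 1; i <= n; i++) {
        c = count[order[i]] + 0
        bar = ""
        # One "#" per 10 objects, rounded up so small bins stay visible.
        for (j = 0; j < int((c + 9) / 10); j++) bar = bar "#"
        printf "%-11s %5d %s\n", order[i], c, bar
      }
    }'
}

# Feed it the example output shown above.
print_histogram <<'EOF'
0-1KB: 2
10KB-100KB: 112
100KB-1MB: 109
1KB-10KB: 1
1MB-10MB: 12
EOF
```

In practice the main script's output could be piped straight into such a function instead of using a here-document.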

Use Cases

This histogram can be particularly useful for:

- Cost Analysis: S3 bills storage and data transfer by volume and requests by count, so a bucket dominated by a few large objects has a different cost profile from one holding many small ones.
- Performance Optimization: Knowing the size distribution can help in optimizing performance, as larger files may take longer to process or transfer.
- Storage Management: Identifying the distribution can aid in storage lifecycle policies, such as moving rarely accessed large files to cheaper storage classes.
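For cost and lifecycle questions like these, the bytes held in each bin are often more telling than the object count alone. Here is a hedged sketch of that variation, assuming sizes arrive one per line as they do after the `tr` step; the name `bytes_per_bin`, the three coarse bins, and the sample sizes are all hypothetical.

```shell
#!/usr/bin/env bash
# bytes_per_bin: hypothetical helper; reads one size in bytes per line and
# prints, for each non-empty bin, the object count and the total it holds.
bytes_per_bin() {
  awk '
    /^[0-9]+$/ {
      if ($1 < 1048576) label = "under-1MB"
      else if ($1 < 104857600) label = "1MB-100MB"
      else label = "100MB+"
      count[label]++
      total[label] += $1
    }
    END {
      # Totals are reported in MB, taking 1 MB as 1048576 bytes.
      for (l in count)
        printf "%s: %d objects, %.1f MB\n", l, count[l], total[l] / 1048576
    }'
}

# Made-up sizes: two 512 KiB objects and one 200 MiB object.
printf '%s\n' 524288 524288 209715200 | bytes_per_bin
```

Here the single large object accounts for 200 MB while the two small ones add up to 1 MB, a skew that a count-only histogram would not show.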

Conclusion

With a simple Bash script, S3 users can gain valuable insights into their storage patterns, enabling better decision-making regarding cost, performance, and storage management. The histogram is a powerful visualization tool, even in its text-based form, for quick and actionable analytics.

Written by Andy Rea

Experimenting with Medium to share my AWS CLI queries in combination with other shell utilities and also help from ChatGPT for post and image content

No responses yet

Write a response

What are your thoughts?

Also publish to my profile

Recommended from Medium

The 5 paid subscriptions I actually use in 2025 as a Staff Software Engineer

Level Up Coding

Jacob Bennett

The 5 paid subscriptions I actually use in 2025 as a Staff Software Engineer

Tools I use that are cheaper than Netflix

Jan 7

10.6K

260

How I Am Using a Lifetime 100% Free Server

Harendra

How I Am Using a Lifetime 100% Free Server

Get a server with 24 GB RAM + 4 CPU + 200 GB Storage + Always Free

Oct 26, 2024

9.4K

170

Lists

General Coding Knowledge

20 stories1945 saves

Natural Language Processing

1977 stories1620 saves

Productivity

245 stories697 saves

Jeff Bezos Says the 1-Hour Rule Makes Him Smarter. New Neuroscience Says He’s Right

Jessica Stillman

Jeff Bezos Says the 1-Hour Rule Makes Him Smarter. New Neuroscience Says He’s Right

Jeff Bezos’s morning routine has long included the one-hour rule. New neuroscience says yours probably should too.

Oct 30, 2024

25K

732

CodeX

AI Rabbit

Goodbye Obsidian

Feb 6

1.1K

SpaceX Has Finally Figured Out Why Starship Exploded, And The Reason Is Utterly Embarrassing

Predict

Will Lockett

SpaceX Has Finally Figured Out Why Starship Exploded, And The Reason Is Utterly Embarrassing

This should never have happened.

Mar 1

8.8K

212

Google just confirmed the AI reality many programmers are desperately trying to deny

Coding Beauty

Tari Ibaba

Google just confirmed the AI reality many programmers are desperately trying to deny

AI is slowly taking over coding but many programmers are still sticking their head in the sand about what’s coming…

Feb 20

2.2K

190

See more recommendations

Help
Status
About
Careers
Press
Blog
Privacy
Terms
Text to speech
Teams