Investigating data transfer cost reduction for AWS S3

Trying to find ways to save on data transfer costs was a side quest to my main mission, but it ended up being worthwhile.

Recently, I saw a quote from a Jason Lengstorf article that resonated with me:

"A healthy team is intentionally underutilized. This creates free time for chasing shiny things with low risk."

Earlier this week, I had the opportunity to do some investigative work and was able to determine how to decrease our S3 data transfer costs in AWS by as much as 56%. This is a concrete example of why you shouldn't max out every resource, and this is the story of what happened.

Cost Explorer

My employer uses multiple products in the AWS stack, including numerous EC2 instances. These instances are mostly servers that never get turned off. As such, we are on a savings plan with AWS. The savings plan is a yearly agreement that locks us into a specific region, instance type, and tenancy. In exchange for committing to spend a certain amount per month, AWS gives us a hefty discount on the on-demand rate.

Our savings plan was expiring for the year, so I spent a lot of time reviewing instances. Instances that were no longer needed were decommissioned, and a new savings plan for the next year was purchased.

While the main purpose of the exercise was to purchase a new savings plan for EC2, I am a genuinely curious person. I spent a lot of time reviewing AWS Cost Explorer while trying to see how we should tweak our savings plan for the coming year. While I was in Cost Explorer, I noticed that data transfer was the third-highest monthly expense on our bill.

I had a suspicion about what was responsible for most of those costs. My assumption was correct: using the filters in Cost Explorer, I learned that S3 was largely responsible. Because we serve images directly from S3 buckets, we are paying more than we need to.

In a previous post, I mentioned why serving images directly from S3 isn't a good idea for web performance. While I made a plan to address it, the effort keeps getting de-prioritized.

With the savings I found, I am betting my previously de-prioritized project will become a priority because our fiscal year starts in June, and we are doing budget planning right now. It's unfortunate that this is the case, but it is the reality.

Cost optimization for data transfer

The images on our sites are both stored in and served by S3. An excerpt of the current S3 pricing looks like this:

Data transfer out (per month)    Cost per GB ($)
First 10TB                       0.09

As I looked at ways to reduce data transfer cost for S3, CloudFront kept coming up again and again. This isn't revolutionary new information, but I needed to model the pricing to make sure it was really a good idea.

An excerpt of the current CloudFront pricing looks like this:

Data transfer out (per month)    Cost per GB ($)
First 10TB                       0.085 - 0.120

CloudFront gives a range because the price depends on the region closest to the end user. Transfer costs are lowest in North America and Europe, and they are most expensive in Asia. If a high enough percentage of traffic came from Asia, data transfer would be more expensive byte-for-byte with CloudFront than it is with S3.
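To get a feel for how much expensive-region traffic it takes to erase the advantage, here is a rough sketch. It simplifies the pricing to just the two ends of the range quoted above (an assumption; real traffic is billed per region at rates in between), and the function names are my own.

```python
# Per-GB rates from the pricing excerpts above (first 10TB tier).
S3_RATE = 0.09
CLOUDFRONT_CHEAPEST = 0.085  # low end of the quoted range
CLOUDFRONT_PRICIEST = 0.120  # high end of the quoted range


def blended_cloudfront_rate(expensive_share: float) -> float:
    """Average per-GB rate when `expensive_share` of traffic (0.0 to 1.0)
    is billed at the priciest rate and the rest at the cheapest rate."""
    if not 0.0 <= expensive_share <= 1.0:
        raise ValueError("expensive_share must be between 0 and 1")
    return (CLOUDFRONT_CHEAPEST * (1 - expensive_share)
            + CLOUDFRONT_PRICIEST * expensive_share)


def break_even_share() -> float:
    """Share of priciest-rate traffic at which CloudFront costs the same as S3."""
    return (S3_RATE - CLOUDFRONT_CHEAPEST) / (CLOUDFRONT_PRICIEST - CLOUDFRONT_CHEAPEST)


if __name__ == "__main__":
    print(f"Break-even share of priciest-rate traffic: {break_even_share():.1%}")
    for share in (0.0, 0.10, 0.25):
        print(f"{share:.0%} -> ${blended_cloudfront_rate(share):.4f} per GB")
```

Under this two-rate simplification, CloudFront stops being cheaper per GB once roughly one seventh of the traffic is billed at the top rate.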

Limiting regions

When using CloudFront, it's possible to restrict your distribution to a specific price class. If you set the distribution to "Price Class 100", it is limited to the cheapest regions, so the per-GB rate stays at the low end of the range and the savings over S3 are guaranteed.
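In the CloudFront API, the price class is a field on the distribution's configuration, and "Price Class 100" is spelled `PriceClass_100`. The sketch below only shows the config change itself; the helper name `with_price_class` and the sample dictionary are hypothetical, and a real `DistributionConfig` has many more fields.

```python
import copy

# Price class identifiers accepted by the CloudFront API.
VALID_PRICE_CLASSES = {"PriceClass_100", "PriceClass_200", "PriceClass_All"}


def with_price_class(distribution_config: dict,
                     price_class: str = "PriceClass_100") -> dict:
    """Return a copy of a DistributionConfig dict with PriceClass set.

    The input is left untouched so the original config can still be compared
    against or rolled back to.
    """
    if price_class not in VALID_PRICE_CLASSES:
        raise ValueError(f"unknown price class: {price_class!r}")
    updated = copy.deepcopy(distribution_config)
    updated["PriceClass"] = price_class
    return updated


if __name__ == "__main__":
    # Hypothetical, heavily trimmed config used only for illustration.
    current = {"Comment": "image CDN", "Enabled": True, "PriceClass": "PriceClass_All"}
    print(with_price_class(current)["PriceClass"])
```

With boto3, the dictionary would typically come from the CloudFront client's `get_distribution_config` call and be sent back through `update_distribution`, passing the returned `ETag` as `IfMatch`. I have left those calls out so the snippet runs without AWS credentials.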

So for 10TB of data transfer with CloudFront vs. S3, you are left with this:

Service       Monthly cost for 10TB
S3            $921.60
CloudFront    $870.40
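The two figures are simply the per-GB rates multiplied by 10TB, counted as 10,240GB. A small sketch of that calculation, with names of my own choosing, and valid only while usage stays inside the first 10TB tier:

```python
GB_PER_TB = 1024  # the figures above count 10TB as 10,240GB

S3_RATE = 0.09           # $ per GB, first 10TB
CLOUDFRONT_RATE = 0.085  # $ per GB, cheapest regions, first 10TB


def monthly_transfer_cost(terabytes: float, rate_per_gb: float) -> float:
    """Flat-rate monthly cost in dollars, rounded to cents."""
    return round(terabytes * GB_PER_TB * rate_per_gb, 2)


if __name__ == "__main__":
    s3 = monthly_transfer_cost(10, S3_RATE)
    cloudfront = monthly_transfer_cost(10, CLOUDFRONT_RATE)
    print(f"S3:         ${s3:,.2f}")
    print(f"CloudFront: ${cloudfront:,.2f}")
    print(f"Difference: ${s3 - cloudfront:,.2f}")
```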

Savings plan

Just as EC2 offers savings plans, so does CloudFront. Known as the CloudFront Security Savings Bundle, it lowers your costs by up to 30%.

With the full 30% discount, the 10TB CloudFront price from before would become $609.28.

Other optimizations

Here is where it gets interesting. The images served out of the S3 buckets are largely JPEGs. If these images used a modern format, the data transfer size would be much smaller. According to a compression study by Google, WebP can reduce the size of a JPEG by 25-34%. If the JPEGs were converted to WebP, the 10TB (10,240GB) of data transfer would shrink to somewhere between 6,758GB and 7,680GB (roughly 6.6TB to 7.5TB), which means our transfer cost with the savings bundle applied would fall between $402.12 and $456.96.

The service to convert the images and store them would incur some additional fees, so it would be worth exploring how this might impact overall costs.
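The whole chain of estimates can be reproduced in a few lines. This is a rough model under the assumptions already stated in this post (cheapest CloudFront rate, the full 30% bundle discount, and Google's 25-34% WebP reduction range); it ignores conversion and storage fees, and the function names are my own.

```python
GB_PER_TB = 1024

S3_RATE = 0.09           # $ per GB, first 10TB
CLOUDFRONT_RATE = 0.085  # $ per GB, cheapest regions
BUNDLE_DISCOUNT = 0.30   # best case for the savings bundle
WEBP_REDUCTION_RANGE = (0.25, 0.34)  # size reduction vs. JPEG


def s3_cost(terabytes: float) -> float:
    """Baseline: serve everything straight from S3."""
    return terabytes * GB_PER_TB * S3_RATE


def optimized_cost(terabytes: float, webp_reduction: float) -> float:
    """CloudFront with the bundle discount, after shrinking images to WebP."""
    gigabytes = terabytes * GB_PER_TB * (1 - webp_reduction)
    return gigabytes * CLOUDFRONT_RATE * (1 - BUNDLE_DISCOUNT)


def savings_fraction(terabytes: float, webp_reduction: float) -> float:
    """Fraction of the S3 baseline saved by the optimized setup."""
    return 1 - optimized_cost(terabytes, webp_reduction) / s3_cost(terabytes)


if __name__ == "__main__":
    for reduction in WEBP_REDUCTION_RANGE:
        cost = optimized_cost(10, reduction)
        saved = savings_fraction(10, reduction)
        print(f"WebP -{reduction:.0%}: ${cost:,.2f} per month ({saved:.0%} below S3)")
```

The best case in this model is where the "as much as 56%" figure comes from.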

Overall plan

Breaking it down into smaller steps, the implementation plan looks like this:

  1. Create a new CloudFront distribution and limit it to Price Class 100.
  2. Replace S3 URLs with CloudFront URLs. This will still perform better than serving everything out of us-east-1 because CloudFront is closer to the users and uses HTTP/2.
  3. Once images are served with CloudFront, purchase a security savings bundle to reduce costs.
  4. Evaluate price class options for performance-vs-cost optimization.
  5. Create a service for on-the-fly WebP conversion with Lambda@Edge (optional).
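Step 2 is mostly a hostname swap. Here is a minimal sketch of that rewrite, assuming virtual-hosted-style S3 URLs and a distribution whose origin is the bucket root, so object keys map one-to-one to paths. The bucket and distribution hostnames are placeholders, and `to_cloudfront_url` is a name I made up.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cloudfront_url(url: str, s3_host: str, cloudfront_host: str) -> str:
    """Point an S3 object URL at the CloudFront distribution instead.

    URLs on any other host are returned unchanged, so the function is safe
    to run over mixed content.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != s3_host.lower():
        return url
    # Keep path, query string, and fragment; swap the host and force HTTPS.
    return urlunsplit(("https", cloudfront_host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Placeholder hostnames for illustration only.
    s3_host = "example-bucket.s3.amazonaws.com"
    cdn_host = "d111111abcdef8.cloudfront.net"
    print(to_cloudfront_url(f"https://{s3_host}/images/cat.jpg?v=2", s3_host, cdn_host))
    print(to_cloudfront_url("https://example.com/logo.png", s3_host, cdn_host))
```

Path-style URLs (where the bucket name is part of the path) would need the bucket segment stripped as well, which this sketch does not handle.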

Conclusion

As with anything that is pay-per-use, you should verify this for your exact situation rather than take what I found as a universal solution; that is how you avoid unexpected expenditures. In my case, it ended up being a worthwhile endeavor because I have a path to reduce the data transfer cost by as much as 56% while providing a better, more performant service for our users.

The cost savings example above is precisely why giving engineers chances to investigate and experiment is something that every business should do.