I Built My Own Content Delivery Network

I wanted to consolidate content while keeping everything under my control as much as possible. And who doesn’t need an excuse to geek out a little bit?

Background

I use weblog.lol, part of the fabulous omg.lol family of services provided by Neatnik, to host several blogs (including this one). At present I am using three different services to host content online: Flickr for photographs, some.pics for other image files, and Amazon S3 for everything else (PDFs mostly, but also some audio and video).

There are a few problems with this arrangement.

Flickr doesn’t really like to be your CDN. Their terms of service forbid using “Flickr as a generic image hosting service for banner advertisements, graphics, etc.” Does that mean it’s okay to use it as a generic photo hosting service? I don’t know, I’m no lawyer.
Although Flickr does automatically resize photos to a variety of sizes, it’s a pain in the butt to grab all the distinct URLs that Flickr generates for each one.
Using some.pics as a generic image hosting platform violates the spirit of that service, which is to be a sort of lightweight photo sharing service à la Instagram (in the old days).
Managing content on some.pics is clunky, to say the least. Files have abstruse names and there’s virtually no structure to how content is stored. It wasn’t designed for this.
Neato, a new content management system from Neatnik, is coming soon and will succeed weblog.lol. There are no details yet, but I want to be prepared for a significant amount of disruption when it arrives. I’d like my content to be in some neutral place.
I don’t like having content spread out across three different platforms.

Why build your own?

Setting up my own CDN seemed like a good idea. So why build my own when there are services like Cloudflare and Bunny that do this for cheap? I’m a nerd who also happens to be certified as an AWS architect and this is just the kind of thing that’s fun for me.

What does it need to do?

The problem I was trying to solve was pretty simple:

One place to host everything: photographs, other image assets (PNGs, JPGs, SVGs, icons, etc.), audio, video, PDFs, and whatever else I might need
Use a custom domain (in this case cdn.mihobu.lol) with SSL/TLS
Dynamically resize image files upon request
High availability and durability

How?

I started with an AWS blog post, which pretty much spells out all the details. I first needed to obtain or create some requisite resources:

An SSL certificate issued by Porkbun
Two S3 Buckets, one for storing the original content objects and a second one to store the transformed (resized) image files
A new DNS subdomain (cdn.mihobu.lol)
An AWS Lambda Function for handling the dynamic image resizing. I didn’t make this from scratch. I pretty much lifted it as-is from Amazon’s Image Optimization GitHub Repo. The code is written in Node.js.

Then there was the matter of creating and configuring the Amazon CloudFront Distribution itself, which includes:

Origins tell the Distribution where the content is stored.
Origin Groups define a failover sequence. If the content we’re looking for doesn’t exist in one Origin, then try a second Origin instead.
Behaviors map request paths to Origins. For example, a request matching the pattern https://example.com/pdf/* might be mapped to an S3 Bucket called my-pdf-bucket.
Origin Access Identities are used by the S3 service to identify the CloudFront Distribution making a request and authorize access (or not).
Cache Policies tell CloudFront how to cache content at the edge.
Response Header Policies provide a way to configure which HTTP headers are included with the response, and also allow for custom headers.
S3 Bucket Policies control which resources (e.g. a CloudFront Distribution) can access content in the buckets.
S3 Bucket lifecycles provide a mechanism for automatically deleting content in an S3 Bucket after a predetermined period of time.
IAM Roles grant permission to resources in AWS. In this case, the image transformer Lambda Function needs permission to read from the original bucket and write to the transform bucket.
An inline URL rewrite function that transforms URLs of the form xyz.png?width=200&height=200 into a path: xyz.png/width=200/height=200. This is how the CDN knows where to look for a previously transformed copy of the image, if it exists.
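
To make the URL rewrite step concrete, here is a minimal Python sketch of the idea. It is an illustration only, not the function in my Distribution: the name rewrite_uri and the ALLOWED_PARAMS list are assumptions, and the fixed parameter order is a design choice I'm adding so that equivalent requests map to the same stored object.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical allow-list; the post only mentions width and height.
ALLOWED_PARAMS = ("width", "height")


def rewrite_uri(url: str) -> str:
    """Turn /xyz.png?width=200&height=200 into /xyz.png/width=200/height=200."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    # Emit parameters in a fixed order so that ?height=..&width=.. and
    # ?width=..&height=.. resolve to the same transformed object.
    segments = [f"{name}={query[name]}" for name in ALLOWED_PARAMS if name in query]
    if not segments:
        return parts.path  # no resize requested: leave the path untouched
    return parts.path + "/" + "/".join(segments)


print(rewrite_uri("https://cdn.mihobu.lol/img/demo.png?width=500"))
```

Ignoring unknown query parameters keeps stray or junk parameters from creating extra copies of the same image.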

How does it work?

It starts with an HTTP request for some content, e.g. https://cdn.mihobu.lol/img/demo.png?width=500.
The URL rewrite function converts the requested location into a path the CDN can use: /img/demo.png/width=500.
Since this is a request for an image, it matches a Behavior that maps paths matching /img/* to an Origin Group.
The Origin Group tells the CDN to look in the image transform bucket to see whether the requested object (/img/demo.png/width=500) already exists. If it does, the CDN serves it up.
If the transformed image does NOT exist, the Origin Group fails over to the second Origin in the list. This is the image transformer Lambda Function.
The Lambda Function receives the request and looks in the original content bucket for the original image (/img/demo.png). If found, the image is resized to match the requested dimension(s). The resized image is stored in the transform bucket using the converted path (/img/demo.png/width=500) and then returned to the requester.
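
The steps above can be simulated in a few lines of Python. This is a toy model under loud assumptions: the two buckets are plain dictionaries, resize is a placeholder that only tags the bytes, and every name here (origin_group, transformer, and so on) is hypothetical. The real work is done by CloudFront, S3, and the Node.js Lambda Function.

```python
from typing import Optional

# In-memory stand-ins for the two S3 buckets (hypothetical contents).
original_bucket = {"/img/demo.png": b"original image bytes"}
transform_bucket = {}


def resize(image: bytes, options: dict) -> bytes:
    """Placeholder for real resizing; it only tags the bytes for illustration."""
    tag = ",".join(f"{key}={value}" for key, value in sorted(options.items()))
    return image + b" [" + tag.encode() + b"]"


def transformer(path: str) -> Optional[bytes]:
    """Second origin: build the resized image, store it, and return it."""
    # Split "/img/demo.png/width=500" into the original key and the options.
    pieces = path.split("/")
    original_key = "/".join(p for p in pieces if "=" not in p)
    options = dict(p.split("=", 1) for p in pieces if "=" in p)
    image = original_bucket.get(original_key)
    if image is None:
        return None  # no original to transform
    resized = resize(image, options)
    transform_bucket[path] = resized  # later requests are served by the first origin
    return resized


def origin_group(path: str) -> Optional[bytes]:
    """Try the transform bucket first, then fail over to the transformer."""
    cached = transform_bucket.get(path)
    if cached is not None:
        return cached
    return transformer(path)
```

Calling origin_group("/img/demo.png/width=500") twice exercises both branches: the first call falls through to the transformer and stores the result, and the second is answered straight from the transform bucket.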

Requests for content other than images are straightforward:

It starts with an HTTP request for a file.
If the file exists in the original content bucket, it is served up.
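
Whether a request takes the image route or the simple route comes down to which Behavior matches its path. Here is a rough Python sketch of that routing decision, assuming patterns are checked in order with a catch-all default at the end. The BEHAVIORS table and the origin names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical behavior table: (path pattern, origin name), checked in order.
BEHAVIORS = [
    ("/img/*", "image-origin-group"),
    ("*", "original-content-bucket"),  # default behavior, always last
]


def pick_origin(path: str) -> str:
    """Return the origin of the first behavior whose pattern matches the path."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError(f"no behavior matches {path!r}")


print(pick_origin("/img/demo.png/width=500"))
print(pick_origin("/pdf/report.pdf"))
```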

What will this cost?

As long as my traffic volume remains very small like it is today, this should cost next to nothing. S3 storage is suuuuuuuuuper cheap, and the CloudFront costs don’t really begin to pile up until you get into many thousands of requests and many gigabytes of data transfers. I’ll keep an eye on it and report back in a few months.

What’s next?

I’ve begun the process of migrating content, but it’ll take me some time to update all my blog pages and posts. There’s also the X-factor that is Neato. I expect some change then, so I don’t intend to move too fast until I know more about the extent of the impact.

This is admittedly a pretty advanced (and let’s face it, totally unnecessary) thing for me to have done. I didn’t intend this as a tutorial so much as a description. Besides, I don’t recommend this to most people. But it was fun and if you’d like to know more I’d be glad to chat with you about it.