Understanding Content Delivery Network (CDN) and How It Works in Practice

Source: https://www.cloudflare.com/learning/cdn/what-is-a-cdn/

Note: This is a short post on the topic of Content Delivery Networks (CDNs). For a thorough and in-depth understanding of the concept, please consult proper primary sources.

If you are reading this, you have probably seen subdomains that start with "cdn" while a website was loading. Have you ever wondered what that stands for?

In this article, I am going to talk about the concept of a "Content Delivery Network" (a.k.a. "CDN") and how it is used in Web development practice.

A Practical Scenario

Suppose that you are building a website, and that it soon grows so popular that people from all over the world are visiting it. Some users visit the website from the other side of the world and find that their browsers are slow to load the resources it needs. That is simply because the web server is located too far away from the connecting clients: every round trip takes longer, so latency increases and effective throughput decreases, sometimes drastically.
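To see why distance hurts throughput and not just latency, here is a rough sketch. It assumes a single connection that can have at most one fixed-size window of data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The 64 KiB window and the two round-trip times are illustrative assumptions, not measurements.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one connection: at most one window per round trip."""
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


WINDOW = 64 * 1024  # assumed fixed window of 65,536 bytes

# Assumed round-trip times for a nearby and a far-away server.
for label, rtt in [("nearby server", 20), ("other side of the world", 300)]:
    bound = max_throughput_mbps(WINDOW, rtt)
    print(f"{label}: RTT {rtt} ms -> at most {bound:.1f} Mbit/s")
```

Under these assumptions the bound drops from about 26.2 Mbit/s at 20 ms to about 1.7 Mbit/s at 300 ms, even though the server and the network links are otherwise the same.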

Solution

One way to solve the aforementioned problem is to deploy servers worldwide and direct each user to the nearest one for a faster connection. However, the resources that account for most of the traffic are often the files embedded in the web pages (images, stylesheets, scripts and the like) rather than the actual responses of the web server. Since duplicating the whole server in multiple locations would be a waste of resources, people have chosen to store and serve only these embedded files, the so-called "stock resources", on geographically distributed servers, so that users get to load the large pieces from the closest relay.
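The idea of "directing users to the nearest relay" can be sketched as follows. This is a simplification: it picks a relay by great-circle distance, whereas real CDNs usually decide through DNS or network routing and measured latency. The node names and coordinates are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical relay locations: name -> (latitude, longitude) in degrees.
EDGE_NODES = {
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
    "singapore": (1.35, 103.82),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def nearest_node(user_location):
    """Return the name of the relay closest to the user."""
    return min(EDGE_NODES,
               key=lambda name: haversine_km(user_location, EDGE_NODES[name]))


print(nearest_node((48.86, 2.35)))    # a user in Paris
print(nearest_node((-6.21, 106.85)))  # a user in Jakarta
```

With these made-up nodes, the user in Paris is sent to "frankfurt" and the user in Jakarta to "singapore".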

And that is the most primitive form of a Content Delivery Network (CDN): resource relays built at places near the users for faster access.

An Evolution

With the aforementioned solution, webmasters need to manage multiple servers and the content served on them, which can be repetitive work. So people figured out a better way to deploy a "resource relay".

A CDN "node" can be set up so that, in normal operation, it only "proxies" the resources loaded from the "origin" server, and caches every response that passes through it along the way. Whenever a resource is requested, the proxy server, instead of simply forwarding the request, asks the origin server whether the resource is still the same as the copy it cached last time. If the answer is "Yes", the node returns the copy from its own cache. If the answer is "No", the node fetches the resource from the origin as in normal operation and caches the new version.
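The "ask the origin whether my copy is still current" step can be sketched in a few lines. This is a toy in-memory model, not how any particular CDN is implemented: the Origin and EdgeNode classes are hypothetical, and the version tag is simply a hash of the file content, in the spirit of an HTTP ETag with a 304 "Not Modified" answer.

```python
import hashlib


class Origin:
    """Stand-in for the origin server; holds the authoritative files."""

    def __init__(self, files):
        self.files = dict(files)
        self.full_responses = 0  # how many times a full body was sent

    def get(self, path, if_none_match=None):
        tag = hashlib.sha256(self.files[path]).hexdigest()
        if if_none_match == tag:
            return 304, tag, None  # "your cached copy is still current"
        self.full_responses += 1
        return 200, tag, self.files[path]


class EdgeNode:
    """Stand-in for a CDN node that caches what passes through it."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # path -> (tag, body)

    def get(self, path):
        cached = self.cache.get(path)
        known_tag = cached[0] if cached else None
        status, tag, body = self.origin.get(path, if_none_match=known_tag)
        if status == 304:
            return cached[1]  # serve from the node's own cache
        self.cache[path] = (tag, body)  # remember the fresh copy
        return body


origin = Origin({"/banner.png": b"version 1"})
edge = EdgeNode(origin)
edge.get("/banner.png")  # first request: full transfer from the origin
edge.get("/banner.png")  # second request: revalidated, served from cache
print(origin.full_responses)
```

After the two requests the origin has sent the full file only once. In practice, nodes are often also allowed to serve a cached copy for a configured lifetime without asking the origin at all, which this sketch leaves out.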

In such a setup, the admin doesn't have to manually manage what's stored on the individual server "nodes" scattered across the world. They only need to set up the necessary redirections to the nodes, and the nodes are automatically populated with the desired resources as users access the origin server through them. Stock images, like those used for the website's banner, are almost never changed, so they are now served from the cache of the CDN nodes instead of from the origin server.
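One common way to apply such a redirection is to reference static files under a dedicated hostname that points at the CDN, which is where those "cdn" subdomains come from. Below is a minimal sketch of that rewriting step. The hostname cdn.example.com and the list of extensions are assumptions for illustration, and many setups instead put the whole site behind the CDN through DNS.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname
STATIC_EXTENSIONS = (".png", ".jpg", ".css", ".js")  # assumed static types


def to_cdn_url(url: str) -> str:
    """Point static asset URLs at the CDN host; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_EXTENSIONS):
        return urlunsplit(parts._replace(netloc=CDN_HOST))
    return url


print(to_cdn_url("https://www.example.com/img/banner.png"))
print(to_cdn_url("https://www.example.com/api/login"))
```

The first URL comes back as https://cdn.example.com/img/banner.png, while the second one is returned unchanged because it is not a static file.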

Miscellaneous features are also provided, such as URL filter rules, analytics and redirection (HTTP 3xx). AWS S3 traditionally only provided storage, but it has since added a "Transfer Acceleration" feature that routes transfers through CloudFront's edge locations.

This more "generic" working principle of the CDN is now used by major cloud platforms, including Microsoft Azure and Amazon Web Services (AWS), as well as by dedicated providers such as Cloudflare.

AWS CloudFront

AWS is a global cloud platform giant that provides a large collection of services, and CloudFront is one of the most commonly used among them.

CloudFront is, in effect, a combination of CDN content delivery acceleration and a so-called "Application-level Firewall" (the latter is offered as AWS WAF, which can be attached to CloudFront).

When you set up a CloudFront distribution, you point the service at your website (the origin) and configure various other parameters. After that, CloudFront generates a domain name for you (usually *.cloudfront.net), and you can then access your own proxied website through CloudFront with that name.

If you happen to have your domain managed by AWS Route53, CloudFront can help you set up HTTPS with certificates issued by AWS itself, through AWS Certificate Manager (ACM).

Azure CDN is a similar service that does the same job on Microsoft Azure.

A typical all-AWS setup for a static website uses S3 (Simple Storage Service) to store the site content, Route53 for domain management, and CloudFront for HTTPS and CDN. If you want to write and deploy a Web App, you can use AWS Elastic Beanstalk together with Route53 and CloudFront. Elastic Beanstalk is basically a managed environment for Web Apps that runs on traditional EC2 instances.

Cloudflare

Cloudflare works similarly to CloudFront, but you typically have to let it manage your domain's DNS, whereas CloudFront only needs to be pointed at your origin server. It provides a wider variety of miscellaneous services than CloudFront does. It offers a free plan and paid plans, whereas CloudFront is billed according to usage.
