Hosting static websites with S3, CloudFront and Lambda

Ben New in Perth, Australia

2020-02-26

# Jumping to the solution

I have deployed a lot of static websites (including single-page applications (SPAs)) to AWS using no-VM services over the years, but apparently not frequently enough for me to learn to consistently avoid a set of common and persistent mistakes.

So I have created a serverless static website template, which I plan to use for future static site and SPA deployments. I'm pretty happy with it, and I hope you will find it useful too. It's MIT licensed, so modify it for your purposes, and if you find any issues or add useful features or options, all contributions are welcome.

This post describes the approach I've taken and the reasons for the decisions, as well as explaining some of the pitfalls associated with deploying static websites on these services, and how I have avoided them.

And yes, I could have just used Netlify, but where's the fun in that?!

Spongebob Squarepants: FUN!

# Background

Hosting a static website in the traditional way is quite simple: put the files into a directory and point nginx at it. This is a fully configurable web server, so you can apply HTTP redirects and rewrites, offload processing to CGI applications, and so on. But a virtual machine has an operating cost, and is a single point of failure for your website. Multiple VMs and load balancing can solve this problem, but the cost increases.

AWS S3 is very low cost (potentially free) for the volume required by most websites, in terms of both data storage and data transfer. It has very high availability and durability SLAs, and it supports hosting a static website. However, S3 website hosting has limited configurability and, importantly, does not support custom domains with secure transport. As the internet moves towards HTTPS everywhere, this is an essential requirement, and its absence renders S3 websites fairly unusable in the real world. (S3 does support HTTPS, but only on the "native" AWS domain name of its REST endpoint, not on custom domains.)

CloudFront is a content distribution network (CDN), which copies your content from S3 to hundreds of locations internationally, making it readily and quickly available everywhere. It also allows you to use custom domain names with SSL certificates from ACM or IAM. With Lambda@Edge, you can create almost any web server functionality you desire. As it happens, you will almost certainly need to add some smarts to the CDN when you are using S3 as an origin.

There are some other, similar templates for configuring static websites:

- From AWS Labs
- This one is quite fully featured
- This one supports index documents and monitoring
- This one includes CI/CD integration via git

# Architecture

The following diagram is copied from the repository's architecture documentation.

The static website is implemented as two separate logical components: the storage of the website content, and its distribution and delivery to consumers. This separation makes some sense logically, even if it is somewhat overkill for a static website. But there is a practical reason why it is required in a lot of cases: CloudFront is a global service whose supporting resources are anchored in the us-east-1 region. Any certificate from AWS Certificate Manager (ACM) used by the distribution must be in us-east-1, and any Lambda@Edge functions must be created in us-east-1. On the other hand, there are reasons (sovereignty requirements, policy requirements, pedantry, etc.) why you might want the storage of the content to be in a different region.

If the S3 bucket where the static website's content is stored is defined by one CloudFormation template, and the related CloudFront distribution, certificates and Lambda@Edge functions are defined by a different template, then the former can be deployed to your choice of region, and the latter to us-east-1, and that is the approach I have taken.

# Storage stack

The storage stack is actually quite simple, which you can see from the CloudFormation template. It does have a few noteworthy features:

- The bucket name is generated from the Service and Stage parameters
- The bucket is encrypted using AES256
- The bucket is protected against public ACLs and permissions
- The bucket's regional domain name is exported so that the distribution is accessible immediately after creation; otherwise there is an additional propagation time to consider

Any of these or any other properties can of course be customised by editing the template.

# Distribution stack

The distribution stack is more complex than the storage stack. It creates many resources and it uses Conditions and condition functions extensively, as well as a lot of calls to Ref, GetAtt and Sub, all of which you can see in the template.

- It uses the AWS::Serverless transform, so that we can make use of the AWS::Serverless::Function resource type in preference to AWS::Lambda::Function; this solves the problem of managing versions of the functions
- Those serverless functions run in Lambda@Edge to provide URL rewriting and domain redirection functionality
- The functions are written inline in the CloudFormation template, which avoids the need to call aws cloudformation package
- The Route53 DNS records are only created if a Zone ID is provided
- If no subdomain is being created, then the subdomain's DNS record will not be created, and it will not appear on the SSL certificate or as an alias for the CloudFront distribution
- The index document function is only attached if an index document is specified (including the default of index.html)
- The domain redirection function is only attached if it is required
- The execution role for the Lambda functions is only created if at least one of the functions is being attached
- Both IPv6 and the redirect-to-https policy are always enabled
- The logging bucket is encrypted with AES256 and blocks the creation of public policies
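
To make the index document behaviour concrete, here is a minimal sketch of what such a Lambda@Edge function could look like. It is not copied from the template: the names rewriteUri and INDEX_DOCUMENT are hypothetical, and it assumes the function is attached as an origin-request trigger and only handles URIs ending in a slash.

```javascript
// Hypothetical constant; the post mentions index.html as the template's default.
const INDEX_DOCUMENT = 'index.html';

// Append the index document to any URI that ends in a slash,
// and leave every other URI untouched.
function rewriteUri(uri, indexDocument) {
  return uri.endsWith('/') ? uri + indexDocument : uri;
}

// Lambda@Edge handler (assumed to run on the origin-request event).
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  request.uri = rewriteUri(request.uri, INDEX_DOCUMENT);
  // Returning the request tells CloudFront to carry on to the S3 origin.
  return request;
};

exports.handler = handler;
```

With this in place, a request for /blog/ would be fetched from S3 as /blog/index.html, while /app.js passes through unchanged.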

Aside: If you aren't aware of this excellent CloudFormation cheat sheet from @theburningmonk, you should check it out, it will change your life!

# Configuration

The template is configurable; the options are described in detail in the documentation.

My goal here was to make the default behaviour sensible, while allowing people to configure the parts they might want to work differently. For example, by default, it creates a www subdomain and redirects all traffic from the domain you actually supply to that subdomain, e.g. example.org to www.example.org. I prefer this because cookies flow down from parent domains to subdomains, so having a website on www keeps it from interfering with other subdomains' cookies. But some would prefer the redirection the other way around, and there are definitely cases where you only want the one domain and no www (or other) subdomain. These are all supported through the DomainRedirectMode parameter.
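
As an illustration of the default redirection, the following is a rough sketch of a Lambda@Edge function that sends example.org traffic to www.example.org. It is an assumption about how such a function might be written, not the template's actual code: REDIRECTS and redirectFor are hypothetical names, and the function is assumed to run as a viewer-request trigger so that it sees the Host header sent by the browser.

```javascript
// Hypothetical mapping from a requested host to its canonical host.
const REDIRECTS = new Map([['example.org', 'www.example.org']]);

// Build a 301 response if the request's host should be redirected,
// or return null if the request should be served as-is.
function redirectFor(request, redirects) {
  const host = request.headers.host[0].value.toLowerCase();
  const target = redirects.get(host);
  if (!target) return null;
  const query = request.querystring ? '?' + request.querystring : '';
  return {
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: {
      location: [
        { key: 'Location', value: 'https://' + target + request.uri + query },
      ],
    },
  };
}

// Lambda@Edge handler (assumed to run on the viewer-request event).
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  return redirectFor(request, REDIRECTS) || request;
};

exports.handler = handler;
```

Reversing the mapping would give the opposite behaviour (www to the bare domain), and an empty map would disable redirection altogether.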

# Challenges

If you can't use the template for any reason, but are deploying your own S3 + CloudFront solution, you might encounter some of these common issues and be interested in how I worked around them.

# The distributed nature of CloudFront

Because CloudFront is highly distributed, it takes a long time to complete operations. Creating, updating or deleting a CloudFront distribution can take upwards of half an hour. This is a well-known and well-parodied attribute of CloudFront:

“Waiting for a CloudFront distribution update to finish.”
— Corey Quinn (@QuinnyPig) January 31, 2020

For me, this was the biggest, over-arching challenge while developing this template repository: everything took a long, long time, and there is no workaround.

# The distributed nature of Lambda@Edge

Furthermore, Lambda@Edge functions cannot be deleted until after the related distribution is deleted. If you delete the distribution or disable either domain redirection or index documents, you will get an error message like the following:

An error occurred when deleting your function: Lambda was unable to delete arn:aws:lambda:us-east-1:###:function:... because it is a replicated function. Please see our documentation for Deleting Lambda@Edge Functions and Replicas

The documentation tells you to wait a while before deleting the function. In practice, it can take several hours before the function can be successfully deleted.

# Sharing outputs between stacks in different regions

CloudFormation templates define Outputs, and these can be exported for use directly within other templates. However, there is a major problem with this: in order to use Exports, the two stacks must be deployed in the same region, because exports are stored at a regional level. This makes them unusable when our stacks are in different regions.

To work around this, we can use the AWS CLI at deployment time to retrieve those values from one stack and supply them as parameter overrides to the other. The following command retrieves the BucketName output from the example-storage stack in the ap-southeast-2 region:

aws cloudformation describe-stacks --region ap-southeast-2 --output text --stack-name example-storage --query "Stacks[0].Outputs[?OutputKey=='BucketName'].OutputValue"

If this command is run without the --query parameter, it will display a data structure describing the stack. The --query parameter value uses JMESPath to select the specific output value from that data structure. The --output text option displays the value as a bare string; the default is JSON.
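
If your deployment is driven from a Node.js script rather than a shell, the same lookup can be sketched as follows. This is only an illustration: getStackOutput and fetchStackOutput are hypothetical helpers, and fetchStackOutput assumes that the AWS CLI is installed and has credentials configured.

```javascript
const { execFileSync } = require('child_process');

// Pick one output value out of a parsed describe-stacks response.
// This mirrors what the JMESPath --query expression does.
function getStackOutput(response, outputKey) {
  const stack = response.Stacks && response.Stacks[0];
  const outputs = (stack && stack.Outputs) || [];
  const output = outputs.find((o) => o.OutputKey === outputKey);
  if (!output) {
    throw new Error('Output ' + outputKey + ' not found');
  }
  return output.OutputValue;
}

// Run the AWS CLI and return a single output of the given stack.
function fetchStackOutput(region, stackName, outputKey) {
  const json = execFileSync(
    'aws',
    ['cloudformation', 'describe-stacks',
      '--region', region,
      '--stack-name', stackName,
      '--output', 'json'],
    { encoding: 'utf8' }
  );
  return getStackOutput(JSON.parse(json), outputKey);
}
```

For example, fetchStackOutput('ap-southeast-2', 'example-storage', 'BucketName') would return the same value as the command above.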

There is a lot of good information about --query in the AWS documentation, for example in the CLI user guide and in the CLI reference, but it is difficult to find because it is a globally available parameter, so it is not described, nor even mentioned, in the documentation for any particular command. There are some more examples here.

Another approach to this problem would be to use CloudFormation custom resources to export and import the values across regions, as outlined in this article.

# Security

While a static website is not generally something with a high security risk, it is nonetheless sensible to add the basic protections.

Both the content bucket and the access logs bucket are encrypted at rest using S3-managed AES256 encryption. This gives you less control over the keys than using a key managed in AWS KMS, but I think standard AES256 encryption is sufficient for general purpose storage. If your website content is more sensitive, then you can modify the CloudFormation template to use a different encryption configuration.

Additionally, both buckets block public access, including ACLs and policies that would allow public access. All access to the content is via CloudFront, which reads from the bucket using an Origin Access Identity (OAI).

Something that is easy to overlook is that, when setting up the trust relationship between S3 and CloudFront, you need to create the Origin Access Identity and add a bucket policy allowing access to that OAI. The web console has a button to add this permission for you, but in CloudFormation, you have to write the policy yourself (WebsiteBucketPolicy in the storage template), or you will get Access Denied errors.
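
For reference, the policy in question has roughly the following shape. The sketch below builds it as a plain JavaScript object that corresponds field for field to a bucket policy document; it is not the template's WebsiteBucketPolicy resource itself, and buildBucketPolicy, the Sid and the example identifiers are hypothetical.

```javascript
// Build a bucket policy document that lets one CloudFront Origin Access
// Identity read objects from the bucket, and nothing else.
function buildBucketPolicy(bucketName, oaiId) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'AllowCloudFrontRead', // hypothetical statement id
        Effect: 'Allow',
        Principal: {
          AWS: 'arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ' + oaiId,
        },
        Action: 's3:GetObject',
        Resource: 'arn:aws:s3:::' + bucketName + '/*',
      },
    ],
  };
}

// Example with made-up identifiers:
console.log(JSON.stringify(buildBucketPolicy('example-bucket', 'E2EXAMPLE'), null, 2));
```

Granting only s3:GetObject on the objects keeps the bucket itself unlistable, so a request for a missing object will typically surface as Access Denied rather than Not Found.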

# Forget-me-nots

These are a few other things that I can never seem to remember (and which are taken care of by the code in the repository):

- When you're using CI/CD (which you should be), you will need the --no-fail-on-empty-changeset flag in your aws cloudformation deploy command(s); otherwise the command exits with an error whenever there is nothing to change, and your build will break
- You need to add --capabilities CAPABILITY_IAM to the aws cloudformation deploy command for the stack that creates the Lambda@Edge functions, because it also creates their IAM execution role
- When using AWS::Serverless::Function, you need an AutoPublishAlias, otherwise no version is available to reference, as in !Ref TheFunctionResource.Version
- Lambda@Edge (currently) supports nodejs10.x but not nodejs12.x, but this is not reported as an error when you deploy through CloudFormation
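
The first two points can be captured in a small helper so that they cannot be forgotten. The following sketch assembles the argument list for aws cloudformation deploy; buildDeployArgs and the example stack, file and parameter names are hypothetical, and the repository's own deploy script may well be structured differently.

```javascript
// Build the argument list for `aws cloudformation deploy`, always including
// the two flags that are easy to forget.
function buildDeployArgs({ region, stackName, templateFile, parameters = {} }) {
  const args = [
    'cloudformation', 'deploy',
    '--region', region,
    '--stack-name', stackName,
    '--template-file', templateFile,
    '--capabilities', 'CAPABILITY_IAM', // the stack creates an IAM role
    '--no-fail-on-empty-changeset',     // "no changes" should not break CI
  ];
  const overrides = Object.entries(parameters).map(([key, value]) => key + '=' + value);
  if (overrides.length > 0) {
    args.push('--parameter-overrides', ...overrides);
  }
  return args;
}

// Example with made-up names:
const args = buildDeployArgs({
  region: 'us-east-1',
  stackName: 'example-distribution',
  templateFile: 'distribution.yml',
  parameters: { BucketName: 'example-bucket' },
});
console.log('aws ' + args.join(' '));
```

The resulting array can be passed directly to child_process.execFileSync('aws', args), which avoids shell quoting problems with parameter values.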


# Improvement ideas

There are some things that I would like to improve about the repository:

- Using the redirect-to-https policy in combination with the domain redirection Lambda function sometimes results in two consecutive HTTP redirections, which is suboptimal
- There are no tests! It should be possible to prove that the deployment works by running a test that deploys (and subsequently updates) one or more stacks using npm run deploy ..., then loading the resulting website and confirming correctness, and finally cleaning up by removing the stack
- It might be possible to automate the creation of the DNS records to validate the SSL certificate, which would remove the manual step; this would involve starting a second process that polls ACM for the required validation records, and then automatically adds the DNS records (probably as a third CloudFormation stack from a dynamically generated template)
- Support for international character sets in domain names; in the meantime, if your domain name is failing due to the AllowedPattern defined for domain name parameters, just remove the AllowedPattern
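
On the last point, one possible direction (an idea, not something the repository does) is to convert an internationalised domain name to its ASCII Punycode form before passing it to the template, so that an ASCII-only pattern can stay in place. The sketch below uses Node's built-in url.domainToASCII; normaliseDomain and the ASCII_DOMAIN pattern are hypothetical, the pattern is only a stand-in for the template's real AllowedPattern, and it assumes the AWS services involved accept the Punycode form.

```javascript
const { domainToASCII } = require('url');

// Hypothetical ASCII-only pattern, standing in for the template's AllowedPattern.
const ASCII_DOMAIN = /^([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;

// Convert a (possibly internationalised) domain name to Punycode and check
// that the result passes the ASCII-only pattern.
function normaliseDomain(name) {
  const ascii = domainToASCII(name); // returns '' for invalid domains
  if (!ascii || !ASCII_DOMAIN.test(ascii)) {
    throw new Error('Invalid domain name: ' + name);
  }
  return ascii;
}

console.log(normaliseDomain('español.com'));
```

The converted value is what you would supply as the domain name parameter; browsers perform the same conversion before making DNS lookups.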

# Spread the word!

If you want to host a static website, sign up for an AWS account, and start by following the instructions. Customise it and integrate it with your favourite CI/CD tools, and you will have a fully automated static website pipeline. Tell your friends!