Friday deploys: comfort, not pressure

# "No deployments please, it's Friday!"

There has been a lot of talk on Twitter recently about Friday deploys; in particular, whether they should be avoided or embraced. A lot of people assert that deploying software on a Friday should be avoided at all costs, while others are emphatic that deploying on any day should not be a problem, given the right environment.

Overall, there is a lot of debate on the topic; there is even an account, Curator of Friday Deploys, dedicated entirely to it.

# Why not on Fridays?

This widely-held aversion to deploying on a Friday comes from an understandable place: new code means new bugs, and nobody wants to spend the weekend fixing them while they degrade users' experience of the product and, ultimately, the company's bottom line.

Office Space movie quote: "I'm gonna need you to come in tomorrow, and if you can come in on Sunday too, that'd be great"

Indeed, avoiding this seems pretty reasonable and it is quite natural, but imagine if we solved all of life's problems by completely avoiding the situations that enable them:

  • We could prevent car crashes by walking everywhere...
  • We could avoid getting sick by living in a bubble...
  • We could never create any software bugs by never writing any code...

Safe, but not very practical! Risk management is about understanding your risks and controlling them, not always avoiding them.

So, let's look at some ways that your team can become comfortable with deploying, releasing and operating software, which will improve not only your time to market, but your software's quality, reliability and user experience. The fear of Friday deploys can become a thing of the past.

# Make yourselves comfortable

This is a challenging transition for many teams, because deployments are seen as "big and scary". In many cases, they actually are! As a consultant, freelancer and developer, I have worked with and for organisations at various levels of maturity concerning their software delivery. The teams that have embraced more of the concepts and solutions described in this post are the ones I have found to have the most robust software delivery.

Many of the topics discussed below are interrelated, and changing people's mindsets about this can be very challenging because these linked ideas can form a rigid grid of beliefs. A lot is based on people's past experiences, which are certainly valid and should never be dismissed.

It is important to keep in mind that the goal is to enable teams to feel comfortable deploying at any and all times. The emphasis of some of the conversations on social media has been around people feeling forced to deploy on Fridays, which should never be the case and could certainly be a sign of a toxic workplace. Besides, it's not actually about Fridays.

In that light, treat all of this with an air of experimentation. Two rigid mindsets will only ever conflict where they don't align. Remember that every situation is different, every team is different, and every piece of software is different. The goal is for the team to find for itself what works best: delivering functionality painlessly, with the focus on the functionality rather than on the act of delivering it.

# Automate deployments

The first step on this journey is to fully automate your software delivery pipelines. This is known as continuous integration and continuous delivery (CI/CD) and is a well-understood practice that should already be in place across all software for which your team or organisation is responsible (some classes of software have more demanding delivery constraints, such as embedded software, to which this might not apply).

CI/CD can take various forms, but it always has one thing in common: the process of building, packaging and deploying software is handled by software and never by humans. The benefits of this typically include:

  • faster deployments
  • less effort over the lifetime of most software
  • effort spent early in the software's lifetime
  • automated quality checks built in
  • errors almost completely eliminated once the pipeline is configured

This is a far-from-complete list, and there are many other articles on the subject if you are not sold on the benefits of CI/CD.
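
To make the idea concrete, here is a minimal sketch of a pipeline driver in Python. In practice this logic usually lives in your CI/CD service's own configuration (GitHub Actions, GitLab CI, Jenkins and so on) rather than a hand-rolled script, and the `make` targets here are hypothetical placeholders; the point is simply that every step is scripted and repeatable, with no human building or deploying by hand.

```python
#!/usr/bin/env python3
"""Sketch of a pipeline driver: every stage is scripted, none is manual.

The `make` targets are hypothetical placeholders for whatever your build,
test, packaging and deployment commands actually are.
"""
import subprocess
import sys

# Each stage is an ordinary command; if any stage fails, the pipeline
# stops and nothing is deployed.
STAGES = [
    ("lint", ["make", "lint"]),
    ("test", ["make", "test"]),
    ("package", ["make", "package"]),
    ("deploy", ["make", "deploy"]),  # e.g. push an artifact to an environment
]

def run_pipeline() -> int:
    for name, command in STAGES:
        print(f"--- stage: {name} ---")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Stage '{name}' failed; aborting pipeline.")
            return result.returncode
    print("Pipeline complete: build deployed.")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```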

Perth-Kalgoorlie water pipeline, near Merredin (CSIRO)

# Automate infrastructure

As well as automating software deployments, you might be able to take advantage of Infrastructure as Code (IaC) to define the compute and storage resources your software requires to run, as part of your code repository. In particular, the major cloud vendors provide IaC services, and there are third-party tools like Terraform, which works across cloud platforms and container infrastructure. Using IaC means that not only the software itself but also the resources it runs on are defined and versioned as code, and by integrating it into your CI/CD pipelines, provisioning becomes fully automated as well.
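
As an illustration of what a small piece of IaC can look like, here is a sketch using the AWS CDK (v2) for Python. The stack and bucket names are made up, and Terraform, Pulumi or your cloud vendor's native tooling would achieve the same outcome; the point is that the infrastructure definition lives in the repository and goes through the same pipeline as the code.

```python
# Sketch only: a hypothetical stack defining one versioned S3 bucket with the
# AWS CDK (v2) for Python. The infrastructure is declared alongside the
# application code, so changes to it go through the same review and pipeline.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct

class StorageStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # A versioned bucket for (hypothetical) user uploads.
        s3.Bucket(self, "UploadsBucket", versioned=True)

app = cdk.App()
StorageStack(app, "my-app-storage")
app.synth()
```

Running `cdk deploy` (typically as a pipeline step) provisions or updates these resources, so an environment can be rebuilt from the repository alone.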

# Automate testing

Manual testing has natural limitations in terms of scalability and accuracy, so test automation is well known to be desirable. The key is to automate the testing of the critical functionality of your software, where key business decisions are made, and where value is derived.

Every type of software has its own challenges when it comes to testing. Use integration tests when external systems are critical. Use device testing services where compatibility with many devices is necessary. Use whatever is appropriate.

Your tests should be part of the CI/CD pipeline, along with other code quality tools such as linting, and it should not be possible to merge the code unless the build passes. Many code hosting platforms support protecting branches in this way.
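
As a sketch of what this can look like, here is a tiny pytest example protecting a critical (and entirely hypothetical) business rule. With branch protection in place, a failing test like this blocks the merge, so the rule cannot be broken silently.

```python
# Sketch: automated tests around a critical (hypothetical) pricing rule,
# written with pytest and run on every build by the CI/CD pipeline.
import pytest

def order_total(items: list, discount_code: str = "") -> float:
    """Hypothetical business rule: 10% off orders over 100 with a valid code."""
    total = sum(items)
    if discount_code == "SAVE10" and total > 100:
        total *= 0.9
    return round(total, 2)

def test_discount_applied_over_threshold():
    assert order_total([60.0, 50.0], "SAVE10") == 99.0

def test_discount_not_applied_under_threshold():
    assert order_total([40.0, 50.0], "SAVE10") == 90.0

def test_unknown_code_is_ignored():
    assert order_total([60.0, 50.0], "BOGUS") == 110.0

@pytest.mark.parametrize("items,expected", [([], 0.0), ([19.99], 19.99)])
def test_edge_cases(items, expected):
    assert order_total(items) == expected
```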

# Automate to production

Most implementations of CI/CD allow for manual decision points, and the majority of teams and organisations that I have come across use this as a gate before production, even if development and test deployments are completely automatic. They will inevitably say:

We want to perform (manual) testing before code reaches production, to make sure that everything works.

"None may pass without MY permission!"

This practice is entwined with the notion that deployments are "big and scary". If they are perceived this way, organisations will want to make sure they work by throwing QA effort at them, but that extra effort reinforces the very notion of them being "big and scary"! This is a self-reinforcing cycle that can be difficult, yet important, to break.

Furthermore, there are whole sets of problems that will never be preemptively tested for by developers, or at least, doing so across the board is not cost-effective, and so these problems will only ever be discovered in production with real, live, production workloads. I have seen teams spend a lot of effort simulating production usage when they have real production usage happening every minute of the day!

So, in a CI/CD setup that is optimised for team performance, code pushed to master (the "main" or "trunk" version of your code) should end up in production, completely automatically and without intervention, and with as little delay as possible (watch out for a billion layers of caching here!).

To those who want to test every build before it is pushed to production, I say:

Deploying continually, with the right environmental support and team buy-in, will unequivocally improve your software and its delivery in the long term.

It might seem counter-intuitive that removing manual "quality gates" will result in higher quality software, and this is often the point that meets the most resistance.

However, by directly linking the trunk to production, your development team will take far more ownership of the code they ship and the user experience it presents. The trunk or master is no longer just their "best effort" at getting it right, with the comfort that "bugs can be picked up in test"; that code is live, so it had better not crash people's devices or browsers! Testing (both manual and automated) is no longer a separate function, but a continual process.

Under this arrangement, deployments (to production) are more frequent, so the change between each one is smaller. This means that the risk for any given deployment is minimised. Essentially, you're taking that risk of the Friday afternoon deployment, and spreading it thinly across many deployments throughout the week.

Again, remember that this is a journey: your team is not going to move overnight from one deployment a week to multiple automatic production deployments a day. Encourage the team to deploy more frequently at first, understanding that it's totally fine to push only small changes to production. Once the team is pushing most or all builds into production anyway, the manual gate is redundant and can be removed.

But, in order for this to work, you will also need some additional technical tools, which can be introduced over time.

# Separate your users from your code

The keen reader will notice that so far I have only written about software deployment and have not yet mentioned software release. These are not the same thing; they are two separate parts of software delivery:

  • Deployment is the process of moving code into live environments (e.g. production)
  • Release is making the features that the code implements available to users

Think of it like the launch of the next generation of smartphone: the shops will have boxes of the new phone model freighted (deployed) to back rooms, but as a customer you don't even know about that until it is released.

A mistake that a lot of teams make is coupling deployment and release. Can you imagine if there was no back room, and the new phone had to go straight from the truck onto the shelves? Chaos! Don't do this with your software!

So another technique you should familiarise yourself with is feature flags. These enable you to safely deploy code at any time to any environment without introducing any behavioural change to the software. You can then switch a feature flag on and off to change the software's behaviour, completely independently of any code deployment. In fact, you can release functionality to only a small number of users as a pilot phase first, and if it causes problems or errors, simply switch it off, all without deploying any code. Then, once you have a fix, deploy it and switch the flag back on again.
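
Here is a minimal sketch of a flag-guarded code path. Everything in it is hypothetical (the flag name, the client and the checkout functions), and in practice you would more likely use a feature-flag service or library than roll your own; the important part is that the new behaviour ships dark and only runs once the flag is switched on.

```python
# Sketch: a flag-guarded code path. FlagClient is a made-up stand-in for a
# real feature-flag service or library; the flag name and checkout functions
# are hypothetical too.

class FlagClient:
    """Hypothetical client; real ones fetch flag state from a remote store."""
    def __init__(self, flags: dict):
        self._flags = flags

    def is_enabled(self, flag_name: str, default: bool = False) -> bool:
        return self._flags.get(flag_name, default)

flags = FlagClient({"new-checkout-flow": False})  # code is deployed, flag is off

def legacy_checkout(cart: list) -> float:
    return sum(cart)

def new_checkout(cart: list) -> float:
    # The new flow might, say, apply different rounding rules.
    return round(sum(cart), 2)

def checkout(cart: list) -> float:
    if flags.is_enabled("new-checkout-flow"):
        return new_checkout(cart)   # new behaviour: dormant until released
    return legacy_checkout(cart)    # existing behaviour: unchanged by the deploy

print(checkout([10.0, 15.0]))  # 25.0 via the legacy path while the flag is off
```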

The key benefit is that you decouple the technical process of deploying code from the business process of putting functionality in the hands of users. Services such as LaunchDarkly provide a user interface for controlling feature flags that non-technical users can operate.

Of course there is work involved in using feature flags, and the team should be prepared for that. Depending on the nature of your application and of the features you are developing, using feature flags can range from very easy to quite challenging, but the concept itself is very straightforward.

# Modern architectures

Traditionally, line-of-business software has been centralised and monolithic, residing on a cluster of servers. The code often contains entire business processes in a single codebase.

As these systems have become more and more complex, the problems with this approach have become apparent in terms of limited scalability and overly complex code. This has led to the emergence of approaches like microservice architecture, which describes a software system as a set of discrete, loosely coupled, well-encapsulated components.

This has many advantages, one of which is that complexity is lifted from the code level to the architectural level, which exposes that complexity and makes it easier to reason about. At the same time, the individual services are each simpler than the large monolithic system, so each of them is also relatively easy to reason about.

Another positive attribute of this approach is that the units of software being deployed are smaller; instead of a single, large application that needs to be deployed in full whenever any element changes, there are separate units of code that can be deployed and operated independently. This further reduces the potential delta for any given deployment.

Again, by having smaller deployments, the risk is spread more thinly, and the development team is closer to their changes. Put more simply, it's easier for someone (regardless of day or time) to be comfortable deploying a change to a smaller component (e.g. a microservice) than a larger one (e.g. a monolithic application). Not only is the overall risk likelihood reduced, the blast radius is potentially also much smaller.

# Knowledge is power

Despite its benefits, this component-oriented architectural style introduces an additional level of complexity: although each component is simpler, reasoning about these disparate components as a single system is more challenging because there are more moving parts. With a monolith, you can see the whole "machine" in one picture; with a distributed architecture, the overall behaviour of the whole system is more difficult to see.

This is where observability comes in. Its basic tenet is to use a correlation ID to track events (or sequences of events) through their entire journey across your software's landscape, and to record as much information about those events as possible. This data is then stored in raw form and indexed in a number of ways, making it easy to discover, collate and reason about.
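
As an illustration of the basic mechanics, here is a toy sketch of tagging events with a correlation ID. The services, field names and event sink are all made up; real implementations typically propagate the ID in request headers (for example via a standard such as W3C Trace Context) and ship the events to a dedicated observability backend rather than printing JSON.

```python
# Sketch: tagging events with a correlation ID and emitting structured events.
# The services and field names are hypothetical, and print() stands in for
# shipping events to an observability backend.
import json
import time
import uuid

def emit_event(correlation_id: str, service: str, name: str, **fields) -> None:
    event = {
        "correlation_id": correlation_id,  # ties events from different services together
        "service": service,
        "event": name,
        "timestamp": time.time(),
        **fields,  # arbitrary context: user, cart size, latency, feature flags...
    }
    print(json.dumps(event))  # stand-in for sending to an event store

def handle_checkout(user_id: str, cart_total: float) -> None:
    correlation_id = str(uuid.uuid4())  # created at the edge, passed downstream
    emit_event(correlation_id, "api-gateway", "request_received", user_id=user_id)
    emit_event(correlation_id, "pricing-service", "total_calculated", cart_total=cart_total)
    emit_event(correlation_id, "payment-service", "payment_captured", amount=cart_total)

handle_checkout("user-42", 99.0)
```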

Charity Majors, the CTO of Honeycomb, writes and speaks extensively about this topic.

Note that "observability" explicitly does not mean "monitoring", which is focused on what is happening now, or has just happened; it has tactical objectives. Observability is a much more strategic proposition and looks at trends and outliers as well as allowing deep dives into specific events.

There is a lot more to modern software observability that I won't go into here; I recommend checking out Charity's blog and following her if you are interested in the topic. The key is that by gaining a better understanding of what is really happening inside your software, your development team are in a much better position to continually improve it.

# Making it happen

These elements are all highly inter-related, and shifting thinking about them will be challenging for a lot of organisations. Yet it is possible to do in small pieces. Not every recommendation in this post will make sense for everyone, and there are many variations and totally different approaches to solving these problems. Again, these kinds of changes should be made with an experimental mindset, and everyone, both business and technical, should be included on the journey in their own capacity.

Start by making small changes to improve your CI/CD pipeline, or use feature flags to allow the business to release a particular feature independently of the release cycle, or improve the quality of your tests. Every small step counts. If you can demonstrate a robust pipeline and the benefits of modern tools in small measures, then the organisation is more likely to accept continuous deployments as a goal.

Try things out in areas where the likelihood of success is high and the risk is low, rather than trying to crack full automation of the 20-year-old legacy software as a starter!

Most importantly, try to get high-level buy-in. If you're a CIO, then perfect, you're in the best position to make this happen. If you aren't at C-level, then talk to someone who is, or to your development manager or team leader:

  • Show the value of these techniques where and how you can
  • Get buy-in from your peers, team members, product owners, and anyone else involved
  • Push the message up the organisation chart; it will roll back down
  • Be persistent!

One client that I have worked with already had automated pipelines (with manual gates), but they initially had a two-week release cycle that was coupled to their Scrum process, and at the end of every sprint was a two-day manual testing exercise that involved the entire team, and then "the deployment" of the fortnight's final build into production. This is at the extreme end but within the typical range of cadences that I have come across.

It was a team effort that took many months, and involved C-level direction-setting, but we eventually moved them to smaller, more frequent deployments, with feature flags used to separate business and technical processes. Other changes were made around team structure and areas of responsibility, and together these changes have increased their delivery rate while reducing delivery incidents. They are still on a journey, but they have already seen improvements in other areas, such as developer engagement with the user experience and fewer error reports in production.

# Conclusion

As I mentioned at the top of this post, the key here is to make people feel comfortable deploying any time and all the time. Pushing people to step outside of their comfort zone is alienating and unnecessary; it is better to give them a safe environment and encourage them to try things for themselves.

I have outlined some broad practices and tools that will set your team or organisation on the path towards this goal: CI/CD, automated testing, feature flags, architectural styles and observability.

The goal of all of this is to bring the development team closer to the product, making small changes that land in production with high frequency, and watching and improving those changes with high agility due to the removal of unnecessary process coupling and the correct use of the right tools.

When this becomes the standard way of working, a discussion about whether or not we should deploy on Fridays will seem unnecessary.

"It's Friday, Friday, gotta deploy on Friday"

All links are provided for information reasons; I have no financial or commercial relationship with any of the companies or people mentioned.