I've recently been writing a lot about building continuous delivery pipelines with various tools, and about the pattern as a whole. It's becoming a mainstream pattern that many development teams now follow. The benefits are clear: teams can deploy software faster, in smaller batches of changes, and thus more safely. When you deploy your services many times a day, one thing becomes a necessity in most cases: zero downtime deployments.

Zero downtime?

Zero downtime means that your service stays responsive throughout the deployment process. Typical services today are HTTP services, so in practice this means that no requests are dropped at any point during the deployment. During a typical deployment we change the software on the servers to a newer version.

Containers to the rescue?

This is another good example of where using containers can really make a difference. In my opinion zero downtime deployments are much easier to achieve with containers, and especially with a container management platform. It is entirely doable without containers too, and lots of organisations pulled it off in the pre-container era.

One of the reasons why containers make this a lot easier is the fact that we can change a container running our software to a new container with a newer version of our software so easily. In many cases it's really two commands if done manually on the command line:

docker rm -f my-service
docker run -d --name my-service my_service:v2

Of course things are much more complex than that in real life. :)

Requirements for the deployment process

Let's look at the deployment process from the requirements point of view, and what it needs to do in order for us to see our software up-and-running 100% during the deployment.

Multiple instances of the software running

It's pretty obvious that in 2018 there need to be many instances of the software running, not only for zero downtime deployments but for general availability and scalability. Nowadays an instance usually means a container, or a set of containers, but it could mean something else too. It could be a virtual machine if you want to use those as your deployment vessel.

Rolling deployment

Running the software in many instances means that we must not bring everything down and then back up with the newer version. Instead we need to run a gradual upgrade, updating the software one instance at a time. This ensures there are always some instances serving requests while others are being updated.

Seamless load balancer updates

Running the software in many instances also means there needs to be some sort of load balancer in front of the service. The load balancer distributes the incoming requests among the online instances of the software we're running. During the deployment process, something or someone needs to make sure load balancing is kept in sync with the deployment. In practice this means that when an instance is shut down, it's taken out of the load balancing rotation and only put back when it's running again with the newer version of the software. One thing is of the utmost importance: load balancing updates must be done in a way that keeps the load balancer itself online 100% of the time during all changes.

If you are using HAProxy as the load balancer, for example, this means that we need to leave the "old" processes running with the old config until there are no connections left on them.

Graceful shutdown of service instances

Perhaps the most important requirement is that your software needs to work in sync with the deployment process. In practice we need to be able to tell our software that it's time to go down gracefully and to stop accepting new requests. This also needs to be in sync with the load balancer: when our software is going down, load balancers should not send it any more traffic. We also need to be able to tell our automation (more on that next) how long to wait for ongoing requests to finish before terminating the old instance and spinning up the new version of the software.

100% automation

There are many things that have to work together like a well-conducted orchestra during the deployment, so it's pretty obvious that this whole process must be 100% automated. When we're doing multiple deployments per day across tens of servers and tens of instances of our software components, there's no way a human can manage all of this.

On a high level the automated process looks something like this in pseudo-code:

trigger deployment
for_each instance
  send shutdown signal
  remove from loadbalancer
  wait for shutdown
  remove old version
  spin up new version
  wait for startup
  add to loadbalancer

One big reason to use containers is the fact that pretty much all container orchestrators have a built-in process of rolling deployment.

Example application

Let's look at how all this ties together with an example application. As usual, the whole app is available in a GitHub repo.

The app

The application is a simple Go app that just serves two static HTTP routes, /hello and /ping. The most interesting bit is how we handle the shutdown process so that the app goes down gracefully and in a way that load balancers can also understand.

When the app is going down in a container, container runtimes send a SIGTERM signal to the processes. So in the app we must catch this signal and somehow make the application go down gracefully.

On the other hand, when we're going down, load balancers should not give us any more traffic, which allows us to drain the request queue. For this reason I made the /ping endpoint, which load balancers use to see whether the app is healthy or not. Upon receiving SIGTERM the app starts to report an unhealthy status on the /ping endpoint, and the load balancers stop sending traffic to it.
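
A minimal sketch of this behaviour in Go could look like the following. It is not the code from the example repo: the port `:8080`, the `drainDelay` and `shutdownTimeout` values and the helper names are assumptions chosen for illustration.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

const (
	drainDelay      = 5 * time.Second  // assumed time for load balancers to notice the 503
	shutdownTimeout = 20 * time.Second // assumed upper bound for in-flight requests
)

// shuttingDown flips to true once a termination signal has been received.
var shuttingDown atomic.Bool

// pingStatus returns the status code the /ping health check should report.
func pingStatus() int {
	if shuttingDown.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello!\n"))
	})
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(pingStatus())
	})
	return mux
}

func main() {
	srv := &http.Server{Addr: ":8080", Handler: newMux()}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Block until the container runtime (or Ctrl-C) asks us to stop.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, os.Interrupt)
	<-sigs

	// Start failing the health check, then keep serving for a while so the
	// load balancers can take this instance out of rotation.
	shuttingDown.Store(true)
	time.Sleep(drainDelay)

	// Stop accepting new connections and wait for in-flight requests.
	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown did not finish cleanly: %v", err)
	}
}
```

The two durations together have to fit inside the grace period the platform allows between SIGTERM and SIGKILL, otherwise the process is killed mid-drain.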

The rolling deployment part is handled by the Kontena container platform with a configuration that ties in the application to both the deployment process and to load balancers automatically:

I'll walk you through the most interesting bits.

links: ingress-lb/lb

This tells Kontena to always configure load balancing for the application. Kontena not only configures it but also keeps it in sync during the rolling deployment process.


This setting defines how big a portion of the application instances (in practice, containers) to keep up and running during the deployment. In this case it means that 80% of my instances shall be kept running during the deployment. As I've specified that I want 3 instances, the rolling deployment swaps one container at a time (round(3 * (1 - 0.8)) = 1) for a new version.


During deployment, Kontena will wait for the application instance to start listening on the given port before continuing to the next instance. This allows sufficient time for the app to actually boot up and be able to serve traffic.


The defined health check also makes the load balancers use this endpoint for checking the application status. So when an app instance is going down, the /ping endpoint responds to the load balancers with a 503 Service Unavailable status, and the stopping instance is able to drain its requests.


This is how long to wait between sending the app SIGTERM and SIGKILL, i.e. how much time to allow for draining all the requests.


So now, with our test app handling signals properly and with a proper deployment configuration for Kontena, we should see 0 failed requests during a deployment. Let's test this:

Forcing a deployment

We must be able to force a deployment, and with it a change of containers, so that we can observe the correct behaviour. Luckily we have the needed commands available out of the box:

$ kontena service deploy --force graceful/stop
 [done] Deploying service graceful/stop      
⊛ Deployed instance demo-grid/graceful/stop-1 to node little-frost-55
⊛ Deployed instance demo-grid/graceful/stop-2 to node late-sound-19
⊛ Deployed instance demo-grid/graceful/stop-3 to node twilight-lake-92

The --force flag instructs Kontena to replace all the containers with new ones, regardless of whether the image or service configuration has changed.

To see what happened during the deployment, we can look at the service-related event log:

What we see is the changing of the containers as expected.

While the deployment is running, we want to make sure we do not drop any requests. For that I use a tool called ApacheBench.

The tool runs 100 requests at a time, 10000 requests in total.

As seen in the results, 0 requests have failed.

Mission accomplished.


Getting your application to zero downtime deployments is a goal worth pursuing. It enables you to deploy multiple times a day without sacrificing your end users' experience. Containers and container management platforms can make getting there a lot easier. You still need to make the app work in sync with the deployment process, but that is easier than many think.

Image Credits: Train Railway Station by Harald Landsrath.