wagl: Service Discovery for Docker Swarm

wagl is a DNS server which allows microservices running as containers on a distributed Docker Swarm cluster to find and talk to each other. It is minimalist and works as a drop-in container in your cluster.

This article describes the inner workings of wagl and gives a broader sense of the state of the service discovery problem in today’s container clusters.

I presented wagl and the topic of service discovery at the Docker Seattle Meetup last month and at DockerCon EU 2015 last week. I hope you find this article interesting.

wagl is open source on GitHub: https://github.com/ahmetb/wagl

If you use swarm you *need* wagl! Nice Kubernetes style, based on labels, service discovery for swarm. #dockercon https://t.co/DWAFm1oZPs
— Kelsey Hightower (@kelseyhightower) November 16, 2015

# Is Service Discovery still a problem?

Yes! I gave a talk about how we have been doing service discovery in the microservices era at DockerCon EU 2015 in Barcelona, and it surprisingly turned out to be the highest rated talk of the conference.

In a nutshell, there are many methods out there and none actually solves the problem fully or reliably, and neither does wagl. However, wagl makes several good points about what the ideal solution should look like.

Many of the service discovery methods that people have blogged about, and that others have then deployed to their container clusters, have far too many moving parts to rely on in a production environment. Even scarier, some of these components are not proven in production for high-scale workloads.

Some methods I found require changes to the application code. This is the worst kind: it closely couples the service discovery concern with the service itself.

Service discovery is an infrastructure-level problem and it should not change your application code.

Some methods I came across are just too tedious to install and maintain, or are closely coupled to a certain tool you need to install and maintain inside your cluster.

The solutions I reviewed and compared in my DockerCon EU talk are:

Interlock + nginx/haproxy
registrator + consul/etcd + confd/consul-template + nginx/haproxy
DNS-based solutions: Mesos-DNS, SkyDNS
Port scanning in overlay networks with nmap

If you want to find out more about this I suggest just watching my talk.

# Motives for developing wagl

In my opinion, Docker Swarm is the easiest cluster manager to install that is capable of managing Docker containers at scale (as of this writing). Therefore, any tool or plugin written for Swarm must have the same property.

My criteria for developing wagl were:

Application code should not change to use service discovery.
Installing the tool should be as simple as one command.
There should be no maintenance cost.
There should be no configuration files.
The tool should have good defaults and should not make people read the docs to get started with it.

If any of these does not hold true, then it means I have failed. ☺︎

# Introduction to wagl

wagl is a DNS server. It responds to DNS A/SRV queries just like a normal DNS server and listens on port 53/udp.

wagl is specifically designed for Docker Swarm. It speaks the Docker language, understands your containers, and figures out their ports.

You install wagl on your Docker Swarm cluster (single command) and forget about it.

Normally, with Docker, you would run a web server container such as:

docker run -d -p 80:80/tcp nginx

However, this container would land on an arbitrary machine in the cluster, and you would have no easy way of finding out which IP:port it landed on.

In wagl, you use “Docker labels” to name your microservices:

docker run -d -p 80:80/tcp -l dns.service=api nginx

and all your containers can reach this container at http://api.swarm:80 no matter where they are. Easy as that!
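To make the label-to-name idea concrete, here is a minimal Go sketch of how a DNS server could turn labeled containers into a lookup table. It is an illustration only and not wagl's actual code: the `Container` and `Endpoint` types, the `buildTable` function, and the sample IP addresses are all hypothetical. Only the `dns.service` label and the `swarm` suffix come from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// Container is a simplified, hypothetical view of a running container:
// its labels plus the node IP and host port it was published on.
type Container struct {
	Labels   map[string]string
	NodeIP   string
	HostPort int
}

// Endpoint is where one instance of a service can be reached.
type Endpoint struct {
	IP   string
	Port int
}

// buildTable groups containers by their "dns.service" label and returns a
// map from fully qualified name (for example "api.swarm.") to endpoints.
// Containers without the label are skipped.
func buildTable(containers []Container, zone string) map[string][]Endpoint {
	table := make(map[string][]Endpoint)
	for _, c := range containers {
		svc := strings.ToLower(strings.TrimSpace(c.Labels["dns.service"]))
		if svc == "" {
			continue
		}
		name := svc + "." + zone + "."
		table[name] = append(table[name], Endpoint{IP: c.NodeIP, Port: c.HostPort})
	}
	return table
}

func main() {
	// Sample data (assumed): two "api" containers and one unlabeled container.
	containers := []Container{
		{Labels: map[string]string{"dns.service": "api"}, NodeIP: "10.0.0.4", HostPort: 80},
		{Labels: map[string]string{"dns.service": "api"}, NodeIP: "10.0.0.5", HostPort: 80},
		{Labels: map[string]string{}, NodeIP: "10.0.0.6", HostPort: 8080}, // unlabeled: ignored
	}
	for name, endpoints := range buildTable(containers, "swarm") {
		fmt.Println(name, endpoints)
	}
}
```

In this sketch an A query for a name would be answered with the IPs in the table, while an SRV query could also return the ports.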

If you are interested in learning more please check out the documentation.

# Installing wagl

As promised, installing wagl is a one-time operation that consists of running a single command on the Swarm manager node(s):

$ docker run -d --restart=always --name=dns \
    -p 53:53/udp \
    --link swarm_manager:swarm \
    ahmet/wagl wagl --swarm tcp://swarm:2375

Ta-da! It’s done.

# How is it developed?

The inspiration came from reading the source code of Mesos-DNS. It is a well-designed DNS server for Apache Mesos and does the job just fine. Other examples I could find, such as SkyDNS, are very complicated and have a lot of features.

Perhaps I could have developed a plugin for SkyDNS, but instead I decided to start from scratch (rarely a good idea). Fun fact: we are all using the same Go DNS package, and it is not very hard to develop a DNS server in Go after all.

After a couple of days of coding I was able to get something up and running, and the code was very much functional. That is where I stopped: I had a minimalist DNS server that was letting me do service discovery.

The reason I started from scratch and did not write a plugin for SkyDNS or Interlock is simply that I wanted a minimal feature set and fewer moving parts.

By minimal, I mean really, really minimal. It just barely works, and yet it is good enough to accommodate most use cases.

wagl - inspired by mesos-dns provides service discovery for Swarm. #dockercon pic.twitter.com/yp4YpOpmw2
— Kelsey Hightower (@kelseyhightower) November 16, 2015

# Now what?

Many experts in the area I spoke to think that the Domain Name System (DNS) is the right way to go for the service discovery problem. It has its own shortcomings, such as connection draining, the lack of port information in A records (and the fact that nobody uses SRV records), and languages (Java) that do not obey the TTL information of the records.
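The A-versus-SRV point can be illustrated with a short Go sketch. An A answer only carries an IP address, so the client must already know the port, whereas an SRV answer carries both a target and a port. The `pickEndpoint` and `sampleRecords` functions and the node names below are hypothetical. The sketch assumes SRV names follow the `_service._proto.domain` convention of RFC 2782, and it simply prefers the lowest priority and then the highest weight, which is a simplification of the weighted random selection the RFC describes.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// pickEndpoint returns "host:port" for the preferred SRV record: lowest
// priority first, then highest weight. It reports false for empty input.
func pickEndpoint(records []*net.SRV) (string, bool) {
	var best *net.SRV
	for _, r := range records {
		if best == nil || r.Priority < best.Priority ||
			(r.Priority == best.Priority && r.Weight > best.Weight) {
			best = r
		}
	}
	if best == nil {
		return "", false
	}
	host := strings.TrimSuffix(best.Target, ".")
	return net.JoinHostPort(host, strconv.Itoa(int(best.Port))), true
}

// sampleRecords returns made-up SRV answers. Inside a cluster they could
// instead come from net.LookupSRV("api", "tcp", "swarm"), which queries
// the name _api._tcp.swarm.
func sampleRecords() []*net.SRV {
	return []*net.SRV{
		{Target: "node-2.swarm.", Port: 32768, Priority: 1, Weight: 1},
		{Target: "node-1.swarm.", Port: 80, Priority: 1, Weight: 5},
	}
}

func main() {
	if addr, ok := pickEndpoint(sampleRecords()); ok {
		fmt.Println("connect to", addr)
	}
}
```

The extra client-side logic needed here is one reason SRV records see so little use: most HTTP clients and libraries only resolve A records and expect the port to be given up front.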

I think that, combined with Docker 1.9 overlay networks, wagl will provide a solution that is very close to seamless and frictionless.

wagl is not meant to change the service discovery scene dramatically. It is more of a proof of concept that actually works. It proves that a simple drop-in tool that uses the DNS protocol is good enough to accommodate the service discovery needs of most applications.

The service discovery problem, however, remains unsolved for many users of container clusters out there. Companies like Google, Microsoft and many others have been rolling out their own solutions, and the projects we see, such as kube-proxy, are just the tip of the iceberg.

Expect more changes in this area, as this is the problem most engineers working on containers and microservices will tackle next.

# What is ahead?

I am hoping to develop wagl further; the project has already started to get some usage and feature requests. I intend to support Docker 1.9 multi-host overlay networks out of the box with wagl to solve the static port allocation problem of DNS-based service discovery.

Combined with what is already out there and what is next for Docker Networking, I think wagl can continue to provide a good solution and with its simplicity it will be a great tool for beginners as well.

# Learn more and contribute

wagl on GitHub: https://github.com/ahmetb/wagl
wagl website: https://ahmetalpbalkan.github.io/wagl/

You can find the source code of wagl on GitHub and visit its website.

The project is a little over 1,000 lines of Go and (in my opinion) is pretty readable. Feel free to check it out and “★” the repo as well.

# Trivia: Why the name?

While I was on a quest for a name that is about bees and discovery, our very own Ross Gardler told me about the Waggle Dance of the Honeybee.

It turns out that honeybees returning from food sources to the beehive dance around by “waggling”, and the shape in which they dance, combined with radius, frequency and many other parameters, is a way to tell the other honeybees the exact location of the food source.

Sounds a lot like wagl, huh? ☺︎