Getting started with HashiCorp Serf

In my post about Apache Mesos I briefly mentioned Serf.

Serf (from HashiCorp, who also make Vagrant and Packer) is a decentralised service discovery tool with support for custom events.

By installing a Serf agent on each node in a network, and (maybe) bootstrapping each agent with the IP address of another agent, you are quickly provided with a scalable membership system with the ability to propagate events across the network.

Once it’s installed and agents are started, running serf members from any node will produce output similar to this:

vagrant@master10:~$ serf members
master10    alive    role=master
zk10    alive    role=zookeeper
slave10    alive    role=slave
mongodb10    alive    role=mongodb

Which is when you realise you’ve still got a Mesos cluster running that you’d forgotten about…

The output from Serf shows the hostname, IP address, status and any tags the Serf agent is configured with. In this case, I’ve set a role tag which lets us quickly find a particular instance type on the network:

vagrant@master10:~$ serf members | grep mongodb
mongodb10    alive    role=mongodb

This, with its event system, makes Serf ideal for efficiently maintaining cluster node state, and reducing or eliminating application configuration.
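
A plain grep also matches hostnames that happen to contain the role name. As a minimal sketch, assuming the whitespace-separated layout shown above with tags as a comma-separated last column, the lookup could be made stricter with a small filter. The function name members_with_role is hypothetical, and Serf's own `serf members -tag role=mongodb` filter may be the simpler option where it is available.

```shell
#!/usr/bin/env bash
# Print the names of alive members whose tags include role=<role>.
# Reads `serf members` output on STDIN. members_with_role is a hypothetical helper.
members_with_role() {
  awk -v want="role=$1" '
    {
      # a member counts as alive if any column after the name is "alive"
      alive = 0
      for (i = 2; i <= NF; i++) if ($i == "alive") alive = 1
      # tags are assumed to be the last column, comma-separated
      n = split($NF, tags, ",")
      for (i = 1; i <= n; i++) if (alive && tags[i] == want) { print $1; break }
    }'
}
```

It would be used as `serf members | members_with_role mongodb`.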

In a Mesos cluster, it lets us make core parts of the infrastructure (like ZooKeepers and Mesos masters) simple to scale with no manual configuration.

Serf has lots of other potential uses too, some of them documented on the Serf website.

Getting started with Serf


Installing Serf couldn’t be easier.

You should be able to run serf from the command line and see a list of available commands.

Trying it out

To try Serf, you need two console windows open and you’re ready!

  • Run serf agent from one console
  • Run serf members from another

You should see something like this:

vagrant@example:~$ serf members
example    alive

That output shows a cluster containing your local machine (with a hostname of ‘example’), available at, and that the node is alive.

It’s that simple!

Starting Serf automatically

This isn’t much use on its own – we need Serf to start every time a node boots up. That way, as soon as a new node comes online, the cluster finds out about it immediately, and the new node can configure itself.

Serf provides us with example scripts for upstart and systemd.

For Ubuntu, copy upstart.conf to /etc/init/serf-agent.conf then run start serf-agent (you might need to modify the upstart script if Serf isn’t installed to /usr/local/bin/serf).


Now we’ve got our Serf agent running, we need to configure it so it knows what to do.

You can configure Serf using either command line options (useful if you’re talking to a remote Serf agent or using non-standard ports), or you can provide configuration files (which are JSON files, loaded from the configuration directory in alphabetical order).

If you’ve used the Ubuntu upstart script, creating config.json in /etc/serf will work.

All of the configuration options are documented on the Serf website.

The examples below are in JSON, but they can all be provided as command line arguments instead.

IP addresses

This caught me out a few times – Serf, by default, will advertise the bind address (usually the IP address of your first network interface, e.g. eth0).

In a Vagrant environment, you will always have a NAT connection as your first interface (the one Vagrant uses to communicate with the VM). This was causing my agents to advertise an IP which other nodes couldn’t connect to.

To fix this, Serf lets us override the IP address it advertises to the cluster:

    "advertise": ""

Setting tags

Serf used to provide a ‘role’ command line option (it still does, but it’s deprecated). In its place, we have tags, which are far more flexible.

Tags are key-value pairs which provide metadata about the agent. In the example above, I’ve created a tag named role which describes the purpose of the node.

    "tags": {
        "role": "mongodb"
    }

You can set multiple tags, but there is a limit – the Serf documentation doesn’t specify the limit, except to say

There is a byte size limit for the maximum number of tags, but in practice dozens of tags may be used.

You can also replace tags while the Serf agent is still running using the serf tags command, though changes aren’t persisted to configuration files.

Protocol


You shouldn’t need to set the protocol – it should default to the latest version (currently 3).

It does, when started from the command line. But it didn’t seem to when started using upstart and a configuration directory. Easy to fix though:

    "protocol": 3

This might not be a bad practice anyway: with the protocol pinned, you can update Serf on all nodes without worrying about protocol compatibility.

Forming a cluster

I’ll cover this in more detail later, but you can set either start_join or discover to join your agent to a cluster.

Scripting it

Since Serf is ideal for a cloud environment, it’s useful to script its installation and configuration.

Here’s an example using bash similar to the one in my vagrant-mongodb example. It installs Serf, configures upstart, and writes an example JSON configuration file.

Because it’s from a Vagrant build, it uses a workaround to find the correct IP.

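# Assumes the serf binary and upstart.conf are already in the working directory,
# and that $1 is the address of an existing Serf agent to join.
mv serf /usr/local/bin
mv upstart.conf /etc/init/serf-agent.conf
mkdir -p /etc/serf
# Vagrant workaround: advertise the eth1 address rather than the NAT interface
ip=$(ip addr list eth1 | grep "inet " | cut -d ' ' -f6 | cut -d/ -f1)
echo "{ \"start_join\": [\"$1\"], \"protocol\": 3, \"tags\": { \"role\": \"mongodb\" }, \"advertise\": \"$ip\" }" | tee /etc/serf/config.json
exec start serf-agent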

Forming a cluster

So far we have just a single Serf agent. The next step is to set up another Serf agent and join the two together, forming a (small) cluster of Serf agents.

Using multicast DNS (mDNS)

Serf supports multicast DNS, so in a contained environment with multicast support we don’t need to provide it with a neighbour.

Using the discover configuration option, we provide Serf with a cluster name which it will use to automatically discover Serf peers.

    "discover": "mycluster"

In a cloud environment this removes the need to bootstrap Serf, making it truly autonomous.

Providing a neighbour

If we can’t use multicast DNS, we can provide a neighbour and Serf will discover the rest of the cluster from there.

Relying on a single known neighbour could be problematic (that neighbour has to be up), but in a Mesos cluster it becomes easy. We know at least one ZooKeeper must always be available, so we can give Serf the hostnames of our known ZooKeeper instances:

    "start_join": [ "zk1", "zk2", "zk3" ]

Or, if we get the ZooKeepers to update a load balancer (using Serf!) when they join or leave the cluster, we can make our configuration even easier:

    "start_join": [ "zk" ]

We can also use the same technique to configure Serf on the ZooKeeper nodes.

How clusters are formed

A cluster is formed as soon as one agent discovers another (whether this is through multicast DNS or using a known neighbour).

As soon as a cluster is formed, agents share information with each other using Serf’s gossip protocol.

If agents from two existing clusters discover each other, the two clusters will become a single cluster. Full membership information is propagated to every node.

Once two clusters have merged, it would be difficult to split them without restarting all agents in the cluster or forcing agents to leave the cluster using the force-leave command (and preventing them from discovering each other again!).

Nodes leaving

If a node chooses to leave the cluster (e.g. scaling down or restarting nodes), other nodes in the cluster will be informed with a leave event.

Its membership information will be updated to show a ‘left’ state:

example    left    role=mongodb

A node leaving the cluster is treated differently to a failure.

This is determined by the signal sent to the Serf agent to terminate the process. An interrupt signal (Ctrl+C or kill -2) will tell the node to leave the cluster, while a kill signal (kill -9) will be treated as a node failure.

Node failures

When a node fails, other nodes are informed with a failed event.

Its membership information will be updated to show a ‘failed’ state:

example    failed    role=mongodb

Knowledge of the failed node is kept by other Serf agents in the cluster. They will periodically attempt to reconnect to the node, and eventually remove the node if further attempts are unsuccessful.

Events


Serf uses events to propagate membership information across the cluster, either member-join, member-leave, member-failed or member-update.

You can also send custom events (which use the user event type), and provide a custom event name and data payload to send with it:

serf event dosomething "{ \"foo\": \"bar\" }"

Events with the same name within a short time frame are coalesced into one event, although this can be disabled using the -coalesce=false command line argument.

This makes Serf useful as an automation tool – for example, to install applications on cluster nodes or configure ZooKeeper or Mesos instances.

Event handlers

Event handlers are scripts which are executed as a shell command in response to the events.

Shell environment

Within the shell created by Serf, we have the following environment variables available:

  • SERF_EVENT – the event type
  • SERF_SELF_NAME – the current node name
  • SERF_SELF_ROLE – the role of the node, but presumably deprecated
  • SERF_TAG_${TAG} – one for each tag set (uppercased)
  • SERF_USER_EVENT – the user event type, if SERF_EVENT is ‘user’
  • SERF_USER_LTIME – the Lamport timestamp of the event, if SERF_EVENT is ‘user’

Any data payload given by the event is piped to STDIN.
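
As a minimal sketch of how a handler might use these variables, the function below dispatches on SERF_EVENT and reads the payload from STDIN. The function name handle_event and the messages it prints are hypothetical; only the environment variable names and event types come from the list above.

```shell
#!/usr/bin/env bash
# Sketch of a Serf event handler body. handle_event is a hypothetical name.
handle_event() {
  local payload
  # Serf pipes any event payload to STDIN
  payload=$(cat)
  case "${SERF_EVENT:-}" in
    member-join|member-leave|member-failed|member-update)
      echo "membership event ${SERF_EVENT} seen by ${SERF_SELF_NAME:-unknown}: ${payload}"
      ;;
    user)
      # SERF_USER_EVENT carries the custom event name
      echo "user event ${SERF_USER_EVENT:-}: ${payload}"
      ;;
    *)
      echo "unhandled event: ${SERF_EVENT:-none}"
      ;;
  esac
}
```

A real handler script would end with a call to handle_event, so that it runs each time Serf invokes the script.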

Creating event handlers

Serf’s event handler syntax is quite flexible, and lets you listen to all events or filter based on event type.

  • The most basic option is to invoke a script for every event:

        "event_handlers": [
  • You can listen for a specific event type:

        "event_handlers": [
  • You can specify multiple event types:

        "event_handlers": [
  • You can listen to just user events:

        "event_handlers": [
  • You can listen for specific user event types:

        "event_handlers": [

Multiple event handlers can be specified, and all event handlers which match for an event will be invoked.
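
To make the filter syntax concrete, here is a hedged sketch that writes a configuration file containing one handler of each kind described above. The function name write_handlers_config and every script path are hypothetical placeholders. The strings use the `filter=script` form, which is my understanding of Serf's handler syntax and worth checking against the Serf documentation for your version.

```shell
#!/usr/bin/env bash
# Write an example event handler configuration to the file given as $1.
# write_handlers_config and all handler paths are hypothetical.
write_handlers_config() {
  cat > "$1" <<'EOF'
{
  "event_handlers": [
    "/etc/serf/handlers/all-events.sh",
    "member-join=/etc/serf/handlers/join.sh",
    "member-leave,member-failed=/etc/serf/handlers/remove.sh",
    "user=/etc/serf/handlers/user.sh",
    "user:deploy=/etc/serf/handlers/deploy.sh"
  ]
}
EOF
}
```

The first entry runs for every event, the next two filter on one or more event types, and the last two match all user events and only the user event named deploy.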

Reloading configuration

Serf can reload its configuration without restarting the agent.

To do this, send a SIGHUP signal to the Serf process, for example using killall serf -HUP or kill -1 PID.

You could even use custom user events to rewrite Serf configuration files and reload them across the entire cluster.


A quick introduction to Apache Mesos

Apache Mesos is a centralised fault-tolerant cluster manager. It’s designed for distributed computing environments to provide resource isolation and management across a cluster of slave nodes.

In some ways, Mesos provides the opposite to virtualisation:

  • Virtualisation splits a single physical resource into multiple virtual resources
  • Mesos joins multiple physical resources into a single virtual resource

It schedules CPU and memory resources across the cluster in much the same way the Linux kernel schedules local resources.

A Mesos cluster is made up of four major components:

  • ZooKeepers
  • Mesos masters
  • Mesos slaves
  • Frameworks

ZooKeepers


Apache ZooKeeper is a centralised configuration manager, used by distributed applications such as Mesos to coordinate activity across a cluster.

Mesos uses ZooKeeper to elect a leading master and for slaves to join the cluster.

Mesos masters

A Mesos master is a Mesos instance in control of the cluster.

A cluster will typically have multiple Mesos masters to provide fault-tolerance, with one instance elected the leading master.

Mesos slaves

A Mesos slave is a Mesos instance which offers resources to the cluster.

They are the ‘worker’ instances – tasks are allocated to the slaves by the Mesos master.

Frameworks


On its own, Mesos only provides the basic “kernel” layer of your cluster. It lets other applications request resources in the cluster to perform tasks, but does nothing itself.

Frameworks bridge the gap between the Mesos layer and your applications. They are higher level abstractions which simplify the process of launching tasks on the cluster.


Chronos is a cron-like fault-tolerant scheduler for a Mesos cluster.

You can use it to schedule jobs, receive failure and completion notifications, and trigger other dependent jobs.


Marathon is the equivalent of the Linux upstart or init daemons, designed for long-running applications.

You can use it to start, stop and scale applications across the cluster.


There are a few other frameworks:

You can also write your own framework, using Java, Python or C++.

The quick start guide

If you want to get a Mesos cluster up and running, you have a few options:

Using Vagrant

Vagrant and the vagrant-mesos Vagrantfile can help you quickly build:

  • a standalone Mesos instance
  • a multi-machine Mesos cluster of ZooKeepers, masters and slaves

Unfortunately, the network configuration is a bit difficult to work with – it uses a private network between the VMs, and SSH tunnelling to provide access to the cluster.

Using Mesosphere and Amazon Web Services

Mesosphere provide Elastic Mesosphere, which can quickly launch a Mesos cluster using Amazon EC2.

This is far easier to work with than the Vagrant build, but it isn’t free – around $1.50 an hour for 6 instances or $4.50 for 18.

A simpler Vagrant build

I’ve put together some Vagrantfiles to build individual components of a Mesos cluster. It’s a work in progress, but it can already build a working Mesos cluster without the networking issues. It uses bridged networking, with dynamically assigned IPs, so all instances can be accessed directly through your local network.

You’ll need the following GitHub repositories:

At the moment, a cluster is limited to one ZooKeeper, but can support multiple Mesos masters and slaves.

Each of the instances is also built with Serf to provide decentralised service discovery. You can use serf members from any instance to list all other instances.

To help test deployments, there’s also a MongoDB build with Serf installed:

Like the ZooKeeper instances, the MongoDB instance joins the same Serf cluster but isn’t part of the Mesos cluster.

Once your cluster is running

You’ll need to install a framework.

Mesosphere lets you choose to install Marathon on Amazon EC2, so that could be a good place to start.

Otherwise, manually installing and configuring Marathon or another framework is easy. The quick and dirty way is to install them on the Mesos masters, but it would be better if they had their own VMs.

With Marathon or Aurora, you can even run other frameworks in the Mesos cluster for scalability and fault-tolerance.


The automation what-for

Today, our developers and testers were asked to justify the use of test automation – a surprising question after we’ve invested 5 years in writing automated test cases.

The challenge was to prove the value in continuing to automate our test cases, on the basis that the value should stand up to scrutiny if it really does exist.

So we tried…

  • Automated tests are repeatable and consistent
  • Automated tests and testing platforms can be easily scaled as the code base grows
  • Automated tests can be executed concurrently against many environments
  • Automated tests can provide rapid feedback on system/code changes

Of course, the same isn’t true for manual testing

  • Manual tests can be unpredictable
    • different testers may produce different results
    • testers may use workarounds to avoid some bugs
  • Manual testing isn’t scalable – employing more people is the only option
  • An individual tester can only (sensibly) test one environment at a time
  • Manual tests are slow – feedback might take days, weeks or months

But these weren’t the right answers.

We couldn’t understand why.

So, let’s have a closer look at automation…

What is automation?

Automation, by definition, is:

the technique, method, or system of operating or controlling a process by highly automatic means, as by electronic devices, reducing human intervention to a minimum

Originally used circa 1940, the word was an irregular formation combining “automatic” and “action”, but process automation had become a well established practice long before then.

Everything from manufacturing and agriculture to construction and transportation – in modern history, humans have automated nearly every aspect of their lives.

On an industrial scale the benefits are immediately obvious. A farmer wouldn’t employ manual labour to plough fields any more than a car manufacturer would employ manual labour to assemble cars.

The return on investment (ROI) for the automation of these processes is clear – though of course it wasn’t in the decades leading up to the industrial revolution.

Why do we automate things?

There are many reasons to automate processes – some are purely economic while others are psychological.

  • Simple or boring tasks (paying bills)
  • Time consuming tasks (washing dishes)
  • Beyond our physical capability (lifting shipping containers)
  • To reduce cost (human labour)
  • To reduce risk (bomb disposal robots)

In all cases, our ability to automate something is limited by our mental capacity to perform that task. We can only automate the things we understand, that are simple enough and repeatable enough. We can’t easily automate tasks requiring creativity or emotion.

For example, we can easily automate opening a shop door (we could do it manually), but we would find it difficult to automate brain surgery (most humans couldn’t do it even manually) or software engineering (many have tried).

But sometimes, even when a process can be automated, we decide not to.

Why do we choose not to automate things?

Just as there are some processes we would like to automate, but can’t – there are some processes we could automate, but don’t:

  • Things we enjoy doing
  • Economic cost, e.g. R&D investment is too high or unpredictable
  • Social cost, e.g. unemployment and poverty

Just as our ability to automate is limited by our mental capacity to perform a task, our ability to choose not to automate is limited by our physical capacity to perform it (we wouldn’t even consider using human labour to lift a shipping container).

How we decide what to automate

Deciding whether we automate a process comes down to a cost-benefit analysis, determining if the investment required (whether an economic, physical or psychological investment) is worth the benefit we get in return.

As with all cost-benefit analyses, the time-frame over which we calculate the costs and benefits can have a considerable impact on the ROI.

For example, if Ford had only planned to make 1000 cars over a 2 year time frame, then it would be obvious that the ROI on designing, building, testing and deploying an automated car manufacturing process would be terrible, and would probably result in a net loss (or bankruptcy) for the company.

But if Ford wanted to continue producing cars – maybe another 350 million cars over 109 years – then the ROI becomes far more appealing.

Although the up-front investment in research and development is high, the long-term benefit of this is exponentially higher, ultimately making Ford one of the world’s leading car manufacturers and forging the modern automotive industry.

Why software testing is no different

Just like agriculture and manufacturing, automating software testing comes with a high initial (and sometimes on-going) cost:

  • Developers and testers need to learn how to write automated tests
  • Test suites need to be written and maintained
  • An automated testing platform must be created

And just like agriculture and manufacturing, some of it doesn’t need automation (or can’t be automated):

  • If it’s throwaway/one-use code
  • Exploratory testing which requires creativity
  • Visual testing (does it look/feel right)

But in most cases, well written automated tests provide a level of confidence unmatched by manual testing:

  • Entire system components can be updated or replaced efficiently
  • Codebases can be safely refactored
  • Integration and release can be automated
  • Fixed defects can’t regress
  • More platforms can be tested (desktop, web, mobile, etc)

By developing an automated testing suite, testing resources can then be reallocated to more productive work:

  • Improving test coverage
  • Collaborating with developers
  • Exploratory and visual testing
  • Accessibility testing

So, what was the answer?

It certainly won’t be “because we should” or “it’s the right thing to do”, or even “it’ll reduce defects” or “it’ll improve code quality”.

It will come down to proving, through cost-benefit analysis, that the investment in automated testing provides a strong enough ROI. This will largely depend on the time frame used for the ROI calculation.

If the focus is short-term (“we want a great product now”) then any further investment in test automation will yield no value, and manual testing is the only choice.

But if the focus is long-term (“we still want a great product in 5 years”) then test automation is invaluable (supplemented with appropriate manual testing), and provably so in any cost-benefit analysis.

Is there a middle ground?

The middle ground does seem attractive:

  • Manual testing to get a quick delivery
  • Automate tests longer-term

It seems to promise a good short and long term ROI. We get our quick delivery, to an acceptable standard. We also get our test automation. And eventually we get a high quality product.

But until the test automation happens, developers are constrained by the existing codebase:

  • refactoring becomes difficult or impossible
  • updating components carries significant risk
  • minor changes, bug fixes and features take an inordinate amount of time to develop, test and deploy

This has a substantial consequence for the product or service being delivered:

  • If test automation never happens (no time is made available), the entire product will suffer and eventually adding new features or fixing bugs will become impossible
  • If test automation happens (quickly), new features will be held up while test suites are automated, delaying the creation of business value

Either way, the middle ground eventually becomes technical debt, and the short term business value gained through a reduction in the initial investment must eventually be repaid (through reduced longer-term value).

A cost-benefit analysis

Many cost-benefit analyses of test automation have already been carried out, so I’m not going to write “Yet Another Cost-Benefit Analysis” – but here are a few links instead:


Given the historical importance of process automation throughout the industrial revolution, the rapid improvement to standards of living that we’re still benefiting from today, and the significant expansion of the human race as a result of the earliest technological automation, it seems counter-intuitive to even question the value in automating software testing.

Though I agree that some tests shouldn’t be automated (or can’t be), when products or services are expected to have a long “shelf-life”, test automation becomes the only sensible solution.

It’s also important to consider the human element in any cost-benefit analysis.

Testers and developers, like anyone else, get bored easily when tasked with simple and repetitive work. If we have the opportunity to automate this work, we leave humans with the more complex and creative work – the stuff we’re really good at, the stuff that we can’t automate, and the stuff that’s just more satisfying to do.


Building a scalable sequence generator (in Scala)

Building a scalable sequence generator was more difficult than I’d anticipated.

The challenge

  • Build a scalable sequence generator (must scale out and provide resilience)
  • Master sequence number is stored in MongoDB, updated atomically using find and modify
  • Sequence numbers must never be repeated (but strict ordering isn’t required)

The problem

Since the sequence number is a single value stored in a single document in a single collection, the document gets locked on every request. MongoDB can’t help with scaling:

  • Starting multiple instances of our sequence generator doesn’t help – they all need to lock the same document
  • Adding more MongoDB nodes doesn’t help either – we’d need replica acknowledged write concern to avoid duplicate sequence numbers

The solution

The solution is to take batches of sequence numbers from MongoDB, multiplying the scalability – for example, using a batch size of 10 means we can run (approximately) 10 instances of our sequence generator to our 1 MongoDB document, though any instance failure could waste up to 10 sequence numbers.

Using batches also dramatically improves our performance – we make far fewer MongoDB requests, generating less network traffic and reducing service response times.
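
To make the batching arithmetic concrete, here is a minimal single-process sketch in shell. It is only an illustration under stated assumptions: the master counter is a plain variable standing in for the MongoDB document, the function and variable names are hypothetical, and everything runs synchronously, so it shows the bookkeeping and none of the asynchronous behaviour.

```shell
#!/usr/bin/env bash
# master_seq stands in for the sequence number stored in MongoDB.
master_seq=0
batch_size=10
local_seq=0
local_max=0

# Reserve a batch: one simulated "find and modify" moves the master on by batch_size.
new_batch() {
  master_seq=$((master_seq + batch_size))
  local_max=$master_seq
  local_seq=$((local_max - batch_size))
}

# Hand out the next number, reserving a new batch only when the current one is used up.
# The result is left in $next so no state is lost to a subshell.
next_seq() {
  if [ "$local_seq" -ge "$local_max" ]; then
    new_batch
  fi
  local_seq=$((local_seq + 1))
  next=$local_seq
}
```

With a batch size of 10, 25 calls to next_seq touch the master counter only three times, leaving it at 30. If the instance stopped at that point, the reserved numbers 26 to 30 would be wasted.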

The unscalable sequence generator

Building an unscalable sequence generator is easy. We can just find and modify the next sequence number, and MongoDB takes care of the rest.

An implementation might look a bit like this:

import scala.concurrent._
import ExecutionContext.Implicits.global

object UnscalableSequenceGenerator extends App {
  // the master sequence number
  var seq = 0

  def nextSeq : Future[Int] = Future { blocking {
    // pretend we're doing a find and modify asynchronously
    this.synchronized {
      seq = seq + 1
      seq
    }
  } }

  // simulate calling our HTTP service 100 times
  for(i <- 1 to 100) {
    nextSeq map { j =>
      // pretend we're doing something useful with the sequence number
      print(s"$j ")
      if(i % 10 == 0) println()
    }
  }

  // give the futures time to complete before the application exits
  Thread.sleep(1000)
}


Running that example produces output like this (the exact ordering of numbers may be different):

2 3 1 4 5 7 6 8 9 10 11 12 
14 13 16 17 15 19 18 21 22 20 24 23 25 
26 27 28 30 29 
31 32 34 33 36 35 37 38 39 40 41 43 42 
44 46 45 47 48 49 50 51 52 
53 55 54 56 57 58 60 59 
62 61 63 64 65 66 67 68 69 70 
71 72 74 73 75 76 78 77 80 79 81 
82 83 85 84 86 87 89 88 90 91 
93 95 92 96 97 94 99 98 100 

No duplicates, but it’s not scalable, and the performance is terrible.

Making it scalable

To make it scalable (and get a performance boost), we can use sequence number batches. But that turned out to be more difficult than I’d expected.

The first attempt looked a bit like this:

import scala.concurrent._
import ExecutionContext.Implicits.global

object BatchedSequenceGenerator extends App {
  // the master sequence number and batch size
  var seq = 0
  val batch_size = 10

  // our current sequence and maximum sequence numbers
  var local_seq = 0
  var local_max = 0

  def newBatch : Future[Int] = Future { blocking {
    // pretend we're doing a find and modify asynchronously
    this.synchronized {
        seq = seq + 10
        seq
    }
  } }

  def nextSeq : Future[Int] = {
    if(local_seq >= local_max) {
      // Get a new batch of sequence numbers
      newBatch map { new_max =>
        // Update our local sequence
        local_max = new_max
        local_seq = local_max - batch_size
        local_seq = local_seq + 1
        local_seq
      }
    } else {
      // Use our local sequence number
      val next_seq = local_seq
      local_seq = local_seq + 1
      Future.successful(next_seq)
    }
  }

  // simulate calling our HTTP service 100 times
  for(i <- 1 to 100) nextSeq map { j =>
    // pretend we're doing something useful with the sequence number
    print(s"$j ")
    if(i % 10 == 0) println()
  }

  // give the futures time to complete before the application exits
  Thread.sleep(1000)
}


While it does at least take batches of sequence numbers, we get the following unexpected but understandable output:

11 1 41 61 71 91 21 31 121 181 191 131 141 151 161 171 201 211 221 
101 81 111 51 
231 251 241 261 271 281 291 301 
311 321 331 341 351 361 371 381 
391 401 411 421 441 431 451 461 471 
481 491 501 511 521 531 541 551 
561 571 581 591 601 611 621 631 641 651 661 671 681 701 701 731 731 721 
741 751 761 791 781 771 801 811 821 831 841 
861 871 851 881 891 911 
901 921 931 951 961 941 971 
991 981

We’re only using 1/10th of each batch, and we get to 991 in only 100 requests. It’s no more scalable than the unbatched version.

It should probably have been obvious, but the problem is caused by requests arriving between requesting a new batch and getting a response:

  • The 10th request gives out the last local sequence number
  • The 11th request gets a new batch asynchronously
  • The 12th request arrives before we get a new batch, and requests another new batch asynchronously
  • We get the 11th request batch, reset our sequence numbers and return a sequence
  • We get the 12th request batch, and again reset our sequence numbers and return a sequence, wasting the rest of the previous batch

To fix it, we need the 12th request to wait for the 11th request to complete first.

Making it work

This was the tricky bit – implementing it led me down a road of endless compiler errors, but the idea was simple.

When we call nextSeq, we need to know if a new batch request is pending. If it is, instead of requesting a new batch, we need to wait for the existing request to complete, otherwise handle the request as normal.

We can do this by chaining futures together, keeping track of whether a batch request is currently in progress.

It’s a fairly simple change to our batched sequence generator (or at least, in hindsight it is):

import scala.concurrent._
import ExecutionContext.Implicits.global

object BatchedSequenceGenerator extends App {
  // the master sequence number and batch size
  var seq = 10
  val batch_size = 10

  // our current sequence and maximum sequence numbers
  var local_seq = 0
  var local_max = 10
  var pending : Option[Future[Int]] = None

  def newBatch : Future[Int] = Future { blocking {
    // pretend we're doing a find and modify asynchronously
    this.synchronized {
      seq = seq + batch_size
      seq
    }
  } }

  def nextSeq : Future[Int] = this.synchronized {
    pending match {
      case None =>
        if(local_seq >= local_max) {
          // Get a new batch of sequence numbers
          pending = Some(newBatch map { new_max =>
            // Update our local sequence
            local_max = new_max
            local_seq = local_max - batch_size + 1
            local_seq
          })
          // Clear the pending future once we've got the batch
          pending.get andThen { case _ => pending = None }
        } else {
          // Use our local sequence number
          local_seq = local_seq + 1
          val seq = local_seq
          Future.successful(seq)
        }
      case Some(f) =>
        // Wait on the pending future, then try again
        f flatMap { _ => nextSeq }
    }
  }

  // simulate calling our HTTP service 100 times
  for(i <- 1 to 100) nextSeq map { j =>
    // pretend we're doing something useful with the sequence number
    print(s"$j ")
    if(i % 10 == 0) println()
  }

  // give the futures time to complete before the application exits
  Thread.sleep(1000)
}


And running that example generates output like this:

3 5 6 2 7 8 9 10 
4 1 13 11 12 14 15 17 19 20 
16 18 23 21 24 26 27 28 29 30 22 
25 34 35 31 33 37 38 39 40 
32 36 45 41 44 46 47 48 43 
49 50 42 52 53 55 51 60 54 56 
57 58 59 62 
64 70 63 61 65 66 67 68 69 72 75 71 
73 74 76 77 78 80 79 82 83 85 84 86 87 81 
89 88 90 92 95 93 99 98 100 97 
96 91 94 

The changes we made are straightforward:

  • When we request a new sequence number, check if a pending future exists
    • If it does, wait on that and return a new call to nextSeq
    • If not, check if a new batch is required
      • If it is, store the future before returning
      • If not, use the existing batch as normal

A limitation of this approach – if we have a sufficiently small batch size with a high volume of requests, the considerable number of chained futures could potentially cause out of memory errors.

Getting it to work felt like an achievement, but I’m still not happy with the code. It looks like there should be a nicer way to do it, and it doesn’t feel all that functional, but I can’t see it yet!


Quick start with Perl and Mojolicious

To get started with Mojolicious, just as quick and dirty as with Scala and Play Framework, you need only these:

Once they’re all installed, it’s this easy:

  • Open Git Bash
  • Clone my Mojolicious/Perl vagrant repository: git clone mojoserver
  • Change directory: cd mojoserver
  • Start the virtual machine: vagrant up (might take a while, installing Perl is slow!)
  • Once it’s complete, connect using SSH: vagrant ssh
  • Create a new Mojolicious app: mojo generate app MyApp
  • Change directory: cd my_app
  • Start your application: ./script/my_app daemon
  • View your new Mojolicious site in a browser: http://localhost:3000

It installs the latest version of Mojolicious and Mango along with Perl 5.18.2 and cpanminus using Perlbrew.

To help you get started, the Vagrantfile also installs MongoDB and sets up port forwarding for ports 3000 and 8080 (Mojolicious with Morbo and Hypnotoad) and port 27017 (MongoDB).


Quick start with Scala and Play Framework

For the quick and dirty way to get Play Framework up and running, you need only these:

Once they’re all installed, it’s this easy:

  • Open Git Bash
  • Clone my Play/Scala vagrant repository: git clone playserver
  • Change directory: cd playserver
  • Start the virtual machine: vagrant up
  • Once it’s complete, connect using SSH: vagrant ssh
  • Create a new play app: play new MyApp
  • Change directory: cd MyApp
  • Start your application: play run
  • View your new Play site in a browser: http://localhost:9000

If you want to edit your Play project in IntelliJ IDEA, create the project files from the command line using play gen-idea.

To help you get started, the Vagrantfile also installs MongoDB and sets up port forwarding for port 9000 (Play) and port 27017 (MongoDB).


Globally handling OPTIONS requests in Play Framework

If you’re using AJAX to talk to a Play Framework application, you’ll probably need to respond to OPTIONS requests and might need to return the correct access control (CORS) headers.

In a controller, we can easily define a handler to accept OPTIONS requests:

// Note: browsers reject a wildcard origin on credentialed requests;
// echo the request's Origin header instead if you need credentials.
def headers = List(
  "Access-Control-Allow-Origin" -> "*",
  "Access-Control-Allow-Methods" -> "GET, POST, OPTIONS, DELETE, PUT",
  "Access-Control-Max-Age" -> "3600",
  "Access-Control-Allow-Headers" -> "Origin, Content-Type, Accept, Authorization",
  "Access-Control-Allow-Credentials" -> "true"
)

def options = Action { request =>
  NoContent.withHeaders(headers : _*)
}

And we can call our new options handler from our routes file, but this has a few problems. We either need to implement an options handler for every route, or we send the same response whatever route we have, even if it doesn’t exist.


If you want to respond on a per-route basis, that typically requires one additional line in your routes file for every route you define:

GET / controllers.Application.index
OPTIONS / controllers.Application.options


Or, if you don’t mind sending the same headers back for every OPTIONS request (even if the route doesn’t really exist), there’s a cheat:

OPTIONS / controllers.Application.rootOptions
OPTIONS /*url controllers.Application.options(url: String)

and change your controller options handler to:

def rootOptions = options("/")
def options(url: String) = Action { request =>
  NoContent.withHeaders(headers : _*)
}

You can still override the global OPTIONS per-route by adding additional routes before the wildcard, for example:

OPTIONS /foo controllers.Application.someCustomOptions
OPTIONS / controllers.Application.rootOptions
OPTIONS /*url controllers.Application.options(url: String)
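
To see why the order matters, here is a simplified model of first-match routing. It is only a sketch and not Play's router: the Route class, the matches helper and the resolve function are hypothetical names, and the wildcard handling assumes (as the separate rootOptions route suggests) that /*url needs at least one character after the slash.

```scala
object RouteOrder {
  // A route is (method, pattern, handler name). A pattern ending in "*url"
  // matches any longer path with that prefix, like the /*url wildcard above.
  final case class Route(method: String, pattern: String, handler: String)

  val routes = List(
    Route("OPTIONS", "/foo", "someCustomOptions"),
    Route("OPTIONS", "/", "rootOptions"),
    Route("OPTIONS", "/*url", "options")
  )

  private def matches(pattern: String, path: String): Boolean =
    if (pattern.endsWith("*url")) {
      val prefix = pattern.stripSuffix("*url")
      // The wildcard must capture at least one character
      path.startsWith(prefix) && path.length > prefix.length
    } else pattern == path

  // The first matching route in declaration order wins
  def resolve(method: String, path: String): Option[String] =
    routes.find(r => r.method == method && matches(r.pattern, path)).map(_.handler)
}
```

With this ordering, /foo reaches the custom handler, / reaches rootOptions, and everything else falls through to the wildcard.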

Or we can abuse Play Framework – the best way!

Play Framework doesn’t like to expose its routing, making it difficult to inspect the routing table once it’s been created. But it is possible! Doing that, we can globally handle OPTIONS requests but dynamically respond based on URL (or even other request parameters).

For this example, we’ll work out the Allow header so we can return a 204 response if the route would normally exist, but a 404 response if it wouldn’t.

This is the example routes file:

GET /           controllers.Application.index
GET /foo
OPTIONS /       controllers.Application.rootOptions
OPTIONS /*url   controllers.Application.options(url: String)

When sending OPTIONS requests, we want to respond with 204 and Allow: GET, OPTIONS for / and /foo, but respond with 404 for everything else.

Getting the methods available for a URL

Play Framework gives us a convenient function – handlerFor – which is normally used to route requests to a handler. For this to work, you’ll need to add an import:

import play.api.Play.current

We can then define a getMethods function, which given a request will return a list of available methods. It does this by asking Play Framework to route new requests with modified method parameters. If a handler is found, the method is added to the list. The list is also cached for future requests.

val methodList = List("GET", "POST", "PUT", "DELETE", "PATCH")
def getMethods(request: Request[AnyContent]) : List[String] = {
  Cache.getOrElse[List[String]]("options.url." + request.uri) {
    // Probe the router once per method, copying everything else from the real request
    for(m <- methodList; if Play.application.routes.get.handlerFor(new RequestHeader {
      val remoteAddress = request.remoteAddress
      val headers = request.headers
      val queryString = request.queryString
      val version = request.version
      val method = m
      val path = request.path
      val uri = request.uri
      val tags = request.tags
      val id: Long = request.id
    }).isDefined) yield m
  }
}

We can then update our options action to use the new method list:

def options(url: String) = Action { request =>
  val methods = List("OPTIONS") ++ getMethods(request)
  if(methods.length > 1)
    NoContent.withHeaders(List("Allow" -> methods.mkString(", ")) : _*)
  else
    NotFound
}

We add OPTIONS back in, and if we have more than one method we return the Allow header, otherwise a 404 response.
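
The same logic can be tried out without Play. This is a minimal sketch under some assumptions: the routes set is a hypothetical stand-in for the routing table, a mutable map stands in for Play's Cache (and is not thread-safe), and the response is reduced to a status code plus headers.

```scala
object AllowHeader {
  // Hypothetical route table standing in for Play's router: (method, path) pairs
  val routes: Set[(String, String)] = Set(("GET", "/"), ("GET", "/foo"))

  val methodList = List("GET", "POST", "PUT", "DELETE", "PATCH")

  // Stand-in for the cache; fine for a demo, not safe for concurrent use
  private val cache = scala.collection.mutable.Map.empty[String, List[String]]

  // Probe each method in turn and keep the ones that have a route
  def getMethods(path: String): List[String] =
    cache.getOrElseUpdate(path, methodList.filter(m => routes.contains((m, path))))

  // Returns (status code, headers)
  def options(path: String): (Int, Map[String, String]) = {
    val methods = "OPTIONS" :: getMethods(path)
    if (methods.length > 1) (204, Map("Allow" -> methods.mkString(", ")))
    else (404, Map.empty)
  }
}
```

Here AllowHeader.options("/foo") gives a 204 with an Allow header listing OPTIONS and GET, while an unknown path gives a 404 with no headers.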

We could instead cache the entire response for a given URI, but caching just the method list gives us the flexibility to set other headers which may be more dynamic, for example Last-Modified. Even the current caching might be too restrictive if the available methods depend on other request parameters.


Action composition in Play Framework

Action composition in Play Framework is an incredibly powerful way to enhance or restrict controller behaviour, for example to implement authentication or authorisation controls, set default headers, or handle OPTIONS requests.

But typical action composition can be messy. Using action builders, we can simplify the process – and you may have already used them without realising it!

You’ve probably seen code like this before; it’s pretty standard stuff:

def index = Action { request =>
  Ok(views.html.index("Your new application is ready."))
}

And if you’ve used Play Framework asynchronously, maybe something like this:

def index = Action.async { request =>
  doSomething map { result =>
    Ok(views.html.index("Your new application is ready."))
  }
}

You can also easily parse the request using a different content type (or “body parser”), for example using JSON:

def index = Action.async(parse.json) { request =>
  doSomething map { result =>
    Ok(Json.obj(result -> "Your new application is ready."))
  }
}

All of these use the Action action builder (that is, the Action object, which is an action builder).

By creating a new action builder, we can create a drop-in replacement for the Action calls (both Action and Action.async), while still supporting the body parser parameter.

Creating a new action builder

Since Action is just an implementation of ActionBuilder[Request], we can extend ActionBuilder to use in place of Action.

ActionBuilder requires that we implement invokeBlock, and that’s where the magic happens. This is a bare minimum implementation, and it’s exactly what Action does for us already.

invokeBlock takes two parameters: the first is the incoming request, and the second is the function body, which takes a Request[A] as a parameter and returns a Future[SimpleResult].

object Interceptor extends ActionBuilder[Request] {
  def invokeBlock[A](request: Request[A], block: (Request[A]) => Future[SimpleResult]) = block(request)
}

It doesn’t do much (in fact, nothing different from Action), but now we can use that in our controller instead:

def index = Interceptor.async(parse.json) { request =>
  doSomething map { result =>
    Ok(Json.obj(result -> "Your new application is ready."))
  }
}

And it works using the same syntax:

def index = Interceptor { request => Ok }
def index = Interceptor.async { request => future { Ok } }
def index = Interceptor(parse.json) { request => Ok }
def index = Interceptor.async(parse.json) { request => future { Ok } }
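
To see why a single invokeBlock is enough to support both forms, here is a stripped-down model of the idea in plain Scala. It is only a sketch: Request, Result and this ActionBuilder trait are simplified stand-ins written for the example (no body parsers, no Play types), not Play's real classes.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object BuilderModel {
  // Simplified stand-ins for Play's Request and SimpleResult
  final case class Request[A](headers: Map[String, String], body: A)
  final case class Result(status: Int, body: String = "")

  // In this model an action is just a function from request to future result
  type Action[A] = Request[A] => Future[Result]

  trait ActionBuilder {
    def invokeBlock[A](request: Request[A], block: Request[A] => Future[Result]): Future[Result]

    // Asynchronous form: hand the block straight to invokeBlock
    def async[A](block: Request[A] => Future[Result]): Action[A] =
      request => invokeBlock(request, block)

    // Synchronous form: lift the result into a Future, then reuse async
    def apply[A](block: Request[A] => Result): Action[A] =
      async[A](request => Future.successful(block(request)))
  }

  // The bare minimum builder: run the block unchanged
  object Interceptor extends ActionBuilder {
    def invokeBlock[A](request: Request[A], block: Request[A] => Future[Result]): Future[Result] =
      block(request)
  }

  // Helper for trying an action out synchronously
  def run[A](action: Action[A], request: Request[A]): Result =
    Await.result(action(request), 5.seconds)
}
```

Both Interceptor { ... } and Interceptor.async { ... } end up in invokeBlock, so anything added there applies to every action built with it.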

Intercepting requests

There are many reasons to intercept a request before it reaches your controller – authentication, authorisation, rate limiting or performance monitoring. For this example, we’ll use authentication.

There are also many ways to authenticate a request – using headers, cookies, etc. – and while this is one way you certainly wouldn’t do it, it works for a demo:

object Authenticated extends ActionBuilder[Request] {
  def invokeBlock[A](request: Request[A], block: (Request[A]) => Future[SimpleResult]) = {
    if(request.headers.get("Authorization").isDefined) block(request)
    else future { Results.Status(Status.UNAUTHORIZED) }
  }
}

This very simple example checks for an Authorization header.

If it’s there, it calls block(request) and request processing continues as expected (don’t confuse the word “block” to mean the request gets blocked, we’re actually executing the code block or function body we were passed earlier).

If the Authorization header isn’t found, it returns an Unauthorized (401) response, using Results.Status().

At this point we could have returned any Future[SimpleResult] we like. We could look up data in memcached, MongoDB or call a remote API using OAuth2 – and either let the request continue, or return an appropriate response instead.

But this isn’t ideal – we’ve got our action builder sending a response to the client. We need to pass that responsibility back to the controller.

Passing user context

So far, we’ve intercepted the request using action composition, but once we get to the controller code we don’t know who the user is, and we have no way to tell whether the user’s identity could be established.

There are plenty of ways to fix this – some of them documented in the Play Framework action composition documentation – but we’ll go with wrapping the request class.

This has the advantage of keeping all existing code ‘compatible’ – we can simply search for Action, replace it with Authenticated, and every endpoint is protected.

For a comparison, here’s what the controller would have looked like if we’d wrapped the action:

def index = Authenticated { user =>
  Action { request => Ok }
}

And here’s what we’re going to create with our custom request class:

def index = Authenticated { request => Ok }

Wrapping the request class

To wrap the request class, and be able to access the user object from our controllers without casting the request first, we need to create a new trait and a new object.

The trait AuthenticatedRequest simply extends Request and adds a user value:

trait AuthenticatedRequest[+A] extends Request[A] {
  val user: Option[JsObject]
}

The object AuthenticatedRequest is similar to play.api.mvc.Http.Request – except we copy the existing request, and add the user value:

object AuthenticatedRequest {
  def apply[A](u: Option[JsObject], r: Request[A]) = new AuthenticatedRequest[A] {
    // Copy everything from the wrapped request...
    def id = r.id
    def tags = r.tags
    def uri = r.uri
    def path = r.path
    def method = r.method
    def version = r.version
    def queryString = r.queryString
    def headers = r.headers
    lazy val remoteAddress = r.remoteAddress
    def username = None
    val body = r.body
    // ...and add the user
    val user = u
  }
}

Next, we need to change our call to block(request) to pass through our new AuthenticatedRequest object.

To make this work, we also need to change some of the Request types to AuthenticatedRequest in our Authenticated object. We’ve also let the request continue even without a valid user – we can use this from a controller to know the user’s identity couldn’t be established.

Here it is in full:

object Authenticated extends ActionBuilder[AuthenticatedRequest] {
  def invokeBlock[A](request: Request[A], block: (AuthenticatedRequest[A]) => Future[SimpleResult]) = {
    if(request.headers.get("Authorization").isDefined)
      block(AuthenticatedRequest[A](Some(Json.obj()), request))
    else
      block(AuthenticatedRequest[A](None, request))
  }
}

Notice that invokeBlock still expects a Request[A] as its request parameter, but now the block parameter defines a function with an AuthenticatedRequest[A] parameter instead.

For now, our user is nothing more than an empty JsObject. In a real application, it could be any type (not just JsObject), and the value for user could come from anywhere (e.g. a database, session, OAuth2, etc).

When authentication fails, we pass through None, letting the controller know that no user could be found.
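
The wrapping idea can also be modelled in plain Scala. This is a sketch under some assumptions: Request and Result are simplified stand-ins rather than Play's classes, User is a hypothetical type used in place of JsObject, and the run helper exists only to make the example easy to try.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object WrappedRequestModel {
  // Simplified stand-ins for Play's Request and SimpleResult
  class Request[A](val headers: Map[String, String], val body: A)
  final case class Result(status: Int, body: String = "")

  // Hypothetical user type; the post uses Option[JsObject] instead
  final case class User(name: String)

  // The wrapper copies the original request and adds the user
  class AuthenticatedRequest[A](val user: Option[User], r: Request[A])
      extends Request[A](r.headers, r.body)

  object Authenticated {
    def invokeBlock[A](request: Request[A],
                       block: AuthenticatedRequest[A] => Future[Result]): Future[Result] = {
      // Same toy check as the post: any Authorization header counts as a user
      val user = request.headers.get("Authorization").map(_ => User("demo"))
      block(new AuthenticatedRequest(user, request))
    }

    def async[A](block: AuthenticatedRequest[A] => Future[Result]): Request[A] => Future[Result] =
      request => invokeBlock(request, block)
  }

  // A "controller" that reads request.user directly, with no casting
  val index: Request[String] => Future[Result] =
    Authenticated.async[String] { request =>
      Future.successful(request.user match {
        case Some(u) => Result(200, u.name)
        case None    => Result(401)
      })
    }

  def run(headers: Map[String, String]): Result =
    Await.result(index(new Request(headers, "")), 5.seconds)
}
```

Because the controller receives None rather than a ready-made 401, it stays in charge of what an anonymous request gets back.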

From our controller, we can now access the user object with request.user:

def index = Authenticated.async { request =>
  future { Ok(request.user.get) }
}


So far we’ve seen how to require authentication using an action builder. While this is useful if your app uses an ‘all or nothing’ security model, this isn’t particularly useful for authorisation, for example in role based security.

There are two easy ways to solve this problem:

Create another action builder to wrap our Authenticated builder

We can wrap our Authenticated action builder with another builder, giving us controller code that might look something like this:

def index = Authorised(roles = List("")) { request => Ok }

This is very straightforward, but creates an unnecessary dependency between your authorisation and authentication code.

Use normal action composition to require authorisation

Using action composition, we might end up with code that looks like this:

def index = Authenticated { request =>
  Authorised(roles = List("")) { Ok }
}

And we can also do this:

def index = Authorised(roles = List("")) { request => Ok }

The downside to normal action composition is obvious once you start using different body parsers or asynchronous operations:

def index = Authenticated(parse.json).async { request =>
  Authorised(parse.json).async(roles = List("")) {
    future { Ok(request.user.get) }
  }
}

On every nested action we are required to redeclare the body parser and call async.

But this is Scala – there’s a nicer way!

Neither of those examples is particularly suitable. Neither will cleanly handle a negative outcome (either needing some messy code or hiding the unauthorised response away in a helper class), and neither of them nests well.

Whenever you need to check authorisation, the outcome is normally one of two things – in the case of a web application, it’s likely that both of them will end in returning some content to the user.

In true MVC style, this shouldn’t be the responsibility of the authorisation code. It should be in the controller.

We could just use an if/then/else statement, but I like something a bit cleaner:

def index = Authenticated { request =>
  Authorised(request, request.user) {
    Ok
  } otherwise { Unauthorized }
}

And our Authorised implementation is simple. We provide Authorised and Authorised.async functions, and return an instance of our Authorised class providing an otherwise method.

object Authorised {
  def async[T](request: Request[T], user: Option[JsObject]) = {
    (block: Future[SimpleResult]) => new Authorised[T](request, user, block)
  }
  def apply[T](request: Request[T], user: Option[JsObject]) = {
    // Future.successful is already completed, so the synchronous otherwise can read its value
    (block: SimpleResult) => new Authorised[T](request, user, Future.successful(block))
  }
}

class Authorised[T](request: Request[T], user: Option[JsObject], success: Future[SimpleResult]) {
  def authorised = {
    // An already-completed future; a real check could be asynchronous
    Future.successful(user.isDefined)
  }
  def otherwise(block: Future[SimpleResult]) : Future[SimpleResult] = authorised.flatMap { valid => if (valid) success else block }
  def otherwise(block: SimpleResult) : SimpleResult = if(authorised.value.get.get) success.value.get.get else block
}

As with the Authenticated action builder, this implementation isn’t particularly secure. As long as the user is defined (which it will be if the Authorization header is set), then authorisation is successful.

In this example we’ve also passed the request object to the authorisation layer. It would be cleaner to abstract the request from our authorisation code using named roles or permissions.
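
As a rough idea of what that cleaner version might look like, here is a synchronous sketch in plain Scala that takes a named role instead of the request. Everything in it is an assumption made for illustration: Result is a simplified stand-in for SimpleResult, and the User type, its roles field and the "admin" role are hypothetical.

```scala
object AuthorisedModel {
  final case class Result(status: Int)
  // Hypothetical user type carrying a set of role names
  final case class User(roles: Set[String])

  // Holds the success branch until `otherwise` supplies the failure branch.
  // Both branches are by-name, so only the one that is needed gets evaluated.
  final class Authorised(allowed: Boolean, success: => Result) {
    def otherwise(failure: => Result): Result = if (allowed) success else failure
  }

  object Authorised {
    // The check sees a user and a role name, never the request itself
    def apply(user: Option[User], role: String)(success: => Result): Authorised =
      new Authorised(user.exists(_.roles.contains(role)), success)
  }

  // The controller decides what each outcome returns
  def dashboard(user: Option[User]): Result =
    Authorised(user, "admin") {
      Result(200)
    } otherwise {
      Result(403)
    }
}
```

The authorisation code only answers yes or no; the response for each outcome still lives in the controller.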

Action builder vs helper object

There’s a reason we’ve created authentication as an action builder but kept authorisation as a helper object.

Authentication is a one-time process (for each request) – it rarely needs to be done multiple times, and it rarely changes once the request has been received. Even if we can’t establish the identity of the user, that doesn’t necessarily mean authentication has failed or that the user has no permissions.

Authorisation can happen multiple times within one request, in one controller, and with distinct outcomes – for example, rendering a dashboard might perform multiple unique authorisation checks to determine appropriate components for a single page.

By keeping the authorisation check as a helper object, we can use it multiple times within a controller action, passing different parameters each time, and with the flexibility to control the outcome of each check.

For example, we could return NotFound for one level of authorisation failure, but Unauthorized for another:

def index = Authenticated(parse.json) { request =>
  Authorised(request, request.user) {
    Authorised(request, Some(Json.obj())) {
      Ok
    } otherwise { NotFound }
  } otherwise { Unauthorized }
}


We’ve now created an authentication and authorisation layer for our web application, which supports the same syntax as the built-in Play objects. For a simple application, that’s all you need – just wire in MongoDB or OAuth2!

The same idea can be applied to any other type of wrapper. You can even create action builders and helper objects which combine other wrappers, for example automatically applying authentication, authorisation and rate limiting through a single builder.

Going a bit further with Akka

While it would be easy to extend that example to look up users, roles and permissions in MongoDB, or restrict actions based on IP address, in Scala (and Play Framework) we can do things a little differently.

There’s no reason for authentication or authorisation to be tied to your application – they aren’t really a direct responsibility of your web application anyway, but they’re often left there for convenience.

Akka provides us with a framework to build distributed and concurrent applications – and we can keep our application concurrent and distributed right through to the authentication and authorisation layers. And better yet, it’s already used internally by Play Framework.

We’ll extend this example to use an actor for authentication and authorisation, giving us the flexibility to move our authentication code anywhere – even to a remote service with Akka remoting.

Creating an actor

For now, we’ll just keep our actor next to our action builder. Moving it around later is easy.

First we’ll create some basic case classes to communicate using Akka:

case class Authenticate[A](request: Request[A])
case class AuthenticationResult(valid: Boolean, user: Option[JsObject] = None)

We can extend these classes if we want to provide additional context or return additional information.

Here’s our basic actor – it implements a receive function to handle incoming messages:

class Authenticator extends Actor {
  def receive = {
    case Authenticate(request) =>
      if(request.headers.get("Authorization").isDefined)
        sender ! AuthenticationResult(valid = true, user = Some(Json.obj()))
      else
        sender ! AuthenticationResult(valid = false)
  }
}

Its only job is to return an AuthenticationResult depending on the request. It provides the same super strength foolproof security we had with our earlier example.

Using the actor for authentication

We need to get an instance of our actor for our Authenticated object. We’ll send Authenticate requests to the actor, and get an AuthenticationResult back:

val authenticationActor = Akka.system.actorOf(Props[Authenticator], name = "authentication")
implicit val timeout = Timeout(1 second)

We can now update our invokeBlock function to use the actor instead of including its own authentication logic. If authentication was successful, the actor returns the JsObject representing the user.

def invokeBlock[A](request: Request[A], block: (AuthenticatedRequest[A]) => Future[SimpleResult]) = {
  (authenticationActor ask Authenticate(request)).mapTo[AuthenticationResult] flatMap { result =>
    if(result.valid)
      block(AuthenticatedRequest[A](Some(result.user.get), request))
    else
      block(AuthenticatedRequest[A](None, request))
  } recover {
    // A timeout or actor failure ends up here
    case e => Results.Status(Status.INTERNAL_SERVER_ERROR)
  }
}

We can extend our authorisation class in exactly the same way, again sending the authorisation request to an actor. I’ll skip it here since the code is so similar to the authentication actor, but you can find it in the full source code on GitHub.
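
The flatMap and recover flow in invokeBlock doesn’t depend on Akka itself, so it can be tried out with plain futures. In this sketch the Authenticator type is a hypothetical stand-in for asking the actor (any function returning a Future of the reply), the user is a plain String rather than a JsObject, and Result is a simplified stand-in for SimpleResult.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AskFlowModel {
  implicit val ec: ExecutionContext = ExecutionContext.global

  final case class Result(status: Int)
  final case class AuthenticationResult(valid: Boolean, user: Option[String] = None)

  // Stand-in for `actor ask message`: headers in, future reply out
  type Authenticator = Map[String, String] => Future[AuthenticationResult]

  def invokeBlock(headers: Map[String, String], authenticate: Authenticator)
                 (block: Option[String] => Future[Result]): Future[Result] =
    authenticate(headers).flatMap { result =>
      if (result.valid) block(result.user) else block(None)
    }.recover {
      // A failed (or timed-out) reply becomes a 500 response
      case _ => Result(500)
    }

  // Toy authenticator: any Authorization header is accepted
  val headerCheck: Authenticator = headers =>
    Future.successful(
      AuthenticationResult(headers.contains("Authorization"), headers.get("Authorization")))

  // Simulates an actor that never replies successfully
  val broken: Authenticator = _ => Future.failed(new RuntimeException("no reply"))

  def run(headers: Map[String, String], authenticate: Authenticator): Result =
    Await.result(
      invokeBlock(headers, authenticate) { user =>
        Future.successful(Result(if (user.isDefined) 200 else 401))
      },
      5.seconds)
}
```

Note that recover also catches failures thrown by the block itself, which matches the behaviour of the version above.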


Action composition in Play Framework is surprisingly easy, and very powerful.

Using action builders combined with Scala’s concurrency and Akka’s distributed framework, it’s simple to keep your controller code clean without sacrificing security or scalability across your application.

Full source code for the Akka example can be found on GitHub.
