Using vagrant and fabric for integration tests

At CloudShare, our service is composed of many components, and whenever we change the code of a component we test it. We try to walk the thin line of balancing unit tests against integration tests (or system tests), so that we achieve reasonable code coverage, reasonable test run time and, most importantly, good confidence in our code.

Not long ago, we completely rewrote a component called the gateway. The gateway runs on a Linux machine and handles many parts of our internal routing, firewalling, NAT, network load balancing, traffic logging and more. It’s basically a router/firewall that is configured dynamically to impose different network-related rules according to the configuration it knows about. This is done by reconfiguring its (virtual) hardware and kernel modules. It is a Python application packaged and deployed using good old Debian packaging.

Before the rewrite, the gateway was ‘frozen’. By frozen I mean that nobody had the guts to change the code. It had no testing code, hence each change required a full manual, painful regression pass which could take a week at best.
We sat down and defined our goals. We wanted all developers to be able to run all the integration tests locally on their machines, easily. Easily means they don’t need to deploy anything after changing the code: just edit it in the IDE and rerun the tests. No committing, no repackaging, no deployment (we develop on Windows machines).
When running tests is not easy, you know what happens.


Improving integration testing:

We already knew that we needed to improve our unit testing. But integration testing? That’s a different story. How do you test that what you configured in the hardware and kernel actually accomplishes the networking magic you asked for?
Let’s think about how we would do it manually. Easy: use the whole assortment of networking tools Linux provides: ping, traceroute, tcpdump, netcat, etc. Naturally, this is what our QA engineers would do:
Deploy the new code.
Create a testbed composed of several machines connected to a gateway.
For every possible configuration the gateway gets, test all the networking features: what is supposed to flow, be blocked, go through NAT, be routed, etc.
This is quite a nightmare.
I would even dare to say that no QA engineer, however talented, could cover all the scenarios without leaving some areas untested or running out of (a reasonable amount of) time. Not to mention that we expect them to be a little more creative than just testing for regressions. This is exactly the kind of work for a machine, not a person.


We finally picked up the gauntlet and started thinking about how to make this possible. Vagrant, which was fairly new back then, was a natural fit. It allows us to create an environment of virtual machines connected by different virtual LANs. Vagrant also lets you mount host folders directly into the virtual machines it manages, which fulfilled our ‘testing easily’ requirement: if the code is already mounted on the Vagrant VM, there is no need to deploy it.

Here is the Vagrantfile defining our virtual environment:

Vagrant::Config.run do |config|
    config.vm.define :gateway do |gateway_config| = "gateway"
        gateway_config.vm.host_name = "gateway"
        gateway_config.vm.box_url = "http://FQDN…./"
        gateway_config.vm.network :hostonly, "", { :adapter => 2, :netmask => '' }
        gateway_config.vm.network :hostonly, "", { :adapter => 3, :auto_config => false }
        gateway_config.vm.share_folder "code", "/code", "../../..", :mount_options => ["dmode=755", "fmode=755"]
    end

    config.vm.define :tester1 do |config| = "tester"
        config.vm.host_name = "tester1"
        config.vm.box_url = "http://FQDN…../"
        config.vm.network :hostonly, "", { :adapter => 2, :netmask => '' }
        config.vm.network :hostonly, "", { :adapter => 3, :auto_config => false }
        config.vm.share_folder "code", "/code", "../tests"
    end

    config.vm.define :tester2 do |config| = "tester"
        config.vm.host_name = "tester2"
        config.vm.box_url = "http://FQDN…./"
        config.vm.network :hostonly, "", { :adapter => 2, :netmask => '' }
        config.vm.network :hostonly, "", { :adapter => 3, :auto_config => false }
        config.vm.share_folder "code", "/code", "../tests"
    end

    # … more tester machines defined here …
end


As you can see, the local sources are mounted into /code in the Vagrant virtual machines. The networking is also defined here: one network serves as the ‘physical’ network on which the integration tests configure VLANs (notice the :auto_config => false option), and the other is used for test code communication.
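Bringing the whole testbed up is just the standard Vagrant workflow; nothing here is specific to our setup:

vagrant up           # boots the gateway and all the tester VMs
vagrant ssh gateway  # opens a shell on the gateway, where the tests run

Because /code is a shared folder, every edit made on the host is immediately visible inside the VMs.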

What happens when a developer runs a test?
The test actually runs on the gateway virtual machine. It uses the locally mounted sources to create the application objects and calls them; then, using Fabric, it runs networking tools remotely on the ‘tester’ machines to ping/sniff/trace/accept all sorts of traffic that travels back and forth through the gateway machine.
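Concretely, the developer’s inner loop is something like the following (the exact test path here is hypothetical, for illustration):

vagrant ssh gateway -c 'cd /code && python -m unittest discover tests'

Edit the code in the IDE on the host, rerun the command, repeat. Nothing is committed, repackaged or deployed in between.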

A simple test example, heavily simplified:

import unittest

class TestVlansBase(unittest.TestCase):
    def setUp(self):
        # maps a tester machine name to the vlan interface created for it,
        # so tearDown can undo whatever the test configured
        self.dct_interfaces_to_remove = {}

    def tearDown(self):
        for tester_name, vlan in self.dct_interfaces_to_remove.iteritems():
            self._remove_interface_from_host(tester_name, vlan)

class TestVlans(TestVlansBase):
    def _test_connection(
            self, server_name, server_vlan, server_ip, server_port, protocol, client_name,
            client_dst_ip=None, client_dst_port=None):
        self.assertTrue(protocol in ('tcp', 'udp'), 'protocol should be tcp or udp')
        client_dst_ip = client_dst_ip or server_ip
        client_dst_port = client_dst_port or server_port
        if protocol == 'tcp':
            server = self._create_server(server_name, server_ip, server_port)
        elif protocol == 'udp':
            filter_exp = '{0} port {1}'.format(protocol, server_port)
            server = self._create_sniffer_on_host(server_name, server_vlan, filter_exp, 1)
        client = self._connect_to_host(client_name, client_dst_ip, client_dst_port, protocol)
        if server.runner.exitcode is None:
            # the process did not exit, hence no packets were seen; terminate
            # it so its exit code is non-zero and the assertion below fails
            server.runner.terminate()
            server.runner.join()
        self.assertEqual(client.runner.exitcode, 0)
        self.assertEqual(server.runner.exitcode, 0)

    def test_reroute_http_traffic(self):
        self._test_connection(
            'tester3', 93, '', 88, 'tcp', 'tester2', client_dst_ip='', client_dst_port=80)
        self._test_connection(
            'tester3', 93, '', 88, 'tcp', 'tester2', client_dst_ip='', client_dst_port=44444)
        self._test_connection(
            'tester3', 93, '', 88, 'tcp', 'tester2', client_dst_ip='', client_dst_port=88)
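To give a feel for what helpers like _connect_to_host do, here is a minimal sketch of a TCP connection check using Fabric directly. It is an illustration under assumptions: netcat (nc) is installed on the tester VMs, tester host names resolve from the gateway, and the function name is ours; the real helpers wrap the remote command in a process proxy, shown in the next section:

from fabric.api import execute, run, settings

def check_tcp_connect(host, dst_ip, dst_port):
    # run netcat in zero-I/O mode on `host`; exit code 0 means the
    # connection attempt through the gateway succeeded
    def _task():
        with settings(warn_only=True):  # don't abort fabric on failure
            return run('nc -z -w 1 {0} {1}'.format(dst_ip, dst_port)).return_code
    return execute(_task, hosts=[host])[host]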


All remote calls from the gateway (where the test is running) to the tester machines are performed using Fabric.

An example of a simple command being run in a test:

import multiprocessing
from abc import ABCMeta
from fabric.api import execute, run

class FabricProcessProxy(object):
    __metaclass__ = ABCMeta

    def __init__(self, *args, **kwargs):
        self.kwargs = kwargs
        self.args = args
        self.out_q = multiprocessing.Queue()
        # run the fabric task in a separate process and push its result
        # onto the queue
        self.runner = multiprocessing.Process(
            target=lambda: self.out_q.put(execute(, *self.args, **self.kwargs)))

    def execute(self, hosts):
        # inject the target hosts for fabric and return the (not yet started)
        # process; the caller starts and joins it
        self.kwargs['hosts'] = hosts
        return self.runner

    def run(self):
        raise NotImplementedError()

class Ping(FabricProcessProxy):
    def run(self, target, iface, count):
        # `run` here resolves to fabric's run(), executed on the remote host
        str_iface = '-I {0} '.format(iface) if iface else ''
        return run('ping -c {count} {iface}-W 1 {target}'.format(count=count, iface=str_iface, target=target))

    def _ping_from_host(self, host, dst_ip, through_iface=None, num_pings=1, b_verify_success=True):
        ping = Ping(dst_ip, through_iface, num_pings)
        runner = ping.execute([host])
        runner.start()
        runner.join()
        if b_verify_success:
            self.assertEqual(runner.exitcode, 0)
        return runner.exitcode

A typical call from inside a test:

self._ping_from_host('tester2', '')
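The same pattern extends naturally to other tools. For instance, a sniffer proxy for the UDP case in the test above could look roughly like this; the class name, the timeout and the tcpdump flags are our illustration, not the original code:

class Sniffer(FabricProcessProxy):
    def run(self, iface, filter_exp, count):
        # tcpdump exits with code 0 once `count` matching packets are
        # captured; the surrounding timeout makes it fail instead of
        # hanging forever when no traffic arrives
        return run('timeout 10 tcpdump -n -i {0} -c {1} {2}'.format(iface, count, filter_exp))

sniffer = Sniffer('eth0.93', 'udp port 88', 1)  # made-up interface and filter
runner = sniffer.execute(['tester3'])
runner.start()
# ... generate traffic from another tester machine ...
runner.join()
assert runner.exitcode == 0  # 0 means the expected packet was seen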

Ever since this infrastructure was created, we never looked back. The gateway we have today is a first-class citizen and as long as the tests pass, we have no fear of refactoring it, adding features and making any other changes.

Ido Barkan

Backend Team Leader @ CloudShare.

2 thoughts on “Using vagrant and fabric for integration tests”

    1. Hey Regis,
      I’ll try to answer briefly without throwing too many details at you.

      TLDR- don’t do it. Use dedicated python packaging tools (pip, cheeseshop, virtualenv, docker containers etc.). We will throw away our Debian packages for Python source code as soon as we can.

      We use Debian packages to package and deploy our Python code. This is a very heavyweight tool for solving the relatively easy problem of deploying Python packages. Since we use Chef to configure and deploy our machines, most of our Debian packages are empty skeletons, only dealing with which source files to package. Each code version has its number (the git commit hash), and when we need to deploy we just change the required version on the Chef server. Then Chef uses its apt cookbooks to upgrade the Debian packages. This implies that we need to maintain our own Debian repository and continuously build packages after we test the code (this is done with Jenkins).
