Pros and Cons of Different Approaches to Unit Testing External APIs

There are three basic approaches to writing unit tests for Python code that interacts directly with external APIs. Typically, that code involves functions that hide the details of an HTTP call.

I will present a simple example of one of those functions (our System Under Test (SUT)) and then discuss pros and cons of each of the following approaches:

  • Doing it live (no call interception)
  • Mocking - front door and back door
  • Fake server

I will do my best to steer away from prescribing how you should test and focus squarely on the technical trade offs.

Note: In real systems, a test case may combine any of these approaches. For example, consumer-driven contract testing sometimes makes use of all three in the test infrastructure. By keeping my idealized HTTP function example simple, I can focus on the tradeoffs of each approach.

Here’s a function which does a basic GET against a Hacker News API endpoint:

import urllib.request
import json

base_url = "https://hacker-news.firebaseio.com"
def get_item_by_id(item_id):
    url = "{}/v0/item/{}.json".format(base_url, item_id)
    with urllib.request.urlopen(url) as f:
        return json.loads(f.read().decode('utf-8'))

And here’s the current (at time of writing) JSON response from Hacker News:

{
    "by": "pg",
    "descendants": 15,
    "id": 1,
    "kids": [15,234509,487171,454426,454424,454410,82729],
    "score": 57,
    "time": 1160418111,
    "title": "Y Combinator",
    "type": "story",
    "url": "http://ycombinator.com"
}

Approach 1: Do it live

This can also be referred to as black-box testing or round-trip testing. If this function represents the outermost layer of your system, you can also call it end-to-end integration testing.

The main idea here is that you are exercising every layer of your system under test (SUT) and its interaction with the real external API through the SUT’s public interface.

import unittest

from main import get_item_by_id  # assuming the SUT above lives in a module named main

class HTTPTest(unittest.TestCase):
    def test_get_item_by_id(self):
        expected_response = {
            'by': 'pg',
            'descendants': 15,
            'id': 1,
            'kids': [15, 234509, 487171, 454426, 454424, 454410, 82729],
            'score': 57,
            'time': 1160418111,
            'title': 'Y Combinator',
            'type': 'story',
            'url': 'http://ycombinator.com'
        }
        self.assertEqual(get_item_by_id(1), expected_response)

Pros:

  • Low coupling to implementation details. You do not need to worry about the details of how this request happens inside your test, just that you get the expected response.
  • Less test code setup since you’re really only invoking the SUT through its public interface.
  • High release confidence. If this test passes, you can be pretty certain nothing is terribly wrong with this function.

Cons:

  • It’s fragile, or brittle. There are multiple dependencies outside of your control. As a result, the test can fail for reasons that have nothing to do with wrong code or broken API contracts. For example, maybe your internet is down, or maybe the API server is down. Maybe the response structure is the same but the values have changed.
  • Potentially slow. Network calls carry latency. This is even worse if the external API is a service that needs to be spun up or woken up for the sole purpose of running your test.

There are some situations when the cons are more acceptable. Just to list a couple:

  • When the number of tests making actual network calls is small and the environments you need to run your tests in have fast, reliable internet. Lots of personal side projects fit the bill.
  • When the tests are infrequently run. This is typical for end-to-end test suites on a project - they are usually run outside of the unit test loop. If you’re running these tests as part of your unit test suite, the speed hit you take may be hard to swallow.
  • When the API is very stable. Although even with stable APIs, there is typically variable, non-static data in the response. For example, maybe someone upvotes the Hacker News item we’re doing a GET request on, and now the score value is different even though the rest of the JSON is the same. However, this is an easier problem to solve than having the actual structure (keys) of the response change frequently.
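To soften the variable-data problem from that last point, a live test can assert on the stable structure of the response rather than on exact values. Here’s a minimal sketch; the helper name is my own, and the sample dict is the response captured earlier in the article:

```python
def assert_stable_structure(item, expected_id):
    # Pin down fields that should never change for this item...
    assert item['id'] == expected_id
    assert item['type'] == 'story'
    # ...but only check for the *presence* of volatile fields like
    # 'score', whose value moves with every upvote.
    for key in ('by', 'title', 'time', 'score'):
        assert key in item

# Checked against the response shown earlier:
sample = {
    'by': 'pg', 'descendants': 15, 'id': 1, 'score': 57,
    'time': 1160418111, 'title': 'Y Combinator', 'type': 'story',
    'url': 'http://ycombinator.com'
}
assert_stable_structure(sample, 1)
```

In a live test you would pass the result of get_item_by_id(1) instead of the canned sample.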

Approach 2: Mocking

In this approach, we step in while the SUT is being exercised and provide a canned response for our test. In other words, we never actually let the network call happen at all, because we’re mimicking some part of our program to pretend we already received a response from the network.

While mocking is only one of many ways of mimicking real objects, I picked it because it’s probably the most familiar word, and its cousins (spies, dummies, stubs) are similar enough in function for the purposes of communicating tradeoffs.

There are two distinct abstraction layers where we can step in with mock objects and interrupt our code:

  1. The topmost layer, or the level of the public API we are invoking. Also known as the front door.
  2. A lower layer, or some level beneath the public API we are invoking. Also known as the back door.

Front door mocking

import json
import unittest
from unittest.mock import patch

from main import get_item_by_id

class HTTPTest(unittest.TestCase):
    def test_get_item_by_id(self):
        expected_response = {
            'by': 'pg',
            'descendants': 15,
            'id': 1,
            'kids': [15, 234509, 487171, 454426, 454424, 454410, 82729],
            'score': 57,
            'time': 1160418111,
            'title': 'Y Combinator',
            'type': 'story',
            'url': 'http://ycombinator.com'
        }
        with patch('main.urllib.request.urlopen') as mock_url_open:
            # urlopen is used as a context manager, so the canned bytes
            # come back from __enter__().read()
            mock_url_open.return_value.__enter__.return_value.read.return_value = \
                json.dumps(expected_response).encode('utf-8')

            result = get_item_by_id(1)

            mock_url_open.assert_called_once_with("https://hacker-news.firebaseio.com/v0/item/1.json")
            self.assertEqual(result, expected_response)

Pros:

  • Fast. No more network calls therefore no latency.
  • Stable. You have full control over the canned network response. There’s no dependence on either the network or the external API.

Cons:

  • Lower confidence to release. You’re the one making up the canned response! You don’t really know for sure if it’s consistent with the actual response from the API.
  • More test code setup. Almost by definition of mocking, you need to introduce more testing code to run your test.
  • Fragile. There’s tighter coupling between the test code and the SUT. You care more about the details of the call. If the details change even though the SUT behavior remains unchanged (refactoring), this test will fail.

But wait - how can an approach be both stable and fragile at the same time?

It’s stable with respect to the external API, but fragile with respect to your actual code, or SUT.

This is a common trade-off with mocking. You decouple the test code from the unpredictable network and external API, but couple it to your own code.

Sometimes this is an acceptable tradeoff if you don’t think your SUT is going to experience much interface change. For example, maybe it’s a one-off project that’s unlikely to see any refactors over its useful life.

In most circumstances, however - especially when writing software for real, changing businesses - betting on no change is unwise.

This brings us to our next mocking subapproach: back door mocking.

Back door mocking

One way around the interface coupling with front door mocking is to mock at a lower level than the interfaces used by the SUT.

In this example, I’m going to lean on a library called HTTPretty that mocks the Python socket module. This reduces the coupling between the test function and the public interfaces used by our SUT, such as `urllib.request.urlopen`.

import json
import unittest

import httpretty

from main import get_item_by_id

class HTTPTest(unittest.TestCase):
    def test_get_item_by_id(self):
        httpretty.enable()

        expected_response = {
            'by': 'pg',
            'descendants': 15,
            'id': 1,
            'kids': [15, 234509, 487171, 454426, 454424, 454410, 82729],
            'score': 57,
            'time': 1160418111,
            'title': 'Y Combinator',
            'type': 'story',
            'url': 'http://ycombinator.com'
        }
        httpretty.register_uri(
            httpretty.GET,
            "https://hacker-news.firebaseio.com/v0/item/1.json",
            body=json.dumps(expected_response)
        )
        result = get_item_by_id(1)
        self.assertEqual(result, expected_response)

        httpretty.disable()
        httpretty.reset()

Note: you can substitute HTTPretty with any other back-door mocking library. For example, users of the requests library may want to use requests-mock, which mocks the adapter layer of requests. That’s a higher abstraction level than the socket module that HTTPretty mocks, but it’s the same idea, because it sits beneath the public API of requests.

Pros and cons?

This approach shares many of the same pros and cons as front-door mocking. As far as pros go, our test is still fast and stable relative to the external API. As far as cons go, we still don’t have great confidence.

The key difference between this back-door mocking approach and the front-door mocking approach is in the area of fragility relative to the SUT. This is the interface coupling problem I mentioned before with front-door mocking.

By using a back-door mocking approach, whether with a library like HTTPretty or rolling your own, you are in some ways making a bet. You’re betting that by shifting the mocking to a lower level, you gain more stability in your test and can more freely refactor without having to change your test code in lockstep.

The bet can pay off for you under a couple of conditions:

  1. The lower level code (such as the socket module which HTTPretty mocks) undergoes less change relative to the top-level interfaces used by your SUT. If you’re maintaining the socket-mocking code and for some reason it changes every day, you’re gonna be pissed.
  2. Updating lower-level mocks is someone else’s problem. There’s quite a bit of code that Gabriel Falcão needs to maintain for HTTPretty.

If you are rolling your own back-door mocking, then picking the appropriate level is key to improving the stability of your tests rather than making it worse. In reality though, most of us will reach for an open source library for back-door HTTP mocking and delegate the challenges of picking the appropriate sublayer to mock, and keeping it up to date, to other people.
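For a flavor of what rolling your own could look like, here’s a toy sketch that patches `urllib.request.OpenerDirector.open`, one level beneath the `urlopen` call our SUT uses (HTTPretty goes lower still, down to sockets). The canned payload is my own assumption, not the real API response:

```python
import io
import json
import urllib.request
from unittest.mock import patch

canned = {'id': 1, 'title': 'Y Combinator', 'type': 'story'}

def fake_open(self, fullurl, data=None, timeout=None):
    # Skip the network entirely and hand back a file-like object,
    # which is all urlopen's context-manager contract needs.
    return io.BytesIO(json.dumps(canned).encode('utf-8'))

# Any code path that funnels through urlopen now hits the stub,
# without the test ever mentioning urlopen itself.
with patch.object(urllib.request.OpenerDirector, 'open', fake_open):
    with urllib.request.urlopen('https://hacker-news.firebaseio.com/v0/item/1.json') as f:
        result = json.loads(f.read().decode('utf-8'))
```

Notice the test code names a lower-level seam (OpenerDirector) rather than the SUT’s own calls, which is exactly the bet described above.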

In practice, from my own experience, I’ve typically reached for libraries to aid with back-door mocking when it comes to unit testing my HTTP functions and can say that more often than not these open source maintainers have made my work easier. That said, my goal in highlighting these tradeoffs is to remind you that there are no free lunches or silver bullets.

Approach 3: Fake Server

This time, you step in as late as possible in the request-response chain by interrupting the flow after your application code has made a real request on the local network.

Here’s a snippet of a bare bones fake server:

import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

class FakeServerRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if re.search(r'/v0/item/\d+\.json', self.path):
            self.send_response(200)
            self.send_header('Content-Type', 'application/json; charset=utf-8')
            self.end_headers()
            response_content = json.dumps({
                'by': 'pg',
                'descendants': 15,
                'id': 1,
                'kids': [15, 234509, 487171, 454426, 454424, 454410, 82729],
                'score': 57,
                'time': 1160418111,
                'title': 'Y Combinator',
                'type': 'story',
                'url': 'http://ycombinator.com'
            })
            self.wfile.write(response_content.encode('utf-8'))

def start_server(port):
    fake_server = HTTPServer(('localhost', port), FakeServerRequestHandler)
    fake_server_thread = Thread(target=fake_server.serve_forever)
    fake_server_thread.daemon = True
    fake_server_thread.start()

And here’s how it gets used in a test:

import unittest
from unittest.mock import patch

from main import get_item_by_id

class HTTPTest(unittest.TestCase):
    def test_get_item_by_id(self):
        port = get_free_port()
        start_server(port)
        url = 'http://localhost:{port}'.format(port=port)
        expected_response = {
            'by': 'pg',
            'descendants': 15,
            'id': 1,
            'kids': [15, 234509, 487171, 454426, 454424, 454410, 82729],
            'score': 57,
            'time': 1160418111,
            'title': 'Y Combinator',
            'type': 'story',
            'url': 'http://ycombinator.com'
        }

        with patch('main.base_url', url):
            result = get_item_by_id(1)
            self.assertEqual(result, expected_response)
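The test above leans on a get_free_port helper I haven’t defined; a minimal stdlib sketch could look like this:

```python
import socket

def get_free_port():
    # Bind to port 0 so the OS assigns an unused ephemeral port,
    # record its number, then release the socket so the fake
    # server can bind to that port immediately afterward.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('localhost', 0))
        return s.getsockname()[1]
```

There’s a small race here (another process could grab the port between release and re-bind), but for local test runs it’s rarely a problem in practice.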

Pros:

  • Stable relative to the SUT. You’re introducing a new server dependency, but it’s still more predictable than the internet or the real API. There’s also low interface coupling to the SUT, because a fake server has no dependency on the SUT code, unlike mocks, which require you to specify your calling interface in the test code.
  • Fast. You’re not communicating with the internet.

Cons:

  • Lower confidence to release. It’s higher than with mocking but still lower than with end-to-end testing, since you’re not actually allowing your code to interact with the internet. You’re still making up responses yourself.
  • Fragile relative to the stability of your fake server. In addition to being more code, it’s code that runs in a separate thread or process. This is another dependency that can break.

In my experience, fake implementations can be a strong alternative to mocking HTTP calls, with fewer nuanced tradeoffs. You pretty much completely sidestep the “front door” vs. “back door” debate inherent in mocking. That said, I’ve only ever used one or two fake servers, so I’m not sure how unwieldy they can become to run locally when a single function or codebase deals with many web services.

Programmers have come up with many ways to deal with these tradeoffs when testing APIs. I hope I’ve helped introduce or clarify some of the common tradeoffs you may encounter day to day. In a future article, I’ll lay out more complex solutions to testing that leverage the strengths of each approach.