Get All Digits From a Number [Performance Benchmarking]

It’s important to know how to run basic benchmarks on your code so that you know if something is actually fast. The code I’m benchmarking in this post is an algorithm that extracts the digits from a number.

Let f(n) be the function that takes a number as input and returns a list of the numbers digits as output.

For example, given the digit: 123456, f(123456) will return [1,2,3,4,5,6].

Order doesn’t matter, so [6,5,4,3,2,1] is equally valid.

Approaches

I’ll first go over 3 possible implementations of f(n) and then I’ll share the benchmark results.

Approach #1: Convert to sequence and iterate

def digits1(num):
  return [int(x) for x in str(num)]

Since strings are sequences and sequences are interable, we can convert our number into a string and iterate through each character in the string and cast it into an integer. This is a pretty concise readable implementation and likely one that I’ll reach for first if asked to do this task under time constraint (such as an interview).

Approach #2: Use divmod builtin function

def digits2(num):
  res = []
  num, d = divmod(num, 10)
  res.append(d)
  while num > 0:
    num, d  = divmod(num, 10)
    res.append(d)
  return res

divmod returns a tuple of the result of % (modulo operator) which gives you the value of the right most digit and the result of // (integer division) which gives you the value of the number without the right most digit. Repeating this operation basically lets you slice off the digits in a number from right to left.

Approach #3: Use % and //

def digits3(num):
    res = []
    res.append(num % 10)
    num = num // 10
    while num > 0:
        res.append(num % 10)
        num = num // 10
    return res

The logic in this implementation is equivalent to our previous divmod implementation. The main difference is that instead of calling the divmod built-in, I’m performing the two operations (mod and integer division) directly in the function.

Before reading on, which one of these do you think will run the fastest?

Benchmarking Results

I ran five sets of tests using the builtin timeit module1.

FunctionRun 1Run 2Run 3Run 4Run 5
digit1 (sequence)16.0826959617.5350837712.5137460212.9273850912.57732296
digit2 (divmod)8.136286028.4031109818.0410280238.0131959928.013605118
digit3 (% and //)8.0716710099.6098308567.9621911057.736943968.207738876

During each run, each function was executed on a sample input a million times. The numbers you see are the number of seconds it took to execute that function a million times.

Here’s the benchmarking source code:

import timeit
for _ in range(5):
  print(timeit.timeit("digits1(num)", setup="from __main__ import digits1;num=1234567324235435346536245", number=1000000))
  print(timeit.timeit("digits2(num)", setup="from __main__ import digits2;num=1234567324235435346536245", number=1000000))
  print(timeit.timeit("digits2(num)", setup="from __main__ import digits2;num=1234567324235435346536245", number=1000000))

The string iteration implementation (digit1) is almost twice as slow as the direct arithmetic approaches. I knew it was probably going to be slower, but frankly this was a surprise.

So, why is the sequence iteration approach so much slower? Is it the initial conversion into a string? The actual iteration process? The conversion of each character back into an integer? All of the above?!

One approach to understanding the why of language performance is to look at a lower level of abstraction. For Python, one such level is the bytecode representation of source programs.

Disassembly

Python comes with a module dis2 that outputs the sequence of bytecodes used by the CPython interpreter. I wanted to see if there was anything obvious in the bytecode that would point to this large execution time difference.

This is the bytecode for digit1 (the slow, sequence iteration code):

 0 BUILD_LIST               0
 3 LOAD_GLOBAL              0 (str)
 6 LOAD_FAST                0 (num)
 9 CALL_FUNCTION            1
12 GET_ITER
13 FOR_ITER                18 (to 34)
16 STORE_FAST               1 (x)
19 LOAD_GLOBAL              1 (int)
22 LOAD_FAST                1 (x)
25 CALL_FUNCTION            1
28 LIST_APPEND              2
31 JUMP_ABSOLUTE           13
34 RETURN_VALUE

This is the bytecode for digit2 (the faster arithmetic code):

  7           0 BUILD_LIST               0
              3 STORE_FAST               1 (res)

  8           6 LOAD_GLOBAL              0 (divmod)
              9 LOAD_FAST                0 (num)
             12 LOAD_CONST               1 (10)
             15 CALL_FUNCTION            2
             18 UNPACK_SEQUENCE          2
             21 STORE_FAST               0 (num)
             24 STORE_FAST               2 (d)

  9          27 LOAD_FAST                1 (res)
             30 LOAD_ATTR                1 (append)
             33 LOAD_FAST                2 (d)
             36 CALL_FUNCTION            1
             39 POP_TOP

 10          40 SETUP_LOOP              50 (to 93)
        >>   43 LOAD_FAST                0 (num)
             46 LOAD_CONST               2 (0)
             49 COMPARE_OP               4 (>)
             52 POP_JUMP_IF_FALSE       92

 11          55 LOAD_GLOBAL              0 (divmod)
             58 LOAD_FAST                0 (num)
             61 LOAD_CONST               1 (10)
             64 CALL_FUNCTION            2
             67 UNPACK_SEQUENCE          2
             70 STORE_FAST               0 (num)
             73 STORE_FAST               2 (d)

 12          76 LOAD_FAST                1 (res)
             79 LOAD_ATTR                1 (append)
             82 LOAD_FAST                2 (d)
             85 CALL_FUNCTION            1
             88 POP_TOP
             89 JUMP_ABSOLUTE           43
        >>   92 POP_BLOCK

 13     >>   93 LOAD_FAST                1 (res)
             96 RETURN_VALUE

I’m not familiar with all of the CPython opcodes so I couldn’t really tell at a glance if there was a specific opcode that was the culprit for digit1’s slow performance in comparison.

If we isolate the parts of the bytecode that are being executed in loops for both algorithms, they both have a series of LOAD calls followed by a CALL_FUNCTION.

This is interesting for a few reasons:

  1. Function calls are expensive!
  2. Both implementations are using function calls to global built-in functions (via LOAD_GLOBAL)
  3. Both implementations (digit1 and digit2) perform roughly K function calls such that K is the number of digits in a number

In computer science terms, these two functions have pretty similar Big O. The answer probably lies in the details of the actual C implementation of the functions.

To test this, I removed the integer conversion call to the int builtin for the string sequence (digit1) implementation and re-ran our tests.

Here are the new results from a single new run:

functionrun
digit1 (without conversion)1.820481062
digit28.126174927
digit38.007368088

Woah, our slowest algorithm sped up by nearly 90%! In other words, if we don’t bother re-converting our characters back into integers, our first approach is way faster.

C Implementation Details

I dug around the C implementations (the functions that the CPython interpreter actually executes when interpreting the CALL_FUNCTION opcodes) and found the most recent source code for both divmod3 and int4 python builtin functions.

I didn’t get a chance to look too deeply, but one thing that sticks out for me is the sheer number of branching statements in PyLong_FromString (the C implementation of int). This isn’t too surprising, given how much work it takes to convert a single character not only to its numeric representation, but the numeric representation of the character digits value (which I think will make for a pretty interesting post on its own!).

Anyway, I hope that was informative.

PS: I didn’t speak too much on digit3 in this post. I wonder … Is divmod faster than using operators?

References


  1. https://docs.python.org/3/library/timeit.html ↩︎

  2. https://docs.python.org/3/library/dis.html ↩︎

  3. https://github.com/python/cpython/blob/master/Objects/longobject.c#L4108 ↩︎

  4. https://github.com/python/cpython/blob/master/Objects/longobject.c#L2233 ↩︎