Cracking the Coding Interview - Basic String Compression

1.5 Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabccccaaa would become a2b1c5a3. If the “compressed” string would not become smaller than the original string, your method should return the original string.

This question is asking for a basic implementation of run-length encoding.

String concatenation

def compress_string(value):
    if not value:
        return value
    result = ""
    current_char = value[0]
    current_count = 1
    for i in range(1, len(value)):
        char = value[i]
        if char == current_char:
            current_count += 1
        else:
            result += f"{current_char}{current_count}"
            current_count = 1
    result += f"{current_char}{current_count}"
    if len(result) >= len(value):
        return value
    return result

Gayle considers this approach inefficient both in time and space because every time a new character is encountered and we append both the current_char and the current_count values to the resulting string, the concatentation operation takes O(K^2) time such that K is the size of our character sequence (the count of unique characters).

Why does the concatenation take O(K^2) time?

Consider the example given in the prompt: a2b1c5a3. Lets say we’re at the end of the compression in our function above and result equals a2b1c5 and we have a and 3 left to add. For each of those last two characters (a and 3), we need to copy all of the left hand side (a2b1c5) and the new character into a new string.

Since we need to iterate through the entire string as well, the runtime complexity is O(p + k^2) such that p is the length of the string and k is the length of our character sequence.

Small aside: in the book, gayle makes this runtime estimate based on java, not python. Newer versions of java actually compile string concatenation in certain situations to use string buffers which do not take quadratic time for concatentatios

Allocate a result list to be exact size of final compression

Avoid the overhead of string concatenation by pre-allocating a list. Find the size by running an intial run that only computes the number of characters we’ll be introducing. If it’s less than the original string, then we return the original. Otherwise, we’ll run the compression.

def compress_string(value):
    if not value:
        return value
    # find final result list size
    current_char = value[0]
    current_count = 1
    total_size = 0
    for i in range(1, len(value)):
        char = value[i]
        if char == current_char:
            current_count += 1
        else:
            # note that we have to stringify the count value to count the characters
            total_size += (1 + len(str(current_count)))
            current_char = char
            current_count = 1
    total_size += (1 + len(str(current_count)))

    if total_size >= len(value):
        return value
    # calculate compression
    result = [None] * total_size
    current_char = value[0]
    current_count = 1
    index = 0
    for i in range(1, len(value)):
        char = value[i]
        if char == current_char:
            current_count += 1
        else:
            result[index] = current_char
            index += 1
            for c in str(current_count):
                result[index] = c
                index += 1
            current_char = char
            current_count = 1
    result[index] = current_char
    index += 1
    for c in str(current_count):
        result[index] = c
        index += 1
    return ''.join(result)