# Cracking the Coding Interview - Basic String Compression

1.5 Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabccccaaa would become a2b1c5a3. If the “compressed” string would not become smaller than the original string, your method should return the original string.

This question is asking for a basic implementation of run-length encoding.

## String concatenation

``````def compress_string(value):
if not value:
return value
result = ""
current_char = value
current_count = 1
for i in range(1, len(value)):
char = value[i]
if char == current_char:
current_count += 1
else:
result += f"{current_char}{current_count}"
current_count = 1
result += f"{current_char}{current_count}"
if len(result) >= len(value):
return value
return result
``````

Gayle considers this approach inefficient both in time and space because every time a new character is encountered and we append both the `current_char` and the `current_count` values to the resulting string, the concatentation operation takes O(K^2) time such that K is the size of our character sequence (the count of unique characters).

Why does the concatenation take O(K^2) time?

Consider the example given in the prompt: `a2b1c5a3`. Lets say we’re at the end of the compression in our function above and `result` equals `a2b1c5` and we have `a` and `3` left to add. For each of those last two characters (`a` and `3`), we need to copy all of the left hand side (`a2b1c5`) and the new character into a new string.

Since we need to iterate through the entire string as well, the runtime complexity is O(p + k^2) such that `p` is the length of the string and `k` is the length of our character sequence.

Small aside: in the book, gayle makes this runtime estimate based on java, not python. Newer versions of java actually compile string concatenation in certain situations to use string buffers which do not take quadratic time for concatentatios

## Allocate a result list to be exact size of final compression

Avoid the overhead of string concatenation by pre-allocating a list. Find the size by running an intial run that only computes the number of characters we’ll be introducing. If it’s less than the original string, then we return the original. Otherwise, we’ll run the compression.

``````def compress_string(value):
if not value:
return value
# find final result list size
current_char = value
current_count = 1
total_size = 0
for i in range(1, len(value)):
char = value[i]
if char == current_char:
current_count += 1
else:
# note that we have to stringify the count value to count the characters
total_size += (1 + len(str(current_count)))
current_char = char
current_count = 1
total_size += (1 + len(str(current_count)))

if total_size >= len(value):
return value
# calculate compression
result = [None] * total_size
current_char = value
current_count = 1
index = 0
for i in range(1, len(value)):
char = value[i]
if char == current_char:
current_count += 1
else:
result[index] = current_char
index += 1
for c in str(current_count):
result[index] = c
index += 1
current_char = char
current_count = 1
result[index] = current_char
index += 1
for c in str(current_count):
result[index] = c
index += 1
return ''.join(result)
``````