Cracking the Coding Interview - Determine if a string has all unique characters

1.1 Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional datastructures?

Note: this is part of a series where I expand on solutions in python to questions from the book “Cracking the Coding Interview (5th Edition)” by Gayle Laakman Mcdowell. I’ll be porting over the code provided in the book over to python as well as adding my own solutions and even solutions based on alternative solutions suggested by Gayle. Hope you find it helpful!

This post will cover three key areas:

  • Solutions using O(N) space. Solve the problem by ignoring space constraints.
  • Optimized solutions using O(N) space.
  • Solution using no additional space and other alternative solutions.

Solutions using O(N) space and O(N) time

Lets first look at some solutions that use additional datastructures.

One idiomatic way in python of solving this problem is to make use of the set datastructure. We know that if there’s at least one duplicate in our string, it would be eliminated in a set and the sets size will be one less than the original string. One drawback with this approach is that it always looks at every character in our string - even if duplicates exist in the beginning of the string!

def all_unique_chars(x):
    return len(set(x)) == len(x)

A similar solution that doesn’t have the same drawback is to perform explicit testing against a hash map. The function returns early if a duplicate is found. This is also a more common pattern for users of languages that don’t support native sets out of the box (such as JavaScript).

def all_unique_chars(input):
    seen = set()
    for x in input:
        if x in seen:
            return False
    return True

You can also use the specialized Counter container in Python to count all the frequencies of letters. We can then iterate through the container to check for values that have a frequency value greater than 1. Whenever I need frequency counts of keys, I reach for Counter as much as possible. However, it has the same drawback as our first solution where we convert our string into a set.

from collections import Counter
def all_unique_chars(input):
    counter = Counter(input) 
    for values in counter.itervalues(): 
        if values > 1: 
            return False
    return True

Optimize Space Usage

If we assume only ASCII characters in the input, dictionaries and sets won’t always make the most efficient use of the extra space. Knowing the type of data in our input, we can “allocate” exactly the space we need.

def all_unique_chars(input):
    if len(input) > 128:
        return False
    char_map = [False] * 128
    for i in input:
        ascii_idx = ord(i)
        if char_map[ascii_idx]:
            return False
        char_map[ascii_idx] = True
    return True

In the fifth edition, Gayle uses the integer 256 to represent the ASCII size (probably to account for extended ASCII) but this is corrected to be 128 in the newer editions which I reflect here.

To further optimize on space, Gayle uses a bit vector and claims a reduction in space usage by a factor of eight. In Java, the size of a boolean is estimated to be about a byte or eight bits. Therefore, using a single bit to represent the membership of an ASCII character will be a factor of eight reduction in space per character.

Here’s an example using an int and bit-shifting in python:

def all_unique_chars(input):
    bit_vector = 0
    for i in input:
        m = 1 << ord(i)
        if bit_vector & m:
            return False
        bit_vector |= m
    return True

Note: In the book solution, Gayle also used an integer datatype in Java to represent the bit vector and said that was only possible by assuming that the input only contained lower case ASCII letters. This makes sense because integers in Java are represented by 32 bits and there are 26 letters in the alphabet. This same constraint, however, doesn’t exist in python because python 3 integers are objects. This is why you don’t see me do the equivalent of charAt(i) - 'a' as shown in the book.

No additional space and alternative solutions

Here’s a double loop solution without additional space but O(N^2) running time.

def all_unique_chars(input):
    size = len(input)
    for i in range(size - 1): 
        c = input[i]
        for j in range(i + 1, size):
            if input[j] == c: 
                return False
    return True

Gayle also mentions the possibility of using sorting but cautions that some languages do not offer in-place sorts. While python does support in-place sorting with the method .sort() on mutable iterables such as lists, it doesn’t for immutable datatypes such as strings.

That said, here’s the solution for the sorting approach where you can just pretend that the sort is in-place (it’s not). Once the string is sorted, all duplicate characters will be next to each other. If there’s a duplicate, then it’s guaranteed that the other identical value is either immediately before or after.

This function loops through the values and checks if the current character equals the previous character.

def all_unique_chars(input):
    s = sorted(input)
    x = input[0]
    for i in range(1, len(input)):
        if input[i] == x:
            return False
        x = input[i]
    return True

That’s all folks!