1.1 Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional datastructures?
Note: this is part of a series where I expand on solutions in python to questions from the book “Cracking the Coding Interview (5th Edition)” by Gayle Laakman Mcdowell. I’ll be porting over the code provided in the book over to python as well as adding my own solutions and even solutions based on alternative solutions suggested by Gayle. Hope you find it helpful!
This post will cover three key areas:
- Solutions using O(N) space. Solve the problem by ignoring space constraints.
- Optimized solutions using O(N) space.
- Solution using no additional space and other alternative solutions.
Solutions using O(N) space and O(N) time
Lets first look at some solutions that use additional datastructures.
One idiomatic way in python of solving this problem is to make use of the
set datastructure. We know that if there’s at least one duplicate in our string, it would be eliminated in a set and the sets size will be one less than the original string. One drawback with this approach is that it always looks at every character in our string - even if duplicates exist in the beginning of the string!
def all_unique_chars(x): return len(set(x)) == len(x)
def all_unique_chars(input): seen = set() for x in input: if x in seen: return False seen.add(x) return True
You can also use the specialized
Counter container in Python to count all the frequencies of letters. We can then iterate through the container to check for values that have a frequency value greater than 1. Whenever I need frequency counts of keys, I reach for
Counter as much as possible. However, it has the same drawback as our first solution where we convert our string into a set.
from collections import Counter def all_unique_chars(input): counter = Counter(input) for values in counter.itervalues(): if values > 1: return False return True
Optimize Space Usage
If we assume only ASCII characters in the input, dictionaries and sets won’t always make the most efficient use of the extra space. Knowing the type of data in our input, we can “allocate” exactly the space we need.
def all_unique_chars(input): if len(input) > 128: return False char_map = [False] * 128 for i in input: ascii_idx = ord(i) if char_map[ascii_idx]: return False char_map[ascii_idx] = True return True
In the fifth edition, Gayle uses the integer 256 to represent the ASCII size (probably to account for extended ASCII) but this is corrected to be 128 in the newer editions which I reflect here.
To further optimize on space, Gayle uses a bit vector and claims a reduction in space usage by a factor of eight. In Java, the size of a boolean is estimated to be about a byte or eight bits. Therefore, using a single bit to represent the membership of an ASCII character will be a factor of eight reduction in space per character.
Here’s an example using an int and bit-shifting in python:
def all_unique_chars(input): bit_vector = 0 for i in input: m = 1 << ord(i) if bit_vector & m: return False bit_vector |= m return True
Note: In the book solution, Gayle also used an integer datatype in Java to represent the bit vector and said that was only possible by assuming that the input only contained lower case ASCII letters. This makes sense because integers in Java are represented by 32 bits and there are 26 letters in the alphabet. This same constraint, however, doesn’t exist in python because python 3 integers are objects. This is why you don’t see me do the equivalent of
charAt(i) - 'a' as shown in the book.
No additional space and alternative solutions
Here’s a double loop solution without additional space but O(N^2) running time.
def all_unique_chars(input): size = len(input) for i in range(size - 1): c = input[i] for j in range(i + 1, size): if input[j] == c: return False return True
Gayle also mentions the possibility of using sorting but cautions that some languages do not offer in-place sorts. While python does support in-place sorting with the method
.sort() on mutable iterables such as lists, it doesn’t for immutable datatypes such as strings.
That said, here’s the solution for the sorting approach where you can just pretend that the sort is in-place (it’s not). Once the string is sorted, all duplicate characters will be next to each other. If there’s a duplicate, then it’s guaranteed that the other identical value is either immediately before or after.
This function loops through the values and checks if the current character equals the previous character.
def all_unique_chars(input): s = sorted(input) x = input for i in range(1, len(input)): if input[i] == x: return False x = input[i] return True
That’s all folks!
Notes / Links / Resources
- According to High Performance Python book, hashtables are initialized with a minimum of eight and requires additional resizings as capacity is exceeded.
- What is the size of a boolean variable in Java? https://stackoverflow.com/questions/383551/what-is-the-size-of-a-boolean-variable-in-java