Cracking the Coding Interview - Remove Duplicates From Unsorted Linked List

2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this if a temporary buffer is not allowed?

In this post, I’ll cover deduping both sorted and unsorted linked lists.

First, here’s a node class that I’ll use:

class Node:
def __init__(self, v=None, next_node=None):
self.v = v
self.next_node = next_node

def __repr__(self):
return f"{self.v} -> {repr(self.next_node)}"

def __eq__(self, other):
return repr(self) == repr(other)

Deduping a sorted list

If a list is sorted, the duplicate elements will be grouped together.

For example:

1 -> 1 -> 2 -> 3 -> 3 -> 3

With the solution below, we maintain two pointers (prev_node and curr_node). prev_node is our left pointer and will represent the value of the current “run”. The right pointer curr_node acts as a runner that we’ll use to compare with the left. If they are duplicates, remove the dupe and only move the current pointer. Once there are no more dupes in the current “run”, we can safely update the prev_node value.

while curr_node:
curr_value = curr_node.v
next_node = curr_node.next_node
if curr_value == prev_node.v:
prev_node.next_node = next_node
else:
prev_node = curr_node # only update the left pointer if the right pointer is a different element
curr_node = next_node

This only works for sorted lists, because in an unsorted list, a duplicate of a left pointer can appear anywhere in the list (and not just in a consecutive sequence of nodes). The extra pointer, however, is unecessary.

Here’s the same approach without using two pointers. Rather than having a separate left pointer, we can just keep a single pointer in place when there are dupes.

while curr_node and curr_node.next_node:
value = curr_node.v
next_node = curr_node.next_node
if curr_node.v == next_node.v:
curr_node.next_node = next_node.next_node
else:
curr_node = next_node

Deduping an unsorted list

Here’s two versions: one that uses a temporary “visited” structure and one that doesn’t (in order to meet the “FOLLOW UP” requirements).

Here’s an example using a set datastructure to remember visited nodes:

visited = {curr_node.v}
while curr_node and curr_node.next_node:
next_node = curr_node.next_node
next_value = next_node.v
visited.add(next_value) # we can also optimize this by only calling it if the next_value is not visited yet
if next_value in visited:
curr_node.next_node = next_node.next_node
else:
curr_node = next_node

If we want to do this without any additional memory, we need to have an inner loops / runner that checks for dupes: