Problem
Imagine that you have a TreeSet
of numbers, and you copy them into an array, but you remove one element of it. You want to know the element removed.
import java.util.Arrays;
import java.util.Set;
import com.google.common.collect.Sets;
public class Question {
private static final Set<Integer> SET = Sets.newTreeSet(Arrays.asList(1, 3, 5, 7));
public static void main(String[] args) {
int[] unsorted = {1, 5, 3};
int elementRemoved = checkElementRemoved(unsorted);
System.out.println("Element removed " + elementRemoved);
}
private static int checkElementRemoved(final int[] unsorted) {
Integer elementRemoved = null;
for (Integer number : SET) {
if (!arrayContains(unsorted, number)) {
// O(n^2) complexity
elementRemoved = number;
break;
}
}
return elementRemoved;
}
private static boolean arrayContains(final int[] unsorted, final int number) {
for (Integer inArray : unsorted) {
if (inArray == number) {
return true;
}
}
return false;
}
The problem is that this solution has O(n2) complexity and I would like to improve the performance.
I think that the best option would be to sort the array, am I right?
Solution
Sum all the elements in the array. Sum all the elements in the treeset. The difference in sums is the missing element.
I think that the best option would be to sort the array, am I right?
Not exactly. You could sort it and reduce the complexity using a binary search to O(n log(n))
. But you could much better by copying the array into a HashSet
and getting O(n)
assuming no perverted distribution (the worst case hash table access is O(n)
, but occurs very rarely).
The search itself could be even faster, when using two sorted arrays. As all the data come from a Set
, there are duplicates. So whenever the element of the first array at a given position equals to the element of the other array at the same position, you know that you’re before the removed element. Using a binary search you can get a complexity of O(log(n))
. However, producing the the second array costs O(n)
, so you don’t gain anything w.r.t. the O-notation.
Review
private static final Set<Integer> SET = Sets.newTreeSet(Arrays.asList(1, 3, 5, 7));
This is fine for the test data, but then you should define another constant for unsorted
, too.
private static int checkElementRemoved(final int[] unsorted) {
Integer elementRemoved = null;
for (Integer number : SET) {
if (!arrayContains(unsorted, number)) {
// O(n^2) complexity
elementRemoved = number;
break;
}
}
return elementRemoved;
}
You’re not checking anything. Call it find...
or alike. Don’t use useless variables. It all can be reduced to
private static int checkElementRemoved(int[] unsorted, Iterable<Integer> set) {
for (Integer number : set) {
if (!arrayContains(unsorted, number)) {
// O(n^2) complexity
return number;
}
}
throw new WhateverException();
}
Your code secretly throws a NullPointerException
when unboxing elementRemoved
. This is pretty bad. Throwing NPE explicitly would be much better, but NPE is not the right exception here. Letting your method return an Integer
would be better, but only postpone the problem. Actually, IllegalArgumentException
sounds right, as it’s not allowed to pass an array missing no element.
In any case, you should pass SET
as an argument, so that your method is more usable.
private static boolean arrayContains(final int[] unsorted, final int number) ...
This is no better than Arrays.asList(unsorted).contains(number)
. But as already said, create a HashSet
and use it for speed.
I have a solution, which takes O(n) time and O(1) space.
The algorithm is as follows:
def find_removed (set, unsorted):
a <- as_array (set)
r <- 0
for c in [1 ..> set.length]:
r <- bitwise_or (r, a[c])
for c in [1 ..> unsorted.length]:
r <- bitwise_xor (r, unsorted[c])
return r
The Java code w.r.t your snippet may be like this (I didn’t test it):
public class Question {
private static final Set<Integer> SET = Sets.newTreeSet (Arrays.asList (1, 3, 5, 7));
public static void main (String[] args) {
int[] unsorted = {1, 5, 3};
int r = 0;
for (Integer e: SET) r |= e;
for (Integer e : unsorted) r ^= e;
System.out.println ("the element removed: " + r);
}
}
If you always expect that the only one number was removed, why you don’t want to sort unsorted array and compare only numbers on the same positions?
Something like this, in this case you don’t need to search items in SET:
private static int checkElementRemoved(final int[] unsorted) {
Integer[] setArray = (Integer[]) SET.toArray();
Arrays.sort(setArray);
Arrays.sort(unsorted);
int i;
for (i = 0; i < unsorted.length; i++) {
if (unsorted[i] != setArray[i])
return setArray[i];
}
return setArray[i];
}