# Removing duplicates from an unsorted Array

Posted on

Problem

I have written this program to remove the duplicates from an unsorted array. Please review and give me your input on how I can improve it a little more.

public static void main(String[] args) {

int[] arr = {4, 3, 4, 2, 6, 1, 1, 7, 6, 8, 9, 9, 1, 1};

// perform quick sort for o(n log n) time.
quickSort(arr, 0, arr.length - 1);

System.out.println(Arrays.toString(arr));
// now remove the duplicate elements.
arr = removeDuplicates(arr);
System.out.println("Unique elements in Array :" + Arrays.toString(arr));
}
private static int[] removeDuplicates(int[] arr) {
// if array length is one, return the array
if (arr.length <= 1) {
return arr;
}

// new array keep the length equal to current array.
int[] uniqueArray = new int[arr.length];
// keep the lastfound number to be 0th index.
int lastFound = arr[0];
// copy the 0th index value to new arrays 0th.
uniqueArray[0] = arr[0];

// keep track of number of duplicates found.
int totalDuplicates = 0;

// current index position to put the unique number
int currPos = 1;
for (int i = 0; i < arr.length; i++) {

if (lastFound == arr[i]) {
totalDuplicates++;
} else {
lastFound = arr[i];
uniqueArray[currPos] = arr[i];
currPos++;
}
}

// at this point array wil have unique numbers along with empty
// slots at the end in array, lets remove those.

int newLength = arr.length - totalDuplicates;

uniqueArray = Arrays.copyOf(uniqueArray, newLength + 1);
return uniqueArray;
}

private static void quickSort(int[] arr, int s, int e) {

if (s < e) {
int pivot = findPivot(arr, s, e);
System.out.println(pivot);
// sort left
quickSort(arr, s, pivot - 1);
// sort right
quickSort(arr, pivot + 1, e);
}

}

private static int findPivot(int[] arr, int s, int e) {
int p = arr[e];
int i = s;
for (int j = s; j < e; j++) {
if (arr[j] <= p) {
swap(arr, i, j);
i = i + 1;
}
}
swap(arr, e, i);
return i;
}

private static void swap(int[] arr, int i, int j) {
int t = arr[i];
arr[i] = arr[j];
arr[j] = t;
}

Solution

### Improving removeDuplicates

It’s recommended to use an enhanced for-each loop instead of a counting loop when possible:

for (int num : arr) {
if (lastFound == num) {
totalDuplicates++;
} else {
lastFound = num;
uniqueArray[currPos] = num;
currPos++;
}
}

And you can get rid of totalDuplicates, as currPos already contains that same information:

int currPos = 1;
for (int num : arr) {
if (lastFound != num) {
lastFound = num;
uniqueArray[currPos] = num;
currPos++;
}
}

return Arrays.copyOf(uniqueArray, currPos);

I noticed later that your original code does an unnecessary comparison for the first element or arr: it would be enough to iterate from the 2nd element. Unfortunately, for that we need to bring back the counting loop:

int currPos = 1;
for (int i = 1; i < arr.length; ++i) {
int num = arr[i];
if (lastFound != num) {
lastFound = num;
uniqueArray[currPos] = num;
currPos++;
}
}

return Arrays.copyOf(uniqueArray, currPos);

### Reduce allocations

Note that removeDuplicates allocates an array to collect the unique elements (uniqueArray),
and finally returns a clone of that array, with the appropriate size,
thus allocating for one more array.
If you don’t mind modifying the input array, then you can avoid the allocation of uniqueArray by overwriting the content of the input array:

private static int[] removeDuplicates(int[] arr) {
if (arr.length <= 1) {
return arr;
}

int lastFound = arr[0];

int currPos = 1;
for (int i = 1; i < arr.length; ++i) {
int num = arr[i];
if (lastFound != num) {
lastFound = num;
arr[currPos++] = num;
}
}

return Arrays.copyOf(arr, currPos);
}

### Unit testing

To verify the implementation works,
and for safe refactoring,
it’s good to have some unit tests around, for example:

@Test
public void test_1_2_3() {
int[] orig = {1, 2, 3};
assertArrayEquals(orig, removeDuplicates(orig.clone()));
}

@Test
public void test_empty() {
int[] orig = {};
assertArrayEquals(orig, removeDuplicates(orig.clone()));
}

@Test
public void test_single() {
int[] orig = {3};
assertArrayEquals(orig, removeDuplicates(orig.clone()));
}

@Test
public void test_1_1_1() {
int[] orig = {1, 1, 1};
assertArrayEquals(new int[]{1}, removeDuplicates(orig.clone()));
}

@Test
public void test_1_1_1_2_2_3_3_3() {
int[] orig = {1, 1, 1, 2, 2, 3, 3, 3};
assertArrayEquals(new int[]{1, 2, 3}, removeDuplicates(orig.clone()));
}

@Test
public void test_1_2_3_3_3() {
int[] orig = {1, 2, 3, 3, 3};
assertArrayEquals(new int[]{1, 2, 3}, removeDuplicates(orig.clone()));
}

@Test
public void test_1_2_2_2_3() {
int[] orig = {1, 2, 2, 2, 3};
assertArrayEquals(new int[]{1, 2, 3}, removeDuplicates(orig.clone()));
}

You can simply use a HashSet instead:

int[] arr = {4, 3, 4, 2, 6, 1, 1, 7, 6, 8, 9, 9, 1, 1};
Set<Integer> set = new HashSet<Integer>();
for (int n : arr)