Problem
Challenge:
Find the duplicated entry.
Specifications:
Your program should accept as its first argument a path to a filename.
Each line in this file is one test case.Each line begins with a positive integer(N), the size of the array,
then a semicolon followed by a comma separated list of positive numbers
which range from 0 to N-2, inclusive.The array contains exactly one duplicated entry which appears exactly twice.
Print out the duplicated entry, each one on a new line.
Solution:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;
public class ArrayAbsurdity {
public static void main(String[] argument) throws FileNotFoundException {
Scanner input = new Scanner(new File(argument[0]));
String[] args;
while (input.hasNextLine()) {
args = input.nextLine().split(";");
printDuplicate(args[0], args[1].split(","));
}
}
private static void printDuplicate(String size, String[] nums) {
boolean[] absurd = new boolean[Integer.parseInt(size)];
int value;
Arrays.fill(absurd, false);
for (int i = 0; i < absurd.length; i++) {
value = Integer.parseInt(nums[i]);
if (absurd[value]) {
System.out.println(value);
break;
}
absurd[value] = true;
}
}
}
Sample Input:
5;0,1,2,3,0
20;0,1,10,3,2,4,5,7,6,8,11,9,15,12,13,4,16,18,17,14
Sample Output:
0
4
I usually write two auxiliary methods, but this time I did some of the work within main’s loop, and only wrote one print method. Does the print method do too much? With and without considering its name.
I was going to use two arrays and/or two loops at first, but considering all that was given the question’s purpose surely is to create the most efficient solution and thanks to learning of Arrays.fill
on one of my previous answers, courtesy of @rolfl, I came up with this. Is it optimal?
For anyone interested, I maintain my current implementation here, and the implementation of the accepted answer here, also the source for the title since a few people asked in chat.
Solution
@tim suggested a way to do this with less memory by sorting the array. There’s another way to do it with less memory without needing to sort.
Let’s say that k is the duplicate value in the array. Then we know that the sum of the elements in the array is equal to
where we’re using the identity
So if sum
is the sum of the values in the array, the duplicate value will be sum - (n - 2) * (n - 1) / 2
.
Does the print method do too much?
Technically? Yes. The idea that a method should only do one thing exists because it makes methods reusable. So in this case, if you would extract the finding of the duplicate (eg by returning it), you could use the method again. Additionally, your method also somewhat transforms a string array to an int array, which might also be extracted to an extra method.
But how often do you need a method that accepts an array with exactly one duplicate value and values ranging up to the size – 1 of the array that you want to find? Probably not all that often.
As you said, this is just a small exercise, and in this case I would say that it’s fine as it is (although I would probably return the value and print in in main).
Is it optimal?
Right now, you take two times the memory of the original array (your boolean array plus the original array), and in the worst case, you are running through the array-size twice (once to fill the boolean array, once over the original array).
You actually don’t need Arrays.fill(absurd, false)
at all, because the default value will be false. So if you remove it, you will at least save on time.
If space instead of time is the issue, you could also sort the array, and then find the two equal values that way without needing any extra memory.
Misc
- you should define variable names in as small a scope as possible to enhance readability.
value
isn’t actually needed outside of the for loop. - your loop runs up to
absurd.length
instead ofnums.length
. If there always is a duplicate, this doesn’t matter, as the loop is exited before, but otherwise, you will get a nullpointer exception.
Instead of combining the finding of the duplicate with the outputting of the result, you should extract the finding of the duplicate to a separate method.
Also @tim stated the valid point
But how often do you need a method that accepts an array with exactly
one duplicate value and values ranging up to the size – 1 of the array
that you want to find? Probably not all that often.
you could create a method returning the first duplicate without the need of the size
. Such a method could be quite helpful someday.
There is no need to pass the size
to the method at all, because the array is already passed from which you can take the length
if you want to stick to the boolean array and the default for
loop.
Instead of using a boolean array which maybe is faster, you could simply use a Set<T>
doing the same thing.
For such a programming question, it doesn’t really matter if you take the values as String
or int
. There won’t be the need to parse the items to int
.
private static String getFirstDuplicate(String[] values) {
Set<String> set = new HashSet<>(values.length);
for (String value : values) {
if (!set.add(value)) {
return value;
}
}
return "";
}
The intention of using a Set<T>
in this method is clear and one doesn’t need to figure out why and how the boolean array is used.
If it would matter to use int
, the method can be changed to a generic version like
public static <T> T getFirstDuplicate(T[] values) {
Set<T> set = new HashSet<>(values.length);
for (T value : values) {
if (!set.add(value)) {
return value;
}
}
return null;
}
but this method would have the need that you pass in a Integer[]
which would need to create from the String[]
. So just using int
by parsing the values would result in
private static int getFirstDuplicate(String[] values) {
Set<Integer> set = new HashSet<>(values.length);
for (String value : values) {
int current = Integer.parseInt(value);
if (!set.add(current)) {
return current;
}
}
return Integer.MIN_VALUE;
}
@mjolka suggested an ingenious method to solve this problem. In its comments there was some discussion about using arrays and complexity. To settle this I’d like to share some code that solves the problem – without using any arrays, and of obvious complexity O(n) time and O(1) memory.
public class FindDuplicates {
public static void main(String[] args) throws Exception {
Scanner input = new Scanner(new File(args[0]));
input.useDelimiter("[;,]|r?n");
while (input.hasNext()) {
int n = input.nextInt();
long sum = 0;
for (int i = 0; i < n; ++i) sum += input.nextInt();
System.out.println(sum - (n - 2) * (n - 1) / 2);
}
}
}
Please note that I’m not suggesting this is good code – it’s just the minimal thing to do to prove this point. In a real program this would check the input format, contain some documentation, and using arrays or collections might even be the better thing to do to have readable and maintainable code.