Finding the maximum inclusive distance between two duplicate numbers in an array

Posted on

Problem

The prompt that made me put this function together was sourced from CodingBat.

I was curious about cleaner, more correct methods of doing putting this together such as finding a cleaner way to tell if there is or is not a duplicate in the array. I am also trying to be as efficient as possible. I believe this is an O(n2)$O\left({n}^{2}\right)$$O(n^2)$ function. Let me know what I should change.

The code works just fine. I just feel like there is quite a lot I can streamline and I would like to be pointed in the right direction.

 public int maxSpan(int[] nums) {
int highestSpan = 0;
int span;
boolean duplicate = false;
if (nums.length == 0)
return highestSpan;
for (int i = 0; i < nums.length; i++) {
for (int j = 0; j < nums.length; j++){

if ((nums[i] == nums[j])&& j != i){ //if duplicate
duplicate = true;
//get the absolute value of j - i
span = j - i + 1; //Add 1 because it needs to count itself
//if it is larger than the highestSpan then record it
if (span > highestSpan)
highestSpan = span;
}
}
}
if (duplicate)
return highestSpan;
else
return 1;
}


Solution

Getting rid of duplicate as Janos proposed and then QPaysTaxes wrote is a good step, but this useless variable introduced quite some useless code we should also get rid of. The following code builds up on QPaysTaxes’ answer; I’m commenting on changed things and presenting my variant:

public int maxSpan(int[] nums) {
int highestSpan = 0;


This is not the place where span should be defined. You don’t need it here.

The condition nums.length == 0 can go.

    for (int i = 0; i < nums.length; i++) {
for (int j = i; j < nums.length; j++) {


Changed the lower bound from i+1 to i to cover span equal to 1. Below removed the i != j test and used max to make the code a bit shorter.

            if (nums[i] == nums[j]) {
int span = j - i + 1; // Add 1 to count itself
highestSpan = Math.max(highestSpan, span);
}
}
}


No need for a conditional here.

    return highestSpan;
}


I guess, I saved some 6 lines without making it any more complicated. Now can also span be inlined to save 1 more line (but that’s not the objective).

Now it works exactly according to the explanation by QPaysTaxes in comment without any special tests:

An empty array doesn’t have any spans, so clearly it has to return 0. An array of unique elements is rather more confusing, but since the prompt states that a single (i.e. unduplicated) element has a span of 1, then an array of unique (i.e. unduplicated [i.e. single]) elements has a maximum span of the span of every unique number

An O(n) solution could look like this (untested):

public int maxSpan(int[] nums) {
int highestSpan = nums.length == 0 ? 0 : 1;
Map<Integer, Integer> firstOccurrenceMap = new HashMap<>();
for (int i = 0; i < nums.length; i++) {
Integer firstOccurrence = firstOccurrenceMap.get(nums[i]);
if (firstOccurrence == null) {
firstOccurrenceMap.put(nums[i], i);
} else {
highestSpan = Math.max(highestSpan, i - firstOccurrence + 1);
}
}
return highestSpan;
}


As the first occurrence of a number gets a special treatment, the empty array has to be handled specially, too (see the declaration of highestSpan).

Avoid unnecessary flag variables

The variable duplicate is unnecessary, because you could drive the same meaning from highestSpan > 0. It’s an extra hurdle to keep this flag, I suggest to drop it.

Improving the algorithm

I believe this can be solved in O(N)$O\left(N\right)$$O(N)$ time and space, by using a map to record the position of the first occurrences of numbers. If a number was seen already, check the distance and update the max distance if necessary. Otherwise record the current position.

First: Always use brackets around conditionals, even one-liners. It makes it easier to update later. That’s not a big factor with this code, but it may well be with code you write in the future, and it’s a good habit to get in to.

Add a comment before return 1; explaining that 1 is the default amount because… [your reasoning here].

//if it is larger than the highestSpan then record it


right before code that explains itself are entirely unnecessary.

Always put spaces both before and after every binary operator — things like &&.

if ((nums[i] == nums[j])&& j != i){ //if duplicate


The first set of parentheses is unnecessary, but doesn’t harm anything. However, you should always try to be consistent — if you parenthesize one side, parenthesize the other side, too. I decided to remove them altogether.

Newlines should separate logically separate code — I like to separate the variable declarations from the calculations bits, unless the variables are one-off little things that are just to make things readable. I also like to separate the bit that returns things from the rest. See the changed code that the bottom for what I mean.

As far as your actual algorithm: Just about the only improvement I could offer is that you should initialize j to i + 1 rather than 0 to avoid extra traversing that you already did, albeit backwards. To clarify, this is what your for blocks should look like (with the insides snipped):

for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++){
// [snip!]
}
}


If you want, you can put a ternary instead of the if block at the end; something like this:

return duplicate ? highestSpan : 1;


(Note: I also stole something from janos’ answer — I got rid of duplicate.)

public int maxSpan(int[] nums) {
int highestSpan = 0;
int span;
if (nums.length == 0) {
return highestSpan;
}
for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++){
if (nums[i] == nums[j] && j != i){ // if duplicate
//get the absolute value of j - i
span = j - i + 1; //Add 1 because it needs to count itself

if (span > highestSpan) {
highestSpan = span;
}
}
}
}

return highestSpan > 1 ? highestSpan : 1;
}


According to CodingBat, it still works.

You can cut the computational complexity nearly in half.

Your nested for-loop is actually checking each pair of objects twice. Try unwrapping the loop and write out each combination of i and j to see why.

Consider this loop:

for (i = 0 to nums.length) {
for (j = i+1 to nums.length) {
...
}
}


If you unwrap that one, you will see we end up with only unique combinations of i and j, and it eliminates the possibility that i>j (which could make your span variable negative).

You can clearly do this in linear time by writing the array to a multimap. Basically we are going to have a map from value -> list of places where it appears in the array.

So {1, 4, 2, 1, 4, 4, 4} from coding bat would become the map:

1 -> 0,3
2 -> 2
4 -> 1,4,5,6


The span is obviously largest – smallest + 1. You could write a custom map object which just stores the largest and smallest value on a given key.

This is obviously linear. This is similar to the classic trick for finding duplicates, you create a HashSet, and exploit the behaviour of HashSet.add(), which returns true only if add increases the size of the set, so it will return false at the first duplicate.