# Finding the maximum inclusive distance between two duplicate numbers in an array

Posted on

Problem

The prompt that made me put this function together was sourced from CodingBat.

I was curious about cleaner, more correct methods of doing putting this together such as finding a cleaner way to tell if there is or is not a duplicate in the array. I am also trying to be as efficient as possible. I believe this is an O(n2)$O\left({n}^{2}\right)$$O(n^2)$ function. Let me know what I should change.

The code works just fine. I just feel like there is quite a lot I can streamline and I would like to be pointed in the right direction.

 public int maxSpan(int[] nums) {
int highestSpan = 0;
int span;
boolean duplicate = false;
if (nums.length == 0)
return highestSpan;
for (int i = 0; i < nums.length; i++) {
for (int j = 0; j < nums.length; j++){

if ((nums[i] == nums[j])&& j != i){ //if duplicate
duplicate = true;
//get the absolute value of j - i
span = j - i + 1; //Add 1 because it needs to count itself
//if it is larger than the highestSpan then record it
if (span > highestSpan)
highestSpan = span;
}
}
}
if (duplicate)
return highestSpan;
else
return 1;
}


Solution

Getting rid of duplicate as Janos proposed and then QPaysTaxes wrote is a good step, but this useless variable introduced quite some useless code we should also get rid of. The following code builds up on QPaysTaxes’ answer; I’m commenting on changed things and presenting my variant:

public int maxSpan(int[] nums) {
int highestSpan = 0;


This is not the place where span should be defined. You don’t need it here.

The condition nums.length == 0 can go.

    for (int i = 0; i < nums.length; i++) {
for (int j = i; j < nums.length; j++) {


Changed the lower bound from i+1 to i to cover span equal to 1. Below removed the i != j test and used max to make the code a bit shorter.

            if (nums[i] == nums[j]) {
int span = j - i + 1; // Add 1 to count itself
highestSpan = Math.max(highestSpan, span);
}
}
}


No need for a conditional here.

    return highestSpan;
}


I guess, I saved some 6 lines without making it any more complicated. Now can also span be inlined to save 1 more line (but that’s not the objective).

Now it works exactly according to the explanation by QPaysTaxes in comment without any special tests:

An empty array doesn’t have any spans, so clearly it has to return 0. An array of unique elements is rather more confusing, but since the prompt states that a single (i.e. unduplicated) element has a span of 1, then an array of unique (i.e. unduplicated [i.e. single]) elements has a maximum span of the span of every unique number

An O(n) solution could look like this (untested):

public int maxSpan(int[] nums) {
int highestSpan = nums.length == 0 ? 0 : 1;
Map<Integer, Integer> firstOccurrenceMap = new HashMap<>();
for (int i = 0; i < nums.length; i++) {
Integer firstOccurrence = firstOccurrenceMap.get(nums[i]);
if (firstOccurrence == null) {
firstOccurrenceMap.put(nums[i], i);
} else {
highestSpan = Math.max(highestSpan, i - firstOccurrence + 1);
}
}
return highestSpan;
}


As the first occurrence of a number gets a special treatment, the empty array has to be handled specially, too (see the declaration of highestSpan).

### Avoid unnecessary flag variables

The variable duplicate is unnecessary, because you could drive the same meaning from highestSpan > 0. It’s an extra hurdle to keep this flag, I suggest to drop it.

### Improving the algorithm

I believe this can be solved in O(N)$O\left(N\right)$$O(N)$ time and space, by using a map to record the position of the first occurrences of numbers. If a number was seen already, check the distance and update the max distance if necessary. Otherwise record the current position.

First: Always use brackets around conditionals, even one-liners. It makes it easier to update later. That’s not a big factor with this code, but it may well be with code you write in the future, and it’s a good habit to get in to.

Add a comment before return 1; explaining that 1 is the default amount because… [your reasoning here].

//if it is larger than the highestSpan then record it


right before code that explains itself are entirely unnecessary.

Always put spaces both before and after every binary operator — things like &&.

if ((nums[i] == nums[j])&& j != i){ //if duplicate


The first set of parentheses is unnecessary, but doesn’t harm anything. However, you should always try to be consistent — if you parenthesize one side, parenthesize the other side, too. I decided to remove them altogether.

Newlines should separate logically separate code — I like to separate the variable declarations from the calculations bits, unless the variables are one-off little things that are just to make things readable. I also like to separate the bit that returns things from the rest. See the changed code that the bottom for what I mean.

As far as your actual algorithm: Just about the only improvement I could offer is that you should initialize j to i + 1 rather than 0 to avoid extra traversing that you already did, albeit backwards. To clarify, this is what your for blocks should look like (with the insides snipped):

for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++){
// [snip!]
}
}


If you want, you can put a ternary instead of the if block at the end; something like this:

return duplicate ? highestSpan : 1;


(Note: I also stole something from janos’ answer — I got rid of duplicate.)

public int maxSpan(int[] nums) {
int highestSpan = 0;
int span;
if (nums.length == 0) {
return highestSpan;
}
for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++){
if (nums[i] == nums[j] && j != i){ // if duplicate
//get the absolute value of j - i
span = j - i + 1; //Add 1 because it needs to count itself

if (span > highestSpan) {
highestSpan = span;
}
}
}
}

return highestSpan > 1 ? highestSpan : 1;
}


According to CodingBat, it still works.

You can cut the computational complexity nearly in half.

Your nested for-loop is actually checking each pair of objects twice. Try unwrapping the loop and write out each combination of i and j to see why.

Consider this loop:

for (i = 0 to nums.length) {
for (j = i+1 to nums.length) {
...
}
}


If you unwrap that one, you will see we end up with only unique combinations of i and j, and it eliminates the possibility that i>j (which could make your span variable negative).

You can clearly do this in linear time by writing the array to a multimap. Basically we are going to have a map from value -> list of places where it appears in the array.

So {1, 4, 2, 1, 4, 4, 4} from coding bat would become the map:

1 -> 0,3
2 -> 2
4 -> 1,4,5,6


The span is obviously largest – smallest + 1. You could write a custom map object which just stores the largest and smallest value on a given key.

This is obviously linear. This is similar to the classic trick for finding duplicates, you create a HashSet, and exploit the behaviour of HashSet.add(), which returns true only if add increases the size of the set, so it will return false at the first duplicate.