# Find majority element in an array given constant external space

Posted on

Problem

Given constant external space (yes no HashMap), how would you improve the following code (code-wise or complexity-wise with stress on the latter) to find the majority element in an array?

/**
* Created by mona on 5/27/16.
*/

import java.util.Arrays;

public class FinaMajority {
public static int findMajority(int[] a) {
Arrays.sort(a);
int maxCount = Integer.MIN_VALUE;
int maxIdx = Integer.MIN_VALUE;
for (int i=0; i<a.length; i++) {
int count = 0;
while ( (i+count) < a.length && a[i] == a[i+count]) {
count++;
if (count == a.length/2) {
return a[i];
}
if (count > maxCount) {
maxCount = count;
maxIdx = i;
}
}
}
return a[maxIdx];
}

public static void main(String[] args) {
int[] a = {3,1,2,2,3,2,2,4,4,3,5,2,3,1,3,4,3,3,3};
System.out.println(findMajority(a));
}

}

Solution

### What is a “majority element” ?

It would be good to define this, as there could be multiple interpretations:

• It could be the element that occurs more than any other in the array. This seems the most natural, intuitive interpretation
• It could be the element that occurs more than n / 2 times, as in this online challenge. I think they made a mistake, the common name for this is leader element instead of “majority”.

### Off-by-one error

The program gives incorrect result for the input [3, 2, 3].
The culprit is here:

int count = 0;
while ( (i+count) < a.length && a[i] == a[i+count]) {
count++;
if (count == a.length/2) {
return a[i];
}

The sorted array contains [2, 3, 3].
At i=0, since (i+0) < a.length and a[i] == a[i+0],
count is incremented to 1 on the next line,
and since a.length/2 is 1, the method prematurely returns 2.

The fix is to move count++ to the end of the while loop.

Notice that starting from count = 0 is pointless.
It would be better to start from count = 1.

### Index out of bounds for singleton array

The program will crash for the input [1].
In this case maxIdx will never change,
its value remains at Integer.MIN_VALUE,
and so it will crash on the line return a[maxIdx].

### Slowness

The program is too slow for large arrays, because after counting the consecutive numbers,
it does not skip those,
and continues checking from the next value of i.
Due to this, the time complexity of the loop degenerates to O(n2)$O\left({n}^{2}\right)$$O(n^2)$.

You can improve by adding to i the number of elements that can be skipped.

int count = 1;
while ((i + count) < a.length && a[i] == a[i + count]) {
if (count == a.length / 2) {
return a[i];
}
if (count > maxCount) {
maxCount = count;
maxIdx = i;
}
count++;
}
i += count - 1;

### Alternative implementation

If you’re really looking for the majority element and not the leader element, then your implementation can be written slightly simpler without a nested loop:

public int findMajority(int[] nums) {
Arrays.sort(nums);

int maxCount = 0;
int maxIndex = 0;

int count = 1;

for (int i = 1; i < nums.length; i++) {
if (nums[i - 1] == nums[i]) {
count++;
if (count > maxCount) {
maxIndex = i;
maxCount = count;
}
} else {
count = 1;
}
}
return nums[maxIndex];
}

If you’re actually looking for the leader element,
the element that occurs more than n / 2 times,
then an O(n)$O\left(n\right)$$O(n)$ solution is possible:

• track a candidate and its count
• increment the count when the same candidate is seen
• decrement the count when a different value is seen
• replace the candidate when the count is 0

Like this:

public int majorityElement(int[] nums) {
int candidate = 0;
int count = 0;
for (int num : nums) {
if (count == 0) {
count = 1;
candidate = num;
} else if (candidate == num) {
count++;
// as a further optimization, you can add a condition here to return early if the required count is reached
} else {
count--;
}
}
return candidate;
}

This works when the leader element is guaranteed to exist.
If it’s not guaranteed to exist,
then you need another O(n)$O\left(n\right)$$O(n)$ pass to verify the count of the candidate is indeed greater than half of the number of all elements.

Here’s my solution written in Groovy (for quick coding), which shouldn’t be very different from Java. The actual finding of majority element runs in linear time, so roughly speaking, the code runs in f(n)+O(n)$f\left(n\right)+O\left(n\right)$$f(n) + O(n)$, where f(n)$f\left(n\right)$$f(n)$ is the time complexity of the sorting algorithm. This solution assumes that it’s ok to write into the array, since there’s limited space anyway. However it uses 2 more variables than your original solution.

int findMajorityElement(int[] numbers) {
Arrays.sort(numbers)
int index = 0
int currentNumber = numbers[index]
int majorityElement = currentNumber
numbers[index] = 1
int maxCount = numbers[index]
for (index = index + 1; index < numbers.length; index++) {
if (currentNumber == numbers[index]) {
numbers[index] = numbers[index - 1] + 1
} else {
currentNumber = numbers[index]
numbers[index] = 1
}
if (numbers[index] > maxCount) {
maxCount = numbers[index]
majorityElement = currentNumber
}
}
return  majorityElement
}

What happens here is that, since the array can be sorted, each iteration will increment the count of each value by 1 as long as it is the same value. When it finds a different one, it starts again. While these are happening it checks if a new majority element candidate gets discovered.

As Pyscho Punch suggests, this is pretty simple if you can first sort. I’d add to his answer, that given the constraint of constant external space, you want to use a sort that obeys that constraint, e.g., Heapsort.

One the array is sorted, you just need to walk it fro start to end, finding the length of each run of of repeated values (they’ll be in a contiguous run, because you’ve sorted the array).

As each run length is found, it’s compared to the longest (maximum) run-length found so far, and if the newly found run length is longer, its length replaces that maximum, and its element value replaces the majority element.

In pseudo-code:

Heapsort(array)

maxRun <- 0
majorityElement <- None
index <- 0
runStart <- 0

while index < length(array):
while index < length(array) and array[index] == array[runStart]:
index <- index + 1

if index - runStart > maxRun:
maxRun <- index - runStart
majorityElement <- array[runStart]

runStart <- index

return majorityElement

What’s the space complexity? It’s constant (in the pseudo code above, it’s four variables), as long as Heapsort is correctly implemented as constant space.

Wha’s the time complexity? Heapsort is O(n log n) in the worst case, walking the array is O(n). So the time complexity is O(n log n + n) = O(n log n). The sort dominates the work done.