# Modified Coupon Collector’s Problem in Java

Posted on

Problem

What is Coupon Collector’s Problem?

In probability theory, the coupon collector’s problem describes the “collect all coupons and win” contests. It asks the following question: Suppose that there is an urn of n different coupons, from which coupons are being collected, equally likely, with replacement. What is the probability that more than t sample trials are needed to collect all n coupons? An alternative statement is: Given n coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once? from here.

What is my Modified Coupon Collector’s Problem?

Instead of “at least once”, I check for “at least 2 times”.

I loop for a crore times. I should get 14.7,24.134$14.7,24.134$$14.7,24.134$. My code returns 14.760683$14.760683$$14.760683$ and 23.83069$23.83069$$23.83069$. Fairly correct. Maybe Java is not so randomly accurate?

Here’s is my code:

package test;

import java.util.ArrayList;
import java.util.Random;

public class CouponCollector {
private static final int mMaxCouponNumber = 6;
private static final int mMinCouponNumber = 1;
private static final int EXPERIMENT_REPETITIONS = 10000000;
private static final int EACH_COUPONS_ATLEAST_OCCURENCE = 2;

public static void main(String args[]) {
float mSumOfDraws = 0;
for (int i = 1; i <= EXPERIMENT_REPETITIONS; i++) {
ArrayList<Integer> mCouponList = new ArrayList<Integer>();
int mDrawsRequired = 0;
do {
.nextInt((mMaxCouponNumber - mMinCouponNumber) + 1)
+ mMinCouponNumber);
// mCouponList.get(mDrawsRequired));
++mDrawsRequired;
if (allCouponsCollected(mCouponList)) {
break;
}
} while (true);
if(i%10000==0){
// Enable this if you are too impatient! :D
//DEBUG System.out.println("Process: "+(((float)i)*100/EXPERIMENT_REPETITIONS)+"% complete.");
}
mSumOfDraws += mDrawsRequired;
// DEBUG System.out.println("Draws required in " + i
// + " st/nd/rd/th trial is " + mDrawsRequired);
}
System.out.println("Average value of draws required is "
+ (mSumOfDraws / EXPERIMENT_REPETITIONS));
}

private static boolean allCouponsCollected(ArrayList<Integer> pCouponList) {
int[] mCouponCount = new int[mMaxCouponNumber - mMinCouponNumber + 1];
for (int i = 0; i < pCouponList.size(); i++) {
++mCouponCount[pCouponList.get(i) - 1];
// DEBUG System.out.println("Count of " + pCouponList.get(i) +
// " is " + mCouponCount[pCouponList.get(i) - 1]);
}
for (int i = 0; i < mCouponCount.length; i++) {
if (mCouponCount[i] < EACH_COUPONS_ATLEAST_OCCURENCE) {
return false;
}
}
return true;
}
}


Solution

The deviation you see from the expected results is probably in large part due to the poor use of Random. When I time your code, I get:

Average value of draws required is 23.832188
Complete in 23473.917s


If I use Random the way it should be (re)used, and convert this code:

        mCouponList.add(new Random()
.nextInt((mMaxCouponNumber - mMinCouponNumber) + 1)
+ mMinCouponNumber);


to be:

private static final Random RANDOM = new Random();

....

.nextInt((mMaxCouponNumber - mMinCouponNumber) + 1)
+ mMinCouponNumber);


it completes in half the time:

Average value of draws required is 23.827381
Complete in 11552.228s


Hmm.. I was expecting this to correct the errors in the ‘randomness’ too, but, now I see the problem there…. you are using float, and not double. Float has relatively low accuracy. If I change this line here:

float mSumOfDraws = 0;


to be:

double mSumOfDraws = 0;


My results become:

Average value of draws required is 24.1312123


That’s much closer to the mathematical expectation.

Now, about performance… your code is really slow. There’s four prime reasons:

1. there’s no need to record every draw you make in an ArrayList – it will grow large, and require garbage collection. You only need to count how many times you ‘pull’ each coupon.
2. there is no need to number the coupons from minimum to maximum – you just need to know how many there are.
3. You use Integer values instead of int primitives.
4. You have too much code in your main method – the main method is hard for the compiler to optimize – it is only ever called once.

Additionally, apart from the performance, the code style is… unconventional. You should parameterize a lot of the static values so that you can reuse the code better too.

Here is the results of my simplified code:

Average for 2 sets of 6 coupons is 24.131
Complete in 3.983s


(compared to your results with the improved Random, and double):

Average value of draws required is 24.1291875
Complete in 11.109s


The code I use is:

import java.util.Arrays;
import java.util.Random;

public class CouponClipper {

private final int[] counts;
private int remaining;

public CouponClipper(int couponCount, int sets) {
this.counts = new int[couponCount];
Arrays.fill(counts, sets);
this.remaining = couponCount;
}

public boolean pull(int coupon) {
if (--counts[coupon] == 0) {
remaining--;
}
return remaining == 0;
}

private static final Random RANDOM = new Random();

public static int countRequired(int coupons, int sets) {
CouponClipper clipper = new CouponClipper(coupons, sets);
int count = 0;
do {
count++;
} while (!clipper.pull(RANDOM.nextInt(coupons)));
return count;
}

public static double averageCounts(int coupons, int sets, int iterations) {
long sum = 0;
for (int i = 0; i < iterations; i++) {
sum += countRequired(coupons, sets);
}
return sum / (double)iterations;
}

public static void main(String[] args) {
long nanos = System.nanoTime();
System.out.printf("Average for 2 sets of 6 coupons is %.3fn", averageCounts(6, 2, 10000000));
nanos = System.nanoTime() - nanos;
System.out.printf("Complete in %.3fsn", nanos / 1000000000.0);

}

}