Problem
This is almost exercise 3.2.14. from the book Computer Science An Interdisciplinary Approach by Sedgewick & Wayne (since I am self-studying, I changed it a little bit):
Develop a version of Histogram that uses StdDraw, so that a client can
create multiple histograms. Use a test client that creates histograms
for flipping coins (Bernoulli trials) with a biased coin that is heads
with probability p, for p = 0.2, 0.4, 0.6. and 0.8, taking the number
of trials from the command line.
Here are my programs:
public class Histogram {
private final double[] data;
private final double max;
public Histogram(double[] data, double max) {
this.data = data;
this.max = max;
StdDraw.setXscale(0, data.length);
StdDraw.setYscale(0, max * 3);
}
public double[] getData() {
return data;
}
public int findMax() {
double max = 0;
int dataLength = data.length;
for (int i = 0; i < dataLength; i++) {
max = Math.max(max, data[i]);
}
return (int) max;
}
public void addData(int index) {
data[index]++;
}
public void draw(double xIncrement, double yIncrement) {
StdDraw.enableDoubleBuffering();
StdDraw.setPenColor(StdDraw.BOOK_BLUE);
for (int i = 0; i < data.length; i++) {
StdDraw.filledRectangle(i + 0.5 + xIncrement * data.length, yIncrement * data.length + data[i] / 2, 0.45, data[i] / 2);
StdDraw.show();
}
StdDraw.setPenColor(StdDraw.RED);
StdDraw.line(data.length + xIncrement * data.length + 0.005, 0,
data.length + xIncrement * data.length + 0.025, max * 3);
}
public static void main(String[] args) {
int trials = Integer.parseInt(args[0]);
double[] diceData = new double[6];
Histogram histogram = new Histogram(diceData, (trials / 6) * 2);
StdDraw.setPenColor(StdDraw.BOOK_BLUE);
for (int t = 1; t <= trials; t++) {
double r = Math.random();
if (r < 1.0 / 6.0) histogram.addData(0);
else if (r < 2.0 / 6.0) histogram.addData(1);
else if (r < 3.0 / 6.0) histogram.addData(2);
else if (r < 4.0 / 6.0) histogram.addData(3);
else if (r < 5.0 / 6.0) histogram.addData(4);
else if (r < 6.0 / 6.0) histogram.addData(5);
histogram.draw(0, 0);
}
}
}
public class Histograms {
private final Histogram[] histograms;
private final double max;
public Histograms(Histogram[] histograms, double max) {
this.histograms = histograms;
this.max = max;
StdDraw.setXscale(0, histograms[0].getData().length * histograms.length);
StdDraw.setYscale(0, max);
}
public void draw() {
int rows = histograms.length;
int columns = histograms.length;
for (int i = 0; i < columns; i++) {
if (rows % columns == 0) {
rows = rows / columns;
break;
} else {
rows++;
}
}
int m = 0;
for (int c = 0; c < columns; c++) {
for (int r = 0; r < rows; r++) {
histograms[m].draw(c, r);
m++;
}
}
}
public static void main(String[] args) {
int trials = Integer.parseInt(args[0]);
double max = trials;
double[] probabilities = {
0.2,
0.4,
0.6,
0.8
};
double[][] diceData = new double[4][2];
Histogram[] histograms = new Histogram[4];
for (int i = 0; i < 4; i++) {
histograms[i] = new Histogram(diceData[i], max);
}
for (int t = 1; t <= trials; t++) {
if (Math.random() < probabilities[0]) histograms[0].addData(0);
else histograms[0].addData(1);
if (Math.random() < probabilities[1]) histograms[1].addData(0);
else histograms[1].addData(1);
if (Math.random() < probabilities[2]) histograms[2].addData(0);
else histograms[2].addData(1);
if (Math.random() < probabilities[3]) histograms[3].addData(0);
else histograms[3].addData(1);
Histograms multipleHistograms = new Histograms(histograms, max);
multipleHistograms.draw();
StdDraw.pause(20);
}
}
}
I wanted Histogram
to work also independently of Histograms
and so I was forced to inject some redundancy into these two programs (for example they both use scaling but scaling within Histograms
overrides the scaling within Histogram
).
Here is one instance of Histogram
:
Input: 100
Output:
Here is one instance of Histograms
:
Input: 100
Output:
Is there any way that I can improve my programs?
Thanks for your attention.
Solution
A few remarks, mainly focussing on the Histogram
class.
Separation of concerns
Your application contains different parts: algorithmic parts (collecting histogram data), user interface parts (draw()
methods), and main()
methods.
You already separate these concerns into different methods, which is a good thing. But I recommend to go one step further, and to have different classes, e.g.:
- Histogram (for the algorithmic part),
- StdDrawHistogram (for presenting a histogram using
StdDraw
), - HistogramApp (containing the
main()
method, wiring the parts together).
Histogram class API
I’d change the public API of Histogram a bit:
You’re using a double[]
array for counting. But you never do fractional counts, you always just add an integer 1
to the buckets. So, you should change that to int[]
(or long[]
in the unlikely case you expect more than two billion counts).
Your constructor public Histogram(double[] data, double max)
forces the caller to prepare a double[]
array of the appropriate dimension. This comes as a surprise to the caller. It should be enough to tell the Histogram
the number of buckets, and setting up the counting structure (your array) should be the responsibility of the Histogram constructor.
Encapsulation
You typically want to hide the internals (that you’re using an array instead of some other fancy data structure) from your callers. This way you’re free to change the internals later without impact on code using your class. And you make it impossible for external code to fiddle around with the internals, bypassing the API of your class. E.g. currently, somewhere in your main()
method you can write things like diceData[2] = 7;
, and then the histogram data on that bucket gets overwritten. The only way of changing the data of your histogram should be by calling its methods.
Following this argument, I’d also replace the public double[] getData()
method with two methods:
public int getNumberOfBuckets() {
return data.length;
}
public int getCountInBucket(int i) {
return data[i];
}
You also force the constructor’s caller to provide a max
value that can’t be known at that time, while the histogram is still empty. Maybe the name is misleading, and you don’t mean the maximum count in a bucket, but the drawing height. Then it doesn’t belong in the algorithmic part, but into the drawing part.
Readability
Code is typically read much more often than written, and so it’s important that you can immediately understand what’s going on.
With the example of this max
field, I strongly recommend to document the meaning of your code elements. At least to me, it’s unclear whether max is intended to be the maximum count found in the histogram, or a space to be reserved when drawing the histogram, or something else. Writing Javadoc comments for fields and methods helps you avoid such ambiguities. Especially if you read your code some months after the initial creation, you’ll be glad you documented it. Having to express your concepts in writing forces you to have the concepts clearly settled in your mind.
Another means of helping readability is the naming of code elements. You already did quite a good job there, only the name max
is a little unclear, as discussed earlier.
The max field
You have a field named max
. It’s only used inside the constructor, to set the scale for drawing, and not accessed anywhere else. So, holding it as information in the instance for all the lifetime of the instance after construction is useless, and you can eliminate the private final double max;
field.
In the useful findMax()
method, you have a local variable max
collecting the algorithmic information “highest count in any bucket”. Having the same name as the field max
confuses the reader. I first didn’t notice that these were two different things. I’d recommend not to re-use field names as local variables.