Problem
One of the core Unix commands is yes, which simply prints an infinite number of the letter y.
I read a blog post (https://matthias-endler.de/2017/yes/) about how people optimized it to output y at a speed of 10 GB/s. I get 7.5 GB/s on my machine using this command.
I tried to achieve similar results with Java, but it tops out at 70 MB/s.
My code is:
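import java.io.BufferedOutputStream;
import java.io.IOException;

public class yes {
    // "y" followed by a newline, as the Unix yes command prints it
    static byte[] bb = new byte[]{'y', '\n'};

    public static void main(String[] args) throws IOException {
        BufferedOutputStream writer = new BufferedOutputStream(System.out);
        for (;;) {
            writer.write(bb);
        }
    }
}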
Do you have any idea how to optimize it to get results similar to the Unix version?
Solution
Your program calls the write() method of the BufferedOutputStream with a two-byte array. These two bytes are stored in the stream's buffer, and the buffer is written to standard output when it is full.
On macOS 10.13, the fs_usage tool shows that the buffer size is 8 kB, and on my MacBook the java -cp . yes | pv > /dev/null benchmark reports that the data is written at about 50 MB/s.
This can be improved by calling write() with a larger byte array:
import java.io.BufferedOutputStream;
import java.io.IOException;

public class yes {
    static final int BUFFERSIZE = 8 * 1024;
    static byte[] bb = new byte[BUFFERSIZE];

    public static void main(String[] args) throws IOException {
        // Fill the array with repeated "y\n" pairs
        for (int i = 0; i < BUFFERSIZE; i += 2) {
            bb[i] = 'y';
            bb[i + 1] = '\n';
        }
        BufferedOutputStream writer = new BufferedOutputStream(System.out);
        for (;;) {
            // One call hands the whole 8 kB block to the stream
            writer.write(bb);
        }
    }
}
Now only one write() call is needed to write 8 kB of data, instead of 4096 calls. On my MacBook this increased the speed to about 2 GB/s, that is, by a factor of more than 40.
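The effect of the array size can be illustrated with a small counting experiment. The sketch below is not part of the measurements above: WriteCallCount, CountingSink and run are hypothetical names, and the sink discards its data instead of writing to stdout. It pushes 1 MiB through a BufferedOutputStream with an 8 kB buffer, once in 2-byte arrays and once in 8 kB arrays, and counts how many writes reach the underlying stream.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class WriteCallCount {
    /** Hypothetical sink: discards all data and counts the writes it receives. */
    static final class CountingSink extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Writes `total` bytes through a BufferedOutputStream, chunk.length bytes per call. */
    static CountingSink run(byte[] chunk, int total, int bufferSize) {
        CountingSink sink = new CountingSink();
        try {
            BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize);
            for (int written = 0; written < total; written += chunk.length) {
                out.write(chunk);
            }
            out.flush(); // push out whatever is still buffered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        int total = 1 << 20;    // 1 MiB of output per run
        int bufferSize = 8192;  // assumed buffer size, matching the 8 kB reported above
        for (int chunkSize : new int[]{2, 8192}) {
            CountingSink sink = run(new byte[chunkSize], total, bufferSize);
            System.out.println(chunkSize + "-byte arrays: "
                    + (total / chunkSize) + " write() calls by the caller, "
                    + sink.calls + " writes reaching the sink");
        }
    }
}
```

Under these assumptions both runs should hand the sink the same number of 8 kB blocks, while the number of write() calls made on the BufferedOutputStream differs by a factor of 4096. This suggests that the speedup comes from the per-call overhead of write() rather than from fewer writes to standard output.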
This can possibly be further improved by choosing a larger buffer.
With a byte array larger than the stream buffer size, BufferedOutputStream does not buffer anymore and writes the data to stdout immediately. Therefore the "lower-level" unbuffered FileOutputStream would be sufficient:
FileOutputStream writer = new FileOutputStream(FileDescriptor.out);
for (;;) {
    writer.write(bb);
}
However, I could not observe a difference in the performance.
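To experiment with larger buffers as suggested above, the array size can be made a parameter. The following is a minimal sketch, not a tuned implementation: the class name YesSized, the helper fill, the optional command-line argument, and the 64 kB default are all assumptions chosen for illustration.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;

public class YesSized {
    /** Builds a block of repeated "y\n" pairs; an odd size is rounded down to even. */
    static byte[] fill(int size) {
        byte[] block = new byte[Math.max(2, size & ~1)];
        for (int i = 0; i < block.length; i += 2) {
            block[i] = 'y';
            block[i + 1] = '\n';
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical CLI: optional first argument is the block size in bytes.
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 64 * 1024;
        byte[] block = fill(size);
        // Unbuffered stream on stdout: each write() passes the whole block on.
        try (FileOutputStream out = new FileOutputStream(FileDescriptor.out)) {
            for (;;) {
                out.write(block);
            }
        }
    }
}
```

It can be measured the same way as before, for example with java -cp . YesSized 65536 | pv > /dev/null, varying the argument to see whether the block size makes a difference on a given machine.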