Finding count of character occurrences in a string

Posted on

Problem

I am taking an introductory course in java in high school, however I have some experience in python. I was tasked to find the number of times a character occurs in a string so I did this.

``````String a = "hello";
String letter = "l";
int occurrence = a.length() - a.replace(letter, "").length();
``````

Happy that I came up with a clever one-liner, I showed it to my teacher and she said that it’s more elegant to use a `for` loop. What does she mean by that? How do we define elegance in code? Is it subjective? As an aside is this more efficient than a `for` loop or vice-versa?

Solution

To explain what’s going on here, perhaps a real world analogy would help.

You have a book, and have to count the number of times “e” appears in the book. Your solution is to copy the contents of the book into another book, letter by letter. Then you go through your copy of the book, and leave out each “e”. Next you count the number of letters in each book, and the difference is the number of times “e” appeared. That works, but it uses up an awful lot of paper and an awfully long time!

To some extent you’re right that “elegance” is a subjective concept.
As a rule of thumb, elegant code generally starts from thinking about the problem and understanding the fundamental thing that you are trying to do. It then does that, as straightforwardly (in terms of being understandable to humans) as it can and as efficiently (in terms of minimizing the time, memory, and other resources) as it can. (The subjectivity comes when deciding which of those considerations is more important)

One thing that is nice about your solution is that you’re willing to use features from the Java standard library. A lot of student programmers are determined to write everything from scratch in terms of for loops and if statements. Using the inbuilt library is an excellent habit to get into. It just so happens that in this case the library functionality you’re using is quite expensive for the problem that you’re trying to solve, and is not as straightforward that it’s counting characters as a loop, an if, and a tally would have been.

I am not sure I agree that a `for` loop is more elegant per se. The downside of your method compared to a traditional `for` loop is that it requires a bit more processing most of the time, because you need to create a new character buffer and put the replacements in there, and you create a new `String` object.

I would probably create an initial implementation with a Java 8 `Stream`, e.g.:

``````String string = "hello";
char letter = 'l';
long count = string.chars().filter(character -> character == letter).count();
``````

To me this is probably more elegant than a traditional `for` loop. With `Stream`s it is possible to write it in a declarative way. In a lot of cases they are also trivial to parallelize, requiring only a `parallel` function call on the `Stream`. Furthermore, they are lazy which means the processing can stop early, only the necessary data will be processed. Note that in this case, we have to process each element (each character), so it makes no difference here.

You’re lucky that the task is to count the occurrences of a single character. Your code instead looks as if it would count the occurrences of a string.

The code, as it is currently, only works for strings of length 1. Curiously, this excludes emojis:

``````String text = "hello  ";
String chr = " ";
int occurrence = text.length() - text.replace(chr, "").length();
``````

This code counts 14, but there are only 7 emojis. Therefore you have to divide by the length of the searched string:

``````int occurrence = (text.length() - text.replace(chr, "").length()) / chr.length();
``````

An emoji looks like a single character, but it occupies two `char` places in a string, for historical reasons. Java stores strings in the UTF-16 encoding. There’s much more to learn about this topic, and it’s a good idea to become knowledgeable in this area.

Except from this issue, your code is clever and gets its job done. The `for` loop is more complicated to read but probably a bit faster. Anyway, you should always write automated tests for your code, and these tests will be a lot simpler than your clever one-liner. Therefore, having this cleverness is ok.

``````public class OccurrenceTest {

@Test
public void testEmoji() {
String text = "hello  ";
String chr = " ";
int occurrence = text.length() - text.replace(chr, "").length();

assertEquals(7, occurrence); // FIXME: currently it is 14 since the emoji is 2 chars long
}
}
``````

Understanding parameters

You’re lucky that the task is to count the occurrences of a single character. Your code instead looks as if it would count the occurrences of a string.

I think that’s a good start to understanding why defining a method with the right parameters is important. The original task is to count character occurrences, so it’s likely the method should look something like this in Java:

``````public static int countOccurrence(String input, char value) { /* ... */ }
// to call
countOccurrence("hello", 'l');
``````

In that case, your implementation will then change slightly to become:

``````public static int countOccurrence(String input, char value) {
return input.length() - input.replace(String.valueOf(value), "").length();
}
``````

By making it clear what your parameters are, we can now care about how to explain your logic.

Explaining logic vs implementation

Similar to what was pointed out in @Josiah‘s answer, have a think about how you will explain your logic, and then reflect if that was as elegant as saying “loop through each character and count the number of matches for my ‘value’ character”.

It’s easy to commingle short code as elegant code, but as you have to deal with more complex logic in larger code bases, you will learn that you have to care about how it can be explained rather than how short it reads.

It is not as elegant to understand a simple ‘count-the-occurrence’ method as having to get the length of the input string, and subtract that with a copy that does not have the matching ‘value’ character.

Performance

To be frank, you really should not have to care about performance for a simple method as this, as that reeks of micro-optimization. However, if you are interested in comparing your solution and a simple `for` loop, you can roughly think in terms of the memory required by your code as it goes along.

Your solution already knows about its input, and it needs to create a copy of that input, albeit shorter. A `for` loop requires an additional `int` to count. It doesn’t require making that extra copy.

On the other hand, the stream-able way has more ‘plumbing’ underneath, but its advantage come in terms of a more declarative way of approach, instead of the imperative approaches we traditionally have.

When reading through code one shouldn’t be surprised about how something is implemented. The best pieces of code are those that make you think “well ofcourse it’s written like this”.

When we look at the assignement:

count how many letters of the input string are equal to the wanted character

which of these solutions sounds least surprising:

loop through each character in the input string and increment the counter if it’s equal to the wanted character.

or

remove all characters equal to the wanted character from the original string. Substract the length of the remaining string from the original strings lenght.

or to include the stream option of Koekje

go through all the characters of the input string and keep only those characters equal to the wanted character. Count how many are left.

I personally prefer the really simple for loop since it reads almost like the assignment and I can understand how other people might prefer the stream solution. Your solution on the other hand takes an extra second or 2 to understand what it does exactly.

This is most likely also what your teacher means when she said the for loop solution is more “elegant”.

Being clever when writing a piece of code doesn’t help someone else (or you in a couple of years) to understand that piece of code when reading it. If you want some extreme examples of this, look at some codegolf solutions. I doubt any of the authers of those answers would be able to tell what it does when they read those extremely clever oneliners again after a couple of years.