Count number of each char in a String

Problem

I know there is a simpler way of doing this, but I just really can’t think of it right now. Can you please help me out?

String sample = "hello world";
char arraysample[] = sample.toCharArray();
int length = sample.length();

//count the number of each letter in the string
int acount = 0;
int bcount = 0;
int ccount = 0;
int dcount = 0;
int ecount = 0;
int fcount = 0;
int gcount = 0;
int hcount = 0;
int icount = 0;
int jcount = 0;
int kcount = 0;
int lcount = 0;
int mcount = 0;
int ncount = 0;
int ocount = 0;
int pcount = 0;
int qcount = 0;
int rcount = 0;
int scount = 0;
int tcount = 0;
int ucount = 0;
int vcount = 0;
int wcount = 0;
int xcount = 0;
int ycount = 0;
int zcount = 0; 

for(int i = 0; i < length; i++)
{
    char c = arraysample[i];
    switch (c) 
    {
        case 'a': 
            acount++;
            break;
        case 'b': 
            bcount++;
            break;
        case 'c': 
            ccount++;
            break;
        case 'd': 
            dcount++;
            break;
        case 'e':
            ecount++;
            break;
        case 'f':
            fcount++;
            break;
        case 'g':
            gcount++;
            break;
        case 'h':
            hcount++;
            break;
        case 'i':
            icount++;
            break;
        case 'j':
            jcount++;
            break;
        case 'k':
            kcount++;
            break;
        case 'l':
            lcount++;
            break;
        case 'm':
            mcount++;
            break;
        case 'n':
            ncount++;
            break;
        case 'o':
            ocount++;
            break;
        case 'p':
            pcount++;
            break;
        case 'q':
            qcount++;
            break;
        case 'r':
            rcount++;
            break;
        case 's':
            scount++;
            break;
        case 't':
            tcount++;
            break;
        case 'u':
            ucount++;
            break;
        case 'v':
            vcount++;
            break;
        case 'w':
            wcount++;
            break;
        case 'x':
            xcount++;
            break;
        case 'y':
            ycount++;
            break;
        case 'z':
            zcount++;
            break;
        }
}
System.out.println ("There are " +hcount+" h's in here ");
System.out.println ("There are " +ocount+" o's in here ");

Solution

Oh woah! xD It’s just.. woah! What patience you have to write all those variables.

Well, it’s Java so you can use a HashMap.

Write something like this:

String str = "Hello World";
int len = str.length();
Map<Character, Integer> numChars = new HashMap<Character, Integer>(Math.min(len, 26));

for (int i = 0; i < len; ++i)
{
    char charAt = str.charAt(i);

    if (!numChars.containsKey(charAt))
    {
        numChars.put(charAt, 1);
    }
    else
    {
        numChars.put(charAt, numChars.get(charAt) + 1);
    }
}

System.out.println(numChars);
  1. We loop over all the string’s characters and save the current char in the charAt variable.
  2. We check whether our HashMap already has charAt as a key.
    • If it does, the char has already been seen in the string, so we just get the current value and add one.
    • If it doesn’t (i.e. we have not seen this char in the string before), we add the key with value 1 because we found a new char.
  3. Stop! Our HashMap now contains every char found (keys) and how many times each one occurs (values)!
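
The check-then-put steps above can also be folded into a single call with Map.merge, which exists since Java 8. A minimal sketch, assuming Java 8 or later; the class name MergeCharCounter and the method name countChars are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class MergeCharCounter {

    // Counts how often each char occurs in str.
    // merge() stores 1 for a key that is not present yet and otherwise
    // replaces the stored value with (old value + 1).
    static Map<Character, Integer> countChars(String str) {
        Map<Character, Integer> numChars = new HashMap<>();
        for (int i = 0; i < str.length(); i++) {
            numChars.merge(str.charAt(i), 1, Integer::sum);
        }
        return numChars;
    }

    public static void main(String[] args) {
        System.out.println(countChars("Hello World"));
    }
}
```

The result is the same map as in the loop above; only the containsKey/get/put bookkeeping is gone.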

A possibly faster, and at least more compact, alternative to a HashMap is a good old integer array. A char can be cast to an int, which gives its UTF-16 code unit value (identical to the ASCII code for ASCII characters).

String str = "Hello World";
int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per possible char value, 0 to 65535
// If you are certain you will only have ASCII characters, I would use `new int[128]` instead (`new int[256]` also covers Latin-1)

for (int i = 0; i < str.length(); i++) {
    char charAt = str.charAt(i);
    counts[(int) charAt]++;
}

System.out.println(Arrays.toString(counts));

As the above output is a bit big, by looping through the integer array you can output just the characters which actually occur:

for (int i = 0; i < counts.length; i++) {
    if (counts[i] > 0)
        System.out.println("Number of " + (char) i + ": " + counts[i]);
}

  1. Actually, there is an even better structure than maps and arrays for this kind of counting: multisets. The Google Guava documentation mentions a very similar case:

    The traditional Java idiom for e.g. counting how many times a word occurs in a document is something like:

    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String word : words) {
      Integer count = counts.get(word);
      if (count == null) {
        counts.put(word, 1);
      } else {
        counts.put(word, count + 1);
      }
    }
    

    This is awkward, prone to mistakes, and doesn’t support collecting a variety of useful statistics, like the total number of words. We can do better.

    With a multiset you can get rid of the contains (or if (get(c) != null)) calls; all you need is a simple add in every iteration. Calling add for the first time adds a single occurrence of the given element.

    String input = "Hello world!";
    
    Multiset<Character> characterCount = HashMultiset.create();
    for (char c: input.toCharArray()) {
        characterCount.add(c);
    }
    for (Entry<Character> entry: characterCount.entrySet()) {
        System.out.println(entry.getElement() + ": " + entry.getCount());
    }
    

    (See also: Effective Java, 2nd edition, Item 47: Know and use the libraries. The author mentions only the JDK’s built-in libraries, but I think the reasoning holds for other libraries too.)

  2. int length = sample.length();
    ....
    for (int i = 0; i < length; i++) {
        char c = arraysample[i];
    

    You could replace these three lines with a foreach loop:

    for (char c: arraysample) {
    
  3. int length = sample.length();
    ....
    for (int i = 0; i < length; i++) {
        char c = arraysample[i];
    

    You don’t need the length variable, you could use sample.length() in the loop directly:

    for (int i = 0; i < sample.length(); i++) {
    

    String.length() is a cheap call, and the JIT compiler will typically inline it, so there is no need to cache the value by hand.

  4. char arraysample[] = sample.toCharArray();
    int length = sample.length();
    for (int i = 0; i < length; i++) {
        char c = arraysample[i];
    

    It’s a little confusing that the loop iterates over arraysample but uses sample.length() as the upper bound. Although the two values are the same, it would be cleaner to use arraysample.length as the upper bound.

Yes… there is a simpler way. You have two choices, and they are about the same: use an array or a Map. The more advanced way of doing this would certainly be with a Map.

Think of a map as a type of array where, instead of using an integer to index the array, you can use anything. In our case we’ll use char as the index. Because chars are numeric values, you could just use a simple array here and mentally think of ‘a’ as index 0, but we’re going to take the larger step today.

String sample = "Hello World!";

// Initialization
Map<Character, Integer> counter = new HashMap<Character, Integer>();
// Use a char loop variable: an int cannot be cast to Character
for (char c = 'a'; c <= 'z'; c++) {
    counter.put(c, 0);
}

// Populate
for (int i = 0; i < sample.length(); i++){
    if(counter.containsKey(sample.charAt(i)))
        counter.put(sample.charAt(i), counter.get(sample.charAt(i)) + 1 );
}

Now any time you want to know how many times a character occurred, just call this method:

int getCount(Character character){
    if(counter.containsKey(character))
        return counter.get(character);
    else return 0;
}

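Note: This only counts the lowercase letters a to z. Uppercase letters, digits, and punctuation are ignored, so the H, W, and ! in the sample are not counted.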
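
For comparison, the simple-array route mentioned in this answer, where ‘a’ is treated as index 0, might look like the sketch below. The class and method names are hypothetical, and folding upper case to lower case is my own assumption, not something the Map version above does:

```java
class LetterArrayCounter {

    // Returns 26 counters: index 0 holds the count of 'a', index 25 of 'z'.
    // Assumption for this sketch: upper case is folded to lower case, and
    // any char that does not fold to a-z is skipped.
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (char c : text.toCharArray()) {
            char lower = Character.toLowerCase(c);
            if (lower >= 'a' && lower <= 'z') {
                counts[lower - 'a']++; // 'a' - 'a' == 0, 'b' - 'a' == 1, ...
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countLetters("Hello World!");
        // Print only the letters that actually occur
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println((char) ('a' + i) + ": " + counts[i]);
            }
        }
    }
}
```

The range check plays the same role as containsKey in the Map version: anything that is not a letter is silently ignored.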

Functional programming with the Java 8 Stream API can simplify this problem to a great extent.

public static void main(String[] args) {

    String myString = "hello world";

    System.out.println(
        myString
            .chars()
            .mapToObj(c -> String.valueOf((char) c))
            .filter(str -> !str.equals(" "))
            .collect(Collectors.groupingBy(ch -> ch, Collectors.counting()))

    );
}

First we get a stream of integers (the char values) from the string:

myString.chars()

Next we transform each integer into a one-character string:

mapToObj(c -> String.valueOf((char) c))

Then we filter out the characters we don’t want to count; in the example above we filter out the spaces:

filter(str -> !str.equals(" "))

Finally we collect them, grouping by character and counting each group:

collect(Collectors.groupingBy(ch -> ch, Collectors.counting()))
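
If you would rather have Character keys than one-letter Strings, the same pipeline could be sketched like this. The class and method names are hypothetical, and I have swapped in a TreeMap as the map factory (my own choice) so the result is sorted by character:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamCharCounter {

    // Groups the non-space chars of text and counts the size of each group.
    // TreeMap keeps the keys sorted, so the printed order is predictable.
    static Map<Character, Long> countChars(String text) {
        return text.chars()
                .filter(c -> c != ' ')          // drop spaces while still an IntStream
                .mapToObj(c -> (char) c)        // box each value as a Character
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countChars("hello world"));
    }
}
```

For "hello world" this prints {d=1, e=1, h=1, l=3, o=2, r=1, w=1}. Note that Collectors.counting() produces Long values, not Integer.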
