Find a prefix of a query string in the values of a Map

Problem

I have a HashMap whose values are short strings (at most 6 characters) that serve as prefixes:

Map<String, String> map = new HashMap<>();
map.put("codeA", "100");
map.put("codeB", "7");
map.put("codeC", "0012");
// etc.

This works so far, but given an input string, I also need to split it into two parts whenever it starts with one of the values in my map.

What I do is:

boolean found = false;

String[] result = null;  // stays null if no prefix matches
for (Entry<String, String> e : map.entrySet()) {
    String code = e.getKey();
    String value = e.getValue();
    // skip values that would leave no remainder after the prefix
    if (value.length() >= inputString.length()) continue;
    if (inputString.startsWith(value)) {
        result = new String[2];
        result[0] = value;
        // the remainder starts right after the prefix
        result[1] = inputString.substring(value.length());
        found = true;
        break;
    }
}
return result;

Could this be improved? Could I use some additional data structure or API for this?

I am interested in an approach in Java 7 without any extra libraries. The HashMap has ~400 entries and the input string is 8-11 characters long.

I need both to look up a prefix by its code (hence the HashMap) and to split an input string into the prefix and the remainder.

Solution

Since you don’t care about the keys and just want to match the values, you can use a NavigableSet (such as a TreeSet) of values:

NavigableSet<String> prefixes = new TreeSet<>(map.values());

String prefix = prefixes.floor(inputString);

if (prefix != null && inputString.startsWith(prefix)) {
    return new String[] {prefix, inputString.substring(prefix.length())};
} else {
    return null;  // or whatever you want to return if there's no match
}    

Trie

A trie would solve this problem perfectly. With a trie, you could check the input against all of your prefixes in O(n) time, where n is the length of the input string. Your current implementation requires O(mn) time, where m is the number of prefixes and n is the length of the input string.

Of course, the trie solution will be much more complex than your existing solution, so you will need to weigh the performance benefits of using a trie versus the simplicity of using a HashMap.
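For illustration, a minimal trie for this lookup might look like the sketch below. The class name PrefixTrie and its methods add and split are made up for this example and are not part of the standard library. The sketch assumes the longest matching prefix is the one wanted, and it sticks to Java 7 syntax.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names here are illustrative only.
class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean terminal; // true if a stored prefix ends at this node
    }

    private final Node root = new Node();

    // Adds one prefix to the trie, one node per character.
    void add(String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.terminal = true;
    }

    // Returns {prefix, rest} for the longest stored prefix of input,
    // or null if no stored prefix matches.
    String[] split(String input) {
        Node node = root;
        int longest = root.terminal ? 0 : -1; // length of the longest match so far
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                break; // no stored prefix continues with this character
            }
            if (node.terminal) {
                longest = i + 1;
            }
        }
        if (longest < 0) {
            return null;
        }
        return new String[] {input.substring(0, longest), input.substring(longest)};
    }
}
```

Filling the trie once from map.values() and then calling split(inputString) per query would replace the loop over all entries. Unlike the loop in the question, this sketch also accepts a prefix that consumes the whole input.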

Misha’s answer is on the right track: as far as I know, a NavigableSet is the best tool in the standard library. A trie would be perfect, but the standard library doesn’t have one!

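However, it’s not enough to just look for the lexicographically closest prefix, because there might be unrelated prefixes in between the true prefix and the search string. Consider the set of prefixes “pot” and “potash”, and the input string “potato”. The closest entry at or below “potato” is “potash”, which is not a prefix of it, so a single floor lookup reports no match even though “pot” is one.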
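The failure is easy to reproduce. The snippet below is a small sketch using the two prefixes from the example; FloorPitfall and floorOnly are names invented for the demonstration.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical demo class showing why a single floor() lookup can miss a match.
class FloorPitfall {
    // Returns the floor entry if it is a prefix of input, otherwise null.
    static String floorOnly(NavigableSet<String> prefixes, String input) {
        String candidate = prefixes.floor(input);
        return (candidate != null && input.startsWith(candidate)) ? candidate : null;
    }

    public static void main(String[] args) {
        NavigableSet<String> prefixes =
                new TreeSet<String>(Arrays.asList("pot", "potash"));
        System.out.println(prefixes.floor("potato"));      // potash
        System.out.println(floorOnly(prefixes, "potato")); // null, although "pot" is a prefix
    }
}
```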

Instead, you have to find the closest prefix, and then walk backwards through the set of prefixes until you either find a match, or find something that can’t possibly be a prefix. I think this should do it:

private String[] search(NavigableSet<String> prefixes, String inputString) {
    // Walk backwards from the closest entry that sorts at or before inputString.
    Iterator<String> it = prefixes.headSet(inputString, true).descendingIterator();
    while (it.hasNext()) {
        String prefix = it.next();
        if (inputString.startsWith(prefix)) {
            return new String[] {prefix, inputString.substring(prefix.length())};
        } else if (prefix.charAt(0) != inputString.charAt(0)) {
            // Entries with a different first character can no longer be prefixes.
            return null;
        }
    }
    return null;
}

I have not tested this thoroughly, though.
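A few spot checks along these lines might help build confidence. This is a sketch only: the class name SearchSpotCheck is invented, the method is the one from above (made static so it can be called from main), and the prefix set mixes the values from the question with the pot/potash example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical harness around the search method shown above.
class SearchSpotCheck {
    static String[] search(NavigableSet<String> prefixes, String inputString) {
        Iterator<String> it = prefixes.headSet(inputString, true).descendingIterator();
        while (it.hasNext()) {
            String prefix = it.next();
            if (inputString.startsWith(prefix)) {
                return new String[] {prefix, inputString.substring(prefix.length())};
            } else if (prefix.charAt(0) != inputString.charAt(0)) {
                return null; // different first character: nothing earlier can match
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NavigableSet<String> prefixes = new TreeSet<String>(
                Arrays.asList("100", "7", "0012", "pot", "potash"));
        System.out.println(Arrays.toString(search(prefixes, "potato")));   // [pot, ato]
        System.out.println(Arrays.toString(search(prefixes, "10012345"))); // [100, 12345]
        System.out.println(Arrays.toString(search(prefixes, "99999999"))); // null
    }
}
```

Because a longer prefix of the same string sorts after a shorter one, the backwards walk reaches the longest matching prefix first.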
