Finding the longest unique sub-string in a string

Posted on

Problem

I recently interviewed with a company for software engineering position. I was asked the question of finding the longest unique sub-string in a string.

My algorithms was as follows:

Start from the left-most character, and keep storing the character in a hash table with the key as the character and the value as the index_where_it_last_occurred. Add the character to the answer string as long as its not present in the hash table. If we encounter a stored character again, I stop and note down the length. I empty the hash table and then start again from the right index of the repeated character. The right index is retrieved from the (index_where_it_last_occurred) flag. If I ever reach the end of the string, I stop and return the longest length.

For example, say the string was abcdecfg.

I start with a, store in hash table. I store b and so on till e. Their indexes are stored as well. When I encounter c again, I stop since it’s already hashed and note down the length which is 5. I empty the hash table, and start again from the right index of the repeated character. The repeated character being c, I start again from the position 3 ie., the character d. I keep doing this while I don’t reach the end of string.

I know there could be better algorithms than this. But I am interested in knowing what the time complexity of this algorithm will be. I believe it’ll be O(n2)O(n2).

import java.util.*;
public class longest
{
    static int longest_length = -1;
    public static void main(String[] args) 
    {
        Scanner in = new Scanner(System.in);
        String str = in.nextLine();
        calc(str,0);    
        System.out.println(longest_length);
    }

    public static void calc(String str,int index)
    {
        if(index >= str.length()) return;
        int temp_length = 0;
        LinkedHashMap<Character,Integer> map = new LinkedHashMap<Character,Integer>();
        for (int i = index; i<str.length();i++) 
        {
            if(!map.containsKey(str.charAt(i)))
            {
                map.put(str.charAt(i),i);
                ++temp_length;
            }
            else if(map.containsKey(str.charAt(i)))
            {
                if(longest_length < temp_length)
                {
                    longest_length = temp_length;
                }
                int last_index = map.get(str.charAt(i));
//              System.out.println(last_index); 
                calc(str,last_index+1);
                break;
            }
        }
        if(longest_length < temp_length)
            longest_length = temp_length;
    }
}

Solution

static int longest_length = -1;

No. Static is useless here and makes you method much less usable. A non-static variable would be better, but actually, why not return a result???

There’s no longest_length in Java. Maybe longestLength.

Better, return the substring, the length can be easily obtained from it.

Separated printing and computation, good.

if(index >= str.length()) return;

Spacing (after if). Also according to conventions, you should always use braces (I personally don’t).

LinkedHashMap<Character,Integer> map = new LinkedHashMap<Character,Integer>();

Use Java 7 diamonds or Guava’s Maps.newLinkedHashMap() to save yourself repeated generics.

You need no linked map.

calc(str,last_index+1);

I find the recursion confusing here. You really do nothing, but a simple loop, so use a loop (Even a goto would be clearer than recursion here).


The algorithm indeed runs in O(n*n), which can be shown with a string like

abcde.... abcde....

where you always run through half of the string. I was assuming an unlimited alphabet here, more precisely the complexity is O(n*alphabetSize) as no runs can be longer than alphabetSize.


A much better O(n) algorithm would keep track of the starting and ending positions. Wherever you see a duplicate, advance the start. Sounds damn simple.

Leave a Reply

Your email address will not be published. Required fields are marked *