Minimum window substring which includes all the characters from substring

Posted on

Problem

I recently came across this problem which is as follows:

Given a string S and a string T, find the minimum window in S which
will contain all the characters in T in complexity O(n).

For example, S = “ADOBECODEBANC” T = “ABC” Minimum window is “BANC”.

Note: If there is no such window in S that covers all characters in T,
return the emtpy string “”.

If there are multiple such windows, you are guaranteed that there will
always be only one unique minimum window in S.

My solution first finds the first window which contains all the characters in T.

It does it by first creating a lookup called toFind (array of int[256]) containing the counts for all the characters in T.

It maintains another lookup of int[256] called currentlyFound in which it stores the count of all the characters it has found so far.

While iterating through the characters of S, the algorithm keeps track of all the characters encountered so far, if all the characters of T have been encountered during the first scan, the algorithm will return the offsets for the first window and store it as the current minimum window.

The next steps will try to increment the ‘begin’ or ‘end’ until the next minimum window has been discovered.

The begin is incremented only if the number in currentlyFound for character at index begin is greater than the number of characters in toFind for index begin, since this way the invariant of the window having at least the total number of characters in T will not be disrupted.

The begin is also incremented when the character it points to is not contained in toFind.

If begin cannot be incremented, then end is incremented until it cannot be incremented anymore. While the end is incremented, the currrentlyFound for the char it points to is incremented.

This goes on until the offsets cannot be incremented anymore. Every time a window is found, the size of its offsets is compared to the current min. If it is less than the current min, it is updated, other wise the min window stays as it is.

After the iteration concludes, the current min is returned.

It will be great if somebody could take a look at my solution and let me know how I can improve it.

public class MinimumWindow{

    static class Pair{
        int start;
        int end;
        Pair(int start,int end){
            this.start = start;
            this.end = end;
        }
    }


    private static int[] populateToFind(String target){
        int[] toFind = new int[256];
        for(char c:target.toCharArray()){
            toFind[c] = toFind[c] + 1;
        }
        return toFind;
    }

    private static Pair findFirstWindow(String source,String target,int[] toFind,int[] currentlyFound){
        int startIdx = -1;
        for(int i=0;i<source.length();i++){
            if(toFind[source.charAt(i)] > 0){
                startIdx = i;
                break;
            }
        }
        if(startIdx == -1) return null;
        int foundCount = 0;
        int endIdx = startIdx;
        for(int i=startIdx;i<source.length();i++){
            if(foundCount == target.length()){
                break;
            }
            char currentChar = source.charAt(i);
            if(toFind[currentChar] > 0){
                if(currentlyFound[currentChar] < toFind[currentChar]){
                    foundCount++;
                }
                currentlyFound[currentChar] = currentlyFound[currentChar] + 1;
                endIdx = i;
            }
        }
        if(foundCount < target.length()){
            return null;
        }
        return new Pair(startIdx,endIdx);
    }

    private static boolean canMoveStart(int start,String source,int[] toFind,int[] currentlyFound){
        return ((toFind[source.charAt(start)]==0) 
        || (currentlyFound[source.charAt(start)] > toFind[source.charAt(start)]));
    }

    private static String minimumWindow(String source,String target){
        if(target.length() > source.length()){
            return "";
        }

        int[] toFind = populateToFind(target);
        int[] currentlyFound = new int[256];

        Pair w = findFirstWindow(source,target,toFind,currentlyFound);
        if(w == null){
            return "";
        }

        int start= w.start;
        int end = w.end;
        Pair min = w;
        while(start < end){
            if(canMoveStart(start,source,toFind,currentlyFound)){
                if(toFind[source.charAt(start)] > 0){
                    currentlyFound[source.charAt(start)] = currentlyFound[source.charAt(start)] - 1;
                }
                start++;
            }else{
                if(end < source.length()-1){
                    if(toFind[source.charAt(end+1)] > 0){
                        currentlyFound[source.charAt(end+1)] = currentlyFound[source.charAt(end+1)] + 1;
                    }
                    end++;
                } else {
                    break;
                }
            }
            if(Math.abs(end - start) < Math.abs(min.end - min.start)){
                    min = new Pair(start,end);
            }

        }
        return source.substring(min.start,min.end+1);
    }

    public static void main(String[] args){
        String source = "ADOBECODEBANC";
        String target = "ABC";
        System.out.println(minimumWindow(source,target));
    }
}

Solution

After seen this question, I had the feeling it could be better with more OO programming.
So I started to create mine own way, with more OO, less code and easier to read.
I was happy with mine result until I did benchmark it with your code.

Result of 1000000 iterations :

new method :
BAANC
Total execution time: 4527ms
original method :
BAANC
Total execution time: 480ms

So I’m really not going to speak that way, you are coding smart with primitives, what’s also the reason of the benchmark result explains.
There are a few points what could be improved.

At first sight, I didn’t understand why you create arrays of 256 size.
After a better look, I understand it but please use a constant for this and name it like MAX_ASCII_VALUE_CHARACTER.
While they do not speak over what characters the String could contain, I’m tending more to go to the original ASCII table and not the extended, so I would change it to 123(z) or 128(full).

Your naming of variable is pretty good, but sometimes you still name them bad.
Example : Pair w or Pair min.
While the min is already slightly better chosen, you could still name it like currentMinPair and the other one, you actually don’t need it.

You can assign the result directly to min, check min for null and read start and end from min pair.
The reason is that you don’t use w further in your code.

So final conclusion for me :

Pretty good and smart code, with some (very minor) points to improve.

API

How would one use this class?
Since it’s “just” a solution for a challenge,
it might be easy to say “who cares, nobody will use this”,
but in general you can always count on one user:
the unit tester (a.k.a the first user of a program).

Most methods are private, only main is public.
But main is not intended for testing,
and in any case it’s not very useful anyway because it doesn’t use its parameters and performs a simple hardcoded operation.

To make the class testable, simply make minimumWindow public.

Another odd element of the API is that Pair is not private.
It’s a minor thing,
but since you made all other methods private,
this one sticks out a bit.

Readability

The code is hard to read, in many ways.


Repeated long-ish expressions.
For example here:

private static boolean canMoveStart(int start,String source,int[] toFind,int[] currentlyFound){
    return ((toFind[source.charAt(start)]==0) 
    || (currentlyFound[source.charAt(start)] > toFind[source.charAt(start)]));
}

The expression source.charAt(start) is used 3 times there.
It’s long, and it makes the condition long too and hard to read.
By extracting it to a local variable,
you could make the boolean expression shorter and easy to read,
and less error prone, like this:

private static boolean canMoveStart(int start, String source, int[] toFind, int[] currentlyFound) {
    char startChar = source.charAt(start);
    return toFind[startChar] == 0 || currentlyFound[startChar] > toFind[startChar];
}

Another benefit of extracting source.charAt(start) to a variable is that you save on the processing logic in .charAt. That method is more than just a simple access into an array: it also checks for boundaries.
Doing that repeatedly is wasteful.

As a matter of fact, the caller of this method also does a .charAt.
And since this method only uses a single character in source,
you could just get source.charAt(start) in the caller,
and pass just that character to this method instead of the entire source.

private static boolean canMoveStart(char startChar, int[] toFind, int[] currentlyFound) {
    return toFind[startChar] == 0 || currentlyFound[startChar] > toFind[startChar];
}

This writing style is too compact and hard to read:

    for(int i=0;i<source.length();i++){

The recommended writing style is this:

    for (int i = 0; i < source.length(); i++) {

Instead of:

        toFind[c] = toFind[c] + 1;

It’s simpler and cleaner to use ++ (or the += operator):

        toFind[c]++;

Method decomposition

Most of the methods are very long.
They could be decomposed to smaller methods,
so that they become easier to understand.
More on that in the next section.

Take for example this piece of code in findFirstWindow:

int startIdx = -1;
for(int i=0;i<source.length();i++){
    if(toFind[source.charAt(i)] > 0){
        startIdx = i;
        break;
    }
}

This could be extracted to a method:

private static int findStartIndex(String source, int[] toFind) {
    for (int i = 0; i < source.length(); i++) {
        if (toFind[source.charAt(i)] > 0) {
            return i;
        }
    }
    return -1;
}

So that you can replace the earlier snippet in findFirstWindow with this single line:

int startIdx = findStartIndex(source, toFind);

Notice that method extraction has the added benefit of explaining what the snippet does, thanks to the descriptive method name.

When you start decomposing a large method to smaller pieces,
sometimes further, unexpected opportunities appear.
For example,
as I tried to extract the second part of findFirstWindow,
I arrived at this helper method:

private static int findIndex(String source, int[] toFind, int[] currentlyFound, int startIdx, int count) {
    for (int i = startIdx, foundCount = 0; i < source.length(); i++) {
        char currentChar = source.charAt(i);
        if (toFind[currentChar] > 0) {
            if (currentlyFound[currentChar] < toFind[currentChar]) {
                ++foundCount;
                if (foundCount == count) {
                    ++currentlyFound[currentChar];
                    return i;
                }
            }
            ++currentlyFound[currentChar];
        }
    }
    return -1;
}

The cool thing about this is that on closer look I realized that the previously extracted findStartIndex is no longer necessary.
This latter method can be reused for the same purpose,
reducing findFirstWindow to:

private static Pair findFirstWindow(String source, String target, int[] toFind, int[] currentlyFound) {
    int startIdx = findIndex(source, toFind, currentlyFound, 0, 1);
    if (startIdx == -1) {
        return null;
    }
    int endIdx = findIndex(source, toFind, currentlyFound, startIdx + 1, target.length() - 1);
    if (endIdx == -1) {
        return null;
    }
    return new Pair(startIdx, endIdx);
}

Unit testing

Before refactoring your code aggressively with confidence,
I added a bunch of unit tests to cover my back.
This is a good practice I recommend, especially when working with non-trivial code.
Here are some examples using JUnit4 to get you started:

public class MinimumWindowTest {

    private String solve(String s, String t) {
        return MinimumWindow.minimumWindow(s, t);
    }

    @Test
    public void test_ADOBECODEBANC_ABC() {
        assertEquals("BANC", solve("ADOBECODEBANC", "ABC"));
    }

    @Test
    public void test_a_aa() {
        assertEquals("", solve("a", "aa"));
    }

    @Test
    public void test_aaab_aab() {
        assertEquals("aab", solve("aaab", "aab"));
    }
}

Leave a Reply

Your email address will not be published. Required fields are marked *