Problem
I recently came across this problem which is as follows:
Given a string S and a string T, find the minimum window in S which
will contain all the characters in T in complexity O(n).For example, S = “ADOBECODEBANC” T = “ABC” Minimum window is “BANC”.
Note: If there is no such window in S that covers all characters in T,
return the emtpy string “”.If there are multiple such windows, you are guaranteed that there will
always be only one unique minimum window in S.
My solution first finds the first window which contains all the characters in T
.
It does it by first creating a lookup called toFind
(array of int[256]
) containing the counts for all the characters in T
.
It maintains another lookup of int[256]
called currentlyFound
in which it stores the count of all the characters it has found so far.
While iterating through the characters of S
, the algorithm keeps track of all the characters encountered so far, if all the characters of T
have been encountered during the first scan, the algorithm will return the offsets for the first window and store it as the current minimum window.
The next steps will try to increment the ‘begin’ or ‘end’ until the next minimum window has been discovered.
The begin
is incremented only if the number in currentlyFound
for character at index begin
is greater than the number of characters in toFind
for index begin
, since this way the invariant of the window having at least the total number of characters in T
will not be disrupted.
The begin
is also incremented when the character it points to is not contained in toFind
.
If begin
cannot be incremented, then end
is incremented until it cannot be incremented anymore. While the end
is incremented, the currrentlyFound
for the char
it points to is incremented.
This goes on until the offsets cannot be incremented anymore. Every time a window is found, the size of its offsets is compared to the current min. If it is less than the current min, it is updated, other wise the min window stays as it is.
After the iteration concludes, the current min is returned.
It will be great if somebody could take a look at my solution and let me know how I can improve it.
public class MinimumWindow{
static class Pair{
int start;
int end;
Pair(int start,int end){
this.start = start;
this.end = end;
}
}
private static int[] populateToFind(String target){
int[] toFind = new int[256];
for(char c:target.toCharArray()){
toFind[c] = toFind[c] + 1;
}
return toFind;
}
private static Pair findFirstWindow(String source,String target,int[] toFind,int[] currentlyFound){
int startIdx = -1;
for(int i=0;i<source.length();i++){
if(toFind[source.charAt(i)] > 0){
startIdx = i;
break;
}
}
if(startIdx == -1) return null;
int foundCount = 0;
int endIdx = startIdx;
for(int i=startIdx;i<source.length();i++){
if(foundCount == target.length()){
break;
}
char currentChar = source.charAt(i);
if(toFind[currentChar] > 0){
if(currentlyFound[currentChar] < toFind[currentChar]){
foundCount++;
}
currentlyFound[currentChar] = currentlyFound[currentChar] + 1;
endIdx = i;
}
}
if(foundCount < target.length()){
return null;
}
return new Pair(startIdx,endIdx);
}
private static boolean canMoveStart(int start,String source,int[] toFind,int[] currentlyFound){
return ((toFind[source.charAt(start)]==0)
|| (currentlyFound[source.charAt(start)] > toFind[source.charAt(start)]));
}
private static String minimumWindow(String source,String target){
if(target.length() > source.length()){
return "";
}
int[] toFind = populateToFind(target);
int[] currentlyFound = new int[256];
Pair w = findFirstWindow(source,target,toFind,currentlyFound);
if(w == null){
return "";
}
int start= w.start;
int end = w.end;
Pair min = w;
while(start < end){
if(canMoveStart(start,source,toFind,currentlyFound)){
if(toFind[source.charAt(start)] > 0){
currentlyFound[source.charAt(start)] = currentlyFound[source.charAt(start)] - 1;
}
start++;
}else{
if(end < source.length()-1){
if(toFind[source.charAt(end+1)] > 0){
currentlyFound[source.charAt(end+1)] = currentlyFound[source.charAt(end+1)] + 1;
}
end++;
} else {
break;
}
}
if(Math.abs(end - start) < Math.abs(min.end - min.start)){
min = new Pair(start,end);
}
}
return source.substring(min.start,min.end+1);
}
public static void main(String[] args){
String source = "ADOBECODEBANC";
String target = "ABC";
System.out.println(minimumWindow(source,target));
}
}
Solution
After seen this question, I had the feeling it could be better with more OO programming.
So I started to create mine own way, with more OO, less code and easier to read.
I was happy with mine result until I did benchmark it with your code.
Result of 1000000 iterations :
new method :
BAANC
Total execution time: 4527ms
original method :
BAANC
Total execution time: 480ms
So I’m really not going to speak that way, you are coding smart with primitives, what’s also the reason of the benchmark result explains.
There are a few points what could be improved.
At first sight, I didn’t understand why you create arrays of 256 size.
After a better look, I understand it but please use a constant for this and name it like MAX_ASCII_VALUE_CHARACTER.
While they do not speak over what characters the String
could contain, I’m tending more to go to the original ASCII table and not the extended, so I would change it to 123(z) or 128(full).
Your naming of variable is pretty good, but sometimes you still name them bad.
Example : Pair w
or Pair min
.
While the min is already slightly better chosen, you could still name it like currentMinPair
and the other one, you actually don’t need it.
You can assign the result directly to min
, check min for null
and read start
and end
from min pair.
The reason is that you don’t use w
further in your code.
So final conclusion for me :
Pretty good and smart code, with some (very minor) points to improve.
API
How would one use this class?
Since it’s “just” a solution for a challenge,
it might be easy to say “who cares, nobody will use this”,
but in general you can always count on one user:
the unit tester (a.k.a the first user of a program).
Most methods are private, only main
is public.
But main
is not intended for testing,
and in any case it’s not very useful anyway because it doesn’t use its parameters and performs a simple hardcoded operation.
To make the class testable, simply make minimumWindow
public.
Another odd element of the API is that Pair
is not private.
It’s a minor thing,
but since you made all other methods private,
this one sticks out a bit.
Readability
The code is hard to read, in many ways.
Repeated long-ish expressions.
For example here:
private static boolean canMoveStart(int start,String source,int[] toFind,int[] currentlyFound){
return ((toFind[source.charAt(start)]==0)
|| (currentlyFound[source.charAt(start)] > toFind[source.charAt(start)]));
}
The expression source.charAt(start)
is used 3 times there.
It’s long, and it makes the condition long too and hard to read.
By extracting it to a local variable,
you could make the boolean expression shorter and easy to read,
and less error prone, like this:
private static boolean canMoveStart(int start, String source, int[] toFind, int[] currentlyFound) {
char startChar = source.charAt(start);
return toFind[startChar] == 0 || currentlyFound[startChar] > toFind[startChar];
}
Another benefit of extracting source.charAt(start)
to a variable is that you save on the processing logic in .charAt
. That method is more than just a simple access into an array: it also checks for boundaries.
Doing that repeatedly is wasteful.
As a matter of fact, the caller of this method also does a .charAt
.
And since this method only uses a single character in source
,
you could just get source.charAt(start)
in the caller,
and pass just that character to this method instead of the entire source
.
private static boolean canMoveStart(char startChar, int[] toFind, int[] currentlyFound) {
return toFind[startChar] == 0 || currentlyFound[startChar] > toFind[startChar];
}
This writing style is too compact and hard to read:
for(int i=0;i<source.length();i++){
The recommended writing style is this:
for (int i = 0; i < source.length(); i++) {
Instead of:
toFind[c] = toFind[c] + 1;
It’s simpler and cleaner to use ++
(or the +=
operator):
toFind[c]++;
Method decomposition
Most of the methods are very long.
They could be decomposed to smaller methods,
so that they become easier to understand.
More on that in the next section.
Take for example this piece of code in findFirstWindow
:
int startIdx = -1;
for(int i=0;i<source.length();i++){
if(toFind[source.charAt(i)] > 0){
startIdx = i;
break;
}
}
This could be extracted to a method:
private static int findStartIndex(String source, int[] toFind) {
for (int i = 0; i < source.length(); i++) {
if (toFind[source.charAt(i)] > 0) {
return i;
}
}
return -1;
}
So that you can replace the earlier snippet in findFirstWindow
with this single line:
int startIdx = findStartIndex(source, toFind);
Notice that method extraction has the added benefit of explaining what the snippet does, thanks to the descriptive method name.
When you start decomposing a large method to smaller pieces,
sometimes further, unexpected opportunities appear.
For example,
as I tried to extract the second part of findFirstWindow
,
I arrived at this helper method:
private static int findIndex(String source, int[] toFind, int[] currentlyFound, int startIdx, int count) {
for (int i = startIdx, foundCount = 0; i < source.length(); i++) {
char currentChar = source.charAt(i);
if (toFind[currentChar] > 0) {
if (currentlyFound[currentChar] < toFind[currentChar]) {
++foundCount;
if (foundCount == count) {
++currentlyFound[currentChar];
return i;
}
}
++currentlyFound[currentChar];
}
}
return -1;
}
The cool thing about this is that on closer look I realized that the previously extracted findStartIndex
is no longer necessary.
This latter method can be reused for the same purpose,
reducing findFirstWindow
to:
private static Pair findFirstWindow(String source, String target, int[] toFind, int[] currentlyFound) {
int startIdx = findIndex(source, toFind, currentlyFound, 0, 1);
if (startIdx == -1) {
return null;
}
int endIdx = findIndex(source, toFind, currentlyFound, startIdx + 1, target.length() - 1);
if (endIdx == -1) {
return null;
}
return new Pair(startIdx, endIdx);
}
Unit testing
Before refactoring your code aggressively with confidence,
I added a bunch of unit tests to cover my back.
This is a good practice I recommend, especially when working with non-trivial code.
Here are some examples using JUnit4 to get you started:
public class MinimumWindowTest {
private String solve(String s, String t) {
return MinimumWindow.minimumWindow(s, t);
}
@Test
public void test_ADOBECODEBANC_ABC() {
assertEquals("BANC", solve("ADOBECODEBANC", "ABC"));
}
@Test
public void test_a_aa() {
assertEquals("", solve("a", "aa"));
}
@Test
public void test_aaab_aab() {
assertEquals("aab", solve("aaab", "aab"));
}
}