Problem
I have invented the fastest string search algorithm. Can someone prove that it is not fastest?
I am not considering worst case scenarios.
The algorithm can be found here: https://sourceforge.net/projects/fastest-string-search-algo/files/
/*
* Choudhary algorithm
*/
public static int choudharyStringSearchAlgorithm(char[] text, char[] pattern) {
int i = 0;
int index = 0;
int end_index = 0;
boolean not_found = false;
int text_len = text.length;
int pattern_len = pattern.length;
int pi_44 = pattern_len - 1;
int pi_34 = (3 * pattern_len) / 4;
int pi_24 = pattern_len / 2;
int pi_14 = pattern_len / 4;
int[] skip_table = new int[ASIZE];
// preprocess pattern and fill skip_table
for (i = 0; i < pattern_len; i++) {
skip_table[pattern[i]] = pattern_len - 1 - i;
}
// now search
for (i = 0; i < text_len; i++) {
if ((text_len - i) < pattern_len) {
return -1;
}
if (pattern[pi_44] != text[i + pi_44]) {
if (skip_table[(int) (text[i + pi_44])] > 0) {
i = i + skip_table[(int) (text[i + pi_44])] - 1;
}
continue;
}
if (pattern[0] != text[i]) {
continue;
}
if (pattern[pi_24] != text[i + pi_24]) {
continue;
}
if (pattern[pi_34] != text[i + pi_34]) {
continue;
}
if (pattern[pi_14] != text[i + pi_14]) {
continue;
}
end_index = i + pi_44;
not_found = false;
for (index = i; index <= end_index; index++) {
if (text[index] != pattern[index - i]) {
not_found = true;
break;
}
} // end of inner for loop
if (not_found == false) { // match is found
return i;
}
} // end of outer for loop
return -1;
} // end of choudharyStringSearchAlgorithm
Solution
If you are searching for long patterns, your algorithm wins most of the time (long strings as far to the end of the text as possible are favored), but if you compare the algorithms on patterns up to 100 characters long (human search), Boyer-Moore smokes everyone and it is not even close.
Edit: After some testing I have found out that your algorithm is worse on long patterns than if you just implemented the skip table.
Edit 2: Actually here is the implementation that is consistently faster than every other algorithm in your collection more than 95% of the time:
public class MyTextSearch implements TextSearch
{
@Override
public int search(
char[] text,
char[] pattern
)
{
int[] skip_table = new int[256];
int patternLength = pattern.length - 1;
char lastPatternChar = pattern[patternLength];
Arrays.fill(skip_table, patternLength);
for (int i = 0; i < pattern.length; i++)
{
skip_table[pattern[i]] = patternLength - i - 1;
}
for (int i = 0; i < text.length; i++)
{
if (i + patternLength >= text.length)
{
return -1;
}
char lastTextChar = text[i + patternLength];
if (lastPatternChar != lastTextChar)
{
i += skip_table[lastTextChar];
continue;
}
if (isEqual(text, i, pattern))
{
return i;
}
}
return -1;
}
private boolean isEqual(
char[] text,
int at,
char[] pattern
)
{
for (int j = 0; j < pattern.length; j++)
{
if (text[at + j] != pattern[j])
{
return false;
}
}
return true;
}
@Override
public String name()
{
return "My Text Search";
}
}
I have refactored your code a bit and it is on https://github.com/blazmrak/fastest-string-search if you are interested.