Problem
I have made a function called testing
. This function tests whether five numbers from each sub-array of $source
are included in $dataArray
sub-arrays, at least $limit
times. If so, it returns the source sub-array as a string; else it returns “x”. Please see or try the example below if this description isn’t clear.
And now it comes: this function works properly but with large quantity of sub-arrays in $sourceArray
it is relatively slow (13 mins for 100000 tests on i7-7700K, PHP/7.1.4).
Is there any way to make this function more effective and quicker? Is there any noticeable bad construction / leak which may have an immediate impact on performance?
<?php
$dataArray=array(array(8,25,32,33,37,43),array(6,10,16,32,33,44),array(18,33,36,37,41,46),array(3,4,20,29,31,43),array(2,9,13,15,19,30),array(14,21,22,29,44,49),array(3,7,9,24,44,49),array(4,6,18,35,45,48));
// and so on - circa 4000 sub-arrays
$sourceArray=array(array(3,4,6,14,16,21,22,25,26,31,32,34,41,45,47,49),array(5,8,11,15,16,18,25,26,32),array(1,2,6,16,17,18,19,22,32,33,40,44,49),array(5,8,11,15,16,18,25,26,32),array(4,12,15,16,20,23,30,37,41,44));
// and so on - 10M+ sub-arrays (variable quantity)
function testing($dataArray, $source, $limit) {
$guessedArray = array();
foreach ($dataArray as $data) {
$counter = 0;
foreach ($source as $number) {
if (in_array($number, $data)) {
$counter++;
}
}
array_push($guessedArray, $counter);
}
$guessedArray = array_count_values($guessedArray);
ksort($guessedArray);
if ($guessedArray[5] >= $limit) {
return implode("-", $source);
} else {return "x";}
}
foreach ($sourceArray as $oneArray) { // just example but $sourceArray's sub-arrays must be processed one by one
$result = testing($dataArray, $oneArray, 1);
echo "$result<br />";
}
/*
Result:
x
x
1-2-6-16-17-18-19-22-32-33-40-44-49
x
x
-> because only the third sub-array in $sourceArray contains five numbers located in $dataArray's sub-arrays for at least $limit times
(in this example $limit = 1)
*/
?>
Solution
This testing routine runs three to four times faster on the test data:
function testing($dataArray,$source,$limit)
{
$matchesFound = 0;
foreach ($dataArray as $data)
{
$counter = 0;
foreach ($data as $number)
{
if (!in_array($number,$source)) $counter++;
if ($counter == 2) break;
}
if ($counter == 1) $matchesFound++;
if ($matchesFound >= $limit) return implode("-", $source);
}
return "x";
}
A big assumption is that $dataArray
only contains arrays of 6 numbers and you will always look for 5 matching numbers.
The improvement is that I stop looping as soon as I know whether there is a match or not. You simply loop over all the data. So the real speed improvement depends heavily on the data. My guess is that it can perform even better on the complete data set.
More can probably be gained if your question was more informative.