Problem
(EDIT: here is the follow up qusetion)
Using this program to test for the smallest number where its permutations or itself is divisible by 10 or less numbers is twice as slow as the fastest program I found doing this (I don’t have the source code). How can this program be twice as slow, what optimizations can I do without messing with the compiler flags?
#include <stdio.h>
#include <string.h>
#define MAX 1000000
int main() {
int N,p[10],n[100];
int mask[MAX];
register int i, j, k;
scanf("%d", &N);
for(i=0;i<N;i++) scanf("%d", &n[i]);
memset(mask,0,sizeof(mask));
i = MAX+1;
while(--i) {
k=i;
p[0]=0;
p[1]=0;
p[2]=0;
p[3]=0;
p[4]=0;
p[5]=0;
p[6]=0;
p[7]=0;
p[8]=0;
p[9]=0;
while (k>0) {p[k%10]++; k/=10;}
for(j=1;j<10;j++) if(p[j]>0) {k=j; p[j]--; break;}
for(j=0;j<10;j++) while (p[j]>0) {k=10*k+j; p[j]--;}
for(j=0;j<N;j++) if(i%n[j]==0) mask[k]|=(1<<j);
}
for(i=1;i<MAX;i++) if(mask[i]==(1<<N)-1) break;
fprintf(stdout, "%dn", i);
}
This program uses bitmasks to test all numbers between 1 and MAX for divisibility with the given numbers and then prints out the smallest number whose permutations or itself is divisible by all given numbers. It is not fast enough!
Example input
7
164 278 293 382 483 598 23
This will test all numbers and check which ones are divisible by the given numbers and the output should be 102246
.
It is compiled using the GCC compiler with the following flags:
-g -O2 -std=gnu99 -static -lm
Test the code here.
Solution
Check if work is necessary first
I was able to speed up your program 2x by making the following simple change. If i
is not divisible by any n[j]
, you can skip the whole lengthy computation of k
and just move on to the next i
. In other words, at the top of your loop, add this:
while(--i) {
// This is a quick check to see if we even need to compute k.
for(j=0;j<N;j++) {
if(i%n[j]==0)
break;
}
// None of the numbers divide into i, so skip this i.
if (j == N)
continue;
// The rest of your loop...
}
Since your program was only running for 0.7 seconds on my machine, I tested it by increasing MAX by 10x and using inputs that would force the answer to be something really big. By doing that, I found that this change sped up your program by about 2x compared to your original program.
Further optimizations
A small optimization to my previous change is that you can reuse the work that you do during the quick check by saving the bits that you need to set in the mask.
Another thing I noticed was that the p
array is always set to 0 at the end of every loop, so you don’t have to explicitly zero it yourself at the beginning of the loop. However, this change didn’t appear to speed up your program so the compiler probably already optimized that away.
So your loop would now look like this:
int p[10] = {0};
while(--i) {
int maskBits = 0;
for(j=0;j<N;j++) {
if(i%n[j]==0)
maskBits |= (1 << j);
}
if (maskBits == 0)
continue;
k=i;
while (k>0) {p[k%10]++; k/=10;}
for(j=1;j<10;j++) if(p[j]>0) {k=j; p[j]--; break;}
for(j=0;j<10;j++) while (p[j]>0) {k=10*k+j; p[j]--;}
mask[k] |= maskBits;
}
This ran about 5-10% faster than the previous version.
for
loop
i = MAX+1;
while(--i) {
This should be converted to a for
loop, it will become more readable.
Smallest scope possible
Maybe in this challenge C99 is not an option, but I would like to reccomend declaring variables in loop declarations, like:
for(int i=1;i<MAX;i++)
Avoiding repetition
Please, avoid repetion:
p[0]=0;
p[1]=0;
p[2]=0;
p[3]=0;
p[4]=0;
p[5]=0;
p[6]=0;
p[7]=0;
p[8]=0;
p[9]=0;
Should be:
for (int i = 0; i <= 9; i++) {
p[i] = 0;
}
and it becomes so much clearer (think what would have happened if you had to set 100 of such items).