Problem
This is extension of the rep-string task from Rosette code. I do not only want to check if the input string is the shortest periodic string (rep-string) or not, but also check if it is preperiodic.
Periodic means, that a part must occur two or more times.
Input is a binary string (only 1 and 0 chars). It can be for example binary expansion of the fraction
Output is a formatted binary string: preperiod(period)
10101010 = (10) // periodic
10010101010 = 100(10) // preperiodic
11111111110 // aperiodic
Below is my C code. I use simple / naive string-search algorithm, and the code works well.
#include <stdio.h>
#include <string.h>
// array of strings for testing
char *strs[] = {
// aperiodic
"11111111110", // aperiodic
"011111111110",
"11000100000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001", // https://en.wikipedia.org/wiki/Liouville_number
"110001000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010", // https://en.wikipedia.org/wiki/Liouville_number + 0
"101", // aperiodic because only one and a half repetition, possibly periodic if longer string
// periodic
"0100100", // (010) preperiod = 0 period = 3, truncated
"00100100100100100100100100100100100100100100100100100100100100100100100100100", // (001) preperiod = 0 period = 3 truncated
"1001110011", // = (10011) preperiod = 0 period = 5
"1110111011", // = (1110) preperiod = 0 period = 4
"0010010010", /* 4 x 001 and truncated, last char can be from 001*/
"001001001", /* 4 x 001 */
"1111111111", // (1)
"11", // (1) periodic
"00", // (0) periodic
// preperiodic
"0100101101",
"00100100101", /* 4 x 001 but last 2 chars are NOT from 001 */
"100100100101", /* */
"01111111111", // 0(1) preperiodic
"001010101010101", // preperiodic = 0(01) = 1/6
"0100101010101010" // preperiodic = 010(01) = 7/24
};
//
// strstr : Returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.
// size_t is an unsigned integer typ
// looks for the shortest substring !!!!!
// GivePeriod = repstr_shortest
// https://rosettacode.org/wiki/Rep-string#C
int GivePeriod(char *str)
{
if (!str) return 0; // if empty input
size_t sl = 1;
size_t sl_max = strlen(str)/2+1 ; // more then one repetition of periodic parts
while (sl < sl_max) {
if (strstr(str, str + sl) == str) // How it works ???? It checks the whole string str
return sl;
++sl;
}
return 0;
}
int FormatString(char *str){
int p; // offset or preperiod ( if period is found)
int pMax = strlen(str) ; // length without null character
int period = 0;
int preperiod = 0;
char *substr;
for (p=0; p < pMax; ++p ){
substr = str+p;
period = GivePeriod( substr);
if (period > 0 ) {
preperiod = p;
//printf("substring =t%*st from position =%d has preperiod = %dtperiod = %dn", pMax, substr, p, preperiod, period ); // pring part of the string from p position to the end ( without last null character)
printf("%s = %.*s(%.*s) preperiod = %dtperiod = %dn", str, p, str, period, str+p, preperiod, period );
return period;
}
// else printf("substring =t%*st from position = %dtn", pMax, substr, p); // pring part of the string from p position to the end ( without last null character)
}
printf("%s is aperiodicn", str);
return 0;
}
int main(){
int iMax = sizeof(strs) / sizeof(strs[0]); // number of test values
int i; // test number = index of the array
for (i = 0; i < iMax; ++i) // check all test values
FormatString(strs[i]);
return 0;
}
To compile :
gcc p.c -Wall
to run:
./a.out
My output is:
11111111110 is aperiodic
011111111110 is aperiodic
11000100000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 is aperiodic
110001000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010 is aperiodic
101 is aperiodic
0100100 = (010) preperiod = 0 period = 3
00100100100100100100100100100100100100100100100100100100100100100100100100100 = (001) preperiod = 0 period = 3
1001110011 = (10011) preperiod = 0 period = 5
1110111011 = (1110) preperiod = 0 period = 4
0010010010 = (001) preperiod = 0 period = 3
001001001 = (001) preperiod = 0 period = 3
1111111111 = (1) preperiod = 0 period = 1
11 = (1) preperiod = 0 period = 1
00 = (0) preperiod = 0 period = 1
0100101101 = 0100(101) preperiod = 4 period = 3
00100100101 = 0010010(01) preperiod = 7 period = 2
100100100101 = 10010010(01) preperiod = 8 period = 2
01111111111 = 0(1) preperiod = 1 period = 1
001010101010101 = 0(01) preperiod = 1 period = 2
0100101010101010 = 010(01) preperiod = 3 period = 2
Questions:
- Can I do it better ?
- Can you give examples of strings for which my program fail ?
- Does this code follow common best practices?
Edit 1
notation:
- period = length of periodic ( repeating) part
- preperiod = length of part before periodic part
- periodic string : preperiod = 0, period > 0
- preperiodic string : preperiod > 0 and period > 0
- aperiodic string : perperiod = 0 and period = 0
Note the input string is finite ( = finite part of infinite string) so maybe in a longer input string period or preperiod will be > 0
Solution
Looking at your code for style
#include <stdio.h>
#include <string.h>
// array of strings for testing
These strings should be char const *
unless you really want them to be altered
char *strs[] = {
You have the ‘aperiodic comment twice here, which confuses me somewhat. If this is a section of aperiodic strings, why do you need the 2nd comment at all?
Also I suspect the 1st comment should be a block comment (especially in view of some of the other stuff)
// aperiodic
"11111111110", // aperiodic
"011111111110",
With strings this long I’d put the descriptive comment by itself on the line before
"11000100000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001", // https://en.wikipedia.org/wiki/Liouville_number
"110001000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010", // https://en.wikipedia.org/wiki/Liouville_number + 0
"101", // aperiodic because only one and a half repetition, possibly periodic if longer string
Again, I think this comment should be a block comment
// periodic
"0100100", // (010) preperiod = 0 period = 3, truncated
"00100100100100100100100100100100100100100100100100100100100100100100100100100", // (001) preperiod = 0 period = 3 truncated
"1001110011", // = (10011) preperiod = 0 period = 5
"1110111011", // = (1110) preperiod = 0 period = 4
"0010010010", /* 4 x 001 and truncated, last char can be from 001*/
"001001001", /* 4 x 001 */
"1111111111", // (1)
"11", // (1) periodic
"00", // (0) periodic
probably a block comment again. Does preperiodic mean periodic after a prefix, as the word doesn’t appear in the referenced web page (in fact, neither do periodic or aperiodic, but I know what both of those mean, but I’ve never come across preperiodic)
// preperiodic
"0100101101",
"00100100101", /* 4 x 001 but last 2 chars are NOT from 001 */
Blank comments are generally unhelpful
"100100100101", /* */
You’ve already said preperiodic so why repeat?
"01111111111", // 0(1) preperiodic
"001010101010101", // preperiodic = 0(01) = 1/6
"0100101010101010" // preperiodic = 010(01) = 7/24
};
//
The c standard defines what strstr does, so this doesn’t convey any helpful information
// strstr : Returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.
Ditto for size_t
// size_t is an unsigned integer typ
If it looks for the shortest substring, why not call it that, rather than GivePeriod? Or is the comment wrong?
// looks for the shortest substring !!!!!
// GivePeriod = repstr_shortest
// https://rosettacode.org/wiki/Rep-string#C
- I think this is getting the repeat period, not giving it
- you should pass
char const *
here. You don’t alter the string in this and you presumably don’t intend do.
int GivePeriod(char *str)
{
This is not ’empty input’. This protects against passing a null pointer. It’s not usually worth protecting against that.
if (!str) return 0; // if empty input
size_t sl = 1;
I can’t relate the comment to the calculation at all.
size_t sl_max = strlen(str)/2+1 ; // more then one repetition of periodic parts
You could make this a for loop
while (sl < sl_max) {
That doesn’t explain how it works at all
if (strstr(str, str + sl) == str) // How it works ???? It checks the whole string str
Never omit the braces after if/while/do/for. Read up on ‘goto fail‘. I honestly would be a rich person if I had a penny for every time someone added a line into an if statement during a maintenance and didn’t add the braces
return sl;
++sl;
}
return 0;
}
Again, const correctness. You don’t (and shouldn’t) alter the string. You should pass pointers as pointer to const by default, and only change if you actually need to modify what is pointed to
int FormatString(char *str){
Given you have period and preperiod defined shortly after, this comment is confusing. Looking at the code, it isn’t either anyway, it’s the current offset in the string.
int p; // offset or preperiod ( if period is found)
That’s what strlen does. Not a helpful comment
int pMax = strlen(str) ; // length without null character
You only use period, preperiod and substr inside the loop, so declare them inside the loop. It helps the reader understand the intended scope
int period = 0;
int preperiod = 0;
char *substr;
for (p=0; p < pMax; ++p ){
substr = str+p;
period = GivePeriod( substr);
if (period > 0 ) {
preperiod = p;
- Don’t release commented out code to production
- you don’t actually use preperiod apart from this output. There’s not much point in having it as it has to be the same as p.
- in general I’d avoid having output to stdout/stderr inside a library function. It should be up to the client to determine what to do with the results of your function – including how and where to output them. Among other things, it is difficult to write a unit test that checks you output what you think you did.
//printf("substring =t%*st from position =%d has preperiod = %dtperiod = %dn", pMax, substr, p, preperiod, period ); // pring part of the string from p position to the end ( without last null character)
printf("%s = %.*s(%.*s) preperiod = %dtperiod = %dn", str, p, str, period, str+p, preperiod, period );
return period;
}
// else printf("substring =t%*st from position = %dtn", pMax, substr, p); // pring part of the string from p position to the end ( without last null character)
}
printf("%s is aperiodicn", str);
return 0;
}
int main(){
int iMax = sizeof(strs) / sizeof(strs[0]); // number of test values
int i; // test number = index of the array
Again do not omit the braces
for (i = 0; i < iMax; ++i) // check all test values
FormatString(strs[i]);
return 0;
}