Problem
I’ve solved UVA Problem 100: The 3n + 1 problem . In short, the task was to write a program that calculates a maximum of Collatz sequence lengths for any given range of starting values.
My program:
#ifdef ONLINE_JUDGE
#define NODEBUG
#endif
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <iostream>
#include <string>
#include <sstream>
#include <cmath>
#include <cassert>
#include <cstdint>
#include <cstdlib>
using namespace std;
using seqlen = unsigned short;
using seqval = unsigned long;
constexpr seqval inputMax = 999999;
constexpr seqval inputMin = 1;
seqlen getLength(seqval n)
{
assert(n%2 == 0 || n <= (UINT32_MAX-1)/3);
static unordered_map<seqval,seqlen> cache {{1,1}};
if(cache.count(n)) return cache[n];
return cache[n] = getLength(n%2==0 ? n/2 : 3*n+1) + 1;
}
seqlen getRangeMax(seqval i, seqval j)
{
assert(i <= j);
assert(i >= inputMin && j <= inputMax+1);
using cache_t = vector<unordered_map<seqval,seqlen>>;
static cache_t cache;
if(i == j) return 0;
auto exponent = static_cast<cache_t::size_type>(floor(log2(j-i)));
auto interval = static_cast<seqval>(pow(2,exponent));
seqval lb, ub;
while((lb = (ub = interval * (j/interval)) - interval) < i)
interval = static_cast<seqval>(pow(2,--exponent));
if(exponent == 0) return getLength(lb);
cache.resize(max(cache.size(), exponent));
return max(
max(getRangeMax(i, lb), getRangeMax(ub, j)),
cache[exponent-1].count(lb) ?
cache[exponent-1][lb] :
cache[exponent-1][lb] =
max(getRangeMax(lb, lb+interval/2), getRangeMax(lb+interval/2, ub)));
}
struct testCase
{
seqval i=0, j=0;
friend istream &operator >> (istream &is, testCase &tc)
{
if((is>>ws).eof()) {return is;}
string inpstr; getline(is, inpstr); if(!is.good()) return is;
stringstream inp(inpstr+'n'); inp >> tc.i >> tc.j;
if(!inp.good()) {is.setstate(inp.rdstate() |
((inp.rdstate()&ios_base::eofbit) != 0 ?
ios_base::failbit : static_cast<ios_base::iostate>(0))
); return is;}
if(tc.i < inputMin || tc.j < inputMin || tc.i > inputMax || tc.j > inputMax)
{is.setstate(ios_base::failbit); return is;}
if(!(inp>>ws).eof()) {is.setstate(ios_base::failbit); return is;}
return is;
}
bool noInput() {return i==0 || j==0;}
};
int main()
{
#ifdef NODEBUG
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
#endif
cin.exceptions(ios_base::failbit | ios_base::badbit);
cout.exceptions(ios_base::eofbit | ios_base::failbit | ios_base::badbit);
testCase tc; while(!(cin >> tc).eof())
cout << tc.i << ' ' << tc.j << ' ' <<
getRangeMax(min(tc.i,tc.j), max(tc.i,tc.j)+1) << 'n';
if(tc.noInput()) cout << 'n';
return EXIT_SUCCESS;
}
Notes:
- People tell me my code tends to be ugly and unreadable; I tried to make this one reasonably readable, and I’d especially like to hear comments on this matter.
- My goals I tried to meet as a self-imposed challenge:
- I tried to achieve sound time complexity even despite the judge’s lax requirements do not force me to do so.
- I also tried to write error safe code, in particular I wanted to make the program fail on input that is invalid as per the problem specification.
- Problem #1: This means I needed to distinguish the cases of lack of further input and the presence of an invalid input line. To meet this I broke conventions set up by the Standard Library by making
operator>>(istream&,testCase&)
only seteofbit
on lack of input but notfailbit
. I guess this is bad, but as of now I have no better idea. - Problem #2: I still cannot prevent UB in case of a stack overflow and I don’t think it is even possible in general. Maybe I should’ve refrained from using recursion?
- Problem #1: This means I needed to distinguish the cases of lack of further input and the presence of an invalid input line. To meet this I broke conventions set up by the Standard Library by making
- I tried to refrain from making use of any unnecessary implementation-specific guarantees, and instead make the program have to produce correct results as per the C++ Standard. As of now I can’t see where could I fail in here, but I’ve already learned that there are unobvious issues on this matter.
- As of now, the problem specification gives incorrect input boundaries, claiming that all numbers will be less than 10,000. The correct bound seems to be 1,000,000 instead, which is present in the archived version of the problem specification.
Solution
People tell me my code tends to be ugly and unreadable; I tried to make this one reasonably readable, and I’d especially like to hear comments on this matter.
I’ll do my best to keep this kind. People are right, and I’d like to explain why. In the comments, tinstaafl wrote:
All those inline statements make it much harder to see what your code is doing.
To which you reply:
But I was only trying to make the code more concise.
You didn’t. Concise code is not code that has as few lines as possible. It is code which expresses exactly what it needs to without any extraneous statements. You haven’t reduced the number of statements, just the number of lines. The number of lines of code is relatively unimportant, so long as the code is organized well, easy to understand, and performs well.
For example, you have this:
return max(
max(getRangeMax(i, lb), getRangeMax(ub, j)),
cache[exponent-1].count(lb) ?
cache[exponent-1][lb] :
cache[exponent-1][lb] =
max(getRangeMax(lb, lb+interval/2), getRangeMax(lb+interval/2, ub)));
This is not concise. And it’s worse than not being concise. It has a very unexpected assignment in the middle of it which will cause it to be misunderstood by nearly every person besides you who tries to read it. It’s a bunch of conditionals, assignments, calculations, and branches all crammed into a function call which is also a return statement. The compiler will generate the same results if you write it as:
seqlen result;
seqlen max1 = max(getRangeMax(i, lb), getRangeMax(ub, j));
if (cache[exponent - 1].count[lb] != 0)
{
result = max(max1, cache [ exponent - 1 ][ lb ];
}
else
{
seqlen max2 = max(getRangeMax(lb, lb+interval/2), getRangeMax(lb+interval/2, ub));
result = max(max1, max2);
cache [ exponent - 1 ][ lb ] = max2;
}
return result;
Now other people can also read it at a glance and understand what it does without having to first untangle a bunch of other stuff. (Although you should use more descriptive variable names than I’ve used. I don’t know this particular mathematical construct well enough to name things properly.)
Debugging
Your stream operator is particularly egregious. If there’s an error in it, how will you debug it? Try stepping through it in a debugger. When you get to, say, this line:
string inpstr; getline(is, inpstr); if(!is.good()) return is;
and find an error in it, how do you know which of those 3 statements contains the error? You can’t step over one to the other because it’s all one line. That doesn’t help anyone.
Are you sure this statement is correct?
if(!inp.good()) {is.setstate(inp.rdstate() |
((inp.rdstate()&ios_base::eofbit) != 0 ?
ios_base::failbit : static_cast<ios_base::iostate>(0))
); return is;}
Is the operator precedence correct? Does logical or come before not equal? I don’t remember and if I have to edit that line in the future, I’m likely to screw it up because it’s all crammed together and uses minimal parentheses.
Also, you seem concerned about having too many lines, but I count 5… no wait… 6 return is;
statements. The flow control of that function is terrible.
One way you could improve it is by having an error variable that you check for whether you should continue or not. For example, this would be one common way to write it:
friend istream &operator >> (istream &is, testCase &tc)
{
bool error = false;
if((is>>ws).eof()) {
error = true;
}
if (!error) {
string inpstr;
getline(is, inpstr);
if (!is.good()) {
error = true;
}
}
if (!error) {
stringstream inp(inpstr+'n');
inp >> tc.i >> tc.j;
if(!inp.good()) {
is.setstate(inp.rdstate() |
((inp.rdstate()&ios_base::eofbit) != 0 ?
ios_base::failbit : static_cast<ios_base::iostate>(0)));
error = true;
}
}
if (!error) {
if(tc.i < inputMin || tc.j < inputMin || tc.i > inputMax || tc.j > inputMax) {
is.setstate(ios_base::failbit);
error = true;
}
}
if (!error) {
if (!(inp>>ws).eof()) {
is.setstate(ios_base::failbit);
error = true;
}
}
return is;
}
The reason this is better than returning on error is because it’s easy to see at a glance the flow of control. If you needed to allocate a resource at the beginning of the function and release it on return, you now only have a single place you need to do it. (Granted C++ has some good tools for making that much easier, but it can still come up in real-world use.) With your version, I can’t tell quickly if there are non-error returns or if all returns but the last are for error cases.
If you want to write concise code learn how to express your ideas with fewer statements rather than fewer lines. And also remember that there’s a difference between concise and densely packed. It can be easy to write a very short statement or set of statements which do many things, but if it’s too subtle, people who have to maintain it (which is you in a few months time) won’t be able to see that subtlety when time is critical.
People tell me my code tends to be ugly and unreadable; I tried to make this one reasonably readable, and I’d especially like to hear comments on this matter.
I’ll have to agree with the other “people” on this one: Even though you said that you tried to make it more readable, it is still a huge mess of unreadable code. It seems that you think that the less line your code has, the better. That’s not the case though. It’s true that less lines of code is better, but that doesn’t mean you need to cram multiple statements on one line just to reduces the total count of lines:
if((is>>ws).eof()) {return is;}
string inpstr; getline(is, inpstr); if(!is.good()) return is;
stringstream inp(inpstr+'n'); inp >> tc.i >> tc.j;
if(!inp.good()) {is.setstate(inp.rdstate() |
((inp.rdstate()&ios_base::eofbit) != 0 ?
ios_base::failbit : static_cast<ios_base::iostate>(0))
); return is;}
if(tc.i < inputMin || tc.j < inputMin || tc.i > inputMax || tc.j > inputMax)
{is.setstate(ios_base::failbit); return is;}
if(!(inp>>ws).eof()) {is.setstate(ios_base::failbit); return is;}
return is;
That part is especially unreadable, don’t you think? Well, let’s see if we can make your code readable 🙂
Don’t use using namespace std;
It is bad practice, so please refrain from using it.
You don’t need to return anything on success from main
You don’t need to return EXIT_SUCCESS;
, as the compiler will implicitly use it if you don’t specify it. But that’s pretty opinion based.
Use better naming
You have some i
and j
outside of loops. Those are very bad variable names, they don’t tell what i
and j
do. I also find that getLength
isn’t an ideal name considering that getting the length of an integer doesn’t make any sense.
Use proper indentation and don’t stick lines together
As said before, you seem to like putting multiple statements in one line:
string inpstr; getline(is, inpstr); if(!is.good()) return is;
Those lines are really unreadable and hard to debug. Same thing goes for lines without indentation and/or whitespace:
if(!(inp>>ws).eof()) {is.setstate(ios_base::failbit); return is;}
Please use your Enter and Space keys 🙂
std::pow
has poor performance
std::pow
is (very) slow to calculate powers, that’s because it can also calculate powers of floating point numbers, and thus can’t use the traditional approach. It would be better (and faster!) if you made your own exponentiation function:
constexpr auto pow(unsigned long long base, unsigned long long exp) noexcept {
auto result = 1ull;
for (auto i = 0ull; i < exp; ++i)
result *= base;
return result;
}
Why use macros?
I don’t see any reason to use macros, especially since you cannot debug your code easily. Also, for debugging purposes, it is not really important that stdio
is synced, as you’re not even using stdio
. Same for tying std::cin
and std::cout
.
testCase
is weird and complicated
Your testCase
struct is very complicated and long and weird. I don’t know why you don’t use:
seqval num1 = 0;
seqval num2 = 0;
while(std::cin >> num1 >> num2)
std::cout << num1 << ' ' << num2 << ' ' << getRangeMax(std::min(num1, num2), std::max(num1, num2)+1) << 'n';
This will always get input from stdin
until EOF is reached. Because online judges will pass input from a file (I can’t imagine them doing something else – maybe they are, but that’s the easy solution), it will behave as expected.
If you want to stop the input and see the output, just input anything that is not a number (except for a space or a newline of course).
If you tie std::cout
and std::cin
again, you’ll see the output immediately, instead of at the end.
Your algorithm is unclear to me
The algorithm that you are using is very unclear from just looking at the code IMO, so I can’t give you any tips on this one, just give you how I would solve the problem. This is the best examples where comments would really help out to understand the algorithm you are using.
The combination of striving to minimize code lines, bad naming, no comments is the reason why it is hard for me to understand it.
Here’s how I would do it:
seqlen calculate_collatz(seqval number) noexcept {
seqlen length = 1;
while (number != 1) {
if (number % 2 == 0)
number /= 2;
else
number = 3 * number + 1;
++length;
}
return length;
}
seqlen max_collatz(seqval lower, seqval upper) noexcept {
seqlen largest_sequence = 0;
for (auto i = lower; i < upper + 1; ++i) {
auto curr_sequence = calculate_collatz(i);
if (curr_sequence > largest_sequence)
largest_sequence = curr_sequence;
}
return largest_sequence;
}
You might notice that I haven’t used any cache. That’s because even without a cache, my way is 0.6s faster than yours (0.2s vs. 0.8s)! Both were compiled using -O3
with gcc 6.3.1
on a i7-6700HQ.
With a cache, I would say that using a cache will make the algorithm even faster. Also, you might want to read this question, as it explains what you can do to optimize the algorithm even further. For example, only caching odd numbers.