Kattis Challenge – Animal Classification

Posted on

Problem

I’m new to competitive programming and C++ in general. I have solved the Kattis Animal Classification challenge. We are given two classification trees, each represented using nested parentheses. For example, ((3,(1,(5,2))),4) and (((5,1),(2,3)),4) represent the trees

   ╱╲                   ╱╲
  ╱╲ 4                 ╱╲ 4
 ╱ ╱╲                 ╱  ╲
3 1 ╱╲               ╱╲  ╱╲
   5  2             5 1  2 3

The challenge is to report the fact that there are 7 subgroups in common between the two trees (namely {1}, {2}, {3}, {4}, {5}, {1,2,3,5}, and {1,2,3,4,5}).

My approach is to parse the input string [ e.g. ((3,(1,(5,2))),4) ] into a list [ ((3(1(52)))4) ]. After that I note all elements, split it into the two subtrees and add both back to a queue, starting over again, until there is nothing to split anymore. Afterwards i compare the two sets i obtained this way. If i am not mistaken this should lead to a complexity of O(N log(N)), since for every subtree, i need to collect all elements that belong to it, which would amount to N log(N) in a complete binary tree. This works for the provided test sets, but exceeds the time limit on the submission.

Could you please help me figure out where I can improve my code and what I did inefficiently?

#include <iostream>
#include <list>
#include <algorithm>
#include <string>
#include <unordered_set>
#include <deque>

using namespace std; 


struct hash_X{
  size_t operator()(const unordered_set<int> &x) const{
    int sum = 0;
    for(int i: x){sum = sum+i;}
    return sum;
  }
};

unordered_set<unordered_set<int>, hash_X> mt(list<string> L){
    unordered_set<unordered_set<int>, hash_X> S;
    list<string> K;
    deque<list<string>> Q;
    Q.push_back(L);
    while(not Q.empty()){
    K = Q.back();
    unordered_set<int> tmp;
    for(string s: K){if( s!="(" and s!=")" ){int i = stoi(s); tmp.insert(i);}}
    S.insert(tmp);
    //K is of the form for example (x(yz)), so first we unwrap ~ x(yz) and split into the two subtrees x and (yz)
    //and add both to the queue. If only one node is left ( x ) we dont add it back to the queue
    if(K.size() >= 2){
        K.pop_front();
        K.pop_back();
        list<string>::iterator it = K.begin();
        if(*it == "("){
            int i = 1;
            while(i != 0){
                it++;
                if(*it == ")"){i = i-1;}
                else if(*it == "("){i=i+1;}
                }
            }
        it++;
        list<string> B;
        list<string>::iterator itB = B.begin();
        B.splice(itB, K, it, K.end());
        Q.push_front(K);
        Q.push_front(B);
    }
    Q.pop_back();
    }
    return S;
}


int main(){
    int n;
    string x, y;
    char c;
    list<string> A, B;
    cin >> n;
    cin.ignore();
    cin >> x; 
    cin >> y;
    // parse "((3,(1,(5,2))),4)" into ((3(1(52)))4)
    for(int i = 0; i < x.size(); i++){
    string c = "";
    c += x[i];
    if( c == "(" or c == ")" ){A.push_back(c);}
    if( isdigit(x[i]) ){
        string tmp = "";
        while( isdigit(x[i]) ){
            tmp += x[i];
            i++;
        }
        i--;
        A.push_back(tmp);
    }
    }
    for(int i = 0; i <= y.size(); i++){
    string c = "";
    c += y[i];
    if( c == "(" or c == ")" ){B.push_back(c);}
    if( isdigit(y[i]) ){
        string tmp = "";
        while( isdigit(y[i]) ){
            tmp += y[i];
            i++;
        }
        i--;
        B.push_back(tmp);
    }
    }

    // make the trees
    unordered_set<unordered_set<int>, hash_X> t1 = mt(A);
    unordered_set<unordered_set<int>, hash_X> t2 = mt(B);
    unordered_set<unordered_set<int>, hash_X> i(t1.begin(), t1.end());
    int res = count_if(t2.begin(), t2.end(), [&](unordered_set<int> k) {return i.find(k) != i.end();});
    cout << res;
}

Solution

This isn’t really what you’re looking for, just some general feedback on your code. I find the code quite difficult to follow, resolving some of the issues below to help with readability might make it more likely that you will get feedback that aimed at your algorithm.

Naming

Your names seem overly concise. A few extra letters won’t make any difference to the speed of your application, but make it a lot more readable. Variables in main are n,x,y,c,A,B,i,c,tmp,t1,t2,res. Without looking at the context, these names are meaningless.

Your mt method is similarly nondescript. Perhaps make_tree might be more appropriate?

Consistent expecatations

In main, you reuse the variable name i for both an int and an unordered_set<unordered_set<int>, hash_X>. Switching variable types like this makes your code harder to follow than necessary and illustrates that you could benefit from more meaningful names.

Indentation

Your indentation is all over the place. This makes it difficult to read. Sometimes you have indentations next to each nesting level, sometimes you don’t. If it looks like this in your development environment, you should fix it. If it doesn’t, then you need to check the preview when adding code to questions. Usually copying the whole thing from your IDE, selecting it and clicking the {} icon will format the code correctly.

Consider this:

if(*it == "("){
    int i = 1;
    while(i != 0){
        it++;
        if(*it == ")"){i = i-1;}
        else if(*it == "("){i=i+1;}
        }
    }

At first glance, the final closing brace belongs to the while loop, not the if statement. This is unnecessarily confusing.

Don’t be afraid of using an extra line

Consider the following line or two:

for(string s: K){if( s!="(" and s!=")" ){int i = stoi(s); tmp.insert(i);}}

There’s a lot going on there for a single line. It’s a lot easier to read spread across multiple lines:

for(string s: K) {
    if( s!="(" and s!=")" ) {
        int i = stoi(s); 
        tmp.insert(i);
    }
}

Everything forsvarir said plus:

  1. using namespace std; By using this statement you (and lots of others) remove the purpose of namespaces. if the full name of a method/class is too long and ugly then use a typedef, this will also provide a buffer against the (unlikely) possibility of a class you use changing name.

  2. In main there is a lovely bit of code:

    char c;
    ...
    if (c == "(" || c == ")")

You can’t do that!! Oh hang on you can, because in the middle there is a redefinition of c to a std::string. Although what you have done will compile it does add an element of confusion that is undesirable. If you variable had more meaningful names then this might not have occurred.

  1. There are 4 lines of comments in your listing. To you this is all very simple, you have been working on it for a while and know it all backwards. The fact that you have written so much in your post explaining what problem the code solves and how it works indicates that you realise the code is not trivial. Imaging the situation where you are looking at the code in 6 months time, do you think you will know what it does and how? At the very least copy and paste the top of this post into the file.

I understand you want comments on you algorithm, but unfortunately you code is quite difficult to read and so those comments won’t be forthcoming.

Update

by changing

unordered_set<unordered_set<int>, hash_X> t1 = mt(A);
unordered_set<unordered_set<int>, hash_X> t2 = mt(B);
unordered_set<unordered_set<int>, hash_X> i(t1.begin(), t1.end());
int res = count_if(t2.begin(), t2.end(), [&](unordered_set<int> k) {return i.find(k) != i.end();});
cout << res;

to

unordered_set<unordered_set<int>, hash_X> setsA = mt(A);
unordered_set<unordered_set<int>, hash_X> setsB = mt(B);
int res = count_if(setsB.begin(), setsB.end(), [&](unordered_set<int> subsetofB) {return setsA.find(subsetofB) != setsA.end();});
cout << res;

the code gets faster, because it does not need to create the set “i” anymore. This works on more test cases now, but is still too slow.

Update 2

Consider the tree:

/
 /
  /
  ...

the parsing and splitting part of my algorithm would parse the complete right subtree everytime, which leads to “O(N**2)” complexity. So there seems to be the problem!

Update 3

Using a stack one can parse the treestructure from the input string in O(N)( as mentioned here : https://stackoverflow.com/questions/7814209/parsing-text-to-make-a-tree-data-structure ). I think this will work afterwards.
For reference this is my current code, and the problem resides in the else block of the function “get_sets” ( i hope )

#include <iostream>
#include <list>
#include <algorithm>
#include <string>
#include <unordered_set>
#include <deque>

using namespace std; 


struct hashX{
  size_t operator()(const unordered_set<int> &x) const{
    int sum = 0;
    for(int element: x){sum = sum+element;}
    return sum*sum%10007;
  }
};

unordered_set<int> get_sets(list<string> &L, unordered_set<unordered_set<int>, hashX> &subsets){
    if(L.size() == 1){
        unordered_set<int> subset;
        int i = stoi(*L.begin());
        subset.insert(i);
            subsets.insert(subset);
        return subset;
    }
        else{
        L.pop_front();
        L.pop_back();
        list<string>::iterator it = L.begin();
        if(*it == "("){
            int i = 1;
            while(i != 0){
                it++;
                if(*it == ")"){i = i-1;}
                else if(*it == "("){i=i+1;}
            }
        }
        it++;
        list<string> right;
        list<string>::iterator it_right = right.begin();
        right.splice(it_right, L, it, L.end());
        unordered_set<int> X = get_sets(right, subsets);
        unordered_set<int> Y = get_sets(L, subsets);
        for(int i: X){
            Y.insert(i);
        }
        subsets.insert(Y);
        return Y;
    }
}



int main(){
    int n;
    cin >> n;
    //input stuff
    string inputA, inputB;
    cin >> inputA; 
    cin >> inputB;
    // parse "((3,(1,(5,2))),4)" into ((3(1(52)))4)
    list<string> A, B;
    for(int i = 0; i < inputA.size(); i++){
        string c = "";
        c += inputA[i];
        if( c == "(" or c == ")" ){A.push_back(c);}
        else if( isdigit(inputA[i]) ){
            string tmp = "";
            while( isdigit(inputA[i]) ){
                tmp += inputA[i];
                i++;
            }
            i--;
            A.push_back(tmp);
        }
    }
    for(int i = 0; i < inputB.size(); i++){
        string c = "";
        c += inputB[i];
        if( c == "(" or c == ")" ){B.push_back(c);}
        else if( isdigit(inputB[i]) ){
            string tmp = "";
            while( isdigit(inputB[i]) ){
                tmp += inputB[i];
                i++;
            }
            i--;
            B.push_back(tmp);
        }
    }

    //get the subsets.
    unordered_set<unordered_set<int>, hashX> setsA;
    unordered_set<unordered_set<int>, hashX> setsB;
    get_sets(A, setsA);
    get_sets(B, setsB);
    int result = count_if(setsB.begin(), setsB.end(), [&](unordered_set<int> subsetofB) {return setsA.find(subsetofB) != setsA.end();});
    cout << result;
}

Update 4:

Changed the parsing method to get the tree in one pass over the input string, and i like it very much. After that i recursively traverse down the tree and write down the sets of the subtrees. For some reason this is still to slow. I tried to comment enough and choose descriptive names at key places. If someone is still interested here is the code:

#include <iostream>
#include <list>
#include <algorithm>
#include <string>
#include <unordered_set>

using namespace std; 


struct hashX{
  size_t operator()(const unordered_set<int> &x) const{
    int sum = 0;
    for(int element: x){sum = sum+element;}
    return hash<int>()(sum);
  }
};

class node{
    public:
        struct node *left,*right;
        int data;
        bool isleaf;
};

void make_tree(string s, struct node **root){
    //the stack will hold the node hierarchy, the last element will be the current parent node.
    list<node*> stack;
    struct node *newnode,*current,*previous, *previousprevious;
    int i = 0;
    while(i < s.size()){
        // if we see ( we create a new node and push it back as the current parent onto the stack
        if(s[i] == '('){
            newnode = new node();
            newnode->data = -1;
            newnode->left = NULL;
            newnode->right= NULL;
            newnode->isleaf= false;
            stack.push_back(newnode);
        }
        // if we see a digit we parse it, create a leaf and push it onto the stack
        else if(isdigit(s[i])){
            //parse the whole number
            string tmp = "";
            while( isdigit(s[i]) ){
                tmp += s[i];
                i++;
                }
            int int_tmp = stoi(tmp);
            i--;
            //create a leaf node and push it onto the stack
            newnode = new node();
            newnode->data = int_tmp;
            newnode->left = NULL;
            newnode->right= NULL;
            newnode->isleaf= true;
            stack.push_back(newnode);
        }
        // if we see ) the current node is finished and the right child of the previous previous node, the previous node is the left child .. see (a, (b, c))  root = ( ..... ) .. left child = a, right child = (b, c)
        else if(s[i] == ')'){
            current = new node();
            current = stack.back();
            stack.pop_back();
            previous = new node();
            previous = stack.back();
            stack.pop_back();
            previousprevious = new node();
            previousprevious = stack.back();
            stack.pop_back();
            previousprevious->right = previous;
            previousprevious->left = current;
            stack.push_back(previousprevious);
        }
        i++;
    }
    *root = stack.front();
}

unordered_set<int> get_sets(node *T, unordered_set<unordered_set<int>, hashX> &subsets){
    //if we encounter a leaf node we add the singleton set of its data to the subsets
    if(T->isleaf){
            unordered_set<int> subset;
        subset.insert(T->data);
            subsets.insert(subset);
        return subset;
    }
    //else we join the subsets of its children, and add it to the subsets
        else{
        node *left = T->left;
        node *right = T->right;
        unordered_set<int> x = get_sets(left, subsets);
        unordered_set<int> y = get_sets(right, subsets);
        for(int i: x){
            y.insert(i);
        }
        subsets.insert(y);
        return y;
    }
}

int main(){
    int n;
    cin >> n;
    //input stuff
    string inputA, inputB;
    cin >> inputA; 
    cin >> inputB;
    //construct the trees
    struct node *TreeA, *TreeB;
    make_tree(inputA, &TreeA);
    make_tree(inputB, &TreeB);
    //get the subsets
    unordered_set<unordered_set<int>, hashX> setsA;
    unordered_set<unordered_set<int>, hashX> setsB;
    get_sets(TreeA, setsA);
    get_sets(TreeB, setsB);
    //compute the size of the intersection
    int result = count_if(setsB.begin(), setsB.end(), [&](unordered_set<int> subsetofB) {return setsA.find(subsetofB) != setsA.end();});
    cout << result;
}

Leave a Reply

Your email address will not be published. Required fields are marked *