Problem
need some help or advice, HackerRank’s frequency-queries challenge,
the challenge of this Hackerrank problem is to check if the frequencies of number in your data structure matches the queries.
small integer values, that occupy the first 2 bytes of the integer datatype are fast and passes the test, but for values close to 1.000.000.000 that the occupy the second half of the integer data type are much slower.
the code i made passes 14 of the 15 test.
the code runs in “0.270603 seconds” seconds on my pc for case 8, with 100.000 lines, while reading them from the text file (input08.txt), but it takes “28.1141 seconds” for test eleven (input11.txt) as also contains 100.000 lines but with values up to 1 billion or 1 to the power of 9 sized values on witch it fails.
it seamed i have build a really lean & fast code, made a custom main()
function, but …
the bottleneck seams to be …
does_exist_map(Map &map, int frequency)
i tried various ways to loop the map.
what can i do to make it faster?
typedef std::map<int, int> Map;
void add_to_map(Map &map, int value){
/*
std::cout << "add" << endl;
if (map.find(value) == map.end()) {
map[value] = 1;
}
else {
map[value]++;
}
*/
if (map.count(value) > 0)
{
map[value]++;
}
else
{
map[value] = 1;
}
}
void delete_or_update_from_map(Map &map, int value){
if(map[value] > 1){
map[value]--;
//std::cout << "found in map and updated: " << value << " value:" << map[value] << endl;
}
else{
//std::cout << "found in map and deleleted: " << value << endl;
map.erase(value);
}
/*
if (map.find(value) != map.end()) {
//cout<<"Key (" << value << ") found. Value is: "<<map[value]<<endl;
if(map[value] > 1){
map[value]--;
//std::cout << "found in map and updated: " << value << " value:" << map[value] << endl;
}
else{
//std::cout << "found in map and deleleted: " << value << endl;
map.erase(value);
}
}
else {
//std::cout << "not found in map: " << value << endl;
}
*/
/*
if (map.count(value) > 0)
{
std::cout << "'hat' Found" << std::endl;
}
else
{
std::cout << "'hat' Not Found" << std::endl;
}
*/
}
bool does_exist_map(Map &map, int frequency){
//std::cout << "frequency: " << frequency << endl;
/*
or (auto& it : map) {
//cout << it.first << ' ' << it.second << 'n';
if(it.second == frequency){
return true;
}
}
*/
bool itexist = false;
int sizeMap = map.size();
auto it = map.begin();
for(int i = 0; i < sizeMap; i++){
if(it->second == frequency)
{
itexist = true;
break;
//return true;
}
it++;
}
/*
bool itexist = false;
auto it = map.begin();
while(it != map.end())
{
if(it->second == frequency)
{
itexist = true;
break;
//return true;
}
it++;
}
//return false;
*/
return itexist;
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(0);
ofstream fout(getenv("OUTPUT_PATH"));
string q_temp;
getline(cin, q_temp);
int q = stoi(ltrim(rtrim(q_temp)));
Map frequencies;
for (int i = 0; i < q; i++) {
int input[2] = {0,0};
for (int j = 0; j < 2; j++) {
cin >> input[j];
}
//std::cout << "task: " << input[0] << " value: " << input[1] << endl;
if(input[0] == 1){
add_to_map(frequencies, input[1]);
}
else if(input[0]== 2){
delete_or_update_from_map(frequencies, input[1]);
}
else if(input[0] == 3){
if(does_exist_map(frequencies, input[1])){
fout << "1n";
}
else{
fout << "0n";
}
}
}
fout.close();
return 0;
}
Solution
Performance of std::map
Well, looking up in a map is a slow operation.
if(map[value] > 1){
map[value]--;
is looking up the same thing twice. Don’t do that with maps!
You could write:
auto& v= map[value];
if (v>1) --v;
Hmm, but you continue with:
else{
//std::cout << "found in map and deleleted: " << value << endl;
map.erase(value);
}
So you call the form of erase
that again looks it up from scratch, even though you already found it. Note that there is a form of erase
that takes an iterator instead.
But do you actually have to delete it from the map? Why not just leave it as zero? Would that be faster and simpler?
I also see you wrote:
if (map.count(value) > 0)
{
map[value]++;
}
else
{
map[value] = 1;
}
Same here: count
looks it up. Then you look it up all over again just to increment it, or look up where it should be inserted in order to add it.
Did you know that map[value]
will automatically add the thing if it wasn’t already there? And, it will initialize the newly created value the same as a static variable would be; that is, zero. So none of that is necessary. Just write ++map[value]
and be done with it.
for (auto& it : map) {
//cout << it.first << ' ' << it.second << 'n';
if(it.second == frequency){
return true;
}
}
OK, you are doing a reverse-lookup. That’s not what maps are good at, and you do a full linear search for the value, using the (slow) pointer-based data structure. Your three variations of this code doesn’t change this, and this first one is the best (other than your missing const
). Of course, you are just implementing std::find
that already exists in the library.
You need a second map that indexes the reverse operation. Be sure to keep them updated in sync. There is a Boost multi-index map that you can use, too.
Note that every time you update the frequency, you need to change the reverse index, and that can end up being expensive. If you do lots of adding before any lookups, then don’t create the reverse index until you are done adding. I did something like that recently, and switched from “input mode” to “readout mode” by resorting how I kept the data.
If you are only doing 1 lookup, then sorting/indexing it is more work than you need.
You can instead try speeding up the search my using a Boost flat_map
. This keeps everything in a sorted vector instead of a dynamic-memory tree, so traversing it in order will be much faster. The bottleneck is the memory access: trees of pointers hit cold (uncached) memory at every step, and it can’t prefetch because it doesn’t know where it’s going. The vector is a contiguous range of memory which means several consecutive entries will be in the same cache row, and if you’re scanning through it the CPU will catch on and start prefetching the next block! The speed difference on modern architectures is astounding. There is an essay on the opening doc page of Boost’s flat map that explains in detail.
In the similar task I mentioned earlier, I ended up just using a sorted std::vector
, so I had full control over sorting, re-sorting, and searching. Note that you don’t have to implement a lot, but can do it all using the supplied Algorithms library: Use std::lower_bound
to either find the element or the position where it should be inserted to maintain sort order.
If lookup is clearly done after all addition/removal is performed, and you will have a lot of lookups, then you can re-sort based on the other field. But since sorting is n log n and a single unsorted find
is order of n, if you have fewer than log n lookups (as a first approximation) it’s faster to just do a linear search. You might also try using a hash table like std::unordered_map
but these are also slow for the same reasons explained above, contrary to what is taught in CS classes (because k
is so extremely large — memory access is the bottleneck).
other tips
typedef std::map<int, int> Map;
Don’t use typedef
anymore. Use using
instead, which is more general and works with templates.
for (int j = 0; j < 2; j++) {
cin >> input[j];
}
This could be:
for (auto& x : input) cin>>x;
if(does_exist_map(frequencies, input[1])){
fout << "1n";
}
else{
fout << "0n";
}
You’re just printing the result of the function call as 0 or 1, but you’re duplicating entire statements.
const bool b = does_exist_map(frequencies, input[1]);
fout << b << 'n';
You can control how bool
values print via formatting flags. I think “no alpha” is the default though.