Optimizing priority queue streaming algorithm in C++

Posted on

Problem

This is a sorting algorithm that I wrote for finding the maximum nn elements at any time in a stream of IDs and integers with unknown length. I primarily do my coding in Python, so I’m not too familiar with C++ best practices, but I wanted to write this in C++ to gain some efficiency and speed.

For all of you seasoned C++ programmers; am I following best practices here or not? If not, what improvements can I make to make this more professional? Lastly, are there any optimizations that I can make to increase its speed?

#include <iostream>
#include <fstream>
#include <queue>
#include <vector>
#include <map>
#include <chrono>

// Overload comparison operator to create a min heap
struct compare
{
  bool operator()(const int& l, const int& r)
  {
    return l > r;
  }
};

int main()
{
  // Input filenames
  std::cout << "Enter a filename for the input stream: ";
  std::string filename;
  std::cin >> filename;
  std::ifstream infile(filename);

  std::cout << "Enter a filename for the output stream: ";
  std::string output;
  std::cin >> output;
  std::ofstream outfile(output);

  // Input a value for k
  std::cout << "Enter a value for k: ";
  int k;
  std::cin >> k;
  std::cout << std::endl;

  // Declare priority queue and map
  int elements = 0;
  int replacements = 0;
  int value;
  std::string id;
  std::priority_queue<int, std::vector<int>, compare> heap;
  std::map<int, std::vector<std::string>> ids;

  // Begin read/sort timer
  std::cout << "Beginning read and sort ..." << std::endl;
  auto readsortstart = std::chrono::high_resolution_clock::now();

  // Begin read and sort
  while (infile >> id >> value){
    elements++;
    if (heap.size() < k){
      ids[value].push_back(id);
      heap.push(value);
    } else if (heap.top() < value) {
      ids[value].push_back(id);
      heap.pop();
      heap.push(value);
      replacements++;
    }
  }

  // End timer
  auto readsortstop = std::chrono::high_resolution_clock::now();
  auto readduration = std::chrono::duration_cast<std::chrono::milliseconds>(readsortstop-readsortstart).count();
  std::cout << "Read and sort completed." << std::endl << std::endl;

  // Ensure that k is not larger than the total number of lines
  k = std::min(k,elements);

  // Begin write timer
  std::cout << "Beginning write ..." << std::endl;
  auto writestart = std::chrono::high_resolution_clock::now();

  // Pop k entries off of the sorted heap to retrieve the k largest results
  for (int i = 0; i < k; i++) {
    int key = heap.top();
    outfile << ids[key].back() << " " << heap.top() << std::endl;
    ids[key].pop_back();
    heap.pop();
  }

  // End write timer
  auto writestop = std::chrono::high_resolution_clock::now();
  auto writeduration = std::chrono::duration_cast<std::chrono::milliseconds>(writestop-writestart).count();
  std::cout << "Write completed." << std::endl << std::endl;

  std::cout << "----------------------- Statistics ------------------------" << std::endl 
            << "  Total number of replacements: " << replacements << std::endl
            << "  Total number of elements:     " << elements << std::endl
            << "  Read/sort time elapsed:       " << readduration << "ms" << std::endl
            << "  Write time elapsed:           " << writeduration << "ms" << std::endl;
}

Solution

You could create a

struct Element {
   int value;
   std::string id;
};

and store that in the priority queue. Then you don’t need the map. Your compare function would of course only look at the value component:

struct compare {
   bool operator()(const Element& l, const Element& r) {
       return l.value > r.value;
   }
};

Note that it is possible to use a naked function as comparator, it doesn’t need to be a functor:

bool compare(const Element& l, const Element& r) {
   return l.value > r.value;
}

std::priority_queue<Element, std::vector<Element>, decltype(compare)> heap(compare);

Leave a Reply

Your email address will not be published. Required fields are marked *