json2csv fails because of lack of RAM

Problem

I run this code on my Heroku Node.js server in order to get a CSV for a few hundred rows, but it fails with R14 - Memory Quota Exceeded.
It worked fine when there were fewer rows in the DB; even without writing to a file, I was able to write the CSV directly to the HTTP response.
What can I do to solve this?

```
var mongoose = require("mongoose");
const fs = require('fs');
const path = require('path')
const Json2csvParser = require("json2csv").Parser;
var Follow = require("../models/follow");
const fields = ["brand", "handle", "title", "followDateTime", "posts", "followers", "following", "description", "link", "isPrivate"];

module.exports = function(app) {
    app.get("/csv", (req, res) => {
        Follow.find({}).exec(function(err, follows) {
            if (err) {
                res.status(500).send(err);
            } else {
                let csv;
                try {
                    const json2csvParser = new Json2csvParser({ fields });                  
                    csv = json2csvParser.parse(follows);
                } catch (err) {
                    return res.status(500).json({ err });
                }
                const dateTime = Date.now();
                const filePath = path.join(__dirname, "..", "public", "exports", "csv-" + dateTime + ".csv");

                fs.writeFile(filePath, csv, function (err) {
                    if (err) {
                      return res.status(500).json(err);
                    }
                    else {
                      setTimeout(function () {
                        fs.unlinkSync(filePath); // delete this file after 30 seconds
                      }, 30000)
                      return res.json("/exports/csv-" + dateTime + ".csv");
                    }
                  });
            }
        });
    });
};
```

Solution

Before we dive into code, I’d like to ask: Does this CSV really need to be generated on-the-fly/on-demand? If the answer is no, you can probably run a cron job using mongoexport. This way, you avoid Node altogether.
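
For reference, such a cron job could run something along these lines (a sketch only: the connection string and the follows collection name are assumptions, and the field list mirrors the fields array from the code above):

```
mongoexport --uri="$MONGODB_URI" --collection=follows --type=csv \
  --fields=brand,handle,title,followDateTime,posts,followers,following,description,link,isPrivate \
  --out=follows.csv
```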

Libraries aren’t immune to memory limits. They create objects too! In this case, the trouble starts when you load a lot of Follow entries into follows, and it is compounded when you convert all of that data into CSV. Under the hood, you 1) load a lot of data into a huge array of objects and 2) convert that huge array of objects into one huge string.

Now luckily, json2csv has a streaming API. What this means is that, instead of processing all of your results in one go in memory, you have the option of building that CSV chunk by chunk. Since we’re dealing with an array of objects instead of strings, buffers, or arrays of raw data, you should look at its “object mode”.

So what you do is set up a pipeline: a bunch of functions connected together and called one after the other, with the output of the previous one being the input of the next. Data is streamed into this pipeline and transformed by every transform function it passes through until it reaches the end.

In your case, it would look something like this:

```
const { createWriteStream } = require('fs')
const { Readable } = require('stream')
const { Transform } = require('json2csv')

// ...somewhere in your controller code...

// With streams, everything starts and ends with a stream. This is the start.
const input = new Readable({ objectMode: true })
input._read = () => {} // Readable requires a _read implementation; we push data manually, so a no-op is fine.

// We set up the transformer, the thing that converts your objects into CSV rows.
const json2csv = new Transform({ fields }, { objectMode: true })

// Create a stream to the file. This is the end of the line.
const output = createWriteStream('./output')

// You can optionally listen to the output when all the data is flushed
output.on('finish', () => { /* all done */ })

// We connect the dots. So this reads like:
// 1. Read data into input.
// 2. Input gets fed into json2csv.
// 3. The output of json2csv is fed into output (which is our file).
const stream = input.pipe(json2csv).pipe(output)

// Start pumping data into the stream. You could use setInterval if you wanted.
follows.forEach(obj => input.push(obj))

// Close the input. Pushing null signals the end of the stream.
input.push(null)
```
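
As an aside: since the original code ultimately serves this CSV over HTTP, it is worth noting that an Express res object is itself a writable stream, so it can sit at the end of the pipeline in place of the file. This is only a sketch, reusing the input and json2csv streams from above; the filename in the header is just an example.

```
// Sketch: send the CSV straight back in the HTTP response instead of
// writing a temporary file. res is a writable stream, so it can be the
// final destination of the pipe chain.
res.setHeader('Content-Type', 'text/csv')
res.setHeader('Content-Disposition', 'attachment; filename="follows.csv"')
input.pipe(json2csv).pipe(res)
```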

Now, I’m not too familiar with Mongoose, but if it has an API that exposes your results as a stream, you can skip building your own object-mode Readable and pipe that stream into json2csv instead. That way the entire thing uses streams, and at no point is all of the data held in memory; documents get loaded, processed, and flushed a few at a time.

```
const stream = streamFromMongoose.pipe(json2csv).pipe(output)
```
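
To make that concrete, here is a minimal sketch, assuming a Mongoose version that supports Query#cursor() (json2csv’s Transform still needs objectMode: true, since the cursor emits documents as objects):

```
// Sketch assuming Query#cursor() is available in your Mongoose version.
// .lean() yields plain JavaScript objects instead of full Mongoose
// documents, which keeps per-row overhead low; .cursor() exposes the
// query results as an object-mode Readable stream.
const streamFromMongoose = Follow.find({}).lean().cursor()
streamFromMongoose.pipe(json2csv).pipe(output)
```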
