4
\$\begingroup\$

Here's a function I use to generate a 2.5 GB SQL dump file for testing. It works, but it takes a long time to run. How can I make it more efficient?

const crypto = require("crypto");
const fs = require('fs');
const os = require('os');
const path = require('path');

(async function main(){
    var dumpfile = await generate_large_dump();
    console.log("\nDump created: ", dumpfile);
})();

function generate_large_dump(){
    return new Promise(async (resolve, reject)=>{
        var total_bytes = 0;
        const target_bytes = 2.5 * 1e+9;
        const target_file = path.join(os.tmpdir(), 'large_dump.sql');

        const writerStream = fs.createWriteStream(target_file, {flags: 'w'});
        writerStream.on('error', console.error);

        const age = ()=>Math.floor(Math.random() * (95 - 18 + 1)) + 18;
        const name = ()=>crypto.randomBytes(16).toString("hex");

        const write = str => new Promise(resolve=>{
            total_bytes += Buffer.byteLength(str, 'utf8');
            writerStream.write(str, resolve);
            var pct = Math.min(100, Math.floor(total_bytes / target_bytes * 10000)/100);
            process.stdout.clearLine();
            process.stdout.cursorTo(0);
            process.stdout.write(pct+"% complete");
        });

        var create_sql = "CREATE TABLE `sample_table` (`id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(250) NOT NULL, `age` int(11) NOT NULL, `date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (`id`));\n";
        await write(create_sql);

        while(total_bytes < target_bytes) await write("INSERT INTO `sample_table` (`name`, `age`) VALUES ('"+name()+"', '"+age()+"');\n");

        writerStream.end();
        process.stdout.write("\n");
        resolve(target_file);
    }); 
}
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers. \$\endgroup\$
    – Mast
    Commented May 18, 2020 at 17:41

2 Answers

4
\$\begingroup\$

The main culprit here is your progress indicator.

You're continuously refreshing a line in stdout every time write is called, and write is called a very large number of times. If you reduce the frequency of console writes, you'll make your script a whole lot faster. One option is to write to the console only every 0.5 seconds or so:

// Performance checker:
setTimeout(() => {
    console.log('Progress after 10 seconds: ', Math.floor(total_bytes / target_bytes * 10000)/100);
}, 10000);
// Actual code:
let timeoutId;
const write = str => new Promise(resolve=>{
    total_bytes += Buffer.byteLength(str, 'utf8');
    writerStream.write(str, resolve);
    if (!timeoutId) {
        timeoutId = setTimeout(() => {
            process.stdout.clearLine();
            process.stdout.cursorTo(0);
            const pct = Math.min(100, Math.floor(total_bytes / target_bytes * 10000)/100);
            process.stdout.write(pct+"% complete");
            timeoutId = null;
        }, 500);
    }
});

On my machine, this improves progress from around 0.25% in 10 seconds to around 2.03% in 10 seconds: roughly an eightfold speedup, close to a whole order of magnitude.
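
A timer-free variant of the same idea is to compare timestamps and skip the console write unless enough time has passed. The sketch below is only an illustration: makeThrottledReporter and its parameters are hypothetical names, not part of the original script, and unlike the setTimeout version it prints on the first call of each interval rather than at its end.

```javascript
// Hypothetical helper (not from the original script): returns a function
// that forwards its argument to `print` at most once per `intervalMs`.
// `now` is injectable so the behaviour can be checked without real delays.
function makeThrottledReporter(intervalMs, print, now = Date.now) {
    let last = -Infinity; // guarantees the very first call is printed
    return pct => {
        const t = now();
        if (t - last >= intervalMs) {
            last = t;
            print(pct);
        }
    };
}
```

Inside write, the three process.stdout calls would move into print, and the returned function would be called with pct on every write. Since intermediate values are dropped, you'd want to print the final percentage once more after the loop ends.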

Another thing: if you're going to use ES2015+ syntax (which you are, and should), always declare variables with const when possible, and with let when they must be reassigned. Never use var; it has too many gotchas to be worth using (function scope instead of block scope, the ability to accidentally re-declare it, automatically putting properties on the global object when used at the top level in a browser, etc.).
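
To make two of those gotchas concrete, here is a small sketch. The function names are made up for illustration; the behaviour shown is standard JavaScript scoping.

```javascript
// var is function-scoped: the loop counter is still visible after the loop.
function varLeaks() {
    for (var i = 0; i < 3; i++) {}
    return i; // the counter leaked out of the for block
}

// let is block-scoped: outside the loop, `i` does not exist at all.
function letIsBlockScoped() {
    for (let i = 0; i < 3; i++) {}
    return typeof i; // no `i` in scope here
}

// var can be silently re-declared; the same code with let or const
// would be a SyntaxError before the script even starts running.
function redeclared() {
    var total = 1;
    var total = 2;
    return total;
}
```

Here varLeaks() returns 3, letIsBlockScoped() returns 'undefined', and redeclared() returns 2 without any warning.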

\$\endgroup\$
1
  • \$\begingroup\$ thanks! that got me down from 70 minutes to 10 minutes run time. \$\endgroup\$ Commented May 18, 2020 at 17:39
2
\$\begingroup\$

Question

How can I make it more efficient?

In addition to the suggestion by CertainPerformance, you may be able to find a more efficient way to write the data. I haven't tried this before but you could try making a stream (e.g. Readable) to push the lines to, and then use readable.pipe() to pipe the data to the writable stream.
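
As a rough, unbenchmarked sketch of what that could look like: the code below wraps a generator in a Readable via Readable.from() and uses pipeline() from stream/promises (which assumes Node 15 or later) instead of calling readable.pipe() directly, so that errors propagate and the promise settles once the file is fully written. The names sqlLines and generateDumpPiped are hypothetical, and the progress indicator is left out for brevity.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises'); // assumes Node 15+

// Hypothetical generator: yields INSERT statements until at least
// `targetBytes` bytes have been produced.
function* sqlLines(targetBytes) {
    let produced = 0;
    while (produced < targetBytes) {
        const name = crypto.randomBytes(16).toString('hex');
        const age = Math.floor(Math.random() * (95 - 18 + 1)) + 18;
        const line = "INSERT INTO `sample_table` (`name`, `age`) VALUES ('" + name + "', '" + age + "');\n";
        produced += Buffer.byteLength(line, 'utf8');
        yield line;
    }
}

// Pipes the generated lines into the target file. The stream machinery
// handles backpressure, so the generator is only advanced when the
// writable side is ready for more data.
async function generateDumpPiped(targetFile, targetBytes) {
    await pipeline(
        Readable.from(sqlLines(targetBytes)),
        fs.createWriteStream(targetFile, { flags: 'w' })
    );
    return targetFile;
}
```

It could be called as generateDumpPiped(path.join(os.tmpdir(), 'large_dump.sql'), 2.5 * 1e+9) from the existing main function. Whether it is actually faster than awaiting each write is something you'd have to measure.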

Review

Nesting levels, re-used variable names

This code has more nesting levels than necessary, and some would consider it "callback hell". The function write can be moved out of the function passed to new Promise (the promise returned at the end of generate_large_dump), along with all the variables that write() needs, like writerStream, target_file, total_bytes, etc. The two resolve parameters already live in separate scopes, but un-nesting them helps avoid confusion over the re-used name. If you do need a nested promise, it would be better to use distinct names for the sake of readability.

This would make the function passed to the returned promise much smaller. It could also be pulled out into a named function.
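
One possible shape for this, as a sketch only: writeChunk, generateDump and the makeLine parameter are hypothetical names, the progress output is omitted, and error handling is kept minimal. It goes a step further than described above by making the outer function async, so no explicit new Promise wrapper is needed there at all.

```javascript
const fs = require('fs');

// Hypothetical helper, pulled out to the top level. The callbacks are
// named `done` and `fail` so they cannot be confused with any outer resolve.
function writeChunk(stream, str) {
    return new Promise((done, fail) => {
        stream.write(str, err => (err ? fail(err) : done(Buffer.byteLength(str, 'utf8'))));
    });
}

// `makeLine` is any function returning the next line to write.
async function generateDump(targetFile, targetBytes, makeLine) {
    const stream = fs.createWriteStream(targetFile, { flags: 'w' });
    let totalBytes = 0;
    while (totalBytes < targetBytes) {
        totalBytes += await writeChunk(stream, makeLine());
    }
    // Wait until everything has been flushed before reporting success.
    await new Promise(done => stream.end(done));
    return targetFile;
}
```

With this layout, the deepest nesting is the single callback inside writeChunk, and the loop reads top to bottom.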

Constants

Idiomatic JavaScript, like many other languages (e.g. C-based ones), tends to declare hard-coded constants in ALL_CAPS format, so target_bytes would be better as TARGET_BYTES. It can still be declared within generate_large_dump() to limit its scope, unless it would be useful elsewhere.

Braces

While braces aren't required around a single statement following while, adding them can be helpful if you ever need to add a line to the block.

while(total_bytes < TARGET_BYTES) await write("INSERT INTO `sample_table` (`name`, `age`) VALUES ('"+name()+"', '"+age()+"');\n");

Even with braces the line can stay as a one-liner:

while(total_bytes < TARGET_BYTES) { await write("INSERT INTO `sample_table` (`name`, `age`) VALUES ('"+name()+"', '"+age()+"');\n"); }

Though some would argue it would be more readable with separate lines:

while(total_bytes < TARGET_BYTES) {
    await write("INSERT INTO `sample_table` (`name`, `age`) VALUES ('"+name()+"', '"+age()+"');\n");
}

Some style guides disallow keeping the statement on the same line as the control structure, e.g. the Google JavaScript Style Guide:

4.1.1 Braces are used for all control structures

Braces are required for all control structures (i.e. if, else, for, do, while, as well as any others), even if the body contains only a single statement. The first statement of a non-empty block must begin on its own line.

Disallowed:

if (someVeryLongCondition())
  doSomething();

for (let i = 0; i < foo.length; i++) bar(foo[i]);

Exception: A simple if statement that can fit entirely on a single line with no wrapping (and that doesn’t have an else) may be kept on a single line with no braces when it improves readability. This is the only case in which a control structure may omit braces and newlines.

if (shortCondition()) foo();


\$\endgroup\$