5
\$\begingroup\$

My task is to generate random strings and store them in a file for as long as the file is smaller than 10 MB. My approach is as follows.

import java.io.File;
import java.io.FileWriter;

public class Application {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        File file = new File("Hello.txt");
        file.createNewFile();
        FileWriter writer = new FileWriter(file);

        while (file.length() <= 1e+7) {
            writer.write(Generator.generateRandomString(12, Generator.Mode.ALPHANUMERIC));
            writer.write("\n");
            writer.write(Generator.generateRandomString(12, Generator.Mode.NUMERIC));
            writer.write("\n");
            writer.write(Generator.generateRandomString(12, Generator.Mode.ALPHA));
            writer.write("\n");
        }
        writer.flush();
        writer.close();
        long end = System.currentTimeMillis();
        System.out.println((end - start) / 1000f + " seconds");
    }
}

It takes about 100.495 seconds to complete on my i3 processor with 4GB of RAM. How can I improve this performance? Why is it so poor? Is there another way to do this?

\$\endgroup\$

1 Answer

3
\$\begingroup\$

It's the repeated calls to file.length() that are slowing you down.

Let's try this:

while (file.length() <= 1e+7) {
  writer.write("abcdefghijkl");
  writer.write("\n");
  writer.write("abcdefghijkl");
  writer.write("\n");
  writer.write("abcdefghijkl");
  writer.write("\n");
}

On my machine that took ~51s.

Now let's try this:

for (int length = 0; length <= 1e+7; length += 39) {
  writer.write("abcdefghijkl");
  writer.write("\n");
  writer.write("abcdefghijkl");
  writer.write("\n");
  writer.write("abcdefghijkl");
  writer.write("\n");
}

Approximately 0.3s. And a quick sanity check:

$ du -h Hello.txt
9.6M    Hello.txt
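
For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch of both loops. The class name LengthCheckComparison and its method names are made up for this illustration, and the limit is smaller than 10 MB so the slow variant finishes quickly. It assumes the default charset encodes these ASCII characters as one byte each. Timings will vary by machine and file system, so no expected numbers are given.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper class, not part of the original question or answer.
class LengthCheckComparison {
    // 12 characters plus a newline: 13 bytes, assuming one byte per character.
    static final String LINE = "abcdefghijkl\n";

    // Asks the file system for the size on every iteration, as in the question.
    // Returns elapsed nanoseconds.
    static long writePollingLength(File file, long limit) throws IOException {
        long start = System.nanoTime();
        try (FileWriter writer = new FileWriter(file)) {
            while (file.length() <= limit) {
                writer.write(LINE);
                writer.write(LINE);
                writer.write(LINE);
            }
        }
        return System.nanoTime() - start;
    }

    // Tracks the size in a local counter instead of calling file.length().
    // Returns elapsed nanoseconds.
    static long writeCounting(File file, long limit) throws IOException {
        long start = System.nanoTime();
        try (FileWriter writer = new FileWriter(file)) {
            for (long length = 0; length <= limit; length += 3 * LINE.length()) {
                writer.write(LINE);
                writer.write(LINE);
                writer.write(LINE);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        long limit = 1_000_000; // 1 MB keeps the polling variant short
        File polled = File.createTempFile("polling", ".txt");
        File counted = File.createTempFile("counting", ".txt");
        polled.deleteOnExit();
        counted.deleteOnExit();

        long pollNanos = writePollingLength(polled, limit);
        long countNanos = writeCounting(counted, limit);
        System.out.printf("polling file.length(): %.3f s%n", pollNanos / 1e9);
        System.out.printf("local counter:         %.3f s%n", countNanos / 1e9);
    }
}
```

Both variants overshoot the limit slightly, for the reasons discussed further down.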

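There's also a correctness problem with the original method: file.length() interacts badly with the writer's buffering. It only reports bytes that have already reached the file, not those still sitting in the writer's buffer. Let's try to write up to 400 bytes: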

while (file.length() <= 400) {
  writer.write("abcdefghijkl");
  writer.write("\n");
}

We actually end up with a lot more because of the buffered writer:

$ du -b Hello.txt
8203    Hello.txt
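
A small sketch that makes the buffering visible: write one line, then look at file.length() before and after an explicit flush(). BufferVisibilityDemo is a hypothetical name used only here. How many bytes the writer holds back before flushing on its own is a JDK implementation detail, so the sketch prints the "before" value rather than assuming it.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical demo class, not part of the original answer.
class BufferVisibilityDemo {
    // Returns {length reported before flush(), length reported after flush()}.
    static long[] lengthsAroundFlush(File file) throws IOException {
        try (FileWriter writer = new FileWriter(file)) {
            writer.write("abcdefghijkl\n"); // 13 bytes, assuming one byte per character
            long before = file.length();    // bytes may still be in the writer's buffer
            writer.flush();                 // push buffered bytes to the file
            long after = file.length();
            return new long[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("buffer-demo", ".txt");
        file.deleteOnExit();
        long[] lengths = lengthsAroundFlush(file);
        System.out.println("before flush: " + lengths[0] + " bytes");
        System.out.println("after flush:  " + lengths[1] + " bytes");
    }
}
```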

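A similar problem is also present in the counting loop presented here: with a limit of 400 and 13 bytes per iteration, we end up writing 403 bytes, not because of buffering but because of how we're counting (the limit is checked before each write, so the last write can cross it). If you want to be precise about the total file size, you will need to modify it.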
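
One possible modification, as a sketch rather than a definitive fix: encode each line with an explicit charset, count the actual bytes, and stop before a line would cross the limit. BoundedFileWriter and writeUpTo are hypothetical names, and the Supplier stands in for whatever produces the random strings (the question's Generator class is not shown, so it is not used here).

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper class, not part of the original answer.
class BoundedFileWriter {
    // Writes whole lines for as long as they fit within `limit` bytes.
    // Returns the number of bytes written, which is never more than `limit`.
    static long writeUpTo(Path path, long limit, Supplier<String> lines) throws IOException {
        long written = 0;
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(path))) {
            while (true) {
                // Encoding explicitly means the counter reflects real bytes,
                // not characters.
                byte[] bytes = (lines.get() + "\n").getBytes(StandardCharsets.UTF_8);
                if (written + bytes.length > limit) {
                    break; // the next line would overshoot, so stop here
                }
                out.write(bytes);
                written += bytes.length;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bounded", ".txt");
        long written = writeUpTo(path, 400, () -> "abcdefghijkl");
        System.out.println("wrote " + written + " bytes, file size " + Files.size(path));
        Files.delete(path);
    }
}
```

This version stops short of the limit instead of overshooting it. If the file has to be exactly the limit in size, the last line would need to be truncated or padded, which is a separate design decision.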

\$\endgroup\$
3
  • \$\begingroup\$ Can you provide some insight on why length+=39. Btw your code works blazingly fast. \$\endgroup\$
    – ajknzhol
    Commented Dec 9, 2014 at 5:26
  • \$\begingroup\$ Nice catch. I never realized exactly how slow File.length() is. I was in the process of working things up 'my own way' using a DataOutputStream which has a size() method, but your 'track' got the right answer first. \$\endgroup\$
    – rolfl
    Commented Dec 9, 2014 at 5:41
  • 1
    \$\begingroup\$ @ajkumar25 sorry, wrote it in a hurry. Three strings of twelve characters + 3 new line characters = 39 characters = 39 bytes (assuming one byte per character -- which is not necessarily true, just assuming UTF-8 encoding and all chars printed are single-byte; you should specify the encoding). Note this isn't an example of good code, I just wanted to demonstrate the speed difference. Also there may be a difference in the output of the two methods due to buffering. \$\endgroup\$
    – mjolka
    Commented Dec 9, 2014 at 5:42
