Generate random binary arrays

Question

I am fairly new to C++ programming and, so far, I love it. I have been trying to make sure that what I write is fast, unique, readable, and efficient. Is there any way I can improve this code, or are there ways I can improve my coding conventions (in C++)? Here I wrote a program that generates 20 random vectors of binary digits (20 to test "randomness").

#include <stdbool.h>
#include <cstdlib>
#include <iostream>
#include <ctime>

void returnarray(bool* array, int size)
{
    for (int i = 0; i < size; i++)
    {
        std::cout << array[i];
    }
    std::cout << "\n";
}

bool* assignzero(int size)
{
    bool* b = new bool[size];
    for (int i = 0; i < size; i++)
    {
        b[i] = false;
    }

    return b;
}

void generate()
{
    int p = 9;
    int t = rand() % p;
    bool* b = assignzero(p);

    for (int x = 0; x < t; x++)
    {
        b[rand() % p] = true;
    }

    returnarray(b, p);
}

int main()
{
    long inc;
    srand(time(0));
    while (inc < 20)
    {
        generate();
        inc += 1;
    }
}

Use container, instead of raw owner pointer. So you would avoid your memleak. — Jarod42, Commented Jul 12, 2016 at 17:30
What kind of random distribution did you expect to achieve? Were you attempting for each binary digit to have a 50% chance of being true/false? I'm not understanding your thought process of using that t variable, because that is causing your distribution to not be 50/50, so I'm not sure if that's what you intended. — JS1, Commented Jul 12, 2016 at 18:12
@JS1 t is the variable of the random amount of 1's in the vector. The array is predefined as a set of 0s, and is then changed by a random number of 1s. I did not intend a 50/50 chance of each index value to be 1 but rather be determined by a random amount. t could be changed to create 50/50 by int t = p / 2, and the program works just fine. — Tristen Woodruff, Commented Jul 12, 2016 at 18:21
I'll just note that there might not be t ones in your array, since the randomized indices might contain duplicates. That will make the number and distribution of ones in the output somewhat difficult to analyze. — ilkkachu, Commented Jul 13, 2016 at 16:47

forsvarir · Accepted Answer · 2016-07-12 18:14:48Z

Initialise your variables

You aren't initialising inc. This is probably working for you because the variable happens to start out as 0. That isn't guaranteed, though: reading an uninitialised variable is undefined behaviour, so you should initialise it explicitly before you use it:

long inc = 0;
srand(time(0));
while (inc < 20)

Cleanup after yourself

As I mentioned in the comments, you're also leaking memory:

bool* b = new bool[size];

the allocated array is never cleaned up by your application. You should have a paired delete somewhere in the code when you're done with the array:

delete [] b;
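
As the comments suggest, a container removes the need for a manual delete[] altogether. A minimal sketch, assuming std::vector<bool> is an acceptable replacement for the raw array (the name make_zeroed_bits is hypothetical and not part of the original code):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for assignzero(): the vector owns its storage,
// so the memory is released automatically when the vector goes out of scope.
std::vector<bool> make_zeroed_bits(std::size_t size)
{
    // This constructor value-initialises every element, i.e. sets it to false.
    return std::vector<bool>(size);
}
```

The caller simply lets the returned vector go out of scope; there is no delete to forget.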

Expressive names

Think about using more expressive names. You're referring to the array of bools as both 'b' and 'array' in different parts of your code. One of the names tells you there's going to be more than one, whilst the other name hints at the type.

Your returnarray method actually prints it to the console. If you get the names right, your program will be a lot easier to follow.

Prefer to use a container over a raw pointer. That should be the advice about clean up. — Loki Astari, Commented Jul 13, 2016 at 10:37

Incomputable · Accepted Answer · 2016-07-13 09:44:33Z

Your implementation

First I will write my opinion about your code.

The main reason I wouldn't use your implementation is because it doesn't support iterators. I would love to just pass the iterators of the container to your function and it does what it should. This would deal with 2 problems: managing memory, finding a way to pass the result back to the caller. Additionally, it would solve the problem of the algorithm being non-generic, because with iterators every container which has element type supporting operator=(bool value) would be able to be filled with your algorithm. It is a common practice in C++ to leave iterator range checks to the caller.

The second problem which is already mentioned is random distribution. As of now, it is not practical.

The third problem is the legacy srand(). I suggest you use mt19937 with bernoulli_distribution. It will solve the problem of getting a good distribution. So, let's rewrite the code.

Suggested implementation

First of all we need to identify the minimal iterator category, so that we can write a template relying on that concept. Since we need to write to the pointed-to element and we don't need multiple passes, backwards movement, or steps of arbitrary distance, we are good with output iterators.

template <typename OutputIt>
void generate(OutputIt first, OutputIt last)

This way, we can call the function like this: generate(container.begin(), container.end()); Furthermore, we can make the random number generator customizable:

template <typename OutputIt, typename Engine = std::mt19937>
void generate(OutputIt first, OutputIt last)

Now users of the code can choose to leave default engine or to supply custom one.

Then let's create the engine and a bernoulli_distribution to have a fair distribution:

static Engine engine;
std::bernoulli_distribution distribution;

Notice that engine is static; this prevents getting the same values on each function call.

Next we just generate random bit and write it into container as we go:

while (first != last)
{
    *first++ = distribution(engine);
}

The reason why we are not using std::transform here is that it doesn't guarantee in order application of the unary operator:

std::transform does not guarantee in-order application of unary_op or binary_op. To apply a function to a sequence in-order or to apply a function that modifies the elements of a sequence, use std::for_each

source: cppreference.

I don't like std::for_each, which doesn't make the code more readable in my opinion.

We are done, we don't need anything else.

Full version

#include <random>
#include <iostream>
#include <vector>

template <typename OutputIt, typename Engine = std::mt19937>
void generate(OutputIt first, OutputIt last)
{
    static Engine engine;
    std::bernoulli_distribution distribution;

    while (first != last)
    {
        *first++ = distribution(engine);
    }
}

int main()
{
    std::vector<int> v1;
    v1.resize(100);
    generate(v1.begin(), v1.end());

    for (const auto& elem : v1)
    {
        std::cout << elem << ' ';
    }

    std::cout << '\n';

    std::vector<bool> v2;
    v2.resize(100);
    generate(v2.begin(), v2.end());

    for (const auto& elem : v2)
    {
        std::cout << elem << ' ';
    }
}

Further improvement

If the user would like to keep control of the lifetime of the Engine object we could wrap it into a class, so they would need to keep lifetime of wrapper to keep getting different results:

template <typename Engine = std::mt19937>
class random_bits_generator
{
    Engine engine;
    std::bernoulli_distribution distribution;
public:
    template <typename OutputIt>
    void operator()(OutputIt first, OutputIt last)
    {
        while (first != last)
        {
            *first++ = distribution(engine);
        }
    }

    Engine get()
    {
        return engine;
    }
};

Notice the change in iterator type, input iterator doesn't do what we need, but output iterator does. — Incomputable, Commented Jul 12, 2016 at 21:10
I don't think the static variable engine is a good idea. Better make it an additional parameter: template <typename OutputIt, typename Engine = std::mt19937> void generate(OutputIt first, OutputIt last, Engine& engine). — celtschk, Commented Jul 13, 2016 at 9:20
@celtschk, I was actually thinking about wrapping this into an object with engine as member variable, with 2 functions: operator()(OutputIt first, OutputIt last). and get() to get an engine state. What do you guys think about it? — Incomputable, Commented Jul 13, 2016 at 9:29
Such an object would IMHO be the best solution; however given the beginner tag (and the obviously beginner code) I'm not sure if, at this stage, that code would be helpful for the asker. — celtschk, Commented Jul 13, 2016 at 9:36
@celtschk, I will leave it as further improvement, so asker could iteratively get there. — Incomputable, Commented Jul 13, 2016 at 9:38
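
The alternative raised in the comments, passing the engine in as a parameter, could be sketched as follows. This is only an illustration: the name generate_bits is hypothetical (chosen to avoid confusion with std::generate), and the iterator parameter is called ForwardIt because comparing first != last strictly requires at least forward iterators, since pure output iterators need not be equality comparable.

```cpp
#include <random>

// Sketch of the variant suggested in the comments: the caller owns the
// engine, so seeding and lifetime are under the caller's control.
template <typename ForwardIt, typename Engine>
void generate_bits(ForwardIt first, ForwardIt last, Engine& engine)
{
    std::bernoulli_distribution distribution; // default probability is 0.5

    while (first != last)
    {
        *first++ = distribution(engine);
    }
}
```

A caller would create one engine, for example std::mt19937 engine{std::random_device{}()};, and pass it to every call, so that successive calls continue the same random sequence.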

Community · Accepted Answer · 2020-06-10 13:24:26Z

Random distribution is off

When you say "random binary vector" I think each digit has a 50% chance of being 0 or 1. What you have actually generated are binary vectors where each digit has a far greater chance of being a 0 than a 1.

Your current algorithm has two steps:

Generate a random number of bits t from 0 to 8.
For each of t bits, randomly place a 1-bit in the output vector, even if it overlaps with a previously placed 1-bit.

Just from step #1, you can see that the random distribution will be off, because 1/9 of the time, you will get all zeroes in the output vector. Also, you will never get an output vector with all ones.

And because in step #2 you allow bits to overlap, you often won't even get as many 1-bits as the t picked in step #1.
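
To see both effects in practice, the two steps can be re-created and the resulting 1-bits counted. This is a rough sketch, assuming <random> in place of rand() and p >= 1; the helper name ones_after_original_scheme is hypothetical:

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Re-creates the question's placement scheme (with <random> instead of
// rand()) and returns how many 1-bits actually end up in the vector.
template <typename Engine>
int ones_after_original_scheme(Engine& engine, int p)
{
    std::uniform_int_distribution<int> pick(0, p - 1);

    const int t = pick(engine);      // step 1: t lies in 0..p-1, never p
    std::vector<bool> bits(p, false);

    for (int i = 0; i < t; ++i)
    {
        bits[pick(engine)] = true;   // step 2: positions may repeat
    }

    return static_cast<int>(std::count(bits.begin(), bits.end(), true));
}
```

Calling this many times and tallying the results gives a histogram of the number of 1-bits; the count can never reach p, and it is often smaller than the t that was drawn.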

Generating `t` bits correctly

From your comment, it seems like you intended to generate t 1-bits, where t could be chosen to be in the range 0..p. The way to do that properly would be:

Generate t 1-bits and the rest 0-bits.
Shuffle the vector (for example using a Fisher-Yates shuffle).
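
The two steps above can be sketched with std::shuffle, which performs the shuffle for us. This is an illustration under assumptions rather than a definitive implementation: the function name bits_with_exact_ones is hypothetical, and a request for more ones than the vector can hold is simply clamped to the length.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Returns a vector of `length` bits with exactly min(ones, length) of them
// set, at uniformly shuffled positions.
template <typename Engine>
std::vector<bool> bits_with_exact_ones(std::size_t length, std::size_t ones,
                                       Engine& engine)
{
    std::vector<bool> bits(length, false);

    // Step 1: put the 1-bits at the front, the rest stay 0.
    std::fill_n(bits.begin(), std::min(ones, length), true);

    // Step 2: shuffle the positions.
    std::shuffle(bits.begin(), bits.end(), engine);

    return bits;
}
```

Drawing t from std::uniform_int_distribution<std::size_t>(0, p) before the call would then give every count from 0 to p, including the all-ones vector.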

Returning the array

I would modify your program to actually return the array instead of calling a function called returnarray(). You could then print the array from main.

KIIV · Accepted Answer · 2016-07-12 20:01:52Z

1) Readability is not good. For example, I'd expect returnarray to return something, but it seems to print instead.

2) Efficiency is not good either. Every bool takes a whole byte, so you are wasting (and losing) memory. You can use many more bits from one rand() call (how many depends on RAND_MAX). It affects speed too, but 'premature optimization is the root of all evil' for now.

3) Nothing is printed for me. I suppose it's because of the undefined value in long inc;

4) It's not such a good example of C++, as it's more or less C code (except for new and cout).

5) A for loop is more suitable for this occasion in main().

6) The code is not well suited to testing the distribution. And if I rewrote it correctly, the distribution is pretty bad. For p=3 and 100000 samples: 0 = 58530, 1 = 24860, 2 = 8273, 3 = 8337. No 4..7, and so on.
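
Point 2 can be illustrated with a packed representation filled from a single engine call. This is only a sketch under assumptions: it uses std::mt19937 (which yields 32 bits per call) rather than rand(), the helper name random_nine_bits is hypothetical, and each bit comes out 50/50, which is not the skewed distribution the question aims for.

```cpp
#include <bitset>
#include <random>

// Hypothetical helper: fills nine packed bits from one engine call.
inline std::bitset<9> random_nine_bits(std::mt19937& engine)
{
    // The bitset constructor keeps only the low nine bits of the value.
    return std::bitset<9>(engine());
}
```

Streaming the result to std::cout prints it as nine '0'/'1' characters, and count() gives the number of set bits.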

Thanks for the response. returnarray can be changed to return a string representing each line of the bool array and then I can let main() print it. This way I can also use a global bool array (to save memory?). I now set inc to 0 (simple overlook). — Tristen Woodruff, Commented Jul 12, 2016 at 18:41
@Tristen, global variables are almost always bad. It doesn't save memory by any means, and returning it as string would confuse people. — Incomputable, Commented Jul 12, 2016 at 21:07
While I'm generally for saving memory, I think avoiding the "waste" of seven bytes is not really worth it. Also you could say that you "waste" processor cycles by using a more compact representation, as you need to do bit manipulation in that case. — celtschk, Commented Jul 13, 2016 at 9:09
@celtschk it depends on usage. If he wants to do some distribution test, it would be faster to have keys packed. Also one rand() call can be used to set whole byte or more - it spares lots of function calls (precision of rand() is only issue for setting whole unsigned int at once here). — KIIV, Commented Jul 13, 2016 at 9:34

Nikita Kakuev · Accepted Answer · 2016-07-12 18:52:49Z

Starting from C++11, the most canonical way to generate any kind of pseudo-random sequence is through the standard library's pseudo-random generation facilities. Like this:

#include <random>
#include <vector>
#include <algorithm>

std::vector<bool> generateRandomSequence() 
{
    std::vector<bool> randomSequence;
    randomSequence.resize(20);

    std::random_device rd;
    std::mt19937 generator(rd());
    std::bernoulli_distribution distribution(0.5); // your 50/50 chance

    std::generate(randomSequence.begin(), randomSequence.end(),
        [&generator, &distribution] { return distribution(generator); });

    return randomSequence;
}

This function is immune to memory leaks since it doesn't perform any explicit allocations and is guaranteed to give you the desired "50/50 chance" (rand() % n won't do the trick).

The only catch is the dreaded std::vector<bool> specialization, but in this example, it's not going to do any harm.

Whenever the performance of std::vector<bool> (or really, any other alternative) is important to the program, I recommend measuring. — Edward, Commented Jul 12, 2016 at 21:12

Jerry Coffin · Accepted Answer · 2016-07-12 23:28:50Z

A great deal here depends on exactly what you really want. If you want numbers, each of which has exactly N pseudo-randomly chosen bits set, it's probably easiest to start by setting the chosen number of bits, then use std::shuffle to randomize the positions:

std::vector<bool> bits(total_length - set_bits);

for (int i = 0; i < set_bits; i++)
    bits.push_back(1);

std::shuffle(bits.begin(), bits.end(), rnd);

If we print out the results in binary, we get something like this:

00000100101110101000000010001000
11001010100100000000001001100000
10100100101010000001000100000100
10010001000000001100000011001010
11000010101000010001000001000010
00101000001000001000000110010110
00100010100000011000000110010100
10000000001000101100100100101000
01010110010000000011001000010000
11010011000100000100000100001000

To summarize that: 32 bit numbers, each of which has exactly 9 bits set, but the positions of the bits that are set varies (pseudo-)randomly.

More commonly, you'll want numbers in which each bit has a 1/N chance of being set, so any individual number might have more or less than that set, but in the long term it should tend toward a total of approximately 1/N bits being set. In this case, we can use the library's Bernoulli distribution:

const double odds = 1.0/9.0;
const int total_length = 32;
std::mt19937 rnd{ std::random_device()() };
std::bernoulli_distribution dist(odds);

std::vector<bool> bits;

std::generate_n(std::back_inserter(bits),
    total_length,
    [&] { return dist(rnd); });

Output from this would look something like this:

10000000100000000000001001100111
00000010010101010000000000100010
01000000100000000000111000110000
00001011000000000001001000001100
11010110101111010000001001000100
00000000110000010100100101100000
00000000000001100001110101001010
01010101001011110000100110100000
00000001000100010110000010001000
01001100111000000001011100100000

The previous generator was deterministic when we looked at the last bit of any output number: if we'd already seen 9 set bits, then the last one had to be clear (and if we'd only seen 8, it had to be set). This one has independent odds for each bit of the result, so even though, with the odds of 1/9 used above, we should get about 1/9 of the bits set as a long-term average (roughly 3.6 of every 32), we'll frequently see more or fewer than that set in any particular output. In fact, we should see (for example) all bits set with a frequency of approximately 1/9³² (i.e., about 1 out of 3.4 x 10³⁰ times we generate a number).

Now, a few more comments more specifically about your code and how you've written it:

Avoid `new[]`

In most reasonably written code, you should almost never see new used at all, unless you're looking at the internals of an allocator, or something on that order.

Using the array form of new, like T *something = new T[size]; should be even more unusual--to the point that I honestly can't think of a single situation where it's what I'd choose to do/use (and, in fact, I haven't used it in years).

Avoid magic numbers

Right now, the basic intent of your code isn't entirely clear in some places. For example, at least one (and probably a couple) of reviews have mis-read your intent as being to have even odds of a 0 or 1 for any bit in your result. Although you've since clarified in comments that you intended a skewed distribution of 0's vs. 1's, it would be much better if the code were written to make that intent clear¹.

Avoid `rand()` and `srand()`

Unless you need to write your code so it's compatible with C compilers, it's generally better to use the generators and distribution classes from <random> instead. They use specified algorithms, so the quality of the results has been tested and is well known. A generator encapsulates its own state, so using one generator won't (can't) affect the state of another. They are a little bit of extra work to use, but generally not so much that it's really a major problem.

¹ I should probably add that although using odds for the number is probably better than a magic number, it's still not entirely perfect. odds_of_set_bit or something on that order would make the intent even more explicit.
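
Putting the footnote's naming suggestion together with the Bernoulli snippet above might look like the following. This is a sketch only: the function name bits_with_odds and its signature are hypothetical, with the parameter name odds_of_set_bit taken from the footnote.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Each of the `length` bits is set independently with probability
// odds_of_set_bit, which must lie in the range [0, 1].
template <typename Engine>
std::vector<bool> bits_with_odds(std::size_t length, double odds_of_set_bit,
                                 Engine& engine)
{
    std::bernoulli_distribution dist(odds_of_set_bit);

    std::vector<bool> bits;
    bits.reserve(length);

    std::generate_n(std::back_inserter(bits), length,
                    [&] { return dist(engine); });

    return bits;
}
```

A call such as bits_with_odds(32, 1.0 / 9.0, rnd) then states the intended skew at the call site instead of hiding it in a magic number.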

What should I use instead of new[]? Sorry if the array declaration was unusual, I tried to find functionality, I come from Java and I am used to declaring arrays using the new keyword. Again, I am new to C/C++. I also want to know how could I make it clear that I wanted a skewed distribution of 1s? — Tristen Woodruff, Commented Jul 12, 2016 at 23:53
@Tristen: Usually, std::vector or std::array instead of new. Yes, I realize Java is primitive, and forces you to do most memory management manually, but C++ is a higher level language, so you rarely do with it. :-) I thought the footnote gave at least some idea of how to specify a skewed distribution more explicitly. — Jerry Coffin, Commented Jul 13, 2016 at 0:01
@JerryCoffin "Yes, I realize Java is primitive" ... "C++ is a higher level language" That's the first time I've heard C++ being higher level than Java. — Hatted Rooster, Commented Jul 13, 2016 at 15:10
@GillBates: keep listening. Most people notice that C++ provides lower level access to the machine than Java (which is true--it does) but ignore the fact that (especially with template meta programming) it also provides (much) higher level abstractions than Java even attempts. — Jerry Coffin, Commented Jul 13, 2016 at 15:15
@JerryCoffin Fair enough, but that doesn't make it a higher level language in my eyes. A GC is something I'd expect from a higher level language. C++ allowing you to shoot yourself in the foot really bad when it comes to memory management with simple code makes it qualify as a lower level language than Java for me. It's all subjective again I suppose though. — Hatted Rooster, Commented Jul 13, 2016 at 15:18

Pharap · Accepted Answer · 2016-07-12 22:50:51Z

Everybody else has covered the most important points, but I desperately need to point out something everyone has missed: stop including stdbool.h.

In C++ the bool type is built in to the language, stdbool.h is a hangover from the days of C where C++ inherited most of C's headers.

Which brings me to point two, all the former C headers with their original names have been deprecated and renamed to a c-prefixed version. A couple of examples:

stdbool.h (deprecated) is now cstdbool (not deprecated). stdlib.h (deprecated) is now cstdlib (not deprecated). stdio.h (deprecated) is now cstdio (not deprecated).

A full list can be found here under the Deprecated heading.

And finally, if those weren't enough to convince you, when C++17 is put into action, cstdbool is among 4 C compatibility headers slated to be deprecated, which means that as of next year the official advice will be to avoid using them because they might eventually be removed completely.
