Creating and storing binary arrays

Question

I recently created a library in C, which creates and stores binary arrays. My library uses the standard fixed data types and allocates no more than 8 bytes for a binary array (8 bytes = 64-bit unsigned long long number). It also sets the binary equivalent at the bit level using the setBitsAt() method to make it quite efficient.

I want to know if there is a more efficient way of creating and storing binary arrays in C.

BinaryLibrary4C

Code snippet to create/store a binary array:

uint8_t* getBinaryArray(uint64_t num)
{
    int8_t encodeBitIndex = 0,byteIndex = -1;
    int8_t encodeBits = getEncodingBits(num);

    binaryArrayLen = getBytes4mBits(encodeBits);
    binaryArrayLen = (binaryArrayLen <= 0 || binaryArrayLen > BYTE) ? 0x1 : binaryArrayLen; /*Sanity check for size > 64bits*/

    int8_t bitIndex = 0,binValue;
    uint8_t *binaryArray = (uint8_t*)malloc(binaryArrayLen); /*Dynamic Array to store binary equivalent*/

    memset(binaryArray,0x0,binaryArrayLen); /*Set 0 as initial value to Array*/

    /*Storing binary equivalent in 1-bit each of binaryArray*/
    for (encodeBitIndex = 0; encodeBitIndex < encodeBits; encodeBitIndex++,bitIndex++)
    {

        if(isNextByte(encodeBitIndex))
        {
            byteIndex += 1;
            bitIndex = 0; /*-_- reset bitIndex for every byte*/
        }

        binValue = ((num >> encodeBitIndex) & 1) ?  1 : 0;
        setBitsAt((binaryArray + byteIndex),binValue,bitIndex,encodeBits);
    }
    return binaryArray;
}

void setBitsAt(uint8_t *dest,uint64_t bits,uint8_t at,uint8_t nBits)
{
    uint64_t mask = ((~0ULL) >> (sizeof(uint64_t) * BYTE - nBits)) << at;
    *dest = (*dest & ~mask)|((bits<<at) & mask);
}

const int8_t getEncodingBits(uint64_t num)
{
    return getRoundedBits((int8_t)fabs(floor(negLog2(num) - 1) + 1));
}

long double negLog2(uint64_t num)
{
    long double negLogVal = 0.0f;
    negLogVal = (num < 0) ? (sizeof(num) * BYTE) : (log2l(1.0L) - log2l(num));
    return isNumInMaxRange(num) ?  fabs(negLogVal) + 1 : negLogVal;
}

bool isNumInMaxRange(uint64_t num)
{
    return ((num == (UINT8_MAX  + 1U) || num == (UINT16_MAX  + 1U) || num == (UINT32_MAX  + 1ULL))) ?  true : false;
}

const int8_t getRoundedBits(int8_t bits)
{
    int8_t roundedBits;
    for(roundedBits = BYTE; roundedBits <= QWORD; roundedBits+=BYTE)
    {
        if(bits >= 0 && bits <= roundedBits)
            return roundedBits;
    }
    return -1;
}

What kind of efficiency are you looking for? Execution time? Throughput? Memory? — Ben Steffan, Commented Nov 24, 2017 at 19:18
@BenSteffan efficient in terms of memory and speed (execution time) because that what matters the most . — HeavenHM, Commented Nov 24, 2017 at 19:48
Memory or speed? Pick one. A lot of optimizations for memory are not optimal for speed and vice versa. — Ben Steffan, Commented Nov 24, 2017 at 22:45
@BenSteffan its quite efficient in memory i guess , i am looking for speed optimisation now . — HeavenHM, Commented Nov 24, 2017 at 23:12
Can you explain what setBitsAt() is supposed to do? I don't think it does what you think it does, but I can't be sure because I don't know what it is supposed to do. — JS1, Commented Nov 25, 2017 at 6:19

Edward · Accepted Answer · 2019-01-13 12:48:46Z

I see a number of things that may help you improve your program.

Understand `const`

In a number of places in the code we have function declarations like this:

const int8_t getRoundedBits(int8_t bits)

The const is ignored by the compiler in this case, so it's merely confusing to the reader. One can't have type qualifiers (such as const) on a return type, which makes sense if one thinks about it. It makes no sense to return an int8_t value that can't be altered. It's like going to the market and buying an egg, and being told, "You may have this egg, but you are not allowed to alter it."

Think carefully about data types

The code currently contains this function:

long double negLog2(uint64_t num)
{
    long double negLogVal = 0.0f;
    negLogVal = (num < 0) ? (sizeof(num) * BYTE) : (log2l(1.0L) - log2l(num));
    return isNumInMaxRange(num) ?  fabs(negLogVal) + 1 : negLogVal;
}

Since num is an unsigned type, the test for num < 0 will never return true, so this, too is useless code that the compiler easily ignores, but is confusing to a human reader of the code.

Simplify your code

The code contains this peculiar function:

bool isNumInMaxRange(uint64_t num)
{
    return ((num == (UINT8_MAX  + 1U) || num == (UINT16_MAX  + 1U) || num == (UINT32_MAX  + 1ULL))) ?  true : false;
}

First, the function uses the trinary operator (?:) where it isn't needed and simply clutters the code. Second, there are a number of parentheses that aren't needed.

Avoid global variables

By the context, it appears that the binaryArrayLen variable is global. It's generally better to explicitly pass variables into and out of your functions rather than using the vague implicit linkage of a global variable. Not least, the current implementation would seem to preclude the possibility of ever having more than one BinaryArray. In this case, it seems to me that a BinaryArray should be a struct containing both the data and the length.

Avoid conversions to and from floating point

We have a function getEncodingBits() which appears to be intending to return the number of bits required to encode the given number. It looks like this:

int8_t getEncodingBits(uint64_t num)
{
    return getRoundedBits((int8_t)fabs(floor(negLog2(num) - 1) + 1));
}

The other functions have still more floating point mathematics in them, none of which is not really needed and will slow down your program. Here's an alternative approach that is much simpler and uses no floating point:

int8_t getEncodingBits(uint64_t num)
{
    int8_t bits = 8;
    for (num >>= 8; num; num >>= 8) {
        bits += 8;
    }
    return bits;
}

Understand the standard library

The code currently includes these lines:

uint8_t *binaryArray = (uint8_t *)malloc(binaryArrayLen);
memset(binaryArray,0x0,binaryArrayLen);

First, memory allocation can fail, so the value of binaryArray must be checked before using it. Second, if you really need the memory cleared, simply use calloc instead.

General comments

Handling this one bit at a time in the manner that this program does is a very inefficient mechanism. For speed, it's better to handle as much data at one time as possible. So instead of extracting and writing one bit at a time, it would make more sense to write the data in uint64_t sized chunks where possible. Allocating memory in small chunks is also time-consuming. In this case, the maximum data size is apparently uint64_t, so it would make sense to me to simply use uint64_t everywhere that you're currently using a BinaryArray. Example:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t BitArray;

int printBitArray(BitArray n) {
    const int len = sizeof(BitArray) * 8;
    char string[len + 1];
    char *ptr = &string[len];
    *ptr-- = ‘\0’;
    if (n == 0) {
        *ptr-- = '0';
    } else {
        for (*ptr = '0'; n; n >>= 1) {
            *ptr-- = '0' + (n & 1);
        }
    }
    return printf("%s\n", ptr+1);
}
BitArray setBit(BitArray n, int bitnum) {
    const BitArray mask = 1 << bitnum;
    return n | mask;
}
BitArray clrBit(BitArray n, int bitnum) {
    const BitArray mask = 1 << bitnum;
    return n & ~mask;
}

int main() {
    // set to 10101101 binary
    BitArray bits = 173u; 
    printBitArray(bits);
    bits = setBit(bits, 14);
    printBitArray(bits);
    bits = clrBit(bits, 14);
    printBitArray(bits);
    bits = clrBit(bits, 0);
    printBitArray(bits);
}

Stack Exchange Network

Creating and storing binary arrays

1 Answer 1

Understand `const`

Think carefully about data types

Simplify your code

Avoid global variables

Avoid conversions to and from floating point

Understand the standard library

General comments

Hot Network Questions

Creating and storing binary arrays

1 Answer 1

Understand const

Think carefully about data types

Simplify your code

Avoid global variables

Avoid conversions to and from floating point

Understand the standard library

General comments

Related

Hot Network Questions

Understand `const`