3
\$\begingroup\$

I am trying to solve longest collatz sequence problem under 1000000 with the below code. Can anyone suggest a faster way to approach this problem? I was thinking of dynamic programming, but I'm having trouble in understanding it.

#include <stdio.h>

long long int col(long long int n)
{
    if(n==1)
    {
        return 1;
    }
    else
    {
        if(n%2==0)
            return (1+col(n/2));
        else
            return (1+col(3*n+1));
    }
}

int main()
{
    long long int i=0, c, max, k=1;
    max=1;
    for(i=1; i<1000000; i++)
    {
        c=col(i);
        if(c>max)
        {
            max=c;
            k=i;
        }
    }

    printf("%lld",max);
    printf("\n%lld\n",k);
    return 0;
}
\$\endgroup\$
2
  • 1
    \$\begingroup\$ if you don't mind to use memory, could have a container, say int cache[1000] = {0};, and save in every element of the container the collatz chain length when starting with that number. Using this cache could stop calculating a lot of collatz chain length when encounter an already calculated number. \$\endgroup\$
    – NetVipeC
    Commented Aug 30, 2014 at 3:16
  • 1
    \$\begingroup\$ This looks a lot like a solution for projecteuler.net/problem=14 . If it is, this could be included in the question and the question could have the relevant project-euler tag. \$\endgroup\$
    – SylvainD
    Commented Oct 22, 2014 at 14:57

2 Answers 2

2
\$\begingroup\$

Well, it appears you are having a bit of trouble getting things off-the-ground, so to speak. Let's start with the basics. Let's name your file collatz.c and after a quick glance, it looks like it should compile:

gcc -Wall -Wextra -o ctz collatz.c

Good, it compiled with no errors and no warnings. Now let's see if it will run:

output:

$ time ./ctz
525
837799

real    0m2.088s
user    0m2.082s
sys     0m0.004s

Also good, max is 525 and k is 837799 and it completed in less than 2.1 seconds. The logic is implemented in a single recursive function akin to that used by a non-math-lib power function, so no great speed improvements come to mind. As was pointed out, there are optimization that can help reduce the execution time. Let's try the suggested -Ofast -fwhole-program:

gcc -Wall -Wextra -o ctz collatz.c -Ofast -fwhole-program

output:

$ time ./ctz
525
837799

real    0m0.480s
user    0m0.474s
sys     0m0.003s

A 400+% improvement. That's better. So it looks like your work is done. Drop a comment or edit your question if you have more specifics in mind.

\$\endgroup\$
8
  • \$\begingroup\$ ...although there's no optimization! \$\endgroup\$
    – ikh
    Commented Aug 30, 2014 at 2:42
  • \$\begingroup\$ With -Ofast -fwhole-program of gcc 4.8.3, it takes only 634ms 561ms 46ms (real/user/sys) \$\endgroup\$
    – ikh
    Commented Aug 30, 2014 at 2:44
  • 1
    \$\begingroup\$ Your computer is much much faster than this old box. \$\endgroup\$
    – David C. Rankin
    Commented Aug 30, 2014 at 2:46
  • \$\begingroup\$ Oh, sorry :) but in your case the time will be also decreased with optimization. It takes 1488ms 1419ms 31ms without optimization in my case >o< \$\endgroup\$
    – ikh
    Commented Aug 30, 2014 at 2:49
  • \$\begingroup\$ (after seeing edit) Hey, the really slow box was mine! (It's okay, because it's actually not mine; it's my school's >o<) \$\endgroup\$
    – ikh
    Commented Aug 30, 2014 at 2:50
1
\$\begingroup\$

I took the suggestion from NetVipeC and modified your program a bit. The changes were:

  1. I used a #defined constant MAXNUM to control the number of iterations.
  2. I added a result array to store previous results.
  3. In the col function, it now ends when it finds a previous result, instead of just stopping at 1.

The Modified Program

#include <stdio.h>

#define MAXNUM  1000000

unsigned short result[MAXNUM];

long long int col(long long int n)
{
    if (n < MAXNUM && result[n] != 0)
        return result[n];
    if(n%2==0)
        return (1+col(n/2));
    else
        return (1+col(3*n+1));
}

int main()
{
    long long int i=0, c, max, k=1;
    max=1;
    result[1] = 1;
    for(i=2; i<MAXNUM; i++)
    {
        c=result[i]=col(i);
        if(c>max)
        {
            max=c;
            k=i;
        }
    }

    printf("%lld",max);
    printf("\n%lld\n",k);
    return 0;
}

I also went ahead and rewrote your program to remove recursion, add in a small optimization, and allow the max to be set by a command line argument:

The Rewritten Program

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <inttypes.h>

#define DEFAULT_MAXNUM        100000000

static uint16_t *result;

uint16_t collatz(uint64_t n)
{
    uint64_t originalNumber = n;
    uint16_t count          = 1;

    while (1) {
        if (n & 1) {
            n = n*3+1;
        } else {
            n >>= 1;
            if (n < originalNumber)
                return count + result[n];
        }
        count++;
    }
}

int main(int argc, char *argv[])
{
    uint64_t i         = 0;
    uint16_t count     = 0;
    uint16_t bestCount = 1;
    uint64_t bestNum   = 1;
    uint64_t maxNum    = DEFAULT_MAXNUM;

    if (argc > 1) {
        maxNum = strtoull(argv[1], NULL, 0);
        if (maxNum < 2)
            maxNum = 2;
    }

    result = malloc(maxNum * sizeof(result[0]));
    if (result == NULL) {
        fprintf(stderr, "Error: Not enough memory\n");
        return 1;
    }

    result[1] = 1;
    for(i=2; i<maxNum; i++) {
        count = result[i] = collatz(i);
        if (count > bestCount) {
            bestCount = count;
            bestNum   = i;
        }
    }

    printf("Max considered: %" PRIu64 "\n", maxNum);
    printf("Largest number: %" PRIu64 "\n", bestNum);
    printf("Largest count : %d\n", bestCount);
    return 0;
}

The Timing Results

Here are the timing results from the three programs. Note that I used 100000000 as the number of iterations (100x the original amount) to get longer run times. I also used gcc -O4 to build all programs.

Original program : 37.7s
Modified program :  2.0s
Rewritten program:  1.2s
\$\endgroup\$