Python Octree Implementation

Question

I'm working with 3D point clouds stored in Numpy arrays. I'd succesfully used the scipy's KDTree implementation for task like k-neighbors search and outlier filtering.

However I wanted to try the octrees as an alternative data structure for other task like downsampling.

I was looking for an octree implementation in Python but I dind't find what I was looking for and as a result of that, I tried to implement a solution by myself.

The solution that I've writed so far, works, but it scales pretty bad when it has to manage "large" point clouds (+ 3 Million points) and when the number of divisions goes up.

The first thing that I use the octree for was to replace the points cointained in each of the nodes by their centroid, in order to downsample the density of the point cloud.

I thought that the fastest way to achieve that was to assing an index to each point indicating to wich node it belongs.

PART 1

Starting from a point cloud (I would use a really small point cloud to simplification) stored in a numpy array:

xyz

array([[ 0.   ,  0.   ,  0.   ],
       [ 0.125,  0.125,  0.125],
       [ 0.25 ,  0.25 ,  0.25 ],
       [ 0.375,  0.375,  0.375],
       [ 0.5  ,  0.5  ,  0.5  ],
       [ 0.625,  0.625,  0.625],
       [ 0.75 ,  0.75 ,  0.75 ],
       [ 0.875,  0.875,  0.875],
       [ 1.   ,  1.   ,  1.   ]])

Where the each rows represents 1 point and each colum the corresponding x, y and z coordinates.

I get the minimum bounding box of the point cloud as follows:

xyzmin = np.min(xyz,axis=0)
xyzmax = np.max(xyz,axis=0)

And use it for obtain the first 8 nodes of the octree, using the next function:

SPLIT FUNCTION

def split(xyzmin, xyzmax):
    """ Splits the node defined by xyzmin and xyzmax into 8 sub-nodes

    Parameters
    ----------
    xyzmin: (3,) ndarray
        The x,y,z minimum coordinates that delimite the node

    xyzmax: (3,) ndarray
        The x,y,z maxymum coordinates that delimite the node

    Returns
    -------
    nodes : (8,2,3) ndarray
        The x,y,z minimum and maximum coordiantes of the sub-nodes grouped along the
        second dimension

    xyzmed = (xyzmax + xyzmin) / 2

    nodes = np.array([
    [[xyzmin[0], xyzmin[1], xyzmin[2]], [xyzmed[0], xyzmed[1], xyzmed[2]]],
    [[xyzmin[0], xyzmed[1], xyzmin[2]], [xyzmed[0], xyzmax[1], xyzmed[2]]],
    [[xyzmed[0], xyzmed[1], xyzmin[2]], [xyzmax[0], xyzmax[1], xyzmed[2]]],
    [[xyzmed[0], xyzmin[1], xyzmin[2]], [xyzmax[0], xyzmed[1], xyzmed[2]]],
    [[xyzmin[0], xyzmin[1], xyzmed[2]], [xyzmed[0], xyzmed[1], xyzmax[2]]],
    [[xyzmin[0], xyzmed[1], xyzmed[2]], [xyzmed[0], xyzmax[1], xyzmax[2]]],
    [[xyzmed[0], xyzmed[1], xyzmed[2]], [xyzmax[0], xyzmax[1], xyzmax[2]]],
    [[xyzmed[0], xyzmin[1], xyzmed[2]], [xyzmax[0], xyzmed[1], xyzmax[2]]], 
    ])

    return nodes

Wich give us as a result:

nodes

array([[[ 0. ,  0. ,  0. ],
        [ 0.5,  0.5,  0.5]],

       [[ 0. ,  0.5,  0. ],
        [ 0.5,  1. ,  0.5]],

       [[ 0.5,  0.5,  0. ],
        [ 1. ,  1. ,  0.5]],

       [[ 0.5,  0. ,  0. ],
        [ 1. ,  0.5,  0.5]],

       [[ 0. ,  0. ,  0.5],
        [ 0.5,  0.5,  1. ]],

       [[ 0. ,  0.5,  0.5],
        [ 0.5,  1. ,  1. ]],

       [[ 0.5,  0.5,  0.5],
        [ 1. ,  1. ,  1. ]],

       [[ 0.5,  0. ,  0.5],
        [ 1. ,  0.5,  1. ]]])

Then I use those nodes to asing an index to each point according to inside of which of the nodes it can be found, by using this function:

GET INDEX FUNCTION

@jit(nopython=True)
def get_index(xyz, nodes):
    """ Get the index values indicatin to wich of the first 8 nodes each point belong
    """

    index = []

    for i in range(len(xyz)):

        n = 0

        for j in range(len(nodes)):

            if xyz[i,0] >= nodes[j,0,0] and xyz[i,1] >= nodes[j,0,1] and xyz[i,2] >= nodes[j,0,2]:

                if xyz[i,0] <= nodes[j,1,0] and xyz[i,1] <= nodes[j,1,1] and xyz[i,2] <= nodes[j,1,2]:

                    index.append(n+1)

                    break

                else:
                    n += 1
            else:
                n += 1

    return index

Aplying this function to the point cloud we obtain the following indices:

index
[1, 1, 1, 1, 1, 7, 7, 7, 7]

And I use those indices to filter the nodes that are empty in order to discard them:

u_index = np.unique(index)
u_index

array([1, 7])

#: filter the empty nodes         
nodes = nodes[u_index-1]
nodes

array([[[ 0. ,  0. ,  0. ],
        [ 0.5,  0.5,  0.5]],

       [[ 0.5,  0.5,  0.5],
        [ 1. ,  1. ,  1. ]]])

After filtering the empty nodes, I split the remaining nodes using the above mentioned function:

nodes = np.concatenate([split(i[0],i[1]) for i in nodes])

Wich result in an array of shape (M,2,3), where M is the number of remaining nodes * 8:

nodes.shape

(16, 2, 3)

PART 2

As this point, after the first round of splitting, I had to change the approach as the get_index function wouldn't work for my porpouse.

What I did is start a while loop under the condition of a limit of N iterations.

The loop is mainly based on the following function, wich is long and ugly.

What it bassically do is:

Loop over all the values in unique_index, getting the xyz coordinates of the points that belong to that index value and also getting the nodes that come from a division of that index value.
Loop over each of the new xyz coordinates checking to wich of the new nodes it belongs.
Once the point is founded inside a node, the associate point index is appended to one list and the associate node index to another list.
Loop over the original index list, concatenating the original values with the recently obtained point index values everywhere the original index list matches the unique_index value of the first loop.
Start again with a new unique_index value.

At the end of the loop, return the associate node index list.

LOOPY LOOP FUNCTION

@jit(nopython=True)
def loopy_loop(xyz, nodes, index, u_index):
    """ Loop over the input nodes checking wich point belongs to it.

    The loop will also update the values of the index list by concatenating
    the old values with the new values finded over this loop.

    As a result, a point that belongs to the node '1' before the loop, and is finded
    to belong to the node '6' in this loop, will be updated to have an index
    of '16'.

    Parameters
    ----------
    xyz: (N,3) ndarray
        The x,y,z coordinates of the N points in the point cloud.

    nodes: (M,2,3) ndarray
         The x,y,z minimum and maximum coordiantes of the nodes to loop over.

    index: list
         A list of lenght N indicating to wich of the previous nodes each point 
         belongs.

    u_index: list
         An ordered list containing all the unique index values from the previous
         split.

    Returns
    -------
    non_empty : list
        A sorted list indicating wich of the input nodes are not empty.

    """

    non_empty = []

    #: loop over each of the unique index values
    for i in range(len(u_index)):

        x = []
        y = []
        z = []

        #: loop over the xyz to keep only the points that belongs only to the
        # ith node.
        for j in range(len(xyz)):

            if index[j] == u_index[i]:

                x.append(xyz[j,0])
                y.append(xyz[j,1])
                z.append(xyz[j,2])

            else:

                pass

        #: get the corresponding nodes of the unique index value
        new_nodes = nodes[i * 8 : (i + 1) * 8]

        new_idx = []

        #: loop over all the coordinates
        for k in range(len(x)):

            n = 0

            #: loop over the filtered nodes
            for l in range(len(new_nodes)):

                #: check the min/max values to find to wich node the point belong
                if x[k] >= new_nodes[l,0,0] and y[k] >= new_nodes[l,0,1] and z[k] >= new_nodes[l,0,2]:

                    if x[k] <= new_nodes[l,1,0] and y[k] <= new_nodes[l,1,1] and z[k] <= new_nodes[l,1,2]:

                        new_idx.append(n+1)

                        #: check if the index is already on the list
                        if n + (8 * i) not in non_empty:

                            non_empty.append(n + 8 * i)

                        else:

                            pass

                        break

                    else:
                        n += 1
                else:
                    n += 1

        nm = 0

        #: loop over the index to update the values
        for m in range(len(index)):

            if index[m] == u_index[i]:

                index[m] = index[m] * (10 ** (int(log10(new_idx[nm])) + 1)) + new_idx[nm]

                nm +=1

            else:

                pass

    return sorted(non_empty)

After this function is done, the index is now updated and we use the returned nodes index list to filter the nodes again, split the filtered ones and start again:

    n=0
    #: there will be implemented more conditions to stop the split process
    while n < 4:

        #: check the function's docstring
        new_nodes = loopy_loop(xyz,nodes,index, u_index)

        u_index = np.unique(index)

        nodes = nodes[new_nodes]

        nodes = np.concatenate([split(i[0],i[1]) for i in nodes]) 

        n += 1

At the end of the loop, I get the desired index wich indicates to wich of the nodes each point belongs, having information of every level of subdivision:

octree.index

[11111, 11177, 11777, 17177, 17777, 71177, 71777, 77177, 77777]

As I said, it works "fast" with small point clouds but it scales pretty bad.

So I would like to know every possible way of optimization and also if there is something wrong about my current approach.

Thank you.

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers. — 301_Moved_Permanently, Commented Apr 26, 2016 at 16:07
Thanks for the advice Mathias. In a case like this, where the answer is my own anser, what sould I have done instead of update the question?. I mean, instead of answering my own question what I should have done was update the question with the new implementation I worked on? — David de la Iglesia, Commented Apr 26, 2016 at 17:01

David de la Iglesia · Accepted Answer · 2016-04-25 16:56:19Z

I have twisted the problem a little and find a way of optimize it at the cost of only get the information of one level of subdivision.

Whith this I mean that the bellow method obtains the same result as the method posted on the question (it assings 1 unique index to all the points that are inside a node) but instead of having a concatenation of the indexes obtained along all the subdivisions untill the last one, the indexes are only based on one unique level (what would be the last level in the previous method) of subdivision.

So, starting again from a point cloud, I get the minimum bounding box as before:

xyzmin = np.min(xyz,axis=0)
xyzmax = np.max(xyz,axis=0)

But now I also instanciate a variable n wich represent the subdivision level where I want to locate the points. In the above question this variable correspond to the X in the line first line of the while loop: while n < X

n = 4

And I find the number of segments in wich each of the coordinate axis would be subdivided at that level:

n = (2 ** n) + 1

After that I use that number of segments to create 3 linspances (1 for each axis):

x = np.linspace(xyzmin[0], xyzmax[0], n)
y = np.linspace(xyzmin[1], xyzmax[1], n)
z = np.linspace(xyzmin[2], xyzmax[2], n)

I create an empty container for the future indexes:

idx = np.zeros_like(xyz,dtype=int)

And use this function to fill the container:

@jit(nopython=True)
def fill_idx(xyz, x, y, z, empty_index):

    for i in range(len(xyz)):

        for j in range(len(x)):

            if xyz[i,0] >= x[j] and xyz[i,0] <= x[j+1]:

                for k in range(len(y)):

                    if xyz[i,1] >= y[k] and xyz[i,1] <= y[k+1]:

                        for l in range(len(z)):

                            if xyz[i,2] >= z[l] and xyz[i,2] <= z[l+1]:

                                empty_index[i,0] = j+1
                                empty_index[i,1] = k+1
                                empty_index[i,2] = l+1

                                break
                        break
                break

What the function does is pretty intuitive, it loops over all the points in the point cloud, looking for the segment of the x axis where the point lies; once that segement is founded, start a loop again over the 'y' axis, and repeat after that with the 'z' axis.

Afther that function we have an (N,3) array where each column indicates the segment of the corresponding axis where the point lies. By concatenating the columns with this function:

def merge_idx(idx):

    a = idx[:,0] * (10 ** (np.log10(idx[:,1]).astype(int) + 1)) + idx[:,1]

    b = a * (10 ** (np.log10(idx[:,2]).astype(int) + 1)) + idx[:,2]

    return b

We obtain an (N,) array with unique indixes values that allow us to agroup the points that lies on the same node.

TIMINGS

On a 3.Million point cloud and a subdivision level of 4:

%%timeit
build(xyz)
1 loop, best of 3: 26.4 s per loop

%%timeit
build2(xyz, 4)
1 loop, best of 3: 477 ms per loop

The improvements on the timing are huge, and as the use that I was giving to the indexes was not influenced by having the information of past subdivision, the new method I tried here serves as well.

Here are the resoults for the same point cloud used in the question:

index
Out[6]: [11111, 11177, 11777, 17177, 17777, 71177, 71777, 77177, 77777]

index2
Out[7]: 
array([   111,    444,    888, 121212, 161616, 202020, 242424, 282828, 323232])

I could even go further and say that the indexes of the new method are a little more intuitive about where each point lies.

Stack Exchange Network

Python Octree Implementation

PART 1

SPLIT FUNCTION

GET INDEX FUNCTION

PART 2

LOOPY LOOP FUNCTION

1 Answer 1

TIMINGS

Hot Network Questions

Python Octree Implementation

PART 1

SPLIT FUNCTION

GET INDEX FUNCTION

PART 2

LOOPY LOOP FUNCTION

1 Answer 1

TIMINGS

Related

Hot Network Questions