I'm working with 3D point clouds stored in NumPy arrays. I've successfully used SciPy's KDTree implementation for tasks like k-neighbor searches and outlier filtering.
However, I wanted to try octrees as an alternative data structure for other tasks, like downsampling.
I was looking for an octree implementation in Python but didn't find what I was looking for, so I tried to implement a solution myself.
The solution that I've written so far works, but it scales pretty badly when it has to manage "large" point clouds (3+ million points) and when the number of divisions goes up.
The first thing I used the octree for was to replace the points contained in each node with their centroid, in order to downsample the density of the point cloud.
I thought that the fastest way to achieve that was to assign an index to each point indicating which node it belongs to.
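For reference, once every point carries an integer node index, the centroid replacement itself needs no explicit loops. A minimal sketch (the helper name `centroids` is my own, not part of the code below; it assumes `index` is an integer label per point):

```python
import numpy as np

def centroids(xyz, index):
    """Replace the points sharing a node index by their centroid.

    Hypothetical helper: `index` is assumed to be an integer array of
    node labels, one entry per point.
    """
    u, inv = np.unique(index, return_inverse=True)
    sums = np.zeros((len(u), 3))
    np.add.at(sums, inv, xyz)            # scatter-add each point into its node
    counts = np.bincount(inv)
    return sums / counts[:, None]        # one centroid per non-empty node
```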
PART 1
Starting from a point cloud (I'll use a really small one for simplicity) stored in a NumPy array:
xyz
array([[ 0. , 0. , 0. ],
[ 0.125, 0.125, 0.125],
[ 0.25 , 0.25 , 0.25 ],
[ 0.375, 0.375, 0.375],
[ 0.5 , 0.5 , 0.5 ],
[ 0.625, 0.625, 0.625],
[ 0.75 , 0.75 , 0.75 ],
[ 0.875, 0.875, 0.875],
[ 1. , 1. , 1. ]])
Where each row represents one point and each column the corresponding x, y and z coordinates.
I get the minimum bounding box of the point cloud as follows:
xyzmin = np.min(xyz,axis=0)
xyzmax = np.max(xyz,axis=0)
And use it to obtain the first 8 nodes of the octree, using the following function:
SPLIT FUNCTION
def split(xyzmin, xyzmax):
    """ Splits the node defined by xyzmin and xyzmax into 8 sub-nodes

    Parameters
    ----------
    xyzmin: (3,) ndarray
        The x,y,z minimum coordinates that delimit the node
    xyzmax: (3,) ndarray
        The x,y,z maximum coordinates that delimit the node

    Returns
    -------
    nodes : (8,2,3) ndarray
        The x,y,z minimum and maximum coordinates of the sub-nodes, grouped
        along the second dimension
    """
    xyzmed = (xyzmax + xyzmin) / 2
    nodes = np.array([
        [[xyzmin[0], xyzmin[1], xyzmin[2]], [xyzmed[0], xyzmed[1], xyzmed[2]]],
        [[xyzmin[0], xyzmed[1], xyzmin[2]], [xyzmed[0], xyzmax[1], xyzmed[2]]],
        [[xyzmed[0], xyzmed[1], xyzmin[2]], [xyzmax[0], xyzmax[1], xyzmed[2]]],
        [[xyzmed[0], xyzmin[1], xyzmin[2]], [xyzmax[0], xyzmed[1], xyzmed[2]]],
        [[xyzmin[0], xyzmin[1], xyzmed[2]], [xyzmed[0], xyzmed[1], xyzmax[2]]],
        [[xyzmin[0], xyzmed[1], xyzmed[2]], [xyzmed[0], xyzmax[1], xyzmax[2]]],
        [[xyzmed[0], xyzmed[1], xyzmed[2]], [xyzmax[0], xyzmax[1], xyzmax[2]]],
        [[xyzmed[0], xyzmin[1], xyzmed[2]], [xyzmax[0], xyzmed[1], xyzmax[2]]],
    ])
    return nodes
Which gives us as a result:
nodes
array([[[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5]],
[[ 0. , 0.5, 0. ],
[ 0.5, 1. , 0.5]],
[[ 0.5, 0.5, 0. ],
[ 1. , 1. , 0.5]],
[[ 0.5, 0. , 0. ],
[ 1. , 0.5, 0.5]],
[[ 0. , 0. , 0.5],
[ 0.5, 0.5, 1. ]],
[[ 0. , 0.5, 0.5],
[ 0.5, 1. , 1. ]],
[[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]],
[[ 0.5, 0. , 0.5],
[ 1. , 0.5, 1. ]]])
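As a side note, the eight children can also be generated with broadcasting instead of writing out all 24 corner combinations by hand. A sketch (note the children come out in binary order, which differs from the hand-written ordering above, so downstream indices would differ):

```python
import numpy as np

def split_vec(xyzmin, xyzmax):
    """Split one node into 8 sub-nodes via broadcasting.

    Same bounds as the hand-written `split`, but the children are
    emitted in binary order rather than the order above.
    """
    half = (np.asarray(xyzmax) - np.asarray(xyzmin)) / 2
    # all 8 on/off combinations per axis: (8, 3)
    offsets = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
    mins = np.asarray(xyzmin) + offsets * half       # (8, 3) child minima
    return np.stack([mins, mins + half], axis=1)     # (8, 2, 3)
```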
Then I use those nodes to assign an index to each point according to which node it falls inside, using this function:
GET INDEX FUNCTION
from numba import jit

@jit(nopython=True)
def get_index(xyz, nodes):
    """ Get the index values indicating to which of the first 8 nodes each point belongs
    """
    index = []
    for i in range(len(xyz)):
        n = 0
        for j in range(len(nodes)):
            if xyz[i,0] >= nodes[j,0,0] and xyz[i,1] >= nodes[j,0,1] and xyz[i,2] >= nodes[j,0,2]:
                if xyz[i,0] <= nodes[j,1,0] and xyz[i,1] <= nodes[j,1,1] and xyz[i,2] <= nodes[j,1,2]:
                    index.append(n+1)
                    break
                else:
                    n += 1
            else:
                n += 1
    return index
Applying this function to the point cloud, we obtain the following indices:
index
[1, 1, 1, 1, 1, 7, 7, 7, 7]
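For this first level, the point-against-box scan can be avoided entirely: comparing each point against the midpoint of the bounding box yields its octant directly as a 3-bit code, fully vectorized. A sketch (the 0-7 codes here don't match the 1-8 labels above, since the hand-written node ordering differs; also, a boundary point like 0.5 lands in the upper octant here, whereas the overlapping closed boxes above assign it to the first matching node):

```python
import numpy as np

# the sample point cloud from above
xyz = np.linspace([0., 0., 0.], [1., 1., 1.], 9)

xyzmin = xyz.min(axis=0)
xyzmax = xyz.max(axis=0)
xyzmed = (xyzmin + xyzmax) / 2
# bit 0 from the x half, bit 1 from y, bit 2 from z
octant = ((xyz >= xyzmed) * np.array([1, 2, 4])).sum(axis=1)
# octant -> [0 0 0 0 7 7 7 7 7]
```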
And I use those indices to filter out the nodes that are empty, in order to discard them:
u_index = np.unique(index)
u_index
array([1, 7])
#: filter the empty nodes
nodes = nodes[u_index-1]
nodes
array([[[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5]],
[[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]]])
After filtering the empty nodes, I split the remaining nodes using the above-mentioned function:
nodes = np.concatenate([split(i[0],i[1]) for i in nodes])
Which results in an array of shape (M,2,3), where M is the number of remaining nodes * 8:
nodes.shape
(16, 2, 3)
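The list-comprehension-plus-concatenate step can likewise be done in one broadcasted call over all remaining nodes at once, avoiding the creation of M small arrays per round. A sketch (again with binary child ordering, not the hand-written one):

```python
import numpy as np

def split_all(nodes):
    """Split every node in an (M, 2, 3) array in one call -> (M*8, 2, 3).

    Sketch: children are emitted per parent in binary order, which
    differs from the hand-written `split` above.
    """
    mins = nodes[:, 0]                                   # (M, 3)
    half = (nodes[:, 1] - mins) / 2                      # (M, 3)
    offsets = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
    cmins = mins[:, None, :] + offsets[None] * half[:, None, :]   # (M, 8, 3)
    cmaxs = cmins + half[:, None, :]
    return np.stack([cmins, cmaxs], axis=2).reshape(-1, 2, 3)
```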
PART 2
At this point, after the first round of splitting, I had to change the approach, as the get_index function wouldn't work for my purpose.
What I did is start a while loop, limited to N iterations.
The loop is mainly based on the following function, which is long and ugly.
What it basically does is:

1. Loop over all the values in unique_index, getting the xyz coordinates of the points that belong to that index value and also the nodes that come from a division of that index value.
2. Loop over each of those xyz coordinates, checking which of the new nodes it belongs to.
3. Once a point is found inside a node, append the associated point index to one list and the associated node index to another list.
4. Loop over the original index list, concatenating the original values with the newly obtained point index values wherever the original index list matches the unique_index value of the first loop.
5. Start again with a new unique_index value.

At the end of the loop, return the associated node index list.
LOOPY LOOP FUNCTION
from math import log10

from numba import jit

@jit(nopython=True)
def loopy_loop(xyz, nodes, index, u_index):
    """ Loop over the input nodes checking which points belong to each.

    The loop will also update the values of the index list by concatenating
    the old values with the new values found during this loop.
    As a result, a point that belonged to node '1' before the loop, and is
    found to belong to node '6' in this loop, will be updated to have an
    index of '16'.

    Parameters
    ----------
    xyz: (N,3) ndarray
        The x,y,z coordinates of the N points in the point cloud.
    nodes: (M,2,3) ndarray
        The x,y,z minimum and maximum coordinates of the nodes to loop over.
    index: list
        A list of length N indicating to which of the previous nodes each
        point belongs.
    u_index: list
        An ordered list containing all the unique index values from the
        previous split.

    Returns
    -------
    non_empty : list
        A sorted list indicating which of the input nodes are not empty.
    """
    non_empty = []
    #: loop over each of the unique index values
    for i in range(len(u_index)):
        x = []
        y = []
        z = []
        #: keep only the points that belong to the i-th node
        for j in range(len(xyz)):
            if index[j] == u_index[i]:
                x.append(xyz[j,0])
                y.append(xyz[j,1])
                z.append(xyz[j,2])
        #: get the corresponding nodes of the unique index value
        new_nodes = nodes[i * 8 : (i + 1) * 8]
        new_idx = []
        #: loop over all the coordinates
        for k in range(len(x)):
            n = 0
            #: loop over the filtered nodes
            for l in range(len(new_nodes)):
                #: check the min/max values to find which node the point belongs to
                if x[k] >= new_nodes[l,0,0] and y[k] >= new_nodes[l,0,1] and z[k] >= new_nodes[l,0,2]:
                    if x[k] <= new_nodes[l,1,0] and y[k] <= new_nodes[l,1,1] and z[k] <= new_nodes[l,1,2]:
                        new_idx.append(n+1)
                        #: record the node if it is not already on the list
                        if n + 8 * i not in non_empty:
                            non_empty.append(n + 8 * i)
                        break
                    else:
                        n += 1
                else:
                    n += 1
        nm = 0
        #: loop over the index to update the values
        for m in range(len(index)):
            if index[m] == u_index[i]:
                index[m] = index[m] * (10 ** (int(log10(new_idx[nm])) + 1)) + new_idx[nm]
                nm += 1
    return sorted(non_empty)
After this function is done, the index is updated, and we use the returned node index list to filter the nodes again, split the filtered ones and start over:
n = 0
#: more conditions to stop the split process will be implemented
while n < 4:
    #: check the function's docstring
    new_nodes = loopy_loop(xyz, nodes, index, u_index)
    u_index = np.unique(index)
    nodes = nodes[new_nodes]
    nodes = np.concatenate([split(i[0], i[1]) for i in nodes])
    n += 1
At the end of the loop, I get the desired index, which indicates which node each point belongs to, carrying information from every level of subdivision:
octree.index
[11111, 11177, 11777, 17177, 17777, 71177, 71777, 77177, 77777]
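For what it's worth, the whole iterative indexing can be collapsed into one vectorized pass: scale the points to integer grid coordinates at the deepest level, then read off one octant digit per level. A sketch under my own conventions (digits are 0-7 per level packed in base 8, not the 1-8 decimal concatenation above, and boundary handling differs slightly from the overlapping boxes):

```python
import numpy as np

def octree_codes(xyz, levels):
    """One code per point encoding its node at every subdivision level.

    Sketch: each level contributes a 0-7 octant digit, packed in base 8,
    so np.unique(codes) lists the non-empty leaves directly. Assumes a
    non-degenerate bounding box (no constant coordinate).
    """
    xyzmin = xyz.min(axis=0)
    xyzmax = xyz.max(axis=0)
    # integer grid coordinates at the deepest level, clipping the far
    # boundary points into the last cell
    scaled = (xyz - xyzmin) / (xyzmax - xyzmin)
    cells = np.minimum((scaled * 2**levels).astype(np.int64), 2**levels - 1)
    codes = np.zeros(len(xyz), dtype=np.int64)
    for level in range(levels - 1, -1, -1):
        bits = (cells >> level) & 1                      # (N, 3): one bit per axis
        codes = codes * 8 + (bits[:, 0] | (bits[:, 1] << 1) | (bits[:, 2] << 2))
    return codes
```

With this layout, the per-node centroids fall out of `np.unique(codes, return_inverse=True)` plus a scatter-add, with no Python-level loop over nodes.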
As I said, it works "fast" with small point clouds, but it scales pretty badly.
So I would like to know every possible way to optimize it, and also whether there is something wrong with my current approach.
Thank you.