
I'm implementing a version of the mean shift image processing algorithm for color segmentation in Python/NumPy.

I've written a pure NumPy version of the actual mean shifting per pixel (which I imagine is where the majority of the time is spent). It slices an array of RGB values to work on out of the parent image, creates lower-bound and upper-bound RGB reference arrays, generates a boolean mask of the pixels to use for averaging, and then averages.

Any further optimizations? I suspect vectorizing the x/y for loops would give a speedup, but for the life of me I haven't figured out how. (Somehow generating an array of each pixel's grid to work on and then generalizing the mean shift to take array input? A rough sketch of that idea is included after the code below.) gL is the grid length; gS is gL squared, the number of pixels in the grid.

for itr in xrange(itrs):
    if itr != 0:
        img = imgNew
    for x in xrange(gL, height - gL):
        for y in xrange(gL, width - gL):
            # Neighbourhood window centred on (x, y).
            cGrid = img[x-gSmp:(x+gSmp+1), y-gSmp:(y+gSmp+1)]
            # Per-channel tolerance bounds around the centre pixel.
            cLow, cUp = np.empty((gL, gL, 3)), np.empty((gL, gL, 3))
            cLow[:] = [img[x,y][0]-tol, img[x,y][1]-tol, img[x,y][2]-tol]
            cUp[:] = [img[x,y][0]+tol, img[x,y][1]+tol, img[x,y][2]+tol]
            # Mask of neighbours within tolerance, then the masked mean.
            cBool = np.any((cLow < cGrid) & (cUp > cGrid), axis=2)
            imgNew[x, y] = np.sum(cGrid[cBool], axis=0) / cBool.sum()
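
For what it's worth, here is a rough sketch of the "array of each pixel grid" idea, assuming NumPy >= 1.20 (for numpy.lib.stride_tricks.sliding_window_view), Python 3, and a float image of shape (height, width, 3). The function name and the gSmp-wide border it updates (rather than the gL border in the loops above) are my own choices, not from the original code, and the whole-window temporaries it builds can use a lot of memory for large windows:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mean_shift_pass(img, gSmp, tol):
    # One mean-shift pass over every pixel that has a full window around it.
    win = 2 * gSmp + 1
    # Shape (H - 2*gSmp, W - 2*gSmp, 3, win, win): a (win, win) view per
    # interior pixel and channel; no copy is made here.
    windows = sliding_window_view(img, (win, win), axis=(0, 1))
    # Centre pixel of every window, shaped to broadcast over the window axes.
    centre = img[gSmp:-gSmp, gSmp:-gSmp, :, None, None]
    # Same test as the loop version: any channel within +/- tol of the centre.
    mask = np.any((windows > centre - tol) & (windows < centre + tol),
                  axis=2, keepdims=True)
    # Masked mean over each window.
    sums = np.where(mask, windows, 0.0).sum(axis=(-2, -1))
    counts = mask.sum(axis=(-2, -1))
    out = img.copy()
    out[gSmp:-gSmp, gSmp:-gSmp] = sums / counts
    return out

Calling this function itrs times should reproduce the double loop for the interior pixels, at the cost of materializing a few (H, W, 3, win, win)-sized temporaries.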

1 Answer

The following code is a first shot and it is still not vectorized. The major points: the creation of cLow and cUp is hoisted out of the loops (don't create arrays inside loops; always preallocate memory), the tolerance bounds are each computed in a single operation (under the assumption that broadcasting is possible at this point), and the conditional copy of imgNew into img is removed. I also suspect that you do want the last iteration copied back into img; if so, remove the copy line before the loop and move the copy at the beginning of the loop body to its end.

# Loop bounds and window size computed once, outside the loops.
diff_height_gL = height - gL
diff_width_gL = width - gL
sum_gSmp_one = gSmp + 1

# Preallocate the bound arrays once instead of on every pixel.
cLow, cUp = np.empty((gL, gL, 3)), np.empty((gL, gL, 3))

imgNew = img.copy()

for itr in xrange(itrs):

    # Unconditional copy replaces the `if itr != 0` branch.
    img[:] = imgNew

    for x in xrange(gL, diff_height_gL):
        for y in xrange(gL, diff_width_gL):

            cGrid = img[x-gSmp:(x + sum_gSmp_one), y-gSmp:(y + sum_gSmp_one)]

            # Tolerance bounds in one broadcast operation each.
            cLow[:] = img[x, y, :] - tol
            cUp[:] = img[x, y, :] + tol

            cBool = np.any(((cLow < cGrid) & (cUp > cGrid)), axis=2)

            # Masked mean over the window.
            imgNew[x, y] = np.sum(cGrid[cBool], axis=0) / cBool.sum()
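
A further, hypothetical simplification along the same lines: if broadcasting is acceptable, the cLow/cUp arrays could be dropped entirely, because the (3,) bound vectors broadcast directly against cGrid's trailing channel axis. The inner loop body would then shrink to something like:

            cGrid = img[x-gSmp:(x + sum_gSmp_one), y-gSmp:(y + sum_gSmp_one)]
            centre = img[x, y]                     # (3,) centre-pixel vector
            # Same membership test, relying on broadcasting instead of cLow/cUp.
            cBool = np.any((cGrid > centre - tol) & (cGrid < centre + tol), axis=2)
            imgNew[x, y] = cGrid[cBool].sum(axis=0) / cBool.sum()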

This problem seems to be perfectly shaped for multiprocessing, which could be an alternative or an extension to vectorization; a rough sketch of the row-parallel idea follows below. If I have time I will try the vectorization...
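
To make that concrete, here is a hypothetical sketch in Python 3 syntax (the helper names, the row-wise split, and the process count are my choices, not from the original code). Within one iteration every output row depends only on the previous image, so rows can be computed independently:

from functools import partial
import multiprocessing as mp

import numpy as np


def shifted_row(x, img, gL, gSmp, tol):
    # One output row of the mean-shift update (hypothetical helper).
    width = img.shape[1]
    row = img[x].copy()                   # border pixels stay unchanged
    for y in range(gL, width - gL):
        cGrid = img[x - gSmp:x + gSmp + 1, y - gSmp:y + gSmp + 1]
        centre = img[x, y]
        cBool = np.any((cGrid > centre - tol) & (cGrid < centre + tol), axis=2)
        row[y] = cGrid[cBool].sum(axis=0) / cBool.sum()
    return row


def mean_shift_parallel(img, itrs, gL, gSmp, tol, processes=4):
    img = img.astype(float).copy()
    height = img.shape[0]
    with mp.Pool(processes) as pool:
        for _ in range(itrs):
            worker = partial(shifted_row, img=img, gL=gL, gSmp=gSmp, tol=tol)
            rows = pool.map(worker, range(gL, height - gL))
            img[gL:height - gL] = rows
    return img

Shipping the whole image to the workers on every call is wasteful, so shared memory or chunking several rows per task would be the next step, and on platforms using the spawn start method this has to run under an if __name__ == "__main__": guard.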

Kind regards
