Python Kohonen algorithm

Question

I've tried to implement the Kohonen algorithm using Python and I've managed to do this, but my result is so slow. I want to know if anyone knows how I can improve it.

Objective:

I have to read a file with X and Y position of 3000 points
I generate N number of neutrons and arrange them uniformly over the whole chart (over the whole points)
I check the distance between each point and each neutron, and when I find the minimal distance between one point and a neutron, I move that neutron and his neighborhoods with an alpha value to that point.

I need to mention I'm still quite a beginner using Python.

-UPDATE-

Code:

from chart import save_image, draw_centroids, draw_mesh_between_centroids, chart, draw_centroid
import math
from copy import deepcopy

neurons = []
x = []
y = []
N = 15
t = 0
_alfa = 0

neuron = dict(x=0, y=0, row=0, col=0)


def alfa(t):
    return 0.7 * math.pow(math.e, -t / N)


def neighborhood(t):
    return int(7 * math.pow(math.e, -t / N) + 1)


def divisors(n):
    sqrt = int(math.sqrt(n))
    for num in range(sqrt, 2, -1):
        if n % num == 0:
            return num, int(n / num)
    return 0


def generate_neurons(x_range, y_range, size):
    div = divisors(size)
    if div == 0:
        sqrt = int(math.sqrt(size))
        size = sqrt * (sqrt + 1)
        row, col = divisors(size)
    else:
        row, col = divisors(size)

    step_x = x_range / (row + 1)
    step_y = y_range / (col + 1)

    for m in range(size):
        neuron["row"] = row
        neuron["col"] = col
        neurons.append(deepcopy(neuron))

    step_y_tmp = step_y
    for i in range(row):
        for j in range(col):
            if j == 0:
                neurons[i * col + j]["x"] = step_x
            else:
                neurons[i * col + j]["x"] = neurons[i * col + (j - 1)]["x"] + step_x
            neurons[i * col + j]["y"] = step_y_tmp

        step_y_tmp += step_y


def read_file(name):
    lines = [line.rstrip('\n') for line in open('../generate_file/' + name)]

    global x
    global y
    zone = []

    for index in range(len(lines)):
        x.append(int(lines[index].split()[0]))
        y.append(int(lines[index].split()[1]))
        zone.append(int(lines[index].split()[2]))


def run():
    global t
    global _alfa

    distances = []
    for j in range(len(x)):
        for c in neurons:
            dist = distance(x[j], y[j], int(c["x"]), int(c["y"]))
            distances.append(dist)

        minim = min(float(s) for s in distances)
        index = distances.index(minim)

        _alfa = alfa(t)
        move_neurons(x[j], y[j], neurons[index], _alfa)

        distances.clear()
    t += 1


def draw_result():
    chart(x, y)
    draw_mesh_between_centroids(neurons)

    draw_centroids(neurons, "black")

    save_image()


def detect_neighborhood(neuron, neighborhood):
    global neurons
    neighborhoods = []
    col = neurons[0]["col"]
    row = neurons[0]["row"]
    for i in range(row):
        for j in range(col):
            if neuron["x"] == neurons[i * col + j]["x"] and neuron["y"] == neurons[i * col + j]["y"]:

                check = -neighborhood + 1
                check_2 = neighborhood / 2
                if i - neighborhood >= check and j - neighborhood >= check:
                    for _i in range(i - neighborhood, i + 1):
                        for _j in range(j - neighborhood, j + 1):
                            if _i >= 0 and _j >= 0:
                                neighborhoods.append(neurons[_i * col + _j])

                if i - neighborhood >= check and j + neighborhood <= col + check_2:
                    for _i in range(i - neighborhood, i + 1):
                        for _j in range(j, (j + neighborhood) + 1):
                            if _i >= 0 and _j < col:
                                neighborhoods.append(neurons[_i * col + _j])

                if i + neighborhood <= row + check_2 and j - neighborhood >= check:
                    for _i in range(i, (i + neighborhood) + 1):
                        for _j in range(j - neighborhood, j + 1):
                            if _i < row and _j >= 0:
                                neighborhoods.append(neurons[_i * col + _j])

                if i + neighborhood <= row + check_2 and j + neighborhood <= col + check_2:
                    for _i in range(i, (i + neighborhood) + 1):
                        for _j in range(j, (j + neighborhood) + 1):
                            if _i < row and _j < col:
                                neighborhoods.append(neurons[_i * col + _j])

                # draw_centroids(uniqe_dic_list(neighborhoods), "yellow")
                # draw_centroid(neuron, "red")
                return uniqe_dic_list(neighborhoods)


def distance(x1, y1, x2, y2):
    return math.sqrt((x1 - x2) ** 2 + ((y1 - y2) ** 2))


def move_neurons(x, y, neuron, alfa):
    global neurons
    neighborhoods = detect_neighborhood(neuron, neighborhood(t))
    for neuron in neurons:
        for n in neighborhoods:
            if neuron == n:
                neuron["x"] = x + alfa * (x - neuron["x"])
                neuron["y"] = y + alfa * (y - neuron["y"])


def uniqe_dic_list(seq):
    new_d = []
    for x in seq:
        if x not in new_d:
            new_d.append(x)
    return new_d


generate_neurons(300, 300, 100)
read_file("input.txt")

run()
print(_alfa)
while _alfa > 0.2:
    run()
    print(_alfa)

draw_result()

Input file:

120 52 3

200 10 1

133 200 2

...so on...

20 200 1

-UPDATE- Output image:

Can you provide a small sample of the input file? (BTW you probably meant "neuron" instead of "neutron") — ChatterOne, Commented Apr 10, 2017 at 6:40
@ChatterOne I've update my question with how it looks a input file, and yes there was suppose to be "neutron", my mistake — Vildnex, Commented Apr 10, 2017 at 18:23

ChatterOne · Accepted Answer · 2017-04-10 19:59:33Z

First of all, what I think are two bugs.

I've generated a bunch of lines like this (redirecting the output to a file):

import random

for i in range(0, 3000):
    print str(random.randint(0, 300)) + ' ' + str(random.randint(-200, 600)) + ' 2'

And what happens is that I get:

neighborhoods.append(neutrons[_i * col + _j])
IndexError: list index out of range

This may be due to how I generated the file, but it's still something that is not being accounted for, which stops all the calculation that was maybe running for a long time. You should detect something like this and either skip the line silently, warn the user or ask the user what to do.

Also, the return uniqe_dic_list(neighborhoods) is indented to be part of the if, but it really looks like it should be out of the if and out of both the for loops.

Then the rest:

zone is not used. You declare it, assign it, but you never use it.
the line row, col = divisors(size) is in both branches of the if, so it should be out of it.

Like:

if div == 0:
    sqrt = int(math.sqrt(size))
    size = sqrt * (sqrt + 1)
row, col = divisors(size)

The function divisors can return either one value (an int) or two values (a list) and you're not accounting for when it returns only one value. You should either check for that or always return a list.
You don't need the neutron variable with the deepcopy, you can simply append it.

Like:

for m in range(size):
    neutrons.append(
        {'row': row, 'col': col, 'x': neutron['x'], 'y': neutron['y']})

You're not closing the input file after you open it.

You can use the with keyword and also combine the two loop:

def read_file(name):
    global x, y
    with open(name) as input_file:
        for line in input_file:
            line_values = line.rstrip('\n').split()
            x.append(int(line_values[0]))
            y.append(int(line_values[1]))

Naming matters, having i, _i, j, _j, check, check_2 and so on make it hard to read the code.

Thx for your answer and I will try to fallow each of your steps to make my code better :D. But I have to mention 2 things... 1. yes, I'm not using zone because in this algorithm I dont need that but I other yes, because I'm using the same input file and 2. I've update my code, now that error shouldn't exist so you can check again if you want :D. Thx again — Vildnex, Commented Apr 10, 2017 at 22:26

braunmagrin · Accepted Answer · 2017-04-10 18:05:35Z

I'll just give you a few tips now, that you can start working on. But later, with more time, I'll give a more thorough look at your code and improve this answer.

It would be nice if could share the chart as well, so we can test your code without having to make changes to it.
Try to not use the same name with different meanings. In your code neighborhood can mean both a function or a number, depending on the context you are. Inside the detect_neighborhood function it means the variable that was received via argument and elsewhere it means the neighborhood function.
You are using a one-dimensional list to store an information that's two-dimensional. Your code could be simpler if you were using a list of lists or a numpy matrix. This way you're having to do the _i * col + _j trick every time when you could be doing something like neighborhood[i][j] or neighborhood[i,j] (with numpy).
Depending on your access pattern, e.g. if you always are going to access individual items and not iterate over a row or column, you could even use a dictionary with the keys being a tuple (x,y).
For each "neutron" both row and col are always the same. You could remove these variables from each neutron and store them in another place, where they wouldn't be repeated.

I know some of the stuff I mentioned are not directly related to the performance problem you're having, but I consider them to be good programming practices and worth mentioning.

I hope I could make myself clear, but feel free to ask for clarifications and/or more reasons for each point I've put above. And, of course, you can disagree with everything I said. ;)

Stack Exchange Network

Python Kohonen algorithm

2 Answers 2

Hot Network Questions

Python Kohonen algorithm

2 Answers 2

Related

Hot Network Questions