
I just recently started learning ML; so far I've gone through the notes from Ng's Coursera course. While I have nothing against Octave, I'm trying to solve the exercises in Python. This is my first time implementing this kind of algorithm, though I have a mathematical background, so sorry for the somewhat messy code. My questions:
Is there a way to make it more readable, and where can I find datasets with solutions to test against? Also, is the conversion to float in the gradient descent main loop unavoidable?

import numpy as np


def scaling(X):
    """mean normalization"""
    l = []
    for k in range(X.shape[1]):
        x = X[:, k]

        def f(e):
            tmp0 = sum(x) / len(x)
            tmp1 = max(x) - min(x)
            return (e - tmp0) / tmp1

        m = list(map(float, (list(map(f, x[:, 0])))))
        l.append(m)
    return np.matrix(l).transpose()


l_cost_function = []  # lists to record data for debugging
l_iterations = []


def multivariate_g_d(A, y, alfa):
    """computes gradient descent"""
    X = np.c_[np.ones(len(A)), A]  # prepend a column of ones for the intercept term
    tmp = np.matrix([0, 0, 0], dtype=np.float64)
    theta = np.matrix([0, 0, 0], dtype=np.float64)
    cnt = 0
    m = len(X)
    ma = alfa * (1 / m)
    delta = J_m(X, y, theta)
    while delta > 0.00001:
        beg = J_m(X, y, theta)
        for j in range(X.shape[1]):
            tmp[:, j] = theta[:, j] - ma * float((((X * theta.transpose()) - y.transpose()).transpose()) * X[:, j])
        theta = tmp
        end = J_m(X, y, theta)
        l_cost_function.append(end)
        l_iterations.append(cnt)
        delta = abs(end - beg)
        cnt += 1
    return (theta, cnt)


def J_m(A, y, theta):
    """computes cost function """
    n = len(A)
    return (1 / (2 * n)) * float(sum([x * x for x in A * theta.transpose() - y.transpose()]))
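
For context, here is a minimal made-up call showing the shapes I use (the numbers are purely illustrative, not real data):

A = np.matrix([[1.0, 0.0, 1.0],     # leading column of ones for the intercept
               [1.0, 1.0, 2.0],
               [1.0, 2.0, 3.0]])
y = np.matrix([1.0, 3.0, 5.0])      # targets as a 1 x m row matrix
theta = np.matrix([0.0, 0.0, 0.0])
print(J_m(A, y, theta))             # 35 / 6, the cost at theta = 0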
  • I believe the Machine Learning course includes a whole bunch of datasets that you can try. Also, while it can be fun to implement the algorithms in Python, I do hope you will get them solved in Octave as well. Commented Jan 7, 2017 at 15:29
  • That's what I'm going to check tonight - Ng's exercises and data sets - and yes, Octave is an option; it lets me upload algorithms to the course website. Commented Jan 7, 2017 at 15:57
  • There are many questions on SO tagged machine-learning and referencing Coursera.
    – hpaulj
    Commented Jan 7, 2017 at 18:57
  • I doubt that float is needed anywhere in your code, especially if the inputs are already float. It is best used to convert a string to one number. I'd also recommend using np.array instead of np.matrix, though you need to be more vigilant about the number of dimensions (arrays may be 1d).
    – hpaulj
    Commented Jan 7, 2017 at 19:00
  • Yes, the conversion is redundant; it's been there since testing - it looked as if it was needed, but it works without it now. Why numpy.array instead of numpy.matrix - because of efficiency? Commented Jan 7, 2017 at 19:55

1 Answer


Without sample inputs I can't run your whole code. And I prefer not to guess.

The use of np.matrix suggests it was translated from MATLAB/Octave code. That array subclass, in numpy, is always 2d, which makes it behave more like MATLAB matrices, especially old versions. Transpose always has an effect; row and column indexing returns 2d matrices; and * is matrix multiplication (as opposed to element-wise, the .* of MATLAB).
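
A quick, made-up illustration of those differences (the names here are just for the example):

A = np.arange(6).reshape(2, 3)   # plain ndarray
M = np.matrix(A)                 # matrix subclass, always 2d

print(A[:, 0])        # 1d, shape (2,)
print(M[:, 0])        # 2d column, shape (2, 1)
print(A * A)          # elementwise product, MATLAB's .*
print(M * M.T)        # matrix product, MATLAB's *; M * M alone would raise a shape error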

I'll focus on the scaling function. I don't see it being used, but it's simple and typical of the other functions.

import numpy as np
<your code>

X = np.arange(12).reshape(3,4)
print(X)
Xm = np.matrix(X)
print(scaling(Xm))    

produces:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[-0.5 -0.5 -0.5 -0.5]
 [ 0.   0.   0.   0. ]
 [ 0.5  0.5  0.5  0.5]]

scaling doesn't work when X is an array, because x = X[:, k] would be 1d, which conflicts with the x[:, 0] use.
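
To see that concretely (using the plain X array defined above):

x = X[:, 0]
print(x.shape)        # (3,) - 1d, the column dimension is gone
# x[:, 0]             # would raise IndexError: too many indices for array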

But I can perform this same operation without any explicit iteration.

def ascaling(X, n=0):
    # array scaling
    ctr = X.mean(axis=n, keepdims=True)
    rge = X.max(axis=n, keepdims=True)-X.min(axis=n, keepdims=True)
    return (X - ctr)/rge
print(ascaling(X,0))
print(ascaling(X,1))

producing

[[-0.5 -0.5 -0.5 -0.5]
 [ 0.   0.   0.   0. ]
 [ 0.5  0.5  0.5  0.5]]
[[-0.5        -0.16666667  0.16666667  0.5       ]
 [-0.5        -0.16666667  0.16666667  0.5       ]
 [-0.5        -0.16666667  0.16666667  0.5       ]]

In this case, the equivalent code, assuming X is an np.matrix, is simpler:

def mscaling(X, n=0):
    # matrix scaling
    ctr = X.mean(axis=n)
    rge = X.max(axis=n)-X.min(axis=n)
    return (X - ctr)/rge

print(mscaling(Xm,0))
print(mscaling(Xm,1))

Maybe this example will make these operations clear:

Make a small 2d array:

In [861]: X=np.arange(6).reshape(2,3)
In [862]: X
Out[862]: 
array([[0, 1, 2],
       [3, 4, 5]])

Sum over the rows (axis=0), resulting in a 1d array of length 3 (one value per column):

In [863]: X.sum(axis=0)
Out[863]: array([3, 5, 7])

Sum over the columns (axis=1) - again a 1d result:

In [864]: X.sum(axis=1)
Out[864]: array([ 3, 12])

But if I add keepdims, the result is 2d, (2,1) shape:

In [866]: X.sum(axis=1,keepdims=True)
Out[866]: 
array([[ 3],
       [12]])

sum applied to a matrix does the same thing, staying 2d without needing keepdims.

In [867]: np.matrix(X).sum(axis=1)
Out[867]: 
matrix([[ 3],
        [12]])

The nice thing about keeping dims is that I can do math like

In [868]: X-X.sum(axis=1,keepdims=True)
Out[868]: 
array([[-3, -2, -1],
       [-9, -8, -7]])

Without keepdims I'd have to do X-X.sum(axis=1)[:,None].
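
which gives the same result:

In [869]: X - X.sum(axis=1)[:, None]
Out[869]: 
array([[-3, -2, -1],
       [-9, -8, -7]])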

Your frequent use of transpose suggests that the mix of dimensions hasn't been fully thought out. MATLAB code does use x' or x.' quite a bit. On the other hand, beginner numpy coders try to apply transpose to 1d arrays and wonder why nothing happens.
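
For example:

In [870]: v = np.arange(3)
In [871]: v.T.shape              # .T on a 1d array is a no-op
Out[871]: (3,)
In [872]: np.matrix(v).T.shape   # as a matrix it becomes a column
Out[872]: (3, 1)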

  • Thanks for your work! I will double check the scaling function, but... I checked only the first row of your first example (which you claimed doesn't work), and it is correct! This code isn't translated from Octave; it's written from scratch (following Ng's lectures). Here is a sample of how this works (I made a Jupyter notebook about it on my blog): nbviewer.jupyter.org/github/lion137/blog/blob/master/…. Feature scaling here: en.wikipedia.org/wiki/Feature_scaling. Ah, the reason I changed from arrays to matrices - it saves me from writing loops. Commented Jan 8, 2017 at 0:16
