I am currently following along with Andrew Ng's Machine Learning course on Coursera and wanted to implement the gradient descent algorithm in Python 3 using numpy and pandas.

This is what I came up with:
import os
import numpy as np
import pandas as pd

def get_training_data(path):  # path to read data from
    raw_panda_data = pd.read_csv(path)

    # append a column of ones to the front of the data set
    raw_panda_data.insert(0, 'Ones', 1)
    num_columns = raw_panda_data.shape[1]                       # (num_rows, num_columns)

    panda_X = raw_panda_data.iloc[:,0:num_columns-1]            # [ slice_of_rows, slice_of_columns ]
    panda_y = raw_panda_data.iloc[:,num_columns-1:num_columns]  # [ slice_of_rows, slice_of_columns ]

    X = np.matrix(panda_X.values)  # pandas.DataFrame -> numpy.ndarray -> numpy.matrix
    y = np.matrix(panda_y.values)  # pandas.DataFrame -> numpy.ndarray -> numpy.matrix

    return X, y

def compute_mean_square_error(X, y, theta):
    summands = np.power(X * theta.T - y, 2)
    return np.sum(summands) / (2 * len(X))

def gradient_descent(X, y, learning_rate, num_iterations):
    num_parameters = X.shape[1]                                 # dim theta
    theta = np.matrix([0.0 for i in range(num_parameters)])     # init theta
    cost = [0.0 for i in range(num_iterations)]

    for it in range(num_iterations):
        # repeat the residual column so it can be multiplied element-wise with each column of X
        error = np.repeat((X * theta.T) - y, num_parameters, axis=1)
        error_derivative = np.sum(np.multiply(error, X), axis=0)
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = compute_mean_square_error(X, y, theta)

    return theta, cost
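The loop is intended to implement the standard batch update for linear regression: with $m$ training examples, learning rate $\alpha$, and $\theta$ stored as a row vector, the cost and the update computed above are

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( x^{(i)} \theta^T - y^{(i)} \right)^2, \qquad \theta := \theta - \frac{\alpha}{m} \left( X \theta^T - y \right)^T X.$$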
This is how one could use the code:
X, y = get_training_data(os.getcwd() + '/data/data_set.csv')
theta, cost = gradient_descent(X, y, 0.008, 10000)
print('Theta: ', theta)
print('Cost: ', cost[-1])
Where data/data_set.csv could contain data (model used: 2 + x1 - x2 = y) looking like this:
x1, x2, y
0, 1, 1
1, 1, 2
1, 0, 3
0, 0, 2
2, 4, 0
4, 2, 4
6, 0, 8
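(As a quick check, the last row fits the model: $2 + 1 \cdot 6 - 1 \cdot 0 = 8$.)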
Output:
Theta: [[ 2. 1. -1.]]
Cost: 9.13586056551e-26
I'd especially like to get the following aspects of my code reviewed:
- Overall python style. I'm relatively new to python, coming from a C background, and not sure if I'm misunderstanding some concepts here.
- numpy/pandas integration. Do I use these packages correctly?
- Correctness of the gradient descent algorithm.
- Efficiency. How can I further improve my code?
Comment: Consider using np.zeros to initialize theta and cost in your gradient descent function; in my opinion it is clearer. Also, why uppercase X and lowercase y? I would make them consistent and perhaps even give them descriptive names, e.g. input and output. Finally, you could look into exception handling, e.g. for bad input data from pandas or invalid values for learning_rate or num_iterations.
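A minimal sketch of how those suggestions could be folded in, keeping the loop body unchanged (the validation checks and their messages are illustrative, not part of the original code):

def gradient_descent(X, y, learning_rate, num_iterations):
    # illustrative input validation, as the comment suggests
    if learning_rate <= 0:
        raise ValueError('learning_rate must be positive')
    if num_iterations <= 0:
        raise ValueError('num_iterations must be a positive integer')

    num_parameters = X.shape[1]
    theta = np.matrix(np.zeros(num_parameters))  # row vector of zeros, shape (1, num_parameters)
    cost = np.zeros(num_iterations)              # per-iteration cost, also via np.zeros

    for it in range(num_iterations):
        error = np.repeat((X * theta.T) - y, num_parameters, axis=1)
        error_derivative = np.sum(np.multiply(error, X), axis=0)
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = compute_mean_square_error(X, y, theta)

    return theta, cost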
Comment: There is also theta = np.zeros_like(X) if you would like to initialize theta with an array of zeros with the dimensions of X.
Reply: theta doesn't have the same dimensions as X, though. Regardless, I'll keep the np.zeros_like(...) function in the back of my head.
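To illustrate the shape mismatch from that reply (using the example data set above, so the numbers are specific to it):

X, y = get_training_data(os.getcwd() + '/data/data_set.csv')

print(X.shape)                           # (7, 3): one row per example, one column per parameter
print(np.zeros_like(X).shape)            # (7, 3): matches X, not theta
print(np.zeros((1, X.shape[1])).shape)   # (1, 3): the shape theta actually needs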