Now it's time to implement our first machine learning model: linear regression.
To do this we need several packages and libraries (a quick taste of all four follows this list):
1. NumPy : provides the N-dimensional array object.
2. Pandas : a Python data analysis library that includes structures such as data frames.
3. Matplotlib : a 2D plotting library that produces publication-quality figures.
4. Scikit-learn : a machine learning library with algorithms for data analysis and data mining tasks.
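As a quick taste of how these four libraries work together, here is a minimal sketch; the height/weight numbers are made up purely for illustration and are not part of this article's dataset.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

heights = np.array([1.5, 1.6, 1.7, 1.8])                    # NumPy: N-dimensional arrays
weights = np.array([50.0, 57.0, 65.0, 72.0])
df = pd.DataFrame({'height': heights, 'weight': weights})   # Pandas: data frame structure
model = linear_model.LinearRegression()                     # Scikit-learn: ML algorithms
model.fit(df[['height']], df['weight'])                     # fit weight as a linear function of height
df.plot.scatter(x='height', y='weight')                     # Matplotlib (via pandas) 2D plotting
plt.show()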
Anaconda is a Python distribution that bundles all of these libraries. It also ships with Jupyter Notebook (formerly IPython Notebook), in which we will write our machine learning code.
In the video given below I demonstrate, step by step, where to download Anaconda and how to launch a Jupyter notebook.
Now let's see how to implement the linear regression model.
We use a simple linear function for our regression model: y = w*x + b, where w is the slope (weight) and b is the intercept (bias) that the model learns from the training data.
Given below is the piece of code in which I import the various libraries:
numpy, abbreviated as np for simplicity in coding.
linear_model and datasets from the sklearn library; linear_model provides the linear regression model of machine learning.
matplotlib.pyplot, abbreviated as plt for readability, to display our dataset on a plot.
import numpy as np
from sklearn import linear_model, datasets, tree
import matplotlib.pyplot as plt
%matplotlib inline
Now I will prepare the data for the equation y = 0.5*x + sin(x) + ε, where ε is a noise term added to make the data look more realistic (see the note after Fig 1).
In the following piece of code I first generate 100 samples: np.linspace(-np.pi, np.pi, number_of_samples) produces evenly spaced values from negative to positive and stores them in x. I then compute y = 0.5*x + sin(x) and add randomness by calling np.random.random().
plt.scatter() displays the input dataset on the plot as scattered black dots, plt.xlabel and plt.ylabel label the axes as x-input features and y-target values, plt.title sets the title of the plot, and plt.show() displays the output.
number_of_samples = 100
x = np.linspace(-np.pi, np.pi, number_of_samples)
y = 0.5*x + np.sin(x) + np.random.random(x.shape)
plt.scatter(x, y, color='black')   #Plot y-vs-x in dots
plt.xlabel('x-input feature')
plt.ylabel('y-target values')
plt.title('Fig 1: Data for linear regression')
plt.show()
The output of this code is a scatter plot of the noisy data (Fig 1: Data for linear regression).
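A small aside: np.random.random() actually draws uniform noise in [0, 1) rather than Gaussian noise. If truly Gaussian noise is preferred for ε, np.random.normal could be used instead; the 0.3 standard deviation below is an arbitrary choice for illustration.

y = 0.5*x + np.sin(x) + np.random.normal(loc=0.0, scale=0.3, size=x.shape)   # zero-mean Gaussian noise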
Now we will split our dataset into training, validation and test sets, as is always encouraged in machine learning.
The training set is supposed to be used to train the model. The model is evaluated on the validation set after every episode of training.
The performance on the validation set gives a measure of how good the model generalizes.
Various hyperparameters of the model are tuned to improve performance on the validation set. Finally, when the model is completely optimized and ready for deployment, it is evaluated on the test data and that performance is reported in the final description of the model.
To do this I have split the dataset in the ratio of 70%, 15% and 15% for the training, validation and test sets respectively.
random_indices = np.random.permutation(number_of_samples)

#Training set
x_train = x[random_indices[:70]]
y_train = y[random_indices[:70]]

#Validation set
x_val = x[random_indices[70:85]]
y_val = y[random_indices[70:85]]

#Test set
x_test = x[random_indices[85:]]
y_test = y[random_indices[85:]]
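As an aside, recent versions of scikit-learn also provide a helper, sklearn.model_selection.train_test_split, that can produce an equivalent 70/15/15 split in two calls; a minimal sketch:

from sklearn.model_selection import train_test_split

# First carve off 30% of the data, then split that 30% in half (15% validation, 15% test)
x_train, x_temp, y_train, y_temp = train_test_split(x, y, test_size=0.3)
x_val, x_test, y_val, y_test = train_test_split(x_temp, y_temp, test_size=0.5)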
Now we fit a line to our data. Linear regression learns to fit a hyperplane to the data in the feature space; for one-dimensional data, the hyperplane reduces to a straight line. We fit the line using sklearn.linear_model.LinearRegression.

model = linear_model.LinearRegression()   #Create a least squared error linear regression object

#sklearn takes the inputs as matrices, hence we reshape the arrays into column matrices
x_train_for_line_fitting = np.matrix(x_train.reshape(len(x_train), 1))
y_train_for_line_fitting = np.matrix(y_train.reshape(len(y_train), 1))

#Fit the line to the training data
model.fit(x_train_for_line_fitting, y_train_for_line_fitting)

#Plot the line
plt.scatter(x_train, y_train, color='black')
plt.plot(x.reshape((len(x), 1)), model.predict(x.reshape((len(x), 1))), color='blue')
plt.xlabel('x-input feature')
plt.ylabel('y-target values')
plt.title('Fig 2: Line fit to training data')
plt.show()
The output of the above code is the training data (black dots) with the fitted straight line drawn through it (Fig 2: Line fit to training data).
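Because the fitted model is just the line y = w*x + b introduced earlier, we can also read the learned slope and intercept directly off the model; coef_ and intercept_ are standard attributes of scikit-learn's LinearRegression after fitting:

print('Learned slope w:', model.coef_)          # weight of the fitted line
print('Learned intercept b:', model.intercept_) # bias term; returned as a small array because y was passed as a column matrix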
Now that our model is ready, we evaluate it. In a linear regression scenario, it is common to evaluate the model in terms of the mean squared error (MSE) on the validation and test sets.
#ravel() flattens the (n,1) predictions so they line up with the 1-D target arrays
mean_val_error = np.mean((y_val - model.predict(x_val.reshape(len(x_val), 1)).ravel())**2)
mean_test_error = np.mean((y_test - model.predict(x_test.reshape(len(x_test), 1)).ravel())**2)
print('Validation MSE: ', mean_val_error, '\nTest MSE: ', mean_test_error)
Running this code prints the two errors; since the noise and the split are random, the exact numbers will differ from run to run. One run gave:
Validation MSE: 3.67954814357
Test MSE: 4.96638767482
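The same numbers can also be computed with scikit-learn's built-in metric, sklearn.metrics.mean_squared_error; a minimal sketch using the arrays defined above:

from sklearn.metrics import mean_squared_error

val_mse = mean_squared_error(y_val, model.predict(x_val.reshape(-1, 1)).ravel())
test_mse = mean_squared_error(y_test, model.predict(x_test.reshape(-1, 1)).ravel())
print('Validation MSE: ', val_mse, '\nTest MSE: ', test_mse)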
Now we come to the end of our first machine learning model implementation. In the next article I will explain decision trees. Till then, enjoy learning!!!