Wednesday, January 11, 2017

Performance Evaluation of Machine Learning Algorithms

Performance evaluation of a learning algorithm is important because learning systems are usually designed to predict the class of future unlabeled data points. We can measure error, accuracy, or precision and recall to evaluate performance.
To do this kind of evaluation we use training and test sets, or k-fold cross validation.
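As a quick sketch of the first two metrics (the labels below are made up purely for illustration), the error rate is the fraction of predictions the learner got wrong, and accuracy is its complement:

```python
# Toy ground-truth and predicted labels (made up for illustration).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Error rate: fraction of points the learner got wrong.
n_wrong = sum(t != p for t, p in zip(y_true, y_pred))
error_rate = n_wrong / len(y_true)

# Accuracy is the complement of the error rate.
accuracy = 1 - error_rate
```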

There are various ways to measure the error of a learning algorithm.



A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning algorithm.
Each column represents the instances in a predicted class, while each row represents the instances of an actual class. The name "confusion" comes from the fact that the matrix makes it easy to see whether the system is confusing two classes.
It is a kind of contingency table with two dimensions, predicted and actual, and an identical set of classes in both dimensions.


Here TP is true positive, FP is false positive, TN is true negative, and FN is false negative.
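A minimal sketch of how these four counts, the matrix itself (rows as actual classes, columns as predicted classes, as described above), and precision and recall can be computed from toy labels:

```python
# Toy labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # predicted positive, actually positive
fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted positive, actually negative
tn = sum(t == 0 and p == 0 for t, p in pairs)  # predicted negative, actually negative
fn = sum(t == 1 and p == 0 for t, p in pairs)  # predicted negative, actually positive

# Rows are actual classes, columns are predicted classes.
confusion = [[tp, fn],   # actual positive
             [fp, tn]]   # actual negative

# Precision and recall follow directly from the counts.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```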

Errors in learning are caused by:
  1. Limited representation: this causes representation bias.
  2. Limited search: this causes search bias.
  3. Limited data: this causes variance.
  4. Limited features: this causes noise in the data.
The error in learning is divided into two kinds:
  1. Sample error: for a hypothesis f with respect to target function c and data sample S, it is the fraction of instances in S that f misclassifies.

  2. True error: for a hypothesis f with respect to target function c and distribution D, it is the probability that f will misclassify an instance drawn at random according to D.
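A small sketch of the difference, using a hypothetical 1-D target concept c and hypothesis f over a uniform distribution on [0, 1); the two functions disagree only on (0.5, 0.6], so the true error is exactly 0.1, which a large Monte Carlo sample approximates:

```python
import random

random.seed(0)  # reproducible draws

# Hypothetical target concept c and learned hypothesis f on x in [0, 1).
c = lambda x: x > 0.5      # true labels
f = lambda x: x > 0.6      # hypothesis; disagrees with c only on (0.5, 0.6]

# Sample error: fraction of points in a finite sample S that f misclassifies.
S = [random.random() for _ in range(50)]
sample_error = sum(f(x) != c(x) for x in S) / len(S)

# True error: probability of misclassification under the distribution D.
# D is uniform on [0, 1) here, so the true error is exactly 0.1;
# a large sample from D gives a close estimate.
big_sample = [random.random() for _ in range(100_000)]
true_error_estimate = sum(f(x) != c(x) for x in big_sample) / len(big_sample)
```

Note that the sample error of a small S can differ noticeably from 0.1, which is exactly why careful evaluation procedures matter.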




Let's see the first method for performance evaluation, the training and test set:
In this method we split the data into a training set and a test set.
The training set is used to train the learner, while the test set is used to evaluate the learner.
While implementing an algorithm we often divide the data into training, validation, and test sets in a ratio of 70%, 15%, and 15% respectively.
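A sketch of that 70/15/15 split on 200 toy samples (plain indices stand in for real data points):

```python
import random

random.seed(42)                    # reproducible shuffle
data = list(range(200))            # 200 toy samples (indices stand in for data)
random.shuffle(data)               # shuffle before splitting

n = len(data)
n_train = int(0.70 * n)            # 140 training samples
n_val = int(0.15 * n)              # 30 validation samples

train = data[:n_train]
val = data[n_train:n_train + n_val]
test = data[n_train + n_val:]      # the remaining 30 samples
```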

Now let's see the other method of performance evaluation, k-fold cross validation:
Here we partition the data set into k bins (folds).
For example, if we have 200 data samples and we partition them into 10 bins, each bin will contain 20 samples. We then select one bin as the test bin and the remaining bins as training bins.
In k-fold cross validation we run k separate learning experiments: in each round, 1/k of the data is held out as the test set and the rest is used as the training set, and we compute the average of the test scores over the k rounds.
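The procedure above can be sketched as a small helper; the "learner" here is just a majority-class predictor standing in for a real model:

```python
def k_fold_scores(data, k, train_and_score):
    """Run k rounds; each round holds out one fold as the test set
    and trains on the rest, then returns the average test score."""
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        test_fold = data[i * fold_size:(i + 1) * fold_size]
        train_folds = data[:i * fold_size] + data[(i + 1) * fold_size:]
        scores.append(train_and_score(train_folds, test_fold))
    return sum(scores) / k

# Toy data: (feature, label) pairs, 200 samples as in the example above.
data = [(i, 1 if i % 3 == 0 else 0) for i in range(200)]

def train_and_score(train, test):
    # Placeholder "learner": predict the majority training label,
    # then score accuracy on the held-out fold.
    labels = [y for _, y in train]
    majority = 1 if labels.count(1) > labels.count(0) else 0
    return sum(y == majority for _, y in test) / len(test)

avg_score = k_fold_scores(data, 10, train_and_score)
```

With k = 10 and 200 samples, each round tests on a fold of 20 samples and trains on the other 180, and the final number is the mean of the 10 fold accuracies.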



In the following video I have explained evaluation and cross validation:

                                   

Hope you have enjoyed reading this article. In the next article I will be discussing linear regression and its implementation in a Jupyter notebook. Till then, enjoy learning!!!






