Saturday, January 21, 2017

To construct a good decision tree we need to be familiar with terms such as Information Gain, Entropy, and Gain. These are useful for classifying the examples and choosing the next best attribute.
Let's look at these terms in detail:


Information Gain: measures how well a given attribute separates the training examples according to their target classification. This measure is used to select among the candidate attributes at each step while growing the tree.

Gain: It is a measure of how much we can reduce the uncertainty (its value lies between 0 and 1).

Entropy: It is a measure of uncertainty, impurity, and information content.

Information theory: an optimal-length code assigns -log2(p) bits to a message that has probability p. For example, a message with probability 1/8 is encoded with -log2(1/8) = 3 bits.

S is a sample of training examples in which:
     - p+ is the proportion of positive examples in S.
     - p- is the proportion of negative examples in S.

Entropy of S: It is the optimal number of bits needed to encode the class (positive or negative) of a member of S, i.e. how uncertain we are about S. It is given by the following formula:
          Entropy(S) = p+(-log2 p+) + p-(-log2 p-) = -p+ log2 p+ - p- log2 p-
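
To make this concrete, here is a minimal Python sketch (not from the original post) that computes the entropy of a two-class sample; the counts of 9 positive and 5 negative examples are just an illustrative assumption:

```python
import math

def entropy(pos, neg):
    """Entropy (in bits) of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count == 0:
            continue  # by convention, 0 * log2(0) = 0
        p = count / total
        result -= p * math.log2(p)
    return result

# Illustrative sample: 9 positive and 5 negative examples
print(entropy(9, 5))  # ≈ 0.940 bits
```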

Gain(S,A): It is the expected reduction in entropy due to partitioning S on attribute A.
It is given by the following formula:
Gain(S, A) = Entropy(S) - Σ_{v ∈ values(A)} (|Sv| / |S|) Entropy(Sv)
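
As a rough sketch, the formula can be coded like this, reusing the entropy helper from the snippet above; the split counts are an assumption for illustration, not data from this post:

```python
def gain(parent_counts, subsets):
    """Information gain from splitting a sample, given as (pos, neg) counts,
    into the listed subsets, each also given as (pos, neg) counts."""
    total = sum(parent_counts)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent_counts) - weighted

# Splitting 9+/5- examples on a hypothetical three-valued attribute
print(gain((9, 5), [(2, 3), (4, 0), (3, 2)]))  # ≈ 0.247 bits
```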
  

Those were the terms useful for finding the attribute of the next node in the decision tree. Now let's look at a splitting rule that uses the GINI index, which is given by the following formula:

GINI(S) = 1 - Σi pi²   (where pi is the proportion of class i in S)
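
Here is a short sketch of the GINI index for a two-class sample, again using assumed counts rather than anything from this post:

```python
def gini(pos, neg):
    """GINI index of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return 1.0 - ((pos / total) ** 2 + (neg / total) ** 2)

print(gini(9, 5))  # ≈ 0.459
```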



There are two ways to split on a continuous attribute:


For a continuous attribute A:
1. Discretization: we partition the continuous values of attribute A into a discrete set of intervals.
2. Binary split: we create a new attribute Ac by looking for a threshold c (for example, Ac is true when A ≥ c).

We choose the value of c by finding the best cut among all possible split points.
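
The threshold search can be sketched as follows: sort the examples by the continuous attribute, consider a cut between every pair of adjacent distinct values, and keep the cut with the highest gain. The snippet reuses the entropy and gain helpers defined above, and the temperature values and labels below are made up for illustration:

```python
def best_threshold(values, labels):
    """Find the threshold c maximizing information gain for the binary split
    A < c versus A >= c. `labels` are True/False class labels."""
    pairs = sorted(zip(values, labels))
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    best_c, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between identical values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left_pos = sum(1 for _, y in pairs[:i] if y)
        left_neg = i - left_pos
        g = gain((total_pos, total_neg),
                 [(left_pos, left_neg),
                  (total_pos - left_pos, total_neg - left_neg)])
        if g > best_gain:
            best_c, best_gain = c, g
    return best_c, best_gain

# Made-up temperature readings and yes/no labels
print(best_threshold([40, 48, 60, 72, 80, 90],
                     [False, False, True, True, True, False]))  # -> (54.0, ≈0.459)
```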

In the video given below, I explain more about decision trees:

Hope you enjoyed reading this article. In the next post I will explain overfitting in machine learning. Till then, enjoy learning!!!
