Instance-based learning is one way of approximating a discrete- or real-valued target function.
* The key idea here is to:
1. Simply store the training examples.
2. When a test example is given, find its closest matches.
To do this we make an inductive assumption: similar inputs map to similar outputs. If this assumption is not true, then learning is impossible. If it is true, then learning reduces to defining what "similar" means.
To implement instance-based learning we use k-nearest-neighbour (kNN) classification (a minimal sketch follows the list). For this approach:
1. Save the training examples.
2. At prediction time, find the k training examples (x1, y1), …, (xk, yk) that are closest to the test example x.
3. Predict the most frequent class among the yi's.
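A minimal sketch of these three steps in Python (the function name knn_predict and the toy data are illustrative, not from the article):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Majority vote among the k training examples closest to x_query."""
    # Euclidean distance from the query to every stored training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Most frequent class label among those k neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two small clusters of 2-D points
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.1, 0.8]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9])))  # -> "b"
```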
We can make various improvements to this, such as:
- weighting the examples from the neighbourhood
- choosing how to measure "closeness"
- finding the "close" examples in a large training set quickly
Mathematically, the basic kNN prediction for a query point x is the average of its k nearest neighbours' target values (for a discrete class the average is replaced by the majority vote):

f̂(x) = (1/k) · Σ_{i=1..k} yᵢ

Here "k" is the number of nearest points for the given input.
* Averaging over k points (a small regression sketch follows this list) is more reliable when:
- there is noise in the attributes
- there is noise in the class labels
- the classes partially overlap
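For a real-valued target, the same idea becomes kNN regression: predict the mean of the k nearest targets. A minimal sketch, with made-up data:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Estimate a real-valued target as the mean of the k nearest neighbours' targets."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Noisy samples of y = 2x; averaging over k neighbours smooths out the noise
X_train = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
y_train = np.array([0.1, 0.9, 2.1, 3.0, 3.9])
print(knn_regress(X_train, y_train, np.array([1.2]), k=3))
```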
The main issue in k-nearest neighbour is how to choose the value of k. For this we consider two situations (a small validation-based sketch of picking k follows the list):
1. When "k" is large:
   - less sensitive to noise (particularly class noise)
   - better probability estimates for discrete classes
   - larger training sets allow larger values of k
2. When "k" is small:
   - captures the fine structure of the problem space better
   - may be necessary with small training sets
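In practice, a common (generic, not prescribed above) way to weigh these considerations is to pick k by accuracy on a held-out validation split:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k):
    # Majority vote among the k nearest training examples
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    # Return the candidate k with the highest accuracy on the validation split
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_val])
        acc = np.mean(preds == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```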
* A balance must be struck between large and small values of k.
* As the training set approaches infinity and k grows large, kNN becomes Bayes optimal.
But the tradeoff between large and small "k" can be difficult to manage. To overcome this difficulty, use a large k but place more emphasis on the nearer neighbours.
There are two ways of weighting in kNN (a combined sketch follows this list):
1. Distance-weighted kNN: each of the k nearest neighbours votes for its class with a weight that grows as its distance to the query shrinks, e.g.

   f̂(x_q) = argmax_v Σ_{i=1..k} wᵢ · δ(v, yᵢ), with wᵢ = 1 / d(x_q, xᵢ)²

   where δ(v, yᵢ) is 1 if yᵢ = v and 0 otherwise.
2. Locally weighted averaging: the prediction is a kernel-weighted average of the training targets,

   f̂(x_q) = (Σᵢ wᵢ · yᵢ) / (Σᵢ wᵢ), with wᵢ = K(d(x_q, xᵢ)), e.g. a Gaussian kernel wᵢ = exp(−d(x_q, xᵢ)² / 2σ²)

   where σ is the kernel width.
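A combined sketch of both weighting schemes, assuming an inverse-squared-distance weight for the voting variant and a Gaussian kernel of width sigma for the averaging variant (standard choices used here for illustration):

```python
import numpy as np
from collections import defaultdict

def distance_weighted_knn(X_train, y_train, x_query, k=5, eps=1e-12):
    """Classify x_query with votes weighted by inverse squared distance."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] ** 2 + eps)  # eps guards against d = 0
    return max(votes, key=votes.get)

def locally_weighted_average(X_train, y_train, x_query, sigma=0.5):
    """Real-valued prediction: kernel-weighted average of the training targets."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    w = np.exp(-dists ** 2 / (2 * sigma ** 2))  # Gaussian kernel, width sigma
    return np.dot(w, y_train) / w.sum()
```

With a Gaussian kernel the averaging variant effectively uses every stored example, and the kernel width sigma plays a role similar to the one k plays in plain kNN.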
The algorithms we have seen so far are strict averagers: they can interpolate but they cannot extrapolate. For extrapolation we use locally weighted regression: the fit is centred at the test point, and each training example's weight is controlled by its distance and the kernel width. The local regressor can be linear, quadratic, an n-th degree polynomial, a neural net, …
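As a concrete illustration, here is a sketch of locally weighted linear regression: a linear model is fit with Gaussian kernel weights centred on the query point and then evaluated at that point (the kernel choice and the data are assumptions made for the example):

```python
import numpy as np

def locally_weighted_linear_regression(X_train, y_train, x_query, sigma=0.5):
    """Fit a line around x_query, weighting examples by a Gaussian kernel,
    then evaluate that local model at x_query (so it can extrapolate locally)."""
    # Add a bias column so the local model has an intercept
    Xb = np.hstack([np.ones((len(X_train), 1)), X_train])
    xq = np.concatenate([[1.0], x_query])
    # Kernel weights: nearby points dominate the local fit
    d = np.linalg.norm(X_train - x_query, axis=1)
    W = np.diag(np.exp(-d ** 2 / (2 * sigma ** 2)))
    # Weighted least squares: beta = (X^T W X)^-1 X^T W y
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y_train
    return xq @ beta

# Data from y = 3x - 1; the local linear fit can predict a bit outside the data range
X_train = np.array([[0.0], [0.5], [1.0], [1.5]])
y_train = np.array([-1.0, 0.5, 2.0, 3.5])
print(locally_weighted_linear_regression(X_train, y_train, np.array([2.0]), sigma=1.0))  # ~5.0
```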
In instance-based learning we usually come across a situation known as the curse of dimensionality. This happens for the following reasons:
- as the number of dimensions increases, distances between points become larger and more uniform
- if the number of relevant attributes is fixed, increasing the number of less relevant attributes may swamp the distance
- when there are more irrelevant than relevant dimensions, the distance becomes less reliable
The remedies for this are a larger k or kernel width, feature selection, feature weighting, and more complex distance functions. Feature weighting, for example, replaces the plain Euclidean distance with a weighted one:

d(x, x′) = √( Σⱼ wⱼ · (xⱼ − x′ⱼ)² )

where wⱼ is the weight given to the j-th attribute, so irrelevant attributes can be down-weighted.
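A small sketch of such a weighted distance; the weights here are made up for illustration, with an irrelevant attribute given weight zero:

```python
import numpy as np

def weighted_euclidean(x, x_prime, feature_weights):
    """d(x, x') = sqrt(sum_j w_j * (x_j - x'_j)^2): down-weights irrelevant attributes."""
    diff = x - x_prime
    return np.sqrt(np.sum(feature_weights * diff ** 2))

# Two points that differ mostly in an irrelevant third attribute
x = np.array([1.0, 2.0, 9.0])
x_prime = np.array([1.1, 2.1, 0.0])
w = np.array([1.0, 1.0, 0.0])  # weight 0 removes the irrelevant attribute from the distance
print(weighted_euclidean(x, x_prime, w))  # ~0.14: the irrelevant dimension is ignored
```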
Now we come to the end of this article. In the next article I will discuss feature selection. Till then, enjoy learning!!!