Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Data Mining is more than just storing and In[2] the traditional KNN Algorithm is
analysing the data. Classification is one of the explained. It follows the following steps in
vital tasks of data mining which leads order to classify the data set into different
information generation by finding various classes.
patterns and their correlation. Classification 1. Find the distance between the point to be
helps in discovering various hidden data models classified and all of the points in the dataset.
which are otherwise not visible to the data
analyst. Classification is an important aspect 2. Rank the points in the increasing order of the
which helps us choosing and making decisions distance from the point to be classified and store
between two or more groups. KNN algorithm is it in the form of a list.
an effective way to implement classification.
3. From this list, calculate the majority vote on
But it also has a few disadvantages like, 1. In
class labels and evaluate the result.
most cases K need to be specified i.e. number of
nearest neighbours. This is very tough in the
data set where one cant just guess how many of In order to classify the data, taking example of
the data points will fit to one cluster. 2. It is a colored points.
brute force approach and takes all the values
Consider the blue point to be classified as either
into consideration. In real time data sets there
black point or red point using KNN algorithm.
can be few data points which are of no
significance or which are wrongly taken. 3. In
cases where the region of evaluation specified
by giving a specific radius, it checks all points
which is not required. 4. For large number of
Step 1: Plotting all the data points on graph. 2.1 Types on KNN classifiers:
A. Density based classifier [3]:
In [3] Structural Density of data points is
discussed. In some cases calculating only the
neighbors is not efficient. Structural Density is
the concentration of data points around the
volume of its neighborhood. Density is
evaluated over a particular region. Also density
of the entire dataset is also evaluated. We then
search for a value of 'r' such that the mean of the
individual densities is equal to the average
density calculated earlier.
Fig 1: Plotted data points
B. Weighted KNN classifier [3]:
Step 2: Calculate the distance between that point
and all points in the dataset. In this type of classifier, each feature is given a
weight based on how useful it is to find out the
class of the dataset. To accomplish this, they use
the concept of Index of Discernibility. In this,
they assume a hypersphere around every
element of the dataset having a fixed radius,
which is the average distance between that
element and rest of the element of the class.
After this, Elements of the same class which
belong inside the hypersphere are examined and
counted. Then discernibility of that element is
Fig 2. Calculating distances from all data points calculated by dividing number of these elements
belonging to a particular class by the total
Step 3: Store the distances in the increasing
number of elements in a dataset. The Index of
order in a list of their class labels.Here, we can
observe the distances approximately as Discernibility of the whole dataset is calculated
[r,r,r,b,b,b] as the number of elements having discernibility
higher than 0.5 divided by the total number of
Step 4: Let us consider value of k to be 3 .We elements. Its process flow is as follows:
see that majority of vote of first three points is
1.Each of the features of the dataset is evaluated
r from the list, so we classify the point as
using IDs
belonging to r group. 2. Weights are obtained by normalizing IDs.
3. Weight is now applied to the dataset.
4. Knn classifier is now applied on this dataset.