
KNN Algorithm

 The KNN algorithm is a robust and versatile classifier that is often used as a
benchmark for more complex classifiers such as Artificial Neural Networks
(ANN) and Support Vector Machines (SVM).
 Despite its simplicity, KNN can outperform more powerful classifiers and is
used in a variety of applications such as economic forecasting, data
compression and genetics.
 In the classification setting, the K-nearest neighbor algorithm essentially
boils down to forming a majority vote between the K most similar instances
to a given “unseen” observation.
 Similarity is defined according to a distance metric between two data
points. A popular choice is the Euclidean distance, given by
d(x, x′) = sqrt((x1 − x′1)² + … + (xn − x′n)²)
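As a sketch, the Euclidean distance between two feature vectors can be computed in a few lines of Python (the helper name is illustrative, not from the text):

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (the classic 3-4-5 triangle)
```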

 More formally, given a positive integer K, an unseen observation x and a
similarity metric d, the KNN classifier performs the following two steps:
 It runs through the whole dataset computing d between x and each
training observation, and keeps the K points in the training data that are
closest to x.
 It then assigns x to the class that holds the majority vote among those K
nearest points.
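The two steps above can be sketched as a small Python function (names such as knn_predict are illustrative assumptions, not from the text):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Classify x by a majority vote among its k nearest training points."""
    # Step 1: compute the distance d between x and every training observation.
    dists = [(math.dist(p, x), label) for p, label in zip(train_X, train_y)]
    # Step 2: keep the k closest points and take a majority vote on their labels.
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy usage: two clusters, query near the 'a' cluster.
print(knn_predict([[1, 1], [2, 2], [8, 8], [9, 9]],
                  ['a', 'a', 'b', 'b'], [1.5, 1.5], k=3))  # 'a'
```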
 The k-nearest neighbor (k-NN) classification is one of the easiest
classification methods to understand
 The k-nearest neighbor classification stores all the known cases and
classifies new cases based on a similarity measure (for example, the
Euclidean distance function).
 The k-NN algorithm is popular in statistical estimation and pattern
recognition because of its simplicity.
 For 1-nearest neighbor (1-NN), the label of a point is set to the label of
its nearest training point.
 When you extend this to a higher value of k, the label of a test point is
decided by its k nearest training points. The k-NN algorithm is
considered a lazy learning algorithm because the optimization is done
locally, and the computations are delayed until classification.
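A minimal sketch of this "lazy" behavior in plain Python: fit() merely stores the data, and all distance computation is deferred to predict(). The LazyKNN class and the tiny dataset are invented for illustration:

```python
import math
from collections import Counter

class LazyKNN:
    """Lazy learner: fit() only stores the data; all work happens in predict()."""
    def fit(self, X, y, k=3):
        self.X, self.y, self.k = X, y, k  # no model is built here
        return self

    def predict(self, query):
        # Computation is delayed until classification time.
        dists = sorted((math.dist(p, query), lbl)
                       for p, lbl in zip(self.X, self.y))
        return Counter(lbl for _, lbl in dists[:self.k]).most_common(1)[0][0]

model = LazyKNN().fit([[1], [2], [8], [9]], ['low', 'low', 'high', 'high'])
print(model.predict([1.5]))  # 'low'
```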
 The advantages are high accuracy, relative insensitivity to outliers (for
larger values of K), and no assumptions about the data. The disadvantages
of k-NN are that it is computationally expensive and requires a lot of
memory.

 K-NN is a supervised classifier that memorizes observations from within a
labeled set to predict classification labels for new, unlabeled observations
 K-NN makes predictions based on the similarity between the training
observations and the new observation
 The more similar the observation values, the more likely they are to be
classified with the same label
 Use cases
o Cancer Detection
o Stock Price Prediction
 K-NN Model Assumptions
o Dataset is labeled
o Dataset only contains relevant features
o Avoid using K-NN on large datasets; it will probably take a long time
 A case is classified by a majority vote of its neighbors, being assigned to
the class most common among its K nearest neighbors as measured by a
distance function
 If K=1, then the case is simply assigned to the class of its nearest neighbor
 Choosing the optimal value for K is best done by first inspecting the data
 A larger K value reduces the overall noise, though too large a value can
blur the boundaries between classes
 Historically, the optimal K for most datasets has been between 3 and 10,
which produces much better results than 1-NN
 Prediction for a test point is made on the basis of its neighbors
 K is a small integer; if K = 1, the case is assigned to the class of its single
nearest neighbor
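One way to inspect the data when choosing K is a leave-one-out check over a few candidate values. The two-cluster toy dataset below is made up purely for illustration; it also shows how a K that is too large lets the other class outvote every point:

```python
import math
from collections import Counter

def knn_predict(X, y, query, k):
    dists = sorted((math.dist(p, query), lbl) for p, lbl in zip(X, y))
    return Counter(lbl for _, lbl in dists[:k]).most_common(1)[0][0]

# Made-up dataset: two tight clusters of three points each.
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ['a', 'a', 'a', 'b', 'b', 'b']

accuracy = {}
for k in (1, 3, 5):
    # Leave-one-out: predict each point from the remaining ones.
    correct = sum(
        knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
        for i in range(len(X))
    )
    accuracy[k] = correct / len(X)

print(accuracy)  # K=5 fails here: each point is outvoted by the other cluster
```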
Worked example: classifying paper quality

 The quality of a paper (Good or Bad) depends on two features: acid
durability and strength

 With K = 3, the three nearest neighbors of a new paper sample are 2 Good
and 1 Bad, so the majority vote classifies the new sample as Good
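The vote above can be reproduced with a small script. The four training samples and the query point are hypothetical measurements, chosen only so that the three nearest neighbors split 2 Good / 1 Bad as in the text:

```python
import math
from collections import Counter

# Hypothetical (acid durability, strength) measurements; only the final
# vote (2 Good, 1 Bad -> Good) comes from the text.
train = [
    ((7, 7), 'Bad'),
    ((7, 4), 'Bad'),
    ((3, 4), 'Good'),
    ((1, 4), 'Good'),
]
query = (3, 7)  # new paper sample to classify
k = 3

nearest = sorted((math.dist(p, query), label) for p, label in train)[:k]
votes = Counter(label for _, label in nearest)
verdict = votes.most_common(1)[0][0]

print(votes)    # Counter({'Good': 2, 'Bad': 1})
print(verdict)  # Good
```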
