
K-Nearest-Neighbor Classification

Characteristics

• Data-driven, not model-driven
• Makes no distributional assumptions about the data

Basic Idea

• For a given record to be classified, identify nearby records
• “Near” means records with similar predictor values X1, X2, …, Xp
• Classify the record as whatever the predominant class is among the nearby records (the “neighbors”)
How to measure “nearby”?

• The most popular distance measure is Euclidean distance: the square root of the sum of squared differences between the two records’ predictor values (see the sketch below)
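A minimal sketch of this calculation, assuming Python as the illustration language (the slides themselves use XLMiner); the two example records are the first two households from the Riding Mowers data shown later.

```python
# A sketch of the Euclidean distance between two records, assuming Python
# (the slides use XLMiner). The two households are the first two rows of
# the Riding Mowers data shown later: (Income, Lot_Size).
import math

def euclidean_distance(record_a, record_b):
    """Square root of the sum of squared differences across all predictors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(record_a, record_b)))

household_1 = (60.0, 18.4)    # Income, Lot_Size
household_2 = (85.5, 16.8)

print(euclidean_distance(household_1, household_2))   # about 25.55
```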
Choosing k

• k is the number of nearby neighbors used to classify the new record
• k = 1 means use the single nearest record
• k = 5 means use the 5 nearest records

• Typically, choose the value of k that has the lowest error rate in the validation data
Low k vs. High k

• Low values of k (1, 3, …) capture local structure in the data (but also noise)
• High values of k provide more smoothing and less noise, but may miss local structure
• Note: the extreme case of k = n (i.e., the entire data set) is the same as the “naïve rule” (classify all records according to the majority class)
Example: Riding Mowers

• Data: 24 households classified as owning or not owning riding mowers
• Predictors: Income, Lot Size


Income ($000s)  Lot_Size (000s sq ft)  Ownership
60.0 18.4 owner
85.5 16.8 owner
64.8 21.6 owner
61.5 20.8 owner
87.0 23.6 owner
110.1 19.2 owner
108.0 17.6 owner
82.8 22.4 owner
69.0 20.0 owner
93.0 20.8 owner
51.0 22.0 owner
81.0 20.0 owner
75.0 19.6 non-owner
52.8 20.8 non-owner
64.8 17.2 non-owner
43.2 20.4 non-owner
84.0 17.6 non-owner
49.2 17.6 non-owner
59.4 16.0 non-owner
66.0 18.4 non-owner
47.4 16.4 non-owner
33.0 18.8 non-owner
51.0 14.0 non-owner
63.0 14.8 non-owner
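As a hedged illustration, the sketch below classifies a hypothetical new household with k-NN on the 24 records above, using scikit-learn in Python rather than XLMiner. The new household’s values are assumptions, and for simplicity all 24 records are used as training data, whereas the slides hold out 6 for validation.

```python
# A sketch of k-NN classification on the Riding Mowers data, assuming
# Python/scikit-learn rather than XLMiner. For simplicity all 24 records
# serve as training data; the slides instead hold out 6 for validation.
from sklearn.neighbors import KNeighborsClassifier

# (Income, Lot_Size) for the 24 households: 12 owners, then 12 non-owners
X = [
    (60.0, 18.4), (85.5, 16.8), (64.8, 21.6), (61.5, 20.8),
    (87.0, 23.6), (110.1, 19.2), (108.0, 17.6), (82.8, 22.4),
    (69.0, 20.0), (93.0, 20.8), (51.0, 22.0), (81.0, 20.0),
    (75.0, 19.6), (52.8, 20.8), (64.8, 17.2), (43.2, 20.4),
    (84.0, 17.6), (49.2, 17.6), (59.4, 16.0), (66.0, 18.4),
    (47.4, 16.4), (33.0, 18.8), (51.0, 14.0), (63.0, 14.8),
]
y = ["owner"] * 12 + ["non-owner"] * 12

# Euclidean distance is the default; k = 8 matches the "best k" found later
knn = KNeighborsClassifier(n_neighbors=8)
knn.fit(X, y)

print(knn.predict([(70.0, 19.0)]))   # hypothetical new household
```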
XLMiner Output

• For each record in the validation data (6 records), XLMiner finds neighbors among the training data (18 records).
• Each record is scored for k = 1, 2, …, 18.
• The best k appears to be k = 8.
• k = 9, k = 10, and k = 14 also share the low validation error rate, but it is best to choose the lowest such k.
Value of k   % Error (Training)   % Error (Validation)
 1                 0.00                 33.33
 2                16.67                 33.33
 3                11.11                 33.33
 4                22.22                 33.33
 5                11.11                 33.33
 6                27.78                 33.33
 7                22.22                 33.33
 8                22.22                 16.67   <--- Best k
 9                22.22                 16.67
10                22.22                 16.67
11                16.67                 33.33
12                16.67                 16.67
13                11.11                 33.33
14                11.11                 16.67
15                 5.56                 33.33
16                16.67                 33.33
17                11.11                 33.33
18                50.00                 50.00
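The loop below is a rough sketch of the same procedure in Python/scikit-learn, reusing the X and y lists from the earlier Riding Mowers sketch. The random 18/6 split is an assumption (the slides’ actual partition is not shown), so the error rates will not reproduce the table exactly.

```python
# A sketch of choosing k by validation error, mirroring the table above.
# Reuses X and y from the earlier Riding Mowers sketch; the random 18/6
# split is an assumption, so these numbers will differ from XLMiner's.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=6, random_state=1)

errors = {}
for k in range(1, 19):                                # k = 1, ..., 18
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors[k] = 1 - knn.score(X_valid, y_valid)       # validation error rate

best_k = min(errors, key=errors.get)                  # ties go to the lowest k
print(best_k, errors[best_k])
```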
Using K-NN for Prediction
(for a Numerical Outcome)

• Instead of “majority vote determines class,” use the average of the neighbors’ response values
• May be a weighted average, with weight decreasing with distance (see the sketch below)
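A minimal sketch of this idea with scikit-learn’s KNeighborsRegressor, again assuming Python rather than XLMiner; the predictor rows are taken from the Riding Mowers data, but the numerical responses are purely hypothetical.

```python
# A sketch of k-NN prediction for a numerical outcome. The predictor rows
# come from the Riding Mowers data, but the responses are hypothetical.
from sklearn.neighbors import KNeighborsRegressor

X_train = [(60.0, 18.4), (85.5, 16.8), (64.8, 21.6), (61.5, 20.8), (87.0, 23.6)]
y_train = [1.2, 2.5, 1.8, 1.6, 3.0]                  # hypothetical responses

# weights='distance' gives a weighted average, weight decreasing with distance
knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn.fit(X_train, y_train)

print(knn.predict([(70.0, 20.0)]))                   # hypothetical new record
```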
Advantages

• Simple
• No assumptions required about a Normal distribution, etc.
• Effective at capturing complex interactions among variables without having to define a statistical model
Shortcomings

• The required size of the training set increases exponentially with the number of predictors, p
  - This is because the expected distance to the nearest neighbor increases with p (with a large vector of predictors, all records end up “far away” from each other)
• In a large training set, it takes a long time to compute the distances to all the other records and then identify the nearest one(s)
• Together, these problems constitute the “curse of dimensionality”
Dealing with the Curse of Dimensionality

• Reduce the dimension of the predictors, e.g., with PCA (see the sketch below)
• Use computational shortcuts that settle for “almost nearest neighbors”
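A rough sketch of the first remedy, reducing predictor dimension with PCA before running k-NN, using a scikit-learn pipeline; the synthetic 10-predictor data set and the choice of two components are assumptions made purely for illustration.

```python
# A sketch of PCA followed by k-NN, assuming scikit-learn. The 10-predictor
# data set and the choice of 2 components are purely illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # 200 records, 10 predictors
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic class labels

# Project onto the first 2 principal components, then classify with k-NN
model = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)

print(model.predict(X[:3]))
```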
Summary

• Find the distance between the record to be classified and all other records
• Select the k nearest records
  - Classify the record according to the majority vote of its nearest neighbors
  - Or, for prediction, take the average of the nearest neighbors’ responses
• “Curse of dimensionality”: the need to limit the number of predictors
