Benvenuto in Scribd!

CS229 Lecture Notes: The K-Means Clustering Algorithm

Caricato da

Il 0% ha trovato utile questo documento (0 voti)

73 visualizzazioni3 pagine

K-means algorithm is used to group data into cohesive "clusters" each training example is assigned to the closest cluster centroid. Each cluster centroid is moved to the mean of the points assigned to it.

Descrizione originale:

Titolo originale

Clustering

Copyright

Formati disponibili

PDF, TXT o leggi online da Scribd

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Segnala questo documento

Copyright:

Attribution Non-Commercial (BY-NC)

Formati disponibili

Scarica in formato PDF, TXT o leggi online su Scribd

Segnala contenuti inappropriati

Il 0% ha trovato utile questo documento (0 voti)

73 visualizzazioni3 pagine

CS229 Lecture Notes: The K-Means Clustering Algorithm

Caricato da

Satheesh Chandran

Copyright:

Attribution Non-Commercial (BY-NC)

Formati disponibili

Scarica in formato PDF, TXT o leggi online su Scribd

Segnala contenuti inappropriati

Salta alla pagina

Sei sulla pagina 1di 3

Cerca all'interno del documento

CS229 Lecture notes

Andrew Ng

The k-means clustering algorithm

In the clustering problem, we are given a training set {x(1) , . . . , x(m) }, and want to group the data into a few cohesive clusters. Here, x(i) Rn as usual; but no labels y (i) are given. So, this is an unsupervised learning problem. The k-means clustering algorithm is as follows: 1. Initialize cluster centroids 1 , 2 , . . . , k Rn randomly. 2. Repeat until convergence: { For every i, set c(i) := arg min ||x(i) j ||2 .
j

For each j, set j := }

m (i) = j}x(i) i=1 1{c . m (i) = j} i=1 1{c

In the algorithm above, k (a parameter of the algorithm) is the number of clusters we want to nd; and the cluster centroids j represent our current guesses for the positions of the centers of the clusters. To initialize the cluster centroids (in step 1 of the algorithm above), we could choose k training examples randomly, and set the cluster centroids to be equal to the values of these k examples. (Other initialization methods are also possible.) The inner-loop of the algorithm repeatedly carries out two steps: (i) Assigning each training example x(i) to the closest cluster centroid j , and (ii) Moving each cluster centroid j to the mean of the points assigned to it. Figure 1 shows an illustration of running k-means.

(a)

(b)

(c)

(d)

(e)

(f)

Figure 1: K-means algorithm. Training examples are shown as dots, and cluster centroids are shown as crosses. (a) Original dataset. (b) Random initial cluster centroids (in this instance, not chosen to be equal to two training examples). (c-f) Illustration of running two iterations of k-means. In each iteration, we assign each training example to the closest cluster centroid (shown by painting the training examples the same color as the cluster centroid to which is assigned); then we move each cluster centroid to the mean of the points assigned to it. (Best viewed in color.) Images courtesy Michael Jordan. Is the k-means algorithm guaranteed to converge? Yes it is, in a certain sense. In particular, let us dene the distortion function to be:
m

J(c, ) =
i=1

||x(i) c(i) ||2

Thus, J measures the sum of squared distances between each training example x(i) and the cluster centroid c(i) to which it has been assigned. It can be shown that k-means is exactly coordinate descent on J. Specically, the inner-loop of k-means repeatedly minimizes J with respect to c while holding xed, and then minimizes J with respect to while holding c xed. Thus, J must monotonically decrease, and the value of J must converge. (Usually, this implies that c and will converge too. In theory, it is possible for

3 k-means to oscillate between a few dierent clusteringsi.e., a few dierent values for c and/or that have exactly the same value of J, but this almost never happens in practice.) The distortion function J is a non-convex function, and so coordinate descent on J is not guaranteed to converge to the global minimum. In other words, k-means can be susceptible to local optima. Very often k-means will work ne and come up with very good clusterings despite this. But if you are worried about getting stuck in bad local minima, one common thing to do is run k-means many times (using dierent random initial values for the cluster centroids j ). Then, out of all the dierent clusterings found, pick the one that gives the lowest distortion J(c, ).

Potrebbero piacerti anche

cs229 Notes7a
Documento3 pagine
cs229 Notes7a
chris
Nessuna valutazione finora
13: Clustering: Unsupervised Learning - Introduction
Documento4 pagine
13: Clustering: Unsupervised Learning - Introduction
marc
Nessuna valutazione finora
Improved K-Means Algorithm Based On The: Clustering Reliability Analysis
Documento8 pagine
Improved K-Means Algorithm Based On The: Clustering Reliability Analysis
David Moreno
Nessuna valutazione finora
Lecture 3
Documento15 pagine
Lecture 3
nguyenhoangnguyennt
Nessuna valutazione finora
Text Clustering
Documento47 pagine
Text Clustering
Alex Ciocan
Nessuna valutazione finora
MCMC Original
Documento6 pagine
MCMC Original
Aleksey Orlov
Nessuna valutazione finora
CS 229, Autumn 2017 Problem Set #4: EM, DL & RL
Documento10 pagine
CS 229, Autumn 2017 Problem Set #4: EM, DL & RL
nxp He
Nessuna valutazione finora
U L D R: Nsupervised Earning and Imensionality Eduction
Documento58 pagine
U L D R: Nsupervised Earning and Imensionality Eduction
SanaullahSunny
Nessuna valutazione finora
Roch Mmids Intro 3clustering
Documento15 pagine
Roch Mmids Intro 3clustering
aslamzohaib
Nessuna valutazione finora
Practice Midterm
Documento4 pagine
Practice Midterm
Arka Mitra
Nessuna valutazione finora
Daa Test Key 2
Documento29 pagine
Daa Test Key 2
Satish Peetha
Nessuna valutazione finora
ML DSBA Lab7
Documento6 pagine
ML DSBA Lab7
Houssam Fouki
Nessuna valutazione finora
Unsupervised Learning
Documento24 pagine
Unsupervised Learning
ayesha bashir
Nessuna valutazione finora
Clustering: CMPUT 466/551 Nilanjan Ray
Documento34 pagine
Clustering: CMPUT 466/551 Nilanjan Ray
Richa Jain
Nessuna valutazione finora
Tutte Cose
Documento8 pagine
Tutte Cose
Mauro Piazza
Nessuna valutazione finora
Ex 7
Documento17 pagine
Ex 7
carlos_cabrera_51
Nessuna valutazione finora
Problem Tutorial: "Salty Fish"
Documento3 pagine
Problem Tutorial: "Salty Fish"
Roberto Franco
Nessuna valutazione finora
Applied Econometrics: Department of Economics Stern School of Business
Documento27 pagine
Applied Econometrics: Department of Economics Stern School of Business
chatfieldlohr
Nessuna valutazione finora
Clustering MIT 15.097 Course Notes
Documento9 pagine
Clustering MIT 15.097 Course Notes
alok
Nessuna valutazione finora
Practice Midterm 2010
Documento4 pagine
Practice Midterm 2010
Erico Archeti
Nessuna valutazione finora
A8 Solution
Documento4 pagine
A8 Solution
Madhu Meghana
Nessuna valutazione finora
Dynamic Programming
Documento35 pagine
Dynamic Programming
Muhammad Junaid
Nessuna valutazione finora
n25 PDF
Documento8 pagine
n25 PDF
Christine Straub
Nessuna valutazione finora
Daa All
Documento24 pagine
Daa All
Poison Remark
Nessuna valutazione finora
10-601 Machine Learning: Homework 7: Instructions
Documento5 pagine
10-601 Machine Learning: Homework 7: Instructions
Sasanka Sekhar Sahu
Nessuna valutazione finora
K-Means Clustering and PCA
Documento17 pagine
K-Means Clustering and PCA
exquy school
Nessuna valutazione finora
01 K Means - Merged
Documento26 pagine
01 K Means - Merged
Pattranit Teerakoson
Nessuna valutazione finora
H HD Cvbhhy
Documento7 pagine
H HD Cvbhhy
Vinh Tq
Nessuna valutazione finora
Ass 1
Documento3 pagine
Ass 1
Vibhanshu Lodhi
Nessuna valutazione finora
Mock Question Samples
Documento3 pagine
Mock Question Samples
MUHAMMAD AHAD
Nessuna valutazione finora
Pres 956 Rucci
Documento24 pagine
Pres 956 Rucci
Ans Sembiring
Nessuna valutazione finora
6.867 Section 3: Classification: 1 Intro 2 2 Representation 2 3 Probabilistic Models 2
Documento10 pagine
6.867 Section 3: Classification: 1 Intro 2 2 Representation 2 3 Probabilistic Models 2
Suchan Khankluay
Nessuna valutazione finora
Daa Unit Vi Notes
Documento15 pagine
Daa Unit Vi Notes
Poorva
Nessuna valutazione finora
Clustering: ISOM3360 Data Mining For Business Analytics
Documento28 pagine
Clustering: ISOM3360 Data Mining For Business Analytics
Claire Lee
Nessuna valutazione finora
Thi Giac May Tinh - Vo Quang Hoang Khang - Ex6 (Cuuduongthancong - Com)
Documento3 pagine
Thi Giac May Tinh - Vo Quang Hoang Khang - Ex6 (Cuuduongthancong - Com)
Tiến Anh Nguyễn
Nessuna valutazione finora
Colour Image Analysis
Documento12 pagine
Colour Image Analysis
ArunKumar
Nessuna valutazione finora
Review: Generalized Least Squares: 3.1. Covariance Matrices
Documento12 pagine
Review: Generalized Least Squares: 3.1. Covariance Matrices
frubio
Nessuna valutazione finora
Ada Module 3 Notes
Documento40 pagine
Ada Module 3 Notes
Lokko Prince
Nessuna valutazione finora
Dynamic Programming
Documento74 pagine
Dynamic Programming
21dce106
Nessuna valutazione finora
Is Simple Better?: Revisiting Simple Generative Models For Unsupervised Clustering
Documento6 pagine
Is Simple Better?: Revisiting Simple Generative Models For Unsupervised Clustering
Teerapat Jenrungrot
Nessuna valutazione finora
MATH5340M Risk Management in Template
Documento6 pagine
MATH5340M Risk Management in Template
Shedrine Wamukekhe
Nessuna valutazione finora
W9a Autoencoders Pca
Documento7 pagine
W9a Autoencoders Pca
zeliawillscumberg
Nessuna valutazione finora
Eigen Solu Beer 3dca9775 - Hw4soln
Documento13 pagine
Eigen Solu Beer 3dca9775 - Hw4soln
balikisyakubu64
Nessuna valutazione finora
SAHADEB - Logistic Reg - Sessions 8-10
Documento145 pagine
SAHADEB - Logistic Reg - Sessions 8-10
Writabrata Bhattacharya
Nessuna valutazione finora
Optimal Interval Clustering: Application To Bregman Clustering and Statistical Mixture Learning
Documento10 pagine
Optimal Interval Clustering: Application To Bregman Clustering and Statistical Mixture Learning
María del Carmen Contreras
Nessuna valutazione finora
10.3 - Eigenvalues and Eigenvectors - Engineering LibreTexts Bien Explicado Matrix Cal
Documento15 pagine
10.3 - Eigenvalues and Eigenvectors - Engineering LibreTexts Bien Explicado Matrix Cal
Masterkey
Nessuna valutazione finora
IMVFX 1 HistGMM F23 S
Documento41 pagine
IMVFX 1 HistGMM F23 S
Sushiman
Nessuna valutazione finora
13: Clustering: Unsupervised Learning - Introduction
Documento6 pagine
13: Clustering: Unsupervised Learning - Introduction
swetha sastry
Nessuna valutazione finora
Clustering Gene Expression Data: CS 838 WWW - Cs.wisc - Edu/ Craven/cs838.html Mark Craven Craven@biostat - Wisc.edu April 2001
Documento12 pagine
Clustering Gene Expression Data: CS 838 WWW - Cs.wisc - Edu/ Craven/cs838.html Mark Craven Craven@biostat - Wisc.edu April 2001
Fadhili Dunga
Nessuna valutazione finora
MATLAB Assignment 4 Solns
Documento9 pagine
MATLAB Assignment 4 Solns
Franklin Deo
Nessuna valutazione finora
Solution First Point ML-HW4
Documento6 pagine
Solution First Point ML-HW4
Juan Sebastian Otálora Montenegro
100% (1)
What NPC
Documento5 pagine
What NPC
maurice.gauche
Nessuna valutazione finora
Class Pattern Recognition
Documento6 pagine
Class Pattern Recognition
Mehvish Tamkeen
Nessuna valutazione finora
Handbook of Cluster Analysis: C. Hennig, M. Meila, F. Murtagh, R. Rocci (Eds.)
Documento28 pagine
Handbook of Cluster Analysis: C. Hennig, M. Meila, F. Murtagh, R. Rocci (Eds.)
Andreas Damanik
Nessuna valutazione finora
MMC 1
Documento7 pagine
MMC 1
Hai Nguyen Thanh
Nessuna valutazione finora
COMP90038 2022S1 A1 Solutions
Documento4 pagine
COMP90038 2022S1 A1 Solutions
francisco reales
Nessuna valutazione finora
hw2 311
Documento4 pagine
hw2 311
john doe
Nessuna valutazione finora
Clustering
Documento61 pagine
Clustering
efi
Nessuna valutazione finora
Daa C5
Documento15 pagine
Daa C5
amanterefe99
Nessuna valutazione finora
Numerical Analysis II Essentials
Da Everand
Numerical Analysis II Essentials
The Editors of REA
Nessuna valutazione finora
DPR2011
Documento20 pagine
DPR2011
Satheesh Chandran
Nessuna valutazione finora
Awp
Documento137 pagine
Awp
Jeslin Antonio
Nessuna valutazione finora
Awp
Documento137 pagine
Awp
Jeslin Antonio
Nessuna valutazione finora
L12 Loop
Documento20 pagine
L12 Loop
Raj Hakani
Nessuna valutazione finora
JNN
Documento18 pagine
JNN
Satheesh Chandran
Nessuna valutazione finora
12
Documento4 pagine
12
Satheesh Chandran
Nessuna valutazione finora
TED Hong 08
Documento10 pagine
TED Hong 08
Satheesh Chandran
Nessuna valutazione finora
Carbon Nanotubes:: Next Generation of Electronic Materials
Documento4 pagine
Carbon Nanotubes:: Next Generation of Electronic Materials
Satheesh Chandran
Nessuna valutazione finora
Forlong - Rivers of Life
Documento618 pagine
Forlong - Rivers of Life
Celephaïs Press / Unspeakable Press (Leng)
100% (15)
Substitution Reactions - PM
Documento64 pagine
Substitution Reactions - PM
prasoon jha
Nessuna valutazione finora
Preparation, Characterization, and Evaluation of Some Ashless Detergent-Dispersant Additives For Lubricating Engine Oil
Documento10 pagine
Preparation, Characterization, and Evaluation of Some Ashless Detergent-Dispersant Additives For Lubricating Engine Oil
Nelson Enrique Bessone Madrid
Nessuna valutazione finora
The Politics of Genre
Documento21 pagine
The Politics of Genre
Arunabha Chaudhuri
Nessuna valutazione finora
Montessori Vs Waldorf
Documento4 pagine
Montessori Vs Waldorf
Abarna
Nessuna valutazione finora
GDCR - Second Revised
Documento290 pagine
GDCR - Second Revised
bhaveshbhoi
Nessuna valutazione finora
0801871441
Documento398 pagine
0801871441
xLeelahx
50% (2)
Assignment # 1 POM
Documento10 pagine
Assignment # 1 POM
naeem
Nessuna valutazione finora
Dosificación Gac007-008 Sem2
Documento2 pagine
Dosificación Gac007-008 Sem2
Ohm Ega
Nessuna valutazione finora
Dadm Assesment #2: Akshat Bansal
Documento24 pagine
Dadm Assesment #2: Akshat Bansal
Akshat
Nessuna valutazione finora
Hssive-Xi-Chem-4. Chemical Bonding and Molecular Structure Q & A
Documento11 pagine
Hssive-Xi-Chem-4. Chemical Bonding and Molecular Structure Q & A
Arties M
Nessuna valutazione finora
Early Childhood Education and Care
Documento53 pagine
Early Childhood Education and Care
Bianca ALbuquerque
Nessuna valutazione finora
Elad Shapira - Shall We Play A Game - Lessons Learned While Playing CoreWars8086
Documento61 pagine
Elad Shapira - Shall We Play A Game - Lessons Learned While Playing CoreWars8086
james wright
Nessuna valutazione finora
E WiLES 2021 - Broucher
Documento1 pagina
E WiLES 2021 - Broucher
Ashish Hingnekar
Nessuna valutazione finora
Inglês - Advérbios - Adverbs.
Documento18 pagine
Inglês - Advérbios - Adverbs.
Khyashi
Nessuna valutazione finora
LC 67002000 B Hr-Jumbo en
Documento2 pagine
LC 67002000 B Hr-Jumbo en
Julio Ortega
Nessuna valutazione finora
Fuel System
Documento24 pagine
Fuel System
Hammad Uddin Jamily
Nessuna valutazione finora
Oxygenation - NCP
Documento5 pagine
Oxygenation - NCP
Cazze Sunio
Nessuna valutazione finora
Review Test 1: Circle The Correct Answers. / 5
Documento4 pagine
Review Test 1: Circle The Correct Answers. / 5
Xenia
Nessuna valutazione finora
Grade 8 Least Mastered Competencies Sy 2020-2021: Handicraft Making Dressmaking Carpentry
Documento9 pagine
Grade 8 Least Mastered Competencies Sy 2020-2021: Handicraft Making Dressmaking Carpentry
HJ HJ
Nessuna valutazione finora
The Impact of Online Games To The Academic
Documento20 pagine
The Impact of Online Games To The Academic
Jessica Bacani
Nessuna valutazione finora
What Role Does Imagination Play in Producing Knowledge About The World
Documento1 pagina
What Role Does Imagination Play in Producing Knowledge About The World
Nathanael Samuel Kuruvilla
Nessuna valutazione finora
IJREAMV06I0969019
Documento5 pagine
IJREAMV06I0969019
UNITED CADD
Nessuna valutazione finora
Enter Absence API
Documento45 pagine
Enter Absence API
EngOsamaHelal
Nessuna valutazione finora
Leonard Nadler' Model
Documento3 pagine
Leonard Nadler' Model
Piet Gabz
67% (3)
Textdocument
Documento254 pagine
Textdocument
Saurabh Sihag
Nessuna valutazione finora
PNP Loan Application Form February 2021 16
Documento6 pagine
PNP Loan Application Form February 2021 16
Wilhelm Regalado
Nessuna valutazione finora
Imaging Anatomy Brain and Spine Osborn 1 Ed 2020 PDF
Documento3.130 pagine
Imaging Anatomy Brain and Spine Osborn 1 Ed 2020 PDF
the gaangster
100% (1)
Film Interpretation and Reference Radiographs
Documento7 pagine
Film Interpretation and Reference Radiographs
Enrique Tavira
67% (3)