
MLP and RBFN as Classifiers

Dr. Rajesh B. Ghongade
Professor, Vishwakarma Institute of Information Technology,
Pune 411048
Agenda

Pattern Classifiers
Classifier performance metrics
Case Study: Two-Class QRS classification with MLP
Data Preprocessing
MLP Classifier MATLAB DEMO
Case Study: Two-Class QRS classification with RBFN
RBFN Classifier MATLAB DEMO
Pattern Classifiers

Pattern recognition is ultimately used for the classification of a pattern.
Identify the relevant features of the pattern from the original information, then use a feature extractor to measure them.
These measurements are passed to a classifier which performs the actual classification, i.e., determines to which of the existing classes the pattern belongs.
Here we assume the existence of natural grouping, i.e., we have some a priori knowledge about the classes and the data.
For example, we may know the exact or approximate number of classes and the correct classification of some given patterns, which are called the training patterns.
This type of information, together with the type of the features, may suggest which classifier to apply for a given application.
Parametric and Nonparametric Classifiers

A classifier is called a parametric classifier if the discriminant functions have a well-defined mathematical functional form (e.g., Gaussian) that depends on a set of parameters (mean and variance).
In nonparametric classifiers, there is no assumed functional form for the discriminants. Nonparametric classification is driven solely by the data. These methods require a great deal of data for acceptable performance, but they are free from assumptions about the shape of the discriminant functions (or data distributions) that may be erroneous.
Minimum Distance Classifiers

If the training patterns seem to form clusters, we often use classifiers that rely on distance functions for classification.
If each class is represented by a single prototype called the cluster center, we can use a minimum distance classifier to classify a new pattern.
A similar modified classifier is used if each class consists of several clusters.
The nearest neighbor classifier classifies a new pattern by measuring its distances from the training patterns and choosing the class to which the nearest neighbor belongs.
Sometimes the a priori information is the exact or approximate number of classes c.
Each training pattern belongs to one of these classes, but its specific classification is not known. In this case we use algorithms that find the cluster (class) centers iteratively by minimizing some performance index; a new pattern is then classified using a minimum distance classifier, as sketched below.
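
A minimal MATLAB sketch of the minimum distance idea, using synthetic two-class data (all names and values here are illustrative, not from the case study):

% Minimum distance classifier: each class is represented by the mean of
% its training patterns; a new pattern goes to the nearest class center.
X = [randn(50,2); randn(50,2) + 4];          % two synthetic clusters
labels = [ones(50,1); 2*ones(50,1)];         % known classes of the training patterns
centers = [mean(X(labels==1,:)); mean(X(labels==2,:))];
xNew = [3.5 3.8];                            % pattern to classify
d = sum((centers - xNew).^2, 2);             % squared Euclidean distance to each center
[~, predictedClass] = min(d);                % class of the nearest center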
Statistical Classifiers

Often the training patterns of the various classes overlap, for example when they originate from statistical distributions.
In this case a statistical approach is appropriate, particularly when the various distribution functions of the classes are known.
A statistical classifier must also evaluate the risk associated with every classification, which measures the probability of misclassification.
The Bayes classifier, based on Bayes' formula from probability theory, minimizes the total expected risk.
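
As a one-dimensional illustration, a two-class Bayes decision under Gaussian class-conditional densities might look as follows (all parameters are assumed; normpdf requires the Statistics and Machine Learning Toolbox):

% Bayes decision rule: choose the class with the larger posterior,
% i.e., class-conditional density times prior probability.
mu1 = 0; sigma1 = 1; p1 = 0.6;               % class 1: N(0,1), prior 0.6 (assumed)
mu2 = 3; sigma2 = 1; p2 = 0.4;               % class 2: N(3,1), prior 0.4 (assumed)
x = 1.2;                                     % observation to classify
post1 = normpdf(x, mu1, sigma1) * p1;        % unnormalized posterior of class 1
post2 = normpdf(x, mu2, sigma2) * p2;        % unnormalized posterior of class 2
if post1 > post2, class = 1; else, class = 2; end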
Fuzzy Classifiers

Quite often classification is performed with some degree of uncertainty.
Either the classification outcome itself may be in doubt, or the classified pattern x may belong in some degree to more than one class.
We thus naturally introduce fuzzy classification, where a pattern is a member of every class with some grade of membership between 0 and 1.
For such a situation the crisp k-means algorithm is generalized and replaced by the fuzzy k-means: after the cluster centers are determined, each incoming pattern is given a final set of grades of membership which determine the degrees of its classification in the various clusters, as sketched below.
These, too, are nonparametric classifiers.
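
A brief sketch of the idea, assuming the Fuzzy Logic Toolbox's fcm (fuzzy c-means) and synthetic data:

% Fuzzy c-means: every pattern receives a grade of membership in every
% cluster; U(i,j) is the membership of pattern j in cluster i.
data = [randn(50,2); randn(50,2) + 4];       % synthetic two-cluster data
[centers, U] = fcm(data, 2);                 % cluster centers and membership grades
[~, hardLabels] = max(U);                    % defuzzify: highest-membership cluster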
Artificial Neural Networks

The neural net approach, like the approaches before it, assumes that a set of training patterns and their correct classifications is given.
The architecture of the net, which includes an input layer, an output layer and hidden layers, may be very complex.
It is characterized by a set of weights and activation functions which determine how any information (input signal) is transmitted to the output layer.
The neural net is trained on the training patterns, adjusting the weights until the correct classifications are obtained.
It is then used to classify arbitrary unknown patterns.
There are several popular neural net classifiers, such as the multilayer perceptron (MLP), radial basis function networks (RBFN), self-organizing feature maps (SOFM) and the support vector machine (SVM).
These belong to the semiparametric classifier type.
Pattern Classifiers
Classifier Performance Metrics

The Confusion Matrix

The confusion matrix is a table in which the true classification is compared with the output of the classifier.
Let us assume that the true classification corresponds to the row and the classifier output to the column.
The classification of each sample (specified by a column) is added to the row of its true classification.
A perfect classification produces a confusion matrix in which only the diagonal is populated; all other entries are zero.
The classification error is the sum of the off-diagonal entries divided by the total number of samples.

TP: True positive
TN: True negative
FP: False positive
FN: False negative
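
As a minimal illustration, assuming the Statistics and Machine Learning Toolbox's confusionmat (the labels below are made up):

% Confusion matrix and classification error for a two-class problem.
trueClass = [1 1 1 2 2 2 2 1];               % true classifications (rows)
predicted = [1 1 2 2 2 1 2 1];               % classifier outputs (columns)
C = confusionmat(trueClass, predicted);      % rows: true class, columns: output
err = (sum(C(:)) - trace(C)) / sum(C(:));    % off-diagonal entries / total samples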
Classifier Performance Metrics

1. Sensitivity (Se) is the fraction of abnormal ECG beats that are correctly detected among all the abnormal ECG beats.
2. Specificity (SPE) is the fraction of normal ECG beats that are correctly classified among all the normal ECG beats.
3. Positive predictivity is the fraction of real abnormal ECG beats among all detected beats.
4. False Positive Rate is the fraction of all normal ECG beats that are incorrectly detected as abnormal.
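
These four metrics reduce to simple ratios of the confusion matrix counts; a short sketch with illustrative counts:

% Performance metrics in terms of TP, TN, FP, FN (values are illustrative).
TP = 95; TN = 90; FP = 10; FN = 5;
Se  = TP / (TP + FN);                        % sensitivity
SPE = TN / (TN + FP);                        % specificity
PP  = TP / (TP + FP);                        % positive predictivity
FPR = FP / (FP + TN);                        % false positive rate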
Classifier Performance Metrics (continued)

5. Classification rate (CR) is the fraction of all correctly classified ECG beats, normal or abnormal, among all the ECG beats.
6. Mean squared error (MSE) is a measure used only as the stopping criterion while training the ANN.
7. Percentage average accuracy is the total accuracy of the classifier.
8. Training time is the CPU time required for training an ANN, described in terms of time per epoch per total exemplars, in seconds.
9. Preprocessing time is the CPU time required for generating the transform part of the feature vector, in seconds.
10. Resources consumed by the ANN topology is the sum of the weights and biases of the first and second layers, also called the adjustable or free parameters of the network.
Receiver Operating Characteristics (ROC)

The receiver operating characteristic is a metric used to check the quality of classifiers.
For each class of a classifier, the ROC applies threshold values across the interval [0, 1] to the outputs.
For each threshold, two values are calculated: the True Positive Ratio (the number of outputs greater than or equal to the threshold, divided by the number of one targets) and the False Positive Ratio (the number of outputs greater than or equal to the threshold, divided by the number of zero targets).
The ROC gives us insight into classifier performance, especially in the high-sensitivity and high-specificity region.
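
The threshold sweep described above can be written directly; a minimal sketch with synthetic scores (the Deep Learning Toolbox functions roc and plotroc perform the same computation):

% ROC by threshold sweep over [0, 1].
outputs = rand(1, 200);                      % synthetic classifier scores in [0, 1]
targets = (outputs + 0.3*randn(1, 200)) > 0.5;   % synthetic 0/1 target labels
thresholds = 0:0.01:1;
TPR = arrayfun(@(t) sum(outputs >= t &  targets) / sum(targets),  thresholds);
FPR = arrayfun(@(t) sum(outputs >= t & ~targets) / sum(~targets), thresholds);
plot(FPR, TPR); xlabel('False Positive Ratio'); ylabel('True Positive Ratio');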
A Typical ROC

For an ideal classifier, the area under the ROC curve = 1.
Case Study: Two-Class QRS Classification with MLP

Problem Statement:
Design a system to correctly classify extracted QRS complexes into TWO classes, NORMAL and ABNORMAL, using an MLP.

(Figure: example ABNORMAL and NORMAL QRS complexes)
Data Preprocessing

Mean-adjust the data to remove the DC component.
We could use all 180 points for training, but this poses a computational overhead.
Using features for training minimizes the response of the ANN to noise present in the signal.
Training time of the ANN is reduced.
Overall accuracy of the ANN improves.
Generalization of the network improves.
Feature extraction in the transform domain can use the Discrete Cosine Transform (DCT).
Use the one-hot encoding technique for the targets (see the sketch below).
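
A minimal sketch of this preprocessing chain for a single beat, assuming the Signal Processing Toolbox's dct (the beat below is placeholder data; 180 samples and 30 retained components follow the slides):

% Mean adjustment, DCT feature extraction, and one-hot targets.
beat = randn(180, 1);                        % placeholder for one extracted QRS segment
beat = beat - mean(beat);                    % remove the DC component
coeffs = dct(beat);                          % transform-domain representation
features = coeffs(1:30);                     % keep the 30 most significant components
targetNormal   = [1; 0];                     % one-hot encoded targets
targetAbnormal = [0; 1];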
Selection of Significant Components

Metrics for component selection:
Components that retain 99.5% of the signal energy
Percent Root Mean Difference (PRD), as sketched below
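
One common definition of PRD compares the original signal with its reconstruction from the truncated coefficients; a sketch with a placeholder signal:

% PRD and retained energy for a 30-coefficient DCT truncation.
x = randn(180, 1);                           % placeholder beat
c = dct(x);
cTrunc = c;  cTrunc(31:end) = 0;             % retain the first 30 coefficients
xRec = idct(cTrunc);                         % reconstruct the beat
PRD = 100 * sqrt(sum((x - xRec).^2) / sum(x.^2));
energyPct = 100 * sum(c(1:30).^2) / sum(c.^2);   % percent signal energy retained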
Discrete Cosine Transform

Thirty DCT components contribute 99.86% of the signal energy.
PRD (30 components) = 0.5343%
Discrete Cosine Transform: selection of significant coefficients

Sr.  Transform  Coefficients  PRD (%)  Signal Energy (%)
1    DCT        5             9.8228   35.81
2    DCT        15            5.4096   80.53
3    DCT        30            0.5343   99.86
4    DCT        40            0.3134   99.93

(Figure: effect of truncating DCT coefficients)
Dataset Creation

It is always desirable to have an equal number of exemplars from each class in the training dataset.
This prevents favoring any class during training.
If the numbers of exemplars are unequal, we have to deskew the classifier decision.
Deskewing simply scales the output according to the probabilities of the input classes.
Data randomization before training is a must (see the sketch below); otherwise repetitive training on the same class may not allow the network to converge. Remember that the error gradient is important for reaching the global minimum, if it exists!
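
Randomizing the exemplar order is a one-liner; a sketch with placeholder data arranged one exemplar per column:

% Shuffle exemplars so no class is presented in long runs.
X = randn(30, 100);                          % placeholder features, one exemplar per column
T = [repmat([1;0],1,50), repmat([0;1],1,50)];    % one-hot targets for two classes
idx = randperm(size(X, 2));                  % random permutation of exemplar indices
X = X(:, idx);  T = T(:, idx);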
Partition the data into THREE disjoint sets:
Training set
Cross-validation set
Testing set

Before the data is presented to the net for training, we have to normalize the data to the range [-1, 1]; this helps the network learn faster.

Amplitude and Offset are given as:
A = 2 / (max(x) - min(x))
O = 1 - A * max(x)

To normalize data: y = A * x + O
To denormalize data: x = (y - O) / A
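
In MATLAB this mapping is commonly done with mapminmax (Deep Learning Toolbox); a short sketch:

% Normalize each feature row to [-1, 1] and invert the mapping later.
X = randn(30, 100);                          % placeholder feature matrix
[Xn, ps] = mapminmax(X, -1, 1);              % normalize; ps stores amplitude/offset info
Xback = mapminmax('reverse', Xn, ps);        % denormalize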
Methodology
MLP Classifier MATLAB DEMO
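
The demo itself is not reproduced here; a minimal sketch of what such an MLP script might look like, assuming the Deep Learning Toolbox's patternnet and placeholder data:

% Two-class MLP on 30 DCT features with train/validation/test partitions.
X = randn(30, 200);                          % placeholder: 30 features x 200 exemplars
T = [repmat([1;0],1,100), repmat([0;1],1,100)];  % one-hot: NORMAL / ABNORMAL
net = patternnet(10);                        % one hidden layer with 10 neurons (assumed size)
net.divideParam.trainRatio = 0.70;           % training set
net.divideParam.valRatio   = 0.15;           % cross-validation set
net.divideParam.testRatio  = 0.15;           % testing set
net = train(net, X, T);                      % backpropagation training
Y = net(X);                                  % class scores
plotconfusion(T, Y);                         % confusion matrix
plotroc(T, Y);                               % ROC curves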
Case Study: Two-Class QRS Classification with RBFN

Problem Statement:
Design a system to correctly classify extracted QRS complexes into TWO classes, NORMAL and ABNORMAL, using an RBFN.

(Figure: example ABNORMAL and NORMAL QRS complexes)

We use the same preprocessed data as in the previous case study.
RBFN Classifier MATLAB DEMO
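
Likewise, a minimal RBFN sketch, assuming the Deep Learning Toolbox's newrb and the same placeholder data layout as before:

% RBFN: newrb adds radial basis neurons until the MSE goal is met.
X = randn(30, 200);                          % placeholder: 30 features x 200 exemplars
T = [repmat([1;0],1,100), repmat([0;1],1,100)];  % one-hot: NORMAL / ABNORMAL
goal = 0.01;  spread = 1.0;                  % MSE goal and RBF spread (assumed values)
net = newrb(X, T, goal, spread);             % incremental RBFN design
Y = sim(net, X);                             % classify the exemplars
[~, predicted] = max(Y);                     % winning class per exemplar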
Thank You!
