
AICTE SPONSORED

STAFF DEVELOPMENT PROGRAMME


on
PATTERN CLASSIFICATION TECHNIQUES FOR AUDIO
AND VIDEO PROCESSING
(23-11-2009 to 4-12-2009)

Patrons
Prof. B. Palaniappan
Prof. A.M. Sameeullah

Coordinators
Dr. V. Ramalingam, Professor
Dr. S. Palanivel, Reader

ANNAMALAI UNIVERSITY
FACULTY OF ENGG. AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING
ANNAMALAI NAGAR-608 002
TAMIL NADU.
Programme Schedule

Daily session timings: 9.30 am - 11.00 am, 11.15 am - 12.45 pm, 2.00 pm - 5.00 pm

23-11-09: Inauguration; Introduction to Matlab; Working with Matlab (Dr. V. Ramalingam and Mr. K. Rajan)
24-11-09: Research in speech processing (Dr. S. Palanivel); Basics of speech processing-I (Mrs. S. Jothilakshmi); Working with C++ and Matlab for speech processing
25-11-09: Basics of speech processing-II (Mrs. S. Jothilakshmi); Research in image processing (Dr. S. Palanivel); Working with C++ and Matlab for speech and image processing
26-11-09: Basics of image processing-I (Mrs. AN. Sigappi); Basics of image processing-II (Mrs. S. Abirami); Working with C++ and Matlab for image processing
27-11-09: Normal densities and Bayes theory (Mr. M. Balasubramanian); K-means and E-M algorithms (Mrs. P. Dhanalakshmi); Matlab for densities, Bayes, K-means and E-M
28-11-09: PCA and LDA (Mrs. M. Kalaiselvigeetha); RBFNN (Mrs. A. Geetha); Matlab for PCA, LDA and RBFNN
30-11-09: GMM (Mrs. T.S. Subashini); SVM (Mr. M. Balasubramanian); Matlab for GMM and SVM
1-12-09: HMM and HTK toolkit (Mrs. AN. Sigappi); BBN and CART (Mrs. G. Arulselvi); Matlab for BBN, CART and HTK toolkit usage
2-12-09: Perceptron, BPNN and PNN (Mrs. M. Arulselvi); AANN (Dr. S. Palanivel); Matlab for perceptron, BPNN, PNN and AANN
3-12-09: Applications of speech and image processing - Person authentication (Mr. M. Balasubramanian); Audio indexing (Mrs. P. Dhanalakshmi); Tracking and facial expression recognition (Mrs. A. Geetha); Speaker diarization (Mrs. S. Jothilakshmi)
4-12-09: Applications of image processing - Video indexing (Mrs. M. Kalaiselvigeetha); Medical image classification (Mrs. T.S. Subashini); Valedictory

TABLE OF CONTENTS

List of Tables viii

List of Figures ix

1 Introduction to Matlab 2
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Matlab Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Mathematical Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Data Analysis and Statistical Functions . . . . . . . . . . . . . . . . . . 11
1.6 Matrix Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 Basics of Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.8 File Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.9 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.10 Accessing Image and Audio Files . . . . . . . . . . . . . . . . . . . . . 23

2 Basics of Speech Processing 25


2.1 Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Speech Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Speech Production Mechanism . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Digitizing Speech Signals . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.2 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Phoneme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Syllable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 Speech Sounds Categories . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.8 Pitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.9 Formant Frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.10 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.11 Discrete Cosine Transform . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.12 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.13 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.14 Low Pass Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.15 High Pass Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.16 Spectrogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.17 Features of Speech Signal . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.17.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.17.2 Short Time Energy . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.17.3 Short Time Average Zero crossing Rate . . . . . . . . . . . . . . 49
2.17.4 Short Time Autocorrelation . . . . . . . . . . . . . . . . . . . . 51
2.17.5 Pitch Period Computation . . . . . . . . . . . . . . . . . . . . . 52
2.17.6 Linear Prediction Coefficients . . . . . . . . . . . . . . . . . . . 55
2.17.7 Linear Prediction Cepstral Coefficients . . . . . . . . . . . . . . 56
2.17.8 Mel Frequency Cepstral Coefficients . . . . . . . . . . . . . . . . 56

3 Basics of Image Processing-I 62


3.1 Digital Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 Steps in Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Applications of Image Processing . . . . . . . . . . . . . . . . . . . . . 63
3.4 Sampling and Quantization . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5 Image Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5.1 Point Processing Techniques . . . . . . . . . . . . . . . . . . . . 65
3.5.1.1 Simple Intensity Transformations . . . . . . . . . . . . 65
3.5.1.2 Histogram Processing . . . . . . . . . . . . . . . . . . 71
3.5.1.3 Image Subtraction . . . . . . . . . . . . . . . . . . . . 72
3.5.2 Spatial Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.5.2.1 Linear and Nonlinear Spatial Filters . . . . . . . . . . 74
3.5.2.2 Smoothing Filters . . . . . . . . . . . . . . . . . . . . 75
3.5.2.3 Sharpening Filters . . . . . . . . . . . . . . . . . . . . 76
3.5.3 Filtering in the Frequency Domain . . . . . . . . . . . . . . . . 78
3.5.3.1 Lowpass Frequency Domain Filters . . . . . . . . . . . 78
3.5.3.2 Highpass Frequency Domain Filters . . . . . . . . . . . 81

4 Basics of Image Processing-II 85


4.1 Image Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.1.1 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . 85
4.1.1.1 Discrete Fourier Transform . . . . . . . . . . . . . . . 85
4.1.1.2 Matlab Code for FFT . . . . . . . . . . . . . . . . . . 86
4.1.2 Discrete Cosine Transform . . . . . . . . . . . . . . . . . . . . . 86
4.1.2.1 Matlab Code for DCT . . . . . . . . . . . . . . . . . . 87
4.1.3 Discrete Wavelet Transform . . . . . . . . . . . . . . . . . . . . 87
4.1.3.1 Matlab Function for DWT . . . . . . . . . . . . . . . . 88

4.2 Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2.1 Morphological Operations . . . . . . . . . . . . . . . . . . . . . 89
4.2.1.1 Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2.1.2 Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.1 Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.2 Image Segmentation using Matlab . . . . . . . . . . . . . . . . . 92
4.4 Image Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.4.1 Image Compression using DCT . . . . . . . . . . . . . . . . . . 94
4.5 Basics of Color Image and Video . . . . . . . . . . . . . . . . . . . . . 95

5 Normal Distribution and Bayes Theory 98


5.1 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.1.1 Univariate Density . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.1.2 Matlab Code for Univariate Density . . . . . . . . . . . . . . . . 100
5.1.3 Multivariate Density . . . . . . . . . . . . . . . . . . . . . . . . 101
5.1.4 Matlab Code for Multivariate Density . . . . . . . . . . . . . . . 105
5.2 Bayes Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.2.1 Matlab Code for Bayes Theory . . . . . . . . . . . . . . . . . . 109

6 k-means Clustering 112


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.2 k-means Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.3 Matlab Code for k-means Clustering . . . . . . . . . . . . . . . . . . . 117
6.4 Expectation Maximization (E-M) Algorithm . . . . . . . . . . . . . . . 119
6.5 Matlab Code for E-M Algorithm . . . . . . . . . . . . . . . . . . . . . . 120

7 Principle Components Analysis and Linear Discriminant Analysis 122


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.2 Principal Components Analysis . . . . . . . . . . . . . . . . . . . . . . 122
7.2.1 Background Mathematics . . . . . . . . . . . . . . . . . . . . . 123
7.2.1.1 Standard Deviation . . . . . . . . . . . . . . . . . . . . 123
7.2.1.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.2.1.3 Covariance . . . . . . . . . . . . . . . . . . . . . . . . 124
7.2.1.4 Covariance Matrix . . . . . . . . . . . . . . . . . . . . 125
7.2.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.2.2.1 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . 127
7.2.3 Principal Components Analysis - Algorithm . . . . . . . . . . . 129
7.2.3.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . 129

7.3 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . . 138
7.3.1 Linear Discriminant Analysis - Algorithm . . . . . . . . . . . . 138

8 Radial Basis Function Neural Network 144


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.2 Architecture of Radial Basis Function Neural Network . . . . . . . . . 144
8.3 Training of RBFNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.4 Matlab Code for RBFNN . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.4.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

9 Gaussian Mixture Model 154


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.2.1 Mean Vector and Covariance Matrix . . . . . . . . . . . . . . . 155
9.2.2 The Multivariate Normal Distribution . . . . . . . . . . . . . . 155
9.2.3 Univariate Normal Distribution . . . . . . . . . . . . . . . . . . 156
9.2.4 Bivariate Normal Distribution: . . . . . . . . . . . . . . . . . . . 156
9.2.5 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3 Gaussian Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.3.1 GMM Training . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.3.1.1 The EM Algorithm . . . . . . . . . . . . . . . . . . . . 159
9.3.1.2 Initialization Issues . . . . . . . . . . . . . . . . . . . . 159
9.3.2 GMM Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.5 Matlab Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
9.5.1 GMM Training . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
9.5.1.1 EM Algorithm . . . . . . . . . . . . . . . . . . . . . . 163
9.5.2 GMM Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.5.3 GMM Training - Matlab Output . . . . . . . . . . . . . . . . . 165
9.5.4 GMM Testing - Matlab Output . . . . . . . . . . . . . . . . . . 166

10 Support Vector Machines 168


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
10.2 SVM Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
10.2.1 SVM for Linearly Separable Data . . . . . . . . . . . . . . . . . 169
10.2.2 SVM for Linearly Non-separable Data . . . . . . . . . . . . . . 170
10.3 Determining Support Vectors . . . . . . . . . . . . . . . . . . . . . . . 170
10.4 Inner Product Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
10.4.1 Example for Polynomial Kernel: OR Problem . . . . . . . . . . 172

10.4.2 Example for Gaussian Kernel: OR Problem . . . . . . . . . . . 174
10.5 SVM Tool Demonstration . . . . . . . . . . . . . . . . . . . . . . . . . 177

11 Hidden Markov Models 180


11.1 Need for Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . 180
11.2 Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
11.3 Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
11.3.1 Notations and Model Parameters used in HMM . . . . . . . . . 183
11.3.2 Order of HMM . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
11.3.3 Types of HMM . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
11.3.4 Applications of HMM . . . . . . . . . . . . . . . . . . . . . . . . 185
11.4 Design Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
11.4.1 Evaluation Problem . . . . . . . . . . . . . . . . . . . . . . . . . 186
11.4.2 Decoding Problem . . . . . . . . . . . . . . . . . . . . . . . . . 187
11.4.3 Learning Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 187
11.5 Implementation Example . . . . . . . . . . . . . . . . . . . . . . . . . . 188
11.6 Introduction to HTK . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
11.7 Overview of HTK Toolkit . . . . . . . . . . . . . . . . . . . . . . . . . 190
11.8 Generic Properties of a HTK Tool . . . . . . . . . . . . . . . . . . . . . 192
11.9 HTK Toolkit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.9.1 Data Preparation Tools . . . . . . . . . . . . . . . . . . . . . . . 193
11.9.2 Training Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
11.9.3 Recognition Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 194
11.9.4 Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
11.10 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

12 Basics of Neural Networks 198


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
12.1.1 Structure and Learning Process of Human Brain . . . . . . . . . 199
12.1.2 Structure and Learning Process of Artificial Neuron . . . . . . . 200
12.2 Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.2.1 Perceptron Training Algorithm . . . . . . . . . . . . . . . . . . 201
12.2.2 Matlab Code for Perceptron . . . . . . . . . . . . . . . . . . . . 203
12.3 Backpropagation Neural Network . . . . . . . . . . . . . . . . . . . . . 204
12.3.1 Backpropagation Training Algorithm . . . . . . . . . . . . . . . 205
12.3.2 Matlab Code for BPNN . . . . . . . . . . . . . . . . . . . . . . 206
12.4 Probabilistic Neural Network . . . . . . . . . . . . . . . . . . . . . . . 207
12.4.1 Algorithm for PNN . . . . . . . . . . . . . . . . . . . . . . . . . 208
12.4.2 Matlab Code for 2-class Classification using PNN . . . . . . . . 209

13 Autoassociative Neural Network Model 213
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
13.1.1 Characteristics of Autoassociative Neural Network Models . . . 213
13.1.2 Matlab Implementation of Autoassociative Neural Network . . . 215
13.1.3 Applications of Autoassociative Neural Network . . . . . . . . . 215
13.1.3.1 Face Recognition . . . . . . . . . . . . . . . . . . . . . 215
13.1.3.2 Speaker Authentication . . . . . . . . . . . . . . . . . 218

14 Bayesian Belief Networks, Classification and Regression Tree 220


14.1 Bayesian Belief Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 220
14.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
14.1.2 Why Bayesian Networks? . . . . . . . . . . . . . . . . . . . . . 221
14.1.3 Representation of BBN . . . . . . . . . . . . . . . . . . . . . . . 222
14.1.4 Bayes Rule, Beliefs and Evidence . . . . . . . . . . . . . . . . . 222
14.1.5 Examples for BBN . . . . . . . . . . . . . . . . . . . . . . . . . 223
14.2 Classification and Regression Tree . . . . . . . . . . . . . . . . . . . . . 229
14.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
14.2.2 Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
14.2.2.1 Advantages of Decision Trees . . . . . . . . . . . . . . 231
14.2.2.2 Disadvantages of Decision Trees . . . . . . . . . . . . . 231
14.2.3 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
14.2.4 Classification and Regression Tree . . . . . . . . . . . . . . . . . 232
14.2.4.1 Finding the Initial Split . . . . . . . . . . . . . . . . . 232
14.2.4.2 Pruning the Tree . . . . . . . . . . . . . . . . . . . . . 233

Appendix A 237
A.1 C++ Code for Audio Processing . . . . . . . . . . . . . . . . . . . . . . 237

Appendix B 240
B.1 C++ Code for Processing Gray (PGM) and Color (PPM) Images . . . 240

Bibliography 247

List of Tables

5.1 Height and weight of males . . . . . . . . . . . . . . . . . . . . . . . . . 103


5.2 Heights of males and females . . . . . . . . . . . . . . . . . . . . . . . . 107

6.1 k-means clustering algorithm . . . . . . . . . . . . . . . . . . . . . . . . 113


6.2 Training set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.3 Final grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.1 PCA original data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130


7.2 PCA mean subtracted data . . . . . . . . . . . . . . . . . . . . . . . . 130
7.3 PCA projected data with 2 eigenvectors . . . . . . . . . . . . . . . . . 133
7.4 PCA reconstructed data . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

10.1 Types of SVM inner product kernels . . . . . . . . . . . . . . . . . . . 172


10.2 OR table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

11.1 Visible actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182


11.2 Input file with sequence of feature vectors . . . . . . . . . . . . . . . . 195
11.3 Input file with sequence of feature vectors . . . . . . . . . . . . . . . . 196
11.4 Testfile with sequence of feature vectors . . . . . . . . . . . . . . . . . . 197

14.1 Training data for CART . . . . . . . . . . . . . . . . . . . . . . . . . . . 233


List of Figures

2.1 Speech signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


2.2 Physiology of speech production. . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Sampling and quantization processes . . . . . . . . . . . . . . . . . . . . 29
2.4 Formant frequencies of speech sounds "oh" and "ee" . . . . . . . . . . . 31
2.5 Principle of FFT and IFFT . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 Graphical representation of DCT weights . . . . . . . . . . . . . . . . . . 35
2.7 Computation of DCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.8 Energy compaction of DCT . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.9 Amplitude response of LPF . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.10 Low pass filtering of a signal . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.11 High pass filtering of a signal . . . . . . . . . . . . . . . . . . . . . . . . 44
2.12 Spectrogram of a speech signal . . . . . . . . . . . . . . . . . . . . . . . 45
2.13 Preprocessing of a speech frame . . . . . . . . . . . . . . . . . . . . . . . 48
2.14 Short time energy of a speech signal . . . . . . . . . . . . . . . . . . . . 49
2.15 Short time average ZCR of a speech signal . . . . . . . . . . . . . . . . . 51
2.16 Short time autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.17 Block diagram of LPC computation . . . . . . . . . . . . . . . . . . . . . 55
2.18 Extraction of MFCC from speech signal . . . . . . . . . . . . . . . . . . . 57
2.19 Mel scale filter bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.20 Computation of MFCC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.1 Mapping between original values and imcomplement function . . . . . . . 65


3.2 Image negative.(a) Input image (b) Complemented image . . . . . . . . . 66
3.3 Form of contrast stretching transformation function . . . . . . . . . . . . 67
3.4 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5 Contrast stretching.(a) Input image (b) Contrast stretched image for E=4
(c) E=5 (d) E=10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.6 Contrast stretching.(a) Input image (b) Stretched image . . . . . . . . . . 69
3.7 Compression of dynamic range. (a) Input image (b) Images with dynamic
range compressed for values of c=1 (c) c=2 (d) c=5 . . . . . . . . . . . . 70
3.8 Histogram.(a) Low contrast input image (b) Histogram of low contrast image 71
3.9 Histogram.(a) Intensity adjusted image (b) Histogram of intensity adjusted
image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.10 Histogram equalization. (a) Original input image (b) Histogram equalized
image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.11 Image subtraction. (a) Input image (b) Subtracted image . . . . . . . . . 74
3.12 Averaging filtering. (a) Original image (b) Filtered image . . . . . . . . . 75
3.13 Median filtering. a) Original input image (b) Image with salt and pepper
noise (c) Median filtered image . . . . . . . . . . . . . . . . . . . . . . . 77
3.14 Unsharp masking. (a) Original input image (b) Filtered image . . . . . . . 78
3.15 Basic steps for filtering in frequency domain . . . . . . . . . . . . . . . . 78
3.16 Ideal lowpass filter. (a) Perspective plot of filter transfer function (b) Filter
cross section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.17 Ideal lowpass filtering. (a) Input image (b) Filtered image . . . . . . . . . 81
3.18 Ideal highpass filter. (a) Perspective plot of filter transfer function (b) Filter
cross section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.19 Ideal highpass filtering. (a) Input image (b) Sharpened image . . . . . . . 84

4.1 Discrete Fourier transform. . . . . . . . . . . . . . . . . . . . . . . . . . 86


4.2 Discrete cosine transform. . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3 Two-dimensional wavelet decomposition . . . . . . . . . . . . . . . . . . 88
4.4 Two-dimensional wavelet transformation (Original image). . . . . . . . . 89
4.5 Two-dimensional wavelet transformation (Filtered images). . . . . . . . . 90
4.6 Morphological dilation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.7 Morphological erosion. . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.8 Image segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.9 Image compression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.10 Color image and its components. . . . . . . . . . . . . . . . . . . . . . . 97
4.11 Two consecutive frames in the video. . . . . . . . . . . . . . . . . . . . 97

5.1 Peak of the univariate normal distribution occurs at x = µ . . . . . . . . 99


5.2 Univariate normal distribution . . . . . . . . . . . . . . . . . . . . . . . . 100
5.3 Bivariate normal distribution . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4 Gaussian distribution for hm and hf . . . . . . . . . . . . . . . . . . . . 108

6.1 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


6.2 Objects in the feature space . . . . . . . . . . . . . . . . . . . . . . . . 114
6.3 Initial value of centroids . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.4 Centroids in iteration 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.5 Centroids in iteration 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.1 Curse of dimensionality . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2 Orthogonal principal eigenvectors . . . . . . . . . . . . . . . . . . . . . . 130
7.3 Plot of PCA original data . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.4 Plot of PCA mean subtracted data with eigen vectors . . . . . . . . . . . 132
7.5 Plot of PCA projected data with 2 eigen vectors . . . . . . . . . . . . . . 134

8.1 Radial basis function neural network. . . . . . . . . . . . . . . . . . . . . 145


8.2 Plot of train and test data . . . . . . . . . . . . . . . . . . . . . . . . . . 151

9.1 Mixture of two gaussians . . . . . . . . . . . . . . . . . . . . . . . . . . 157


9.2 GMM-Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.3 GMM-Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

10.1 Architecture of the SVM (Ns is the number of support vectors). . . . . . 169
10.2 SVM example to classify a person into two classes: overweighed, not over-
weighed; two features are pre-defined: weight and height. Each point rep-
resents a person. Dark circle point (•) : overweighed; star point (∗) : not
overweighed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
10.3 An example for SVM kernel function Φ(x) maps two dimensional input
space to higher three dimensional feature space. (a) Nonlinear problem.
(b) Linear problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

11.1 Markov model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182


11.2 State diagram of HMM . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
11.3 4-State ergodic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
11.4 4-State left-right model . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
11.5 Isolated word recogniser . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
11.6 Using HMMs for isolated word recognition . . . . . . . . . . . . . . . . . 191
11.7 Software architecture of HTK . . . . . . . . . . . . . . . . . . . . . . . . 191

12.1 Structure of a biological neuron . . . . . . . . . . . . . . . . . . . . . . . 199


12.2 Structure of artificial neuron . . . . . . . . . . . . . . . . . . . . . . . . . 200
12.3 Activation function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.4 Architecture of perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . 202
12.5 Architecture of backpropagation neural network . . . . . . . . . . . . . . . 204
12.6 Architecture of probabilistic neural network . . . . . . . . . . . . . . . . . 208

13.1 A five layer AANN model. . . . . . . . . . . . . . . . . . . . . . . . . . 214

13.2 Distribution capturing ability of AANN model. From [1]. (a) Artificial 2
dimensional data. (b) 2 dimensional output of AANN model with the struc-
ture 2L 10N 1N 10N 2L. (c) Probability surfaces realized by the network
structure 2L 10N 1N 10N 2L. . . . . . . . . . . . . . . . . . . . . . . . 215
13.3 Real time facial feature extraction for varying size, orientation and background.216
13.4 Snapshot of the real time person recognition system . . . . . . . . . . . . 217

14.1 Simple belief net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223


14.2 BBN for weather condition . . . . . . . . . . . . . . . . . . . . . . . . . 223
14.3 BBN for fish . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
14.4 BBN - linear and loop structure . . . . . . . . . . . . . . . . . . . . . . 229
14.5 Decision region and unpruned Classification tree . . . . . . . . . . . . . . 234
14.6 Decision region and classification tree . . . . . . . . . . . . . . . . . . . 235

Chapter 1

Introduction to Matlab

by
Dr. V. RAMALINGAM
Professor, Department of CSE,
Annamalai University.
and
K. RAJAN
Lecturer (Senior Scale), Department of EEE,
Muthiah Polytechnic College.

1.1 Introduction
Matlab (Matrix Laboratory) [2] [3], distributed by the MathWorks, is a technical com-
puting environment for high performance numeric computation and visualization. It
integrates numerical analysis, matrix computation, signal processing, and graphics in
an easy-to-use environment. Matlab [4] also features a family of application-specific
solutions called toolboxes. Toolboxes are comprehensive collections of Matlab func-
tions that extend its environment in order to solve particular classes of problems

Starting Matlab
Matlab can be started by double clicking on the Matlab icon on the desktop of the
computer. This brings up the window called the Command Window. This window
allows a user to enter simple commands. The prompt >> is displayed in the command
window, and when the command window is active, a blinking cursor appears to the
right of the prompt. To perform simple computations, type a command and press the
Enter or Return key.

Exiting Matlab
To close Matlab, type exit in the Command Window, and press Enter or Return key.
A second way to close your current Matlab session is to select File in the Matlab’s
menubar and click on Exit Matlab option. All unsaved information residing in the

Matlab Workspace will be lost. To abort a command in Matlab, press Control+C.

1.2 Matlab Environment


Workspace
The data and variables created in the command window reside in what is called the
Matlab workspace or Base workspace. In addition to viewing variables in the
Workspace window, we can see what variable names exist in the Matlab workspace,
by issuing the who command. To get help, type "help" (this gives a list of help topics).
If you don't know the exact name of the topic or command you are looking for, type
"lookfor keyword" (e.g., "lookfor regression").
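For instance, a small illustrative sketch (the variable names here are arbitrary):

>> a = 5; b = [1 2 3];   % create two variables
>> who                   % lists the variable names in the workspace
>> whos                  % lists names along with size, bytes and class
>> help mean             % help text for a specific function
>> lookfor regression    % searches the help entries for a keyword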

Windows used in Matlab

Command Window - Issues commands to Matlab for processing
Command History - Running history of prior commands issued in the command window
Current Directory - GUI for directory and file manipulation in Matlab
Workspace - GUI for viewing, editing, loading, and saving Matlab variables
Help - GUI for finding and viewing on-line documentation
Editor - Text editor for creating M-files
Profiler - Tool for optimizing M-file performance

When writing a long Matlab statement that exceeds a single row, use "..." to continue
the statement on the next row. A semicolon ";" after an expression or statement
suppresses printing; if the ";" is omitted, Matlab displays the result. The percent "%"
symbol is used to begin comments. Just like a calculator, Matlab can do basic
mathematical operations.
>> 15+43+37
ans = 95
Recalling previous commands
Pressing the UP arrow recalls the most recent command to the Matlab prompt. Re-
peatedly pressing the UP arrow scrolls back through prior commands, one at a time.
In a similar manner pressing DOWN arrow scrolls forward through commands.

Data types
Matlab has three basic data types: strings, scalars and matrices. Arrays are just
matrices that have only one row. Matlab has also lots of built-in functions to work

with these things.
Scalar – A single value. When a matrix has one row and one column (1 X 1), it is
referred to as a scalar.
A vector is an ordered list of numbers. A vector of any length can be entered by typing
a list of numbers, separated by commas or spaces, inside square brackets. When a
matrix has one row or one column, it is also referred to as a vector. (row vector or
column vector)
Matrix - a set of numbers arranged in a rectangular grid of rows and columns. Matlab
has two types of numbers: integers and real numbers.
Integer data : Matlab supports signed and unsigned integer data types having 8,
16, 32 and 64 bits. The upper and lower limits of integers are given by the intmax()
and intmin().
The default data type in Matlab is double precision, or simply double. The real values
are represented as single(single-precision data type ), double (double-precision data
type)
realmax(‘type’) and realmin(‘type’) return maximum and minimum real values for
the specified type.
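A few illustrative calls (the types chosen here are only examples):

>> intmax('int8')        % 127, largest 8-bit signed integer
>> intmin('int16')       % -32768, smallest 16-bit signed integer
>> realmax('single')     % largest single-precision value
>> realmin('double')     % smallest normalized double-precision value
>> class(5)              % 'double', the default numeric type
>> class(single(5))      % 'single'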

Operators

Arithmetic operators:
plus - Plus (+)
uplus - Unary plus (+)
minus - Minus (-)
uminus - Unary minus (-)
mtimes - Matrix multiply (*)
times - Array multiply (.*)
mpower - Matrix power (^)
power - Array power (.^)
mldivide - Backslash or left matrix divide (\)
mrdivide - Slash or right matrix divide (/)
ldivide - Left array divide (.\)
rdivide - Right array divide (./)
kron - Kronecker tensor product

Relational operators:
eq - Equal (==)
ne - Not equal (~=)
lt - Less than (<)
gt - Greater than (>)
le - Less than or equal (<=)
ge - Greater than or equal (>=)

Logical operators:
relop - Short-circuit logical AND (&&)
relop - Short-circuit logical OR (||)
and - Element-wise logical AND (&)
or - Element-wise logical OR (|)
not - Logical NOT (~)
xor - Logical EXCLUSIVE OR
any - True if any element of vector is nonzero
all - True if all elements of vector are nonzero
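A short illustration of the difference between matrix and array operators (the values are arbitrary):

>> A = [1 2; 3 4]; B = [5 6; 7 8];
>> A*B                   % matrix multiply: [19 22; 43 50]
>> A.*B                  % element-wise multiply: [5 12; 21 32]
>> A >= 2                % element-wise relational test: [0 1; 1 1]
>> (3 > 1) && (2 > 5)    % short-circuit logical AND: 0
>> xor(1,0)              % logical exclusive OR: 1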

Variables
All variables and function names are case sensitive.
Existing variables can be listed using the who and whos commands.
Before using a new variable name, you can check whether it is valid with the isvarname
function. Note that isvarname does not consider names longer than namelengthmax
characters to be valid.
To test whether a proposed variable name is already used as a function name, type
which -all <name>
Matlab uses the characters i and j to represent imaginary units. Avoid using i and j
for variable names if you intend to use them in complex arithmetic. If you want to
create a complex number without using i and j, you can use the complex function eg.
complex(5,4)
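For example (the names tried here are arbitrary):

>> isvarname('my_var1')  % 1, a valid variable name
>> isvarname('1var')     % 0, names cannot start with a digit
>> namelengthmax         % maximum allowed length of a variable name
>> which -all sum        % shows that 'sum' is already a built-in function
>> z = complex(5,4)      % 5.0000 + 4.0000i without typing i or j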

Saving a Matlab session


diary filename causes a copy of all subsequent command window input and most of
the resulting command window output to be appended to the named file. If no file is
specified, the file ’diary’ is used.

diary off suspends it. diary on turns it back on. diary, by itself, toggles the diary
state. (Command History does not show the results).

Saving and restoring Matlab variables


Variables can be stored in files by using save command. Matrices can be read from
or written into a data file. Matlab can interface to two different types of data files-
MAT file and ASCII file. A MAT file contains data in a binary format and an ASCII
file contains data in a standard text format.
save file1 x1 x2 will save the matrices x1 and x2 in a file named file1.mat.
save - saves workspace variables to disk.
save filename saves all workspace variables to the binary "mat-file".
save filename x saves only x.
save filename x y z saves x, y, and z. The wildcard '*' can be used to save only those
variables that match a pattern.
save filename x y z -ascii saves the variables in ASCII format.
All these matrices can be restored using load command from the files.
load file1; % Loads all variables found in the file1.mat. (They overwrite existing
variables)
To load specific variables from MAT-file, variable names can be specified.
>>load filename var1 var2 var3
The load command can also open ASCII text files.
>>load filename.ext opens the file filename.ext and loads the data into a single
double-precision data array named filename.
To find out whether a data file exists and what variables it holds, Matlab commands
exist and whos are valuable.
Data files can be deleted by using delete command.>> delete filename.ext

Clear commands
clc clears command screen
clf clears figure
clear clears workspace
clear var1 var2 ... clears the variables specified. The wildcard character ’*’ can be
used to clear variables that match a pattern

Reserved words
for, end, if, while, function, return, else, elseif, case, otherwise, switch,
continue, try, catch, global, persistent, break.
This list is returned as the output of the iskeyword function. Words that differ from a
keyword only in capitalization (one or more letters capitalized) can be used as variable
names.

To add a directory to the search path, use either of the following:


At the menubar, select File -> Set Path OR
At the command line, use the addpath function
addpath dir1 dir2 dir3 ... prepends all the specified directories to the path.We can set
the desired directory as our current directory.
Special values
pi Represents π (approximately 3.1416)
i,j Represents the imaginary unit sqrt(-1)
inf Represents infinity (∞)
NaN or nan Not a Number, typically occurs when an expression is undefined, as in
the division of zero by zero.
clock Represents the current time in six element row vector containing year, month,
day, hour, minute, and seconds.
date Represents the current date in character string format, such as 25-Jun-98
eps Represents the floating-point precision for the computer being used. This is the
smallest amount with which two values can differ in the computer.
ans Represents a value computed by an expression but not stored in a variable name.
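Typing the special values at the prompt shows their effect (results shown as comments):

>> pi                    % 3.1416
>> 1/0                   % Inf
>> 0/0                   % NaN
>> eps                   % about 2.2204e-16
>> clock                 % [year month day hour minute seconds]
>> 7+3;                  % result not assigned to a variable, so...
>> ans                   % ...ans holds 10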

1.3 Arrays
Array construction
Arrays can be created in many ways in Matlab. In a direct way, to create an array, start
with a left bracket, and enter the desired values separated with spaces (or comma),
and close the array with a right bracket.
Individual elements are accessed by using subscripts; which start from 1.
>> x=[ 15 8 12 32] creates a row vector containing 4 elements.
X=first:last creates a row vector X starting with first, counting by 1, and ending at or
before last. X=first:increment:last counts by increment instead of 1.
>> y=1:10
>>z=1:0.5:20
X=linspace(first,last,n) creates linearly spaced row vector X starting with first, ending
at last, having n elements.
X=logspace(first,last,n) creates a logarithmically spaced row vector X starting at
10^first, ending at 10^last, and having n elements.
>>a=linspace(1,10,8)

a = Columns 1 through 8

1.0000 2.2857 3.5714 4.8571 6.1429 7.4286 8.7143 10.0000

Separating elements by spaces or commas specifies elements in different columns,


whereas separating elements by semicolon specifies elements in different rows. Try
the following.
>>a=[5 6 7; 8 9 4; 5 2 3]

>>b=[5 6 7
8 9 4
5 2 3]

In addition to using semicolons, pressing the ENTER key while entering an array tells
to start a new row. All rows must contain same number of columns.
Creating scalar values, vectors and matrices
N = 5 % a scalar
v = [1 0 0] % a row vector
v = [1;2;3] % a column vector (size is 3 X 1 )
v=[1:10] %a vector of 1 to 10
v=[0:pi/8:2*pi] % a vector of values 0 to 2pi incremented by pi/8
v = [1:.5:3] % a vector in a specified range:
v = pi*[-4:4] % [start: stepsize: end] multiplied with pi
v = [] % empty vector
m = [1 2 3; 4 5 6] % a matrix of 2 rows and 3 columns
m = m’ % transpose of a matrix ( or a vector) with single quote.
m = zeros(2,3) % a matrix of zeros of size 2 rows and 3 columns
v = ones(1,3) % a matrix of ones of size 1 X 3
m = eye(3) % identity matrix of size 3 X 3
v = rand(3,1) % matrix of random numbers of size 3 X 1
r = randn(n) %returns an n-by-n matrix containing pseudo-random values drawn from
a normal distribution with mean zero and standard deviation one.
m=magic(3) % displays a magic square of size 3

Accessing elements
v = [23 45 38];
v(3) % access 3rd element of a vector. Subscript starts from 1
Let m = [7 2 8; 5 3 6], try the following commands.
m(1,3) element from first row third column.
m(2,:) all elements of row 2. (a row vector)

m(:,1) all rows of column 1 (a column vector)
m(:) lists all elements of the matrix.
A range of subscript can be specified using colon ( : ) operator.
>> a=round(rand(4,3)*10); To generate a random matrix of size 4-by-3.
>>b= a(1:2,2:3); to select a portion of a matrix. Try printing a and b
>>a
>>b

Basic array information

size Size of array. Eg. size(arrayname)


length Length of a vector. It is equivalent
to max(size(x))
ndims Number of dimensions.
numel Number of elements.
disp Display matrix or text.
isempty True for empty array.
isequal True if arrays are numerically equal.
isequalwithequalnans True if arrays are numerically equal, treating NaNs as equal.
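A brief demonstration on a small matrix (the values are arbitrary):

>> m = [7 2 8; 5 3 6];
>> size(m)               % [2 3]
>> length(m)             % 3, i.e. max(size(m))
>> ndims(m)              % 2
>> numel(m)              % 6
>> isempty(m)            % 0
>> isempty([])           % 1
>> isequal(m, [7 2 8; 5 3 6])   % 1, the arrays are numerically equal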

To delete rows and columns


A(:,2)=[] deletes second column of A matrix.
A(1,:) =[] deletes the first row.
Element by element operations
a= [1 2 3 4]; vector
2*a scalar multiplication
a/4 scalar division
b = [5 6 7 8]; vector
a+b element wise vector addition
a-b element wise vector subtraction
a .^ 2 element wise vector squaring (note .)
a .* b element wise vector multiply (note .)
a ./ b element wise vector division (note .)

1.4 Mathematical Functions
Selected elementary math functions
Trigonometric
sin Sine. (Argument in radians.)
sind Sine of argument in degrees.
sinh Hyperbolic sine.
asin Inverse sine. (result in radians)
asind Inverse sine, result in degrees.
asinh Inverse hyperbolic sine.
cos Cosine.
tan Tangent.
Exponential
exp Exponential.
log Natural logarithm.
log10 Common (base 10) logarithm.
log2 Base 2 logarithm and dissect floating
point number.
pow2 Base 2 power and scale floating point
number.
sqrt Square root.
nthroot Real n-th root of real numbers.

Complex
abs Absolute value.
angle Phase angle.
complex Construct complex data from real and
imaginary parts.
conj Complex conjugate.
imag Complex imaginary part.
real Complex real part.
isreal True for real array.

Rounding and remainder
fix Round towards zero.
floor Round towards minus infinity.
ceil Round towards plus infinity.
round Round towards nearest integer.
mod Modulus (signed remainder after
division).
rem Remainder after division.
sign Signum. Returns 1 if the element is
greater than zero, 0 if it is equal to zero
and -1 if it is less than zero
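The rounding functions differ only in the direction of rounding, as this small sketch shows (values are arbitrary):

>> x = [-2.7 -0.5 1.4 3.6];
>> fix(x)                % [-2  0  1  3], towards zero
>> floor(x)              % [-3 -1  1  3], towards minus infinity
>> ceil(x)               % [-2  0  2  4], towards plus infinity
>> round(x)              % [-3 -1  1  4], towards the nearest integer
>> mod(7,3)              % 1
>> rem(-7,3)             % -1, sign follows the dividend
>> sign(x)               % [-1 -1  1  1]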

1.5 Data Analysis and Statistical Functions


Built-in Matlab functions operate on vectors; if a matrix is given, the function operates
on each column of the matrix (a short example follows the tables below).

a = [1 4 6 3] Vector

sum(a) sum of vector elements


mean(a) mean of vector elements
var(a) Variance
std(a) standard deviation
min(a) minimum of a
max(a) Maximum

a = [1 2 3; 4 5 6] Matrix
sum(a) or sum(a,1) Sum of columns
sum(a,2) Sum of rows
mean(a) mean of each column
max(a) max of each column
max(max(a)) to obtain max of matrix
max(a(:))
diff difference and approximate derivative.
diff(x) is [x2-x1 x3-x2 ... x(n)-x(n-1)] for a vec-
tor x.
diff(x) is the matrix of row differences, [x(2:n,:)
- x(1:n-1,:)] for a matrix x.
prod(a) product of elements
median(a) median values
cumsum(a) cumulative sum of elements
cumprod(a) cumulative product of elements
cov(a) covariance matrix of a
corrcoef(a) correlation coefficient of a
sort(a) sorts in ascending or descending order
minmax (p) minimum and maximum values for each
row of p
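For example, the column-wise behaviour on a small matrix looks like this (a quick sketch):

>> a = [1 2 3; 4 5 6];
>> sum(a)                % column sums: [5 7 9]
>> sum(a,2)              % row sums: [6; 15]
>> mean(a)               % column means: [2.5 3.5 4.5]
>> max(a(:))             % 6, maximum over the whole matrix
>> sort([3 1 2])         % [1 2 3]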

1.6 Matrix Manipulation


rot90( A ) Rotates the matrix A 90 degree in a counter clockwise direction
rot90(A,n) Rotates the matrix A n*90 degree in a counter clockwise direction
fliplr(A) Flips the matrix A left-to-right
flipud(A) Flips the matrix A up-to-down.
reshape(A,m,n) Reshapes the matrix A into m rows and n columns. (The number of
elements in the original matrix and in the reshaped matrix must be the same)
diag(A) Extracts the main diagonal elements and store them in a column vector if A
is a matrix. If A is a vector, the function will generate a square matrix with A as the
diagonal.
diag(A,k) Extracts the k-th diagonal elements of A matrix. If A is a vector, places
the elements of A, k places up or down from the main diagonal.
triu Extract upper triangular part. triu(x) is the upper triangular part of x. triu(x,k)
is the elements on and above the k-th diagonal of x. k = 0 is the main diagonal, k >

0 is above the main diagonal and k < 0 is below the main diagonal.
tril Extract lower triangular part. tril(x) is the lower triangular part of x. tril(x,k)
is the elements on and below the k-th diagonal of x . k = 0 is the main diagonal, k >
0 is above the main diagonal and k < 0 is below the main diagonal.

The transpose of the matrix A is denoted by A’

B=repmat(A,M,N) creates a large matrix B consisting of an M-by-N tiling of copies


of A. Try the following
>>A=round(rand(3)*10);
>>B=repmat(A,2,2)

Two or more matrices of compatible sizes can be joined to construct a larger matrix.
>> F=[C;D;E]

dot(A,B) Computes the dot product of A and B matrices.


A=[5,2,-3];
B=[4,5,-2];
C=dot(A,B)
The dot product can also be computed with C=sum(A.*B);
If A is a matrix, A.^2 is the operation that squares each element in the matrix.
If we want to compute A*A, we can use the operation A^2. To perform matrix
multiplication, the number of columns in the first matrix must equal the number of
rows in the second matrix; therefore, to raise a matrix to a power, the matrix must be
square (see the short example after this list).
rank(A) computes the rank of the matrix.
inv(A) computes the inverse of the matrix A, if it exists. (If the rank is equal to the
number of rows, an inverse exists).
det(A) computes the determinant of a square matrix A.
eig() computes the eigenvalues and eigenvectors of a matrix.
svd() computes the SVD factorization of a matrix. [U S V]=svd(A), where U and V
are orthogonal matrices and S is a diagonal matrix.
s = sparse(x) converts a sparse or full matrix to sparse form by squeezing out any
zero elements.
a = full(x) converts a sparse matrix x to full storage organization. If x is a full matrix,
it is left unchanged.
nz = nnz(s) is the number of nonzero elements in s.
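A brief sketch illustrating the element-wise power versus the matrix power mentioned above, together with the basic linear-algebra functions (the matrix is arbitrary):

>> A = [1 2; 3 4];
>> A.^2                  % squares each element: [1 4; 9 16]
>> A^2                   % matrix product A*A: [7 10; 15 22]
>> rank(A)               % 2, so A is invertible
>> det(A)                % -2
>> inv(A)                % [-2 1; 1.5 -0.5]; A*inv(A) gives the identity
>> [V D] = eig(A);       % columns of V are eigenvectors, diag(D) the eigenvalues
>> [U S W] = svd(A);     % A = U*S*W' with orthogonal U, W and diagonal S
>> nnz(sparse([0 3; 0 4]))   % 2 nonzero elements in the sparse matrix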
To swap rows 3 and 5 in a matrix

>> a( [3, 5], :) = a([5, 3], :) the 5th and 3rd row in a will become the 3rd and 5th
row of a.
>> A=[8 3 4; 3 5 7; 6 2 5]
>> B= A(:,[3 2 1]) will interchange the columns in the specified order
>>B= A([ 2 3 1],:) will interchange the rows in the specified order
Handling complex numbers
>> a = 3-5i
a = 3.0000 - 5.0000i
>> b = -9+3i
b = -9.0000 + 3.0000i
>> a + b
ans = -6.0000 - 2.0000i
>> a - b
ans = 12.0000 - 8.0000i
>> a*b
ans = -12.0000 +54.0000i
>> a/b
ans = -0.4667 + 0.4000i
The find command lets you find the places in an array where the entries obey a specified
condition. Example.
>> v = [ 0.6, 0.55, 0.3, 0.1, 0.98]
>> index = find ( v > 0.5)
>> a = v(index) the result will be, a = [0.6, 0.55, 0.98]

1.7 Basics of Programming


User defined function
A function file (’m-file’) must begin with a function definition line. In this line
we define the name of the function, the input and the output variables. Type this
example in the editor window, and assign it the name ’temperature’ (’File’ -> ’New’
-> ’M-File’):
This function is named ’temperature.m’. It has one input value ’x’ and two outputs,
’c’ and ’f’. function [c f] = temperature(x)
f = 9*x/5 + 32;
c = (x - 32) * 5/9;

Then, you can run the Matlab function from the command window, like this:

>> [cent fahr] = temperature(32)
%—-user defined function for finding mean and standard deviation—
function [m,sd] = stat(x)
n = length(x);
m = sum(x) / n;
sd = sqrt(sum((x - m).^2)/n);

%-Calling a function
>>x=rand(1,100)*50 % generate 100 random data between 0 and 50
>>[a s]=stat(x) %stat is the name of the user defined function
Creating M-Files from Command History
If there is part of your current Matlab session that you would like to put into an M-file,
this is easily done using the Command History window:
1 Open this window by selecting View -> Command History.
2 Use Shift+Click or Ctrl+Click to select the lines you want to use. Matlab
highlights the selected lines.
3 Right click once, and select Create M-File from the menu that appears. Matlab
creates a new Editor window displaying the selected code.
Commenting out a block of code
To comment out a block of text or code within the Matlab editor,
1 Highlight the block of text you would like to comment out.
2 Holding the mouse over the highlighted text, select Text -> Comment (or Un-
comment, to do the reverse) from the toolbar. (You can also get these options by
right-clicking the mouse.)

Matlab control statements


i) IF statement
if expression
statements
[elseif expression
statements]
[else
statements]
end %...end of if
ii) for statement
for variable = initial value : step value : final value
statements
end %...end of for

iii) while statement
while expression
statements
end %...end of while
iv) switch statement
switch expression
case constant(s)
statements
case constant(s)
statements
otherwise
statements
end %...end of switch
%more than one constant can be enclosed in braces, e.g. case { v1,v2 }
break terminate execution of while or for loop
continue pass control to the next iteration of for or while
return statement can be used to force an early return.(Termination)

Sample programs
1. %....program for demonstrating if ...else/ if . . . elseif. . .
clc
n=input (’Enter any integer number:’);
if mod(n,2)==0
% disp ’ is an even number’
fprintf(’%d is an even number’,n)
else
fprintf(’%d is an odd number’,n)
% disp ,’ is an odd number’
end
if sign(n)==-1
fprintf(’%d is a negative number’,n)
elseif sign(n)>0
fprintf(’%d is a positive number’,n)
else
fprintf(’Number is zero’)
end
2. % program for printing fibonacci series and forms a vector result using
for statement

echo off
clc
f1=0;
f2=1;
fprintf(’Generating fibonacci series’)
n=input(’How many no.of terms you want?’)
f1
f2
result=[f1 f2]
for i=1:n
f3=f1+f2 % a ; here would suppress the output
result=[result f3];
f1=f2;
f2=f3;
end
disp(result)

1.8 File Operations


Accessing data from disk files
To generate a matrix of size 5 X 5 randomly
>> x=round(rand(5)*10)
>>save sample.txt x -ascii to save the values in the x variable
>> dir sam*
sample.txt
>> m1=load(’sample.txt’);
m1 will hold the data read from the sample.txt file.

To load data that is in mixed formats, use textread instead of load. The textread
function lets you specify the format of each piece of data. If the first line of file
mydata.dat is
Annamalai 12.34 45
Reading the first line of the file as a free format file using the % format
>> [names, x, y] = textread(’mydata.dat’, ’%s %f %d’, 1) returns
names =
’Annamalai’
x=

12.34000000000000
y=
45
Low level file operations
Matlab provides low-level file i/o functions for reading and writing any binary or
formatted ASCII file. These functions resemble their ANSI C programming lan-
guage counterparts. fopen(), fclose(), fread(),fwrite(), fscanf(), fprintf(), fgetl(),fgets(),
sprintf(), sscanf(),ferror(), feof(), fseek(), ftell(), frewind()

Eg. Create a text file called exp.txt containing a short table of the exponential function.
(On Windows platforms, it is recommended that you use fopen with the mode set to
’wt’ to create a text file for writing.)
x = 0:.1:1; y = [x; exp(x)];
fid = fopen(’exp.txt’,’w+’); % To create a file with write mode
fprintf(fid,’%6.2f %12.8f\n’,y);
fclose(fid);
To read the data file
fid = fopen(’exp.txt’);
A = fscanf(fid, ’%g %g’, [2 inf]); %...reads column wise..
fclose(fid);
A = A’; % Transpose so that A matches the orientation of the file
frewind(fid); % Return to the beginning of the file
To display the text file
>>type exp.txt
To read the numeric data we can also use load
>>A=load(‘exp.txt’) %..all rows must contain same no.of data..
To create a file in binary mode
% Create the file and store the 5-by-5 magic square
fid = fopen(’magic5.bin’, ’w’);
fwrite(fid, magic(5));
fclose(fid);
To read a file in binary mode containing unsigned integers
% Read the contents back into an array
fid = fopen(’magic5.bin’);
m5 = fread(fid, [5, 5], ’*uint8’);
fclose(fid);

% Reading a text file

fn=input(’Enter the text filename:’,’s’);
fid=fopen(fn);
while 1 %....while true
tline = fgetl(fid);
if ~ischar(tline), break, end %..commands are separated by ,
disp(tline)
end %....end of while
fclose(fid);

%....reading a text file. . . .. another approach


fn=input(’Enter the text filename:’,’s’);
fid=fopen(fn);
while ~feof(fid)
tline = fgetl(fid);
disp(tline)
end
fclose(fid);

1.9 Graphs
The plot function has different forms depending on the input arguments. For example,
if y is a vector, plot(y) produces a linear graph of the elements of y versus the index
of the elements of y. If you specify two vectors as arguments, plot(x,y) produces a
graph of y versus x.
Matlab plots the vector on the x -axis and the value of the sine function on the y-axis.
t = 0:pi/100:2*pi;
y1 = sin(t);
y2 = cos(t);
y3 = sin(5*t);
plot(t,y1)
Matlab automatically selects appropriate axis ranges and tick mark locations.
You can assign different line styles to each data set by passing line style identifier
strings to plot. Line styles are useful if you are printing the graph on a black and
white printer. For example,
figure(2) % opens a second figure window
plot(t,y1,'-',t,y2,'--',t,y3,':')
legend(’First’,’Second’,’Third’);
The plot function opens a Figure window. If a Figure window already exists, plot

function normally clears the current Figure window and draws a new plot.
>>x=linspace(0,2*pi,30);
>>y=sin(x);
>>plot(x,y), title(‘Figure 1 : Sine Wave ‘)

We can specify colors, markers, and linestyles by giving plot a third argument after
each pair of data arrays. This optional argument is a character string consisting of
one or more characters from the following list.

Symbols for colors, markers and line styles

Colors: y yellow, m magenta, c cyan, r red, g green, b blue, w white, k black
Markers: . point, o circle, x x-mark, + plus, * star, s square, d diamond,
v triangle (down), ^ triangle (up), < triangle (left), > triangle (right),
p pentagram, h hexagram
Line styles: - solid, : dotted, -. dashdot, -- dashed

hold command holds current graph.


hold on holds the current plot and all axis properties so that subsequent graphing
commands add to the existing graph.
hold off returns to the default mode whereby plot commands erase the previous plots
and reset all axis properties before drawing new plots.
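A small example of hold (the data here is arbitrary):

x = linspace(0, 2*pi, 100);
plot(x, sin(x))          % first curve
hold on                  % keep it while adding more curves
plot(x, cos(x), 'r--')   % second curve drawn on the same axes
hold off                 % the next plot command starts a fresh graph
legend('sin', 'cos')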

To plot subplots
>> y=fix(rand(3,2)*50+1)
y=
34 24
35 28
37 7
>> subplot(2,2,1)

>> bar(y,’Group’)
>> title ’ Group’
>> subplot(2,2,2)
>> bar(y,’Stack’)
>> title ’ Stack’
>> subplot(2,2,3)
>> bar(y,1.5)
>> title ’ Width 1.5’
>> subplot(2,2,4)
>> barh(y,’Stack’)
>> x= round(rand(5)*50);
>> y=sum(x)
y=
88 184 144 249 152
>> explode=[0 0 1 0 1]
explode =
0 0 1 0 1
>> pie(y,explode)
>> figure %...........New figure.............
>> pie([2 4 3 5],{’North’,’South’,’East’,’West’})
>> pie([2 4 3 5],[0 0 1 0],{’North’,’South’,’East’,’West’})
>> pie3([2 4 3 5],[0 0 1 0],{’North’,’South’,’East’,’West’})
3-D Plots, Mesh and Surface
Line Plots in 3-D
The 3-D analog of the plot function is plot3. If x, y, and z are three vectors of the
same length, plot3(x,y,z) generates a line in 3-D through the points whose coordinates
are the elements of x, y, and z and then produces a 2-D projection of that line on the
screen. For example these statements produce a helix:
t = 0:pi/50:10*pi;
plot3(sin(t),cos(t),t)
axis square; grid on

Mesh and Surface Plots


The mesh and surf functions create 3-D surface plots of matrix data. If Z is a matrix
for which the elements Z(i,j) define the height of a surface over an underlying (i,j)
grid, then mesh(Z) generates a colored, wire-frame view of the surface and displays it
in a 3-D view. Similarly, surf(Z) generates a colored, faceted view of the surface and

displays it in a 3-D view.
[X,Y] = meshgrid([-2:.1:2]);
Z = X.*exp(-X.^2-Y.^2);
plot3(X,Y,Z)
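The same grid can also be rendered with mesh and surf directly; a short sketch:

[X,Y] = meshgrid(-2:.1:2);
Z = X.*exp(-X.^2 - Y.^2);
subplot(1,2,1), mesh(Z)  % colored wire-frame view of the surface
subplot(1,2,2), surf(Z)  % colored faceted view of the surface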

Pie Plots
pie(x) draws pie plots of the data in the vector x. The values in x are normalized
via x/sum(x) to determine the area of each slice of the pie. If sum(x) is less than or
equal to 1, the values in x directly specify the areas of the pie slices, and only a partial
pie is drawn if sum(x) is less than 1. The '%' sign indicates that the rest of the line is
a comment; Matlab ignores it, and it exists only for explanatory purposes.
Example
% Clears variables, command window, and closes all figures
clc; clear; close all
% These are the names of the slices
names = char('Region 1', 'Region 2', 'Distr. 3', 'Distr. 4');
% These are the numbers to be plotted
data = [1200, 500, 300, 120];
pie(data)
% gtext('string') displays the graph window, puts up a cross-hair, and waits for a
% mouse button or keyboard key to be pressed. (Check the output)
for i=1:4
    gtext(names(i,:));
end
title('Sales', 'fontsize', 15)

3D-Pie graphs

p = [3.3 2.6 .69 .4 .3];
pie3(p)
title('Interesting Chart')


In the expression 'pie3(V, explode, labels)', 'explode' specifies whether to separate
a slice from the center of the plot. The slice V(i,j) is offset from the center of the pie
plot if explode(i,j) is nonzero. 'explode' must be the same size as V. 'labels' specifies
text labels for the slices. The number of labels must equal the number of elements in V.
Example
d = [1 2 3 2.5];
pie3(d, [1 0 1 0], {'Label 1', 'Label 2', 'Label 3', 'Label 4'})
title('Pie Chart showing explosions...')

Graph Annotation
Matlab provides commands to label each axis and place text at arbitrary locations on
the graph. These commands include:
title – adds a title to the graph
xlabel – adds a label to the x-axis
ylabel – adds a label to the y-axis
zlabel – adds a label to the z-axis
legend – adds a legend to an existing graph
text – displays a text string at a specified location
gtext – places text on the graph using the mouse
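A minimal sketch combining these annotation commands on a single plot is given below;
the signal and the label strings are illustrative only and are not taken from the notes above.

% Annotating a simple plot with title, axis labels, legend and text
t = 0:0.01:1;
plot(t, sin(2*pi*5*t), t, cos(2*pi*5*t))
title('Annotation example')
xlabel('Time (s)')
ylabel('Amplitude')
legend('sine', 'cosine')
text(0.5, 0, 'text placed at (0.5, 0)')
% gtext('click to place this string')   % interactive; uncomment to try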

1.10 Accessing Image and Audio Files


%—–Reading and Displaying image files—–
>> kr=imread('rajan.jpg'); % to read an image file (bmp, jpg, etc.)
>> imshow(kr) % to display the image
>> imhist(kr(:,:,1)) % to find and plot the histogram of one band
>> imshow(histeq(kr(:,:,1))) % to display the histogram equalized image
>> x=1:size(kr,2); y=1:size(kr,1); z=double(kr(:,:,1));
>> mesh(x,y,z)
>> figure %...opens a second figure window
>> contour(x,y,z)
>> contour3(x,y,z)
>> contourf(x,y,z)
>> shading interp;
%—–Reading and Playing Wave files—–
>> Y=wavread('test.wav');
>> wavplay(Y)   % or: sound(Y)
>> [Y F]=wavread('test.wav');
>> wavplay(Y,F)
%.....function to record audio in a given file
function record(fn)
yn=input('Press ENTER key to record your voice...');
Y=wavrecord(8000,8000,1);
wavwrite(Y,8000,8,fn);

%.....function to play a given file


function play(fn)
[y f]=wavread(fn);
wavplay(y,f);
%...program to find the descriptive statistics of
%...a given audio data file
fn=input('Enter the filename:','s');
d=load(fn);
fprintf('Mean = %f\n',mean(d))
fprintf('Standard deviation = %f\n',std(d))
fprintf('Variance = %f\n',var(d))
fprintf('Average power = %f\n',mean(d.^2))
fprintf('Average magnitude = %f\n',mean(abs(d)))
prod=d(1:length(d)-1).*d(2:length(d));
crossing=length(find(prod<0));
fprintf('Zero crossings = %.0f\n',crossing);
subplot(2,1,1),plot(d)
title('Plot of the data')
xlabel('Index')
grid
%—–Finding Eigenvalues and Eigenvectors—–
>> b=[4 3 2; 5 6 7; 8 2 3];
>> [EV Eval]=eig(b); % EV: eigenvectors, Eval: diagonal matrix of eigenvalues
%—–Creating EXE file from .m file—–
>> mex -setup % to specify the C++/VC++ compiler
>> mcc -B sglcpp filename.m % to create exe using C++ compiler

Chapter 2

Basics of Speech Processing

by
S. JOTHILAKSHMI
Lecturer (Senior Scale),
Department of CSE, Annamalai University.

2.1 Signal
Anything which carries information is a signal. Mathematically a signal is a function
of one or more variables. When the function depends on a single variable, the signal
is said to be one dimensional such as speech signal (Fig. 2.1). When the function
depends on two or more variables, the signal is said to be multidimensional. An
image is a two-dimensional signal, with the vertical and horizontal coordinates
representing the two dimensions. A video signal is a sequence of images. A point in a
video is identified by its position (two-dimensional) and by the time at which it occurs,
so a video signal has a three-dimensional domain. Processing a signal to extract some
useful information is known as signal processing.

2.2 Speech Signal


Speech is a natural mode of communication among human beings. Speech signal car-
ries information about the message to be conveyed, speaker identity and language
information. For communication among human beings, there is no need for speech
processing, since they are endowed with both speech production and perception mech-
anisms. But, if a machine is placed in the communication chain, it needs speech
processing because it does not have the knowledge of production and perception. All
the information required to perform the basic speech processing tasks is implicitly
present in the speech. The fundamental issue in speech processing is how to extract
specific features to perform the desired speech processing tasks.

Fig. 2.1: Speech signal.

2.3 Speech Production Mechanism


Speech is produced by exciting time varying vocal tract system with time varying
excitation. The schematic diagram of the physiology of speech production is shown in
Fig. 2.2.

Fig. 2.2: Physiology of speech production.

Speech production mechanism essentially consists of a vibrating source of sound
coupled to a resonating system. For a majority of the sounds produced, the larynx
acts as the vibrating source, and the air column from the larynx to the lips, referred to as
the vocal tract, acts as the system. The vocal tract system consists of the pharynx, oral
cavity and nasal cavity. But to produce some special sounds called nasal sounds, the
nasal tract also plays an important role along with the vocal tract. The nasal tract
begins at the velum and ends at the nostrils. When the velum is lowered, the nasal
tract is acoustically coupled to the vocal tract to produce the nasal sounds of speech.
But it is a known fact that no sound can be produced without a supply of force or
energy. It is the breathing mechanism, consisting of the lungs and muscles of the chest
and abdomen, that constitutes the energy supply. By the use of laryngeal muscles the
vocal cords can be brought together so as to form a shelf across the air way, which leads
from the lungs into the trachea. While the edges of cords are held together, pressure
on the underside of the shelf rises. When it reaches a certain level, it is sufficient to
overcome the resistance offered by the obstruction, and so the vocal cords open. The
ligaments and muscle fibers that make up the vocal cords have a degree of elasticity,
and having been forced out of position, they tend to return as rapidly as possible to
their initial position. The pressure rises again and the cycle of opening and closing is
repeated. The major excitation for speech production is due to periodic vibration of the
vocal folds. This is also known as voiced excitation, since all vowels are produced with
this excitation. Other excitations are due to either complete or narrow constriction at
different places in the vocal tract system. The vocal tract system produces different sound
units in response to different excitations by assuming different shapes.

2.4 Digitizing Speech Signals


For processing speech signal, the acoustic variations (pressure variations) are to be
represented in digital domain. This can be done as follows: A microphone is used
to pickup these acoustic variations (air pressures) and convert them into equivalent
analog electrical variations. This analog electrical signal is converted to digital signal
using an analog to digital converter, which consists of sampler followed by quantizer.
The speech signal can be recorded in Windows using
Start⇒Programs⇒Accessories⇒Entertainment⇒Sound recorder

2.4.1 Sampling
Sampling takes a snapshot of the input signal at an instant of time. Sampling process
converts an analog signal x(t) into a corresponding sequence of samples x(n) that are

spaced uniformly in time. It is convenient to represent the sampling operation by a
switch. The switch closes for a very short interval of time, during which the signal
appears at the output. The time interval between successive samples is ts seconds,
which is called the sampling period, and the sampling frequency is given by fs = 1/ts Hz.
Thus using sampling process, if we sample the signal x(t) at a uniform rate for
example once in every ts seconds, then we obtain the sequence of samples spaced ts
seconds apart. When we produce the sequence x(n) by sampling x(t), we want to
ensure that all the information in the original signal is preserved in the samples. If
there is no information loss, then we can exactly recover the continuous time signal
from the samples. Usually the sampling rate must be selected properly, so that the
sequence of samples uniquely defines the original analog signal. The sampling theorem
states the condition for selecting the sampling rate.
The sampling theorem states that the original analog signal can be reconstructed
from the sequence of samples with minimal distortion if the sampling rate fs is greater
than twice the highest frequency present in the signal.
fs ≥ 2fmax (2.1)
where fmax - Highest frequency in the signal. The sampling rate fs = 2fmax is
known as the Nyquist rate.
The sampling rate must always be greater than the Nyquist rate in order to avoid
frequency aliasing. The superimposition of high-frequency components onto lower
frequencies is known as frequency aliasing. The highest frequency present in a speech
signal is approximately 4 kHz, so the sampling rate should satisfy fs ≥ 8 kHz.
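The effect of the sampling theorem can be checked with a short sketch: a 1 kHz tone
sampled at 8 kHz (above the Nyquist rate) keeps its shape, while the same tone sampled
at 1.5 kHz (below the Nyquist rate) is aliased. The frequencies and rates here are chosen
only for illustration.

% Sampling a 1 kHz tone above and below the Nyquist rate
f0  = 1000;                    % tone frequency in Hz
fs1 = 8000;                    % sampling rate above 2*f0 (no aliasing)
fs2 = 1500;                    % sampling rate below 2*f0 (aliasing)
t1  = 0:1/fs1:0.005;           % 5 ms of samples at each rate
t2  = 0:1/fs2:0.005;
x1  = sin(2*pi*f0*t1);
x2  = sin(2*pi*f0*t2);
subplot(2,1,1); stem(t1*1000, x1)
xlabel('Time in ms'); ylabel('Amplitude'); title('fs = 8 kHz (fs > 2 fmax)');
subplot(2,1,2); stem(t2*1000, x2)
xlabel('Time in ms'); ylabel('Amplitude'); title('fs = 1.5 kHz (aliased)');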

2.4.2 Quantization
The sampled analog signal must be converted from a voltage value to a binary number
that the computer can read. The conversion from an infinitely precise amplitude to
a binary number is called quantization. During quantization, the A/D converter
uses a finite number of evenly spaced values to represent the analog signal. The
number of different values is determined by the number of bits used for the conversion.
Typically, the converter selects the digital value that is closest to the actual sampled
value. Fig. 2.3 depicts the sampling and quantization process.
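A rough sketch of uniform quantization is given below: each sample is mapped to the
nearest of 2^B evenly spaced levels. The bit depth and the test signal are assumptions
made only for illustration.

% Uniform quantization of a sampled sine wave to B bits
fs = 8000;  t = 0:1/fs:0.01;
x  = sin(2*pi*200*t);               % sampled signal in the range [-1, 1]
B  = 3;                             % number of bits
L  = 2^B;                           % number of quantization levels
step = 2/L;                         % step size for the range [-1, 1]
xq = step*floor(x/step) + step/2;   % mid-rise uniform quantizer
xq = min(xq, 1 - step/2);           % clamp the top level
plot(t*1000, x, t*1000, xq)
xlabel('Time in ms'); ylabel('Amplitude');
legend('sampled signal', 'quantized signal')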

2.5 Phoneme
The basic sounds of a language, for example, ”a” in the word ”father,” are called
phonemes. A phoneme is a member of the set of the smallest units of speech that

Fig. 2.3: Sampling and quantization processes

serve to distinguish one utterance from another in a language. A unit of speech is


considered a phoneme if replacing it in a word results in a change of meaning. Here
are some examples of phonemes:

• pin becomes bin


• bat becomes rat
• cot becomes cut

Each language has its own phonemes. For example there are 42 phonemes in American
English [5].

2.6 Syllable
A word can be divided into syllables. Each syllable is a sound that can be said without
interruption and is usually a vowel which can have consonants before and/or after
it. A syllable is a unit of organization for a sequence of speech sounds. For example,
the word water is composed of two syllables: wa and ter. A syllable has a vowel at its
core, or center. This part of the syllable is known as the nucleus. It may also have
an onset (one or more consonants at the beginning) and/or a coda (one or more
consonants at the end). In this way, a minimal syllable would consist of a single vowel
(such as the word I); and other combinations would be as follows:

• onset + nucleus - tea


• nucleus + coda - eat
• onset + nucleus + coda - food

2.7 Speech Sounds Categories


Speech sounds are broadly classified into three distinct classes as follows:

1. Periodic (Sonorants, Voiced) - created by vibrating vocal cords periodically.


2. Noisy (Fricatives, Un-Voiced) - air moving quickly through a small hole.
3. Impulsive (Plosive) - pressure built up behind a blockage is suddenly released.

Example: In the word shop, the sh, o, and p are unvoiced, voiced and plosive
sounds, respectively.

2.8 Pitch
In vowel production, air forced from the lungs flows through the vocal cords, which are
two masses of flesh, causing periodic vibration of the cords whose rate gives the pitch
of the sound. The pitch is the fundamental frequency of the vocal cord vibration (also
called F0); the spectrum also contains 4-5 formants (F1-F5) at higher frequencies. The
typical values of pitch are:

• male: 85-155 Hz;

• female: 165-255 Hz;

Pitch period (fundamental period) is the reciprocal of the pitch. Average values for
the pitch period are around 8 ms for male speakers, and 4 ms for female speakers.
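As a quick worked example (using pitch values consistent with the ranges above), the
pitch period is simply the reciprocal of the pitch frequency:

% Pitch period as the reciprocal of the pitch frequency
f0_male   = 125;           % Hz, a representative male pitch
f0_female = 250;           % Hz, a representative female pitch
T0_male   = 1/f0_male      % = 0.008 s (8 ms)
T0_female = 1/f0_female    % = 0.004 s (4 ms)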

2.9 Formant Frequencies


The resonance frequencies of the vocal tract tube are called formant frequencies or
simply formants. The formant frequencies depend upon the shape and dimensions of
the vocal tract. Each shape is characterized by a set of formants. Different sounds
are formed by varying the shape of the vocal tract. Formants in the sound of the
human voice are particularly important because they are essential components in the
intelligibility of speech. For example, the distinguishability of the vowel sounds can
be attributed to the differences in their first three formant frequencies.

Fig. 2.4: Formant frequencies of speech sounds ”oh” and ”ee”

The ideal frequency responses of the vocal tract as it produces the sounds "oh" and
"ee" are shown in Fig. 2.4 on the top left and top right, respectively. The spectral peaks
are known as formants, and are numbered consecutively from low to high frequency.
The bottom plots show speech waveforms corresponding to these sounds.
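The notes above describe formants only graphically (Fig. 2.4). As a rough sketch, the
first few formants of a voiced frame can also be approximated from the roots of an LPC
polynomial (LPC itself is discussed in Section 2.17.6); the file name, frame location and
model order below are assumptions, not values from these notes.

% Rough formant estimation from the roots of an LPC polynomial (sketch only)
[x, fs] = wavread('wavsep1.wav');         % assumed 8 kHz speech file
frame = x(1200:1400) .* hamming(201);     % one voiced frame (assumed location)
a = lpc(frame, 10);                       % 10th-order LPC model
r = roots(a);                             % poles of the vocal tract model
r = r(imag(r) > 0);                       % keep one root of each conjugate pair
f = sort(atan2(imag(r), real(r)) * fs/(2*pi));   % root angles -> frequencies in Hz
formants = f(f > 90 & f < fs/2 - 90)      % rough estimates of the first formants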

2.10 Fast Fourier Transform


Fourier analysis is extremely useful for data analysis, as it breaks down a signal into
constituent sinusoids of different frequencies. For sampled vector data, Fourier analysis
is performed using the discrete Fourier transform (DFT). The Fast Fourier trans-
form (FFT) [6] is an efficient algorithm for computing the DFT of a sequence; it is
not a separate transform. It is particularly useful in areas such as signal and image

processing, where its uses range from filtering, convolution, and frequency analysis to
power spectrum estimation.
For an input sequence x of length N, the DFT is a vector X of length N. FFT and IFFT
implement the relationships

X(k) = Σ_{n=0}^{N−1} x(n) e^{−j(2πk/N)n},   0 ≤ k ≤ N − 1        (2.2)

x(n) = (1/N) Σ_{k=0}^{N−1} X(k) e^{j(2πk/N)n},   0 ≤ n ≤ N − 1        (2.3)
Fig. 2.5 describes the computation of FFT of the signal x which is the combination
of two sinusoids 50 Hz and 200 Hz and IFFT. The corresponding Matlab code is given
below.

Fig. 2.5: Principle of FFT and IFFT

Matlab code for FFT and IFFT

t = 0:0.001:0.1;
x1 = sin(2*pi*50*t);
subplot(3,2,1); plot(1000*t(1:100), x1(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
title('signal x1=sin(2*pi*50*t)');

x2 = 0.5*sin(2*pi*200*t);
subplot(3,2,2); plot(1000*t(1:100), x2(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
title('signal x2=0.5*sin(2*pi*200*t)');

x = x1 + x2;
subplot(3,2,3); plot(1000*t(1:100), x(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
title('signal x=x1+x2');

Y = fft(x, 256);
m = abs(Y);
f = 1000*(1:128)/256;
subplot(3,2,4);
plot(f, m(1:128))
xlabel('Frequency in Hz'); ylabel('Magnitude');
title('FFT of signal x');

y = ifft(Y);
subplot(3,2,5); plot(1000*t(1:100), y(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
title('IFFT signal');

2.11 Discrete Cosine Transform


A discrete cosine transform (DCT) expresses a sequence of finite data points in terms
of a sum of cosine functions oscillating at different frequencies. The DCT is a mathe-
matical operation that transforms a set of data, sampled at a given sampling
rate, into its frequency components. The number of samples should be finite, and a power
of two for optimal computation time. The DCT is closely related to the discrete

Fourier transform. The DCT, however, has better energy compaction properties, with
just a few of the transform coefficients representing the majority of the energy in the
sequence. The energy compaction properties of the DCT make it useful in applications
requiring data reduction.
The DCT of an input sequence x is
X(k) = α(k) Σ_{n=0}^{N−1} x(n) cos[ (2n + 1)kπ / (2N) ],   0 ≤ k ≤ N − 1        (2.4)

where

α(k) = √(1/N)  if k = 0,   and   α(k) = √(2/N)  if 1 ≤ k ≤ N − 1        (2.5)

A one-dimensional DCT converts an array of numbers, which represent signal


amplitudes at various points in time, into another array of numbers, each of which
represents the amplitude of a certain frequency components from the original array.
The resultant array contains the same number of values as the original array. The first
element in the resultant array is a simple average of all the samples in the input array
and is referred to as DC coefficient. The remaining elements in the resultant array
indicate the amplitude of a specific frequency component of the input array, and are
known as AC coefficients. The frequency content of the sample set at each frequency
is calculated by taking a weighted average of the entire set, as shown in Fig. 2.7. Each set of
weight coefficients is a sampled cosine wave whose frequency is proportional to the resultant
array index, as shown in Fig. 2.6.
Inverse Discrete Cosine Transform (IDCT)
IDCT computes the inverse DCT for an input sequence, which is reconstructing a
signal from a complete or partial set of DCT coefficients. The inverse discrete cosine
transform is
x(n) = Σ_{k=0}^{N−1} α(k) X(k) cos[ (2n + 1)kπ / (2N) ],   0 ≤ n ≤ N − 1        (2.6)

The Matlab code for computing DCT and IDCT is given below.

Matlab code for DCT and IDCT

t = 0:0.001:0.3;
x1 = sin(2*pi*50*t);
subplot(4,2,1); plot(1000*t(1:100), x1(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
Fig. 2.6: Graphical representation of DCT weights

title('signal x1=sin(2*pi*50*t)');

x2 = 0.5*sin(2*pi*200*t);
subplot(4,2,2); plot(1000*t(1:100), x2(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
title('signal x2=0.5*sin(2*pi*200*t)');

x = x1 + x2;
subplot(4,2,3); plot(1000*t(1:100), x(1:100))
xlabel('Time in ms'); ylabel('Amplitude'); title('signal x=x1+x2');

Y = dct(x); subplot(4,2,4); plot(Y)
xlabel('DCT coefficient Index'); ylabel('Magnitude');
title('DCT of signal x');
Fig. 2.7: Computation of DCT

%--------- IDCT signal with complete set of coefficients ----------

y = idct(Y); subplot(4,2,5); plot(1000*t(1:100), y(1:100))
xlabel('Time in ms'); ylabel('Amplitude'); title('IDCT signal');

%--------- IDCT signal with partial set of coefficients -----------

y2 = find(abs(Y) < 0.5);
Y(y2) = zeros(size(y2));
subplot(4,2,6); plot(Y)
xlabel('DCT coefficient Index'); ylabel('Magnitude');
title('DCT of signal x with partial set of coefficients');

z = idct(Y);
subplot(4,2,7); plot(1000*t(1:100), z(1:100))
xlabel('Time in ms'); ylabel('Amplitude');
title('IDCT signal with partial set of coefficients');
Because of the energy compaction mentioned above, it is possible to reconstruct a
signal from only a fraction of its DCT coefficients. For example, generate a sinusoidal
sequence, compute its DCT, and reconstruct the signal using only those components
whose absolute value is greater than 0.5. A plot of the original and reconstructed
sequences will be as shown in Fig. 2.8. The reconstructed signal retains approximately
85 percent of the energy in the original signal.

Fig. 2.8: Energy compaction of DCT

2.12 Convolution
Convolution is a mathematical way of combining two signals to form a third signal.
Convolution is a formal mathematical operation, just as multiplication and addition.
Addition takes two numbers and produces a third number, while convolution takes two
signals and produces a third signal. Convolution can be understood in two separate
ways. The first looks at convolution from the viewpoint of the input signal. This
involves analyzing how each sample in the input signal contributes to many points
in the output signal. The second way looks at convolution from the viewpoint of

the output signal. This examines how each sample in the output signal has received
information from many points in the input signal.
The convolution is the same operation as multiplying the polynomials whose coef-
ficients are the elements of two signals. The convolution of two signals u(n) and v(n)
is given by
w(k) = Σ_{j=−∞}^{∞} u(j) v(k + 1 − j)        (2.7)

The output sample is simply a sum of products involving simple arithmetic opera-
tions such as additions, multiplications and delays. But practically u(n) and v(n) are
finite in length. If the lengths of the two sequences being convolved are m and n, then
the resulting sequence after convolution is of length m+n-1 and is given by
w(k) = Σ_{j=max(1,k+1−n)}^{min(k,m)} u(j) v(k + 1 − j)        (2.8)

When m = n, this gives


w(1) = u(1)v(1)
w(2) = u(1)v(2) + u(2)v(1)
w(3) = u(1)v(3) + u(2)v(2) + u(3)v(1)
..
.
w(n) = u(1)v(n) + u(2)v(n − 1) + ... + u(n)v(1)
..
.
w(2n − 1) = u(n)v(n)
The convolution theorem says, roughly, that convolving two sequences is the same
as multiplying their Fourier transforms. So it can be simply implemented as follows

convolution(u, v) = IFFT(U·V)        (2.9)

where U and V are the FFT of u and v respectively. The Matlab code using this
equation and the Matlab function directly is given below.

Matlab code for convolution

u = [-2 0 1 -1];
v = [1 2 0 -1];

%--------- Using the equation ---------

U = fft([u zeros(1, length(v)-1)]);   % zero padding
V = fft([v zeros(1, length(u)-1)]);   % zero padding
w1 = round(ifft(U.*V))

%------ Using Matlab function directly ------

w = conv(u, v)

OUTPUT:
w1 = -2 -4 1 3 -2 -1 1
w  = -2 -4 1 3 -2 -1 1

2.13 Correlation
Correlation is basically used to compare two signals. Correlation measures the simi-
larity between two signals. It is divided into two types namely cross correlation and
auto correlation. Cross correlation between a pair of signals x(n) and y(n) is given
by

rxy(l) = Σ_{n=−∞}^{∞} x(n) y(n − l),   l = 0, ±1, ±2, . . .        (2.10)

The parameter l, called the lag, indicates the time-shift between the pair. The time
sequence y(n) is said to be shifted by l samples with respect to the reference sequence
x(n) to the right for positive values of l, and shifted by l samples to the left for
negative values of l.
The correlation process is essentially the convolution of two sequences in which
one of the sequences has been reversed. The Matlab code is given below.

Matlab code for correlation

u = [1 2 1 1];
v = [1 1 2 1];
r = conv(u, fliplr(v))

OUTPUT:
r =
     1 4 6 6 5 2 1

%------------- Using Matlab function directly -------------

u = [1 2 1 1];
v = [1 1 2 1];
r1 = xcorr(u, v)

OUTPUT:
r1 =
     1.0000 4.0000 6.0000 6.0000 5.0000 2.0000 1.0000


When the same signal is used for x(n) and y(n), cross correlation becomes an
autocorrelation.

Matlab code for autocorrelation

u = [1 2 1 1];
auto = xcorr(u)

OUTPUT:
auto =
     1.0000 3.0000 5.0000 7.0000 5.0000 3.0000 1.0000

2.14 Low Pass Filter


A digital filter is just a filter that operates on digital signals. It is a computation
which takes one sequence of numbers (the input signal) and produces a new sequence
of numbers (the filtered output signal). A digital filter is only a formula for obtaining
one digital signal from another. It may exist as an equation on paper, as a function
in a program, or as a handful of integrated circuit chips properly interconnected.
A low-pass filter is one which does not affect low frequencies and rejects high
frequencies. The spectrum of a signal gives the distribution of signal energy as a
function of frequency. The ratio of the peak output amplitude to the peak input
amplitude is the filter gain at that frequency. The phase of the output signal minus
the phase of the input signal is the phase response of the filter at that frequency.
The function giving the gain of a filter at every frequency is called the amplitude
response (or magnitude frequency response).
The signal components are eliminated (“stopped”) at all frequencies above the
cut-off frequency known as stop band, while lower-frequency components are “passed”
unchanged to the output known as pass band. The amplitude response of the ideal
lowpass filter is shown in Fig. 2.9. Its gain is 1 in the passband, which spans frequen-
cies from 0 Hz to the cut-off frequency fc Hz, and its gain is 0 in the stopband (all

Fig. 2.9: Amplitude response of LPF

frequencies above fc ). The output spectrum is obtained by multiplying the input spec-
trum by the amplitude response of the filter. The simplest lowpass filter is specified
by
y(n) = x(n) + x(n − 1), n = 1, 2, . . . , N (2.11)
This simple filter is a special case of an important class of filters called linear time-
invariant (LTI) filters. LTI filters are guaranteed to produce a sinusoid in response to
a sinusoid–and at the same frequency. LTI filters are important in audio engineering
because they are the only filters that preserve signal frequencies. The basic concept of
LPF is illustrated in Fig. 2.10.
In Matlab, there is a built-in function, which will implement the simple low pass
filter. The syntax is y = filter (A, B, x) where
x is the input signal (a vector of any length)
y is the output signal (returned equal in length to x)
A is a vector of filter feedforward coefficients (numerator coefficients of equation 2.11)
B is a vector of filter feedback coefficients (denominator coefficients of equation 2.11)

The simplest lowpass filter given in equation 2.11 is nonrecursive (no feedback), so
the feedback coefficient vector B is set to 1. The Matlab code is given below. Different
methods are available for designing LPFs; only the simplest design is mentioned here.
The LPF is the basic form of filter, and other filters can be derived easily from it.
Matlab code for low pass filter

t = 0:0.001:0.1;   % time vector (assumed; not given in the original listing)
x1 = sin(2*pi*50*t);
x2 = sin(2*pi*450*t);
x = x1 + x2;
A = [1, 1];
B = 1;
y = filter(A, B, x);

Fig. 2.10: Low pass filtering of a signal

2.15 High Pass Filter


A high pass filter is just the opposite of a low pass filter. A high-pass filter is one which
does not affect high frequencies and rejects low frequencies, so its amplitude response
is just the opposite of the response of the LPF. It passes the frequencies above the cut-off
frequency and rejects the frequencies below the cut-off frequency. So any low pass
filter can be easily converted into high pass filter. The simple LPF mentioned in the
previous section can be converted into HPF as follows

y(n) = x(n) − x(n − 1), n = 1, 2, . . . , N (2.12)

The basic concept of HPF is illustrated in Fig. 2.11. The Matlab code is given below.

Matlab code for high pass filter

t = 0:0.001:0.1;   % time vector (assumed; not given in the original listing)
x1 = sin(2*pi*50*t);
x2 = sin(2*pi*450*t);
x = x1 + x2;
A = [1, -1];
B = 1;
y = filter(A, B, x);

2.16 Spectrogram
A spectrogram is an image that shows how the spectral density of a signal varies
with time. It converts a two dimensional speech waveform(amplitude/time) into a
three dimensional pattern (amplitude/frequency/time). With time and frequency on
the horizontal and vertical axes, respectively, amplitude is noted by the darkness of
the display. The spectrogram is the magnitude of the windowed discrete-time Fourier
transform of a signal using a sliding window. A shorter window provides a wideband
spectrogram while a larger window results in a narrowband spectrogram [7]. The
Matlab code to display spectrogram is given below and the corresponding spectrogram
is given in Fig. 2.12.

Matlab code for spectrogram

s = wavread('wavsep1.wav');

s1 = s(1:8000);
specgram(s1, 24, 8000)           % wideband spectrogram
figure, specgram(s1, [], 8000)   % narrowband spectrogram

2.17 Features of Speech Signal


In speech analysis, we wish to extract features directly pertinent for different appli-
cations, while suppressing redundant aspects of the speech. The original signal may
approach optimality from the point of view of human perception, but it contains much
repetitive data when processed by computer; eliminating such redundancy aids accu-
racy in computer applications.
Speech is dynamic or time-varying: some variation is under speaker control, but
much is random. Speech analysis usually assumes that the signal properties change

Fig. 2.11: High pass filtering of a signal

relatively slowly with time. This allows examination of a short-time window of speech
to extract parameters presumed to remain fixed for the duration of the window. Most
techniques yield parameters averaged over the course of the time window. Thus, to
model dynamic parameters, the signal must be divided into successive windows or
analysis frames, so that the parameters can be calculated often enough to follow
relevant changes.

Fig. 2.12: Spectrogram of a speech signal

2.17.1 Preprocessing
To extract the features from the speech signal, the signal must be preprocessed and
divided into successive windows or analysis frames [8]. So the following steps are
performed before extracting the features.

• Preemphasis: The higher frequencies of the speech signal are generally weak.
As a result there may not be high frequency energy present to extract features at
the upper end of the frequency range. Preemphasis is used to boost the energy
of the high frequency signals. The output of the preemphasis, ŝ(n) is related to
the input s(n) by the difference equation

ŝ(n) = s(n) − αs(n − 1) (2.13)

The typical value for α is 0.95.


• Frame blocking : Speech analysis usually assumes that the signal properties
change relatively slowly with time. This allows examination of a short time

window of speech to extract parameters presumed to remain fixed for the du-
ration of the window. Thus to model dynamic parameters, we must divide the
signal into successive windows or analysis frames, so that the parameters can
be calculated often enough to follow the relevant changes. The preemphasized
speech signal, ŝ(n) is blocked into frames of N samples (frame size), with ad-
jacent frames being separated by M samples (frame shift). If we denote the lth
frame of speech by xl (n), and there are L frames within the entire speech signal,
then

xl(n) = ŝ(Ml + n),   0 ≤ n ≤ N − 1,   0 ≤ l ≤ L − 1        (2.14)

• Windowing: The next step in the processing is to window each individual


frame so as to minimize the signal discontinuities at the beginning and end of
the frame. The window must be selected to taper the signal to zero at the
beginning and end of each frame. If we define the window as w(n), 0 ≤ n ≤
N-1, then the result of windowing the signal is

x̃l (n) = xl (n)w(n), 0 ≤ n ≤ N − 1 (2.15)

The Hamming window is commonly used, which has the form


 
w(n) = 0.54 − 0.46 cos( 2πn / (N − 1) ),   0 ≤ n ≤ N − 1        (2.16)

The Matlab code is given below. The corresponding waveforms are shown in
Fig. 2.13.

Matlab code for preprocessing

s = wavread('wavsep1.wav');

% Note: if matrix sizes mismatch when changing the frame size and
% frame shift, clear the workspace or rename the arrays.
nos = length(s); % Total no. of samples
fs = 8000;       % sampling rate
m = 64;          % Frame shift
n = 128;         % Frame size
subplot(5,1,1)
plot(s(832:960))
title('signal samples');
%-------------------- Preemphasis ----------------------
for k = 2:nos
    p(k) = s(k) - 0.95*s(k-1);
end
subplot(5,1,2)
plot(p(832:960))
title('After preemphasis');
%------------ Calculating total no. of frames ----------
nof = floor((nos - n)/m) + 1;
%--------------- Frame blocking -------------------------
for i = 1:n
    for j = 1:nof
        fr(i,j) = p(((j-1)*m) + i);
    end
end
subplot(5,1,3)
plot(fr(:,13))
title('A frame of speech signal');
%--------------------- Windowing ------------------------
h = hamming(n);
subplot(5,1,4)
plot(h)
title('Hamming window');
hfr = diag(h)*fr;
subplot(5,1,5)
plot(hfr(:,13))
xlabel('Frames'); ylabel('Amplitude');
title('Hamming windowed frame');

2.17.2 Short Time Energy


The energy of a discrete time signal x is defined as

E = Σ_{m=−∞}^{∞} x²(m)        (2.17)

It gives little information about the time dependent properties of the speech signals.
In speech processing the energy of each analysis frame is important instead of the
complete signal energy. This is known as short time energy. The short time energy

Fig. 2.13: Preprocessing of a speech frame

of lth frame is simply the sum of squares of all the samples in the frame.

El = Σ_{n=0}^{N−1} (x̃l(n))²        (2.18)

where x̃l (n) is the windowed analysis frame obtained at the end of preprocessing
step. The Matlab code for computing energy is given below and the energy of a speech
signal is given in Fig. 2.14.

Matlab code for short time energy

%----------- After applying preprocessing use this code ------------

en = sum(hfr.^2);
plot(en);

Fig. 2.14: Short time energy of a speech signal

2.17.3 Short Time Average Zero crossing Rate


A zero crossing rate (ZCR) is said to occur if successive samples have different
algebraic signs. The rate at which zero crossing occurs is a simple measure of the
frequency content of a signal. Thus the average zero crossing rate gives a reasonable
way to estimate the frequency content of a speech wave. Rough estimates of spectral
properties can be obtained using a representation based on short term average zero
crossing rate.

Zl = (1/2) Σ_{n=0}^{N−1} | sgn(x̃l(n)) − sgn(x̃l(n + 1)) |        (2.19)

where

sgn(x̃(n)) = 1 if x̃(n) ≥ 0,   and   −1 if x̃(n) < 0        (2.20)

All that is required is to check samples in pairs to determine where the zero crossings
occur, and then the average is computed over N consecutive samples. High frequencies

imply high zero crossing rates, and low frequencies imply low zero crossing rates. The
Matlab code is given below and the ZCR of a speech signal is given in Fig. 2.15.

Matlab code for ZCR

% After applying preprocessing use this code
for j = 1:nof
    for i = 1:n
        if hfr(i,j) >= 0
            sgn(i,j) = 1;
        elseif hfr(i,j) < 0
            sgn(i,j) = -1;
        end
    end
end
for j = 1:nof
    zcr(j) = 0;
    for i = 1:n-1
        diff = sgn(i,j) - sgn(i+1,j);
        zcr(j) = zcr(j) + abs(diff)/2;
    end
    zcr(j) = zcr(j)/(n-1);
end
plot(zcr)
xlabel('Frames'); ylabel('ZCR');
title('Short time average ZCR');

OUTPUT: (for an 8 samples frame)

0.0059 0.0015 -0.0619 -0.0589 -0.0819 0.0170 -0.0249 0.0157
ZCR
4
average ZCR
0.57

Fig. 2.15: Short time average ZCR of a speech signal

2.17.4 Short Time Autocorrelation


The autocorrelation function is a special case of the cross-correlation function [9],

φsy(k) = Σ_{m=−∞}^{∞} s(m) y(m − k)        (2.21)

which measures the similarity of two signals s(n) and y(n) as a function of the time
delay between them. By summing the products of a signal sample and a delayed sample
from another signal, the cross-correlation is large if at some delay the two signals have
similar waveforms. The range of summation is usually limited (i.e., windowed), and
the function can be normalized.
The short-time autocorrelation function is obtained by windowing s(n) and is given by

Rn(k) = Σ_{m=−∞}^{∞} s(m) w(n − m) s(m − k) w(n − m + k)        (2.22)

Different methods can be used to compute the autocorrelation; the simplest is via the FFT:

Autocorrelation of a frame of signal s = IFFT(S·S*)

where S = FFT of the signal s and S* = complex conjugate of S.


The Matlab code for autocorrelation using the Matlab function is given below (for an
N-length sequence, xcorr generates an autocorrelation of length 2N − 1).

Matlab code for short time autocorrelation

% After applying preprocessing use this code
ac = xcorr(s14);   % s14: one analysis frame of the signal

OUTPUT
5 samples frame
0.2628 0.4238 0.3942 0.2362 0.1101
Autocorrelation of the frame
0.0289 0.1087 0.2471 0.3976 0.4720 0.3976 0.2471 0.1087 0.0289
In Fig. 2.16 the property of the short-time autocorrelation to reveal periodicity
in a signal is demonstrated. Notice how the autocorrelation of the voiced speech
segment retains the periodicity. On the other hand, the autocorrelation of the unvoiced
speech segment looks like noise. In general, autocorrelation is considered a robust
indicator of periodicity.

2.17.5 Pitch Period Computation


Pitch period can be calculated as follows:

• For each frame of the signal calculate autocorrelation function


• Convert it to a binary signal such that binary signal being set to logical ”1”
where the autocorrelation exceeds a pre-selected threshold and to logical ”0”
where the autocorrelation does not exceed the pre-selected threshold
• Calculate autocorrelation function of the binary signal
• Detect peaks in the autocorrelation function of the binary signal
• Use distance between peaks in the autocorrelation function of the binary signal
as an estimate of the pitch period
The Matlab code is given below

Matlab code for pitch period estimation



Fig. 2.16: Short time autocorrelation

[s1 s] = wavread('wavsep1.wav');   % s1: samples, s: sampling rate

x = s1(1200:1400);   % use a voiced segment
a = xcorr(x);
ns = length(a);
ma = max(a);
t = 0.75*ma;         % threshold

for i = 1:ns
    if (a(i) > t)
        a(i) = 1;
    else
        a(i) = 0;
    end
end

a1 = xcorr(a);       % autocorrelation of the binary signal
n = length(a1);
pk = sort(a1, 'descend');
ma1 = max(pk);
fg = 0;

for i = 1:n
    if ((a1(i) == pk(1)) & (fg == 0))
        p1 = i
        fg = 1;
    end
end

fg1 = 0;
for i = p1+5:p1+50
    if ((a1(i) == pk(2)) & (fg1 == 0))
        p2 = i
        fg1 = 1;
    end
end

if (fg1 == 0)
    for i = p1-5:-1:p1-50
        if ((a1(i) == pk(2)) & (fg1 == 0))
            p2 = i
            fg1 = 1;
        end
    end
end
pp = abs(p2-p1)*(1/s)   % pitch period
pf = 1/pp               % pitch frequency

OUTPUT:
p1 =
401
p2 =
433
pp =

0.0040
pf =
250

2.17.6 Linear Prediction Coefficients


A given speech sample at time n, s(n), can be approximated as a linear combination
of the past p speech samples, such that

s(n) ≈ a1 s(n − 1) + a2 s(n − 2) + a3 s(n − 3) + · · · + ap s(n − p) (2.23)

Fig. 2.17: Block diagram of LPC computation

where the coefficients a1 , a2 , · · · , ap are assumed constants over the analysis frame.
The steps for computing LPC are illustrated in Fig. 2.17. All other steps except LPC
analysis have already been explained. After obtaining the autocorrelation of a win-
dowed frame, the linear prediction coefficients are obtained using the Levinson-Durbin
recursive algorithm. This is known as LPC analysis. The Matlab code for obtaining
LPC is given below (for a pth order linear prediction, p+1 coefficients are generated).
Matlab code for LPC

% After applying preprocessing use this code
a = lpc(hfr(:,5), 8);

OUTPUT
LPC of 5th frame
1.0000 -0.3205 -0.3074 -0.3699 -0.2327 -0.2199 0.2364 0.1110 0.1094

2.17.7 Linear Prediction Cepstral Coefficients
The cepstrum is a common transform used to gain information from a person’s speech
signal. It can be used to separate the excitation signal (which contains the words and
the pitch) and the transfer function (which contains the voice quality). The cepstrum
can be seen as information about rate of change in the different spectrum bands.
The cepstral coefficients are the coefficients of the Fourier transform representation
of the logarithm magnitude spectrum. Cepstral coefficients of a sequence x are the
coefficients of the inverse discrete Fourier transform (IDFT) of the log magnitude
short-time spectrum

IDFT(log(|DFT(x)|))        (2.24)


If x is the LPC vector, the cepstral coefficients are known as Linear Prediction Cepstral
Coefficients (LPCC). The LPC parameter conversion block in Fig. 2.17 will convert
LPC to LPCC. The Matlab code is given below

Matlab code for LPCC

% After getting LPC use this code

% Using the equation
y = real(ifft(log(abs(fft(a)))));   % This is for the 5th frame only

OUTPUT
LPCC of 5th frame
-0.3856 -0.5202 -0.5225 -0.5645 -0.7318 -0.7318 -0.5645 -0.5225 -0.5202

% Using the Matlab function directly
y1 = rceps(a);   % This is for the 5th frame only

OUTPUT
LPCC of 5th frame
-0.3856 -0.5202 -0.5225 -0.5645 -0.7318 -0.7318 -0.5645 -0.5225 -0.5202

2.17.8 Mel Frequency Cepstral Coefficients


The procedure of MFCC computation is shown in Fig. 2.18 and described as follows:
After preprocessing, the spectral coefficients of the windowed frames are computed
using fast Fourier transform (discussed in Section 2.10). The results of the FFT will
be information about the amount of energy at each frequency band. Human hearing

Fig. 2.18: Extraction of MFCC from speech signal

is not equally sensitive at all frequency bands. It is less sensitive at higher frequencies
roughly above 1000 Hz. MFCC is extracted using this principle. The mapping of
frequency in mel scale is linear below 1000Hz and logarithmic above 1000 Hz. So
the band edges and center frequencies of the filters are linear for low frequency and
logarithmically increase with increasing frequency as shown in Fig. 2.19. We call these
filters mel-scale filters and collectively a mel-scale filter bank. As can be seen, the
filters used are triangular and they are equally spaced along the mel scale, which is
defined by

Mel(f) = 2595 log10(1 + f/700)        (2.25)

Fig. 2.19: Mel scale filter bank
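A small sketch of equation (2.25) and its inverse is given below; the inverse form
f = 700(10^(m/2595) − 1) follows directly from the equation and is used later in the
MFCC code to map the filter centre frequencies back to Hz.

% Converting between Hz and the mel scale (equation 2.25)
hz2mel = @(f) 2595 * log10(1 + f/700);
mel2hz = @(m) 700 * (10.^(m/2595) - 1);
hz2mel(1000)            % roughly 1000 mel at 1 kHz
mel2hz(hz2mel(4000))    % the round trip returns 4000 Hz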

Each short term Fourier transform (STFT) magnitude coefficient is multiplied by


the corresponding filter gain and the results are accumulated. Then DCT is applied to
the log of the mel spectral coefficients to obtain the mel frequency cepstral coefficients.
The waveforms corresponding to these steps are given in Fig. 2.20. The Matlab code
(to be used after applying preprocessing) is given below.

Fig. 2.20: Computation of MFCC

Matlab code for MFCC

s = wavread('wavsep1.wav');

nos = length(s); % Total no. of samples
fs = 8000;       % sampling rate
m = 64;          % Frame shift
n = 128;         % Frame size

%------------- Preemphasis ---------------
for k = 2:nos
    p(k) = s(k) - 0.95*s(k-1);
end

% Calculating total no. of frames
nof = floor((nos - n)/m) + 1;

%------------- Frame blocking ------------
for i = 1:n
    for j = 1:nof
        fr(i,j) = p(((j-1)*m) + i);
    end
end
subplot(5,1,1)
plot(fr(:,13))
xlabel('samples'); ylabel('Amp.');
title('A frame of speech signal');

%------------ Windowing ------------------
h = hamming(n);
hfr = diag(h)*fr;
subplot(5,1,2)
plot(hfr(:,13))
xlabel('Frames'); ylabel('Amp.');
title('Hamming windowed frame');

%--- Computing FFT of windowed signal ----

for i = 1:nof
    frame(:,i) = fft(hfr(:,i));
end
subplot(5,1,3)
plot(abs(frame(:,13)))
xlabel('Spectral samples'); ylabel('Mag.');
title('FFT of windowed signal');

%------ Mel filter implementation --------

fl = 2595 * log10(60/700 + 1);    % log10 used to match equation (2.25)
fh = 2595 * log10(8000/700 + 1);
M = 45;

dp = (fh - fl)/M + 1;
for i = 1:M
    pc(i) = i*dp;
end
for i = 1:M
    fc(i) = 700*(10^(pc(i)/2595) - 1);
end

for k = 1:n
    f(k) = k*fs/n;
end
for k = 1:n
    for m = 2:21
        if f(k) < fc(m-1)
            H(k,m-1) = 0;
        elseif (fc(m-1) <= f(k)) & (f(k) < fc(m))
            H(k,m-1) = (f(k) - fc(m-1))/(fc(m) - fc(m-1));
        elseif (fc(m) <= f(k)) & (f(k) < fc(m+1))
            H(k,m-1) = (f(k) - fc(m+1))/(fc(m) - fc(m+1));
        elseif (f(k) >= fc(m+1))
            H(k,m-1) = 0;
        end
    end
end
subplot(5,1,4)
plot(H)
xlabel('samples'); ylabel('Amp.');
title('Mel filter');

%------ Binning the signal with the mel filter ----------

z = abs(frame')*H;

%------ Finding Cepstral coefficients -------

mfc = dct(log(z)')';   % DCT across the filter outputs of each frame
subplot(5,1,5)
stem(mfc(23,:), 'fill', '--')
xlabel('Filter index'); ylabel('Energy');
title('Mel filter coefficients');

Chapter 3

Basics of Image Processing-I

by
AN. SIGAPPI
Lecturer (Selection Grade),
Department of CSE, Annamalai University.

3.1 Digital Image


The term image refers to a two-dimensional light intensity function f (x, y), where x
and y denote the spatial coordinates and the value of f at any point (x,y) is proportional
to the brightness of the image at that point. A digital image is an image f (x, y)
that has been discretized both in spatial coordinates and brightness. A digital image
is represented as a two-dimensional array of data (matrix), where each pixel value
corresponds to the brightness of the image at the point (x,y). The elements of such a
matrix (digital array) are called image elements, picture elements, or pixels.
The types of image data are:

1. Binary images can take on two values, typically black and white, or 0 and 1.
It takes only one bit to represent each pixel.
2. Gray scale images are referred to as monochrome images. They contain
brightness information, but not color information. The number of bits used for
each pixel is typically 8, and hence there are 2^8 = 256 different brightness (gray) levels.
3. Color images can be modeled as three-band monochrome image data, where
each band of data corresponds to a different color. Typical color images are
represented as RGB (Red, Green, Blue) images, and the color image would
have 24 bits/pixel, with 8 bits for each of the three color bands.

The Matlab functions used for reading, displaying, and writing images are given
below [10]:

• imread() - The function imread reads the given image into image array f.
Example: f = imread(filename);
• imshow() - The function imshow displays the image stored in image array f,
and G is the number of intensity levels used to display it. Example: imshow(f,
G)
• imwrite() - The function imwrite is used to write images to disk. Example:
imwrite(f, filename)
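A minimal sketch combining the three functions is given below; pout.tif is a sample
image shipped with the Image Processing Toolbox, and the output file name is arbitrary.

% Read an image, display it and write a copy to disk
f = imread('pout.tif');        % read the image into array f
imshow(f)                      % display it with the default intensity levels
imwrite(f, 'pout_copy.tif')    % write the array back to disk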

3.2 Steps in Image Processing


The fundamental steps in image processing [11] are:

• Image acquisition - to acquire a digital image using an imaging sensor (e.g.


camera) and to digitize the signal produced by the sensor.
• Image preprocessing - to improve the image in ways that increase the perfor-
mance of the subsequent steps. Example: enhancing contrast, removing noise.
• Segmentation - Partitions an image into its constituent parts or objects. Ex-
ample: In a character recognition application, to extract the individual charac-
ters from the background.
• Representation and Description - Representation specifies a method for
describing the data, so that features of interest are highlighted. Description
(also called feature selection), deals with extracting features that result in some
quantitative information of interest.
• Recognition and Interpretation - Recognition assigns a label to an object
based on the information provided by its descriptors. Interpretation involves
assigning meaning to an ensemble of recognized objects.

3.3 Applications of Image Processing


The following is an illustrative list of application areas of image processing:

• Meteorology
• Seismology
• Autonomous navigation
• Inspection
• Digital Library

• Radar, SAR
• Remote Sensing
• Internet
• Surveillance
• Robotic Assembly
• Microscopy
• Ultrasonic imaging
• Radiology
• Astronomy

3.4 Sampling and Quantization


To be processed by a computer, an image function f (x, y) must be digitized both
spatially and in amplitude. Digitization of the spatial coordinates (x,y) is called image
sampling. Digitizing the amplitude values is called gray-level quantization. The
resolution of an image (the degree of discernible detail) depends strongly on the
following two parameters:

1. Number of samples
2. Number of gray levels

Reducing the spatial resolution from N = 1024 to N = 512, 256, 128, 64, and 32,
while keeping the number of gray levels constant (say 256) results in a checkerboard
effect, which is particularly visible in images of lower resolution. Decreasing the
number of gray levels in an image, while keeping the spatial resolution constant, results
in false contouring.
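Both effects can be sketched with standard toolbox functions; the image name and the
reduction factors below are illustrative assumptions.

% Reducing spatial resolution and the number of gray levels
I = imread('pout.tif');
% spatial resolution: shrink and enlarge back (blocky, checkerboard-like effect)
low = imresize(imresize(I, 1/8), 8, 'nearest');
% gray levels: keep only 8 levels (false contouring)
few = uint8(floor(double(I)/32) * 32);
figure, imshow(I),   title('Original');
figure, imshow(low), title('Reduced spatial resolution');
figure, imshow(few), title('8 gray levels');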

3.5 Image Enhancement


The objective of enhancement techniques is to process an image so that the result is
more suitable than the original image for a specific application. For instance, a method
used for enhancing medical images may not be suitable for enhancing satellite images. The
approaches for image enhancement fall under two broad categories:

1. Spatial Domain methods - based on the direct manipulation of the pixels in


an image

2. Frequency Domain methods - based on modifying the Fourier transform of
an image

Image enhancement can be done by:

1. Point processing techniques - Enhancement at any point in an image depends
only on the gray level value at that point.
2. Spatial filtering methods - Enhancement techniques based on the use of
spatial masks for image processing.

In the following discussion, T denotes a gray level transformation function (map-


ping function) of the form s = T (r), where r denotes the intensity of the input image
and s denotes the intensity of the output image.

3.5.1 Point Processing Techniques


3.5.1.1 Simple Intensity Transformations

1. Image Negative: The negative of a digital image is obtained by using the


transformation function
s = 255 − r (3.1)
That is to reverse the order from black to white. The graph given in Fig. 3.1
shows the mapping between the original values (x-axis) and imcomplement.

Fig. 3.1: Mapping between original values and imcomplement function

MATLAB has a function imcomplement (f) to create photographic negatives.


The Matlab code to obtain image negatives is given below:
% Image Negative
% Method 1 using the function s = 255 - r
% double(I) gives double precision values
% mat2gray(K) converts matrix K to the intensity image O
I = imread('tire.tif');
figure; imshow(I)
K = 255 - double(I);
O = mat2gray(K);
figure; imshow(O);
% Method 2 using imcomplement(I)
% imcomplement(I) computes the negative of input image I
J = imcomplement(I);
figure, imshow(J);
% Method 3 using imadjust() function
H = imadjust(I, [0, 1], [1, 0]);
figure, imshow(H);

The input image and its negative are given in Fig. 3.2.

(a) (b)
Fig. 3.2: Image negative.(a) Input image (b) Complemented image

2. Contrast Stretching: The idea behind contrast stretching is to increase the


dynamic range of the gray levels in the image being processed.
Fig. 3.3 shows a typical transformation used for contrast stretching. The location
of points (r1 , s1 ) and (r2 , s2 ) control the shape of the transformation function.
In this context, the following three cases are to be understood:

• Case 1 : If r1 = s1 and r2 = s2 , the transformation is a linear function that


produces no change in the output.
• Case 2 : If r1 = r2 , s1 = 0, and s2 = L − 1, the transformation becomes a
thresholding function (Fig. 3.4) that creates a binary image.
• Case 3 : Intermediate values of (r1 , s1 ) and (r2 , s2 ) produce various degrees
of spread in the gray levels of the output image, thus affecting its contrast.

Fig. 3.3: Form of contrast stretching transformation function

Fig. 3.4: Thresholding

The transformation function for contrast stretching used in the Matlab code
given below has the form
s = T(r) = 1 / (1 + (m/r)^E)        (3.2)

Example 1: The Matlab code for contrast stretching transformation is given


below:
% Contrast Stretching - Case 1 using E = 4, 5, 10
% Contrast stretching is done using the expression
% 1./(1 + (M./(I2 + eps)).^E)
% M must be in the range [0, 1]
% the default value for M is mean2(im2double(I))
% the default value for E is 4
% EPS, with no arguments, is the distance from 1.0 to
% the next larger double precision number,
% that is, EPS with no arguments returns 2^(-52).
% mean2(I2) computes the mean of the values in I2

I = imread('tire.tif');
I2 = im2double(I);
m = mean2(I2)
contrast1 = 1./(1 + (m./(I2 + eps)).^4);
contrast2 = 1./(1 + (m./(I2 + eps)).^5);
contrast3 = 1./(1 + (m./(I2 + eps)).^10);
imshow(I2)
figure, imshow(contrast1)
figure, imshow(contrast2)
figure, imshow(contrast3)

The input image and the contrast stretched images are shown in Fig. 3.5.

(a) (b)

(c) (d)
Fig. 3.5: Contrast stretching.(a) Input image (b) Contrast stretched image for E=4 (c)
E=5 (d) E=10

Example 2 : The Matlab code for widening the dynamic range of the gray
levels in the image being processed is given below:
% Contrast stretching - Case 2
% Brighten the dark regions of an image by mapping
% the input range [0 0.2] to the output range [0.5 1].
I = imread('cameraman.tif');
J = imadjust(I, [0 0.2], [0.5 1]);
figure, imshow(I)
title('Original image');
figure, imshow(J)
title('Image with dynamic range widened');

The input image and the stretched image are shown in Fig. 3.6.

(a) (b)
Fig. 3.6: Contrast stretching.(a) Input image (b) Stretched image

In Fig. 3.6(a) it can be observed that the man’s coat is too dark to reveal any
detail. The Matlab function imadjust() maps the range [0,51] in the uint8 input
image to [128,255] in the output image. This brightens the image considerably,
and also widens the dynamic range of the dark portions of the original
image, making it much easier to see the details in the coat. Note, however, that
because all values above 51 in the original image are mapped to 255 (white) in
the adjusted image shown in Fig. 3.6(b), the adjusted image appears washed
out.
3. Compression of Dynamic Range: Sometimes the dynamic range of a pro-
cessed image far exceeds the capability of the display device. An effective way to
compress the dynamic range of pixel values is to perform the following intensity
transformation:
s = c ∗ log(1 + |r|) (3.3)

The Matlab code to compress the dynamic range of pixel values using logarithm
transformation is given below:
% Logarithmic Transformations with values for c = 1, 2, 5
% Logarithmic transformations are implemented using the
% expression g = c*log(1 + double(f))
% f is the input image and c is a constant
% image is converted to double using im2double()

I = imread('tire.tif');
imshow(I)
I2 = im2double(I);
J1 = 1*log(1 + I2);
J2 = 2*log(1 + I2);
J3 = 5*log(1 + I2);
figure, imshow(J1)
figure, imshow(J2)
figure, imshow(J3)

The input image and the images whose dynamic values are compressed for c =
1, 2, and 5 are given in Fig. 3.7

(a) (b)

(c) (d)
Fig. 3.7: Compression of dynamic range. (a) Input image (b) Images with dynamic range
compressed for values of c=1 (c) c=2 (d) c=5

3.5.1.2 Histogram Processing

1. Histogram: The histogram of a digital image with gray levels in the range
[0, L − 1] is a discrete function
p(rk) = nk / n,   0 ≤ k ≤ L − 1        (3.4)

where rk is the kth gray level, nk is the number of pixels in the image with that
gray level, and n is the total number of pixels in the image, with k = 0, 1, 2, ..., L − 1.
In short, p(rk ) gives an estimate of the probability of occurrence of gray level
rk .
The Matlab function imhist() is used to plot a histogram. The Matlab code to
plot a histogram of a low contrast image and the histogram of the image after
adjusting the intensity values using imadjust() function is given below:
% A low contrast image with its histogram
I = imread('pout.tif');
figure, imshow(I)
title('A low contrast image');
figure, imhist(I, 64)
title('Histogram of low contrast image');
% adjust the intensity values in an image using the imadjust function,
% where you specify the range of intensity values in the output image
J = imadjust(I);
figure, imshow(J);
title('Intensity adjusted image');
figure, imhist(J, 64)
title('Histogram of intensity adjusted image');

The low contrast image and its histogram are shown in Fig. 3.8.

(a) (b)
Fig. 3.8: Histogram.(a) Low contrast input image (b) Histogram of low contrast image

The intensity adjusted image and its histogram are shown in Fig. 3.9.

(a) (b)
Fig. 3.9: Histogram.(a) Intensity adjusted image (b) Histogram of intensity adjusted
image

2. Histogram Equalization: The technique used for obtaining a uniform
histogram is known as histogram equalization or histogram linearization.
The steps to perform histogram equalization are:

(a) Find the probability of occurrence of each gray level rk in the input image
using equation 3.4.
(b) Use the transformation function sk = T(rk) = Σ_{j=0}^{k} p(rj) to obtain the
histogram equalized image.

Histogram equalization significantly improves the visual appearance of the image.
Similar enhancement results could be achieved using the contrast stretching
approach. However the advantage of histogram equalization is that it is fully
automatic.
The Matlab code for histogram equalization using histeq() function is given
below:
% Histogram Equalization
I = imread('pout.tif');
J = histeq(I);
imwrite(J, 'boyheqn.tif', 'TIFF')
figure, imshow(I), figure, imshow(J)

The original image and the histogram equalized image are given in Fig. 3.10.

3.5.1.3 Image Subtraction

The difference between two images f (x, y) and h(x, y), expressed as

g(x, y) = f (x, y) − h(x, y) (3.5)

(a) (b)
Fig. 3.10: Histogram equalization. (a) Original input image (b) Histogram equalized
image

is obtained by computing the difference between all pairs of corresponding pixels from f
and h. Image subtraction finds useful applications in segmentation and enhancement.
The Matlab code to perform image subtraction is given below:
% Image subtraction using imsubtract function
% imopen performs morphological opening on the
% grayscale or binary image I with the structuring
% element given by strel.
% strel('disk', 15) creates a flat, disk-shaped
% structuring element whose radius = 15
clear all;
I = imread('rice.png');
figure, imshow(I);
% remove elements having radius < 15 pixels
background = imopen(I, strel('disk', 15));
figure, imshow(background);
Ip = imsubtract(I, background);
figure, imshow(Ip, []);
The background image is subtracted from the input image and the results are
shown in Fig. 3.11.

3.5.2 Spatial Filtering


The use of spatial masks for image processing is usually called spatial filtering and
the masks themselves are called spatial filters. The other technique that makes use
of the Fourier transform is called frequency domain filtering. There are three types
of filters. They are:

(a) (b)
Fig. 3.11: Image subtraction. (a) Input image (b) Subtracted image

1. Low-pass Filters: These filters attenuate or eliminate high-frequency
components in the Fourier domain while leaving low frequencies untouched.
High-frequency components characterize edges and other sharp details in an
image, so the net effect of lowpass filtering is image blurring.
2. High-pass Filters: These filters attenuate or eliminate low frequency compo-
nents. Low frequency components are responsible for slowly varying character-
istics of an image, such as overall contrast and average intensity, and hence the
net result of high pass filtering is apparent sharpening of edges and other
sharp details.
3. Bandpass Filters: These filters remove selected frequency regions between low
and high frequencies and are used for image restoration.

The two types of spatial filters used for image enhancement are:

1. Linear Spatial filters


2. Nonlinear Spatial Filters

3.5.2.1 Linear and Nonlinear Spatial Filters

The term spatial domain refers to the aggregate of pixels composing an image, and
image processing functions in the spatial domain may be expressed as

g(x, y) = T [f (x, y)] (3.6)

where f (x, y) is the input image, g(x, y) is the processed image, and T is an operator
on f , defined over some neighbourhood of (x, y). The basic approach in linear spatial
filtering is to sum products between the mask coefficients and the intensities of the
pixels under the mask at a specific location in the image. Denoting the gray levels of
pixels under the mask at any location by z1 , z2 , ..., z9 , the response of a linear 3x3
mask is R = w1 z1 + w2 z2 + ..... + w9 z9 . The gray level of the pixel located at (x, y)
is replaced by R. The mask is then moved to the next pixel location in the image and

the process is repeated. This continues until all pixel locations have been covered. A
new image must be created to store the values of R. Typical examples of linear spatial
filters are averaging filters.
Nonlinear spatial filters also operate on neighbourhoods. However, their operation
is based directly on the values of the pixels in the neighbourhood under consideration.
Typical examples of nonlinear spatial filters include median filters, max filters, and
min filters.
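As an illustrative addition (not from the original text), max and min filters can be sketched with the Image Processing Toolbox function ordfilt2, which picks an order statistic of each 3×3 neighbourhood:

% Order-statistic (nonlinear) filtering over a 3x3 neighbourhood.
% ordfilt2(I, k, ones(3)) replaces each pixel by the k-th smallest of its
% 9 neighbours: k = 9 gives a max filter, k = 1 a min filter.
I = imread('coins.png');
Imax = ordfilt2(I, 9, ones(3));   % max filter
Imin = ordfilt2(I, 1, ones(3));   % min filter
figure, imshow(I), title('Original image');
figure, imshow(Imax), title('Max filtered image');
figure, imshow(Imin), title('Min filtered image');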

3.5.2.2 Smoothing Filters

Smoothing filters are used for blurring and for noise reduction. Some of the smoothing
filters are discussed below:

1. Averaging Filter: It is a kind of low pass filter. If the objective is to achieve


blurring, averaging filters can be used. The response R would simply be the
average of all the pixels in the area of the mask. An averaging filter can be
implemented by using the Matlab function imfilter(). The Matlab code is given
below:
% Filters an image with a 5-by-5 filter containing
% equal weights.
clear all;
I = imread('coins.png');
h = ones(5,5)/25;
I2 = imfilter(I, h);
imshow(I), title('Original Image');
figure, imshow(I2), title('Filtered Image')

The input image and filtered image are given in Fig. 3.12.

(a) (b)
Fig. 3.12: Averaging filtering. (a) Original image (b) Filtered image

2. Median filter: If the objective is to achieve noise reduction, it is preferred
to use median filters. That is, the gray level of each pixel is replaced by the
median of the gray levels in a neighbourhood of that pixel, as opposed to using
the average value in an averaging filter. Thus the principal function of median
filtering is to force points with distinct intensities to be more like their neigh-
bours, thus resulting in noise reduction. The Matlab function medfilt2() is used
for this purpose. The Matlab code for median filtering is given below:
% Median filtering
% The value of an output pixel is determined by the
% median of the neighbourhood pixels, rather than the mean.
% Median filtering is better able to remove outliers
% without reducing the sharpness of the image.
% imnoise() adds salt and pepper noise to the input
% image with noise density = 0.02
clear all;
I = imread('eight.tif');
figure, imshow(I)
title('Original image');
J = imnoise(I, 'salt & pepper', 0.02);
figure, imshow(J)
title('Image with salt and pepper noise');
L = medfilt2(J, [3 3]);
figure, imshow(L)
title('Filtered image obtained using median filter');

The results of salt and pepper noise added to an input image, and then median
filtered image are given in Fig. 3.13.

3.5.2.3 Sharpening Filters

The main objective of sharpening is to highlight the fine details in an image or to


enhance details that have been blurred. This is very much useful in electronic printing,
medical imaging, industrial inspection, and other related applications. The types of
sharpening filters are:

1. Basic High-pass Spatial Filtering: To implement a highpass spatial filter,


we must have positive coefficients near the centre and negative coefficients in
the outer periphery.

(a) (b) (c)
Fig. 3.13: Median filtering. a) Original input image (b) Image with salt and pepper noise
(c) Median filtered image

2. High-boost Filtering: Multiplying the original image by an amplification


factor, A, yields the definition of a high-boost or, high-frequency filter. When
A = 1, it yields the standard highpass result. When A > 1, the high-boost
image looks more like the original image, with a relative degree of edge en-
hancement that depends on the value of A. The general process of subtracting
a blurred image from an original image, as given in the following equation, is
called unsharp masking.

Highboost = (A)(Original) − Lowpass (3.7)

The Matlab code to illustrate the effect of applying an unsharp masking filter
to an image is given below:
% i l l u s t r a t e s a p p l y i n g an unsharp masking f i l t e r
% t o a g r a y s c a l e image
% unsharp i s a 2D s p a t i a l f i l t e r
I = imread ( ’ moon . t i f ’ ) ;
h = f s p e c i a l ( ’ unsharp ’ ) ;
I2 = i m f i l t e r ( I , h ) ;
imshow ( I ) , t i t l e ( ’ O r i g i n a l Image ’ )
f i g u r e , imshow ( I 2 ) , t i t l e ( ’ F i l t e r e d Image ’ )

The input image and the filtered image are given in Fig. 3.14.
3. Derivative Filters: Differentiation can be expected to sharpen an image and
the most common method of differentiation in image processing applications is
the gradient. Roberts cross gradient operators, Prewitt operators, and Sobel
operators are used in computing the differences in x-direction and y-direction
and are hence used for approximating the magnitude of the gradient. These
operators are extremely useful in edge detection applications.
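A short combined sketch (an illustrative addition, not from the original text) of the three sharpening approaches above: a basic highpass mask with a positive centre and negative periphery, a high-boost combination following Eq. 3.7 with an assumed amplification A = 1.2, and gradient-based edge detection via the built-in edge function:

% 1. Basic highpass spatial filtering: positive centre, negative periphery
I  = im2double(imread('moon.tif'));
hp = [-1 -1 -1; -1 8 -1; -1 -1 -1] / 9;
Ihp = imfilter(I, hp);
% 2. High-boost filtering: Highboost = A*Original - Lowpass (Eq. 3.7)
A   = 1.2;                                   % illustrative amplification factor
Ilp = imfilter(I, fspecial('average', 3));   % blurred (lowpass) version
Ihb = A*I - Ilp;
% 3. Derivative (gradient) filters: Roberts, Prewitt, Sobel edge detectors
Isobel = edge(I, 'sobel');
figure, imshow(Ihp, []),  title('Highpass filtered image');
figure, imshow(Ihb, []),  title('High-boost filtered image');
figure, imshow(Isobel),   title('Sobel edges');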

(a) (b)
Fig. 3.14: Unsharp masking. (a) Original input image (b) Filtered image

3.5.3 Filtering in the Frequency Domain


The principle behind enhancement in the frequency domain is as follows: Compute the
Fourier transform of the image to be enhanced, multiply the result by a filter transfer
function, and take the inverse transform to produce the enhanced image. The basic
steps for filtering in the frequency domain is summarized in Fig. 3.15.

Fig. 3.15: Basic steps for filtering in frequency domain

3.5.3.1 Lowpass Frequency Domain Filters

Edges and other sharp transitions in the gray levels of an image contribute significantly
to the high-frequency content of its Fourier transform. Hence blurring (smoothing) is
achieved in the frequency domain by attenuating a specified range of high-frequency
components in the transform of a given image. Let g(x, y) be an image formed by the
convolution of an image f (x, y) and a linear position invariant operator h(x, y), that

is,
g(x, y) = h(x, y) ∗ f (x, y) (3.8)
From convolution theorem, the following frequency domain relation holds:

G(u, v) = H(u, v)F (u, v) (3.9)

where G, H, and F are the Fourier transforms of g, h, and f respectively. The task
is to select the filter transfer function H(u, v) that yields G(u, v) by attenuating the
high-frequency components of F (u, v).
A 2-D ideal lowpass filter (ILPF) is one whose transfer function satisfies the relation
H(u, v) = 1,  if D(u, v) ≤ D0
          0,  if D(u, v) > D0          (3.10)

where D0 is a specified nonnegative quantity, and D(u, v) is the distance from the
point (u, v) to the origin of the frequency plane, that is,

D(u, v) = (u^2 + v^2)^(1/2)          (3.11)

Fig. 3.16(a) shows a 3-D perspective plot of H(u, v) as a function of u and v. For
an ideal lowpass filter cross section, the point of transition between H(u, v) = 1 and
H(u, v) = 0 is often called the cutoff frequency. In Fig. 3.16(b), the cutoff frequency
is D0 .

(a) (b)
Fig. 3.16: Ideal lowpass filter. (a) Perspective plot of filter transfer function (b) Filter
cross section

The Matlab code to generate an ideal lowpass frequency domain filter is written
with two M-functions lpfilter, dftuv and a main program freqfilt.m. The function lpfilter
computes frequency domain lowpass filters. It uses the function dftuv to setup the
mesh grid arrays needed for computing the required distances. The program freqfilt.m
performs lowpass filtering without padding. The code is given below:

% main program freqfilt.m
% performs ideal lowpass filtering in freq domain
clear all;
f = imread('text.png');
figure, imshow(f);
[M, N] = size(f);
F = fft2(f);
sig = 10;
H = lpfilter('ideal', M, N, sig);
G = H.*F;
g = real(ifft2(G));
figure, imshow(g, [])
% end of main

function [H, D] = lpfilter(type, M, N, D0, n)
% function lpfilter computes
% frequency domain lowpass filters
[U, V] = dftuv(M, N);
D = sqrt(U.^2 + V.^2);
switch type
case 'ideal'
    H = double(D <= D0);
case 'btw'
    if nargin == 4
        n = 1;
    end
    H = 1./(1 + (D./D0).^(2*n));
case 'gaussian'
    H = exp(-(D.^2)./(2*(D0^2)));
otherwise
    error('unknown filter type')
end

function [U, V] = dftuv(M, N)
% function dftuv sets up the mesh grid arrays
% needed for computing the required distances
u = 0:(M-1);
v = 0:(N-1);
idx = find(u > M/2);
u(idx) = u(idx) - M;
idy = find(v > N/2);
v(idy) = v(idy) - N;
[U, V] = meshgrid(v, u);
The input image (Fig. 3.17(a)) and the blurred image (Fig. 3.17(b)) obtained as
output demonstrate the effect of an ideal low pass filter in blurring an image.

(a) (b)
Fig. 3.17: Ideal lowpass filtering. (a) Input image (b) Filtered image

3.5.3.2 Highpass Frequency Domain Filters

Image sharpening can be achieved in the frequency domain by a highpass filtering


process, which attenuates the low frequency components without disturbing high fre-
quency information in the Fourier transform. A 2-D ideal highpass filter (IHPF) is
one whose transfer function satisfies the relation
H(u, v) = 0,  if D(u, v) ≤ D0
          1,  if D(u, v) > D0          (3.12)

where D0 is the cutoff distance measured from the origin of the frequency plane, and
D(u, v) is given by

D(u, v) = (u^2 + v^2)^(1/2)          (3.13)
Fig. 3.18 shows a 3-D perspective plot and cross section of IHPF function. This
filter is the opposite of ILPF.
Given the transfer function Hlp (u, v) of a lowpass filter, we obtain the transfer
function of the corresponding highpass filter by using the relation

Hhp (u, v) = 1 − Hlp (u, v) (3.14)

(a) (b)
Fig. 3.18: Ideal highpass filter. (a) Perspective plot of filter transfer function (b) Filter
cross section

Thus the function lpfilter developed for lowpass filtering in the previous section can
be used as a basis for generating highpass filters.
The Matlab code to generate an ideal highpass frequency domain filter is written
with M-functions lpfilter, dftuv, dftfilt, paddedsize and a main program hpfex1.m. The
function lpfilter computes frequency domain lowpass filters. It uses the function dftuv
to setup the mesh grid arrays needed for computing the required distances. The
function dftfilt performs freuency domain filtering using the filter transfer function H.
The function paddedsize computes padded sizes useful for FFT-based filtering. The
function hpfilter computes frequency domain highpass filters. The program hpfex1.m
performs ideal highpass filtering with padding. The code is given below:
% Main program hpfex1.m
% performs freq. domain highpass filtering
clear all;
f = imread('text.png');
figure, imshow(f);
PQ = paddedsize(size(f));
D0 = 0.05*PQ(1);
H = hpfilter('gaussian', PQ(1), PQ(2), D0);
g = dftfilt(f, H);
figure, imshow(g, []);

function H = hpfilter(type, M, N, D0, n)
% function hpfilter computes frequency
% domain highpass filters
if nargin == 4
    n = 1;
end
Hlp = lpfilter(type, M, N, D0, n);
H = 1 - Hlp;

function PQ = paddedsize(AB, CD, PARAM)
% function paddedsize computes padded sizes
if nargin == 1
    PQ = 2*AB;
elseif nargin == 2 & ~ischar(CD)
    PQ = AB + CD - 1;
    PQ = 2*ceil(PQ/2);
elseif nargin == 2
    m = max(AB);
    P = 2^nextpow2(2*m);
    PQ = [P, P];
elseif nargin == 3
    m = max([AB CD]);
    P = 2^nextpow2(2*m);
    PQ = [P, P];
else
    error('wrong number of inputs');
end

function g = dftfilt(f, H)
% function dftfilt performs frequency
% domain filtering
F = fft2(f, size(H,1), size(H,2));
g = real(ifft2(H.*F));
g = g(1:size(f,1), 1:size(f,2));
The input image (Fig. 3.19(a)) and the sharpened image (Fig. 3.19(b)) obtained
as output demonstrate the effect of an ideal highpass filter in sharpening an image.

(a) (b)
Fig. 3.19: Ideal highpass filtering. (a) Input image (b) Sharpened image

Chapter 4

Basics of Image Processing-II

by
Mrs. S. ABIRAMI
Lecturer (Senior Scale),
Department of CSE, Annamalai University.

4.1 Image Transforms


4.1.1 Fast Fourier Transform
The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete
Fourier transform (DFT) of a sequence; it is not a separate transform. It is particularly
useful in areas such as signal and image processing, where the uses range from filtering,
convolution, and frequency analysis to power spectrum estimation.

4.1.1.1 Discrete Fourier Transform

The Fourier transform [12], [13] is a representation of an image as a sum of complex


exponentials of varying magnitudes, frequencies, and phases. The Fourier transform
plays a critical role in a broad range of image processing applications, including en-
hancement, analysis, restoration, and compression. If f (x, y) is a function of two dis-
crete spatial variables x and y, then the two-dimensional Fourier transform of f (x, y)
is given by the relationship
F(u, v) = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) e^{−j2π(ux/M + vy/N)}          (4.1)

F(u, v) is often called the frequency-domain representation of f(x, y). The inverse of a
transform is an operation that when performed on a transformed image produces the
original image. The inverse two-dimensional Fourier transform is given by

f(x, y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) e^{j2π(ux/M + vy/N)}          (4.2)

4.1.1.2 Matlab Code for FFT
I=imread(’cameraman.tif’);
figure;subplot(1,3,1);imshow(I);
title(’Original image’);
j=fft2(I);
subplot(1,3,2);imshow(mat2gray(log(abs(j))));
title(’DFT coefficients’);
k=ifft2(j);
subplot(1,3,3);imshow(mat2gray(abs(k)));
title(’Reconstructed image’);

The result of this Matlab code [14] is shown in Fig. 4.1.

Original image DFT coefficients Reconstructed image

Fig. 4.1: Discrete Fourier transform.

4.1.2 Discrete Cosine Transform


The discrete cosine transform (DCT) represents an image as a sum of sinusoids of
varying magnitudes and frequencies. The two-dimensional DCT of an M-by-N matrix F
is defined as

g(u, v) = α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x + 1)uπ / 2M] cos[(2y + 1)vπ / 2N],
          0 ≤ u ≤ M−1, 0 ≤ v ≤ N−1          (4.3)

α(u) = sqrt(1/M) for u = 0,  sqrt(2/M) for 1 ≤ u ≤ M−1          (4.4)

α(v) = sqrt(1/N) for v = 0,  sqrt(2/N) for 1 ≤ v ≤ N−1          (4.5)
The values g(u, v) are called the DCT coefficients of F. The DCT is an invertible
transform, and its inverse is given by
f(x, y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} α(u) α(v) g(u, v) cos[(2x + 1)uπ / 2M] cos[(2y + 1)vπ / 2N],
          0 ≤ x ≤ M−1, 0 ≤ y ≤ N−1          (4.6)

4.1.2.1 Matlab Code for DCT


P=imread(’cameraman.tif’);
figure;subplot(1,3,1);imshow(P);
title(’Original image’);
q=dct2(P);
subplot(1,3,2);imshow(mat2gray(log(abs(q))));
title(’DCT coefficients’);
r=idct2(q);
subplot(1,3,3);imshow(mat2gray(abs(r)));
title(’Reconstructed image’);
The original image, DCT coefficients and the reconstructed image from the above code
are shown in Fig. 4.2.

Original image DCT coefficients Reconstructed image

Fig. 4.2: Discrete cosine transform.

4.1.3 Discrete Wavelet Transform


For many signals, the low-frequency content is the most important part. It is what gives
the signal its identity. The high-frequency content, on the other hand, imparts flavor or
nuance. Consider the human voice. If you remove the high-frequency components, the
voice sounds different, but you can still tell what’s being said. However, if you remove
enough of the low-frequency components, you hear gibberish. In wavelet analysis,
we often speak of approximations and details. The approximations are the high-
scale, low-frequency components of the signal. The details are the low-scale, high-
frequency components. The filtering or decomposition process is shown in Fig. 4.3.

Lo D and Hi D are low pass and high pass decomposition filters, respectively. 2 ↓ 1
or 1 ↓ 2 represents down sampling by 2. cA and cD are the approximation and detail
coefficients.

Fig. 4.3: Two-dimensional wavelet decomposition
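As a small added illustration, the decomposition filter pair for a given wavelet can be inspected with wfilters (here assuming the Haar/'db1' wavelet used in the code below):

% Inspect the 'db1' (Haar) decomposition filters used by dwt2.
[Lo_D, Hi_D] = wfilters('db1');
disp(Lo_D);   % lowpass decomposition filter (entries of magnitude 1/sqrt(2))
disp(Hi_D);   % highpass decomposition filter (entries of magnitude 1/sqrt(2))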

4.1.3.1 Matlab Function for DWT

[cA, cH, cV, cD] = dwt2(X,′ wname′ )


[cA, cH, cV, cD] = dwt2(X, LoD , HiD )
The dwt2 command performs a single-level two-dimensional wavelet decomposition
with respect to either a particular wavelet ’wname’ or for the given decomposition
filters Lo D and Hi D. [15]
Some of the wavelets are:
Daubechies: ’db1’ or ’haar’, ’db2’, ... ,’db45’
Coiflets : ’coif1’, ... , ’coif5’
Symlets : ’sym2’ , ... , ’sym8’, ... ,’sym45’
Use [Lo D,Hi D]=wfilters(’wname’) for obtaining particular wavelet decomposition fil-
ters Lo D and Hi D.
[cA, cH, cV, cD] = dwt2(X,′ wname′ ) computes the approximation coefficients matrix
cA and details coefficients matrices cH, cV , and cD (horizontal, vertical, and diagonal,
respectively), for the input matrix X. The ’wname’ string contains the wavelet name.
The function call [cA, cH, cV, cD] = dwt2(X, LoD , HiD ) performs the two-dimensional
wavelet decomposition for the specified decomposition filters. The following Matlab
code obtains the approximation and detail coefficients for the Fig. 4.4 and the result
is shown in Fig. 4.5.
S=imread(’cameraman.tif’);

figure; imshow(S);
title(’Original image’);
[ca,ch,cv,cd]=dwt2(S,’db1’,’mode’,’zpd’);
figure;
subplot(2,2,1);imshow(mat2gray(ca));
title(’Approximation coefficients’);
subplot(2,2,2);imshow(mat2gray(ch));
title(’Detail coefficients-horizontal’);
subplot(2,2,3);imshow(mat2gray(cv));
title(’Detail coefficients-vertical’);
subplot(2,2,4);imshow(mat2gray(cd));
title(’Detail coefficients-diagonal’);

Original image

Fig. 4.4: Two-dimensional wavelet transformation (Original image).

4.2 Morphology
Morphology is the study of the shape and form of objects. Morphological image
analysis can be used to perform object extraction, image filtering operations such as
removal of small objects or noise from an image, image segmentation operations such
as separating connected objects, measurement operations such as texture analysis and
shape description.

4.2.1 Morphological Operations


Dilation and erosion are two fundamental morphological operations. Dilation adds
pixels to the boundaries of objects in an image, while erosion removes pixels on object
boundaries. The number of pixels added or removed from the objects in an image
depends on the size and shape of the structuring element used to process the image.

Approximation coefficients    Detail coefficients-horizontal
Detail coefficients-vertical    Detail coefficients-diagonal
Fig. 4.5: Two-dimensional wavelet transformation (Filtered images).
Dilation and erosion are often used in combination to implement image processing
operations. For example, the definition of a morphological opening of an image is an
erosion followed by a dilation, using the same structuring element for both operations.
The related operation, morphological closing of an image is the reverse: it consists of
dilation followed by an erosion with the same structuring element.
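A brief sketch (not part of the original text) showing that imopen and imclose correspond to composing imerode and imdilate with one structuring element:

% Morphological opening (erosion then dilation) and closing (dilation
% then erosion) with the same structuring element.
I  = imread('cameraman.tif');
se = strel('disk', 3);
Iopen  = imopen(I, se);                   % built-in opening
Iopen2 = imdilate(imerode(I, se), se);    % erosion followed by dilation
Iclose = imclose(I, se);                  % built-in closing
figure, imshow(Iopen),  title('Opened image');
figure, imshow(Iclose), title('Closed image');
isequal(Iopen, Iopen2)                    % should return 1: the two agree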

4.2.1.1 Dilation

The value of the output pixel is the maximum value of all the pixels in the input pixel’s
neighborhood (structuring element). In a binary image, if any one of the pixels value
in the neighborhood is 1 then the output pixel is set to 1. The following code performs
the dilation operation with a diamond as the structuring element, i.e., a diamond-shaped
region is considered as the neighborhood. Structuring elements such as square, disk
and ball are also used in morphological operations.

A=imread(’cameraman.tif’);
figure;subplot(1,2,1);imshow(A);
title(’Original image’);
se=strel(’diamond’,3);
B=imdilate(A,se);
subplot(1,2,2);imshow(B);
title(’Dilated image’);

The right image in Fig. 4.6 is the result of dilation for the image shown on the left.

Original image Dilated image

Fig. 4.6: Morphological dilation.

4.2.1.2 Erosion

The value of the output pixel is the minimum value of all the pixels in the input pixel’s
neighborhood. In a binary image, if any one of the pixels value in the neighborhood is
0 then the output pixel is set to 0. The following code performs the erosion operation
with diamond as the structuring element. The result is shown in Fig. 4.7.

D=imread(’cameraman.tif’);
figure;subplot(1,2,1);imshow(D);
title(’Original image’);
se=strel(’diamond’,3);
E=imerode(D,se);
subplot(1,2,2);imshow(E);
title(’Eroded image’);

4.3 Image Segmentation


Segmentation subdivides an image into its constituent regions or objects.

4.3.1 Edge Detection


The most common way to look for discontinuities is to run a mask through the image.
The response of the mask at any point in the image is given by
R = Σ_{i=1}^{9} wi zi          (4.7)

Original image Eroded image

Fig. 4.7: Morphological erosion.

where zi is the gray level of the pixel associated with mask coefficient wi. The edge is a
measure of the gray-level discontinuity at a point in the image. The edge or gradient of
an image is based on the partial derivatives ∂f/∂x (horizontal) and ∂f/∂y (vertical) at every
pixel location. The horizontal and vertical partial derivatives can be obtained using
the following Sobel operators, respectively:

    −1 −2 −1          −1  0  1
     0  0  0          −2  0  2          (4.8)
     1  2  1          −1  0  1
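A short sketch (an added illustration) applying the two Sobel masks of Eq. 4.8 with imfilter and combining them into a gradient magnitude:

% Approximate gradient magnitude using the Sobel masks of Eq. 4.8.
I  = im2double(imread('cameraman.tif'));
h1 = [-1 -2 -1; 0 0 0; 1 2 1];     % first Sobel mask
h2 = [-1 0 1; -2 0 2; -1 0 1];     % second Sobel mask
g1 = imfilter(I, h1);
g2 = imfilter(I, h2);
gmag = sqrt(g1.^2 + g2.^2);        % gradient magnitude
figure, imshow(gmag, []), title('Sobel gradient magnitude');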

4.3.2 Image Segmentation using Matlab


The simplest algorithm for detecting an object is given below.

1) Read an image
2) Obtain the gradient
3) Dilate the image
4) Fill the interior gaps
5) Remove the connected objects on the border
6) Smoothen the object

Consider the original image shown in Fig. 4.8. There are two objects or cells present in
this image, but only one cell is completely visible. Our aim is to segment this cell. The
object to be segmented differs greatly in contrast from the background image. Changes
in contrast can be detected by operators that calculate the gradient of an image. The
gradient image can be obtained and a threshold is applied to create a binary mask

containing the segmented cell. The Matlab code for segmentation is given below. The
result of segmentation is shown in Fig. 4.8.

I = imread(’cell.tif’);
figure;subplot(2,3,1); imshow(I);
title(’Original image’);
[junk threshold] = edge(I, ’sobel’);
fudgeFactor = .5;
BWs = edge(I,’sobel’, threshold * fudgeFactor);
subplot(2,3,2); imshow(BWs); title(’Binary gradient mask’);
se90 = strel(’line’, 3, 90);
se0 = strel(’line’, 3, 0);
BWsdil = imdilate(BWs, [se90 se0]);
subplot(2,3,3); imshow(BWsdil); title(’Dilated gradient mask’);
BWdfill = imfill(BWsdil, ’holes’);
subplot(2,3,4); imshow(BWdfill);
title(’Binary image with filled holes’);
BWnobord = imclearborder(BWdfill, 4);
subplot(2,3,5); imshow(BWnobord); title(’Cleared border image’);
seD = strel(’diamond’,1);
BWfinal = imerode(BWnobord,seD);

4.4 Image Compression


Data or image compression refers to the process of reducing the amount of data re-
quired to represent a given quantity of information. If n1 and n2 denote the number
of information carrying units in two datasets that represent the same information, the
relative data redundancy rd of the first data set can be defined as

rd = 1 − 1/cr          (4.9)

where cr, commonly called the compression ratio, is

cr = n1/n2          (4.10)
There are two types of compression: lossless (error-free) and lossy. In lossless com-
pression the reconstructed image is exactly the same as the original image (i.e., the
pixel values are identical), whereas in lossy compression the reconstructed image looks
like the original image to our perception but the pixel values are different. Cameras,
CD/DVD, and television broadcasting use lossy compression techniques.

Original image Binary gradient mask Dilated gradient mask

Binary image with filled holes Cleared border image

Fig. 4.8: Image segmentation.

4.4.1 Image Compression using DCT


Consider an image shown in the left side of Fig. 4.9. The image is divided into blocks
and the two-dimensional DCT is computed for each block. The DCT coefficients are
then quantized, coded, and transmitted. The receiver decodes the quantized DCT
coefficients, computes the inverse two-dimensional DCT of each block, and then puts
the blocks back together into a single image. Although there is some loss of quality in
the reconstructed image, it is recognizable as an approximation of the original image
as shown in the right side of Fig. 4.9.

I = imread(’cameraman.tif’);
I = im2double(I);
T = dctmtx(8);
B = blkproc(I,[8 8],’P1*x*P2’,T,T’);
mask = [1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0];
B2 = blkproc(B,[8 8],’P1.*x’,mask);
I2 = blkproc(B2,[8 8],’P1*x*P2’,T’,T);
figure;subplot(1,2,1);
imshow(I);
title(’Original image’);
subplot(1,2,2); imshow(I2);title(’Reconstructed image’);

Original image Reconstructed image

Fig. 4.9: Image compression.

4.5 Basics of Color Image and Video


The gray level image uses 8-bits per pixel and the intensity value at a pixel varies
from 0 (black) to 255 (white). Color image uses 24 bits per pixel i.e., 8 bits for each
of the three components red, green and blue in the RGB color space. Hence the
total number of colors is 2^24 = 16,777,216. R=255, G=0 and B=0 corresponds to pure
red color. R=0,G=0,B=0 is dark black and R=255, G=255, B=255 is pure white.
An example RGB color image and its three components are shown in Fig. 4.10. The
Matlab code for obtaining the red, green and blue components of the RGB color image
is given below.

x=imread(’x.ppm’);
subplot(2,2,1);
imshow(x)
title(’Color image’);

subplot(2,2,2);
imshow(x(:,:,1))
title(’Red component’);
subplot(2,2,3);
imshow(x(:,:,2))
title(’Green component’);
subplot(2,2,4);
imshow(x(:,:,3))
title(’Blue component’);

Color spaces such as YCrCb, YIQ and HSV are commonly used in image pro-
cessing applications. Cameras and monitors use the RGB color space, while television
broadcasting is based on the YIQ (NTSC standard) and YUV (PAL standard) color
spaces. Conversion from one color space to another can be obtained using a set of
equations. For example, YCrCb color space is obtained from the RGB color space
using Eq. 4.11. In YCrCb color space Y is same as the gray image, Cr and Cb are
the red and blue chrominance, respectively. The so called black and white TV (may
be called as gray level TV) uses only the Y component whereas the color TV uses Y,
Cr and Cb.

Y  = 0.299R + 0.587G + 0.114B
Cr = R − Y          (4.11)
Cb = B − Y
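A brief sketch (an added illustration) of Eq. 4.11 applied to an RGB image; the file name is just a placeholder, and note that the built-in rgb2ycbcr uses scaled and offset versions of these equations:

% Convert an RGB image to Y, Cr, Cb using Eq. 4.11.
x = im2double(imread('peppers.png'));      % any RGB image
R = x(:,:,1); G = x(:,:,2); B = x(:,:,3);
Y  = 0.299*R + 0.587*G + 0.114*B;          % luminance (gray image)
Cr = R - Y;                                % red chrominance
Cb = B - Y;                                % blue chrominance
figure, imshow(Y),      title('Y component');
figure, imshow(Cr, []), title('Cr component');
figure, imshow(Cb, []), title('Cb component');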

A video can be defined as a sequence of images. A video camera records 20 to 25


images within a second and plays it at the same rate. Two consecutive images in the
video are shown in Fig. 4.11. The image sequence in the video can be extracted using
a tool such as VirtualDub. The tool IrfanView can be used for basic image processing
operations. The television broadcast audio/video data and real time audio/video data
can be captured using a TV tuner card such as Pixelview and a web camera such as
Logitech Pro, respectively.

Color image Red component

Green component Blue component

Fig. 4.10: Color image and its components.

Image k Image k+1

Fig. 4.11: Two consecutive frames in the video.

Chapter 5

Normal Distribution and Bayes Theory

by
M. BALASUBRAMANIAN
Lecturer (Senior Scale),
Department of CSE, Annamalai University.

5.1 Normal Distribution


Normal distribution or Gaussian distribution [16] is a continuous probability distri-
bution that describes data that cluster around a mean or average. The graph of the
associated probability density function is bell-shaped, with a peak at the mean, and is
known as the Gaussian function or bell curve. The variable x is distributed normally
with mean µ and variance σ².
It can be categorized into

1. Univariate Density: It involves single variable (one dimension)


Example: Height of person
2. Multivariate Density: It involves multivariable (two or more dimensions)
Example: Height and weight of person

5.1.1 Univariate Density


The probability density function for the univariate case is written as

p(x) = (1 / (√(2π) σ)) exp[ −(1/2) ((x − µ)/σ)² ]          (5.1)

where µ is the mean and σ² is the variance.


Mean (µ):
The mean is the average of the given feature x and is given by
µ = (Σ x) / n
Example: average of the heights of persons.

Variance (σ²):
The spread of the data can be measured using the variance,
σ² = (Σ (x − µ)²) / n
Example: a measure of the variability of the heights of persons.

• The normal density is traditionally described as a bell-shaped curve, as
shown in Fig. 5.1.
• The normal distribution is symmetrical about the mean (µ).
• The peak of the univariate normal distribution occurs at x = µ and its value is
1/(√(2π) σ), as shown in Fig. 5.1.
• The width of the univariate normal distribution is proportional to the standard
deviation (σ).

Fig. 5.1: Peak of the univariate normal distribution occurs at x = µ

Example for univariate density


Height (h) of Male (Adult) 165, 170, 160, 154, 175, 155, 167, 177, 158, 178
Training :
Fit a univariate normal distribution (Gaussian) to h as shown in Fig. 5.2.

Mean (µ) = (Σ x)/n = (Σ h)/n = 1659/10 = 165.9

Variance (σ²) = (Σ (x − µ)²)/n = (Σ (h − µ)²)/n
             = (1/10) [(165 − 165.9)² + (170 − 165.9)² + · · · + (178 − 165.9)²]
             = 728.9/10 = 72.89

Fig. 5.2: Univariate normal distribution

Test data :
i) Height (h) = 100
Find probability density function p(100)
P(100) = 5.208e-16 = 0
Result: The height 100 is not in the normal distribution, so the person is not an adult.

ii) Height (h) = 160


Find probability density function P(160)
P(160) = 0.0358
Result: The height 160 is in the normal distribution, so the person is an adult.

5.1.2 Matlab Code for Univariate Density


i) Training code
function [m,v] = uvtrain(H) % Train function
plot(min(H),0); % Initialize 2D plot
for i= min(H)-15:0.5:max(H)+15 % Loop varies height of person
m=mean(H); % Mean of height(m)
d=i-m; % (h-m)
v=var(H); % Variance of height
r=(d*d’)/v; % (h-m)(h-m)’/variance
p=1/(sqrt(2*pi*v))*exp(-0.5*r); % Univariate density function

hold on; % Use the same plot
plot(i,p,’*’); % 2D plot
grid % Print the grid point
end % End of for loop
xlabel(’height’); % x-label - height of person
ylabel(’p(x)’); % y-label - univariate probability

%Univariate Training-Matlab Output


>> H = [165 170 160 154 175 155 167 177 158 178];
>> [m,v] = uvtrain(H)
m = 165.9000
v = 80.9889
% Output : Mean(m) and variance(v) of height of person.

ii) Testing Code


function [p] = uvtest(m,v,h) % Test function
d=h-m; % (h-m)
r=(d*d’)/v; % (h-m)(h-m)’/variance
p=1/(sqrt(2*pi*v))*exp(-0.5*r); % univariate density function
if p>0.00005 % threshold (t)=0.00005
disp ’The person is an adult’
else
disp ’The person is not an adult’
end

%Univariate Testing-Matlab Output


i) >> h = 100;
>> p = uvtest(m,v,h)
p = 1.0064e-013
The person is not an adult.

ii) >> h = 160;


>> p = uvtest(m,v,h)
p = 0.0358
The person is an adult.

5.1.3 Multivariate Density


The general multivariate normal density in d dimensions is written as

p(x) = (1 / √((2π)^d |Σ|)) exp[ −(1/2) (x − µ)ᵗ Σ⁻¹ (x − µ) ]          (5.2)

where x is the d-component column vector,
µ is the d-component mean vector,
Σ is the d-by-d covariance matrix,
|Σ| is the determinant of the covariance matrix,
Σ⁻¹ is the inverse of the covariance matrix, and
(x − µ)ᵗ is the transpose of (x − µ).

Mean Vector (µ) and Covariance Matrix (Σ)

µ = [µ1, µ2, ..., µd]ᵗ

      | σ11  σ12  ···  σ1d |
Σ =   | σ21  σ22  ···  σ2d |
      |  ..   ..  ···   .. |
      | σd1  σd2  ···  σdd |

σij = σji = (1/n) Σ_{k=1}^{n} (xki − µi)(xkj − µj)

where xki is the i-th component of the k-th sample,
σ11 is the variance within x1,
σ12 is the covariance between x1 and x2,
...
σij is the covariance between xi and xj.

• Σ is symmetric and its diagonal elements are variances within x which can
never be negative.
• Off-diagonal elements are the covariances which can be +ve and -ve.

Statistically Dependent Variables:


The variables which are causally related are called statistically dependent variables.
Example: engine temperature and oil temperature.

Statistically Independent Variables:


The variables which are not causally related are called statistically independent vari-
ables.
Example: oil pressure in engine and air pressure in tire.
If the variables are statistically independent, the covariances are zero and the covariance
matrix is diagonal:

      | σ1²  0    ···  0   |           | 1/σ1²   0     ···   0    |
Σ =   | 0    σ2²  ···  0   | ,  Σ⁻¹ =  |  0     1/σ2²  ···   0    | ,
      | ..   ..   ···  ..  |           |  ..     ..    ···   ..   |
      | 0    0    ···  σd² |           |  0      0     ···  1/σd² |

|Σ| = σ1² × σ2² × · · · × σd²

Multivariate Density (Bivariate Density):

p(x1, x2) is a hill-shaped surface over the (x1, x2) plane. The peak of the bivariate normal
distribution occurs at the point (x1, x2) = (µ1, µ2), that is, at the mean vector. The
shape of the hump depends on the two variances σ1², σ2² and the correlation coefficient (ρ),
given by

ρ = σ12 / (σ1 σ2)          (5.3)

where σ12 is the covariance between x1 and x2,
σ1 is the standard deviation of x1, and
σ2 is the standard deviation of x2.

Fig. 5.3: Bivariate normal distribution

Example for multivariate density (Bivariate density):

Table 5.1: Height and weight of males


Height (h) of males 165 170 160 154 175 155 167 177 158 178
Weight (w) of males 78 71 60 53 72 51 64 65 55 69

Training :

Fit a bivariate normal distribution (Gaussian) for h and w. Bivariate normal distri-
bution is shown in Fig. 5.3.

Mean Vector (µ)

µ = [µ1, µ2]ᵗ

µ1 = (Σ h)/n = 1659/10 = 165.9
µ2 = (Σ w)/n = 638/10 = 63.8

µ = [165.9, 63.8]ᵗ

Covariance Matrix (Σ)

Σ = | σ11  σ12 |
    | σ21  σ22 |

σ11 = (1/10) [(165 − 165.9)² + (170 − 165.9)² + · · · + (178 − 165.9)²] = 72.89

σ12 = (1/10) [(165 − 165.9)(78 − 63.8) + (170 − 165.9)(71 − 63.8) + · · ·
      + (178 − 165.9)(69 − 63.8)] = 52.78

σ21 = (1/10) [(78 − 63.8)(165 − 165.9) + (71 − 63.8)(170 − 165.9) + · · ·
      + (69 − 63.8)(178 − 165.9)] = 52.78

σ22 = (1/10) [(78 − 63.8)² + (71 − 63.8)² + · · · + (69 − 63.8)²] = 72.16

Σ = | 72.89  52.78 |
    | 52.78  72.16 |

Test data:
i) Height, weight= 75, 25
Find probability density function P(75, 25)
P(75,25) = 1.26e-29 = 0
Result: Height and weight are not in the bivariate normal distribution, so the person
is not an adult.

ii) Height, weight= 160, 60
Find probability density function P(160, 60)
P(160,60) = 0.0023
Result: Height and weight are in the bivariate normal distribution, so the person is an
adult.

5.1.4 Matlab Code for Multivariate Density


i) Training Code

function [m,cv]=mvtrain(X) % Train function


[r c]=size(X); % size of X
m=mean(X); % Mean of data(X)
cv=cov(X); % Covariance matrix of data(X)
% Guassian Plot only for 2-dimensional data
if (c==2)
plot3(min(X(:,1)),min(X(:,2)),0); % Initialize 3D plot
for i= min(X(:,1))-15:0.5:max(X(:,1))+15 % Loop varies height of person
for j= min(X(:,2))-15:0.5:max(X(:,2))+15 % Loop varies weight of person
d=[i j]- m; % (x-m)
r=d*inv(cv)*d’; % (x-m)cov(X)(x-m)t
p=1/(2*pi*sqrt(det(cv)))*exp(-0.5*r); % Multivariate density function
hold on; % Use the same plot
plot3(i,j,p); % 3D plot
end % end of height for loop
grid % Print the grid point
end % end of weight for loop
xlabel(’height’); % x-label - height of person
ylabel(’weight’); % y-label - weight of person
zlabel(’p(x)’); % z-label - multivariate density function
end

%Multivariate Training-Matlab Output
>> X=[165 78; 170 71; 160 60; 154 53; 175 72;
155 51; 167 64; 177 65; 158 55; 178 69];
>> [m,cv]=mvtrain(X)
m = 165.9000 63.8000

cv = 80.9889 58.6444
58.6444 80.1778

% (where m is the mean vector of X, cv is the covariance matrix of X)

ii) Testing Code

function [p]=mvtest(m,cv,hw) % Test function


d=hw-m; % (hw-m)
r=d*inv(cv)*d’; % (hw-m)cov(X)(hw-m)t
p=1/(2*pi*sqrt(det(cv)))*exp(-0.5*r); % multivariate density function
if p>0.000005 % threshold (t)=0.000005
disp ’The person is an adult’
else
disp ’The person is not an adult’
end

%Multivariate Testing-Matlab Output


i) >> hw = [75 25]
>> p = mvtest(m,cv,hw)
p = 1.2620e-029
The person is not an adult.

ii) >> hw = [160 60]

>> p = mvtest(m,cv,hw)
p = 0.0023
The person is an adult.

5.2 Bayes Theory


Bayes theorem states that

P(ωj|x) = P(x|ωj) P(ωj) / P(x)          (5.4)

where P(x|ωj) is the probability of x given ωj,
P(ωj) is the probability of ωj,
P(x) is the probability of x,
ωj are the categories (j = 1, 2, · · · , n), and
x is the feature vector.

Bayes formula can be expressed in English by saying that

Posterior = (Likelihood × Prior) / Evidence          (5.5)

We can find P(x|ωj) using the normal distribution, and for a two-class problem

P(x) = Σ_{j=1}^{2} P(x|ωj) P(ωj).

Bayes Decision Rule:

x belongs to ω1 if P(ω1|x) > P(ω2|x); otherwise x belongs to ω2.

Example for Bayes Rule (To classify male and female with respect to height)

Table 5.2: Heights of males and females


Height (hm) of males 165 170 160 154 175 155 167 177 158 178
Height (hf ) of females 140 145 149 152 157 135 139 160 155 163

Training:

Fit a gaussian for hm and hf as shown in Fig. 5.4


µm = 165.9
µf = 149.5

Fig. 5.4: Gaussian distribution for hm and hf

Assume P(female) = 0.5, P(male) = 0.5

Test data: Height 136

P(male|h) = P(h|male) P(male) / P(h)

P(female|h) = P(h|female) P(female) / P(h)

P(h) = P(h|male) P(male) + P(h|female) P(female)

With P(female) = 0.5 and P(male) = 0.5:
P(h) = 0.0078
P(h|male) = 1.7770e-004 = 0.0001777
P(h|female) = 0.0153
P(male|h) = 0.0115
P(female|h) = 0.9885
P(female|h) > P(male|h), so the given height corresponds to a female.

5.2.1 Matlab Code for Bayes Theory
i) Training Code:

function [m1,m2,v1,v2] = bttrain(hm,hf) % Train function


plot(min(hf),0); % Initialize 2D plot
for i= min(hm)-15:0.5:max(hm)+15 % loop varies height of male person
m1=mean(hm); % mean of male height(m1)
d1(1)=i-m1; % (hm-m1)
v1=var(hm); % variance of height of male person
r1=(d1*d1’)/v1; % (hm-m1)(hm-m1)’/variance
pm=1/(sqrt(2*pi*v1))*exp(-0.5*r1); % likelihood p(h|male)
hold on; % use the same plot
plot(i,pm,’*r’); % 2D plot
grid % print the grid point in the graph
end % end of for loop
for i= min(hf)-15:0.5:max(hf)+15 % loop varies height of female person
m2=mean(hf); % mean of female height(m2)
d2(1)=i-m2; % (hf-m2)
v2=var(hf); % variance of height of female person
r2=(d2*d2’)/v2; % (hf-m2)(hf-m2)’/variance
pf=1/(sqrt(2*pi*v2))*exp(-0.5*r2); % likelihood p(h|female)
hold on; % use the same plot
plot(i,pf,’*b’); % 2D plot
grid % print the grid point in the graph
end % end of for loop
xlabel(’height’); % x-axis - height (male & female))
ylabel(’p(x|wi)’); % y-axis - posterior probability

%Bayes Theory Training-Matlab Output
>> hm=[165 170 160 154 175 155 167 177 158 178]; % height of male
>> hf=[140 145 149 152 157 135 139 160 155 163]; % height of female
>> [m1,m2,v1,v2] = bttrain(hm,hf)
m1 = 165.9000
m2 = 149.5000
v1 = 80.9889
v2 = 90.7222

% [note: m1 and m2 are mean of male and female height of person, respectively]
% v1 and v2- variance of male and female height of person, respectively]

ii) Testing Code:

function [pmh,pfh] = bttest(m1,m2,v1,v2,h) % Test function


% find the probability of p(hm|male)
d1(1)=h-m1; % (hm-m1)
r1=(d1*d1’)/v1; % (hm-m1)(hm-m1)’/variance
phm=1/(sqrt(2*pi*v1))*exp(-0.5*r1); % likelihood probability
% find the probability of p(hf|female)
d2(1)=h-m2; % (hf-m2)
r2=(d2*d2')/v2; % (hf-m2)(hf-m2)'/variance
phf=1/(sqrt(2*pi*v2))*exp(-0.5*r2); % likelihood probability
% Bayes formula
pfemale=0.5;
pmale=0.5;
ph=phm*pmale+phf*pfemale;
pmh=(phm*pmale)/ph;
pfh=(phf*pfemale)/ph;

% Bayes decision rule
if pmh > pfh
disp ’The person is male’
else
disp ’The person is female’
end

%Bayes Theory Testing-Matlab Output


i) >> h = 136
>> [pmh,pfh] = bttest(m1,m2,v1,v2,h)
pmh = 0.0115
pfh = 0.9885
The person is female

ii) >> h = 174


>> [pmh,pfh] = bttest(m1,m2,v1,v2,h)
pmh = 0.9507
pfh = 0.0493
The person is male

Chapter 6

k-means Clustering
by
P. DHANALAKSHMI
Lecturer (Selection Grade),
Department of CSE, Annamalai University

6.1 Introduction
A cluster is a collection of objects which are similar between them and are dissimilar
to the objects belonging to other clusters. Clustering is an unsupervised learning
method which deals with finding a structure in a collection of unlabeled data. A loose
definition of clustering could be the process of organizing objects into groups whose
members are similar in some way.
k-means clustering [16] [17] is an algorithm to group objects based on attributes/fea-
tures into k number of groups where k is a positive integer. The grouping (clustering)
is done by minimizing the Euclidean distance between data and the corresponding
cluster centroid. Thus the purpose of k-means clustering is to cluster the data.

Fig. 6.1: Clustering

The k-means clustering algorithm is given in Table 6.1. Steps 2 to 3 are repeated
until the objects do not change groups for any two consecutive iterations. In some
cases, when the data set is large and distributed, the objects keep on changing groups

Table 6.1: k-means clustering algorithm

1. Initialize k centroids
2. Compute the distance between each feature vector (object) to the centroids
3. Assign the feature vector to the centroid whose distance is minimum
4. Re-calculate the centroids

Table 6.2: Training set


Object Attribute 1 Attribute 2
Medicine A 1 1
Medicine B 2 1
Medicine C 4 3
Medicine D 5 4

for consecutive iterations. In such cases, the algorithm is terminated after reaching ’n’
number of iterations.

6.2 k-means Algorithm


The purpose of k-means algorithm is to cluster the data. k-means algorithm is one
of the simplest partition clustering method. For example, suppose we have 4 objects,
each consisting of 2 attributes in the training set as shown in Table 6.2. Each attribute
represents coordinate of the object. We also know before hand that these objects
belong to two groups of medicine (cluster 1 and cluster 2). The problem now is to
determine which medicines belong to cluster 1 and which medicines belong to cluster
2.
Each medicine represents a point with two attributes (x, y), which can be plotted as
a coordinate in the two-dimensional feature space as shown in Fig. 6.2.

1. Initial value of centroids:


In the beginning we determine number of cluster k and we assume the centroid or
center of these clusters. We can take any random objects as the initial centroids
or the first k objects in sequence can also serve as the initial centroids. Suppose
we use medicine A and medicine B as the initial centroids. Let C1 and C2 denote
the coordinates of the centroids, then C1 = (1, 1) and C2 = (2, 1).
2. Objects−centroids distance (Iteration 1)

Fig. 6.2: Objects in the feature space

We calculate the distance between each cluster centroid and each object using the
Euclidean distance measure.
Euclidean distance:
Let x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) be any two points in an n-dimensional
space. Then the Euclidean distance between x and y is denoted by ||x − y|| and
is given by

d = sqrt((x1 − y1)² + (x2 − y2)² + · · · + (xn − yn)²)          (6.1)

Then we have the distance matrix at iteration 1 as:

D1 = | 0  1  3.61  5    |
     | 1  0  2.83  4.24 |

Each column in the distance matrix corresponds to one object. The first row of the
distance matrix gives the distance of each object to the first centroid and the second
row gives the distance of each object to the second centroid. For example, the distance
from medicine C = (4, 3) to the first centroid C1 = (1, 1) is
sqrt((4 − 1)² + (3 − 1)²) = 3.61, and its distance to the second centroid C2 = (2, 1) is
sqrt((4 − 2)² + (3 − 1)²) = 2.83.

3. Objects clustering (Iteration 1)


We assign each object to centroid based on the minimum distance. Thus,
medicine A is assigned to group 1, medicine B to group 2, medicine C to group

Fig. 6.3: Initial value of centroids

2 and medicine D to group 2. The element of the group matrix G is 1 if and only
if the object is assigned to that group.

         A  B  C  D
G1 = |   1  0  0  0 |
     |   0  1  1  1 |

4. Re-calculate the centroids (Iteration 1)


Knowing the members of each group, we now compute the new centroid of each
group based on these new memberships. Group 1 has only one member, thus its
centroid remains C1 = (1, 1). Group 2 now has three members, thus its centroid
is the average coordinate of the three members: C2 = ((2+4+5)/3, (1+3+4)/3)
= (11/3, 8/3).

5. Objects-centroids distances (Iteration 2)


The next step is to compute the distance of all objects to the new centroids.
The distance matrix at iteration 2 is

D2 = | 0     1     3.61  5    |
     | 3.14  2.36  0.47  1.89 |

6. Objects clustering (Iteration 2)


We assign each object based on the minimum distance. Based on the new dis-
tance matrix, we move the medicine B to group 1 while all the other objects

Fig. 6.4: Centroids in iteration 1

remain. The group matrix is:

         A  B  C  D
G2 = |   1  1  0  0 |
     |   0  0  1  1 |

7. Re-calculate the centroids (Iteration 2)


Now we repeat step 4 to calculate the new centroids. Group 1 and group 2 both
have two members, thus the new centroids are C1 = ((1+2)/2, (1+1)/2) = (3/2, 1)
and C2 = ((4+5)/2, (3+4)/2) = (9/2, 7/2).
8. Objects-centroids distances (Iteration 3)
Repeating step 2 again, we have the new distance matrix at iteration 3 as

D3 = | 0.5   0.5   3.20  4.61 |
     | 4.30  3.54  0.71  0.71 |

with the object coordinates (columns A, B, C, D):
| 1  2  4  5 |
| 1  1  3  4 |

9. Objects clustering (Iteration 3)


Again, we assign each object based on the minimum distance.
" #
1 1 0 0
G3 =
0 0 1 1

We obtain G3 = G2 . Comparing the grouping of last iteration and this iteration

Fig. 6.5: Centroids in iteration 2

Table 6.3: Final grouping


Object Attribute 1 Attribute 2 Group
Medicine A 1 1 1
Medicine B 2 1 1
Medicine C 4 3 2
Medicine D 5 4 2

reveals that the objects do not change groups. Thus, the computation of the k-means
clustering has reached stability and no more iterations are needed. We get the final
grouping as the result.
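As a quick cross-check (an added illustration, assuming the Statistics Toolbox function kmeans is available), the same grouping can be reproduced on the medicine data:

% Verify the worked example with the built-in kmeans function.
X = [1 1; 2 1; 4 3; 5 4];       % medicines A, B, C, D
[idx, C] = kmeans(X, 2);        % cluster into k = 2 groups
disp(idx');                     % cluster label of each medicine
disp(C);                        % the two cluster centroids
% (cluster numbering may be permuted relative to Table 6.3)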

6.3 Matlab Code for k-means Clustering


function [] = Kmeans1(input2, m, n, k)
% k-means clustering of the m-by-n data matrix input2 into k clusters.
% Each row of input2 is one object (feature vector) with n attributes.
iter = 10;                               % maximum number of iterations
B = zeros(k, n);                         % cluster centroids
% Initialise the centroids with every (m/k)-th object
for i = 1:k
    B(i,:) = input2(i*floor(m/k), :);
end
disp('Initial centroids:'); disp(B);
flag = zeros(m, 1);                      % cluster label of each object
for c = 1:iter
    oldflag = flag;
    % Assign each object to the nearest centroid (Euclidean distance)
    for i = 1:m
        dist = zeros(k, 1);
        for j = 1:k
            dist(j) = norm(input2(i,:) - B(j,:));
        end
        [mn, idx] = min(dist);
        flag(i) = idx;
    end
    % Re-calculate each centroid as the mean of its members
    for j = 1:k
        members = input2(flag == j, :);
        if ~isempty(members)
            B(j,:) = mean(members, 1);
        end
    end
    % Stop when no object changes its group
    if isequal(flag, oldflag)
        break;
    end
end
% Display the final grouping and centroids
for j = 1:k
    fprintf('The vectors belonging to Cluster %d are (%s)\n', ...
            j, num2str(find(flag == j)'));
end
for j = 1:k
    fprintf('The centroids for cluster %d is (%s)\n', j, num2str(B(j,:)));
end

>> Kmeans1(input2, m, n, k)
The vectors belonging to Cluster 1 are (1, 2)
The vectors belonging to Cluster 2 are (3, 4)
The centroids for cluster 1 is (1.5, 1)
The centroids for cluster 2 is (4.5, 3.5)

6.4 Expectation Maximization (E-M) Algorithm
E-M algorithm [16] [17] finds out maximum likelihood estimates of parameters in
probabilistic models. This algorithm iterates between the E- step and the M-step un-
til convergence.

1. Expectation: - This step computes an expectation of the likelihood assuming


parameters

2. Maximization:- This step computes maximum likelihood estimates of param-


eters by maximizing the expected likelihood found in E-step.

According to Bayes theorem

P(ωj|x) = P(x|ωj) P(ωj) / P(x)          (6.2)

where P(x|ωj) is the probability of x given ωj,
P(ωj) is the probability of ωj,
P(x) is the probability of x,
ωj are the categories (j = 1, 2, · · · , n), and
x is the feature vector.

Bayes formula can be expressed as

Posterior = (Likelihood × Prior) / Evidence          (6.3)
Estimation of likelihood becomes difficult when the dimensionality of the feature vector
x is large. We already know that P (x|ωj ) is a normal density with mean µ and covari-
ance Σ (parameters). Now the problem of estimating an unknown function P (x|ωj )
is simplified to the problem of estimating the unknown parameters µj and Σj

E-step: Assume a mixture of 2 Gaussians, i.e. k = 2, and that the data are statistically
independent. Therefore the covariance matrix is

Σ = | σ1²  0   |
    | 0    σ2² |

Assume the parameter vector θ1 = [µ1, µ2, σ1², σ2²]ᵗ.

Compute

pij = p(xi | µj, σj²) = (1 / √((2π)^d |Σ|)) exp[ −(1/2) (xi − µj)ᵗ Σ⁻¹ (xi − µj) ] cj          (6.4)

where cj is the mixing probability

M-step:
Knowing the pij's, compute the parameter values

cj = (1/n) Σi pij          (6.5)

µj = (Σi pij xi) / (Σi pij)          (6.6)

σj² = (Σi pij (xi − µj)²) / (Σi pij)          (6.7)
Iterate between E-step and the M-step.

6.5 Matlab Code for E-M Algorithm


% Implementation of E-M Algorithm for one-dimensional data

function []=em1d(h,n,m)
% h - sample one dimensional data
% n - number of data points
% m - number of mixtures
% Initial Random numbers plays a major role in the EM Algorithm
% Initial Mean
for i=1:m
em(i)=random(’Normal’,mean(h),6)
ev(i)=0;
end
% Initial Variance
for j=1:m
for i=1:n
ev(j)=ev(j)+(h(i)-em(j))*(h(i)-em(j));
end

ev(j)=ev(j)/n;
end
ERROR=0.5;
ITERATION =0;
while ((ERROR>1e-50)&&(ITERATION<5000))
for l=1:m
tp(l)=0;
tmean(l)=0;
tvar(l)=0;
p(l)=0;
end
% Calculation of mean and variance using Gaussian density
for i=1:n
sp=0;
for j=1:m
p(j)=1/sqrt(2*pi*ev(j))*exp(-0.5*((h(i)-em(j))*(h(i)-em(j)))/ev(j));
sp=sp+p(j);
end
for j=1:m
tmean(j)=tmean(j)+p(j)/sp*h(i);
tvar(j)=tvar(j)+p(j)/sp*(h(i)-em(j))*(h(i)-em(j));
tp(j)=tp(j)+p(j)/sp;
end
end
% Error between means in two consecutive iterations
ITERATION=ITERATION +1
ERROR=0;
for j=1:m
ERROR=ERROR+(em(j)-tmean(j)/tp(j))*(em(j)-tmean(j)/tp(j));
end

ERROR
for j=1:m
em(j)=tmean(j)/tp(j)
ev(j)=tvar(j)/tp(j)
end
end

In k-means clustering, the objects are assigned to a particular cluster during every
iteration and hence it is called hard assignment. In E-M algorithm, the probability of
occurrence of the object to the various Gaussians in the mixture is evaluated and hence
it is called soft assignment. The iteration is repeated until the likelihood estimates of
two consecutive iterations do not vary much.

Chapter 7

Principle Components Analysis and Linear


Discriminant Analysis

by
M. KALAISELVIGEETHA
Lecturer (Selection Grade),
Department of CSE, Annamalai University.

7.1 Introduction
Developments in data collection and storage techniques in recent years have led to an
information overload. Researchers working in diversified domains such as engineering,
astronomy, biology, remote sensing, consumer transactions, etc., come across huge,
high-dimensional datasets which present many mathematical challenges due to the
curse of dimensionality [18]. As the dimensionality of the dataset increases, classification
performance decreases, as seen in Fig. 7.1. Moreover, if the dimensionality of the input
space is higher, more feature vectors are needed for training. The major problem with
these high-dimensional data sets is that, in most cases, not all the measured data are
important for understanding the underlying phenomena of interest. Thus, dimension
reduction is necessary for effective analysis of high dimensional data sets.
Principal components analysis (PCA) and Linear discriminant analysis (LDA) are
well-known schemes for dimension reduction. PCA finds a set of most representative
projection vectors, and thus the projected samples preserves the most relevant infor-
mation about original dataset. LDA projects data onto a lower-dimensional vector
space such that the ratio of the between-class variance to the within-class variance is
maximized, thus achieving maximum discrimination.

7.2 Principal Components Analysis


Principal components analysis(PCA) is a useful statistical technique that has found
application in many fields such as video/audio classification, face recognition, image
compression, etc. It is a simple method of extracting relevant information from high-
dimensional data sets. With minimal effort, PCA provides a road map for reducing
the complex data set to a lower dimensional dataset.

Fig. 7.1: Curse of dimensionality
Principal component analysis (PCA) was first introduced by Pearson [19] in 1901
and later independently developed by Hotelling [20] in 1933, where the name principal
components first appears. In various fields, it is also known as the singular value
decomposition (SVD), the Karhunen-Loeve transform, the Hotelling transform, and
the empirical orthogonal function (EOF) method.
The goal of this discussion is to provide both an intuitive feel of PCA, and a
complete overview. The discussion begins with the mathematical concepts essential
for understanding how PCA works [16]. It gives the background knowledge on standard
deviation, covariance, eigenvalues and eigenvectors. This is added in the discussion
for making the PCA section more understandable, but can be skipped off, if these
concepts are already familiar.

7.2.1 Background Mathematics


This section tries to give the elementary mathematical skills required in the process of
PCA.

7.2.1.1 Standard Deviation

In statistics and probability, standard deviation is a measure of the variability of a


statistical data set, or a probability distribution. A low standard deviation indicates
that the data points tend to be very close to the mean, whereas a high standard deviation
indicates that the data are spread out over a large range of values. Now, consider the
following example:
Example:
x = 1, 4, 2, 12, 15, 25, 67, 65, 6, 98
For the above dataset, the mean x̄ can be calculated using:

x̄ = \frac{1}{n} \sum_{i=1}^{n} x_i = 29.5    (7.1)

where n is the total number of elements in the data set. But, obviously, the mean
does not give much information about the distribution of the data. For example,
two data sets having different distributions may have the same mean, as indicated below:

a = [3, 1, 24, 12] and b = [11, 9, 7, 13]


Mean = 10

The standard deviation (SD) of a data set is a measure of the spread of the data.
SD can be defined as the average distance of the data points from the mean.

SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}    (7.2)

Now, consider the datasets a and b,


Set a: SD = 9.08; Set b: SD = 2.236
As seen, data set a has a larger spread than b, since its values lie farther from the
mean. Consider another data set c = [15, 15, 15, 15, 15]. Here, SD = 0 since all the
values are identical and none of them deviates from the mean.

7.2.1.2 Variance

The data spread can be measured using another measure called variance.
Variance = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}    (7.3)

Thus, SD is the square root of the variance.
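These quantities can be checked quickly in Matlab. The short script below (a minimal
sketch using the example data set x given earlier) computes the mean, variance and
standard deviation exactly as defined in eqns. (7.1)-(7.3), i.e., with normalization by n.

x = [1 4 2 12 15 25 67 65 6 98];
n = length(x);
m = sum(x)/n;             % mean, eqn (7.1)
v = sum((x - m).^2)/n;    % variance, eqn (7.3)
s = sqrt(v);              % standard deviation, eqn (7.2)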

7.2.1.3 Covariance

As discussed in the previous sections, standard deviation and variance can be applied
for only one dimensional data sets. For example, the marks of students in a class or
the salary of an employee and so on. But, the major goal of statistical analysis is
to look for any underlying relationship between the data and their dimensions. For
example, the number of classes attended by a student and his attendance percentage,

the overall experience of an employee in years and his salary etc., In statistical analysis,
it is needed to analyse whether a student who has attended more classes is getting more
attendance percentage or not.
Standard deviation and variance operate on one dimensional data, each dimension
being treated independently of the others. But it is often useful and interesting to
have a measure of how two different dimensions vary together. Covariance (σ) is such
a measure for 2-dimensional data (x, y) and is given by:

cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}    (7.4)
where, x̄ and ȳ are mean of x and y respectively, and n is the total number of
elements in the dataset.

7.2.1.4 Covariance Matrix

As discussed in Section 7.2.1.3, covariance (σ) is measured for two or more dimensional
data. Thus, if the data is three dimensional, the covariance matrix has 3 × 3 elements.
Therefore, for a n dimensional data, the covariance matrix has n × n elements.

An interesting point to note here is that, if the obtained value in a covariance ma-
trix is positive, both the dimensions increase together. For example, as the experience
of an employee increases, his salary also increases. If it is negative, this shows that,
as one dimension increases, other decreases. Finally, if the covariance is zero, the two
dimensions are independent of each other.

For two dimensional data the covariance matrix can be given as follows:

cov(X) = \begin{pmatrix} cov(x, x) & cov(x, y) \\ cov(x, y) & cov(y, y) \end{pmatrix}

where cov(x, x) and cov(y, y) are the variances of x and y respectively, and cov(x, y)
is the covariance between x and y. X is a vector which contains the two elements
x and y.

Example: Consider the following 2-dimensional dataset.

  x     y     (x - x̄)   (y - ȳ)
  4     7       0.5       -0.5
  3     6      -0.5       -1.5
  2     8      -1.5        0.5
  5     9       1.5        1.5
 sum  = 14     30
 mean = 3.5    7.5

As discussed in Section 7.2.1.3, cov(x, x) and cov(x, y) can be calculated as follows:

cov(x, x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}    (7.5)

Similarly,

cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}    (7.6)

Thus, for the given 2-dimensional data set, the calculated covariance matrix is

cov(X) = \begin{pmatrix} cov(x, x) & cov(x, y) \\ cov(x, y) & cov(y, y) \end{pmatrix}
       = \begin{pmatrix} 1.25 & 0.5 \\ 0.5 & 1.25 \end{pmatrix}

It is seen that, along the diagonal, the covariance is calculated between the same
dimensions (i.e., between x and x, and between y and y). The matrix is symmetrical
about the diagonal. Note that, since the data is 2-dimensional, the obtained covariance
matrix is (2 × 2). Obviously, for n-dimensional data, the size of the covariance matrix
will be (n × n). In Matlab, the covariance of this data can be found using

X = [4 7; 3 6; 2 8; 5 9]
covariance = cov(X) * 3/4

The multiplication factor (3/4) is used because Matlab's cov normalizes by (n − 1)
rather than by n as in eqn. (7.6).
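Alternatively, a second argument can be passed to cov to request normalization by n
directly; the short sketch below gives the same result without the scaling factor.

X = [4 7; 3 6; 2 8; 5 9];
covariance = cov(X, 1)    % cov(X,1) normalizes by n instead of (n - 1)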

Exercise

1. Obtain the covariance matrix for the following 2 dimensional dataset.

Items 1 2 3 4 5
x 10 42 26 22 12
y 46 15 19 24 48

2. Obtain the covariance matrix for the following 3 dimensional dataset.

Items 1 2 3
x 22 32 12
y 16 11 29
z 27 18 46

7.2.2 Matrices

This section provides the essential basics of matrices required for PCA, in particular
eigenvalues and eigenvectors. The discussion assumes a basic knowledge of matrix
algebra.

7.2.2.1 Eigenvectors and Eigenvalues

Before discussing how to find the eigenvectors of a matrix, it is important to note
that eigenvectors can only be found for square matrices, and not every square matrix
has (real) eigenvectors. Further, a given n × n matrix has at most n eigenvalues and
n eigenvectors. Thus, a 3 × 3 matrix has 3 eigenvalues and 3 eigenvectors.

Another interesting property is that scaling an eigenvector by some factor still gives
an eigenvector: the scaling only makes the vector shorter or longer, it does not change
its direction. Finally, the eigenvectors of a symmetric matrix (such as a covariance
matrix) are perpendicular/orthogonal to each other, which means that they meet at
right angles irrespective of the dimensionality.

Example

Consider the matrix A.

A = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}

Eigenvalues and eigenvectors of A can be calculated as follows:

det(A − λI) = 0    (7.7)

where λ denotes the eigenvalues and I is the identity matrix.

Thus, eqn. (7.7) gives,

det\left( \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix} − \begin{pmatrix} λ & 0 \\ 0 & λ \end{pmatrix} \right) = 0

\begin{vmatrix} 5 − λ & 2 \\ 2 & 5 − λ \end{vmatrix} = 0

Solving for eigenvalues (λ);

λ1 = 3 and λ2 = 7

Now, for calculating the eigenvectors of the given matrix A,

(A − λI)X = 0    (7.8)

where λ is an eigenvalue, I is the identity matrix and X is the corresponding eigenvector.
For λ1 = 3,

(A − 3I)X = 0

\left[ \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix} − \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \right] \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0

Solving for the eigenvector,

x_1 = \begin{pmatrix} −0.7071 \\ 0.7071 \end{pmatrix}

Similarly, for λ2 = 7,

(A − 7I)X = 0

\left[ \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix} − \begin{pmatrix} 7 & 0 \\ 0 & 7 \end{pmatrix} \right] \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0

Solving for the eigenvector,

x_2 = \begin{pmatrix} 0.7071 \\ 0.7071 \end{pmatrix}

In Matlab, eigenvalues and eigenvectors are obtained for this data using

A = [5 2; 2 5]
[v d] = eig(A)

Exercise: Find the eigenvalues and eigenvectors of the given matrix.

\begin{pmatrix} 1 & −3 & 2 \\ 3 & 0 & −4 \\ −2 & −1 & −2 \end{pmatrix}

7.2.3 Principal Components Analysis - Algorithm

PCA is a useful statistical procedure that has found importance in many fields, and
is a well-known technique for finding patterns in high dimensional data. It is a way
of identifying patterns in data and expressing the data in such a way as to highlight
their similarities and differences. Another main advantage of PCA is that once these
patterns are found, the data can be compressed by reducing the number of dimensions
without much loss of information. Thus, PCA 'combines' the essence of the attributes
by creating an alternative, smaller set of variables. The initial data can then be
projected onto this smaller set, as shown in Fig. 7.2.

This section explains the steps to perform principal components analysis on a set of
data, describing what happens at each step so that readers can make informed decisions
when using this dimension reduction method themselves.

7.2.3.1 Approach

1. Dataset

Fig. 7.2: Orthogonal principal eigenvectors

Table 7.1: PCA original data


Items 1 2 3 4 5 6 7 8 9 10
x 25 50 24 45 32 27 4 2 12 55
y 21 71 29 64 30 48 17 12 18 59

The discussion on PCA approach starts with the example data as shown in the
Table 7.1. The plot of the data is shown in Fig. 7.3.
2. Subtract the mean from the data items
The next step is to subtract the mean from each of the data item. This is the
average across each dimension. Thus, all the data items in each dimension will
have its mean subtracted (the mean (x̄) subtracted from all x values and mean
(ȳ) subtracted from all y values). This new dataset will thus have a mean whose
value is ’zero’. The mean subtracted data of the dataset shown in Table 7.1 is
given in Table 7.2.

Table 7.2: PCA mean subtracted data


Items 1 2 3 4 5 6 7 8 9 10
x -2.6 22.4 -3.6 17.4 4.4 -0.6 -23.6 -25.6 -15.6 27.4
y -15.9 34.1 -7.9 27.1 -6.9 11.1 -19.9 -24.9 -18.9 22.1

3. Calculate covariance matrix


Covariance matrix is computed for the data in Table 7.2, as discussed in Section
7.2.1.4. The data set is 2-dimensional and hence the covariance matrix will be
(2 × 2). For the dataset under consideration, the covariance matrix is:

\begin{pmatrix} 305.04 & 327.56 \\ 327.56 & 424.49 \end{pmatrix}

Hint: the normalization by n of eqns. (7.5) and (7.6) is used here.

Fig. 7.3: Plot of PCA original data
4. Calculate eigenvectors and eigenvalues of the covariance matrix
Recall from Section 7.2.2.1 that eigenvectors can be calculated only for square
matrices. Since the covariance matrix is a square matrix, its eigenvectors can be
calculated. A plot of the mean subtracted data with the eigenvectors is shown
in Fig. 7.4 (the eigenvectors are drawn in blue). The eigenvalues and
eigenvectors are:

eigenvalues = \begin{pmatrix} 31.8046 & 0 \\ 0 & 697.7254 \end{pmatrix}

eigenvectors = \begin{pmatrix} −0.7679 & 0.6406 \\ 0.6406 & 0.7679 \end{pmatrix}

5. Sort the eigen values


In this step, the idea of dimension reduction comes into effect. Once the
eigenvalues and eigenvectors are computed, the eigenvalues are sorted in
descending order, which gives the principal components in the order of
significance. The less significant components can then be ignored. This causes
some information loss, but if the discarded eigenvalues are small, the loss is not
very significant. Thus, the final data will have fewer dimensions than the
original data set.

Fig. 7.4: Plot of PCA mean subtracted data with eigen vectors

Obviously, if the original dataset is of n dimensions, this gives n eigenvalues and
eigenvectors. If only k eigenvalues are chosen in the procedure, the final data
has only k dimensions, where k ≤ n.
Considering the eigenvalues obtained for the example data in Table 7.1, the two
values are quite different. The eigenvector of the largest eigenvalue, 697.7254,
turns out to be the first principal component of the dataset under consideration.

eigenvalues = \begin{pmatrix} 31.8046 & 0 \\ 0 & 697.7254 \end{pmatrix}

Thus, for the largest eigenvalue 697.7254 of the example dataset, the principal
component (eigenvector) is

eigenvector = \begin{pmatrix} 0.6406 \\ 0.7679 \end{pmatrix}

6. Compute the new projected data set


After choosing the set of principal components (eigenvectors), the final data set
of reduced dimension is obtained as follows:

Final data = Eigenvector^T × Mean subtracted data    (7.9)

where Eigenvector^T is the matrix with the eigenvectors in the columns, transposed;
Final data is the final data set with the data items in columns and the dimensions
along rows.

Table 7.3: PCA projected data with 2 eigenvectors


Items x y
1 -13.8752 -8.1883
2 40.5342 4.6417
3 -8.3725 -2.2959
4 31.9561 3.9974
5 -2.4801 -7.7986
6 8.1395 7.5709
7 -30.3986 5.3756
8 -35.5192 3.7087
9 -24.5062 -0.1271
10 34.5221 -6.8845

Suppose it is decided to keep both the principal components (eigenvectors). Then
the dimension of the Final data will be the same as the dimension of the original
data set. In the example data set, which is two-dimensional, if both the principal
components are kept for obtaining the Final data, then this Final data will also be
two-dimensional. The PCA projected data set, keeping both the dimensions, is
shown in Table 7.3. But if it is decided to leave out one component, then the
obtained Final data will be of reduced dimension, i.e., one-dimensional.

Final data actually represents the original data purely in terms of the eigenvectors
chosen. The original data set is two-dimensional, and hence had two axes x and y.
Any two-dimensional data can be represented in terms of any two axes, and the
representation is most convenient when these axes are perpendicular. Since the
eigenvectors are perpendicular to each other, the original data set can be
represented in terms of the eigenvectors.

If it is decided to keep both the eigenvectors, then the final transformed data is
the original data represented in terms of both the eigenvectors; the eigenvectors
become the new axes. The plot of the final transformed data keeping both the
eigenvectors is shown in Fig. 7.5.

Fig. 7.5: Plot of PCA projected data with 2 eigen vectors

7. Deriving the old data back
Recovering the original data is an essential part of the PCA dimensionality
reduction procedure. Before discussing this, it is important to note that only if all
the eigenvectors are used in the PCA projection is it possible to get exactly the
original data back. If some of the eigenvectors were left out for dimension
reduction, then the retrieved data will have lost some information.
Rewriting from eqn. (7.9),

Mean subtracted data = (Eigenvector^T)^{-1} × Final data    (7.10)

where (Eigenvector^T)^{-1} is the inverse of the transposed eigenvector matrix. If all
the eigenvectors were considered in the PCA projection procedure, the eigenvector
matrix is orthonormal and its inverse equals its transpose. Thus, the equation
becomes,

Mean subtracted data = Eigenvector × Final data    (7.11)

But, for completeness, it is necessary to add back the mean of the original data
(earlier, the mean was subtracted in one of the steps). Hence, the equation becomes,

Original data = (Eigenvector × Final data) + Mean    (7.12)

Table 7.4: PCA reconstructed data
Items x y
1 25.000002 21.000011
2 49.999984 70.999976
3 24.000003 29.000006
4 44.999988 63.999981
5 31.999997 30.000005
6 27.000000 47.999992
7 4.000017 17.000014
8 2.000018 12.000018
9 12.000011 18.000013
10 54.999981 58.999984

The reconstructed data is shown in Table 7.4.

Implementation using Matlab:

Computes the mean vector, covariance matrix, eigenvalues and eigenvectors

function [] = eigenanalysis()

fip1 = fopen('pca2.txt', 'r');
fip2 = fopen('mean.dat', 'w');
fip3 = fopen('eigenvector.dat', 'w');

c = fscanf(fip1, '%f', 1);   % c is the number of feature vectors
r = fscanf(fip1, '%f', 1);   % r is the dimension of each feature vector

%------ Reading training data ------
for i = 1:c
  for j = 1:r
    k = fscanf(fip1, '%f', 1);
    img(i, j) = k;
  end
end
%--- Feature vectors are arranged column-wise ---
img
img = img';
%----- Computing mean vector -----
for i = 1:r
  mu(i) = 0;
  for j = 1:c
    mu(i) = mu(i) + img(i, j);
  end
  mu(i) = mu(i) / c
end
%----- Computing covariance matrix -----
cov = 0;
for i = 1:c
  cov = cov + (img(:, i) - mu') * ((img(:, i) - mu')');
end
cov = cov / c;
cov
%----- Computing eigenvalues and eigenvectors -----
[v d] = eig(cov)
%----- Rearrange eigenvectors because Matlab gives eigenvalues -----
%----- in ascending order -----
for i = 1:r
  for j = 1:r
    w(j, r - i + 1) = v(j, i);
  end
end
%----- Writing mean vector -----
fprintf(fip2, '%f ', mu(:));
%----- Writing eigenvectors -----
for i = 1:r
  for j = 1:r
    fprintf(fip3, '%f ', w(i, j));
  end
  fprintf(fip3, '\n');
end
fclose(fip1);
fclose(fip2);
fclose(fip3);

PCA Projection and Reconstruction

function [] = eigenprojection()

fip1 = fopen('test.dat', 'r');
fip2 = fopen('mean.dat', 'r');
fip3 = fopen('eigenvector.dat', 'r');
fip4 = fopen('projection.dat', 'w');
fip5 = fopen('reconstruction.dat', 'w');
c = fscanf(fip1, '%f', 1);
r = fscanf(fip1, '%f', 1);
%----- Reading test data -----
for i = 1:c
  for j = 1:r
    k = fscanf(fip1, '%f', 1);
    t(i, j) = k;
  end
end

t = t';
%----- Reading mean vector -----
for i = 1:r
  k = fscanf(fip2, '%f', 1);
  mu(i) = k;
end
%----- Reading eigenvectors -----
for i = 1:r
  for j = 1:r
    k = fscanf(fip3, '%f', 1);
    w(i, j) = k;
  end
end
%----- Computing projection -----
proj(r, c) = 0;
for i = 1:c
  proj(:, i) = w' * (t(:, i) - mu');
  fprintf(fip4, '%f ', proj(:, i));
  fprintf(fip4, '\n');
end
%----- Reconstructing the original data -----
recon = w * proj;
for i = 1:c
  recon(:, i) = recon(:, i) + mu';
  fprintf(fip5, '%f ', recon(:, i));
  fprintf(fip5, '\n');
end
fclose(fip1);
fclose(fip2);
fclose(fip3);
fclose(fip4);
fclose(fip5);
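For quick experimentation, the same computation can also be carried out with Matlab
built-ins on the example data of Table 7.1. The following minimal sketch (the variable
names are chosen here only for illustration) reproduces, up to the sign of the
eigenvectors, the covariance matrix, the projected data of Table 7.3 and the
reconstructed data of Table 7.4.

% Compact PCA sketch on the example data of Table 7.1 (illustrative only)
X = [25 50 24 45 32 27  4  2 12 55;     % x values
     21 71 29 64 30 48 17 12 18 59];    % y values (dimensions along rows)
n  = size(X, 2);
mu = mean(X, 2);                        % mean of each dimension
Xc = X - repmat(mu, 1, n);              % mean subtracted data
C  = (Xc * Xc') / n;                    % covariance matrix (normalized by n)
[V, D] = eig(C);                        % eigenvectors (columns of V) and eigenvalues
[dsort, idx] = sort(diag(D), 'descend');
V = V(:, idx);                          % principal components in order of significance
proj  = V' * Xc;                        % projected data (Table 7.3, up to sign)
recon = V * proj + repmat(mu, 1, n);    % reconstructed data (Table 7.4)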

7.3 Linear Discriminant Analysis

Linear Discriminant Analysis handles the data set when the within-class frequency is
unequal. The major objective of LDA [16] is to perform dimensionality reduction while
preserving as much of the class discriminatory information as possible. It aims to find
directions along which the classes are best separated. It is commonly used for data
classification and dimensionality reduction. LDA maximises the ratio of between-class
scatter to the within-class scatter in any particular data set thus generating maximal
discrimination.

7.3.1 Linear Discriminant Analysis - Algorithm

1. Dataset
The discussion begins with the example data sets/classes X1 and X2 .
X_1 = \begin{pmatrix} 9 & 2 & 7 & 3 \\ 6 & 8 & 4 & 1 \end{pmatrix}        X_2 = \begin{pmatrix} 2 & 4 & 5 & 1 \\ 6 & 9 & 4 & 2 \end{pmatrix}

2. Compute Mean
Compute the means of X1 and X2. Let µ1 and µ2 be the means of X1 and X2,
and µ be the mean of the entire data, obtained by merging X1 and X2.

µ_1 = \begin{pmatrix} 5.25 \\ 4.75 \end{pmatrix}      µ_2 = \begin{pmatrix} 3.0 \\ 5.25 \end{pmatrix}      µ = \begin{pmatrix} 4.125 \\ 5.0 \end{pmatrix}

3. Compute Optimizing Criterion


In LDA, the criterion for class separability is formulated using the within-class
scatter (Sw) and the between-class scatter (Sb). The within-class scatter is the
expected covariance of each of the classes and is given by,

S_w = \sum_i cov_i    (7.13)

where cov_i is the covariance of the i-th data set/class, calculated using

cov_i = \sum_{x_j \in X_i} (x_j − µ_i)(x_j − µ_i)^T    (7.14)

Thus, for a two-class problem,

S_w = cov_1 + cov_2    (7.15)

where cov_1 and cov_2 are the covariances of the 1st and 2nd classes respectively.
For the example data sets, Sw is,

S_w = \begin{pmatrix} 42.75 & 8.25 \\ 8.25 & 53.5 \end{pmatrix}

The between-class scatter is computed using,

S_b = \sum_i (µ_i − µ)(µ_i − µ)^T    (7.16)

For the example data set, the between-class scatter Sb is,

S_b = \begin{pmatrix} 2.5313 & −0.5625 \\ −0.5625 & 0.1250 \end{pmatrix}

Notice that Sb is the covariance of the data set whose members are the mean
vectors of each class. As stated earlier, the optimizing criterion in LDA is the
ratio of the between-class scatter to the within-class scatter. Maximizing this
criterion gives a solution which defines the axes of the transformed space. The
optimizing criterion is computed as,

C = (S_w)^{-1} S_b    (7.17)

The optimizing criterion for the example data set is,

C = \begin{pmatrix} 0.0631 & −0.0140 \\ −0.0202 & 0.0045 \end{pmatrix}

4. Compute Eigen Vector


By definition, an eigen vector of a transformation represents a one dimensional
subspace, which is invariant. Also, the eigen vectors whose eigenvalues are non-
zero are linearly independent and are invariant under the transformation. Thus,
any vector space can be represented in terms of linear combinations of the eigen
vectors. For the example data set, the eigen value is computed as,
eigenvalue = \begin{pmatrix} 0.0676 & 0 \\ 0 & 0 \end{pmatrix}

An eigenvalue of zero indicates linear dependency. Thus, only the eigenvectors
corresponding to non-zero eigenvalues are considered while computing the
non-redundant set of transformed data. Eigenvalues and eigenvectors are
calculated as discussed in Section 7.2.2.1. The eigenvectors are calculated as,

eigenvector = \begin{pmatrix} 0.9522 & 0.2169 \\ −0.3055 & 0.9762 \end{pmatrix}

The eigenvector corresponding to the non-zero eigenvalue 0.0676 is

\begin{pmatrix} 0.9522 \\ −0.3055 \end{pmatrix}

5. Transformation
The idea of dimensionality reduction comes into the picture in this step. For any
C-class problem there will be at most C − 1 non-zero eigenvalues. The eigenvectors
corresponding to the non-zero eigenvalues are used for the transformation.

Transformed set = V^T X_i    (7.18)

where V^T is the (transposed) eigenvector and X_i is the mean subtracted i-th data
set/class. For the example data set, the eigenvector corresponding to the eigenvalue
0.0676 is,

\begin{pmatrix} 0.9522 \\ −0.3055 \end{pmatrix}

Using this eigenvector in Eqn. (7.18), the transformed data of the data sets X1
and X2 are,

T_{x1} = ( 3.1890   −4.0874   1.8955   −0.9970 )

T_{x2} = ( −1.1813   −0.1932   2.2862   −0.9117 )

Implementation using Matlab:

LDA Projection and Reconstruction

% Two class data (x1 and x2) LDA projection and reconstruction

function [rx1, rx2] = lda(x1, x2)
[r1 c1] = size(x1);   % r1, r2 (r1 == r2): dimension of the feature vectors
[r2 c2] = size(x2);   % c1: no. of points in x1, c2: no. of points in x2
%----- Calculation of class means and global mean -----
for i = 1:r1
  mu1(i) = 0;
  mu(i) = 0;
  for j = 1:c1
    mu1(i) = mu1(i) + x1(i, j);
    mu(i) = mu(i) + x1(i, j);
  end
  mu1(i) = mu1(i) / c1;
end

for i = 1:r2
  mu2(i) = 0;
  for j = 1:c2
    mu2(i) = mu2(i) + x2(i, j);
    mu(i) = mu(i) + x2(i, j);
  end
  mu2(i) = mu2(i) / c2;
  mu(i) = mu(i) / (c1 + c2);
end
%----- within-class scatter matrix -----
sw = 0;
for i = 1:c1
  sw = sw + (x1(:, i) - mu1') * ((x1(:, i) - mu1')');
end

for i = 1:c2
  sw = sw + (x2(:, i) - mu2') * ((x2(:, i) - mu2')');
end
%----- between-class scatter matrix -----
sb = 0;
sb = (mu1 - mu)' * (mu1 - mu) + (mu2 - mu)' * (mu2 - mu);
%----- criterion matrix, eqn (7.17) -----
cov = inv(sw) * sb
%----- Eigenvalues and eigenvectors -----
[v d] = eig(cov);
%----- Projection and reconstruction of x1 -----
for i = 1:c1
  px1(:, i) = v' * (x1(:, i) - mu1');
end

for i = 1:c1
  rx1(:, i) = v * px1(:, i) + mu1';
end
%----- Projection and reconstruction of x2 -----
for i = 1:c2
  px2(:, i) = v' * (x2(:, i) - mu2');
end

for i = 1:c2
  rx2(:, i) = v * px2(:, i) + mu2';
end
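As a quick check, the function above can be called with the example data sets of
Section 7.3.1. The projections px1 and px2 computed inside the function correspond,
up to the sign and ordering of the eigenvectors, to the transformed data Tx1 and Tx2
given earlier, while rx1 and rx2 reconstruct the original data.

% Example call with the two-class data of Section 7.3.1
x1 = [9 2 7 3; 6 8 4 1];    % class 1 (dimensions along rows)
x2 = [2 4 5 1; 6 9 4 2];    % class 2
[rx1, rx2] = lda(x1, x2);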

Chapter 8

Radial Basis Function Neural Network

by
A. GEETHA
Lecturer (Selection Grade),
Department of CSE, Annamalai University.

8.1 Introduction

Radial basis function neural network (RBFNN) [21] is a type of artificial neural network
(ANN). It has a feedforward architecture with an input layer, a hidden layer, and an
output layer. It is applied to supervised learning problems and is built around radial
basis functions. An RBFNN trains faster than a multi-layer perceptron. It can be
applied to fields such as control engineering, time-series prediction, electronic device
parameter modeling, speech recognition, image restoration, motion estimation, data
fusion, etc.

8.2 Architecture of Radial Basis Function Neural


Network

The architecture of RBFNN is shown in Fig 8.1 [16] [17]. Radial basis functions are
embedded into a two-layer feed forward neural network. Such a network is character-
ized by a set of inputs and a set of outputs. In between the inputs and outputs there
is a layer of processing units called hidden units. Each of them implements a radial
basis function. The input layer of this network has ni units for a ni dimensional input
vector. The input units are fully connected to the nh hidden layer units, which are in
turn fully connected to the nc output layer units, where nc is the number of output
classes.

Fig. 8.1: Radial basis function neural network.

The activation functions of the hidden layer are chosen to be Gaussians, and are
characterized by their mean vectors (centers) µ_i and covariance matrices C_i,
i = 1, 2, · · · , n_h. For simplicity, it is assumed that the covariance matrices are of the
form C_i = σ_i^2 I, i = 1, 2, · · · , n_h. Then the activation function of the i-th hidden unit
for an input vector x_j is given by

g_i(x_j) = \exp\left( \frac{-\| x_j − µ_i \|^2}{2 σ_i^2} \right)    (8.1)

The µ_i and σ_i^2 are calculated by using a suitable clustering algorithm. Here the k-means
clustering algorithm is employed to determine the centers. The algorithm is composed
of the following steps (a minimal Matlab sketch is given after the list):
1. Randomly initialize the samples to k means (clusters), µ1 , · · · , µk
2. Classify n samples according to nearest µk.
3. Recompute µk .
4. Repeat the steps 2 and 3 until no change in µk.
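The sketch below illustrates these four steps. It is only an outline under assumed
conventions (X holds one sample per column, the number of iterations is capped, and
the tolerance 1e-6 is an arbitrary choice); it is not the exact implementation used
elsewhere in this chapter.

% Minimal k-means sketch: X is d x n (one sample per column), kc is the number of clusters
function mu = kmeans_sketch(X, kc)
[d, n] = size(X);
p  = randperm(n);
mu = X(:, p(1:kc));                      % step 1: random samples as initial means
for iter = 1:100                         % repeat steps 2 and 3
    for i = 1:n                          % step 2: assign samples to the nearest mean
        for j = 1:kc
            dist(j) = norm(X(:, i) - mu(:, j));
        end
        [dmin, label(i)] = min(dist);
    end
    newmu = mu;                          % step 3: recompute each mean
    for j = 1:kc
        if any(label == j)
            newmu(:, j) = mean(X(:, label == j), 2);
        end
    end
    if max(abs(newmu(:) - mu(:))) < 1e-6 % step 4: stop when the means no longer change
        mu = newmu;
        break;
    end
    mu = newmu;
end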
The number of activation functions in the network and their spread influence the
smoothness of the mapping. The assumption σi2 = σ 2 is made and σ 2 is given in (8.2)
to ensure that the activation functions are not too peaked or too flat.

σ^2 = \frac{η d^2}{2}    (8.2)
In the above equation d is the maximum distance between the chosen centers, and
η is an empirical scale factor which serves to control the smoothness of the mapping

function. Therefore, the above equation is written as

g_i(x_j) = \exp\left( \frac{-\| x_j − µ_i \|^2}{η d^2} \right)    (8.3)

The hidden layer units are fully connected to the n_c output layer units through weights
w_ik. The output units are linear, and the response of the k-th output unit for an input
x_j is given by

y_k(x_j) = \sum_{i=0}^{n_h} w_{ik} \, g_i(x_j),    k = 1, 2, · · · , n_c    (8.4)

where g0 (xj ) = 1. Given nt feature vectors from nc classes, training the RBFNN
involves computing µi , i = 1, 2, · · · , nh , η, d 2 , and wik , i = 0, 1, 2, · · · , nh , k =
1, 2, · · · , nc .

8.3 Training of RBFNN

The training procedure is given below:


Determination of µ_i and d^2: Conventionally, the unsupervised k-means clustering
algorithm can be applied to find n_h clusters from the n_t training vectors. However, the
training vectors of a class may not fall into a single cluster. In order to obtain clusters
only according to class, the k-means clustering may be used in a supervised manner.
Training feature vectors belonging to the same class are clustered into n_h/n_c clusters
using the k-means clustering algorithm. This is repeated for each class, yielding n_h
clusters in total for the n_c classes. These cluster means are used as the centers µ_i of the
Gaussian activation functions in the RBFNN. The parameter d is then computed by
finding the maximum distance between the n_h cluster means.
Determining the weights wik between the hidden and output layer: Given
that the Gaussian function centers and widths are computed from nt training vectors,
(8.4) may be written in matrix form as

Y = GW (8.5)

where Y is a nt × nc matrix with elements Yij = yj (xi), G is a nt × (nh + 1) matrix


with elements Gij = gj (xi), and W is a (nh + 1) × nc matrix of unknown weights. W
is obtained from the standard least squares solution as given by

W = (GT G)−1 GT Y (8.6)

To solve W from (8.6), G is completely specified by the clustering results, and the
elements of Y are specified as

Y_{ij} = \begin{cases} 1, & \text{if } x_i \in \text{class } j, \\ 0, & \text{otherwise} \end{cases}    (8.7)

8.4 Matlab Code for RBFNN

The Matlab files for training and testing RBFNN are given below.

%---------------rbftrain.m--------------------%

%function[weightmatrix]=rbftrain11(traindata,means,no of training vectors,dimension


function[wt]=rbftrain(fname1,fname2,preoutput,m,n,k)
%training data
A=load('rbftrain.dat');
%mean values of the training data obtained using k-means algorithm
b=load('means.dat');
%preoutput
pr=load('preoutput.dat');
eeta=0.25;
dist=0.0;
max=0.0;
dif=0.0;
mat1=zeros(m,n);
dif1=0.0;
%finding maximum distance between clusters
for j=1:k
for i=1:k
dist=0.0;
for l=1:n

dist=dist+sqrt((b(i)-b(j))*(b(i)-b(j)));
end
distmax(i)=dist;
if (distmax(i) >= max)
max=distmax(i);
end
end
end
%finding G matrix
for i=1:m
for j=1:k;
dif1=0.0;
for t=1:n
dif1=dif1+(A(i)-b(j))*(A(i)-b(j));
end
dif(j)=exp((-dif1)/(eeta*max*max));
mat1(i,j)=dif(j);
end
end
%adding bias value
for i=1:m
mat11(i,1)=1;
for j=1:k
mat11(i,j+1)=mat1(i,j);
end
end
%disp(mat11);
%transpose matrix
mat2=mat11';
% disp(mat2);
%multiplication of G matrix and its transpose
mat=mat2*mat11;

%disp(mat);
%inverse matrix
inver=inv(mat);
%disp(inver);
%resultmatrix=inverse matrix * transpose of G matrix
res=inver*mat2;
%disp(res);
%weight matrix= resultmatrix * preoutput
wt=res*pr;
%postoutput = G matrix * weight matrix
postoutput=mat11*wt;
%disp(postoutput);
return;

%------------------rbftest.m----------------------------%
%function[class]=rbftest(testdata,means,weight matrix,no of test vectors,dimension,
function[y]=rbftest(fname1,fname2,wt,m,n,k)
A1=load('rbftest.dat');
b=load('means.dat');
eeta=0.25;
dist=0.0;
max=0.0;
dif=0.0;
dif1=0.0;
%finding maximum distance between clusters
for j=1:k
for i=1:k
dist=0.0;
for l=1:n
dist=dist+sqrt((b(i)-b(j))*(b(i)-b(j)));

end
distmax(i)=dist;
if (distmax(i) >= max)
max=distmax(i);
end
end
end
%disp(max);
matrix1=zeros(m,n);
%finding G matrix
for i=1:m
for j=1:k;
dif1=0.0;
for t=1:n
dif1=dif1+(A1(i)-b(j))*(A1(i)-b(j));
end
dif(j)=exp((-dif1)/(eeta*max*max));
matrix1(i,j)=dif(j);
end
end
%adding bias value
for i=1:m
matrix11(i,1)=1;
for j=1:k
matrix11(i,j+1)=matrix1(i,j);
end
end
%disp(matrix11);
%postoutput = G matrix * weight matrix
postout=matrix11*wt;
%disp(postout);
y=0;

for i=1:m
if(postout(i) < 0.5)
y(i)=0;
else
y(i)=1;
end
end
return;
%----------------rbftraintest.m-------------------------%
wt=rbftrain('rbftrain.dat','means.dat','preoutput.dat',20,3,4);
disp(wt);
y=rbftest('rbftest.dat','means.dat',wt,4,3,4);

8.4.1 Example

Consider the plot of train and test data shown in Fig 8.2.

Fig. 8.2: Plot of train and test data

%INPUT
%-------------------rbftrain.dat---------------------%
%No. of train vectors(m) = 20; dimension(n)=2;
1.5 1.5
2 2
4 3
3.5 2
4.5 2.5
8 1.5
9 2.3
10.5 3
7 2.3
10 3
3 7
2.5 8.3
4 8
3 10
4 10.5
8 7
7 8
6.5 9.5
8 8.5
10 10
%------------means.dat--------------%
% No. of mean vectors(k)=4;
3.100000 2.200000
8.900000 2.420000
3.300000 8.760000
7.900000 8.600000
% ----------preoutput.dat-----------%
1 0
1 0

1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
%--------------rbftest.dat----------------------%
% No. of test vectors(m)=4; dimension(n)=2;
2 2
10.5 3
2.5 8.3
6.5 9.5
%OUTPUT
0
0
1
1

Chapter 9

Gaussian Mixture Model

by
T.S. SUBASHINI
Lecturer (Selection Grade),
Department of CSE, Annamalai University.

9.1 Introduction

Data measurements of many properties are often normally distributed, but with het-
erogeneous populations, sometimes data measurements reflect a mixture of normal
distributions. Mixture models are a type of density model which comprise a number
of component functions, usually Gaussian. These component functions are combined
to provide a multimodal density. Gaussian mixture models are formed by combining
multivariate normal density components. Gaussian mixture models are often used for
data clustering. Like k-means clustering, Gaussian mixture modeling uses an iterative
algorithm that converges to a local optimum. Gaussian mixture modeling may be more
appropriate than k-means clustering when clusters have different sizes and correlation
within them.

k-Means clustering : k-means clustering is a method of cluster analysis which


aims to partition n observations into k clusters in which each observation belongs
to the cluster with the nearest mean.

9.2 Preliminaries

This section gives a review of the basic definitions and concepts which are required
to understand the basics of Gaussian mixture models [16].

9.2.1 Mean Vector and Covariance Matrix

The first step in analyzing multivariate data is computing the mean vector and the
covariance matrix. Consider the following matrix

X = \begin{pmatrix} 4.0 & 2.0 & 0.60 \\ 4.2 & 2.1 & 0.59 \\ 3.9 & 2.0 & 0.58 \\ 4.3 & 2.1 & 0.62 \\ 4.1 & 2.2 & 0.63 \end{pmatrix}

For the example sample data matrix the mean vector and the covariance matrix
are

µ = [4.10   2.08   0.604]

Σ = \begin{pmatrix} 0.025 & 0.0075 & 0.00175 \\ 0.0075 & 0.0070 & 0.00135 \\ 0.00175 & 0.00135 & 0.00043 \end{pmatrix}

where the mean vector contains the arithmetic averages of the three variables and
the covariance matrix Σ is calculated by

Σ = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i − µ)(x_i − µ)^t

where n = 5 for this example.

Thus, 0.025 is the variance of the length variable, 0.0075 is the covariance between
the length and the width variables, 0.00175 is the covariance between the length and the
height variables, 0.007 is the variance of the width variable, 0.00135 is the covariance
between the width and height variables and 0.00043 is the variance of the height
variable.
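In Matlab, the mean vector and the covariance matrix of this sample can be obtained
directly; note that cov normalizes by (n − 1), matching the formula above.

X = [4.0 2.0 0.60; 4.2 2.1 0.59; 3.9 2.0 0.58; 4.3 2.1 0.62; 4.1 2.2 0.63];
mu = mean(X)       % mean vector, approximately [4.10 2.08 0.604]
Sigma = cov(X)     % covariance matrix, normalized by (n - 1)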

9.2.2 The Multivariate Normal Distribution

When multivariate data are analyzed, the multivariate normal density model is the
most commonly used model. A d-dimensional vector of random variables

x = (x_1, x_2, x_3, · · · , x_d),   −∞ < x_i < ∞,   i = 1, · · · , d

is said to have a multivariate normal distribution if its density function f(x) is of
the form

f(x) = f(x_1, x_2, x_3, · · · , x_d)
     = \frac{1}{(2π)^{d/2} |Σ|^{1/2}} \exp\left( -\frac{1}{2} (x − µ)^t Σ^{-1} (x − µ) \right)
where µ = (µ1 , µ2 , µ3 , · · · , µd ) is the mean vector and Σ is the covariance matrix of
the multivariate normal distribution.

9.2.3 Univariate Normal Distribution

When d = 1, the one dimensional vector x = x1 has the normal distribution with mean
µ and variance σ 2

f(x) = \frac{1}{σ\sqrt{2π}} \exp\left( -\frac{(x − µ)^2}{2σ^2} \right),    −∞ < x < ∞
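As a small numerical illustration of this formula (the values of µ, σ^2 and x below are
arbitrary), the density can be evaluated directly in Matlab:

mu = 170; sigma2 = 75; x = 165;      % illustrative values only
f = 1/sqrt(2*pi*sigma2) * exp(-0.5*(x - mu)^2/sigma2)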

9.2.4 Bivariate Normal Distribution:

When d = 2 the two dimensional vector x = (x1 , x2 ) has the bivariate normal distri-
bution with a two-dimensional vector of means µ = (µ1 , µ2 ) and covariance matrix
" #
2 2
P σ11 σ12
= 2 2
σ21 σ22

9.2.5 Maximum Likelihood

For most sensible models, we will find that certain data are more probable than other
data. The aim of maximum likelihood estimation is to find the parameter (mean,
covariance) value(s) that makes the observed data most likely. This is because the
likelihood of the parameters given the data is defined to be equal to the probability of
the data given the parameters.

However, in the case of data analysis, we have already observed all the data:
once they have been observed they are fixed, there is no 'probabilistic' part to them
anymore. We are much more interested in the likelihood of the model parameters that
underly the fixed data.

Probability: Knowing parameters =⇒ Prediction of outcome

Likelihood: Observation of data =⇒ Estimation of parameters

For example, suppose you are interested in the heights of Americans. You have a
sample of some number of Americans, but not the entire population, and record their
heights. Further, you are willing to assume that heights are normally distributed with
some unknown mean and variance. The sample mean is then the maximum likelihood
estimator of the population mean, and the sample variance is a close approximation
to the maximum likelihood estimator of the population variance.

9.3 Gaussian Mixture Model

Fig. 9.1: Mixture of two gaussians

Gaussian mixture model (GMM) is a mixture of several Gaussian distributions
and can therefore represent different subclasses inside one class. Figure 9.1 shows the
mixture of two Gaussians. The probability density function is defined as a weighted
sum of Gaussians. [22]
p(x|Θ) = \sum_{k=1}^{K} α_k \, p_k(x|θ_k)    (9.1)

where K is the number of mixtures, the α_k with \sum_{k=1}^{K} α_k = 1 are the mixture
weights, and

Θ = {α_1, · · · , α_K, θ_1, · · · , θ_K}

where each component is a multivariate Gaussian density.

We are given a dataset D = {x_1, x_2, · · · , x_n} where each x_i is a d-dimensional vector.
Assume that the points are generated in an independent and identically distributed
fashion from an underlying density p(x). We further assume a Gaussian mixture
model with K components,

p_k(x|θ_k) = \frac{1}{(2π)^{d/2} |Σ_k|^{1/2}} \exp\left( -\frac{1}{2} (x − µ_k)^t Σ_k^{-1} (x − µ_k) \right)
with its own parameters θk = {µk , Σk }. We can compute the membership of data point
xi in cluster k given parameters Θ as

w_{i,k} = \frac{α_k \, p_k(x_i|θ_k)}{\sum_{m=1}^{K} α_m \, p_m(x_i|θ_m)}    (9.2)

9.3.1 GMM Training

Several approaches exist for estimating the parameters of the GMM given a set of
data points. The most popular, and the one used here, is the expectation-maximization
(EM) algorithm, which iteratively optimizes the model using maximum likelihood esti-
mates. The expectation maximization (EM) algorithm is an iterative algorithm consisting
of an expectation step (E-step) and a maximization step (M-step), and is widely used for
model training.

9.3.1.1 The EM Algorithm

We define the EM (Expectation-Maximization) algorithm for Gaussian mixtures as


follows.

E-Step: Denote the current parameter value as Θ. Compute w_ik using equation
9.2 for all data points x_i, 1 ≤ i ≤ N, and all mixture components 1 ≤ k ≤ K.
Note that for each data point x_i the membership weights are defined such that
\sum_{k=1}^{K} w_{ik} = 1. This yields an N × K matrix of membership weights, where
each of the rows sums to 1.

M-Step: Now use the membership weights and the data to calculate new param-
eter values. Specifically,

α_k^{new} = \frac{1}{N} \sum_{i=1}^{N} w_{ik},    1 ≤ k ≤ K

µ_k^{new} = \frac{1}{\sum_{i=1}^{N} w_{ik}} \sum_{i=1}^{N} w_{ik} \, x_i,    1 ≤ k ≤ K

Σ_k^{new} = \frac{1}{\sum_{i=1}^{N} w_{ik}} \sum_{i=1}^{N} w_{ik} \, (x_i − µ_k^{new})(x_i − µ_k^{new})^t,    1 ≤ k ≤ K

The equations in the M-Step need to be computed in this order, i.e., first compute
the K new α's, then the K new µ's, and finally the K new Σ's.

After we have computed all of the new parameters, the M-Step is complete and
we can now go back and recompute the membership weights in the E-Step, then re-
compute the parameters again in the M-Step , and continue updating the parameters
in this manner. Each pair of E and M steps is considered to be an iteration.

9.3.1.2 Initialization Issues

The initial parameters or weights can be chosen randomly (e.g. select K random
data points as initial means and select the covariance matrix of the whole data set

for each of the initial K covariance matrices) or could be chosen via some heuristic
method (such as by using k-Means to cluster the data first and then defining weights
based on k-Means memberships).

9.3.2 GMM Testing

After obtaining the maximum likelihood estimates of the parameters of the mixture
using the EM algorithm, the probability density function is computed using equation
9.1. Given a test data point, this density is evaluated for each class as a weighted sum
of Gaussians using the estimated parameters. The test data belongs to the class which
has the highest probability.

Fig. 9.2: GMM-Training

Figure 9.2 and Figure 9.3 show the steps in GMM training and testing respectively.

Fig. 9.3: GMM-Testing

9.4 Example

Assume we are given one dimensional data on two classes, namely male heights (mh)
and female heights (fh), and let the number of mixtures in each class be two. Using
the EM algorithm, the maximum likelihood parameter estimates, namely two means
and two variances for the male and the female classes, are found.

enter no of class labels: 2


enter no of mixtures: 2

mh = [165 170 160 154 175 155 167 177 158 178];
fh= [135 146 160 156 147 123 169 141 132 139];

Assuming each class consists of two clusters, the two means obtained using the EM
algorithm for the male class were found to be (162.1924, 169.6676), and the two means
obtained for the female class were (147.9285, 142.0535). The corresponding variances
were found to be (75.6141, 74.3385) for the male class and (156.8230, 182.3439) for
the female class.

Given a test data point, for example 149, the weighted sum of Gaussian probability
densities is computed for each class using the estimated parameters, as shown in
equation 9.1. The test data belongs to the class which has the highest probability.
The probability that the given height 149 is a male was found to be 0.0137 and
that it is a female was found to be 0.0570. Since the female probability is higher, the
given test data 149 represents a female height.

9.5 Matlab Code

The function gmmtrain() in Section 9.5.1 after obtaining inputs from the user regarding
the number of classes, clusters etc., calls the em1d() function (EM algorithm) given
in Section 9.5.1.1 which finds out the maximum likelihood estimates and these values
are used by gmmtest() of Section 9.5.2 to classify the given test data.

9.5.1 GMM Training

function [newmu newvar] = gmmtrain()

% ncl - number of class labels


% ncc - number of mixtures in a class

ncl = input('enter no of class labels');


ncc = input('enter no of mixtures');

% the no of classes and clusters are saved for future use


details = [ncl ncc];
save details;

%Initialing the mean vector and covariance vectors


for j=1:ncl
for k=1:ncc
newmu(j,k)=0;
newvar(j,k)=0;

end
end

% calling EM algorithm
for i=1:ncl
x= input('Enter the single dimensional array name :','s');
v = evalin('base', x);
ndata=length(v);
[newmu(i,:) newvar(i,:)] = em1d(v,ndata,ncc);

end
% the final mean and covariance martix are saved for future use
save newmu;
save newvar;

9.5.1.1 EM Algorithm

function [newm,newv]=em1d(h,n,m);

% h - sample one dimensional data


% n - number of data points
% m - number of mixtures
%nc -number of class labels

% Initial Random numbers plays a major role in the EM Algorithm

% Initial Mean
for i=1:m
em(i)=random('Normal',mean(h),6);
ev(i)=0;
end

% Initial Variance

for j=1:m
for i=1:n
ev(j)=ev(j)+(h(i)-em(j))*(h(i)-em(j));
end
ev(j)=ev(j)/n;
end

ERROR=0.5;
ITERATION =0;

while ((ERROR>1e-50)&&(ITERATION<5000))

for l=1:m
tp(l)=0;
tmean(l)=0;
tvar(l)=0;
p(l)=0;
end

% Calculation of mean and variance using Gaussian density

for i=1:n
sp=0;
for j=1:m
p(j)=1/sqrt(2*pi*ev(j))*exp(-0.5*((h(i)-em(j))*(h(i)-em(j)))/ev(j));
sp=sp+p(j);
end
for j=1:m
tmean(j)=tmean(j)+p(j)/sp*h(i);
tvar(j)=tvar(j)+p(j)/sp*(h(i)-em(j))*(h(i)-em(j));
tp(j)=tp(j)+p(j)/sp;
end
end

% Error between means in two consecutive iterations

ITERATION=ITERATION +1;
ERROR=0;
for j=1:m
ERROR=ERROR+(em(j)-tmean(j)/tp(j))*(em(j)-tmean(j)/tp(j));
end

for j=1:m
newm(j)=tmean(j)/tp(j);
newv(j)=tvar(j)/tp(j);
end
end

9.5.2 GMM Testing

function [fp] = gmmtest(nh)

%nh = new test height

load newmu;
load newvar;
load details;
%ncl-no of class labels
%ncc -no of clusters
ncl = details(1,1);
ncc = details(1,2);
for i=1:ncl
fp(i)=0;
end

for k = 1:ncl
sp=0;
for j=1:ncc
p(j) = 1/sqrt(2*pi*newvar(k,j))*exp(-0.5*((nh-newmu(k,j))*(nh-newmu(k,j)))/newvar(k,j));
sp = sp + p(j);
end

fp(k)=sp;
end

9.5.3 GMM Training - Matlab Output

mh = [165 170 160 154 175 155 167 177 158 178];
fh= [135 146 160 156 147 123 169 141 132 139];

>> gmmtrain
enter no of class labels2
enter no of mixtures2
Enter the single dimensional array name :mh
Enter the single dimensional array name :fh

newmu =

163.3440 169.9813
144.0540 145.5456

newvar =

61.4519 146.9010
170.3218 174.8937

9.5.4 GMM Testing - Matlab Output

Several test heights were classified using the trained GMM. For each test height, the
weighted sum of Gaussians is evaluated for the male and the female classes, and the
height is assigned to the class with the higher probability, as shown below.

>> gmmtest(149)

ans =

0.0207 0.0574 (Here the first entry is the male probability


and the second corresponds to female.)

Inference : Height 149 corresponds to a female as the female probability is more.

>> gmmtest(164)

ans =

0.0746 0.0210

Inference : Height 164 corresponds to a male as the male probability is more.

>> gmmtest(157)

ans =

0.0512 0.0394

Inference : Height 157 corresponds to a male as the male probability is more.

>> gmmtest(154)

ans =

0.0383 0.0474

Inference : Height 154 corresponds to a female as the female probability is more.

Chapter 10

Support Vector Machines

by
M. BALASUBRAMANIAN
Lecturer (Senior Scale),
Department of CSE, Annamalai University.

10.1 Introduction

Support vector machine (SVM) [23], [24] is based on the principle of structural risk
minimization (SRM). Like RBFNN, support vector machines can be used for pattern
classification and nonlinear regression. SVM constructs a linear model to estimate the
decision function using non-linear class boundaries based on support vectors. If the
data are linearly separated, SVM trains linear machines for an optimal hyperplane
that separates the data without error and into the maximum distance between the
hyperplane and the closest training points. The training points that are closest to
the optimal separating hyperplane are called support vectors. Fig. 10.1 shows the
architecture of the SVM. SVM maps the input patterns into a higher dimensional
feature space through some nonlinear mapping chosen a priori. A linear decision
surface is then constructed in this high dimensional feature space. Thus, SVM is a
linear classifier in the parameter space, but it becomes a nonlinear classifier as a result
of the nonlinear mapping of the space of the input patterns into the high dimensional
feature space.

10.2 SVM Principle

Support vector machine (SVM) can be used for classifying the obtained data (Burges,
1998). SVMs are a set of related supervised learning methods used for classification
and regression. They belong to a family of generalized linear classifiers. Let us denote
a feature vector (termed a pattern) by x = (x_1, x_2, · · · , x_n) and its class label by y such
that y ∈ {+1, −1}. Therefore, consider the problem of separating the set of n training
patterns belonging to two classes,

Fig. 10.1: Architecture of the SVM (Ns is the number of support vectors).

(x_i, y_i),   x_i ∈ R^n,   y ∈ {+1, −1},   i = 1, 2, · · · , n

The aim is to find a decision function g(x) that can correctly classify an input pattern
x that is not necessarily from the training set.

10.2.1 SVM for Linearly Separable Data

A linear SVM is used to classify data sets which are linearly separable. The SVM
linear classifier tries to maximize the margin around the separating hyperplane. The
patterns lying on the maximal margins are called support vectors. Such a hyperplane
with maximum margin is called the maximum margin hyperplane [23]. In case of linear
SVM, the discriminant function is of the form:

g (x) = w t x + b (10.1)

such that g (xi ) ≥ 0 for yi = +1 and g (xi ) < 0 for yi = −1. In other words, training
samples from the two different classes are separated by the hyperplane g (x) = w t x+b =
0. SVM finds the hyperplane that causes the largest separation between the decision
function values from the two classes. Now the total width between the two margins is
2/\|w\|, which is to be maximized. Mathematically, this hyperplane can be found by
minimizing the following cost function:

J(w) = \frac{1}{2} w^t w    (10.2)
subject to the separability constraints

g(x_i) ≥ +1 for y_i = +1
or
g(x_i) ≤ −1 for y_i = −1

Equivalently, these constraints can be re-written more compactly as

y_i (w^t x_i + b) ≥ 1,   i = 1, 2, · · · , n    (10.3)

For the linearly separable case, the decision rule defined by an optimal hyperplane
separating the binary decision classes is given in the following equation in terms of
the support vectors:

Y = sign\left( \sum_{i=1}^{N_s} y_i α_i (x · x_i) + b \right)    (10.4)

where Y is the outcome, y_i is the class value of the training example x_i, and (x · x_i)
represents the inner product. The vector x corresponds to an input and the vectors
x_i, i = 1, . . . , N_s, are the support vectors. In Eq. 10.4, b and α_i are parameters that
determine the hyperplane.

10.2.2 SVM for Linearly Non-separable Data

For non-linearly separable data, SVM maps the data in the input space into a high
dimensional space, x ∈ R^I \mapsto Φ(x) ∈ R^H, with a mapping function Φ(x), to find the
separating hyperplane. A high-dimensional version of Eq. 10.4 is given as follows:

Y = sign\left( \sum_{i=1}^{N_s} y_i α_i K(x, x_i) + b \right)    (10.5)

10.3 Determining Support Vectors

The support vectors are the (transformed) training patterns that lie closest to (and
equally distant from) the separating hyperplane. They are the training samples that
define the optimal separating hyperplane and are the most difficult patterns to classify.
Informally speaking, they are the patterns most informative for the classification task.
Fig. 10.2 shows a SVM example to classify a person into two classes: overweighed,
not overweighed; two features are pre-defined: weight and height. Each point repre-
sents a person. Dark circle and star points denote overweighed and not overweighed
respectively. Circles over the points as shown in Fig. 10.2 denote support vectors.

Fig. 10.2: SVM example to classify a person into two classes: overweighed, not over-
weighed; two features are pre-defined: weight and height. Each point represents a person.
Dark circle point (•) : overweighed; star point (∗) : not overweighed.

10.4 Inner Product Kernels

SVM generally applies to linear boundaries. In the case where a linear boundary is
inappropriate SVM can map the input vector into a high dimensional feature space. By
choosing a non-linear mapping, the SVM constructs an optimal separating hyperplane
in this higher dimensional space, as shown in Fig. 10.3. The function K is defined
as the kernel function for generating the inner products to construct machines with
different types of non-linear decision surfaces in the input space.

K(x, x_i) = Φ(x) · Φ(x_i)    (10.6)

The kernel function may be any of the symmetric functions that satisfy the Mercer’s
conditions (Courant and Hilbert, 1953). There are several SVM kernel functions as
given in Table 10.1.

An example SVM mapping function Φ(x) maps the 2-dimensional input space (x_1, x_2)
to the higher 3-dimensional feature space (x_1^2, x_2^2, \sqrt{2} x_1 x_2), as shown in Fig. 10.3.
SVM was originally developed for two class classification problems. The N class
classification problem can be solved using N SVMs. Each SVM separates a single class
from all the remaining classes (one-vs-rest approach) [25], [26]. The dimension of the
feature space vector Φ(x) for the polynomial kernel of degree p and for an input pattern
dimension of d is given by

\frac{(p + d)!}{p! \, d!}    (10.7)

Table 10.1: Types of SVM inner product kernels

Type of kernel   Inner product kernel K(x, x_i)      Details
Polynomial       (x^T x_i + 1)^p                     x is the input pattern, x_i a support vector,
Gaussian         exp(−\|x − x_i\|^2 / (2σ^2))        σ^2 is the variance, 1 ≤ i ≤ N_s,
                                                     N_s is the number of support vectors,
Sigmoidal        tanh(β_0 x^T x_i + β_1)             β_0, β_1 are constant values,
                                                     p is the degree of the polynomial
Fig. 10.3: An example for SVM kernel function Φ(x) maps two dimensional input space
to higher three dimensional feature space. (a) Nonlinear problem. (b) Linear problem.

For sigmoidal kernel and Gaussian kernel, the dimension of feature space vectors is
shown to be infinite. Finding a suitable kernel for a given task is an open research
problem.

10.4.1 Example for Polynomial Kernel: OR Problem

Table 10.2: OR table


Input vectors(x) Desired Output(d)
x1 x2
-1 -1 -1
-1 1 1
1 -1 1
1 1 1

The SVM kernel function Φ(x) maps the 2-dimensional (2D) input space to a higher
6-dimensional (6D) feature space. We have to find the support vectors after the
transformation (2D to 6D). In this problem, there are 4 support vectors.
Input vectors (x) : [-1 -1], [-1 1], [1 -1], [1 1]
Support vectors (x_i) : [-1 -1], [-1 1], [1 -1], [1 1]
Inner product for the polynomial kernel:
K(x, x_i) = (x^t x_i + 1)^p

K(x_1, x_1) = \left( [−1 \; −1] \begin{pmatrix} −1 \\ −1 \end{pmatrix} + 1 \right)^2 = (1 + 1 + 1)^2 = 9

K(x_1, x_2) = \left( [−1 \; −1] \begin{pmatrix} −1 \\ 1 \end{pmatrix} + 1 \right)^2 = (1 − 1 + 1)^2 = 1

Similarly, we have to find the kernels K(x_1, x_3), K(x_1, x_4), K(x_2, x_1), · · · , K(x_4, x_4):

K(x, x_i) = \begin{pmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{pmatrix}

The optimal separating hyperplane equation: W^t X + b = 0

where W is the weight vector (hidden layer to output layer), X is the inner product
kernel matrix, and b is the bias. Let us take X = K(x, x_i).

SVM Training:
The weights from the hidden layer to the output layer are found using a least squares
approximation.

B = b × d

where b is the bias and d is the desired output.

Let us take b = 0.5 and d = \begin{pmatrix} −1 \\ 1 \\ 1 \\ 1 \end{pmatrix}

B = 0.5 × \begin{pmatrix} −1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} −0.5 \\ 0.5 \\ 0.5 \\ 0.5 \end{pmatrix}

If X is a square matrix, we can use the following equation to find W (the weights):

W = X^{-1} B

If X is not a square matrix, we can use the following equation to find W:

W = (X^t X)^{-1} X^t B

In this OR-problem, X is a square matrix, and

W = \begin{pmatrix} −0.1875 \\ 0.0625 \\ 0.625 \\ 0.625 \end{pmatrix}

SVM Testing:

W^t X = \begin{pmatrix} −1.5 \\ 0.5 \\ 0.5 \\ 0.5 \end{pmatrix}

W^t X + b = \begin{pmatrix} −1 \\ 1 \\ 1 \\ 1 \end{pmatrix}

If W^t X + b ≥ +1 then class(d_i) = +1
If W^t X + b ≤ −1 then class(d_i) = −1
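The kernel matrix of this example is easy to reproduce in Matlab. The short sketch
below builds K for the OR data with the degree-2 polynomial kernel; for the weights it
simply fits a least squares solution directly to the desired outputs d (a slight variation
of the B = b × d step above), which is enough to illustrate the training and testing
procedure.

% Polynomial-kernel OR example (illustrative sketch)
X  = [-1 -1; -1 1; 1 -1; 1 1];     % input vectors (rows)
d  = [-1; 1; 1; 1];                % desired outputs (OR problem)
SV = X;                            % all four inputs act as support vectors here
p  = 2;                            % degree of the polynomial kernel
K  = (X * SV' + 1).^p;             % kernel matrix: 9 on the diagonal, 1 elsewhere
W  = K \ d;                        % least squares weights (K is square and invertible)
out = sign(K * W)                  % testing: reproduces the desired outputs [-1 1 1 1]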

10.4.2 Example for Gaussian Kernel: OR Problem

The SVM kernel function Φ(x) here maps the 2-dimensional (2D) input space to an
infinite-dimensional feature space. We have to find the support vectors after the
transformation. In this problem, there are only 3 support vectors.
Input vectors (x) : [-1 -1], [-1 1], [1 -1], [1 1]
Support vectors (x_i) : [-1 -1], [-1 1], [1 -1]
Inner product for the Gaussian kernel: K(x, x_i) = \exp\left( -\frac{\|x − x_i\|^2}{2σ^2} \right)

OR table
Input vectors (x) Desired Output (d)
x1 x2
-1 -1 -1
-1 1 1
1 -1 1
1 1 1

K(x_1, x_1) = \exp\left( -\frac{\|x_1 − x_1\|^2}{2σ^2} \right)
            = \exp\left( -\frac{\|[-1 \; -1] − [-1 \; -1]\|^2}{2 × 8} \right)
            = \exp\left( -\frac{(-1 + 1)^2 + (-1 + 1)^2}{2 × 8} \right)
            = \exp\left( -\frac{0}{16} \right) = 1

K(x_1, x_2) = \exp\left( -\frac{\|x_1 − x_2\|^2}{2σ^2} \right)
            = \exp\left( -\frac{\|[-1 \; -1] − [-1 \; 1]\|^2}{2 × 8} \right)
            = \exp\left( -\frac{(-1 + 1)^2 + (-1 − 1)^2}{2 × 8} \right)
            = \exp\left( -\frac{4}{16} \right) = 0.7788

Similarly, we have to find the kernels K(x_1, x_3), K(x_1, x_4), K(x_2, x_1), · · · , K(x_4, x_3):

K(x, x_i) = \begin{pmatrix} 1 & 0.7788 & 0.7788 \\ 0.7788 & 1 & 0.6065 \\ 0.7788 & 0.6065 & 1 \\ 0.6065 & 0.7788 & 0.7788 \end{pmatrix}
The optimal separating hyperplane equation: W^t X + b = 0

where W is the weight vector (hidden layer to output layer), X is the inner product
kernel matrix, and b is the bias. Let us take X = K(x, x_i).

SVM Training:
The weights from the hidden layer to the output layer are found using a least squares
approximation.

B = b × d

where b is the bias and d is the desired output.

Let us take b = 0.8011 and d = \begin{pmatrix} −1 \\ 1 \\ 1 \\ 1 \end{pmatrix}

B = 0.8011 × \begin{pmatrix} −1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} −0.8011 \\ 0.8011 \\ 0.8011 \\ 0.8011 \end{pmatrix}

If X is a square matrix, we can use the following equation to find W (the weights):

W = X^{-1} B

If X is not a square matrix, we can use the following equation to find W:

W = (X^t X)^{-1} X^t B

In this OR-problem, X is not a square matrix, and

W = \begin{pmatrix} −8.1451 \\ 4.0761 \\ 4.0700 \end{pmatrix}

SVM Testing:

W^t X = \begin{pmatrix} −1.8019 \\ 0.2005 \\ 0.1981 \\ 1.4033 \end{pmatrix}

W^t X + b = \begin{pmatrix} −1.0008 \\ 1.0016 \\ 0.9991 \\ 2.2024 \end{pmatrix}

If W^t X + b ≥ +1 then class(d_i) = +1
If W^t X + b ≤ −1 then class(d_i) = −1
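The Gaussian kernel matrix above can be checked with a few lines of Matlab, using
σ^2 = 8 as in the calculation; the resulting 4 × 3 matrix matches the values 1, 0.7788
and 0.6065 given above.

% Gaussian kernel matrix for the OR data with 3 support vectors (sigma^2 = 8)
X  = [-1 -1; -1 1; 1 -1; 1 1];   % input vectors (rows)
SV = [-1 -1; -1 1; 1 -1];        % support vectors (rows)
sigma2 = 8;
K = zeros(size(X,1), size(SV,1));
for i = 1:size(X,1)
    for j = 1:size(SV,1)
        K(i,j) = exp(-norm(X(i,:) - SV(j,:))^2 / (2*sigma2));
    end
end
K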

10.5 SVM Tool Demonstration

Name of SVM Tool : SVM torch [27]


i) SVM Training:
General Format:
SVMTorch.exe [options] <train file> <model file>
Example for OR-Problem:
train file : ORtrain.dat
model file : ORmodel
kernel options :
-t 0 - linear kernel
-t 1 - polynomial kernel
-t 2 - gaussian kernel
-t 3 - sigmoid kernel
-multi - multiclass mode (More than two classes)
Training file format
i) Two class

n (d+1)
a11 a12 ... a1d +1
a21 a22 ... a2d +1
b31 b32 ... b3d -1
bn1 bn2 ... bnd -1

where n is the number of feature vectors.


d is dimension of each feature vector (Number of features).
last column denotes category (Two class (+1 and -1))
category1 data:
a11 a12 ... a1d
a21 a22 ... a2d
category2 data:
b31 b32 ... b3d
bn1 bn2 ... bnd
ii) Multi class
Example of a training file format for four classes:

n (d+1)
a11 a12 ... a1d 0.0
a21 a22 ... a2d 0.0
b31 b32 ... b3d 1.0
b41 b42 ... b4d 1.0
c51 c52 ... c5d 2.0
c61 c62 ... c6d 2.0
d71 d72 ... d7d 3.0
dn1 dn2 ... dnd 3.0

Example: OR Problem (two-class)
Training file : ORtrain.dat

4 3
-1 -1 -1
-1 1 +1
1 -1 +1
1 1 +1

SVMTorch.exe -t 1 ORtrain.dat ORmodel

ii) SVM Testing:


General Format:
SVMTest.exe [options] <model file> <test file>
Example for OR-Problem:
model file : ORmodel
test file : ORtest.dat
options :
-no - no desired output in test file
-oa <result> - write the SVM output in ASCII into <result>
-ob <result> - write the SVM output in binary into <result>
-multi - multiclass mode (More than two classes)

Example: OR Problem
Testing file : ORtest.dat

4 2
-1 -1
-1 1
1 -1
1 1

SVMTest.exe -no -oa result.dat ORmodel ORtest.dat


Open the result.dat file : which contains the following data
-1
1
1
1

Chapter 11

Hidden Markov Models

by
AN. SIGAPPI
Lecturer (Selection Grade),
Department of CSE, Annamalai University.

11.1 Need for Hidden Markov Model

Speech signal is represented by a series of feature vectors which are computed for every
10 ms. A whole word will comprise dozens of those vectors, and the number of vectors
(the duration) of a word will depend on how slow a person is speaking. Therefore, in
speech recognition applications, it is required to classify not only single vectors, but
sequences of vectors. Let us assume that we would want to recognize a few command
words or digits. For an utterance of a word w which is TX vectors long, we will get
a sequence of vectors from the acoustic preprocessing stage. What is required is a
way to find the optimum match between this unknown sequence of vectors and the known
sequences of vectors contained in the vocabulary. These known sequences of vectors
are the prototypes for the words we want to recognize.

All the variability of speech that naturally occurs when a word is spoken at different times
by a person, or even by different persons, has to be reflected by the prototype vector
sequences. It should be clear that the number of prototypes we have to store in order
to implement a speaker independent system might become quite large, especially if we
are dealing with a large vocabulary. What we are rather looking for, to handle this
problem, is a way to represent the words of our vocabulary in a more generic form than
just storing many speech samples for each word. If, for example, we had a
general stochastic model for the generation of feature vectors corresponding to a given
word, then we could calculate how well a given utterance fits our model. If we
calculate a fit value for each model of our vocabulary, then we can assign the unknown
utterance to the model which best fits the utterance. This is a very simplistic
description of a general classification method, the so-called statistical classification.

One major breakthrough in speech recognition was the introduction of the sta-
tistical classification framework for speech recognition purposes. The important part
of that framework is the emission probability density p(X|v). These density func-
tions have to be estimated for each word model to reflect the generation process of
all possible vector sequences belonging to the word class. As we know, these vector
sequences may vary in their length as well as in the individual spectral shapes of the
feature vectors within that sequence. Thus, a model is needed which is capable of
dealing with both of these variabilities. The hidden Markov model (HMM) models
a stochastic process defined by a set of states and transition probabilities
between those states, where each state describes a stationary stochastic process and
the transition from one state to another state describes how the process changes its
characteristics in time.

Thus, a hidden Markov model is used for problems that involve making a sequence of
decisions on a temporal basis. It is a statistical model and is a variant of the finite state
machine.

11.2 Markov Model

In a Markov model, the states are directly accessible to the observer. For example, con-
sider a simple 3-state Markov model of the weather. We assume that once a day (say,
at noon) the weather is observed as one of the following:

State 1 : Rainy
State 2 : Cloudy
State 3 : Sunny

To describe the weather on a particular day, it is only required to name the state directly.
So, in a Markov model, the states are directly accessible to the observer. As a Markov
model is a form of finite state machine, it can be represented by a state diagram as
given in Fig. 11.1.

Fig. 11.1: Markov model

Table 11.1: Visible actions


Walking Shopping Cleaning Hidden State
0.1 0.3 0.6 Rainy
0.5 0.3 0.2 Cloudy
0.2 0.6 0.2 Sunny

In Fig. 11.1, wi represents the state and ai,j represents the transition probability
to make a transition from the current state i to the next state j.

11.3 Hidden Markov Model

Unlike the Markov model, in a hidden Markov model the states are not directly accessible
to the observer. Consider the weather example again, but now the states are not
directly revealed to the observer. Instead, only the actions performed (such as walking,
shopping, and cleaning) are revealed. From the probabilities of the actions one must predict
the weather state. This is the hidden Markov model. The hidden states are as follows:
ω1 - Rainy
ω2 - Cloudy
ω3 - Sunny

The actions (visible) are given in Table. 11.1

The state diagram of a HMM can be given as follows:

Fig. 11.2: State diagram of HMM

11.3.1 Notations and Model Parameters used in HMM

The following notations are used in HMM:

• ω - Hidden states
• v - Visible states
• ai,j - Transition probability to make a transition from state i at t to state j at
(t+1)
• bj,k - Emission probability to emit visible state k at hidden state j
• N - Number of hidden states (guess this number)
• M - Number of visible states (obtained from the training set)

A complete specification of a HMM [16] requires specification of two parameters


(N and M ), specification of visible states, and the specification of the probability
measures ai,j and bj,k . If there is a prior probability π, we could include such a factor
as well. For convenience, we use the compact notation,

λ = {ai,j , bj,k , π},   Σj ai,j = 1 ∀i,   Σk bj,k = 1 ∀j        (11.1)

to indicate the complete parameter set of the model. For simplicity, we ignore π
in our discussion.

11.3.2 Order of HMM

The order of an HMM represents the memory size of the model. An order 1 (first order)
Markov model has a memory of size 1; that is, the probability at (t+1) depends only
on the state at t. An order n Markov model depends on the previous n states, and so
requires a memory of size n.

11.3.3 Types of HMM

1. Ergodic Model

• Every state of the model can be reached from every other state in a finite
number of steps.
• This type of model has the property that every ai,j coefficient is positive
[28].
• Example: A 4-state ergodic model is given in Fig. 11.3. {ai,j } for this
4-state ergodic model will be

Fig. 11.3: 4-State ergodic model


 
a11 a12 a13 a14
 a21 a22 a23 a24 
 
ai,j =  
 a31 a32 a33 a34 
a41 a42 a43 a44

2. Left-Right Model

• This model has the property that as time increases the state’s index in-
creases (or stays the same), i.e, the states proceed from left to right.
• So the state transition coefficients have the property ai,j = 0, j < i, i.e, no
transitions are allowed to states whose indices are lower than the current
state [28].
• Example: A 4-state left-right model is given in Fig. 11.4.

Fig. 11.4: 4-State left-right model

{ai,j } for this 4-state left-right model will be


 
a11 a12 a13 a14
 0 a22 a23 a24 
 
ai,j =  
 0 0 a33 a34 
0 0 0 a44

The ergodic model is highly flexible, whereas the left-right model is a constrained model.
By combining these models, many variations are possible.

11.3.4 Applications of HMM

The following is an illustrative list of applications of HMM:

• Speech recognition
• Gait recognition
• Optical character recognition
• Lip-reading (visual speech to text mapping)

11.4 Design Issues

The HMM will be useful in real world applications, if the following three basic problems
of HMM are solved [16]:

1. Evaluation problem
2. Decoding problem
3. Learning problem

11.4.1 Evaluation Problem

• Transition probability ai,j and emission probability bj,k are given.


• The probability to generate a particular sequence of visible states V T by that
model is to be determined.
V T = v1 , v2 , . . . , vT ,where T is the length of the sequence.
Example: V 6 = {v5 , v1 , v4 , v5 , v2 , v3 } here T = 6
• So our goal is to find P (V T |θ). That is to find the probability to generate the
sequence V T when θ = {ai,j , bj,k } is given. We must take each possible sequence
of hidden states to produce V T , calculate the probability and then add up the
probabilities. So,

P(V^T | θ) = Σ (r=1 to N^T) P(V^T | ωr^T) P(ωr^T)        (11.2)

where
– P(V^T | ωr^T) = Π (t=1 to T) P(v(t) | ω(t))
  The right hand side is nothing but bj,k, i.e., the probability to generate the
  visible state v at the hidden state ω. So, this term is a product of bj,k's
  according to the hidden state and the corresponding visible state.
– P(ωr^T) = Π (t=1 to T) P(ω(t) | ω(t − 1))
  The right hand side is nothing but ai,j, i.e., the probability to make a transition
  from one hidden state at (t−1) to another hidden state at t. So, this term
  is a product of ai,j's according to the hidden sequence.

Hence P (V T |θ) is merely the product of the corresponding ai,j and bj,k at each
step.

P(V^T | θ) = Σ (r=1 to N^T) Π (t=1 to T) P(v(t) | ω(t)) P(ω(t) | ω(t − 1))        (11.3)

But this type of calculation is computationally expensive, requiring O(N^T · T) operations.
A computationally simpler recursive algorithm for the same goal is as follows:

αj(t) = 0,   if t = 0 and j ≠ initial state
      = 1,   if t = 0 and j = initial state                            (11.4)
      = [ Σi αi(t−1) ai,j ] bj,v(t),   otherwise

This is known as the forward algorithm. Its computational complexity is O(N²T).
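A minimal Matlab sketch of the forward recursion in Eq. (11.4) is given below. The names A, Bv, V and initState are ours: A(i,j) holds ai,j, Bv(j,k) holds bj,k, and V contains the indices of the observed visible states.

function P = forwardHMM(A, Bv, V, initState)
% Forward algorithm: computes P(V^T | theta) for one visible sequence V.
N = size(A, 1);                      % number of hidden states
T = length(V);                       % length of the visible sequence
alpha = zeros(N, 1);
alpha(initState) = 1;                % alpha_j(0): 1 for the initial state, 0 otherwise
for t = 1:T
    alphaNew = zeros(N, 1);
    for j = 1:N
        alphaNew(j) = (alpha' * A(:, j)) * Bv(j, V(t));   % Eq. (11.4)
    end
    alpha = alphaNew;
end
P = sum(alpha);                      % probability of generating the sequence
return;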

11.4.2 Decoding Problem

• The decoding problem is to find the most probable sequence of hidden states
for the given sequence of visible states V T .
• For decoding, the Viterbi algorithm is used.
• The decoding algorithm finds, at each time step t, the state that has the highest
probability (αj(t)). The full path is the sequence of hidden states that generates
the given visible state sequence optimally, as sketched below.
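The following Matlab sketch of Viterbi decoding uses the same A, Bv, V and initState conventions as the forward-algorithm sketch above; it is illustrative and not part of the original text.

function path = viterbiHMM(A, Bv, V, initState)
% Viterbi decoding: most probable hidden state sequence for the visible sequence V.
N = size(A, 1);
T = length(V);
delta = zeros(N, T);                 % best path probability ending in each state
psi   = zeros(N, T);                 % back-pointers
delta(:, 1) = A(initState, :)' .* Bv(:, V(1));
for t = 2:T
    for j = 1:N
        [delta(j, t), psi(j, t)] = max(delta(:, t-1) .* A(:, j));
        delta(j, t) = delta(j, t) * Bv(j, V(t));
    end
end
path = zeros(1, T);
[maxProb, path(T)] = max(delta(:, T));   % most probable final state
for t = T-1:-1:1
    path(t) = psi(path(t+1), t+1);       % backtrack
end
return;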

11.4.3 Learning Problem

• The values of N, M are given.


• The goal of learning is to determine the model parameters {ai,j , bj,k } from the
training samples.
• The Forward-Backward algorithm, also known as the Baum-Welch algorithm, is
used for learning.
• The forward algorithm was already discussed in the evaluation problem; it generates
the α values. Using the backward algorithm, we must find the β values.

Let the model be in state ωi(t) after generating part of the given visible sequence.
α is the probability accumulated so far in coming to the current state from the
initial state, and β is the probability that the model generates the remainder of the
target sequence.

βi(t) = 0,   if ωi(t) ≠ final state and t = T
      = 1,   if ωi(t) = final state and t = T                          (11.5)
      = [ Σj βj(t+1) ai,j bj,v(t+1) ],   otherwise

The β values are calculated backwards from the final time t = T: if the state is the final
hidden state, β = 1, otherwise β = 0.

The estimation process is started by randomly selecting the values of ai,j and bj,k
(such that each row of ai,j and of bj,k sums to 1). The re-estimation
of ai,j and bj,k is then carried out to approach the true values of ai,j and bj,k.

ai,j = (Expected number of transitions from state Si to Sj) / (Expected number of transitions from state Si)

Using this principle, all values of {ai,j} are calculated.

bj,k = (Expected number of times in state j and observing symbol vk) / (Expected number of times in state j)

For the same training data, P(V^T | θ) is calculated again using the re-estimated ai,j and
bj,k. This re-estimation on the same training data is repeated until the values of ai,j and
bj,k remain constant, or change only negligibly, between subsequent iterations. The resulting
values of ai,j and bj,k are taken as the final estimates and can then be applied to test data.

11.5 Implementation Example

Consider using HMMs to build an isolated word recognizer. Assume we have a


vocabulary of V words to be recognized and that each word is to be modeled by a
distinct HMM. That is, if we want to build a 10-word recognizer for the digits Zero,
One, Two, ..., Nine, it is required to build an HMM for each word, and hence
ten HMMs need to be built for this task. For each word in the vocabulary, we have
a training set of K occurrences spoken by one or more speakers. That is, for the word
'Zero', 100 samples spoken by one or more speakers are required, for the word 'One', another
100 samples spoken by one or more speakers, and so on for all the 10 words
in the vocabulary.

For each word, we have many observation sequences, i.e., visible state sequences
V^T. For example, let us assume that it takes 0.5 sec to utter the word 'Zero'. Within
this period, an observation is taken every 25 ms, so each word gives many observation
sequences V^T; each observation is a feature vector of 19 dimensions. Likewise, observation sequences
are obtained for all 100 samples of the word ’Zero’. HMM0 will be trained by using
these sequences, by doing the evaluation and learning process. Training will be stopped
after obtaining the optimum values for ai,j and bj,k . Likewise, all ten HMMs will be
trained. The block diagram of an isolated word recogniser using HMM is given in
Fig. 11.5.

Fig. 11.5: Isolated word recogniser

Linear Predictive Coding (LPC) is used to obtain the observation vectors V T


from the speech samples. Each unknown word which is to be recognized is applied
to the LPC block. LPC produces continuous vectors. These observation sequences
are applied to all HMMs (from HMM0 to HMM9 ). Each HMM will compute the
probability P (V T |θ) for the observation sequence V T by using the evaluation process.

From all P (V T |θ) values, the maximum value will be selected by using Viterbi
algorithm. That is for the word ’Zero’, the P (V T |θ) generated by HMM0 will be
greater than other HMMs. So it is recognized that the spoken word is ’Zero’.

11.6 Introduction to HTK

HTK is a toolkit for building hidden Markov models. It is developed by the Cam-
bridge University Engineering Department (CUED). HMMs can be used to model any
time series or dynamic behavior. HTK is primarily designed for building HMM-based
speech processing tools, in particular recognizers. Speech recognition systems generally
assume that the speech signal is a realisation of some message encoded as a sequence
of one or more symbols. To effect the reverse operation of recognising the underlying
symbol sequence given a spoken utterance, the continuous speech waveform is first
converted to a sequence of equally spaced discrete parameter vectors. This sequence
of parameter vectors is assumed to form an exact representation of the speech wave-
form on the basis that for the duration covered by a single vector (typically 10 ms
or so), the speech waveform can be regarded as being stationary. Typical parametric
representations in common use are smoothed spectra or linear prediction coefficients
plus various other representations derived from these.

The role of the recognizer is to effect a mapping between sequences of speech


vectors and the wanted underlying symbol sequences. The following two problems
make this very difficult.

1. The mapping from symbols to speech is not one-to-one since different underlying
symbols can give rise to similar speech sounds. Furthermore, there are large
variations in the realised speech waveform due to speaker variability, mood,
environment, etc.
2. The boundaries between symbols cannot be identified explicitly from the speech
waveform. Hence, it is not possible to treat the speech waveform as a sequence
of concatenated static patterns.

Hence in this discussion it is proposed to restrict the task to isolated word recog-
nition using HMMs. As shown in Fig. 11.6, this implies that the speech waveform
corresponds to a single underlying symbol (e.g. word) chosen from a fixed vocabulary.

11.7 Overview of HTK Toolkit

This section describes the software architecture of a HTK tool and gives a brief outline
of all the HTK tools and the way that they are used together to construct and test

HMM-based recognisers.

Fig. 11.6: Using HMMs for isolated word recognition

Much of the functionality of HTK is built into the library
modules. These modules ensure that every tool interfaces to the outside world in
exactly the same way. They also provide a central resource of commonly used func-
tions. Fig. 11.7 illustrates the software structure of a typical HTK tool and shows its
input/output interfaces.

Fig. 11.7: Software architecture of HTK

User input/output and interaction with the operating system is controlled by the
library module HShell and all memory management is controlled by HMem. Math
support is provided by tool HMath and the signal processing operations needed for
speech analysis are in HSigP. Each of the file types required by HTK has a dedicated
interface module. HLabel provides the interface for label files, HLM for language model
files, HNet for networks and lattices, HDict for dictionaries, HVQ for VQ codebooks
and HModel for HMM definitions. All speech input and output at the waveform level
is via HWave and at the parameterised level via HParm. Direct audio input is
supported by HAudio and simple interactive graphics is provided by HGraf. HUtil
provides a number of utility routines for manipulating HMMs while HTrain and HFB
contain support for the various HTK training tools. HAdapt provides support for
the various adaptation tools. Finally, HRec contains the main recognition processing
functions.

11.8 Generic Properties of a HTK Tool

HTK tools are designed to run with a traditional command-line style interface. Each
tool has a number of required arguments plus optional arguments. The latter are
always prefixed by a minus sign. As an example, the following command would invoke
the mythical HTK tool called HFoo.

HFoo -T 1 -f 34.3 -a -s myfile file1 file2

This tool has two main arguments called file1 and file2 plus four optional argu-
ments. Options are always introduced by a single letter option name followed where
appropriate by the option value. The option value is always separated from the option
name by a space. Thus, the value of the -f option is a real number, the value of the -T
option is an integer number and the value of the -s option is a string. The -a option
has no following value and it is used as a simple flag to enable or disable some feature
of the tool. The -T option is always used to control the trace output of a HTK tool.
In addition to command line arguments, the operation of a tool can be controlled by
parameters stored in a configuration file. For example, if the command

HFoo -C config -f 34.3 -a -s myfile file1 file2

is executed, the tool HFoo will load the parameters stored in the configuration file
config during its initialisation procedures.

11.9 HTK Toolkit

The processing steps involved in building a speech recogniser are: data preparation,
training, testing and analysis.

11.9.1 Data Preparation Tools

In order to build a set of HMMs, a set of speech data files and their associated tran-
scriptions are required. Speech data will be obtained from database archives. Before
it can be used in training, it must be converted into the appropriate parametric form
and any associated transcriptions must be converted to have the correct format and
use the required phone or word labels. If the speech needs to be recorded, then the
tool HSLab can be used both to record the speech and to manually annotate it with
any required transcriptions.

To parameterise the data just once, the tool HCopy is used. By setting the ap-
propriate configuration variables, all input files can be converted to parametric form
as they are read-in. The tool HList can be used to check the contents of any speech
file and since it can also convert input on-the-fly, it can be used to check the results
of any conversions before processing large quantities of data.
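As an illustration, a configuration file for HCopy that converts waveform files to MFCC parameters might look like the following. The parameter names are standard HTK configuration variables; the values shown are only indicative and should be adapted to the data.

# HCopy configuration (illustrative values)
TARGETKIND   = MFCC_0        # parameter kind to produce
TARGETRATE   = 100000.0      # frame shift in 100 ns units (10 ms)
WINDOWSIZE   = 250000.0      # analysis window in 100 ns units (25 ms)
USEHAMMING   = T             # apply a Hamming window
PREEMCOEF    = 0.97          # pre-emphasis coefficient
NUMCEPS      = 12            # number of cepstral coefficients

The tool would then be invoked in the usual command-line style of Section 11.8, e.g. HCopy -C config src.wav tgt.mfc.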

11.9.2 Training Tools

The second step is to define the topology required for each HMM by writing a prototype
definition. HTK allows HMMs to be built with any desired topology. HMM definitions
can be stored externally as simple text files and hence it is possible to edit them with
any convenient text editor. The purpose of the prototype definition is only to specify
the overall characteristics and topology of the HMM. The actual parameters will be
computed later by the training tools.

The actual training process takes place in stages. Firstly, an initial set of models
must be created. If there is some speech data available for which the location of the
sub-word (i.e. phone) boundaries have been marked, then this can be used as bootstrap
data. In this case, the tools HInit and HRest provide isolated word style training using
the fully labeled bootstrap data. Each of the required HMMs is generated individually.
When no bootstrap data is available, a so-called flat start can be used. In this case all
of the phone models are initialised to be identical and have state means and variances
equal to the global speech mean and variance. The tool HCompV can be used for this.

Once an initial set of models has been created, the tool HERest is used to perform
embedded training using the entire training set. HERest performs a single Baum-
Welch re-estimation of the whole set of HMM phone models simultaneously. For each
training utterance, the corresponding phone models are concatenated and then the

forward-backward algorithm is used to accumulate the statistics of state occupation,
means, variances, etc., for each HMM in the sequence. When all of the training data
has been processed, the accumulated statistics are used to compute re-estimates of the
HMM parameters.

11.9.3 Recognition Tools

HTK provides a recognition tool called HVite which uses the token passing algorithm
to perform Viterbi-based speech recognition. HVite takes as input a network describing
the allowable word sequences, a dictionary defining how each word is pronounced and
a set of HMMs. It operates by converting the word network to a phone network and
then attaching the appropriate HMM definition to each phone instance. Recognition
can then be performed on either a list of stored speech files or on direct audio input.

11.9.4 Analysis Tools

Once the HMM-based recogniser has been built, it is necessary to evaluate its perfor-
mance. This is usually done by using it to transcribe some pre-recorded test sentences
and match the recogniser output with the correct reference transcriptions. This com-
parison is performed by a tool called HResults which uses dynamic programming to
align the two transcriptions and then count substitution, deletion and insertion errors.

11.10 Example

This section explains the steps in building a simple HMM based recogniser for hypo-
thetical two-dimensional data stored in two files, namely, input1.dat and input2.dat.
The main steps are as follows:

During training,

• Create input files with feature vectors.


• Convert the files containing feature vectors to HTK format.
• Initialise the HMM parameters using HInit.
• Perform reestimation using HRest.

Table 11.2: Input file with sequence of feature vectors
4 3
4 5
3 4
5 4
8 7
8 9
7 8
9 8
0 0

During testing,

• Create test file with feature vectors.


• Convert the file containing feature vectors to HTK format.
• Create network file, model list, dictionary entries manually.
• Create classNet from network file using HParse.
• Perform recognition using HVite. This is done by matching the observations
against the constructed models.
• The best matched model name and its corresponding likelihood score is obtained
as result.

Assume input file 1 contains the two-dimensional data given in Table. 11.2.

Assume input file 2 contains the two-dimensional data given in Table. 11.3.

If the problem requires two models, initialise the two models with number of states,
number of mixtures, means, variances, and values for the transition matrix. In this
example, a 3-state HMM is assumed and the appropriate prototype initialisation for
model 1 is given below:
~o <VecSize> 2 <USER>
~h "model1"
<BeginHMM>
<NumStates> 3
<State> 2 <NumMixes> 2
<Mixture> 1 0.5
<Mean> 2
0.0 0.0
<Variance> 2
1.0 1.0
<Mixture> 2 0.5
<Mean> 2
0.0 0.0
<Variance> 2
1.0 1.0
<TransP> 3
0.00e+0 0.5e+0 0.5e+0
0.00e+0 0.5e+0 0.5e+0
0.00e+0 0.0e+0 0.0e+0
<EndHMM>

Table 11.3: Input file with sequence of feature vectors
3 2
3 4
2 3
4 3
7 6
7 8
6 7
8 7
0 0

Model 2 must also be initialised on the same lines as described above. The HTK
tool HInit performs the initialisation and HRest performs the reestimation and finally
two models are constructed. Prior to testing, model list, dictionary, and network files
must be created manually. The model list file contains the list of models as given
below:
model1
model2

The dictionary file contains a mapping of the form given below:

model1 model1
model2 model2

The network file contains the following description:

$Model = model1 | model2 ;
( $Model )

Invoke HParse to generate the ClassNet file. Now, assume the system is tested
with the two-dimensional data given in Table. 11.4.

Table 11.4: Testfile with sequence of feature vectors
4 3
4 5
3 4
5 4
8 7
8 9
7 8
9 8
0 0

Testing the recognition system produces the match as model1 with an acoustic
score of -25.717180. Thus HTK tools can be used in the construction and evaluation
of hidden Markov models for various speech related applications.

Chapter 12

Basics of Neural Networks

by
M. ARULSELVI
Lecturer (Senior Scale),
Department of CSE, Annamalai University.

12.1 Introduction

Artificial neural networks (ANN) [29] are commonly referred to as "neural networks".
An ANN is an information processing system that is inspired by the way biological
nervous systems, such as the brain, process information.

The brain is a highly complex, nonlinear and parallel computer (information pro-
cessing system) and it has the capability to organize its structural constituents known
as neurons, so as to perform computations (e.g., Pattern recognition, perception and
motor control) many times faster than the fastest digital computer in existence today.

A neural network is a machine that is designed to model the way in which the
brain performs a particular task or functions; the network is usually implemented
by using electronic components or is simulated in software on a digital computer.
To achieve good performance, neural networks employ a massive interconnection of
simple computing cells referred to as "neurons" or "processing units".

Thus the definition of a neural network is: "A neural network is a massively parallel
distributed processor made up of simple processing units, which has a natural
capability for storing experiential knowledge and making it available for use".

12.1.1 Structure and Learning Process of Human Brain

Neurons are the structural constituents of the brain. The human brain contains about 10
billion nerve cells (neurons). Each neuron connects to approximately 100-10000 other
neurons by transmitting electro-chemical signals. In the human brain, the learning
process is performed by the nervous system [30]. Fig. 12.1 shows the parts of a
biological neuron.

Fig. 12.1: Structure of a biological neuron

• Cell body/Soma : It is the central part of the neuron. The nervous system
processes data through neurons, which have specialised components to input,
output and process the data.
• Dendrites : A dendrite is a root-like extension from the soma. A neuron collects
signals from other neurons through a host of these fine structures known as dendrites.
• Axon/Nerve fibers : A single axon is a tubular extension from the cell
soma that carries an electrical signal away from the soma to another neuron
for processing. When a particular amount of input is received, the cell fires
and transmits a signal through the axon to other cells. The axon is responsible
for data output from the soma.
• Synapse : It is the junction between an axon and a dendrite. The synapse is a
biochemical structure which converts a pre-synaptic electrical signal into a chemical
signal and then back into a post-synaptic electrical signal.

12.1.2 Structure and Learning Process of Artificial Neuron

A neuron with n inputs is shown in Fig. 12.2. The input is x = (x1, x2, ..., xn) and the
weight vector can be written as w = (w1, w2, ..., wn).

Fig. 12.2: Structure of artificial neuron

The net input can be written in terms of the inner product (dot product), net = x·w, which
is the argument of the transfer function f that produces the scalar output y. In
mathematical terms, net is described as

net = Σ (i=1 to n) wi xi        (12.1)

where n is the number of connections to neuron. To generate the final output y, one
of the activation functions given in Fig. 12.3 is used.

Types of Activation function : (a) Linear/Threshold function:


The threshold function is defined as
y = f(net) = 1, if net ≥ 0
           = 0, if net < 0        (12.2)

Fig. 12.3: Activation function

(b) Sigmoid/Squashing function:


The sigmoid function is defined as
y = f(net) = 1 / (1 + e^(−net))

(c) Hyperbolic tangent function:


The tangent function is defined as
y=f(net)= tanh(net)
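The following Matlab sketch (illustrative, with an assumed range of net values) plots the three activation functions for comparison:

net = -5:0.1:5;                        % range of net activation values
yThreshold = double(net >= 0);         % threshold function, Eq. (12.2)
ySigmoid   = 1 ./ (1 + exp(-net));     % sigmoid / squashing function
yTanh      = tanh(net);                % hyperbolic tangent function
plot(net, yThreshold, net, ySigmoid, net, yTanh);
legend('threshold', 'sigmoid', 'tanh');
xlabel('net'); ylabel('y = f(net)');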

12.2 Perceptron

A system that combines multiple weighted inputs to produce a single output is called a
perceptron [29], as shown in Fig. 12.4. The perceptron is the simplest form of neural
network, used for the classification of patterns that are linearly separable.

12.2.1 Perceptron Training Algorithm

1. Apply an input pattern and calculate the output y.


2. (a) If the output is correct go to step 1.
(b) If the output is incorrect and is -1, add each input to its corresponding weight.
(c) If the output is incorrect and is +1, subtract each input from its corresponding weight.

3. Go to step 1.

Fig. 12.4: Architecture of perceptron

Perceptron training algorithm is also known as delta rule given by


δ = (t − y)
where t is target output and y is actual output.

There are three cases that can occur for a single neuron once an input vector x is
presented and the network output is calculated as:
Case 1: δ = 0, corresponds to step 2(a).
Case 2: δ > 0, corresponds to step 2(b).
Case 3: δ < 0, corresponds to step 2(c).

In the algorithm, δ is multiplied by the value of each input xi and the product is added to
the corresponding weight. To generalize, a learning rate coefficient η multiplies δxi.
Symbolically,
wi (n + 1) = wi (n) + ηδxi , 0 < η < 1 (12.3)

where wi (n+1) is the value of weight after adjustment and wi (n) is the value of
weight before adjustment.
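To make the weight update explicit, the following Matlab sketch trains a perceptron on the OR gate directly with the delta rule, without the Neural Network Toolbox; the learning rate and number of epochs are illustrative choices.

x = [0 0 1 1; 0 1 0 1];                    % inputs, one pattern per column
t = [0 1 1 1];                             % OR-gate targets
w = zeros(2, 1);  b = 0;  eta = 0.5;       % weights, bias, learning rate
for epoch = 1:20
    for p = 1:size(x, 2)
        y = double(w' * x(:, p) + b >= 0); % threshold activation
        delta = t(p) - y;                  % delta = (t - y)
        w = w + eta * delta * x(:, p);     % Eq. (12.3)
        b = b + eta * delta;
    end
end
y = double(w' * x + b >= 0)                % should reproduce the OR targets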

12.2.2 Matlab Code for Perceptron

Function for Training the Perceptron


function [net] = trainper(x, y)
net = newp(minmax(x), 1, 'hardlim', 'learnp');
net.trainParam.epochs = 50;
net = train(net, x, y);
z = sim(net, x);
return;

Function for Testing the Perceptron


function [v] = testper(x, net)
v = sim(net, x);
return;

Let us consider an OR gate. The input of OR gate is defined in the file ”in.dat”.
The output of OR gate is in the file ”out.dat”. The command to execute the function
is explained as:
INPUT ("in.dat")
0 0 1 1
0 1 0 1

OUTPUT ("out.dat")
0 1 1 1

Matlab commands for executing the functions:
>> x = load('in.dat');
>> y = load('out.dat');
Calling the function to train the network:
>> [net] = trainper(x, y)
Calling the function to test the network:
>> [v] = testper(x, net);

12.3 Backpropagation Neural Network

Backpropagation neural network (BPNN) [29] is a multi layer feedforward neural net-
work (i.e., propagating the error backward to adjust the weights). The basic idea is to
efficiently compute partial derivatives of an approximating function f(W,X) realized
by the network with respect to all the elements of the adjustable weights vector W for
a given value of input vector X. Fig. 12.5 shows the architecture of backpropagation
neural network.

Fig. 12.5: Architecture of backpropagation neural network

12.3.1 Backpropagation Training Algorithm

1. Select the next training pair from the training set and apply to the network.
2. Calculate the output of the network.
3. Calculate the error between the output of the network and the desired output.
4. Adjust the weights (V, W matrices) in such a way that the error is minimized.
5. Repeat steps 1 to 4 for all the training pairs.
6. Repeat steps 1 to 5 until the network recognizes the training set or for a certain
number of iterations called epochs (an epoch is one presentation of all training pairs to the network).

The activation function used by BPNN training algorithm is sigmoid or squashing


or logistic function, and it is defined as

OUT = 1/(1 + e^(−NET))        (12.4)

BPNN training algorithm uses the derivative of activation function and defined as
∂OUT/∂NET = OUT (1 − OUT)        (12.5)
BPNN training algorithm consists of two passes: (i)Forward pass and (ii)Reverse pass.
(i)Forward pass:
In this pass, output of the network is calculated as,

NET1j = x1 w11 + x2 w21 + ... + xn wn1 (12.6)


OUT1j = 1/(1 + e^(−NET1j))        (12.7)
NET1k = OUT11 v11 + OUT21 v21 + ... + OUTn1 vn1        (12.8)
OUT1k = 1/(1 + e^(−NET1k))        (12.9)

This is repeated for all the neurons.


(ii)Reverse pass:
This pass consist of two parts. They are
(a) Adjusting the weights of the output layer:
To adjust the weights of the output layer generalized delta rule is used.

δqk = OUTqk (1 − OUTqk )(T arget − OUTqk ) (12.10)

where δqk is error for neuron q in the output layer k,


OUTqk is output of neuron q in the output layer k and
Target is required output.

The new weight of the V matrix is calculated as

Vpq (n + 1) = Vpq (n) + ηδqk OUTpj (12.11)

where Vpq (n+1) is new weight,


Vpq (n) is old weight and
η is learning or training rate coefficient.

(b)Adjusting the weights of the hidden layer:


δpj = OUTpj (1 − OUTpj) Σ (q=1 to n) δqk Vpq        (12.12)

Where δpj is error for neuron p in the hidden layer j,


OUTpj is output of neuron p in the hidden layer j,
δqk is error neuron q in the output layer k and
Vpq is weight from neuron p in the hidden layer to neuron q in the output layer.
The new weight of the W matrix is calculated as

Wmp (n + 1) = Wmp (n) + ηδpj xm (12.13)

Where Wmp (n+1) is new weight,


Wmp (n) is old weight and
η is learning or training rate coefficient.
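The following Matlab sketch performs one forward and one reverse pass for a single training pair, following Eqs. (12.6)-(12.13); the network size, initial weights, input and learning rate are all illustrative values.

x = [0; 1];                    % input pattern
target = 1;                    % desired output
W = [0.1 0.4; -0.2 0.2];       % input -> hidden weights (2 inputs, 2 hidden units)
V = [0.3; -0.3];               % hidden -> output weights (2 hidden units, 1 output)
eta = 0.2;                     % learning rate

% Forward pass
NETj = W' * x;                                  % Eq. (12.6) for each hidden unit
OUTj = 1 ./ (1 + exp(-NETj));                   % Eq. (12.7)
NETk = V' * OUTj;                               % Eq. (12.8)
OUTk = 1 / (1 + exp(-NETk));                    % Eq. (12.9)

% Reverse pass
deltak = OUTk * (1 - OUTk) * (target - OUTk);   % Eq. (12.10)
deltaj = OUTj .* (1 - OUTj) .* (V * deltak);    % Eq. (12.12)
V = V + eta * deltak * OUTj;                    % Eq. (12.11)
W = W + eta * x * deltaj';                      % Eq. (12.13)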

12.3.2 Matlab Code for BPNN

Function for Training the BPNN


function [net] = trainBPN(x, y)
[n, i] = size(x);
[m, o] = size(y);
net = newff(minmax(x), [i, 10, m], {'tansig', 'tansig', 'purelin'}, 'trainlm');
net.trainParam.epochs = 50;
net.trainParam.lr = 0.2;
net = train(net, x, y);
r = sim(net, x);
return;

Function for Testing the BPNN

function [v] = testBPN(x, net)
v = sim(net, x);
return;

Let us consider an EX-OR gate. The input of the EX-OR gate is defined in the file
'in.dat'. The output of the EX-OR gate is defined in the file 'out1.dat'.
INPUT ("in.dat")
0 0 1 1
0 1 0 1

OUTPUT ("out1.dat")
0 1 1 0

Matlab commands for executing the function:
>> x = load('in.dat');
>> y = load('out1.dat');
Calling the function to train the network:
>> [net] = trainBPN(x, y);
Calling the function to test the network:
>> [v] = testBPN(x, net);

12.4 Probabilistic Neural Network

The probabilistic neural network (PNN) [16] is a feed-forward neural network that implements
a Bayesian decision strategy for classifying input vectors. The Parzen window
method (a non-parametric technique for density estimation, used to approximate the
probability density p(x) of a continuous random variable x) can be implemented as a
neural network known as a PNN. Suppose we wish to form a Parzen estimate based on
n patterns, each of which is d-dimensional and randomly selected from c classes.
Fig. 12.6 shows the architecture of the PNN.

The PNN consists of d input units comprising the input layer, where each unit is connected
to each of the pattern units. The connections from the input units to the pattern units
represent the modifiable weights which will be trained. Each category unit computes
the sum of the pattern units connected to it, and each pattern unit is in turn connected
to one and only one of the category units.

Fig. 12.6: Architecture of probabilistic neural network

12.4.1 Algorithm for PNN

Probabilistic neural network training and classification algorithms are given below:
(a) PNN Training Algorithm:
begin initialize j ← 0, n, aji ← 0 for j = 1,...,n, i = 1,...,c
    do j ← j + 1
        xjk ← xjk / (Σ (i=1 to d) xji²)^(1/2)
        wjk ← xjk
        if x ∈ ωi then aji ← 1
    until j = n
end

where wjk is the weight from the j-th input to the k-th unit in the pattern layer,
aji is the weight from the j-th unit in the pattern layer to the i-th unit in the category layer,
n is the number of patterns, c is the number of classes and
d is the dimension of the input.

(b)PNN Classification Algorithm:


A normalized test pattern x is placed at the input units. Each pattern unit k computes
the inner product to yield the net activation or simply net as
netk = wkᵗ x
Each output unit sums the contributions from all pattern units connected to it. The
contribution of k th pattern unit to the ith category (output) unit is given by

aki = e^((netk − 1)/σ²)        (12.14)

where σ is the parameter set by the user.

The algorithm is given below:


begin initialize k ← 0, x ← test pattern
    do k ← k + 1
        netk ← wkᵗ x
        if aki = 1 then gi ← gi + exp[(netk − 1)/σ²]
    until k = n
    return class ← arg maxi gi(x)
end

12.4.2 Matlab Code for 2-class Classification using PNN

%This Matlab code solves EX-OR problem


%For Pattern A data
Pattern A=[-1 -1;
1 1;]
%For Pattern B data
Pattern B=[1 -1;
-1 1;]
% N is = No. of rows and columns of PatternA or PatternB
N=size(PatternA);
for i=1:N(1) %N(1)= No. of rows of PatternA or PatternB

sumA=0;
sumB=0;
for j=1:N(2) %N(2)= No. of columns of PatternA or PatternB
sumA=sumA+PatternA(i,j)*PatternA(i,j);
sumB=sumB+PatternB(i,j)*PatternB(i,j);
end
sumA1(i)=sumA;
sumB1(i)=sumB;
end
for i=1:N(1)
for j=1:N(2)
NormalizedA(i,j)=PatternA(i,j)/sqrt(sumA1(i));
NormalizedB(i,j)=PatternB(i,j)/sqrt(sumB1(i));
end
end

WeightsA=NormalizedA
WeightsB=NormalizedB

Testdata=[-1 -1;
1 1;
1 -1;
-1 1;]
N1=size(Testdata);
for i=1:N1(1)
sumTestdata=0;
for j=1:N1(2)
sumTestdata=sumTestdata+Testdata(i,j)*Testdata(i,j);
end
sumTestdata1(i)=sumTestdata;
end
for i=1:N1(1)
for j=1:N1(2)
NormalizedTestdata(i,j)=Testdata(i,j)/sqrt(sumTestdata1(i));
end
end
PatternAnet=NormalizedTestdata*WeightsA’;
PatternBnet=NormalizedTestdata*WeightsB’;
sigma=0.45;
PatternAout=exp((PatternAnet-1)/(sigma.*sigma));
PatternBout=exp((PatternBnet-1)/(sigma.*sigma));
for i=1:N1(1)
SumAout(i)=sum(PatternAout(i,:));
SumBout(i)=sum(PatternBout(i,:));
end

output=sign(SumAout-SumBout)
fid=fopen(’PNNOUTPUT.TXT’,’w’);

fprintf(fid,’----------------------------------------------\n’);
fprintf(fid,’CLASSIFICATION RESULTS ARE GIVEN BELOW \n’);
fprintf(fid,’----------------------------------------------\n’);
fprintf(fid,’Test Data No. X Value Y Value Class Value\n’);
fprintf(fid,’----------------------------------------------\n’);
fprintf(’--------------------------------------------------\n’);
fprintf(’CLASSIFICATION RESULTS ARE GIVEN BELOW \n’);
fprintf(’--------------------------------------------------\n’);
fprintf(’ Test Data No. X Value Y Value Class Value\n’);
fprintf(’--------------------------------------------------\n’);
for i=1:N1(1)
if(output(i)==1)
outclass='A';
elseif (output(i)==-1)
outclass='B';
end
fprintf(fid,'%10d %+5.5f %+5.5f %c \n',i,Testdata(i,1),Testdata(i,2),outclass);
fprintf(fid,'---------------------------------------------------\n');
fprintf('%10d %+5.5f %+5.5f %c \n',i,Testdata(i,1),Testdata(i,2),outclass);
fprintf('---------------------------------------------------\n');
end;
fclose(fid);

The output of the above program is defined as:

PatternA =
-1 -1
1 1

PatternB =
1 -1
-1 1

WeightsA =
-0.7071 -0.7071
0.7071 0.7071

WeightsB =
0.7071 -0.7071

-0.7071 0.7071

Testdata =
-1 -1
1 1
1 -1
-1 1

output =
1 1 -1 -1

---------------------------------------------------
CLASSIFICATION RESULTS ARE GIVEN BELOW
---------------------------------------------------
Test Data No. X Value Y value Class Value
---------------------------------------------------
1 -1.00000 -1.00000 A
---------------------------------------------------
2 +1.00000 +1.00000 A
---------------------------------------------------
3 +1.00000 -1.00000 B
---------------------------------------------------
4 -1.00000 +1.00000 B
---------------------------------------------------

Chapter 13

Autoassociative Neural Network Model

by
Dr. S. PALANIVEL
Reader, Department of CSE,
Annamalai University.

13.1 Introduction

Neural network models such as backpropagation neural network (BPNN) and radial
basis function neural network (RBFNN) are used for pattern classification because of
their ability to capture the nonlinear hyperspace separating the classes in the feature
space. A special kind of backpropagation neural network called autoassociative neural
network (AANN) can be used to capture the distribution of feature vectors in the
feature space.

13.1.1 Characteristics of Autoassociative Neural Network Models

Autoassociative neural network models are feedforward neural networks performing an


identity mapping of the input space, and are used to capture the distribution of the
input data [1], [31]. The distribution capturing ability of the AANN model is described
in this section. Let us consider the five layer AANN model shown in Fig. 13.1, which
has three hidden layers.

In this network, the second and fourth layers have more units than the input layer.
The third layer has fewer units than the first or fifth. The processing units in the first
and third hidden layer are nonlinear, and the units in the second compression/hidden
layer can be linear or nonlinear. As the error between the actual and the desired
output vectors is minimized, the cluster of points in the input space determines the
shape of the hypersurface obtained by the projection onto the lower dimensional space.
Fig. 13.2(b) shows the space spanned by the one dimensional compression layer for the

Fig. 13.1: A five layer AANN model (input layer, three hidden layers with the compression layer in the middle, and output layer).

2 dimensional data shown in Fig. 13.2(a) for the network structure 2L 10N 1N 10N 2L,
where L denotes a linear unit and N denotes a nonlinear unit. The integer value
indicates the number of units used in that layer. The nonlinear output function for
each unit is tanh(s), where s is the activation value of the unit. The network
is trained using backpropagation training algorithm [32], [17]. The solid lines shown
in Fig. 13.2(b) indicate mapping of the given input points due to the one dimensional
compression layer. Thus, one can say that the AANN captures the distribution of the
input data depending on the constraints imposed by the structure of the network, just
as the number of mixtures and Gaussian functions do in the case of Gaussian mixture
models (GMM).

In order to visualize the distribution better, one can plot the error for each input
data point in the form of some probability surface as shown in Fig. 13.2(c). The error
ei for the data point i in the input space is plotted as pi = exp(−ei /α) , where α is a
constant. Note that pi is not strictly a probability density function, but we call the
resulting surface as probability surface. The plot of the probability surface shows a
large amplitude for smaller error ei , indicating better match of the network for that
data point. The constraints imposed by the network can be seen by the shape the
error surface takes in both the cases. One can use the probability surface to study the
characteristics of the distribution of the input data captured by the network. Ideally,
one would like to achieve the best probability surface, best defined in terms of some
measure corresponding to a low average error.

Fig. 13.2: Distribution capturing ability of AANN model. From [1]. (a) Artificial 2 dimensional data. (b) 2 dimensional output of AANN model with the structure 2L 10N 1N 10N 2L. (c) Probability surfaces realized by the network structure 2L 10N 1N 10N 2L.

13.1.2 Matlab Implementation of Autoassociative Neural Network

The autoassociative neural network is a special case of backpropagation neural net-


work which captures the distribution of input feature vectors. In an AANN, the required
(desired) output is the same as the input. The Matlab code for the backpropagation neural
network (BPNN) can therefore be used to capture the distribution of input feature vectors
by setting the required output equal to the input feature vector, as sketched below.
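A minimal Matlab sketch of this idea is given below; the function name, layer sizes (here a dL 38N 4N 38N dL structure) and number of epochs are illustrative, and the desired output passed to train is simply the input itself.

function [net, e] = trainAANN(x)
% x : d x n matrix of feature vectors, one vector per column.
d = size(x, 1);
net = newff(minmax(x), [38 4 38 d], ...
            {'tansig', 'tansig', 'tansig', 'purelin'}, 'trainlm');
net.trainParam.epochs = 200;
net = train(net, x, x);                  % desired output = input (identity mapping)
o = sim(net, x);                         % reconstructed feature vectors
e = sum((x - o).^2) ./ sum(x.^2);        % normalized squared error per vector
return;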

13.1.3 Applications of Autoassociative Neural Network

13.1.3.1 Face Recognition

Automatic face recognition by machine can be categorized into face identification and
face authentication. The objective of a face identification system is to determine the
identity of a test subject (person) from the set of reference subjects. On the other hand,
a person authentication or verification system should accept or reject the identity claim
of a subject.

The distribution capturing ability of the AANN is analyzed for face recognition
in the laboratory environment for 50 subjects using a camera with a resolution of

Fig. 13.3: Real time facial feature extraction for varying size, orientation and background.

160 × 120 [33]. For enrolling a subject, the facial features are extracted from 300
face images with variations in size, orientation and pose of the face. The distribution
of the facial feature vectors is captured using an AANN model with the structure
73L 90N 30N 90N 73L, and the network is trained for 200 epochs. A separate AANN
is trained for each subject. Fig. 13.3 shows the real time facial feature extraction for
varying size, orientation and background.

For identification, ten feature vectors are extracted from ten consecutive frames
in the video. The feature vector is given as input to each of the model. The output
of the model is compared with the input to compute the normalized squared error.
The normalized squared error (e) for the feature vector y is given by e = ‖y − o‖² / ‖y‖²,
where o is the output vector given by the model. The error (e) is transformed into
a confidence score (c) using c = exp(−e). The average confidence score is calculated
for each model. The identity of the subject is decided based on the highest confidence
score. The identification performance is measured in terms of recognition rate. The
confidence scores from the models can be used to find the similarity of a subject with
other subjects.

For authentication, the feature vector is given as input to the claimant model,
and the confidence score is calculated. The claim is accepted if the confidence score is
greater than a threshold, otherwise the claim is rejected.
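A minimal Matlab sketch of this scoring and decision step is given below; Y, net and threshold are illustrative names, where Y holds the ten test feature vectors as columns and net is the claimant's trained AANN model.

O = sim(net, Y);                         % AANN outputs for the ten test vectors
e = sum((Y - O).^2) ./ sum(Y.^2);        % normalized squared error per vector
c = exp(-e);                             % confidence score per vector
if mean(c) > threshold                   % average confidence score vs. threshold
    decision = 'accept';
else
    decision = 'reject';
end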

Fig. 13.4 shows the snapshot of the real time person recognition system. The person
recognition system detects the face, determines the locations of the eyes, extracts the
facial features and calculates the confidence score in real time at about 9 frames/s on
a PC with 2.3 GHz CPU. The performance of the person recognition system must be
invariant to size of the face, background, orientation and pose of the face, and lighting

conditions, in order to use it for commercial applications. This method is not sensitive
to the size of the face, its position in the image and its background, and orientation of
the face. It is also not sensitive to the pose of the face as long as the eye regions are
visible. The method is less sensitive to variation in the image brightness. However,
the method is sensitive to shadows, variation in lighting conditions and profile view of
the face.

Fig. 13.4: Snapshot of the real time person recognition system

The structure of AANN model plays an important role in capturing the distri-
bution of the feature vectors. The number of units in the third layer (compression
layer) determines the number of components (similar to principal components in eigen
analysis) captured by the network. The AANN model projects the input vectors onto
the subspace spanned by the number of units in the compression layer. If there are nc
units in the compression layer, then the facial feature vectors are projected onto the

subspace spanned by nc components to realize them at the output layer. Similarly,
the performance can be obtained by varying the number of units in the second layer
(expansion layer) keeping the number of units in the compression layer to 30.

13.1.3.2 Speaker Authentication

Speaker recognition can be categorized into speaker identification and speaker verifi-
cation or authentication. The objective of a speaker identification system is to decide
the identity of a speaker based on the speaker’s voice, from a set of n speakers. On
the other hand, a speaker verification or authentication system should accept or reject
the identity claim of a speaker using the speaker’s voice. Speaker identification is a
one-to-many matching, whereas speaker verification is a one-to-one matching.

The five layer autoassociative neural network model as shown in Fig. 13.1 can
be used to capture the distribution of the acoustic feature vectors as explained in Sec-
tion 13.1.1. The AANN model with the structure 19L 38N 4N 38N 19L is used for
capturing the distribution of the acoustic features of a subject [33]. For testing the
identity claim, each acoustic feature vector extracted from the test utterance is given
as input to the claimant speaker model. The output of the model is compared with
its input to compute the normalized error. The normalized error (ei) for the i-th feature
vector (yi) is given by ei = ‖yi − oi‖² / ‖yi‖², where oi is the output vector given by the
model. The error (ei) is transformed into a confidence score using ci = exp(−ei). If
the average confidence score c = (1/nf) Σ (i=1 to nf) ci is greater than a threshold, then
the claim is accepted, otherwise the claim is rejected, where nf is the number of acoustic feature
vectors in the test utterance.

The performance of speaker verification system is evaluated for the television


broadcast audio news data. The speech signal is recorded for 60 sec at 8000 sam-
ples per second. For enrolling a speaker, the differenced speech signal is analyzed by
dividing it into frames of 20 msec, with a shift of 10 msec. A 14th order LP analysis
is used to capture the properties of the signal spectrum. The recursive relation be-
tween the predictor coefficients and cepstral coefficients is used to convert the 14 LP
coefficients into 19 cepstral coefficients. The cepstral coefficients for each frame are linearly
weighted to form the weighted LP cepstral coefficients (WLPCC). The distribution of the 19 dimensional WLPCC fea-
ture vectors in the feature space is captured using an AANN model. Separate AANN
models are used to capture the distribution of feature vectors of each speaker.

For testing the identity claim of a subject, the speech signal is recorded for 10 sec,

one month after collecting the training data. The feature vectors extracted from the
test utterance are given as input to the model and the average confidence score (c)
is calculated. If the confidence score is greater than a threshold, then the claim is
accepted, otherwise the claim is rejected.

Chapter 14

Bayesian Belief Networks, Classification


and Regression Tree

by
G. ARULSELVI
Lecturer (Senior Scale),
Department of CSE, Annamalai University.

14.1 Bayesian Belief Networks

14.1.1 Introduction

Bayesian belief networks (BBN) are powerful tools for modeling causes and effects in a
wide variety of domains. Bayesian belief networks are also known as ”belief networks”,
”causal probabilistic networks”, ”causal nets”, and ”graphical probability networks”.
They are compact networks of probabilities that capture the probabilistic relationship
between variables, as well as historical information about their relationships.

Bayesian belief networks are very effective for modeling situations where some
information is already known and incoming data is uncertain or partially unavailable
(unlike rule-based or expert systems, where uncertain or unavailable data results in
ineffective or inaccurate reasoning).

These networks also offer consistent semantics for representing causes and effects
(and likelihoods) via an intuitive graphical representation. Because of all of these
capabilities, Bayesian belief networks are being increasingly used in a wide variety of
domains where automated reasoning is needed.

An important fact to realize about Bayesian belief networks is that they are not de-
pendent on knowing exact historical information or current evidence. That is, Bayesian

belief networks often produce very convincing results when the historical information
in the conditional probability tables or the evidence known is inexact. Given that hu-
mans are excellent at vague linguistic representations of knowledge (for example, it will
probably rain tomorrow), and less adept at providing specific estimates, the ability
to be effective despite the unexpected input information is particularly advantageous.
This robustness in the face of imperfect knowledge is one of the many reasons why
Bayesian belief nets are increasingly used as an alternative to other AI representational
formalisms.

In simpler terms, a Bayesian belief network is a model. It can be a model of


anything: the weather, a disease and its symptoms, a military battalion, even a garbage
disposal.

Belief networks are especially useful when the information about the past and/or
the current situation is vague, incomplete, conflicting, or uncertain. They are also used for
modeling knowledge in gene regulatory networks, medicine, engineering, text analysis,
image processing, data fusion, and decision support systems.

14.1.2 Why Bayesian Networks?

First, Bayesian networks handle incomplete data sets without difficulty because they
discover dependencies among all variables [34]. When one of the inputs is not observed,
most models will end up with an inaccurate prediction. That is because they do not
calculate the correlation between the input variables. Bayesian networks suggest a
natural way to encode these dependencies.

Second, one can learn about causal relationships by using Bayesian networks. In
the presence of intervention, one can make predictions with the knowledge of causal
relationships.

Third, considering the Bayesian statistical techniques, Bayesian networks facili-


tate the combination of domain knowledge and data. Prior or domain knowledge is
crucially important if one performs a real-world analysis; in particular, when data
is inadequate or expensive. Additionally, Bayesian networks encode the strength of
causal relationships with probabilities. Therefore, prior knowledge and data can be
put together with well-studied techniques from Bayesian statistics.

14.1.3 Representation of BBN

Causal relations also have a quantitative side, namely their strength. This is expressed
by attaching numbers to the links. Let the variable A be a parent of the variable B in
a causal network. Using probability calculus, it will be normal to let the conditional
probability, P (B|A) be the strength of the link between these variables. On the other
hand, if the variable C is also a parent of the variable B, then conditional probabilities
P (B|A) and P (B|C) do not provide any information about the impacts of the inter-
action of the variables A and C [34]. They may cooperate or counteract in
various ways. Therefore, the specification of P (B|A, C) is required.

For causal networks no calculus coping with feedback cycles has been developed.
Therefore, it is necessary for the network not to contain cycles. Thus, a Bayesian
network consists of the following elements:

• A set of variables and a set of directed edges between variables,


• Each variable has a finite set of mutually exclusive states,
• The variables coupled with the directed edges construct a directed acyclic graph
(DAG),
• Each variable A with parents B1 , B2 , · · · , Bn has a conditional probability table
P (A|B1 , B2 , · · · , Bn ) associated with it.

If the variable A does not have any parent, then the table can be replaced by
the unconditional probabilities P(A). A graph is acyclic if there is no directed path
A1 → . . . → An such that A1 = An . For the directed acyclic graph in Fig. 14.1 the
prior probabilities P(A) and P(B) have to be specified. P(A) and P(B) are called
simple probability [34].

14.1.4 Bayes Rule, Beliefs and Evidence

Bayesian belief networks are based on Bayes rule. Bayes rule can be expressed as:

P (A|B)P (B)
P (B|A) = (14.1)
P (A)

where P(A) is the probability of A, and P (A|B) is the probability of A given that
B has occurred.
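As a simple numerical illustration (the numbers are made up for this example), suppose P(Rain) = 0.3, P(WetGrass|Rain) = 0.9 and P(WetGrass) = 0.45. Then

P(Rain|WetGrass) = P(WetGrass|Rain) P(Rain) / P(WetGrass) = (0.9 × 0.3) / 0.45 = 0.6,

i.e., observing wet grass raises the belief in rain from 0.3 to 0.6.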

Fig. 14.1: Simple belief net

Beliefs are the probability that a variable will be in a certain state based on the
addition of evidence in a current situation. A-priori beliefs are a special case of beliefs
that are based only on prior information. A-priori beliefs are determined only by the
information stored in the conditional probability tables CPT. Evidence is information
about a current situation.

14.1.5 Examples for BBN

Fig. 14.2: BBN for weather condition

Example 1: A Bayesian belief network is a model that represents the possible
states of a given domain. A Bayesian belief network also contains probabilistic
relationships among some of the states of the domain [35]. In the simple model shown in
Fig. 14.2, the sky is either sunny or cloudy. Whether it is raining or not depends on
cloudiness. The grass can be wet or dry, and the sprinkler can be on or off. There
is also some causality: if the weather is rainy, it will make the grass wet directly.
However, sunny weather can also make the grass wet indirectly, by causing a home
owner to turn on the sprinkler.

When probabilities that represent real-world weather and sprinkler usage are entered
into this Bayesian belief network, it can be used to answer questions
like the following: If the lawn is wet, was it more likely to be caused by rain or by the
sprinkler? How likely is it that I will have to water my lawn on a cloudy day?

The probability of any node in the Bayesian belief network being in one state or
another without current evidence is described using a conditional probability table, or
CPT. Probabilities at some nodes are affected by the states of other nodes, depending
on causality. Prior information about the relationships among nodes may indicate that
the likelihood that a node is in one state depends on another node's state.
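
The Matlab sketch below shows how such queries can be answered by brute-force enumeration of the joint distribution defined by the network of Fig. 14.2. The CPT values are assumed purely for illustration and are not taken from the text.

% States: 1 = true, 2 = false. All CPT values below are assumed.
pC   = [0.5 0.5];               % P(Cloudy)
pS   = [0.1 0.9; 0.5 0.5];      % pS(c,s)   = P(Sprinkler = s | Cloudy = c)
pR   = [0.8 0.2; 0.2 0.8];      % pR(c,r)   = P(Rain = r | Cloudy = c)
pWet = [0.99 0.9; 0.9 0.0];     % pWet(s,r) = P(WetGrass = true | Sprinkler = s, Rain = r)
p_wet = 0; p_rain_wet = 0; p_sprk_wet = 0;
for c = 1:2
    for s = 1:2
        for r = 1:2
            % Joint probability of this configuration together with WetGrass = true.
            p = pC(c) * pS(c, s) * pR(c, r) * pWet(s, r);
            p_wet = p_wet + p;
            if r == 1, p_rain_wet = p_rain_wet + p; end
            if s == 1, p_sprk_wet = p_sprk_wet + p; end
        end
    end
end
fprintf('P(Rain | WetGrass)      = %f\n', p_rain_wet / p_wet);
fprintf('P(Sprinkler | WetGrass) = %f\n', p_sprk_wet / p_wet);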

Using Bayes rule we can determine the probability of any configuration of
variables in the joint distribution, which is explained by the following two cases:
Case 1: If the prior probabilities and conditional probabilities are known, we can determine
any entry in the joint probability distribution [16]. Let us consider the problem of
classifying fish shown in Fig. 14.3.

We can determine the value of any entry in the joint probability, for instance the
probability that a fish was caught in the summer in the north Atlantic and is a sea bass
that is dark and thin:

P (a3, b1, x2, c3, d2) = P (a3)P (b1)P (x2|a3, b1)P (c3|x2)P (d2|x2) (14.2)

Fig. 14.3: BBN for fish

Matlab code for finding the joint probability for the fish network:

function [r] = bbn()
aa = load('a.dat');
bb = load('b.dat');
xx = load('x.dat');
cc = load('c.dat');
dd = load('d.dat');
s = input('Enter the Season \n 1-Winter, 2-Spring, 3-Summer, 4-Autumn : ');
if s == 1
    r1 = aa(1)
elseif s == 2
    r1 = aa(2)
elseif s == 3
    r1 = aa(3)
elseif s == 4
    r1 = aa(4)
end
l = input('\n Enter the Location \n 1-North atlantic, 2-South atlantic : ');
if l == 1
    r2 = bb(1)
elseif l == 2
    r2 = bb(2)
end
f = input('\n Enter the Fish type \n 1-Salmon, 2-Sea bass : ');
if (s==1) && (l==1) && (f==1)
    r3 = xx(1,1)
elseif (s==1) && (l==2) && (f==1)
    r3 = xx(1,2)
elseif (s==2) && (l==1) && (f==1)
    r3 = xx(1,3)
elseif (s==2) && (l==2) && (f==1)
    r3 = xx(1,4)
elseif (s==3) && (l==1) && (f==1)
    r3 = xx(1,5)
elseif (s==3) && (l==2) && (f==1)
    r3 = xx(1,6)
elseif (s==4) && (l==1) && (f==1)
    r3 = xx(1,7)
elseif (s==4) && (l==2) && (f==1)
    r3 = xx(1,8)
elseif (s==1) && (l==1) && (f==2)
    r3 = xx(2,1)
elseif (s==1) && (l==2) && (f==2)
    r3 = xx(2,2)
elseif (s==2) && (l==1) && (f==2)
    r3 = xx(2,3)
elseif (s==2) && (l==2) && (f==2)
    r3 = xx(2,4)
elseif (s==3) && (l==1) && (f==2)
    r3 = xx(2,5)
elseif (s==3) && (l==2) && (f==2)
    r3 = xx(2,6)
elseif (s==4) && (l==1) && (f==2)
    r3 = xx(2,7)
elseif (s==4) && (l==2) && (f==2)
    r3 = xx(2,8)
end
li = input('Enter the Lightness \n 1-Light, 2-Medium, 3-Dark : ');
if (f==1) && (li==1)
    r4 = cc(1,1)
elseif (f==2) && (li==1)
    r4 = cc(1,2)
elseif (f==1) && (li==2)
    r4 = cc(2,1)
elseif (f==2) && (li==2)
    r4 = cc(2,2)
elseif (f==1) && (li==3)
    r4 = cc(3,1)
elseif (f==2) && (li==3)
    r4 = cc(3,2)
end
t = input('Enter the Thickness of Fish \n 1-Wide, 2-Thin : ');
if (f==1) && (t==1)
    r5 = dd(1,1)
elseif (f==2) && (t==1)
    r5 = dd(1,2)
elseif (f==1) && (t==2)
    r5 = dd(2,1)
elseif (f==2) && (t==2)
    r5 = dd(2,2)
end
r = r1*r2*r3*r4*r5;
disp('The probability of the given conditions')
disp(r)
return

The conditional probability tables shown in Fig. 14.3 are defined in the file a.dat
for season, b.dat for location, x.dat for fish type, c.dat for lightness and d.dat for
thickness.
Inputs:
"a.dat":
0.25 0.25 0.25 0.25
"b.dat":
0.6 0.4
"x.dat":
0.5 0.7 0.6 0.8 0.4 0.1 0.2 0.3
0.5 0.3 0.4 0.2 0.6 0.9 0.8 0.7
"c.dat":
0.6 0.2
0.2 0.3
0.2 0.5
"d.dat":
0.3 0.6
0.7 0.4

Matlab command for executing the function:
>> [r] = bbn()

Case 2: A BBN can also be used to determine the probability of any of its variables [16].
Suppose we wish to determine the probability distribution over the values d1 , d2 , . . .
of the variable at D in the left network of Fig. 14.4, using the conditional probability tables
and the network topology. We evaluate this by summing the full joint distribution, P(a, b, c,
d), over all the variables other than d:

P (d) = Σ_{a,b,c} P (a, b, c, d)
      = Σ_{a,b,c} P (a)P (b|a)P (c|b)P (d|c)
      = Σ_c P (d|c) Σ_b P (c|b) Σ_a P (b|a)P (a)                  (14.3)

where the innermost sum yields P (b), the next sum yields P (c), and the outermost sum
yields P (d).

If we want to find out the probability of a particular value of D, for instance d2 ,
then

Fig. 14.4: BBN - linear and loop structure

P (d2) = Σ_{a,b,c} P (a, b, c, d2)                  (14.4)

Now we compute the probabilities of the variables at H in the network with the
loop on the right of Fig. 14.4.

P (h) = Σ_{e,f,g} P (e, f, g, h)                  (14.5)
      = Σ_{e,f,g} P (e)P (f |e)P (g|e)P (h|f, g)                  (14.6)
      = Σ_{f,g} P (h|f, g) Σ_e P (e)P (f |e)P (g|e)                  (14.7)
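
A minimal Matlab sketch of Eq. (14.3) for the linear network of Fig. 14.4, assuming all four variables are binary; the CPT values are made up for illustration. Each vector-matrix product carries out one of the sums and yields the marginal of the next variable in the chain.

% Assumed CPTs for the chain a -> b -> c -> d (all variables binary).
Pa   = [0.6 0.4];               % P(a)
Pb_a = [0.7 0.3; 0.2 0.8];      % Pb_a(a,b) = P(b|a)
Pc_b = [0.5 0.5; 0.1 0.9];      % Pc_b(b,c) = P(c|b)
Pd_c = [0.9 0.1; 0.3 0.7];      % Pd_c(c,d) = P(d|c)
Pb = Pa * Pb_a;                 % P(b) = sum over a of P(b|a)P(a)
Pc = Pb * Pc_b;                 % P(c) = sum over b of P(c|b)P(b)
Pd = Pc * Pd_c                  % P(d) = sum over c of P(d|c)P(c)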

14.2 Classification and Regression Tree

14.2.1 Introduction

Automating the learning process is one of the main goals of artificial intelligence and its
more recent specialization, machine learning. The ability to learn from examples has
found numerous applications in the scientific and business communities, including
scientific experiments, medical diagnosis, fraud detection, credit approval, and target
marketing, since it allows the identification of interesting patterns or connections
either in the examples provided or, more importantly, in the natural or artificial process
that generated the data. The data are described by attributes: attributes whose domain
is numerical are called numerical attributes, whereas attributes whose domain is not
numerical are called categorical attributes [16]. Depending on the role played by the
attributes, there are two types of learning tasks: unsupervised and supervised learning.

They differ in the semantics associated with the attributes of the learning examples
and in their goals. The general goal of unsupervised learning is to find interesting patterns
in the data, patterns that are useful for a higher-level understanding of the structure
of the data. Useful types of patterns include groupings or clusters in the data, as found
by various clustering algorithms, and frequent item-sets.

Unsupervised learning techniques usually assign the same role to all the attributes.
Supervised learning, in contrast, tries to determine a connection between a subset of the
attributes, called the inputs or attribute variables, and the dependent attribute or output.
Two of the central problems in supervised learning are classification and regression.

Both classification and regression have as their goal the construction of a
succinct model that can predict the value of the dependent attribute from the attribute
variables. The difference between the two tasks is that the dependent attribute
is categorical for classification and numerical for regression.

Many classification and regression models have been proposed: neural
networks, genetic algorithms, Bayesian methods, log-linear models and other statistical
methods, decision tables, and tree-structured models, the so-called classification and
regression trees.

14.2.2 Decision Tree

A decision tree is a visual representation of a problem. It helps decompose
a complex problem into smaller, more manageable undertakings, which allows the
decision-makers to make smaller determinations along the way to achieve the optimal
overall decision. Decision tree analysis is a formal, structured approach to making
decisions.

A decision tree is a common and intuitive way to classify a pattern through a sequence
of questions, in which the next question depends upon the answer to the current
question. This approach is particularly useful for nonmetric data, because all the questions
can be asked in a yes/no or true/false style.

Such a sequence of questions is displayed in a directed decision tree, or simply a tree. This
tree can be used for classification. The classification of a particular pattern begins at
the root node, which asks for the value of a particular property of the pattern.

The root node has a link for each possible value of this property. Based on
the answer, we follow the appropriate link to a subsequent node; only one
link is followed. The process continues until we reach a leaf node, which has no further question.
Each leaf node has a category label, and the test pattern is assigned the category
of the leaf node reached.
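
This traversal can be sketched in Matlab as follows. The tree representation used here (a struct array with fields feature, threshold, left, right and label) is an assumption made only for illustration; it is not the representation used elsewhere in this chapter.

function label = treeclassify(tree, x)
% Walks a decision tree stored as a struct array:
%   tree(k).feature   index of the feature tested at node k (0 for a leaf)
%   tree(k).threshold numeric threshold of the test
%   tree(k).left      child followed when x(feature) <  threshold
%   tree(k).right     child followed when x(feature) >= threshold
%   tree(k).label     category label (used only at leaf nodes)
k = 1;                              % start at the root node
while tree(k).feature > 0           % continue until a leaf node is reached
    if x(tree(k).feature) < tree(k).threshold
        k = tree(k).left;
    else
        k = tree(k).right;
    end
end
label = tree(k).label;              % the pattern gets the label of the leaf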

14.2.2.1 Advantages of Decision Trees

• Train fast
• Evaluate fast
• Compact models
• Intelligible if small
• Do feature selection
• Do not use all features
• Experts understand/accept them
• Easy to convert to rules
• Can handle missing values

14.2.2.2 Disadvantages of Decision Trees

• Not good at regression (predicting continuous values)
• Not good at non-axis-parallel splits
• Trees for problems with continuous attributes can be large
• Large trees are not intelligible
• Split ordering is often counterintuitive to experts
• Not good at learning from many inputs (e.g., pixels)
• The tree can become very broad and deep; searching for any element in such a tree is
complicated and takes much time

14.2.3 Types

Decision tree analysis has three related names:

Classification tree analysis is the term used when the predicted outcome is the
class to which the data belongs.

Regression tree analysis is the term used when the predicted outcome can be
considered a real number (e.g. the price of a house, or a patient's length of stay in
a hospital).

CART analysis is a term used to refer to both of the above procedures.

14.2.4 Classification and Regression Tree

CART builds a binary recursive tree [16]. The tree progressively splits the set of
training data into smaller and smaller subsets. Ideally, all the samples in each subset
would have the same category label; in that case we say that each subset is pure (i.e.,
its data share a unique property), and that portion of the tree can be terminated.
Usually, for each branch we decide either to stop splitting and accept an imperfect
decision, or to select another property and grow the tree further.

The number of splits at a node is related to the property to be tested at that node.
The root node splits the full training set. For nonnumerical data, there is no simple
geometrical interpretation of how the query at a node splits the data.
For numerical data, however, we can easily visualize the decision boundaries produced
by the decision tree; these decision boundaries are perpendicular to the coordinate
axes. The aim is that the property query T at each node N makes the data reaching the
immediate descendant nodes as pure as possible.

A classification tree is built through a process known as binary recursive partitioning.
This is an iterative process of splitting the data into partitions, and then splitting it
up further on each of the branches.

14.2.4.1 Finding the Initial Split

The process starts with a training set consisting of pre-classified records. Pre-classified
means that the target value, or dependent variable, has a known class or label. The
goal is to build a tree that distinguishes among the classes. For simplicity, assume
that there are only two target classes and that each split is a binary partitioning. The
splitting criterion easily generalizes to multiple classes, and any multi-way partitioning
can be achieved through repeated binary splits [16]. Every possible split is tried and
considered, and the best split is the one which produces the largest decrease in diversity
of the classification label within each partition (this is just another way of saying "the
largest increase in homogeneity"). This is repeated for all values, and the winner is chosen
as the best splitter for that node. The process is continued at the next node and, in this
manner, a full tree (Fig. 14.5) is generated for the given training data (Table 14.1).

Table 14.1: Training data for CART

14.2.4.2 Pruning the Tree

An alternative approach to stopped splitting is pruning. In this method, a tree is grown
fully until the leaf nodes have minimum impurity. After that, all pairs of neighboring leaf
nodes are considered for elimination [16]. Any pair whose elimination yields only a small
increase in impurity is eliminated and the common parent node is declared a leaf. Such
merging, or joining, of two leaf nodes is the inverse of splitting.

Pruning directly uses all the information in the training set. For small problems the
computational cost is low, but for larger problems this becomes impractical because all
the information is used.

Pruning is the process of removing leaves and branches to improve the performance
of the decision tree when it moves from the training data (where the classification is
known) to real-world applications (where the classification is unknown; it is what you
are trying to predict). The tree makes the best split at the root node, where there
is the largest number of records and, hence, a lot of information. Each subsequent
split has a smaller and less representative population with which to work. Toward
the end, the training records at a particular node display patterns that are peculiar only
to those records. These patterns can become meaningless and sometimes harmful for
prediction if you try to extend rules based on them to larger populations.

Fig. 14.5: Decision region and unpruned classification tree

For example, say the classification tree is trying to predict height and it comes to
a node containing one tall person named X and several other shorter people. It can
decrease diversity at that node with a new rule saying "people named X are tall" and
thus classify the training data.

Pruning methods solve this problem: they let the tree grow to maximum size, then
remove smaller branches that fail to generalize.

Rather than purity, we calculate the impurity. Let i(N) denote the impurity of
a node N. If all the patterns that reach the node bear the same category label, then
i(N) should be 0. A popular measure of impurity is the entropy impurity:

i(N ) = − Σ_j P (wj ) log2 P (wj )                  (14.8)

where P (wj ) is the fraction of patterns at node N that are in category wj .
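
A small Matlab function for the entropy impurity of Eq. (14.8); the input is assumed to be a numeric vector containing the category labels of the patterns that reach node N.

function imp = entropy_impurity(labels)
% Entropy impurity i(N) = -sum_j P(w_j) log2 P(w_j) of the patterns at a node.
classes = unique(labels);
imp = 0;
for j = 1:length(classes)
    p = sum(labels == classes(j)) / length(labels);   % fraction in category w_j
    imp = imp - p * log2(p);                          % p > 0 here, so log2(p) is finite
end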

In Fig. 14.5, searching the n−1 candidate split positions for the x1 feature and
the n−1 positions for the x2 feature, we find that the greatest reduction in impurity occurs
near x1s = 0.6, and hence this becomes the decision criterion at the root node. We then
continue for each subtree until each final node represents a single category (and thus has
the lowest impurity, 0).

If all the patterns are of the same category, the impurity is 0; otherwise, the impurity
is a positive value. We therefore choose the query that decreases the impurity as much
as possible; the impurity reduction corresponds to the information gained by the query.
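
The search for the best split can be sketched in Matlab as follows. This is a simplified illustration of the idea, not the full CART procedure; it assumes numeric features and labels and reuses the entropy_impurity function sketched above.

function [best_feat, best_thr] = best_split(X, labels)
% X is an N-by-d matrix of patterns, labels an N-by-1 vector of categories.
N = size(X, 1);
base = entropy_impurity(labels);             % impurity i(N) before splitting
best_drop = -inf; best_feat = 1; best_thr = NaN;
for f = 1:size(X, 2)
    v = sort(X(:, f));
    thresholds = (v(1:end-1) + v(2:end)) / 2;    % candidate positions between samples
    for t = thresholds'
        left  = labels(X(:, f) <  t);
        right = labels(X(:, f) >= t);
        if isempty(left) || isempty(right), continue; end
        % Impurity after the split, weighted by the fraction of patterns on each side.
        drop = base - (length(left)/N)  * entropy_impurity(left) ...
                    - (length(right)/N) * entropy_impurity(right);
        if drop > best_drop
            best_drop = drop; best_feat = f; best_thr = t;
        end
    end
end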

Since the tree is grown from the training data set, when it has reached full structure
it usually suffers from over-fitting (i.e. it is "explaining" random elements of the
training data that are not likely to be features of the larger population of data). This
results in poor performance on real-life data. Therefore, it has to be pruned using the
validation data set. If pruning were invoked in Fig. 14.5, the pair of leaf nodes at the
left would be the first to be deleted (gray shading) because their elimination increases
the impurity the least. If the w2 point marked ∗ in Fig. 14.5 is moved slightly, the decision
region and tree differ significantly, as shown in Fig. 14.6.

Fig. 14.6: Decision region and classification tree

In Fig. 14.6 we find the greatest reduction in the impurity occurs near x2s = 0.33,
and hence this becomes the decision criterion at the root node.

Matlab code for Fig. 14.6:


function [t] = cart(x1, x2)
% Classifies a 2-D pattern (x1, x2) using the decision tree of Fig. 14.6
if (x2 < 0.33)
    if (x2 < 0.09)
        t = 'black';
    else
        t = 'red';
    end
elseif (x1 < 0.6)
    t = 'black';
elseif (x1 < 0.69)
    t = 'red';
else
    t = 'black';
end
disp(t)
return

The above function displays the corresponding class label (i.e., black or red) for the
given input values x and y.
Matlab commands for executing the function:
>> x = 0.38;
>> y = 0.70;
Calling the function to classify the pattern:
>> [t] = cart(x, y)

Appendix A

A.1 C++ Code for Audio Processing

// C++ code for audio processing


// The program performs the following functions
// i) reads an audio file (.wav file) into an array
// ii) creates an audio file (.wav file) from the array of values
// iii) Computes the difference signal
// Written by Dr. S. Palanivel for the AICTE sponsored
// SDP on "Pattern Classification Techniques for Audio and
// Video Processing", 23-11-2009 to 04-12-2009.

#include<fstream.h>
#include<stdio.h>
#define SF 8000
int samples=0;
// Creating raw file (sample values) from the wave file
// Assumes 16-bits per sample
void waveread(char *wfname, char *rfname, int header)
{
char lb,hb;
int temp;
ifstream fp1;
ofstream fp2;
fp1.open(wfname,ios::binary);
fp2.open(rfname,ios::out);
// Skipping the wave format header
for(int i=0;i<header;i++)fp1.read(&lb,1);
// Reading first sample value (16-bits i.e., 2 bytes )
// Lower byte is in lb and higher byte is in hb.
fp1.read(&lb,1);
fp1.read(&hb,1);
temp=lb;
temp=temp&0x00ff;
temp=hb*256|temp;
while(!fp1.eof())
{
fp2<<temp<<endl;
fp1.read(&lb,1);
fp1.read(&hb,1);
temp=lb;
temp=temp&0x00ff;
temp=hb*256|temp;
samples++;
}
fp1.close();
fp2.close();
cout<<"\n\nNumber of samples = "<<samples<<endl;
}

// creating wave file from the raw data in an array


// Assumes 8000 samples/sec and 16-bits per sample

void wavewrite(int *s,char *wfname)


{
ifstream fpheader;
ofstream fpwave;

fpheader.open("header.wav",ios::in);
if(!fpheader)
{
cout<<"Header file is not available in the current directory";
return;
}
fpwave.open(wfname,ios::binary);
char header[44];
fpheader.read(header,44);
fpheader.close();
fpwave.write(header,44);
int sampfreq=SF;
short numbits=16;
int datasize,chunksize;
datasize=samples*2;
chunksize=36+samples*2;
fpwave.seekp(4);
fpwave.write(reinterpret_cast<char *> (&chunksize),sizeof(int));
fpwave.seekp(24);
fpwave.write(reinterpret_cast<char *> (&sampfreq), sizeof(int));
fpwave.seekp(34);
fpwave.write(reinterpret_cast<char *> (&numbits),sizeof(short));
fpwave.seekp(40);
fpwave.write(reinterpret_cast<char *> (&datasize),sizeof(int));

char lb,hb;
fpwave.seekp(44);

for(int i=0;i<samples;i++)
{
lb=(char)(s[i] & 0x00ff);

s[i]=s[i] & 0xff00;

hb=char(s[i] >> 8);

fpwave.write(&lb,sizeof(char));

fpwave.write(&hb,sizeof(char));
}

fpwave.close();
}
int main()
{
FILE *fp;
int i, *s1,*s2;
char name[30],fname[30];
cout <<"Enter the name of wave file (without extension):";
cin>>name;
sprintf(fname,"%s.wav",name);
waveread(fname,"raw.dat",44); //Reading wave file

fp=fopen("raw.dat","r");
s1=new int[samples];
s2=new int[samples];

for(i=0;i<samples;i++)
fscanf(fp,"%d",&s1[i]);

s2[0]=s1[0]; // first sample has no predecessor

for(i=1;i<samples;i++) // Speech preemphasis (Difference signal)
s2[i]=s1[i]-s1[i-1];

sprintf(fname,"%sdiff.wav",name);
wavewrite(s2,fname); // Creates wave file
}

Appendix B

B.1 C++ Code for Processing Gray (PGM) and Color (PPM) Images

// C++ code for processing gray (PGM) and color(PPM) images


// The program performs the following functions
// i) reading gray image and writing its negative
// ii) reading color image and writing its negative
// iii) Writing gray image for the given color image

// Written by Dr. S. Palanivel for the AICTE sponsored


// SDP on "Pattern Classification Techniques for Audio and
// Video Processing",23-11-2009 to 04-12-2009.
#include <string.h>
#include<fstream.h>
#include<stdio.h>
#include<stdlib.h>
int i,j;
class image
{
private:
unsigned char **grayplane,**redplane,**greenplane,**blueplane;
int height,width,depth;
public:
image(){grayplane=NULL;redplane=NULL;greenplane=NULL;blueplane=NULL;}
image(int width,int height)
{
grayplane = new unsigned char*[height];
for(i=0;i<height;i++) grayplane[i] = new unsigned char[width];
redplane = new unsigned char*[height];
for(i=0;i<height;i++) redplane[i] = new unsigned char[width];
greenplane = new unsigned char*[height];
for(i=0;i<height;i++) greenplane[i] = new unsigned char[width];
blueplane = new unsigned char*[height];
for(i=0;i<height;i++)blueplane[i] = new unsigned char[width];
}
int getheight() { return height;}
int getwidth() { return width;}

unsigned char getgraypixval(int i,int j){return(grayplane[i][j]);}


void setgraypixval(int i,int j,int value){grayplane[i][j]=value;}

unsigned char getredpixval(int i,int j){return(redplane[i][j]);}


void setredpixval(int i,int j,int value){redplane[i][j]=value;}

unsigned char getgreenpixval(int i,int j){return(greenplane[i][j]);}


void setgreenpixval(int i,int j,int value){greenplane[i][j]=value;}

unsigned char getbluepixval(int i,int j){return(blueplane[i][j]);}


void setbluepixval(int i,int j,int value){blueplane[i][j]=value;}

void readpgm(char* fileName);


void writepgm(char* fileName);

void readppm(char* fileName);


void writeppm(char* fileName);

void rgb2gray();

~image()
{
if(grayplane){
for(i=0;i<height;i++)delete [] grayplane[i];
delete [] grayplane;}
if(redplane){
for(i=0;i<height;i++)delete [] redplane[i];
delete [] redplane;
for(i=0;i<height;i++)delete [] greenplane[i];
delete [] greenplane;
for(i=0;i<height;i++)delete [] blueplane[i];
delete [] blueplane;}
}

};

// reading gray image

void image::readpgm(char* fileName)


{

FILE* fp = fopen(fileName, "rb");


if (fp==NULL)

{
printf("\n File %s does not exist \n", fileName);
return;
}

printf("\n\n Reading PGM image %s\n", fileName);

char* header = new char[100];


char* comment=new char[200];

fscanf(fp,"%s",header); // Reading Header


fscanf(fp, "%s", comment); // Reading comment

if (!strncmp(comment,"#",1)==0)
{
printf ("\n No comment tag..");
width = atoi(comment);
}
else
{
while (fgetc(fp)!='\n'); // traversing comment until '\n' is found
fscanf(fp, "%u", &width);
}
fscanf(fp,"%d",&height);
printf ("\n Width = %u, Height = %u\n", width, height);
fscanf(fp,"%d",&depth);

grayplane = new unsigned char*[height]; // memory allocation


for(i=0;i<height;i++)grayplane[i] = new unsigned char[width];

if (( strcmp(header,"P2")) ==0) // reading ascii data


{
printf("\n Reading ascii data...\n");
for (i=0; i<height; i++)
for (j=0; j<width; j++){
unsigned int v;
fscanf(fp,"%u", &v); // read one ascii intensity value into a temporary
grayplane[i][j]=(unsigned char)v; // avoids writing 4 bytes into a 1-byte pixel
}
}

else if (( strcmp(header,"P5")) ==0) // reading binary data


{
printf("\n Reading binary data...\n");
while (fgetc(fp)!='\n');
for(i=0; i< height; i++)
for (j=0; j< width; j++)
fread(&grayplane[i][j],1,1,fp);

}
else
printf (" \n Not in a readable format; header = %s\n", header);

delete[] header;
delete[] comment;
fclose(fp);
}

// writing gray image

void image::writepgm(char* fileName)

{
FILE* fp = fopen(fileName, "wb");
printf("\n Writing the image %s\n", fileName);
fprintf(fp, "P5\n"); // writing header
fprintf(fp, "# file created by S.PAL\n"); // writing comment..
fprintf(fp, "%u %u\n", width, height); // writing width, height
fprintf(fp, "%u\n", 255); // writing depth
for (i=0; i< height; i++)
for (j=0; j< width; j++)
fwrite(&grayplane[i][j],1,1,fp); //writing intensity values
fclose(fp);
}

// reading color image

void image::readppm(char* fileName)


{

FILE* fp = fopen(fileName, "rb");


if (fp==NULL)
{
printf("\n File %s does not exist \n", fileName);
return;
}

printf("\n\n Reading PPM image %s\n", fileName);

char* header = new char[100];


char* comment=new char[200];

fscanf(fp,"%s",header); // Reading Header
fscanf(fp, "%s", comment); // Reading comment

if (!strncmp(comment,"#",1)==0)
{
printf ("\n No comment tag..");
width = atoi(comment);
}
else
{
while (fgetc(fp)!='\n'); // traversing comment until '\n' is found
fscanf(fp, "%u", &width);
}
fscanf(fp,"%d",&height);
printf ("\n Width = %u, Height = %u\n", width, height);
fscanf(fp,"%d",&depth);

grayplane = new unsigned char*[height];


for(i=0;i<height;i++)grayplane[i] = new unsigned char[width];

redplane = new unsigned char*[height];


for(i=0;i<height;i++)redplane[i] = new unsigned char[width];
greenplane = new unsigned char*[height];
for(i=0;i<height;i++)greenplane[i] = new unsigned char[width];
blueplane = new unsigned char*[height];
for(i=0;i<height;i++)blueplane[i] = new unsigned char[width];

if (( strcmp(header,"P3")) ==0) // reading ascii data


{
printf("\n Reading ascii data...\n");
for (i=0; i<height; i++)
for (j=0; j<width; j++){
unsigned int r, g, b;
fscanf(fp,"%u %u %u", &r, &g, &b); // read ascii RGB values into temporaries
redplane[i][j]=(unsigned char)r;
greenplane[i][j]=(unsigned char)g;
blueplane[i][j]=(unsigned char)b;
}
}
else if (( strcmp(header,"P6")) ==0) // reading binary data
{
printf("\n Reading binary data...\n");
while (fgetc(fp)!='\n');
for(i=0; i< height; i++)
for (j=0; j< width; j++){
fread(&redplane[i][j],1,1,fp);
fread(&greenplane[i][j],1,1,fp);
fread(&blueplane[i][j],1,1,fp);
}
}

else
printf (" \n Not in a readable format; header = %s\n", header);

delete[] header;
delete[] comment;
fclose(fp);
}

// writing color image

void image::writeppm(char* fileName)

{
FILE* fp = fopen(fileName, "wb");
printf("\n Writing the image %s\n", fileName);
fprintf(fp, "P6\n"); // writing header
fprintf(fp, "# file created by S.PAL\n"); // writing comment..
fprintf(fp, "%u %u\n", width, height); //writing width, height
fprintf(fp, "%u\n", 255); // writing depth
for (i=0; i< height; i++)
for (j=0; j< width; j++){
fwrite(&redplane[i][j],1,1,fp); //writing intensity values
fwrite(&greenplane[i][j],1,1,fp);
fwrite(&blueplane[i][j],1,1,fp);
}
fclose(fp);
}

// converting RGB image to gray image

void image::rgb2gray()
{
for (i=0; i<height; i++)
for (j=0; j<width; j++)
grayplane[i][j] = (unsigned char)(0.299*redplane[i][j]+
0.587*greenplane[i][j]+0.114* blueplane[i][j]);
}

void main()
{

char name[30],fname[50];
int height,width,t;
image image1,image2;

// Processing PGM image
cout <<" Enter the name of pgm format gray image (without extension):";
cin>>name;

sprintf(fname,"%s.pgm",name);
image1.readpgm(fname); // Reading PGM image
height=image1.getheight();
width=image1.getwidth();
for(i=0;i<height;i++)
for(j=0;j<width;j++){
t=image1.getgraypixval(i,j);
image1.setgraypixval(i,j,255-t); // obtaining negative of gray image
}
sprintf(fname,"negativeof%s.pgm",name);
image1.writepgm(fname); // writing PGM image

//Processing PPM image


cout <<" Enter the name of ppm format color image (without extension):";
cin>>name;

sprintf(fname,"%s.ppm",name);
image2.readppm(fname); // Reading PPM image
height=image2.getheight();
width=image2.getwidth();

image2.rgb2gray(); // Obtaining gray image


sprintf(fname,"grayof%s.pgm",name);
image2.writepgm(fname);

for(i=0;i<height;i++)
for(j=0;j<width;j++){
t=image2.getredpixval(i,j);
image2.setredpixval(i,j,255-t); // Obtaining negative of color image
t=image2.getgreenpixval(i,j);
image2.setgreenpixval(i,j,255-t);
t=image2.getbluepixval(i,j);
image2.setbluepixval(i,j,255-t);
}

sprintf(fname,"negativeof%s.ppm",name);
image2.writeppm(fname); // writing PPM image
}

Bibliography

[1] B. Yegnanarayana and S.P. Kishore, “AANN: an alternative to GMM for pattern
recognition,” Neural Networks, vol. 15, pp. 459–469, January 2002.
[2] Delores M.Etter, Engineering problem solving with Matlab, Pearson Education, India,
2005.
[3] Duane Hanselman and Bruce Littlefield, Mastering Matlab 7, Pearson Education, India,
2005.
[4] MATLAB from mathworks, http://www.mathworks.com.
[5] L. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall,
Englewood Cliffs, N.J., 1978.
[6] A.V. Oppenheim and R.W. Schafer, Digital Signal Processing, Prentice Hall, Englewood
Cliffs, NJ, 1975.
[7] Sanjit K. Mitra, Digital Signal Processing, Tata McGraw-Hill, New Delhi, 2001.
[8] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Pearson Education,
Singapore, 2003.
[9] Douglas O’Shaughnessy, Speech communications, Universities press, Hyderabad, 2001.
[10] Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins, Digital Image Processing
using MATLAB, Pearson Education, 2008.
[11] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Pearson Educa-
tion Asia, Singapore, 2001.
[12] R.C. Gonzalez and R.E. Woods, Digital image processing, Pearson Education, Singa-
pore, 2002.
[13] A.K. Jain, Fundamentals of digital image processing, Prentice-Hall of India, New Delhi,
2001.
[14] Matlab documentation,C://MATLAB6p5//help//toolbox//images//images.html.
[15] Matlab documentation, C://MATLAB6p5//help//toolbox//wavelet//wavelet.html.
[16] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, John Wiley & Sons,
Singapore, 2003.
[17] S. Haykin, Neural networks: A comprehensive foundation, Prentice Hall International,
New Jersey, 1999.
[18] R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press,
1961.

[19] K. Pearson, “On lines and planes of closest fit to systems of points in space,” Phil.
Mag, vol. 2, pp. 559–572, 1901.

[20] H. Hotelling, “Analysis of a complex of statistical variables into principal components,”


Phil. Mag, vol. 24, pp. 417–441, 1933.

[21] L.H. Koh, S. Ranganath, and Y.V. Venkatesh, “An integrated automatic face detection
and recognition system,” vol. 35, pp. 1259–1273, 2002.

[22] Geoffrey McLachlan and David Peel, Finite Mixture Models, John Wiley and Sons,
New York, 2000.

[23] V. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, 1998.

[24] J.C. Burges Christopher, “A tutorial on support vector machines for pattern recogni-
tion,” Data mining and knowledge discovery, vol. 52, pp. 121–167, 1998.

[25] B. Heisele, P. Ho, and T. Poggio, “Face recognition with support vector machines:
Global versus component-based approach,” in Proc. 8th Int. Conf. Computer Vision,
Vancouver, BC, Canada, 2001, pp. 688–694.

[26] B. Heisele, Alessandro, and T. Poggio, “Learning and vision machines,” vol. 90, no. 7,
pp. 1164–1177, July 2002.

[27] SVMTorch from IDIAP, Switzerland, http://www.kernel-machines.org.

[28] Lawrence R. Rabiner, “A tutorial on hidden Markov models and selected applications
in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.

[29] Philip D. Wasserman, Neural Computing Theory and Practice, New York, 1989.

[30] T. N. Shankar, Neural networks, University Science Press, New Delhi, 2008.

[31] B. Yegnanarayana, S.V. Gangashetty, and S. Palanivel, “Autoassociative neural network


models for pattern recognition tasks in speech and image,” in Soft Computing Approach
to Pattern Recognition and Image Processing, World Scientific publishing Co. Pte. Ltd,
Singapore, December 2002, pp. 283–305.

[32] B. Yegnanarayana, Artificial neural networks, Prentice Hall of India, New Delhi, 1999.

[33] S. Palanivel, Person Authentication using Speech, Face and Visual Speech, Ph.D. thesis,
Department of Computer Science and Engineering, Indian Institute of Technology
Madras, Chennai, 2004.

[34] Ferat Sahin, “A bayesian network approach to the self-organization and learning in
intelligent agents,” 2000.

[35] Kevin P. Murphy, “An introduction to graphical models,” May 2001.

