
DISCLAIMER

In preparing these slides, material has been taken from different online
sources such as books, websites, research papers and presentations.
However, the author does not intend to claim any of this material as
her/his own. This lecture (audio, video, slides, etc.) has been prepared
and is delivered only for educational purposes and is not intended to
infringe upon copyrighted material. Sources have been acknowledged where
applicable. The views expressed are the presenter's alone and do not
necessarily represent those of the original author(s) or the institution.
Machine
Learning

Feature Selection and Dimensionality Reduction

Feature
Selection
In the presence of millions of features/attributes/inputs/variables, select the
most relevant ones.
Advantages: build better, faster, and easier-to-understand learning machines.

[Diagram: a dataset X with m features is reduced to m' selected features]
Feature
Selection
• Transforming a dataset by removing some of its
columns

[Diagram: a table with attributes A1, A2, A3, A4 and class C is reduced to a table with only A2, A4 and C]
Why?
• Lack of quality in the data
– Redundant
– Irrelevant
– Noisy
• Scalability issues
– Some methods may not be able to cope with a large
number of attributes or instances
• In general, to help the machine learning method to learn
better
Taxonomy of feature/prototype
selection methods
• Filter methods
– The reduction process happens before the
learning process
– Using some kind of metric that tries to estimate the
goodness of the reduction

[Diagram: Dataset → Filter method → Classification]
Taxonomy of feature/prototype
selection methods
• Wrapper methods
– Filter methods try to estimate the goodness of the reduced
dataset
– Why don’t we use the actual machine learning method (or at
least a fast one) to tell if the reduction is good or bad?
– The space of possible reductions will be iteratively
explored by a search algorithm

[Diagram: Dataset → Explore reduction method → Classification, with a Classifier guiding the exploration]
Feature
selection
• Two issues that characterise the FS methods
– Feature evaluation (for the filters)
• How do we estimate the goodness of a feature subset?
• Metric applies to
– Feature subset
– Individual features (generating a ranking)
– Subset exploration (for both filters and wrappers)
• How do we explore the space of feature subsets?
Feature evaluation
methods
• Four types of metrics (Liu and Yu, 2005)
– Distance metrics
• A good feature helps to separate better between the classes
– Information metrics
• Quantify the information gain (Information Theory) of a feature
– Dependency metrics
• Quantify the correlation between attributes and between each
attribute and the class
– Consistency metrics
• Inconsistency: having two identical instances but with different class
labels
• These metrics try to find the minimal set of features that
maintains the same level of consistency as the whole dataset
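To make the information metric concrete, here is a small sketch (not from the original slides; scikit-learn and its iris dataset are assumed purely for illustration) that ranks individual features by their mutual information with the class:

```python
# Sketch: ranking individual features with an information metric
# (mutual information between each feature and the class label).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

mi = mutual_info_classif(X, y, random_state=0)   # one score per feature
ranking = np.argsort(mi)[::-1]                   # best feature first

for idx in ranking:
    print(f"feature {idx}: mutual information = {mi[idx]:.3f}")
```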
Feature
Selection
– Filtering approach:
ranks features or feature subsets independently of the
predictor (classifier).
• …using univariate methods: consider one variable at a time
• …using multivariate methods: consider more than one variable at a time

– Wrapper approach:
uses a classifier to assess (many) features or feature subsets.

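To make the wrapper idea concrete, the following sketch (an assumption on tooling: scikit-learn, with a logistic-regression classifier as the wrapped learner) lets the classifier itself judge candidate subsets via recursive feature elimination:

```python
# Sketch of a wrapper method: recursive feature elimination (RFE) uses the
# classifier's own weights to decide which features to drop at each step.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=2)   # keep 2 of the 4 features
selector.fit(X, y)

print("selected feature indices:", selector.get_support(indices=True))
```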
Dimensionality
Reduction
Data Dimensionality
• From a theoretical point of view, increasing the number
of features should lead to better performance
(assuming independent features).

• In practice, the inclusion of more features leads to
worse performance (i.e., the curse of dimensionality).

• An exponential number of training examples is needed
as dimensionality increases.
Dimensionality
Reduction
• Significant improvements can be achieved by first mapping
the data into a lower-dimensional sub-space.

• Dimensionality can be reduced by:
− Combining features (linearly or non-linearly)
− Selecting a subset of features (i.e., feature selection)

• Linear combinations are particularly attractive because they are
simple to compute and analytically tractable.
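As a small illustration of reduction by linear combination (the projection matrix W below is random, so this is not an optimal transformation, just the mechanics of z = Wᵀx):

```python
# Sketch: dimensionality reduction by linearly combining features, z = W^T x.
# W is random here, purely to show the mechanics; PCA/LDA choose W optimally.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples with 10 original features

k = 3                            # target dimensionality
W = rng.normal(size=(10, k))     # each column defines one linear combination

Z = X @ W                        # reduced data
print(Z.shape)                   # (100, 3)
```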
Dimensionality Reduction
(cont’d)
• Two classical approaches for finding optimal
linear transformations are:

– Principal Components Analysis (PCA): Seeks a projection that


preserves as much information in the data as possible (in a
least-squares sense).

– Linear Discriminant Analysis (LDA): Seeks a projection that


best separates the data (in a least-squares sense).
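For reference, both transformations are available in scikit-learn; the sketch below (dataset and number of components are arbitrary illustrative choices) shows them side by side:

```python
# Sketch: PCA (unsupervised, preserves variance) vs. LDA (supervised,
# separates classes) applied to the same data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                       # labels ignored
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)   # (150, 2) (150, 2)
```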

Principal Component Analysis

• Principal Component Analysis (PCA) is used to analyse data
• PCA is used to identify patterns in data, which are hard to
find in high-dimensional data

http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Steps of Principal Component
Analysis
• Input data
• Calculate mean
• Subtract the mean from each data dimension
• Compute the covariance matrix
• Calculate the eigenvalues and eigenvectors of
the covariance matrix
• Choose components and form a feature vector
• Derive the new dataset
Input Data
X Y
2.2 0.7
3 1.1
0.5 1.5
1.9 2.5
2.8 2.1
1.1 3
1 1.6
2.3 0.9
1.7 2.8
3.1 1.3

Input Data

[Figure: scatter plot of the input data; both X and Y axes range from 0 to 3.5]
Calculate Mean

Mean of X = (2.2 + 3 + 0.5 + 1.9 + 2.8 + 1.1 + 1 + 2.3 + 1.7 + 3.1) / 10 = 1.96
Mean of Y = (0.7 + 1.1 + 1.5 + 2.5 + 2.1 + 3 + 1.6 + 0.9 + 2.8 + 1.3) / 10 = 1.75

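The same calculation in numpy (the array below is just the data table above typed in):

```python
# Sketch: the example data and its per-dimension mean.
import numpy as np

data = np.array([
    [2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
    [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3],
])

mean = data.mean(axis=0)   # column-wise mean
print(mean)                # [1.96 1.75]
```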
Subtract the mean from each
data dimension
X Y
2.2 -1.96 = 0.24 0.7 – 1.75 = -1.05

1.04 -0.65
-1.46 -0.25
-0.06 0.75
0.84 0.35
-0.86 1.25
-0.96 -0.15
0.34 -0.85
-0.26 1.05
1.14 -0.45
Compute covariance matrix

• The covariance between two dimensions X and Y is
cov(X, Y) = sum over i of (Xi − mean(X)) (Yi − mean(Y)) / (n − 1)
• The covariance matrix collects cov(X,X), cov(X,Y), cov(Y,X) and cov(Y,Y)
Compute covariance matrix
X     Y     W = Xi − mean(X)   Z = Yi − mean(Y)   W*Z
2.2   0.7    0.24              -1.05              -0.252
3     1.1    1.04              -0.65              -0.676
0.5   1.5   -1.46              -0.25               0.365
1.9   2.5   -0.06               0.75              -0.045
2.8   2.1    0.84               0.35               0.294
1.1   3     -0.86               1.25              -1.075
1     1.6   -0.96              -0.15               0.144
2.3   0.9    0.34              -0.85              -0.289
1.7   2.8   -0.26               1.05              -0.273
3.1   1.3    1.14              -0.45              -0.513
Sum of W*Z = -2.32
cov(X,Y) = cov(Y,X) = -2.32 / 9 = -0.25778
Compute covariance matrix
X     Y     W = Xi − mean(X)   Z = Yi − mean(Y)   W*W      Z*Z
2.2   0.7    0.24              -1.05              0.0576   1.1025
3     1.1    1.04              -0.65              1.0816   0.4225
0.5   1.5   -1.46              -0.25              2.1316   0.0625
1.9   2.5   -0.06               0.75              0.0036   0.5625
2.8   2.1    0.84               0.35              0.7056   0.1225
1.1   3     -0.86               1.25              0.7396   1.5625
1     1.6   -0.96              -0.15              0.9216   0.0225
2.3   0.9    0.34              -0.85              0.1156   0.7225
1.7   2.8   -0.26               1.05              0.0676   1.1025
3.1   1.3    1.14              -0.45              1.2996   0.2025
Sum of W*W = 7.124,  Sum of Z*Z = 5.885
cov(X,X) = 7.124 / 9 = 0.791556
cov(Y,Y) = 5.885 / 9 = 0.653889
Compute covariance matrix
COVARIANCE MATRIX

COV(X,X) COV(X,Y)

COV(Y,X) COV(Y,Y)

COVARIANCE MATRIX VALUES

0.791556 -0.25778

-0.25778 0.653889

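The same matrix can be checked with numpy; np.cov divides by n − 1 = 9, matching the values above:

```python
# Sketch: covariance matrix of the example data.
import numpy as np

data = np.array([
    [2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
    [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3],
])

cov = np.cov(data, rowvar=False)   # columns are the variables X and Y
print(cov)
# [[ 0.79155556 -0.25777778]
#  [-0.25777778  0.65388889]]
```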
Calculate the Eigenvalues & Eigenvectors of the Matrix

• For each eigenvalue there is a corresponding eigenvector
Calculate the Eigenvalues

[Worked derivation on the slides: the eigenvalues are found by solving
det(C − λI) = 0 for the 2×2 covariance matrix C above]

Find the Eigenvectors

[Worked derivation on the slides: for each eigenvalue λ, the corresponding
eigenvector v is found by solving (C − λI)v = 0]

• The same procedure is applied to find the eigenvector for the second
eigenvalue
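As a numerical check, the eigen-decomposition can be computed directly; np.linalg.eigh is used because the covariance matrix is symmetric, and its output is what the analytic derivation should reproduce:

```python
# Sketch: eigenvalues and eigenvectors of the covariance matrix.
import numpy as np

cov = np.array([[0.791556, -0.25778],
                [-0.25778,  0.653889]])

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
print(eigvals)    # approximately [0.456 0.990]
print(eigvecs)    # columns are the corresponding unit-length eigenvectors
```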
Choosing Components

• Now select the eigenvectors that will form the basis on which the new
data is derived
• Eigenvectors with larger eigenvalues capture more of the variance, so
they are usually the ones kept
• The new dataset can be based on one eigenvector or on both eigenvectors
Derive the new Dataset

• To derive the new dataset:
FinalData = RowFeatureVector × RowDataAdjust
• RowFeatureVector is the matrix with the chosen eigenvectors as its rows
• RowDataAdjust is the mean-adjusted data, with one sample per column
Derive the new Dataset

[Worked example on the slides: the projection is computed for one sample]

• The same applies to all samples
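A numpy sketch of the projection step (here each sample is kept as a row, so the multiplication is written as adjusted @ feature_vector rather than in the row-oriented form used on the slides):

```python
# Sketch: derive the new dataset by projecting the mean-adjusted data
# onto the chosen eigenvectors.
import numpy as np

data = np.array([
    [2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
    [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3],
])

adjusted = data - data.mean(axis=0)          # mean-adjusted data

eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
feature_vector = eigvecs[:, order]           # keeping both components here

final_data = adjusted @ feature_vector       # the new dataset
print(final_data)
```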
Getting Old Data Back

• To get the original data back, invert the projection:
RowDataAdjust = RowFeatureVector^T × FinalData, then add the mean back
• The recovery is exact if all eigenvectors were kept, and approximate if
only some were kept
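Continuing the sketch above, the reconstruction inverts the projection; recovery is exact here because both eigenvectors were kept:

```python
# Sketch: getting the old data back from the projected data.
import numpy as np

data = np.array([
    [2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
    [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3],
])

mean = data.mean(axis=0)
adjusted = data - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
feature_vector = eigvecs[:, np.argsort(eigvals)[::-1]]

final_data = adjusted @ feature_vector             # projected data

# The eigenvector matrix is orthonormal, so its transpose is its inverse.
recovered = final_data @ feature_vector.T + mean
print(np.allclose(recovered, data))                # True
```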
Acknowledgements
• Introduction to Machine Learning, Alpaydin
• "Pattern Classification" by Duda et al., John Wiley & Sons, Chapter 2
• Some material is taken from Prof. Olga Veksler's slides
• Material in these slides has also been taken from the following resources:
– https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/CSC411_Fall16.html
– Biomis.org
– https://www.youtube.com/watch?v=TQvxWaQnrqI
– http://www.cse.psu.edu/~rtc12/CSE586Spring2010/lectures/pcaLectureShort.pdf
