Feature Selection
In the presence of millions of features/attributes/inputs/variables, select the
most relevant ones.
Advantages: build better, faster, and easier to understand learning machines.
[Figure: data matrix X with m original features reduced to m' selected features]
Feature Selection
• Transforming a dataset by removing some of its
columns
[Figure: a table with columns A1 A2 A3 A4 C reduced to columns A2 A4 C]
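For instance, a minimal NumPy sketch of this idea; the toy matrix and the kept-column indices are illustrative assumptions, not data from the slides:

```python
# Feature selection as column removal: keep A2 and A4, drop A1 and A3.
import numpy as np

# Toy dataset with columns A1, A2, A3, A4 (class column C kept separately).
X = np.array([[1.0, 0.2, 5.0, 3.1],
              [0.9, 0.4, 4.8, 2.9],
              [1.1, 0.1, 5.2, 3.3]])

keep = [1, 3]            # indices of A2 and A4
X_reduced = X[:, keep]   # same rows, fewer columns
print(X_reduced.shape)   # (3, 2)
```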
Why?
• Lack of quality in the data
– Redundant
– Irrelevant
– Noisy
• Scalability issues
– Some methods may not be able to cope with a large
number of attributes or instances
• In general, to help the machine learning method learn better
Taxonomy of feature/prototype selection methods
• Filter methods
– The reduction process happens before the
learning process
– Using some kind of metric that tries to estimate the
goodness of the reduction
[Diagram: Dataset → Filter method → Classification]
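A hedged sketch of a filter in scikit-learn; the synthetic dataset and the mutual-information metric are illustrative choices, not prescribed by the slides:

```python
# Filter method: rank features by a metric (here mutual information with the
# class) and reduce the dataset before any classifier is trained.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=3)
X_reduced = selector.fit_transform(X, y)  # reduction happens before learning
print(selector.scores_)                   # estimated goodness of each feature
print(X_reduced.shape)                    # (200, 3)
```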
Taxonomy of feature/prototype selection methods
• Wrapper methods
– Filter methods try to estimate the goodness of the reduced
dataset
– Why don’t we use the actual machine learning method (or at
least a fast one) to tell if the reduction is good or bad?
– The space of possible reductions will be iteratively
explored by a search algorithm
[Diagram: Dataset → Explore reduction method ↔ Classifier → Classification]
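A hedged sketch of a wrapper: a greedy forward search whose candidate subsets are scored by the actual classifier through cross-validation. The k-NN classifier and the stopping rule are illustrative assumptions:

```python
# Wrapper method: iteratively explore subsets, letting the classifier itself
# judge whether each candidate reduction is good or bad.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # Score every one-feature extension of the current subset.
    scores = {f: cross_val_score(KNeighborsClassifier(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:   # stop when no extension improves the estimate
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print(selected, best_score)
```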
Feature selection
• Two issues characterise the FS methods:
  – Feature evaluation (for the filters)
    • How do we estimate the goodness of a feature subset?
    • The metric may apply to:
      – a feature subset
      – individual features (generating a ranking)
  – Subset exploration (for both filters and wrappers)
    • How do we explore the space of feature subsets?
Feature evaluation methods
• Four types of metrics (Liu and Yu, 2005):
  – Distance metrics
    • A feature is good if it helps to separate the classes better
  – Information metrics
    • Quantify the information gain (in the Information Theory sense) of a feature, as sketched below
  – Dependency metrics
    • Quantify the correlation between attributes, and between each attribute and the class
  – Consistency metrics
    • An inconsistency is a pair of identical instances with different class labels
    • These metrics look for the minimal set of features that maintains the same level of consistency as the whole dataset
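A minimal sketch of one information metric, the information gain of a single discrete feature; the function names are my own:

```python
# Information gain: H(class) minus the expected class entropy after
# splitting the instances on the feature's values.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    h_cond = sum(w * entropy(labels[feature == v])
                 for v, w in zip(values, weights))
    return entropy(labels) - h_cond

# A feature perfectly aligned with the class has maximal gain (1 bit here).
feature = np.array([0, 0, 1, 1, 0, 1])
labels = np.array(['a', 'a', 'b', 'b', 'a', 'b'])
print(information_gain(feature, labels))  # 1.0
```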
Feature Selection
– Filtering approach:
ranks features or feature subsets independently of the
predictor (classifier).
• …using univariate methods: consider one variable at a time
• …using multivariate methods: consider more than one variable at a time
– Wrapper approach:
uses a classifier to assess (many) features or feature subsets.
Dimensionality Reduction
• From a theoretical point of view, increasing the number
of features should lead to better performance
(assuming independent features).
Principal Component Analysis
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Steps of Principal Component Analysis
• Input data
• Calculate the mean
• Subtract the mean from each data dimension
• Compute the covariance matrix
• Calculate the eigenvalues and eigenvectors of the covariance matrix
• Choose components and form a feature vector
• Derive the new dataset
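To make the steps concrete, here is a minimal NumPy sketch (my own, not from the cited tutorial) that runs all of them on the 2-D example data used in the following slides:

```python
import numpy as np

# Input data (the X, Y pairs from the slides).
data = np.array([[2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
                 [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3]])

mean = data.mean(axis=0)                # calculate the mean
adjusted = data - mean                  # subtract the mean
cov = np.cov(adjusted, rowvar=False)    # covariance matrix (n - 1 divisor)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues and eigenvectors

# Choose components (largest eigenvalues first) and form the feature vector.
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:1]]  # keep the first principal component

# Derive the new dataset by projecting onto the chosen component(s).
new_data = adjusted @ feature_vector
print(mean)            # [1.96 1.75]
print(cov)             # [[ 0.7916 -0.2578] [-0.2578  0.6539]] (rounded)
print(new_data.shape)  # (10, 1)
```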
Input Data
X Y
2.2 0.7
3 1.1
0.5 1.5
1.9 2.5
2.8 2.1
1.1 3
1 1.6
2.3 0.9
1.7 2.8
3.1 1.3
Input Data
[Scatter plot of the input data, X vs Y, axes from 0 to 3.5]
Calculate Mean
Mean of X = (2.2 + 3 + 0.5 + 1.9 + 2.8 + 1.1 + 1 + 2.3 + 1.7 + 3.1) / 10 = 1.96
Mean of Y = (0.7 + 1.1 + 1.5 + 2.5 + 2.1 + 3 + 1.6 + 0.9 + 2.8 + 1.3) / 10 = 1.75
Subtract the mean from each data dimension
X - 1.96            Y - 1.75
2.2 - 1.96 = 0.24   0.7 - 1.75 = -1.05
 1.04               -0.65
-1.46               -0.25
-0.06                0.75
 0.84                0.35
-0.86                1.25
-0.96               -0.15
 0.34               -0.85
-0.26                1.05
 1.14               -0.45
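A quick NumPy check (my own sketch) of the mean and mean-subtraction steps:

```python
import numpy as np

data = np.array([[2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
                 [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3]])

mean = data.mean(axis=0)
print(mean)         # [1.96 1.75]
print(data - mean)  # reproduces the mean-adjusted table above, row by row
```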
Compute covariance matrix
X     Y     W = Xi - mean(X)   Z = Yi - mean(Y)   W*Z
2.2   0.7    0.24              -1.05              -0.252
3     1.1    1.04              -0.65              -0.676
0.5   1.5   -1.46              -0.25               0.365
1.9   2.5   -0.06               0.75              -0.045
2.8   2.1    0.84               0.35               0.294
1.1   3     -0.86               1.25              -1.075
1     1.6   -0.96              -0.15               0.144
2.3   0.9    0.34              -0.85              -0.289
1.7   2.8   -0.26               1.05              -0.273
3.1   1.3    1.14              -0.45              -0.513
cov(X,Y) = cov(Y,X) = sum(W*Z) / (n - 1) = -2.32 / 9 = -0.25778
Repeating the same computation with W*W and Z*Z gives the variances:
cov(X,X) = sum(W*W) / (n - 1) = 0.791556
cov(Y,Y) = sum(Z*Z) / (n - 1) = 0.653889
COVARIANCE MATRIX = [ cov(X,X)  cov(X,Y) ] = [  0.791556  -0.25778  ]
                    [ cov(Y,X)  cov(Y,Y) ]   [ -0.25778    0.653889 ]
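A quick sketch (my own) verifying the hand computation against NumPy:

```python
import numpy as np

x = np.array([2.2, 3.0, 0.5, 1.9, 2.8, 1.1, 1.0, 2.3, 1.7, 3.1])
y = np.array([0.7, 1.1, 1.5, 2.5, 2.1, 3.0, 1.6, 0.9, 2.8, 1.3])

w, z = x - x.mean(), y - y.mean()
cov_xy = (w * z).sum() / (len(x) - 1)  # the -2.32 / 9 step above
print(cov_xy)        # -0.25778 (approximately)
print(np.cov(x, y))  # full 2x2 covariance matrix, matching the slide
```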
Calculate the Eigenvalues and Eigenvectors of the Covariance Matrix
Calculate the Eigenvalues
The eigenvalues λ solve det(C - λI) = 0:
(0.791556 - λ)(0.653889 - λ) - (-0.25778)² = 0
λ² - 1.445445 λ + 0.451139 = 0
λ1 ≈ 0.9895, λ2 ≈ 0.4559
Find the Eigenvectors
Each eigenvector v solves (C - λI) v = 0. Normalised to unit length:
for λ1 ≈ 0.9895: v1 ≈ ( 0.793, -0.609)
for λ2 ≈ 0.4559: v2 ≈ ( 0.609,  0.793)
The eigenvector of the largest eigenvalue, v1, is the first principal component.
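As a cross-check (my own sketch), NumPy's symmetric eigensolver reproduces these figures; note that eigh returns eigenvalues in ascending order and that eigenvector signs may be flipped:

```python
import numpy as np

cov = np.array([[0.791556, -0.25778],
                [-0.25778, 0.653889]])

eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)  # [0.4559 0.9895] (approximately)
print(eigvecs)  # columns are the unit-length eigenvectors
```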
Derive the new Dataset
Project the mean-adjusted data onto the chosen eigenvectors: the new coordinates of each instance are its dot products with the kept eigenvectors. Keeping only v1 yields a one-dimensional dataset that preserves most of the variance.
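A minimal sketch of this projection, assuming the mean-adjusted data from the earlier table:

```python
import numpy as np

adjusted = np.array([[0.24, -1.05], [1.04, -0.65], [-1.46, -0.25],
                     [-0.06, 0.75], [0.84, 0.35], [-0.86, 1.25],
                     [-0.96, -0.15], [0.34, -0.85], [-0.26, 1.05],
                     [1.14, -0.45]])

cov = np.cov(adjusted, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, np.argsort(eigvals)[::-1]][:, :1]  # first principal component

new_data = adjusted @ top  # the derived one-dimensional dataset
print(new_data.ravel())
```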
Acknowledgements
• Introduction to Machine Learning, E. Alpaydin
• "Pattern Classification", Duda et al., John Wiley & Sons, Chapter 2
• Some material is taken from Prof. Olga Veksler's slides
• Material in these slides has been taken from the following resources:
  – https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/CSC411_Fall16.html
  – Biomis.org
  – https://www.youtube.com/watch?v=TQvxWaQnrqI
  – http://www.cse.psu.edu/~rtc12/CSE586Spring2010/lectures/pcaLectureShort.pdf
  – http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf