Feature Selection
In the presence of millions of features/attributes/inputs/variables, select the
most relevant ones.
Advantages: build better, faster, and easier to understand learning machines.
[Figure: data matrix X with m original features reduced to m' selected features]
Feature Selection
• Transforming a dataset by removing some of its
columns
[Figure: a table with columns A1 A2 A3 A4 C reduced to columns A2 A4 C]
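For instance, a minimal NumPy sketch of this idea; the toy matrix and the kept-column indices are illustrative assumptions, not data from the slides:

```python
# Feature selection as column removal: keep A2 and A4, drop A1 and A3.
import numpy as np

# Toy dataset with columns A1, A2, A3, A4 (class column C kept separately).
X = np.array([[1.0, 0.2, 5.0, 3.1],
              [0.9, 0.4, 4.8, 2.9],
              [1.1, 0.1, 5.2, 3.3]])

keep = [1, 3]            # indices of A2 and A4
X_reduced = X[:, keep]   # same rows, fewer columns
print(X_reduced.shape)   # (3, 2)
```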
Why?
• Lack of quality in the data
– Redundant
– Irrelevant
– Noisy
• Scalability issues
– Some methods may not be able to cope with a large
number of attributes or instances
• In general, to help the machine learning method learn better
Taxonomy of feature/prototype selection methods
• Filter methods
– The reduction process happens before the
learning process
– Using some kind of metric that tries to estimate the
goodness of the reduction
[Diagram: Dataset → Filter method → Classification]
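A hedged sketch of a filter in scikit-learn; the synthetic dataset and the mutual-information metric are illustrative choices, not prescribed by the slides:

```python
# Filter method: rank features by a metric (here mutual information with the
# class) and reduce the dataset before any classifier is trained.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=3)
X_reduced = selector.fit_transform(X, y)  # reduction happens before learning
print(selector.scores_)                   # estimated goodness of each feature
print(X_reduced.shape)                    # (200, 3)
```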
Taxonomy of feature/prototype selection methods
• Wrapper methods
– Filter methods try to estimate the goodness of the reduced
dataset
– Why don’t we use the actual machine learning method (or at
least a fast one) to tell if the reduction is good or bad?
– The space of possible reductions will be iteratively
explored by a search algorithm
[Diagram: Dataset → Explore reduction method ↔ Classifier → Classification]
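A hedged sketch of a wrapper: a greedy forward search whose candidate subsets are scored by the actual classifier through cross-validation. The k-NN classifier and the stopping rule are illustrative assumptions:

```python
# Wrapper method: iteratively explore subsets, letting the classifier itself
# judge whether each candidate reduction is good or bad.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # Score every one-feature extension of the current subset.
    scores = {f: cross_val_score(KNeighborsClassifier(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:   # stop when no extension improves the estimate
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print(selected, best_score)
```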
Feature selection
• Two issues characterise the FS methods:
  – Feature evaluation (for the filters)
    • How do we estimate the goodness of a feature subset?
    • The metric may apply to:
      – a feature subset
      – individual features (generating a ranking)
  – Subset exploration (for both filters and wrappers)
    • How do we explore the space of feature subsets?
Feature evaluation methods
• Four types of metrics (Liu and Yu, 2005):
  – Distance metrics
    • A feature is good if it helps to separate the classes better
  – Information metrics
    • Quantify the information gain (in the Information Theory sense) of a feature, as sketched below
  – Dependency metrics
    • Quantify the correlation between attributes, and between each attribute and the class
  – Consistency metrics
    • An inconsistency is a pair of identical instances with different class labels
    • These metrics look for the minimal set of features that maintains the same level of consistency as the whole dataset
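A minimal sketch of one information metric, the information gain of a single discrete feature; the function names are my own:

```python
# Information gain: H(class) minus the expected class entropy after
# splitting the instances on the feature's values.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    h_cond = sum(w * entropy(labels[feature == v])
                 for v, w in zip(values, weights))
    return entropy(labels) - h_cond

# A feature perfectly aligned with the class has maximal gain (1 bit here).
feature = np.array([0, 0, 1, 1, 0, 1])
labels = np.array(['a', 'a', 'b', 'b', 'a', 'b'])
print(information_gain(feature, labels))  # 1.0
```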
Feature Selection
– Filtering approach:
ranks features or feature subsets independently of the
predictor (classifier).
• …using univariate methods: consider one variable at a time
• …using multivariate methods: consider more than one variable at a time
– Wrapper approach:
uses a classifier to assess (many) features or feature subsets.
Dimensionality Reduction
• From a theoretical point of view, increasing the number
of features should lead to better performance
(assuming independent features).
Principal Component Analysis
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Steps of Principal Component Analysis
• Input data
• Calculate the mean
• Subtract the mean from each data dimension
• Compute the covariance matrix
• Calculate the eigenvalues and eigenvectors of the covariance matrix
• Choose components and form a feature vector
• Derive the new dataset
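To make the steps concrete, here is a minimal NumPy sketch (my own, not from the cited tutorial) that runs all of them on the 2-D example data used in the following slides:

```python
import numpy as np

# Input data (the X, Y pairs from the slides).
data = np.array([[2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
                 [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3]])

mean = data.mean(axis=0)                # calculate the mean
adjusted = data - mean                  # subtract the mean
cov = np.cov(adjusted, rowvar=False)    # covariance matrix (n - 1 divisor)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues and eigenvectors

# Choose components (largest eigenvalues first) and form the feature vector.
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:1]]  # keep the first principal component

# Derive the new dataset by projecting onto the chosen component(s).
new_data = adjusted @ feature_vector
print(mean)            # [1.96 1.75]
print(cov)             # [[ 0.7916 -0.2578] [-0.2578  0.6539]] (rounded)
print(new_data.shape)  # (10, 1)
```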
Input Data
X Y
2.2 0.7
3 1.1
0.5 1.5
1.9 2.5
2.8 2.1
1.1 3
1 1.6
2.3 0.9
1.7 2.8
3.1 1.3
Input Data
[Scatter plot of the input data, X vs Y, axes from 0 to 3.5]
Calculate Mean
Mean of X = (2.2 + 3 + 0.5 + 1.9 + 2.8 + 1.1 + 1 + 2.3 + 1.7 + 3.1) / 10 = 1.96
Mean of Y = (0.7 + 1.1 + 1.5 + 2.5 + 2.1 + 3 + 1.6 + 0.9 + 2.8 + 1.3) / 10 = 1.75
Subtract the mean from each data dimension
X - 1.96            Y - 1.75
2.2 - 1.96 = 0.24   0.7 - 1.75 = -1.05
 1.04               -0.65
-1.46               -0.25
-0.06                0.75
 0.84                0.35
-0.86                1.25
-0.96               -0.15
 0.34               -0.85
-0.26                1.05
 1.14               -0.45
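A quick NumPy check (my own sketch) of the mean and mean-subtraction steps:

```python
import numpy as np

data = np.array([[2.2, 0.7], [3.0, 1.1], [0.5, 1.5], [1.9, 2.5], [2.8, 2.1],
                 [1.1, 3.0], [1.0, 1.6], [2.3, 0.9], [1.7, 2.8], [3.1, 1.3]])

mean = data.mean(axis=0)
print(mean)         # [1.96 1.75]
print(data - mean)  # reproduces the mean-adjusted table above, row by row
```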
Compute covariance matrix
X     Y     W = Xi - mean(X)   Z = Yi - mean(Y)   W*Z
2.2   0.7    0.24              -1.05              -0.252
3     1.1    1.04              -0.65              -0.676
0.5   1.5   -1.46              -0.25               0.365
1.9   2.5   -0.06               0.75              -0.045
2.8   2.1    0.84               0.35               0.294
1.1   3     -0.86               1.25              -1.075
1     1.6   -0.96              -0.15               0.144
2.3   0.9    0.34              -0.85              -0.289
1.7   2.8   -0.26               1.05              -0.273
3.1   1.3    1.14              -0.45              -0.513
cov(X,Y) = cov(Y,X) = sum(W*Z) / (n - 1) = -2.32 / 9 = -0.25778
Repeating the same computation with W*W and Z*Z gives the variances:
cov(X,X) = sum(W*W) / (n - 1) = 0.791556
cov(Y,Y) = sum(Z*Z) / (n - 1) = 0.653889
COVARIANCE MATRIX = [ cov(X,X)  cov(X,Y) ] = [  0.791556  -0.25778  ]
                    [ cov(Y,X)  cov(Y,Y) ]   [ -0.25778    0.653889 ]
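A quick sketch (my own) verifying the hand computation against NumPy:

```python
import numpy as np

x = np.array([2.2, 3.0, 0.5, 1.9, 2.8, 1.1, 1.0, 2.3, 1.7, 3.1])
y = np.array([0.7, 1.1, 1.5, 2.5, 2.1, 3.0, 1.6, 0.9, 2.8, 1.3])

w, z = x - x.mean(), y - y.mean()
cov_xy = (w * z).sum() / (len(x) - 1)  # the -2.32 / 9 step above
print(cov_xy)        # -0.25778 (approximately)
print(np.cov(x, y))  # full 2x2 covariance matrix, matching the slide
```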
Calculate the Eigenvalues and Eigenvectors of the Covariance Matrix
Calculate the Eigenvalues
The eigenvalues λ solve det(C - λI) = 0:
(0.791556 - λ)(0.653889 - λ) - (-0.25778)² = 0
λ² - 1.445445 λ + 0.451139 = 0
λ1 ≈ 0.9895, λ2 ≈ 0.4559
Find the Eigenvectors
Each eigenvector v solves (C - λI) v = 0. Normalised to unit length:
for λ1 ≈ 0.9895: v1 ≈ ( 0.793, -0.609)
for λ2 ≈ 0.4559: v2 ≈ ( 0.609,  0.793)
The eigenvector of the largest eigenvalue, v1, is the first principal component.
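As a cross-check (my own sketch), NumPy's symmetric eigensolver reproduces these figures; note that eigh returns eigenvalues in ascending order and that eigenvector signs may be flipped:

```python
import numpy as np

cov = np.array([[0.791556, -0.25778],
                [-0.25778, 0.653889]])

eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)  # [0.4559 0.9895] (approximately)
print(eigvecs)  # columns are the unit-length eigenvectors
```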
Derive the new Dataset
Project the mean-adjusted data onto the chosen eigenvectors: the new coordinates of each instance are its dot products with the kept eigenvectors. Keeping only v1 yields a one-dimensional dataset that preserves most of the variance.
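A minimal sketch of this projection, assuming the mean-adjusted data from the earlier table:

```python
import numpy as np

adjusted = np.array([[0.24, -1.05], [1.04, -0.65], [-1.46, -0.25],
                     [-0.06, 0.75], [0.84, 0.35], [-0.86, 1.25],
                     [-0.96, -0.15], [0.34, -0.85], [-0.26, 1.05],
                     [1.14, -0.45]])

cov = np.cov(adjusted, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, np.argsort(eigvals)[::-1]][:, :1]  # first principal component

new_data = adjusted @ top  # the derived one-dimensional dataset
print(new_data.ravel())
```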
Acknowledgements
• Introduction to Machine Learning, E. Alpaydin
• "Pattern Classification", Duda et al., John Wiley & Sons, Chapter 2
• Some material is taken from Prof. Olga Veksler's slides
• Material in these slides has been taken from the following resources:
  – https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/CSC411_Fall16.html
  – Biomis.org
  – https://www.youtube.com/watch?v=TQvxWaQnrqI
  – http://www.cse.psu.edu/~rtc12/CSE586Spring2010/lectures/pcaLectureShort.pdf
  – http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf