
Principal Component Analysis (PCA)

Tushar Jaruhar | Founder, SYMPLFYD


IPL on Mobile: 3 Dimensions to 2 Dimensions

Sad or Happy?

[Figure: two views of the same image, labelled SAD and HAPPY]

The way you look at Data counts


Analysing Data
• In a 3D space, data is arranged on 3 axes: Experience, Income and Age
• For 5 variables there would be 5 axes
• Number of variables = Number of axes = Number of dimensions
• Challenge: how do we recognize patterns in an n-dimensional space?

[Figure: 3D scatter on the axes Experience, Income and Age; adding Height and Weight would require 5 axes]
Analysing Data
• The BLUE line passing through the data captures the DIRECTION of maximum variation AND the MAGNITUDE of maximum variation
• The RED line is perpendicular to the BLUE line and captures the DIRECTION and MAGNITUDE of the second-highest variation
• The GREEN line captures the DIRECTION and MAGNITUDE of the third-highest variation

[Figure: the same 3D scatter (Experience, Income, Age) with the BLUE, RED and GREEN lines drawn through the data]
Dimension Reduction
• 3D: 100% of the variation is captured along 3 dimensions (Income, Experience, Age)
• 2D: 80%-90% of the variation is captured along 2 dimensions: PC1 and PC2
• 10%-20% of the variation, along the GREEN line, is lost as the 3rd dimension has been removed

Note:
• Age, Experience and Income are no longer the axes
• They have been replaced by PC1 and PC2
• The data colour has changed as some information has been discarded

[Figure: the 3D scatter (Income, Experience, Age) projected onto the 2D plane spanned by PC1 and PC2]
Data and Information
• Suppose I have data on 1000 variables such as Income, Age, Experience Level, Education, Gender etc.

• Information: "what is conveyed or represented by a particular arrangement or sequence of things"

• Do we need so many data variables to extract the relevant information for our business problem?

• Data can be highly correlated, for example Experience Level and Age

• Knowing years of experience, an indicative age range can be obtained

• So, do we need both experience level and age data?


Applications
• Facial Recognition
• Engineering
• Google Search
• Reduction in Number of Variables
• Removing Noise (Redundancy)


Eigen Vector and Eigen Values

An important relationship. Example:

$$A v = \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix} = 2 \times \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \lambda v$$

The direction (Φ) of the vector did not change; only the vector became longer.

Av = λv: when a VECTOR (v) is multiplied by a matrix (A) and the resultant is the product of a Scalar (λ) and the Vector (v), then the vector (v) is called an Eigen Vector and the Scalar (λ) is called an Eigen Value.
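
A quick numeric check of the example above (a minimal sketch in numpy; the slides themselves use no code, so the tooling here is an assumption):

```python
import numpy as np

A = np.array([[0, 1],
              [2, 1]])          # the matrix from the example
v = np.array([1, 2])            # the candidate eigen vector

Av = A @ v                      # matrix-vector product
print(Av)                       # [2 4]
print(np.allclose(Av, 2 * v))   # True: Av = 2v, so lambda = 2
```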
Eigen Vector and Eigen Value
• The term "eigen" is a German word meaning "own" (as in one's own, characteristic)
• A vector has magnitude and direction
• When a Matrix is multiplied by its Eigen Vector, the result is the Eigen Vector multiplied by a Scalar. This Scalar is called the Eigen Value
• Each and every Eigen Value has an Eigen Vector associated with it
• The relationship Av = λv allows us to extract Eigen Values and their associated Eigen Vectors
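
In practice the eigen values and eigen vectors are extracted numerically rather than by hand; a minimal sketch using numpy.linalg.eig (an assumed tool, not part of the slides):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [2.0, 1.0]])

# eig returns the eigen values and a matrix whose COLUMNS are unit eigen vectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # 2.0 and -1.0 (order may vary)

# verify Av = lambda * v for every eigen pair
for lam, vec in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ vec, lam * vec)
```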
Role of Eigen Vector and Eigen Value in Principal Component Analysis
• The 1st Eigen Vector is defined as the direction in which the data has Maximum Variance, y1: Principal Component 1
• The 2nd Eigen Vector is defined as the direction in which the data has the 2nd Highest Variance, y2: Principal Component 2
• Both principal components are orthogonal, i.e. at 90 degrees to each other, as per the PCA model
• In this model the data has only 2 dimensions, and hence there are 2 principal components
• What would happen if the data had 100 dimensions?

[Figure: scatter plot with the 1st Principal Component (y1) and 2nd Principal Component (y2) drawn through the data]
Role of Eigen Vector and Eigen Value in Principal Component Analysis
• The 1st Eigen Value is defined as the magnitude of the Maximum Variance along Principal Component 1, denoted by λ1
• The 2nd Eigen Value is defined as the magnitude of the 2nd Highest Variance along Principal Component 2, denoted by λ2
• For each eigen value there is a principal component. In other words, for each eigen value we have an eigen vector

[Figure: the same scatter plot, with λ1 and λ2 marked along the two principal components]
Our Goal
• Start with the variance-covariance matrix of the data set
• Obtain the value of maximum variance along each component, which is the Eigen Value
• Get the direction of each principal component, which is the associated Eigen Vector
• Keep only those EIGEN VECTORS (2 or 3) that explain 80 to 90% of the variance in the data
• Use these 2-3 Eigen Vectors to transform the original data set into components (face example)
• Naturally, not all the variance in the data set has been accounted for: some eigen vectors have been discarded because their contribution to the variance in the data was not significant = reduced dimensionality (see the sketch below)
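
The whole recipe can be sketched in a few lines of numpy (a minimal sketch, assuming a data matrix whose rows are observations; the function name and structure are illustrative, not from the slides):

```python
import numpy as np

def pca(X, n_components):
    """Centre, eigendecompose the covariance matrix, keep the strongest
    eigen vectors, and project the data onto them."""
    Xc = X - X.mean(axis=0)               # centre the data
    C = np.cov(Xc, rowvar=False)          # variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]     # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V = eigvecs[:, :n_components]         # keep only the top eigen vectors
    return Xc @ V, eigvals                # transformed data + eigen values
```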
Numerical Example: Understand the Data

x1 and x2 are two variables and we have 10 observations. Our goal is to extract the Principal Components of this data set.

Step 1: Plot and evaluate the data points

Obs    x1    x2
1      2.5   2.4
2      0.5   0.7
3      2.2   2.9
4      1.9   2.2
5      3.1   3.0
6      2.3   2.7
7      2.0   1.6
8      1.0   1.1
9      1.5   1.6
10     1.1   0.9

[Figure: plot of x1 versus x2; highly correlated data, r = 92.5%]
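Step 1 can be reproduced numerically; the quoted r = 92.5% is just the correlation of the two columns (a sketch in numpy, with the ten observations typed in from the table):

```python
import numpy as np

x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

r = np.corrcoef(x1, x2)[0, 1]   # Pearson correlation
print(round(r, 3))              # 0.926 -- highly correlated data
```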


Numerical Example: Center the Data

Step 2: Compute the means
x̄1 = 1.81 and x̄2 = 1.91

Step 3: Centre the data by subtracting the mean

Obs    x1* = x1 - x̄1    x2* = x2 - x̄2
1       0.69              0.49
2      -1.31             -1.21
3       0.39              0.99
4       0.09              0.29
5       1.29              1.09
6       0.49              0.79
7       0.19             -0.31
8      -0.81             -0.81
9      -0.31             -0.31
10     -0.71             -1.01

What is the mean of x1* and x2*? It is 0.
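
Steps 2 and 3 in numpy (a sketch, using the same ten observations):

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

means = X.mean(axis=0)       # [1.81, 1.91]
Xc = X - means               # centred data: x1*, x2*
print(Xc.mean(axis=0))       # ~[0, 0]: the centred data has mean 0
```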
Numerical Example: Compare the Original Data with Centered Data

[Figure: left, the plot of x1 versus x2 with the centre point of the data marked; right, the centred data, now spread around the origin (0, 0)]
Numerical Example: Compute the Variance and Covariance of the Centered Data

Step 4: Using the VAR.S and COVARIANCE.S functions (sample variance and covariance), the variance-covariance matrix can be computed:

$$\begin{pmatrix} \mathrm{Var}(x_1^*) & \mathrm{Cov}(x_1^*, x_2^*) \\ \mathrm{Cov}(x_1^*, x_2^*) & \mathrm{Var}(x_2^*) \end{pmatrix} = \begin{pmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{pmatrix}$$
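
The same matrix falls out of np.cov, whose sample covariance uses the n - 1 divisor and so matches VAR.S/COVARIANCE.S (a sketch):

```python
import numpy as np

Xc = np.array([[ 0.69,  0.49], [-1.31, -1.21], [ 0.39,  0.99],
               [ 0.09,  0.29], [ 1.29,  1.09], [ 0.49,  0.79],
               [ 0.19, -0.31], [-0.81, -0.81], [-0.31, -0.31],
               [-0.71, -1.01]])

C = np.cov(Xc, rowvar=False)   # rows are observations, columns are variables
print(C)
# [[0.616556 0.615444]
#  [0.615444 0.716556]]  (to 6 decimal places)
```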


Numerical Example: Compute the Eigen Values from the Variance-Covariance Matrix

Step 5: From the variance-covariance matrix the eigen values can be obtained. Recall that the eigen values are the measures of maximum variance along a Principal Component.

Find the values of λ1 and λ2.

[Figure: the scatter plot with λ1 and λ2 marked along the 1st and 2nd Principal Components, y1 and y2]
Numerical Example: Compute the Eigen Value

Av = λv

Av = λIv    (λ is a scalar; it is converted into matrix form by multiplying with the Identity Matrix I so that operations on matrices can be performed)

(A − λI)v = 0    (v, the eigen vector, cannot be the 0 vector because it has to give the direction of the principal component; so for the equation to hold, det(A − λI) must be 0)

Det|A − λI| = 0
Numerical Example: Compute the Eigen Value

Det|A − λI| = 0

$$\det\left(\begin{pmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right) = 0$$

$$\det\left(\begin{pmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}\right) = 0$$
Numerical Example: Compute the Eigen Value

$$\det\begin{pmatrix} 0.616556 - \lambda & 0.615444 \\ 0.615444 & 0.716556 - \lambda \end{pmatrix} = 0$$

(0.616556 − λ)(0.716556 − λ) − 0.615444 × 0.615444 = 0

(λ² − 1.333111λ + 0.441796) − 0.378772 = 0
Numerical Example: Compute the Eigen Value

(0.616556 − λ)(0.716556 − λ) − 0.615444 × 0.615444 = 0

(λ² − 1.333111λ + 0.441796) − 0.378772 = 0

λ² − 1.333111λ + 0.063024 = 0

This is a Quadratic Equation and will have two roots. The roots can be obtained from the standard formula:

$$\lambda_1, \lambda_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = 1.284028,\ 0.049083$$

For each eigen value there is a corresponding eigen vector, which is also known as the Principal Component.
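
The two roots can be checked numerically, either from the quadratic's coefficients or directly from the covariance matrix (a numpy sketch):

```python
import numpy as np

# characteristic polynomial: lambda^2 - 1.333111*lambda + 0.063024 = 0
print(np.roots([1, -1.333111, 0.063024]))   # 1.284028 and 0.049083

# the same values straight from the variance-covariance matrix
C = np.array([[0.616556, 0.615444],
              [0.615444, 0.716556]])
print(np.linalg.eigvals(C))                 # 1.284028 and 0.049083 (order may vary)
```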
Numerical Example: Compute Eigen Vector

For λ1 = 1.284028:

(A − λ1·I)v = 0

$$\begin{pmatrix} 0.616556 - \lambda_1 & 0.615444 \\ 0.615444 & 0.716556 - \lambda_1 \end{pmatrix} \begin{pmatrix} V_{11} \\ V_{12} \end{pmatrix} = 0
\;\Rightarrow\;
\begin{pmatrix} -0.667472 & 0.615444 \\ 0.615444 & -0.567472 \end{pmatrix} \begin{pmatrix} V_{11} \\ V_{12} \end{pmatrix} = 0$$

−0.667472·V11 + 0.615444·V12 = 0
0.615444·V11 − 0.567472·V12 = 0

Both equations give V11 = 0.92205·V12. If V12 = 1, then V11 = 0.92205:

$$v_1 = \begin{pmatrix} V_{11} \\ V_{12} \end{pmatrix} = \begin{pmatrix} 0.92205 \\ 1 \end{pmatrix}$$
Numerical Example: Eigen Vector

An Eigen Vector should be a UNIT vector, that is, the magnitude of the vector should be 1.

The magnitude of this vector is √(V11² + V12²) = √1.850176 = 1.360212 > 1.

We need to SCALE each value by 1.360212:

V11 = 0.92205 / 1.360212 = 0.677874
V12 = 1 / 1.360212 = 0.735179

Check the magnitude of the scaled vector: it is approx. 1.

This eigen vector, v1 = (0.677874, 0.735179), corresponds to the first eigen value, 1.284028.
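
The scaling step is a one-line normalisation (a numpy sketch; np.linalg.norm computes the magnitude):

```python
import numpy as np

v1 = np.array([0.92205, 1.0])        # direction found for lambda1
v1_unit = v1 / np.linalg.norm(v1)    # divide by sqrt(0.92205^2 + 1^2) = 1.360212
print(v1_unit)                       # [0.677874 0.735179]
print(np.linalg.norm(v1_unit))       # 1.0 -- a unit vector, as required
```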


Numerical Example: Compute Eigen Vector

For λ2 = 0.049083:

(A − λ2·I)v = 0

$$\begin{pmatrix} 0.616556 - \lambda_2 & 0.615444 \\ 0.615444 & 0.716556 - \lambda_2 \end{pmatrix} \begin{pmatrix} V_{21} \\ V_{22} \end{pmatrix} = 0
\;\Rightarrow\;
\begin{pmatrix} 0.567472 & 0.615444 \\ 0.615444 & 0.667472 \end{pmatrix} \begin{pmatrix} V_{21} \\ V_{22} \end{pmatrix} = 0$$

0.567472·V21 + 0.615444·V22 = 0
0.615444·V21 + 0.667472·V22 = 0

Both equations give V22 = −0.922053·V21 (equivalently, if V22 = −1, then V21 = +1.084537). If V21 = 1, then V22 = −0.922053:

$$v_2 = \begin{pmatrix} V_{21} \\ V_{22} \end{pmatrix} = \begin{pmatrix} 1 \\ -0.922053 \end{pmatrix}$$
Numerical Example: Eigen Vector

An Eigen Vector should be a UNIT vector, that is, the magnitude of the vector should be 1.

The magnitude of this vector is √(V21² + V22²) = √1.850182 = 1.360214 > 1.

We need to SCALE each value by 1.360214:

V21 = 1 / 1.360214 = 0.735179
V22 = −0.922053 / 1.360214 = −0.677874

Check the magnitude of the scaled vector: it is approx. 1.

This eigen vector, v2 = (0.735179, −0.677874), corresponds to the second eigen value, 0.049083.


Numerical Example: Analysis of Eigen Vector and Eigen Value
• The total variance is 100%, which is λ1 + λ2
• The first eigen vector, or First Principal Component, captures 96.31% of the variance. This is computed from λ1 / (λ1 + λ2)
• The second eigen vector, or Second Principal Component, captures 3.69% of the variance. This is computed from λ2 / (λ1 + λ2)

$$V = \begin{pmatrix} V_1 & V_2 \end{pmatrix} = \begin{pmatrix} 0.677874 & 0.735179 \\ 0.735179 & -0.677874 \end{pmatrix}, \quad \lambda_1 = 1.284028, \; \lambda_2 = 0.049083$$

[Figure: the centred data with the two principal component directions overlaid]
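
The variance shares are just the eigen values divided by their sum (a sketch):

```python
import numpy as np

eigvals = np.array([1.284028, 0.049083])   # lambda1, lambda2
ratios = eigvals / eigvals.sum()           # share of total variance
print(ratios)                              # ~[0.9632 0.0368] -> 96.31% and 3.69%
```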
Numerical Example: Transformation of Data

What does Transformation of Data imply?
• The data point (2.5, 2.5) is located on the x1-x2 coordinates
• This data point has to be re-positioned as per PC1 and PC2; therefore, we have to find its new coordinates, Y1 and Y2
• To do this we take the data matrix and multiply it by the eigenvector matrix (V)
• This is called transformation, and all data points are now referenced to PC1 and PC2

[Figure: the point (2.5, 2.5) shown on the x1 and x2 axes, with its new coordinates Y1 and Y2 read off along PC1 and PC2]
Numerical Example: Transformation of Data

Transformed Data = Mean-Adjusted Data × Eigen Vectors:

$$V = \begin{pmatrix} V_1 & V_2 \end{pmatrix} = \begin{pmatrix} 0.677873 & 0.735179 \\ 0.735179 & -0.677873 \end{pmatrix}$$

x1*     x2*       Y1       Y2
 0.69    0.49     0.828    0.175
-1.31   -1.21    -1.778   -0.143
 0.39    0.99     0.992   -0.384
 0.09    0.29     0.274   -0.130
 1.29    1.09     1.676    0.209
 0.49    0.79     0.913   -0.175
 0.19   -0.31    -0.099    0.350
-0.81   -0.81    -1.145   -0.046
-0.31   -0.31    -0.438   -0.018
-0.71   -1.01    -1.224    0.163
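
The whole table is one matrix product, Y = X*·V (a numpy sketch; Xc is the mean-adjusted data):

```python
import numpy as np

Xc = np.array([[ 0.69,  0.49], [-1.31, -1.21], [ 0.39,  0.99],
               [ 0.09,  0.29], [ 1.29,  1.09], [ 0.49,  0.79],
               [ 0.19, -0.31], [-0.81, -0.81], [-0.31, -0.31],
               [-0.71, -1.01]])
V = np.array([[0.677873,  0.735179],    # columns: V1, V2
              [0.735179, -0.677873]])

Y = Xc @ V                   # transformed data, referenced to PC1 and PC2
print(np.round(Y[0], 3))     # [0.828 0.175] -- the first row of the table
```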


Numerical Example: Transformed Data

[Figure: left, the plot of x1 versus x2 (original data); right, the transformed data plotted against PC1 (horizontal, about -2.0 to +2.0) and PC2 (vertical, about -0.5 to +0.4)]
Numerical Example: Analysis of Eigen Vector and Eigen Value
• The total variance is 100%, which is λ1 + λ2
• The first eigen vector, or First Principal Component, captures 96.31% of the variance: λ1 / (λ1 + λ2)
• The second eigen vector, or Second Principal Component, captures 3.69% of the variance: λ2 / (λ1 + λ2)
• Since the first principal component explains 96.31% of the variance, the second principal component can be dropped without losing much information

$$V = \begin{pmatrix} 0.677874 & 0.735179 \\ 0.735179 & -0.677874 \end{pmatrix} \;\rightarrow\; V_1 = \begin{pmatrix} 0.677874 \\ 0.735179 \end{pmatrix}$$

λ1 = 1.284028, λ2 = 0.049083 → keep only λ1 = 1.284028


Numerical Example: Dimension Reduction

Transformed Data = Mean-Adjusted Data × first Eigen Vector only:

$$V_1 = \begin{pmatrix} 0.677873 \\ 0.735179 \end{pmatrix}$$

x1*     x2*       Y1
 0.69    0.49     0.828
-1.31   -1.21    -1.778
 0.39    0.99     0.992
 0.09    0.29     0.274
 1.29    1.09     1.676
 0.49    0.79     0.913
 0.19   -0.31    -0.099
-0.81   -0.81    -1.145
-0.31   -0.31    -0.438
-0.71   -1.01    -1.224
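
Dropping PC2 means multiplying by the first eigen vector alone (a sketch of the 2D-to-1D step):

```python
import numpy as np

Xc = np.array([[ 0.69,  0.49], [-1.31, -1.21], [ 0.39,  0.99],
               [ 0.09,  0.29], [ 1.29,  1.09], [ 0.49,  0.79],
               [ 0.19, -0.31], [-0.81, -0.81], [-0.31, -0.31],
               [-0.71, -1.01]])
v1 = np.array([0.677873, 0.735179])   # first eigen vector only

y1 = Xc @ v1                          # one score per observation
print(np.round(y1, 3))
# [ 0.828 -1.778  0.992  0.274  1.676  0.913 -0.099 -1.145 -0.438 -1.224]
```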


Numerical Example: Dimension Reduction

Reduced from 2 dimensions to 1 dimension: each observation is now a single score, Y1.

[Figure: the ten Y1 scores plotted on a single axis, from -1.778 to 1.676, separating into Group 1 (negative scores) and Group 2 (positive scores)]
PCA Model

Transformed Data = Mean-Adjusted Data × Eigen Vectors:

$$\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{pmatrix} \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix} = \begin{pmatrix} x_{11}v_{11} + x_{12}v_{21} & x_{11}v_{12} + x_{12}v_{22} \\ x_{21}v_{11} + x_{22}v_{21} & x_{21}v_{12} + x_{22}v_{22} \\ x_{31}v_{11} + x_{32}v_{21} & x_{31}v_{12} + x_{32}v_{22} \end{pmatrix}$$

Suppose that X1 is Income and X2 is Age:

PC1: Y1 = V11·X1 + V21·X2, i.e. Y1 = V11·Income + V21·Age
PC2: Y2 = V12·X1 + V22·X2, i.e. Y2 = V12·Income + V22·Age
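
For reference, scikit-learn's PCA reproduces these numbers (a hedged sketch; sklearn is an assumption, not part of the slides, and its components may come out with flipped signs, which does not change the model):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

pca = PCA(n_components=2)
Y = pca.fit_transform(X)                # centres X, then projects onto the PCs
print(pca.explained_variance_)          # ~[1.284028 0.049083] = lambda1, lambda2
print(pca.explained_variance_ratio_)    # ~[0.9632 0.0368]
print(np.round(Y[0], 3))                # ~[0.828 0.175], up to sign
```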
