Linear Algebra and Probability Theory

Quick Tour of Basic Linear Algebra and Probability Theory
CS246: Mining Massive Data Sets
Winter 2011
Basic Linear Algebra
Outline
1 Basic Linear Algebra
2 Basic Probability Theory
Matrices and Vectors
Matrix: A rectangular array of numbers, e.g., $A \in \mathbb{R}^{m \times n}$:
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]
Vector: A matrix consisting of only one column (default) or one row, e.g., $x \in \mathbb{R}^n$:
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]
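Matrix Multiplication
If $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, and $C = AB$, then $C \in \mathbb{R}^{m \times p}$:
\[ C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} \]
Special cases: matrix-vector product, inner product of two vectors. E.g., with $x, y \in \mathbb{R}^n$:
\[ x^T y = \sum_{i=1}^{n} x_i y_i \in \mathbb{R} \]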
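The entry-wise definition of the product can be sketched directly in code. This is a minimal illustration using plain Python lists of rows; the function names `matmul` and `inner` and the example matrices are assumptions made for this sketch, not part of the slides.

```python
def matmul(A, B):
    """Return C = AB, where C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n = len(B)  # shared inner dimension
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]


def inner(x, y):
    """Inner product x^T y of two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, 2]]             # 3 x 2
C = matmul(A, B)         # 2 x 2
print(C)                 # [[7, 8], [16, 17]]
print(inner([1, 2, 3], [4, 5, 6]))  # 32
```

The triple sum costs on the order of m·n·p multiplications; in practice a numerical library would normally be used instead of nested lists.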
Properties of Matrix Multiplication
Associative: $(AB)C = A(BC)$
Distributive: $A(B + C) = AB + AC$
Non-commutative: in general, $AB \neq BA$
Block multiplication: If $A = [A_{ik}]$, $B = [B_{kj}]$, where the $A_{ik}$'s and $B_{kj}$'s are matrix blocks, and the number of columns in $A_{ik}$ is equal to the number of rows in $B_{kj}$, then $C = AB = [C_{ij}]$ where $C_{ij} = \sum_k A_{ik} B_{kj}$
Example: If $\vec{x} \in \mathbb{R}^n$ and $A = [\vec{a}_1 | \vec{a}_2 | \dots | \vec{a}_n] \in \mathbb{R}^{m \times n}$, $B = [\vec{b}_1 | \vec{b}_2 | \dots | \vec{b}_p] \in \mathbb{R}^{n \times p}$:
\[ A\vec{x} = \sum_{i=1}^{n} x_i \vec{a}_i \]
\[ AB = [A\vec{b}_1 | A\vec{b}_2 | \dots | A\vec{b}_p] \]
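The column view in the example above can be checked numerically. The sketch below is only an illustration with plain Python lists; the helper names (`matvec`, `column`, `combine_columns`, `matmul_by_columns`) and the sample values are assumptions introduced here.

```python
def matvec(A, x):
    """Row-by-row matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def column(A, j):
    """Return column j of A as a list."""
    return [row[j] for row in A]


def combine_columns(A, x):
    """Compute Ax as the linear combination sum_i x_i * a_i of the columns of A."""
    result = [0] * len(A)
    for i, xi in enumerate(x):
        result = [r + xi * a for r, a in zip(result, column(A, i))]
    return result


def matmul_by_columns(A, B):
    """Compute AB column by column: column j of AB is A times column j of B."""
    cols = [matvec(A, column(B, j)) for j in range(len(B[0]))]
    return [list(row) for row in zip(*cols)]


A = [[1, 2],
     [3, 4],
     [5, 6]]             # columns a_1 = (1, 3, 5), a_2 = (2, 4, 6)
x = [10, -1]
B = [[1, 1],
     [0, 2]]

print(matvec(A, x))             # [8, 26, 44]
print(combine_columns(A, x))    # [8, 26, 44], the same vector
print(matmul_by_columns(A, B))  # [[1, 5], [3, 11], [5, 17]]
```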
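Operators and properties
Transpose: $A \in \mathbb{R}^{m \times n}$, then $A^T \in \mathbb{R}^{n \times m}$: $(A^T)_{ij} = A_{ji}$
Properties:
$(A^T)^T = A$
$(AB)^T = B^T A^T$
$(A + B)^T = A^T + B^T$
Trace: $A \in \mathbb{R}^{n \times n}$, then: $\mathrm{tr}(A) = \sum_{i=1}^{n} A_{ii}$
Properties:
$\mathrm{tr}(A) = \mathrm{tr}(A^T)$
$\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
$\mathrm{tr}(\lambda A) = \lambda\, \mathrm{tr}(A)$ for any scalar $\lambda \in \mathbb{R}$
If $AB$ is a square matrix, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$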
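Two of the identities above, $(AB)^T = B^T A^T$ and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, are easy to verify on a small example. The following is a sketch with plain Python lists; the helper names and the matrices are illustrative assumptions. Note that $AB$ is 2 x 2 while $BA$ is 3 x 3, yet the traces agree.

```python
def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Swap rows and columns: (A^T)[i][j] = A[j][i]."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("trace is only defined for square matrices")
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]             # 3 x 2

lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs == rhs)                               # True
print(trace(matmul(A, B)), trace(matmul(B, A)))  # 28 28
```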
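Special types of matrices
Identity matrix: $I = I_n \in \mathbb{R}^{n \times n}$:
\[ I_{ij} = \begin{cases} 1 & i = j, \\ 0 & \text{otherwise.} \end{cases} \]
$\forall A \in \mathbb{R}^{m \times n}$: $A I_n = I_m A = A$
Diagonal matrix: $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$:
\[ D_{ij} = \begin{cases} d_i & j = i, \\ 0 & \text{otherwise.} \end{cases} \]
Symmetric matrices: $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$.
Orthogonal matrices: $U \in \mathbb{R}^{n \times n}$ is orthogonal if $UU^T = I = U^T U$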
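Linear Independence and Rank
A set of vectors $\{x_1, \dots, x_n\}$ is linearly independent if $\nexists \{\alpha_1, \dots, \alpha_n\}$, not all zero, such that $\sum_{i=1}^{n} \alpha_i x_i = 0$
Rank: $A \in \mathbb{R}^{m \times n}$, then $\mathrm{rank}(A)$ is the maximum number of linearly independent columns (or equivalently, rows)
Properties:
$\mathrm{rank}(A) \le \min\{m, n\}$
$\mathrm{rank}(A) = \mathrm{rank}(A^T)$
$\mathrm{rank}(AB) \le \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$
$\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$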
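One way to compute the rank of a small matrix is to count pivots during Gaussian elimination. The slides do not prescribe a method, so the sketch below is an assumption chosen for illustration; it uses exact rational arithmetic from the standard library to avoid floating-point tolerance issues, and the function name `rank` is hypothetical.

```python
from fractions import Fraction


def rank(A):
    """Rank of A, counted as the number of pivots found by Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivot rows found so far
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


A = [[1, 2, 3],
     [2, 4, 6],          # twice the first row, so linearly dependent
     [1, 0, 1]]
A_T = [list(col) for col in zip(*A)]

print(rank(A))    # 2
print(rank(A_T))  # 2, illustrating rank(A) = rank(A^T)
```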
Matrix Inversion
If $A \in \mathbb{R}^{n \times n}$ and $\mathrm{rank}(A) = n$, then the inverse of $A$, denoted $A^{-1}$, is the matrix such that: $AA^{-1} = A^{-1}A = I$
Properties:
$(A^{-1})^{-1} = A$
$(AB)^{-1} = B^{-1}A^{-1}$
$(A^{-1})^T = (A^T)^{-1}$
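As an illustration of the definition, the inverse of a small full-rank matrix can be obtained by Gauss-Jordan elimination on the augmented matrix $[A \mid I]$. This is one standard technique, not something the slides specify; the function names and the example matrix are assumptions of this sketch, and exact fractions are used so the check $AA^{-1} = I$ holds exactly.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular (rank < n), no inverse exists")
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]          # scale the pivot row to get a leading 1
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]             # right half now holds the inverse


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[2, 1],
     [5, 3]]
A_inv = inverse(A)
print(A_inv == [[3, -1], [-5, 2]])            # True
print(matmul(A, A_inv) == [[1, 0], [0, 1]])   # True
```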
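Range and Nullspace of a Matrix
Span: $\mathrm{span}(\{x_1, \dots, x_n\}) = \{\sum_{i=1}^{n} \alpha_i x_i \mid \alpha_i \in \mathbb{R}\}$
Projection: $\mathrm{Proj}(y; \{x_i\}_{1 \le i \le n}) = \mathrm{argmin}_{v \in \mathrm{span}(\{x_i\}_{1 \le i \le n})} \{\|y - v\|_2\}$
Range: $A \in \mathbb{R}^{m \times n}$, then $\mathcal{R}(A) = \{Ax \mid x \in \mathbb{R}^n\}$ is the span of the columns of $A$
$\mathrm{Proj}(y, A) = A(A^T A)^{-1} A^T y$ (assuming $A$ has full column rank, so that $A^T A$ is invertible)
Nullspace: $\mathrm{null}(A) = \{x \in \mathbb{R}^n \mid Ax = 0\}$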
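The projection formula $A(A^T A)^{-1} A^T y$ can be tried on a matrix with two independent columns, where $A^T A$ is 2 x 2 and easy to invert by hand. The sketch below assumes exactly that setting (two columns, full column rank); the helper names and the data are illustrative only.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("A^T A is singular: columns of A are not independent")
    return [[d / det, -b / det], [-c / det, a / det]]


def project(y, A):
    """Project y onto the range of A (A must have exactly two independent columns)."""
    A = [[Fraction(v) for v in row] for row in A]
    y_col = [[Fraction(v)] for v in y]
    At = transpose(A)
    coeffs = matmul(inverse_2x2(matmul(At, A)), matmul(At, y_col))
    return [row[0] for row in matmul(A, coeffs)]


A = [[1, 0],
     [1, 1],
     [1, 2]]
y = [1, 2, 2]
p = project(y, A)
residual = [yi - pi for yi, pi in zip(y, p)]

print(p == [Fraction(7, 6), Fraction(5, 3), Fraction(13, 6)])  # True
# The residual y - p is orthogonal to every column of A.
print([sum(a * r for a, r in zip(col, residual)) for col in transpose(A)] == [0, 0])  # True
```

Projecting `p` a second time returns `p` unchanged, which is the expected behavior of a projection.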
Determinant
$A \in \mathbb{R}^{n \times n}$, $a_1, \dots, a_n$ the rows of $A$, $S = \{\sum_{i=1}^{n} \alpha_i a_i \mid 0 \le \alpha_i \le 1\}$, then $|\det(A)|$ is the volume of $S$.
Properties:
$\det(I) = 1$
$\det(\lambda A) = \lambda^n \det(A)$
$\det(A^T) = \det(A)$
$\det(AB) = \det(A)\det(B)$
$\det(A) \neq 0$ if and only if $A$ is invertible.
If $A$ is invertible, then $\det(A^{-1}) = \det(A)^{-1}$
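The determinant properties above can be checked on small integer matrices. The sketch below computes the determinant by Gaussian elimination (product of pivots, with a sign flip per row swap), which is one common approach and an assumption of this example rather than anything stated in the slides; the names and matrices are hypothetical.

```python
from fractions import Fraction


def det(A):
    """Determinant via elimination: product of pivots, negated once per row swap."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: the matrix is singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return result


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 2]]
B = [[1, 2, 0],
     [0, 1, 0],
     [3, 0, 2]]
two_A = [[2 * v for v in row] for row in A]

print(det(A), det(B))                      # 6 2
print(det(matmul(A, B)) == det(A) * det(B))  # True
print(det(two_A) == 2 ** 3 * det(A))       # True, since n = 3
```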
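Quadratic Forms and Positive Semidefinite Matrices
$A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$, $x^T A x$ is called a quadratic form:
\[ x^T A x = \sum_{1 \le i, j \le n} A_{ij} x_i x_j \]
A is positive definite if $\forall x \in \mathbb{R}^n$, $x \neq 0$: $x^T A x > 0$
A is positive semidefinite if $\forall x \in \mathbb{R}^n$: $x^T A x \ge 0$
A is negative definite if $\forall x \in \mathbb{R}^n$, $x \neq 0$: $x^T A x < 0$
A is negative semidefinite if $\forall x \in \mathbb{R}^n$: $x^T A x \le 0$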
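A quadratic form is straightforward to evaluate from the double sum above. Deciding definiteness from the definition would require checking all $x$, so the sketch below instead uses Sylvester's criterion (all leading principal minors positive) for the symmetric 2 x 2 case. That criterion is not mentioned in the slides and is brought in here purely as an illustration; the function names and matrices are assumptions.

```python
def quadratic_form(A, x):
    """Evaluate x^T A x as the double sum of A[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def is_positive_definite_2x2(A):
    """Sylvester's criterion for a symmetric 2 x 2 matrix (an added technique)."""
    (a, b), (c, d) = A
    if b != c:
        raise ValueError("this check assumes a symmetric matrix")
    return a > 0 and a * d - b * c > 0


A = [[2, -1],
     [-1, 2]]
B = [[1, 2],
     [2, 1]]

print(quadratic_form(A, [1, 3]))      # 14
print(is_positive_definite_2x2(A))    # True
print(is_positive_definite_2x2(B))    # False
print(quadratic_form(B, [1, -1]))     # -2, a witness that B is not positive semidefinite
```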
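Eigenvalues and Eigenvectors
$A \in \mathbb{R}^{n \times n}$, $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ with the corresponding eigenvector $x \in \mathbb{C}^n$ ($x \neq 0$) if:
\[ Ax = \lambda x \]
Eigenvalues: the $n$ possibly complex roots of the polynomial equation $\det(A - \lambda I) = 0$, denoted $\lambda_1, \dots, \lambda_n$
Properties:
$\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$
$\det(A) = \prod_{i=1}^{n} \lambda_i$
$\mathrm{rank}(A) = |\{1 \le i \le n \mid \lambda_i \neq 0\}|$ (this holds when $A$ is diagonalizable)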
Matrix Eigendecomposition
$A \in \mathbb{R}^{n \times n}$, $\lambda_1, \dots, \lambda_n$ the eigenvalues, and $x_1, \dots, x_n$ the eigenvectors. $X = [x_1 | x_2 | \dots | x_n]$, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, then $AX = X\Lambda$.
A is called diagonalizable if $X$ is invertible: $A = X \Lambda X^{-1}$
If A is symmetric, then all eigenvalues are real, and X can be chosen orthogonal (hence denoted by $U = [u_1 | u_2 | \dots | u_n]$):
\[ A = U \Lambda U^T = \sum_{i=1}^{n} \lambda_i u_i u_i^T \]
A special case of Singular Value Decomposition
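For a symmetric 2 x 2 matrix the eigenvalues and orthonormal eigenvectors have a closed form, which makes the decomposition $A = \sum_i \lambda_i u_i u_i^T$ easy to check. The sketch below assumes that special case; the function names and the example matrix are illustrative choices, not taken from the slides.

```python
import math


def eig_symmetric_2x2(A):
    """Eigenpairs (lambda, unit eigenvector) of a symmetric 2 x 2 matrix."""
    (a, b), (b2, d) = A
    if b != b2:
        raise ValueError("this closed form assumes a symmetric matrix")
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        return [(float(a), (1.0, 0.0)), (float(d), (0.0, 1.0))]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v = (b, lam - a)                      # solves (A - lam I) v = 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


def reconstruct(pairs):
    """Rebuild the matrix as the sum of lambda_i * u_i u_i^T."""
    R = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in pairs:
        for i in range(2):
            for j in range(2):
                R[i][j] += lam * u[i] * u[j]
    return R


A = [[2, 1],
     [1, 2]]
pairs = eig_symmetric_2x2(A)
A_rebuilt = reconstruct(pairs)

print([round(lam, 9) for lam, _ in pairs])    # [3.0, 1.0]
print([[round(v, 9) for v in row] for row in A_rebuilt])  # [[2.0, 1.0], [1.0, 2.0]]
```

The eigenvalues also match the trace and determinant properties: 3 + 1 = tr(A) = 4 and 3 · 1 = det(A) = 3.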
Basic Probability Theory
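Elements of Probability
Sample Space $\Omega$: Set of all possible outcomes
Event Space $\mathcal{F}$: A family of subsets of $\Omega$
Probability Measure: Function $P : \mathcal{F} \to \mathbb{R}$ with properties:
1 $P(A) \ge 0$ ($\forall A \in \mathcal{F}$)
2 $P(\Omega) = 1$
3 If the $A_i$'s are disjoint, then $P(\bigcup_i A_i) = \sum_i P(A_i)$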
Conditional Probability and Independence
For events A, B:
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]
A, B independent if $P(A|B) = P(A)$ or equivalently: $P(A \cap B) = P(A)P(B)$
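Both definitions can be tried on a finite sample space with equally likely outcomes. The example below, two rolls of a fair die, is an assumption made for illustration; the event and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling a fair die twice.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


def first_is_six(w):
    return w[0] == 6


def sum_is_seven(w):
    return w[0] + w[1] == 7


def sum_at_least_ten(w):
    return w[0] + w[1] >= 10


# "Sum is 7" is independent of the first roll being a 6 ...
print(prob(sum_is_seven), cond_prob(sum_is_seven, first_is_six))          # 1/6 1/6
# ... while "sum is at least 10" is not.
print(prob(sum_at_least_ten), cond_prob(sum_at_least_ten, first_is_six))  # 1/6 1/2
```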
Random Variables and Distributions
A random variable X is a function $X : \Omega \to \mathbb{R}$
Example: Number of heads in 20 tosses of a coin
Probabilities of events associated with random variables are defined based on the original probability function, e.g., $P(X = k) = P(\{\omega \in \Omega \mid X(\omega) = k\})$
Cumulative Distribution Function (CDF) $F_X : \mathbb{R} \to [0, 1]$: $F_X(x) = P(X \le x)$
Probability Mass Function (pmf): X discrete, then $p_X(x) = P(X = x)$
Probability Density Function (pdf): $f_X(x) = dF_X(x)/dx$
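Properties of Distribution Functions
CDF:
$0 \le F_X(x) \le 1$
$F_X$ monotone increasing, with $\lim_{x \to -\infty} F_X(x) = 0$, $\lim_{x \to \infty} F_X(x) = 1$
pmf:
$0 \le p_X(x) \le 1$
$\sum_x p_X(x) = 1$
$\sum_{x \in A} p_X(x) = p_X(A)$
pdf:
$f_X(x) \ge 0$
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$\int_{x \in A} f_X(x)\,dx = P(X \in A)$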
Expectation and Variance
Assume random variable X has pdf $f_X(x)$, and $g : \mathbb{R} \to \mathbb{R}$. Then
\[ E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \]
for discrete X, $E[g(X)] = \sum_x g(x) p_X(x)$
Properties:
for any constant $a \in \mathbb{R}$, $E[a] = a$
$E[a g(X)] = a E[g(X)]$
Linearity of Expectation: $E[g(X) + h(X)] = E[g(X)] + E[h(X)]$
$\mathrm{Var}[X] = E[(X - E[X])^2]$
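The discrete formula $E[g(X)] = \sum_x g(x) p_X(x)$ can be applied to the earlier example of counting heads in 20 coin tosses. The sketch below assumes a fair coin, so the pmf is binomial with parameters 20 and 1/2; the names `pmf` and `expectation` are illustrative.

```python
from fractions import Fraction
from math import comb

n = 20
# pmf of the number of heads in n tosses of a fair coin (an assumed example).
pmf = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}


def expectation(g, pmf):
    """E[g(X)] for a discrete random variable given as a dict x -> p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x, pmf)
variance = expectation(lambda x: (x - mean) ** 2, pmf)

print(sum(pmf.values()))   # 1, so the pmf is normalized
print(mean, variance)      # 10 5
# Linearity: E[3X + 2] = 3 E[X] + 2
print(expectation(lambda x: 3 * x + 2, pmf) == 3 * mean + 2)  # True
```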
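Some Common Random Variables
$X \sim \mathrm{Bernoulli}(p)$ ($0 \le p \le 1$):
\[ p_X(x) = \begin{cases} p & x = 1, \\ 1 - p & x = 0. \end{cases} \]
$X \sim \mathrm{Geometric}(p)$ ($0 < p \le 1$): $p_X(x) = p(1 - p)^{x - 1}$ for $x = 1, 2, \dots$
$X \sim \mathrm{Uniform}(a, b)$ ($a < b$):
\[ f_X(x) = \begin{cases} \frac{1}{b - a} & a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \]
$X \sim \mathrm{Normal}(\mu, \sigma^2)$:
\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(x - \mu)^2} \]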
Multiple Random Variables and Joint Distributions
$X_1, \dots, X_n$ random variables
Joint CDF: $F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n)$
Joint pdf: $f_{X_1,\dots,X_n}(x_1, \dots, x_n) = \frac{\partial^n F_{X_1,\dots,X_n}(x_1, \dots, x_n)}{\partial x_1 \dots \partial x_n}$
Marginalization: $f_{X_1}(x_1) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} f_{X_1,\dots,X_n}(x_1, \dots, x_n)\,dx_2 \dots dx_n$
Conditioning: $f_{X_1|X_2,\dots,X_n}(x_1|x_2, \dots, x_n) = \frac{f_{X_1,\dots,X_n}(x_1, \dots, x_n)}{f_{X_2,\dots,X_n}(x_2, \dots, x_n)}$
Chain Rule: $f(x_1, \dots, x_n) = f(x_1) \prod_{i=2}^{n} f(x_i | x_1, \dots, x_{i-1})$
Independence: $f(x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i)$.
More generally, events $A_1, \dots, A_n$ are independent if $P(\bigcap_{i \in S} A_i) = \prod_{i \in S} P(A_i)$ ($\forall S \subseteq \{1, \dots, n\}$).
Random Vectors
$X_1, \dots, X_n$ random variables. $X = [X_1\ X_2\ \dots\ X_n]^T$ random vector.
If $g : \mathbb{R}^n \to \mathbb{R}$, then
\[ E[g(X)] = \int_{\mathbb{R}^n} g(x_1, \dots, x_n) f_{X_1,\dots,X_n}(x_1, \dots, x_n)\,dx_1 \dots dx_n \]
If $g : \mathbb{R}^n \to \mathbb{R}^m$, $g = [g_1 \dots g_m]^T$, then
\[ E[g(X)] = \left[ E[g_1(X)] \ \dots \ E[g_m(X)] \right]^T \]
Covariance Matrix:
\[ \Sigma = \mathrm{Cov}(X) = E\left[ (X - E[X])(X - E[X])^T \right] \]
Properties of Covariance Matrix:
$\Sigma_{ij} = \mathrm{Cov}[X_i, X_j] = E\left[ (X_i - E[X_i])(X_j - E[X_j]) \right]$
$\Sigma$ symmetric, positive semidefinite
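For a discrete random vector the covariance matrix can be computed entry by entry from the definition, with the expectation taken as a sum over the joint pmf. The sketch below uses a small hypothetical joint pmf over two binary variables; the names and values are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf of a random vector X = (X1, X2).
joint = {
    (0, 0): Fraction(1, 2),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}


def mean_vector(joint):
    """E[X], computed coordinate by coordinate."""
    n = len(next(iter(joint)))
    return [sum(p * x[i] for x, p in joint.items()) for i in range(n)]


def covariance_matrix(joint):
    """Sigma[i][j] = E[(X_i - E[X_i]) (X_j - E[X_j])]."""
    mu = mean_vector(joint)
    n = len(mu)
    return [[sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in joint.items())
             for j in range(n)]
            for i in range(n)]


mu = mean_vector(joint)
sigma = covariance_matrix(joint)

print(mu == [Fraction(1, 2), Fraction(1, 4)])          # True
print(sigma == [[Fraction(1, 4), Fraction(1, 8)],
                [Fraction(1, 8), Fraction(3, 16)]])    # True
print(sigma[0][1] == sigma[1][0])                      # True: Sigma is symmetric
```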

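Multivariate Gaussian Distribution
$\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ symmetric, positive semidefinite (the density below additionally requires $\Sigma$ to be invertible, i.e., positive definite)
$X \sim N(\mu, \Sigma)$ n-dimensional Gaussian distribution:
\[ f_X(x) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) \]
$E[X] = \mu$
$\mathrm{Cov}(X) = \Sigma$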
Parameter Estimation: Maximum Likelihood
Parametrized distribution $f_X(x; \theta)$ with parameter(s) $\theta$ unknown.
IID samples $x_1, \dots, x_n$ observed.
Goal: Estimate $\theta$
MLE: $\hat{\theta} = \mathrm{argmax}_{\theta} \{ f(x_1, \dots, x_n; \theta) \}$
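MLE Example
$X \sim \mathrm{Gaussian}(\mu, \sigma^2)$. $\theta = (\mu, \sigma^2)$ unknown. Samples $x_1, \dots, x_n$. Then:
\[ f(x_1, \dots, x_n; \mu, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2} \right) \]
Setting: $\frac{\partial \log f}{\partial \mu} = 0$ and $\frac{\partial \log f}{\partial \sigma} = 0$
Gives:
\[ \hat{\mu}_{MLE} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \hat{\sigma}^2_{MLE} = \frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n} \]
If it is not possible to find the optimal point in closed form, iterative methods such as gradient descent can be used.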
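The closed-form Gaussian estimates can be computed in a few lines and sanity-checked against the log-likelihood. The sketch below uses a made-up sample; the function names and the data are assumptions for illustration. Note that the MLE of the variance divides by n, not n - 1.

```python
import math


def gaussian_mle(samples):
    """Closed-form MLE (mu_hat, sigma2_hat) for iid Gaussian samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n   # divides by n, not n - 1
    return mu, var


def log_likelihood(samples, mu, var):
    """log f(x_1, ..., x_n; mu, sigma^2) for the Gaussian model."""
    n = len(samples)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in samples) / (2 * var))


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical data
mu_hat, var_hat = gaussian_mle(samples)
print(mu_hat, var_hat)   # 5.0 4.0

# Nearby parameter values should not achieve a higher log-likelihood.
best = log_likelihood(samples, mu_hat, var_hat)
print(all(best >= log_likelihood(samples, mu_hat + dm, var_hat + dv)
          for dm in (-0.5, 0.0, 0.5) for dv in (-1.0, 0.0, 1.0)))   # True
```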
Some Useful Inequalities
Markov's Inequality: X random variable, and $a > 0$. Then:
\[ P(|X| \ge a) \le \frac{E[|X|]}{a} \]
Chebyshev's Inequality: If $E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$, $k > 0$, then:
\[ \Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
Chernoff bound: $X_1, \dots, X_n$ iid random variables, with $E[X_i] = \mu$, $X_i \in \{0, 1\}$ ($\forall 1 \le i \le n$). Then:
\[ P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| \ge \epsilon \right) \le 2 \exp(-2n\epsilon^2) \]
Multiple variants of Chernoff-type bounds exist, which can be useful in different settings
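To get a feel for how tight these bounds are, the exact deviation probability of a sample mean can be compared with the Chebyshev and Chernoff bounds. The sketch below assumes n = 100 fair coin flips and a deviation of 0.1, so the sum is binomial and its tail can be computed exactly; all names and parameter values are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, exp

n = 100
eps = Fraction(1, 10)
mu = Fraction(1, 2)          # mean of a fair coin flip (assumed example)

# Exact pmf of the number of ones among n fair flips.
pmf = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

# Exact P(|sample mean - mu| >= eps).
deviation_prob = sum(p for k, p in pmf.items() if abs(Fraction(k, n) - mu) >= eps)

# Chernoff bound from the slide: 2 exp(-2 n eps^2).
chernoff = 2 * exp(-2 * n * float(eps) ** 2)

# Chebyshev applied to the sample mean: its variance is 1/(4n), so with
# k = eps / sigma the bound 1/k^2 equals variance / eps^2.
chebyshev = Fraction(1, 4 * n) / eps ** 2

print(float(deviation_prob) <= chernoff)   # True
print(deviation_prob <= chebyshev)         # True
print(chebyshev)                           # 1/4
```

In this particular setting both bounds hold with room to spare; which bound is tighter depends on n and on the size of the deviation.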
References
1 CS229 notes on basic linear algebra and probability theory
2 Wikipedia!
