Machine Learning Systems
CS6787 Lecture 1 — Fall 2018

Fundamentals of Machine Learning ≠ Machine Learning in Practice
(this course is about the gap between the two)
What’s missing in the basic stuff?
Efficiency!
Motivation:
Machine learning applications involve large amounts of data
Prerequisites
• Math/statistics knowledge
• At the level of the entrance exam for CS 4780
Grading
• Paper presentations
• Discussion participation
• Paper reviews
• Final project
Paper presentations
• Presenting in groups of two to three
• Signups on Wednesday
Discussion and Paper Reviews
• Each paper presentation will be followed by questions and breakout discussion
• For each class period, submit a review of one of the two papers
• Detailed instructions are on the course webpage
Final Project
• Open-ended: work on what you think is interesting!
• Groups of up to three
The learning problem: minimize the total loss over the training examples,

$$\min_x \sum_{i=1}^{N} f(x; y_i)$$

where $x$ is the model, the $y_i$ are the training examples, and $f$ is the loss function.

Example convex loss functions:

$$f(x) = x^2, \qquad f(x) = |x|$$

Convexity: $f$ is convex if for all $x$, $y$ and all $\alpha \in [0, 1]$,

$$f((1 - \alpha) y + \alpha x) \le (1 - \alpha) f(y) + \alpha f(x)$$

For the exponential, for example, this condition reads $e^{(1 - \alpha) y + \alpha x} \le (1 - \alpha) e^y + \alpha e^x$.
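As a concrete illustration of this objective (a minimal sketch, not from the lecture; the squared-error loss and the data are assumptions chosen for the example), here is how the total training loss could be evaluated in NumPy:

import numpy as np

# Hypothetical example: squared-error loss f(x; y) = (x - y)^2, convex in x.
def loss(x, y):
    return (x - y) ** 2

# Empirical risk: the total training loss sum_{i=1}^N f(x; y_i).
def empirical_risk(x, examples):
    return sum(loss(x, y) for y in examples)

examples = np.array([0.5, 1.0, 1.5, 2.0])
print(empirical_risk(1.0, examples))  # loss of the model x = 1.0 on the training set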
Properties of convex functions
• Any line segment we draw between two points on the curve lies above the curve
• Composition with an affine map preserves convexity: if $f$ is convex, then so is $h(x) = f(Ax + b)$
• Compositions of convex functions are NOT generally convex: $h(x) = f(g(x))$ can be non-convex even when $f$ and $g$ are both convex
• Neural nets are like this: deep compositions of simpler functions
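A quick numeric sanity check of these two facts (an illustrative sketch added here, not from the slides):

import numpy as np

f = lambda x: x ** 2        # convex
g = lambda x: x ** 2 - 1.0  # also convex

# Segment property: f((1 - a) * y + a * x) <= (1 - a) * f(y) + a * f(x)
x, y = -2.0, 3.0
for a in np.linspace(0.0, 1.0, 11):
    assert f((1 - a) * y + a * x) <= (1 - a) * f(y) + a * f(x) + 1e-12

# But the composition h(x) = f(g(x)) = (x^2 - 1)^2 is not convex:
# its chord from x = -1 to x = 1 lies below the function at the midpoint.
h = lambda x: f(g(x))
print(h(0.0), 0.5 * h(-1.0) + 0.5 * h(1.0))  # 1.0 > 0.0, violating convexity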
Convex Functions: Alternative Definitions
• First-order condition:
$$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge 0$$
• Second-order condition:
$$\nabla^2 f(x) \succeq 0$$
• This means that the matrix of second derivatives is positive semidefinite
• Example: $f(x) = x^2$ satisfies $f''(x) = 2 \ge 0$
Example: Exponential
$$f(x) = e^x, \qquad f''(x) = e^x \ge 0$$
Example: Logistic Loss
$$f(x) = \log(1 + e^x)$$
$$f'(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}}$$
$$f''(x) = \frac{e^x}{(1 + e^x)^2} = \frac{1}{(1 + e^x)(1 + e^{-x})} \ge 0.$$
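These closed forms are easy to verify numerically; here is a small finite-difference check (an added sketch, with test points chosen arbitrarily):

import numpy as np

f = lambda x: np.log1p(np.exp(x))                               # logistic loss
fp = lambda x: 1.0 / (1.0 + np.exp(-x))                         # closed-form f'
fpp = lambda x: 1.0 / ((1.0 + np.exp(x)) * (1.0 + np.exp(-x)))  # closed-form f''

# Compare against central finite differences at a few points.
h = 1e-5
for x in [-3.0, 0.0, 2.5]:
    assert abs((f(x + h) - f(x - h)) / (2 * h) - fp(x)) < 1e-6
    assert abs((f(x + h) - 2 * f(x) + f(x - h)) / h ** 2 - fpp(x)) < 1e-4
    assert fpp(x) > 0  # positive second derivative: the loss is convex
print("derivative formulas verified")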
Strongly Convex Functions
• Basically the easiest class of functions for optimization
• First-order condition:
$$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \mu \|x - y\|^2$$
• Second-order condition:
$$\nabla^2 f(x) \succeq \mu I$$
• Equivalently: $h(x) = f(x) - \frac{\mu}{2} \|x\|^2$ is convex
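As a quick worked check (added for concreteness, not from the slides), the quadratic from earlier is strongly convex with $\mu = 2$:

$$f(x) = x^2: \quad f''(x) = 2 \ge \mu = 2, \qquad h(x) = x^2 - \tfrac{\mu}{2} x^2 = 0, \text{ which is convex.}$$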
Which of the functions we’ve looked at are strongly convex?
• Only $f(x) = x^2$ is: the curvature of $|x|$, $e^x$, and the logistic loss is not bounded away from zero ($|x|$ is flat away from 0, and the second derivatives of $e^x$ and the logistic loss approach 0)
Concave functions
• A function is concave if its negation is convex
• Example: the logarithm $f(x) = \log(x)$, for which
$$f''(x) = -\frac{1}{x^2} \le 0$$
Why care about convex functions?
Convex Optimization
• Goal is to minimize a convex function
[Figure: plot of a convex function over the interval from -2.5 to 2.5]
Gradient Descent
$$x \leftarrow x - \alpha \nabla f(x)$$
[Figure: gradient descent iterates descending a convex function over the interval from -2.5 to 2.5]
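A minimal sketch of this update rule in plain Python (the objective, step size, and iteration count are assumptions for illustration):

def gradient_descent(grad_f, x0, alpha=0.1, num_steps=100):
    # Repeatedly apply the update x <- x - alpha * grad f(x).
    x = x0
    for _ in range(num_steps):
        x = x - alpha * grad_f(x)
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
x_hat = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_hat)  # approaches the optimum x* = 3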
Gradient Descent Converges
• Iterative definition of gradient descent:
$$x_{t+1} = x_t - \alpha \nabla f(x_t)$$
• Assumptions/terminology:
  • Global optimum is $x^*$
  • Bounded second derivative: $\mu I \preceq \nabla^2 f(x) \preceq L I$
Gradient Descent Converges (continued)
$$x_{t+1} - x^* = x_t - x^* - \alpha(\nabla f(x_t) - \nabla f(x^*))$$
(using $\nabla f(x^*) = 0$ at the optimum)
$$= x_t - x^* - \alpha \nabla^2 f(z_t)(x_t - x^*)$$
(by the mean value theorem, for some $z_t$ between $x_t$ and $x^*$)
$$= (I - \alpha \nabla^2 f(z_t))(x_t - x^*).$$
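From here one short step (added for completeness) gives the rate: since $\mu I \preceq \nabla^2 f(z_t) \preceq L I$, every eigenvalue of $I - \alpha \nabla^2 f(z_t)$ lies in $[1 - \alpha L, 1 - \alpha \mu]$, so

$$\|x_{t+1} - x^*\| \le \max(|1 - \alpha \mu|, |1 - \alpha L|) \cdot \|x_t - x^*\|,$$

and for any step size $0 < \alpha < 2/L$ this factor is strictly less than 1: gradient descent converges at a linear rate.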
For stochastic gradients, assume the upper bound
$$\mathbb{E}\left[\|\nabla f(x; y) - \nabla h(x)\|^2\right] \le M$$
where $h$ is the full objective and $y$ is a randomly sampled training example.
Stochastic Gradient Descent Convergence
• Already we can see that this converges to a fixed point where
$$\mathbb{E}\left[\|x_{t+1} - x^*\|^2\right] = \frac{\alpha M}{2\mu - \alpha \mu^2}$$
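To see the noise-ball behavior this formula predicts, here is a small simulation (a sketch; the quadratic toy problem and all constants are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: h(x) = (1/N) * sum_i (x - y_i)^2 / 2,
# so the per-example gradient is grad f(x; y) = x - y.
ys = rng.normal(loc=3.0, scale=1.0, size=1000)
x_star = ys.mean()  # optimum of the full objective

alpha, x = 0.05, 0.0
sq_dists = []
for t in range(2000):
    y = rng.choice(ys)           # sample one training example
    x = x - alpha * (x - y)      # SGD step with the noisy gradient
    sq_dists.append((x - x_star) ** 2)

# The iterates hover in a "noise ball" around x*; its expected squared
# radius scales with the step size alpha, matching alpha*M / (2*mu - alpha*mu^2).
print(np.mean(sq_dists[-500:]))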