Machine Learning
To review the material and deepen your understanding of the course content, please answer the review questions below, and hit submit at the bottom of the page when you're done.
You are allowed to take/re-take these review quizzes multiple times, and each time you will see a slightly different set of questions or answers. We will use only your highest score, and strongly encourage you to continue re-taking each quiz until you get a 100% score at least once. (Even after that, you can re-take it to review the content further, with no risk of your final score being reduced.) To prevent rapid-fire guessing, the system enforces a minimum of 10 minutes between each attempt.
Question 1
Consider the problem of predicting how well a student does in her second year of college/university, given how well they did in their first year. Specifically, let x be equal to the number of "A" grades (including A−, A, and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).
Questions 1 through 4 will use the following training set of a small sample of different students' performances. Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x) = θ₀ + θ₁x, and we use m to denote the number of training examples.
x   y
3   4
2   1
4   3
0   1
For the training set given above, what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).
Your answer: 4
Score
Choice explanation: m = 4 examples.
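As a quick sanity check, the count can be reproduced in a few lines (a Python sketch; the variable names are illustrative, not from the course materials):

```python
# Training set from the table above: each (x, y) pair is one example.
training_set = [(3, 4), (2, 1), (4, 3), (0, 1)]

# m denotes the number of training examples.
m = len(training_set)
print(m)  # prints 4
```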
www.ml-class.org/course/quiz/feedback?submission_id=76152
10/13/11
Question 2
Score
Choice explanation: hθ(x) = θ₀ + θ₁x = 0 + 1.5 × 2 = 3
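The arithmetic above can be checked directly; this sketch assumes, as the explanation implies, that θ₀ = 0, θ₁ = 1.5, and the query point is x = 2 (the question text itself is cut off in this page):

```python
def h(theta0, theta1, x):
    # Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x.
    return theta0 + theta1 * x

# Parameter values implied by the explanation above (assumed).
print(h(0, 1.5, 2))  # prints 3.0
```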
Question 3
For this question, continue to assume that we are using the training set given above. Recall that our definition of the cost function was J(θ₀, θ₁) = (1/2m) Σᵢ₌₁..m (hθ(x^(i)) − y^(i))². What is J(0, 1)? In the box below, please enter your answer (use decimals instead of fractions if necessary, e.g., 1.5).
Your answer: 1.6875,0.25
Score: 0.00 / 1.00
Choice explanation: hθ(x) = θ₀ + θ₁x = x. So,
J(θ₀, θ₁) = (1/2m) Σᵢ₌₁..m (hθ(x^(i)) − y^(i))²
= (1/(2·4)) ((−1)² + 1² + 1² + (−1)²)
= 4/8
= 0.5
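The computation can be verified with a short script over the training set above (a Python sketch of the cost function, not part of the course's Octave code):

```python
# Training set from Questions 1-4.
xs = [3, 2, 4, 0]
ys = [4, 1, 3, 1]

def cost(theta0, theta1):
    # Squared-error cost J(theta0, theta1) for univariate linear regression.
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(0, 1))  # prints 0.5
```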
Question 4
Let f be some function so that f(θ₀, θ₁) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ₀, θ₁) as a function of θ₀ and θ₁. Which of the following statements are true? (Check all that apply.)
- Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
- No matter how θ₀ and θ₁ are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
- If the first few iterations of gradient descent cause f(θ₀, θ₁) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
- If the learning rate α is too small, then gradient descent may take a very long time to converge.
Your answer: Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
Score: 0.25
Choice explanation: If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.

Your answer: No matter how θ₀ and θ₁ are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
Score: 0.25
Choice explanation: Depending on the initial condition, gradient descent may end up at different local optima.

Your answer: If the first few iterations of gradient descent cause f(θ₀, θ₁) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
Score: 0.25
Choice explanation: If α were small enough, then gradient descent should always successfully take a tiny step downhill and decrease f(θ₀, θ₁) at least a little bit. If gradient descent instead increases the objective value, that means α is too large (or you have a bug in your code!).

Your answer: If the learning rate α is too small, then gradient descent may take a very long time to converge.
Score: 0.25
Choice explanation: If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and will therefore take a long time to converge.
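The learning-rate behavior described above can be illustrated on a toy objective. This sketch uses f(θ) = θ², a hypothetical one-parameter example invented here for illustration, not anything from the quiz:

```python
def f(theta):
    # Toy objective with its minimum at theta = 0.
    return theta ** 2

def grad(theta):
    # Derivative of f.
    return 2 * theta

def final_cost(alpha, steps, theta=1.0):
    # Run gradient descent for a fixed number of steps and
    # return the objective value reached.
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return f(theta)

print(final_cost(alpha=0.01, steps=10))    # small alpha: barely moved yet
print(final_cost(alpha=0.01, steps=2000))  # ...but converges eventually
print(final_cost(alpha=1.5, steps=10))     # too-large alpha: f increases every step
```

With α = 0.01 each step shrinks θ by only 2%, so convergence is slow but safe; with α = 1.5 each step overshoots and doubles the distance from the minimum, so f grows without bound.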
Total
1.00 / 1.00
Question 5
Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ₀, θ₁ such that J(θ₀, θ₁) = 0. Which of the statements below must then be true? (Check all that apply.)
- We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
- Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
- For this to be true, we must have y^(i) = 0 for every value of i = 1, 2, …, m.
- For these values of θ₀ and θ₁ that satisfy J(θ₀, θ₁) = 0, we have that hθ(x^(i)) = y^(i) for every training example (x^(i), y^(i)).
Your answer: We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
Score: 0.00
Choice explanation: Even though we can fit our training set perfectly, this does not mean that we'll always make perfect predictions on houses in the future/on houses that we have not yet seen.

Your answer: Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
Score: 0.25
Choice explanation: The cost function J(θ₀, θ₁) for linear regression has no local optima (other than the global minimum), so gradient descent will not get stuck at a bad local minimum.

Your answer: For this to be true, we must have y^(i) = 0 for every value of i = 1, 2, …, m.
Score: 0.25
Choice explanation: So long as all of our training examples lie on a straight line, we will be able to find θ₀, θ₁ so that J(θ₀, θ₁) = 0. It is not necessary that y^(i) = 0 for all of our examples.

Your answer: For these values of θ₀ and θ₁ that satisfy J(θ₀, θ₁) = 0, we have that hθ(x^(i)) = y^(i) for every training example (x^(i), y^(i)).
Score: 0.25
Choice explanation: If J(θ₀, θ₁) = 0, that means the line defined by the equation "y = θ₀ + θ₁x" perfectly fits all of our training data.

Total
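The last explanation can be made concrete with a hypothetical training set that lies exactly on a line (the data and θ values here are invented for illustration):

```python
# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
theta0, theta1 = 1, 2

# Squared-error cost over the training set.
m = len(xs)
J = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
print(J)  # prints 0.0

# J = 0 forces the hypothesis to match every training label exactly...
assert all(theta0 + theta1 * x == y for x, y in zip(xs, ys))
# ...but says nothing about examples outside the training set.
```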