
10/13/11

Machine Learning


Sofia Cardita

II. Linear regression with one variable - Feedback


You achieved a score of 3.75 out of 5.00. Your answers, as well as our explanations, are shown below.


To review the material and deepen your understanding of the course content, please answer the review questions below, and hit submit at the bottom of the page when you're done.

You are allowed to take/re-take these review quizzes multiple times, and each time you will see a slightly different set of questions or answers. We will use only your highest score, and strongly encourage you to continue re-taking each quiz until you get a 100% score at least once. (Even after that, you can re-take it to review the content further, with no risk of your final score being reduced.) To prevent rapid-fire guessing, the system enforces a minimum of 10 minutes between each attempt.

Question 1


Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of "A" grades (including A−, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).

Questions 1 through 4 will use the following training set of a small sample of different students' performances. Here each row is one training example. Recall that in linear regression, our hypothesis is h_θ(x) = θ_0 + θ_1 x, and we use m to denote the number of training examples.

x    y
3    4
2    1
4    3
0    1

For the training set given above, what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).
Your answer: 4
Score: 1.00
Total: 1.00 / 1.00

Question explanation: m is the number of training examples. In this example, we have m = 4 examples.
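(As a quick illustration, not part of the quiz: a minimal Python sketch, with variable names chosen only for this example, that stores the training set from the table and counts the examples.)

```python
# Training set from the table above; each (x, y) pair is one training example.
training_set = [(3, 4), (2, 1), (4, 3), (0, 1)]

m = len(training_set)  # m = number of training examples
print(m)               # -> 4
```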


Question 2

Suppose we set θ_0 = 0, θ_1 = 1.5. What is h_θ(2)?

Your answer: 3
Score: 1.00
Total: 1.00 / 1.00

Question explanation: Setting x = 2, we have h_θ(x) = θ_0 + θ_1 x = 0 + 1.5 × 2 = 3.
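(For illustration only: a small Python sketch of the same computation; the function name h is just a stand-in for the hypothesis.)

```python
theta0, theta1 = 0.0, 1.5

def h(x):
    # Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

print(h(2))  # -> 3.0
```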

Question 3

For this question, continue to assume that we are using the training set given above. Recall that our definition of the cost function was J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2. What is J(0, 1)? In the box below, please enter your answer (use decimals instead of fractions if necessary, e.g., 1.5).

Your answer: 1.6875,0.25
Choice explanation: (No match)
Score: 0.00
Total: 0.00 / 1.00

Question explanation: When θ_0 = 0 and θ_1 = 1, we have h_θ(x) = θ_0 + θ_1 x = x. So,

J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
            = (1/(2·4)) ((−1)^2 + (1)^2 + (1)^2 + (−1)^2)
            = 4/8
            = 0.5
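(For illustration only: one way to check this value numerically in Python; the names xs, ys and J are just for this sketch.)

```python
# Training set used in Questions 1-4.
xs = [3, 2, 4, 0]
ys = [4, 1, 3, 1]
m = len(xs)

def J(theta0, theta1):
    # Squared-error cost: J = (1 / (2m)) * sum over i of (h(x_i) - y_i)^2
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(J(0, 1))  # -> 0.5
```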

Question 4

Let f be some function so that f(θ_0, θ_1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ_0, θ_1) as a function of θ_0 and θ_1. Which of the following statements are true? (Check all that apply.)

- Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
- No matter how θ_0 and θ_1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
- If the first few iterations of gradient descent cause f(θ_0, θ_1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
- If the learning rate is too small, then gradient descent may take a very long time to converge.


Your answer / Score / Choice explanation

Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
Score: 0.25
Choice explanation: If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.

No matter how θ_0 and θ_1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
Score: 0.25
Choice explanation: This is not true, because depending on the initial condition, gradient descent may end up at different local optima.

If the first few iterations of gradient descent cause f(θ_0, θ_1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
Score: 0.25
Choice explanation: If α were small enough, then gradient descent should always successfully take a tiny step downhill and decrease f(θ_0, θ_1) at least a little bit. If gradient descent instead increases the objective value, that means α is too large (or you have a bug in your code!).

If the learning rate is too small, then gradient descent may take a very long time to converge.
Score: 0.25
Choice explanation: If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and therefore can take a long time to converge.

Total: 1.00 / 1.00
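(For illustration only, not part of the quiz: the Python sketch below runs gradient descent on the simple one-dimensional function f(θ) = θ², chosen purely as a stand-in, to show how the learning rate affects convergence.)

```python
def gradient_descent(alpha, theta=5.0, steps=20):
    # Minimize f(theta) = theta**2, whose gradient is 2 * theta.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta, theta ** 2  # final parameter and objective value

# A moderate learning rate moves theta close to the minimum at 0.
print(gradient_descent(alpha=0.1))
# A very small learning rate barely moves in the same number of steps (slow convergence).
print(gradient_descent(alpha=0.001))
# A too-large learning rate overshoots, so the objective value grows instead of shrinking.
print(gradient_descent(alpha=1.5))
```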

Question 5

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ_0, θ_1 such that J(θ_0, θ_1) = 0. Which of the statements below must then be true? (Check all that apply.)

- We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
- Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
- For this to be true, we must have y^(i) = 0 for every value of i = 1, 2, ..., m.
- For these values of θ_0 and θ_1 that satisfy J(θ_0, θ_1) = 0, we have that h_θ(x^(i)) = y^(i) for every training example (x^(i), y^(i)).

Your answer / Score / Choice explanation

We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
Score: 0.00
Choice explanation: Even though we can fit our training set perfectly, this does not mean that we'll always make perfect predictions on houses in the future/on houses that we have not yet seen.

Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
Score: 0.25
Choice explanation: The cost function J(θ_0, θ_1) for linear regression has no local optima (other than the global minimum), so gradient descent will not get stuck at a bad local minimum.

For this to be true, we must have y^(i) = 0 for every value of i = 1, 2, ..., m.
Score: 0.25
Choice explanation: So long as all of our training examples lie on a straight line, we will be able to find θ_0 and θ_1 so that J(θ_0, θ_1) = 0. It is not necessary that y^(i) = 0 for all of our examples.

For these values of θ_0 and θ_1 that satisfy J(θ_0, θ_1) = 0, we have that h_θ(x^(i)) = y^(i) for every training example (x^(i), y^(i)).
Score: 0.25
Choice explanation: If J(θ_0, θ_1) = 0, that means the line defined by the equation "y = θ_0 + θ_1 x" perfectly fits all of our data.

Total: 0.75 / 1.00
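(For illustration only: a small Python sketch on toy data, assumed to lie exactly on a line, showing that J(θ_0, θ_1) = 0 means the hypothesis matches every training example, while saying nothing about unseen examples.)

```python
# Toy training set that lies exactly on the line y = 1 + 2x (assumed for illustration).
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
m = len(xs)

theta0, theta1 = 1.0, 2.0

cost = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
print(cost)  # -> 0.0, i.e. J(theta0, theta1) = 0

# J = 0 implies h(x_i) = y_i for every training example...
print(all(theta0 + theta1 * x == y for x, y in zip(xs, ys)))  # -> True
# ...but it does not guarantee perfect predictions on examples outside the training set.
```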

