
Analytical Questions

1. Consider the following neural network, which takes two binary-valued inputs x_1, x_2 ∈ {0, 1} and outputs h_Θ(x). Which of the following logical functions does it (approximately) compute? Give a proper explanation for your answer. [3]

2. Consider the neural network given below. Write the equation which correctly computes the activation a_1^{(3)}. Note: g(z) is the sigmoid activation function. [2]

3. Consider the neural network given above. You'd like to compute the activations of the hidden layer a^{(2)} ∈ R^3. One way to do so is the following Octave code:

% Theta1 is Theta with superscript "(1)" from lecture,
% i.e., the matrix of parameters for the mapping from layer 1 (input) to layer 2.
% Theta1 has size 3x3.
% Assume 'sigmoid' is a built-in function to compute 1 / (1 + exp(-z)).
a2 = zeros(3, 1);
for i = 1:3
  for j = 1:3
    a2(i) = a2(i) + x(j) * Theta1(i, j);
  end
  a2(i) = sigmoid(a2(i));
end

You want to have a vectorized implementation of this (i.e., one that does not use for loops). Which of the following implementations correctly computes a^{(2)}? Check all that apply. [4] A sketch for checking a candidate against the loop version follows the options.

1. a2 = sigmoid(Theta2 * x);
2. z = sigmoid(x); a2 = sigmoid(Theta1 * z);
3. a2 = sigmoid(x * Theta1);
4. z = Theta1 * x; a2 = sigmoid(z);
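
A quick sanity check, assuming example values for x and Theta1 (the numbers below are made-up placeholders, not from the question), is to compare a candidate against the loop version numerically:

sigmoid = @(z) 1 ./ (1 + exp(-z));   % same definition as above
x = [1; 0.5; -0.5];                  % 3x1 input vector (bias unit included)
Theta1 = reshape(1:9, 3, 3) / 10;    % arbitrary 3x3 parameter matrix
a2_loop = zeros(3, 1);
for i = 1:3
  for j = 1:3
    a2_loop(i) = a2_loop(i) + x(j) * Theta1(i, j);
  end
  a2_loop(i) = sigmoid(a2_loop(i));
end
a2_vec = sigmoid(Theta1 * x);        % candidate vectorised form
disp(max(abs(a2_loop - a2_vec)))     % prints (near) 0 if the candidate matches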

4. Which of the following statements are true? Check all that apply. [4]

1. A two-layer (one input layer, one output layer; no hidden layer) neural network can represent the XOR function.
2. If a neural network is overfitting the data, one solution would be to increase the regularization parameter λ.
3. Suppose you have a multiclass classification problem with 3 classes, trained with a 3-layer network. Let a_1^{(3)} = (h_Θ(x))_1 be the activation of the first output unit, and similarly a_2^{(3)} = (h_Θ(x))_2 and a_3^{(3)} = (h_Θ(x))_3. Then for any input x, it must be the case that a_1^{(3)} + a_2^{(3)} + a_3^{(3)} = 1.
4. The activation values of the hidden units in a neural network, with the sigmoid activation
function applied at every layer, are always in the range (0, 1).

5. You are training a three-layer neural network and would like to use backpropagation to compute the gradient of the cost function. One of the steps in the backpropagation algorithm is the update Δ_{ij}^{(2)} := Δ_{ij}^{(2)} + δ_i^{(3)} · a_j^{(2)} for every i, j. Write down this step in vectorised form. [2]
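
For intuition about the shapes involved, here is a minimal sketch with made-up sizes (δ^{(3)} as a 3x1 vector, a^{(2)} as a 4x1 vector; neither size comes from the question), showing how the element-wise loop collapses into a single outer product:

delta3 = [0.1; 0.2; 0.3];       % made-up delta^(3), 3x1
a2     = [1; 0.5; 0.25; 0.9];   % made-up a^(2), 4x1
Delta2 = zeros(3, 4);           % Delta^(2) accumulator
for i = 1:3
  for j = 1:4
    Delta2(i, j) = Delta2(i, j) + delta3(i) * a2(j);   % element-wise update
  end
end
Delta2_outer = zeros(3, 4) + delta3 * a2';   % the same update as an outer product
disp(max(max(abs(Delta2 - Delta2_outer))))   % prints (near) 0: the two forms agree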

6. Suppose Theta1 is a 2x5 matrix, and Theta2 is a 3x6 matrix. You set thetaVec = [Theta1(:) ; Theta2(:)]. Which of the following correctly recovers Theta2? [3]
1. reshape(thetaVec(10:27),3,6)
2. reshape(thetaVec(11:28),6,3)
3. reshape(thetaVec(11:28),3,6)
4. reshape(thetaVec(10:27),6,3)
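
To see how the unrolled vector is laid out, one can experiment with small matrices of the sizes given in the question (the entry values below are arbitrary placeholders):

Theta1 = reshape(1:10, 2, 5);     % 2x5: its 10 elements become thetaVec(1:10)
Theta2 = reshape(11:28, 3, 6);    % 3x6: its 18 elements become thetaVec(11:28)
thetaVec = [Theta1(:); Theta2(:)];
% Recovering a matrix means slicing out its elements and restoring its shape:
disp(isequal(Theta2, reshape(thetaVec(11:28), 3, 6)))   % prints 1 (true)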

7. Let J(θ) = 3θ^3 + 2, θ = 1, and ε = 0.01. Use the formula (J(θ + ε) − J(θ − ε)) / (2ε) to numerically compute an approximation to the derivative at θ = 1. [2]
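
For checking the arithmetic, a minimal Octave sketch of this two-sided difference:

J = @(theta) 3 * theta^3 + 2;
theta = 1;
epsilon = 0.01;
approx = (J(theta + epsilon) - J(theta - epsilon)) / (2 * epsilon)
% The exact derivative is 9*theta^2 = 9; 'approx' comes out as 9.0003.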
8. Which of the following statements are true? Check all that apply. [4]
1. If our neural network overfits the training set, one reasonable step to take is to increase the regularization parameter λ.
2. Computing the gradient of the cost function in a neural network has the same efficiency
when we use backpropagation or when we numerically compute it using the method of
gradient checking.
3. For computational efficiency, after we have performed gradient checking to verify that our backpropagation code is correct, we disable gradient checking before using backpropagation to train the network.
4. Using a large value of λ can hurt the performance of a neural network; the only reason we do not set λ to be too large is to avoid numerical problems.

9. Suppose you have a dataset with n = 10 features and m = 5000 examples. After training your logistic regression classifier with gradient descent, you find that it has underfit the training set and does not achieve the desired performance on the training or cross-validation sets. Which of the following might be promising steps to take? Check all that apply. [4]

1. Use an SVM with a Gaussian kernel.
2. Try using a neural network with a large number of hidden units.
3. Reduce the number of examples in the training set.
4. Use an SVM with a linear kernel, without introducing new features.

10. Which of the following statements are true? Check all that apply. [4]

1. If you are training multiclass SVMs with the one-vs-all method, it is not possible to use a kernel.
2. If the data are linearly separable, an SVM using a linear kernel will return the same parameters θ regardless of the chosen value of C (i.e., the resulting value of θ does not depend on C).
3. It is important to perform feature normalization before using the Gaussian kernel.
4. The maximum value of the Gaussian kernel is 1.

11. You train a learning algorithm and find that it has unacceptably high error on the test set. You plot the learning curve and obtain the figure below. Is the algorithm suffering from high bias, high variance, or neither? [3]
12. Suppose you have implemented regularized logistic regression to classify what object is in an image (i.e., to do object recognition). However, when you test your hypothesis on a new set of images, you find that it makes unacceptably large errors with its predictions on the new images, even though your hypothesis performs well (has low error) on the training set. Which of the following are promising steps to take? Check all that apply. [4]
1. Try evaluating the hypothesis on a cross-validation set rather than the test set.
2. Get more training examples.
3. Try adding polynomial features.
4. Try using a smaller set of features.

13. Suppose you have implemented regularized logistic regression to predict what items customers will purchase on a web shopping site. However, when you test your hypothesis on a new set of customers, you find that it makes unacceptably large errors in its predictions. Furthermore, the hypothesis performs poorly on the training set. Which of the following might be promising steps to take? Check all that apply. [4]
1. Try adding polynomial features.
2. Try to obtain and use additional features.
3. Try using a smaller set of features.
4. Use fewer training examples.

14. Which of the following statements are true? Check all that apply. [4]
1. If the neural network has much lower training error than test error, then adding more layers will help bring the test error down, because we can fit the test set better.
2. When debugging learning algorithms, it is useful to plot a learning curve to understand whether there is a high bias or high variance problem.
3. If a learning algorithm is suffering from high bias, adding more training examples alone may not improve the test error significantly.
4. A model with more parameters is more prone to overfitting and typically has higher
variance.

15. You are working on a spam classification system using regularized logistic regression. "Spam" is the positive class (y = 1) and "not spam" is the negative class (y = 0). You have trained your classifier, and there are m = 1000 examples in the cross-validation set. The chart of predicted class vs. actual class is: [6]

                Actual: 1   Actual: 0
Predicted: 1       85          890
Predicted: 0       15          10

1. What is the classifier's recall (as a value from 0 to 1)?
2. What is the classifier's precision?
3. What is the F1-score?
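
As a reference for how these metrics are defined, here is a minimal Octave sketch using the counts read off the chart above:

tp = 85;    % predicted 1, actual 1 (true positives)
fp = 890;   % predicted 1, actual 0 (false positives)
fn = 15;    % predicted 0, actual 1 (false negatives)
precision = tp / (tp + fp)                            % fraction of predicted spam that is spam
recall    = tp / (tp + fn)                            % fraction of actual spam that is caught
f1 = 2 * precision * recall / (precision + recall)    % harmonic mean of the two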

16. Suppose you are working on a spam classifier, where spam emails are positive examples (y = 1) and non-spam emails are negative examples (y = 0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% are spam. Which of the following statements are true? Check all that apply. [4]
1. If you always predict non-spam, your classifier will have 99% accuracy on the training set, and it will likely perform similarly on the cross-validation set.
2. If you always predict spam, your classifier will have a recall of 100% and a precision of 1%.
3. If you always predict non-spam, your classifier will have 99% accuracy on the training set, but will do much worse on the cross-validation set because it overfits the training data.
4. If you always predict non-spam, your classifier will have a recall of 0%.

17. Suppose you have trained a logistic regression classifier which outputs h_Θ(x). We predict 1 if h_Θ(x) ≥ threshold, and predict 0 if h_Θ(x) < threshold, where currently the threshold is set to 0.5. Suppose you increase the threshold to 0.9. Which of the following are true? Check all that apply. [4]

1. The classifier is likely to have unchanged precision and recall, but higher accuracy.
2. The classifier is likely to have a lower recall.
3. The classifier is likely to have a higher recall.
4. The classifier is likely to have a lower precision.

18. Consider the problem of predicting how well a student does in their second year of college/university, given how well they did in their first year. Specifically, let x be the number of "A" grades (including A−, A, and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year). [6]

x   y
3   4
2   1
4   3
0   1

1. For the training set given above, what is the value of m?
2. What is J(0, 1)?
3. Suppose we set θ_0 = 0 and θ_1 = 1.5. What is h_Θ(2)?
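
For checking the arithmetic, a minimal sketch assuming J is the usual squared-error cost for univariate linear regression from the course (that definition is assumed here, not stated in the question itself):

x = [3; 2; 4; 0];
y = [4; 1; 3; 1];
m = length(x)                                  % number of training examples
% Assumed cost: J(t0, t1) = 1/(2m) * sum((t0 + t1*x - y).^2)
J = @(t0, t1) sum((t0 + t1 * x - y) .^ 2) / (2 * m);
J(0, 1)                                        % cost at theta_0 = 0, theta_1 = 1
h = @(t0, t1, xq) t0 + t1 * xq;                % hypothesis h_Theta(x) = theta_0 + theta_1 * x
h(0, 1.5, 2)                                   % prediction for x = 2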
