CSE 446
What assumptions are we making when the value of m is very large vs. very small? How does this affect
the test accuracy?
As m approaches infinity, the estimated probability of any given word appearing in a spam/ham email,
namely P(word | spam) or P(word | ham), approaches the prior p. A large m effectively assumes each
word has been seen many times, while a small m lets the observed counts dominate. Large m therefore
smooths away the differences between the words. In our case, the test accuracy dropped, quite
substantially so when the value of m was increased by a factor of 1000 or 10000.
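The pull toward the prior can be seen directly in the m-estimate formula, P(word | class) = (n_word + m*p) / (n_class + m). The sketch below uses made-up counts and a made-up prior, not the assignment's data:

```java
// Sketch of the m-estimate's smoothing behavior with illustrative numbers.
public class MEstimateDemo {
    // m-estimate: (n_word + m * p) / (n_class + m), where p is the prior estimate.
    static double mEstimate(int nWord, int nClass, double p, double m) {
        return (nWord + m * p) / (nClass + m);
    }

    public static void main(String[] args) {
        int nWord = 40, nSpam = 100;   // hypothetical counts: word seen in 40 of 100 spam emails
        double p = 0.001;              // hypothetical prior estimate
        for (double m : new double[]{0, 1, 1000, 10000, 1e7}) {
            System.out.printf("m=%.0f  P(word|spam)=%.6f%n", m, mEstimate(nWord, nSpam, p, m));
        }
        // As m grows, the estimate moves from the empirical 0.4 toward the prior p = 0.001,
        // erasing the distinction between informative and uninformative words.
    }
}
```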
If you were a spammer, how would you modify your emails to beat the classifiers we have learned
above?
If I were a spammer, I would include words in my spam emails that commonly appear in ham emails,
and use them at frequencies similar to those found in ham emails. This would be effective because
it would decrease the likelihood that the probability of an email being spam given the words in
its text exceeds the probability of it being ham given those same words.
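The effect can be sketched with a toy Naive Bayes comparison. All likelihoods below are made-up illustrative values, not learned from real data:

```java
// Sketch of why padding a spam email with ham-frequent words can flip the decision.
public class SpamShiftDemo {
    // Log-score of one class: log prior + sum of log word likelihoods.
    static double logScore(double prior, double[] likelihoods) {
        double s = Math.log(prior);
        for (double p : likelihoods) s += Math.log(p);
        return s;
    }

    // Classify by comparing the two class scores (equal 0.5 priors assumed).
    static String classify(double[] spamLik, double[] hamLik) {
        return logScore(0.5, spamLik) > logScore(0.5, hamLik) ? "spam" : "ham";
    }

    public static void main(String[] args) {
        // Email with a single spam-like word: classified spam.
        System.out.println(classify(new double[]{0.05}, new double[]{0.0001}));
        // Same email padded with three ham-like words flips the decision to ham.
        System.out.println(classify(
            new double[]{0.05, 0.001, 0.001, 0.001},   // P(word|spam) per word
            new double[]{0.0001, 0.02, 0.02, 0.02}));  // P(word|ham) per word
    }
}
```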
3 Neural Networks
4.2
Design a two-input perceptron that implements the Boolean function A ∧ ¬B. Then design a two-layer
network of perceptrons that implements A XOR B.
Weights: W0 = -0.8, W1 = 0.5, W2 = -0.5
Vector of input values <X0, X1, X2> and perceptron output:

X0   X1   X2   Output
 1    1    1     -1
 1    1   -1      1
 1   -1   -1     -1
 1   -1    1     -1
Hidden layer
Node 1 (A ∧ ¬B) output:
(-0.8*1) + (0.5*1) + (-0.5*1) = -0.8 → -1
(-0.8*1) + (0.5*1) + (-0.5*-1) = 0.2 → 1
(-0.8*1) + (0.5*-1) + (-0.5*-1) = -0.8 → -1
(-0.8*1) + (0.5*-1) + (-0.5*1) = -1.8 → -1
Node 2 (¬A ∧ B) output:
(-0.8*1) + (-0.5*1) + (0.5*1) = -0.8 → -1
(-0.8*1) + (-0.5*1) + (0.5*-1) = -1.8 → -1
(-0.8*1) + (-0.5*-1) + (0.5*-1) = -0.8 → -1
(-0.8*1) + (-0.5*-1) + (0.5*1) = 0.2 → 1
Table continues — output layer (X1 and X2 are the outputs of the two hidden nodes):

X0   X1 = Node 1   X2 = Node 2   Output
 1       -1            -1          -1
 1        1            -1           1
 1       -1            -1          -1
 1       -1             1           1
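The network above can be checked mechanically. The hidden-unit weights are the ones from the tables; the output unit is an OR of the two hidden nodes, whose weights (W0 = 0.3, W1 = 0.5, W2 = 0.5) are an assumed consistent choice since the write-up does not list them explicitly:

```java
// Sketch verifying that the two-layer network computes A XOR B on ±1 inputs.
public class XorNet {
    // Threshold unit: sign of w0 + w1*x1 + w2*x2.
    static int perceptron(double w0, double w1, double w2, int x1, int x2) {
        return (w0 + w1 * x1 + w2 * x2) > 0 ? 1 : -1;
    }

    static int xor(int a, int b) {
        int n1 = perceptron(-0.8, 0.5, -0.5, a, b);  // hidden Node 1: A AND NOT B
        int n2 = perceptron(-0.8, -0.5, 0.5, a, b);  // hidden Node 2: NOT A AND B
        return perceptron(0.3, 0.5, 0.5, n1, n2);    // output: OR (assumed weights)
    }

    public static void main(String[] args) {
        for (int a : new int[]{1, -1})
            for (int b : new int[]{1, -1})
                System.out.printf("A=%2d B=%2d -> %2d%n", a, b, xor(a, b));
    }
}
```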
4.8
Revise the BACKPROPAGATION algorithm in Table 4.2 so that it operates on units using the squashing
function tanh in place of the sigmoid function. That is, assume the output of a single unit is
o = tanh(w · x). Give the weight update rule for output layer weights and hidden layer weights.

Since d/dx tanh(x) = 1 - tanh^2(x), the error terms become:

For each output unit k:  δ_k ← (1 - o_k^2)(t_k - o_k)
For each hidden unit h:  δ_h ← (1 - o_h^2) Σ_{k ∈ outputs} w_kh δ_k

The weight update itself is unchanged: w_ji ← w_ji + η δ_j x_ji.
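These update rules can be sketched on a tiny 2-input, one-hidden-unit, one-output network. The weights, input, target, and learning rate below are all made-up illustrative values:

```java
// Sketch of one backpropagation step with tanh units.
public class TanhBackprop {
    // Output unit error term: delta_k = (1 - ok^2)(tk - ok).
    static double deltaOutput(double ok, double tk) {
        return (1 - ok * ok) * (tk - ok);
    }

    // Hidden unit error term: delta_h = (1 - oh^2) * sum over outputs of w_kh * delta_k.
    static double deltaHidden(double oh, double downstreamSum) {
        return (1 - oh * oh) * downstreamSum;
    }

    public static void main(String[] args) {
        double eta = 0.1;                 // learning rate (hypothetical)
        double[] x = {1.0, 0.5};          // input vector (hypothetical)
        double[] wHidden = {0.2, -0.3};   // hidden-unit weights (hypothetical)
        double wOut = 0.4;                // hidden-to-output weight (hypothetical)
        double t = 1.0;                   // target output

        // Forward pass with tanh units.
        double oh = Math.tanh(wHidden[0] * x[0] + wHidden[1] * x[1]);
        double ok = Math.tanh(wOut * oh);

        // Backward pass: error terms, then gradient-ascent weight updates.
        double dK = deltaOutput(ok, t);
        double dH = deltaHidden(oh, wOut * dK);
        wOut += eta * dK * oh;
        for (int i = 0; i < x.length; i++) wHidden[i] += eta * dH * x[i];

        System.out.printf("deltaK=%.4f deltaH=%.4f%n", dK, dH);
    }
}
```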
4 Ensemble Methods
Description of the Code
My code implementing bagging on top of my ID3 decision tree algorithm is contained in
ID3Bagging.java. It has a method createDecisionTrees() which creates, and stores as a field, an
array of decision trees (classifiers) that use the given threshold. The method testData() uses
these trees to predict the labels of the examples. It prints the prediction accuracy for
1, 2, ..., 40 decision trees, calling testDataSetNumSamples(), which actually tests the data,
taking the mean vote of a given number of samples (decision trees), and computes the accuracy.
When the program is run, it prints the training and test accuracies for 1, 2, ..., 40 samplings,
once for each threshold value (1.0, 0.05, 0.01).
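The bootstrap-and-vote core of bagging can be sketched generically. This is not the actual ID3Bagging.java; the Classifier interface and ±1 labels below are placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Generic sketch of bagging's two ingredients: bootstrap sampling and majority voting.
public class BaggingSketch {
    interface Classifier { int predict(double[] example); }

    // Draw a bootstrap sample: n examples chosen uniformly with replacement.
    static <T> List<T> bootstrap(List<T> data, Random rng) {
        List<T> sample = new ArrayList<>();
        for (int i = 0; i < data.size(); i++)
            sample.add(data.get(rng.nextInt(data.size())));
        return sample;
    }

    // Majority vote over ±1 labels from the first k classifiers (ties broken toward +1).
    static int majorityVote(List<Classifier> trees, int k, double[] example) {
        int sum = 0;
        for (int i = 0; i < k; i++) sum += trees.get(i).predict(example);
        return sum >= 0 ? 1 : -1;
    }
}
```

Each tree would be trained on its own bootstrap sample, and accuracy reported for growing k, mirroring the 1 through 40 sweep described above.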