
Autumn Johnson

CSE 446

2 Naïve Bayes Classifier


Description of the Code
SpamClassifier.java stores and calculates the important probabilities for the Naïve Bayes
implementation upon construction. It has two public methods, getHamPriorProb() and
getSpamPriorProb(), that return the prior probabilities of an email being ham or spam. It also has a
method getCondProbGivenWords(), which returns the conditional probability of an email's
classification given a word in its text (i.e., P(spam | word) or P(ham | word), depending on the value
of the passed boolean parameter). This method uses either a Map of probabilities P(word | spam) or one
of P(word | ham), calculated during preprocessing in calculateCondProb() and getCondProb(). Finally,
it also has a method that returns a List of the 5 most likely words given that an email is spam or ham.
SpamClassifierTest.java parses the testing data and uses the probabilities provided by the
classifier, together with the Naïve Bayes assumption, to calculate the conditional probabilities
P(spam | word1, word2, ..., wordn) and P(ham | word1, word2, ..., wordn). It compares these to classify
whether an email is spam or ham and then prints the prediction accuracy. It also prints other
information about the training data, including the prior probabilities and the 5 most likely words of a
spam or ham email.
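The decision rule described above can be sketched as follows. This is a minimal illustration, not the actual SpamClassifier.java API: the class name, method names, and probability values here are hypothetical stand-ins.

```java
import java.util.Map;

// Illustrative sketch of the Naive Bayes decision rule; the word
// probabilities below are made up for demonstration only.
public class NaiveBayesSketch {
    // P(word | spam) and P(word | ham), assumed precomputed during training.
    static Map<String, Double> condProbSpam =
            Map.of("enron", 0.30, "corp", 0.20, "meeting", 0.01);
    static Map<String, Double> condProbHam =
            Map.of("enron", 0.25, "corp", 0.05, "meeting", 0.40);
    static double priorSpam = 0.5736667, priorHam = 0.4263334;

    // Returns true if the email's words make spam more likely than ham.
    // Log-probabilities avoid underflow when multiplying many small terms.
    static boolean classify(String[] words) {
        double logSpam = Math.log(priorSpam), logHam = Math.log(priorHam);
        for (String w : words) {
            // Unknown words are skipped in this toy sketch; the real
            // classifier smooths them with the m-estimate instead.
            if (condProbSpam.containsKey(w)) logSpam += Math.log(condProbSpam.get(w));
            if (condProbHam.containsKey(w))  logHam  += Math.log(condProbHam.get(w));
        }
        return logSpam > logHam;
    }

    public static void main(String[] args) {
        System.out.println(classify(new String[]{"enron", "corp"}) ? "spam" : "ham");
    }
}
```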
Prior probabilities:
P(spam) = 57.36667%
P(ham) = 42.63334%
Five most likely words given a document is spam:
1. a
2. corp
3. to
4. enron
5. the
Five most likely words given a document is ham:
1. a
2. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3. to
4. enron
5. the
Accuracy: 88.7%

M Value vs Testing Accuracy

What assumptions are we making when the value of m is very large vs. very small? How does this affect
the test accuracy?
As m approaches infinity, the probability of any given word appearing in a spam/ham email,
namely P(word | spam) or P(word | ham), approaches p. In effect, a large m assumes each word has been
seen many times, while a small m assumes it has been seen few times or even once. A large m therefore
smooths away the differences between the words. In my experiments, the testing accuracy dropped, quite
substantially so when the m value was increased by a factor of 1000 or 10000.
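The smoothing behind this behavior is the standard m-estimate. Here is a minimal sketch, assuming a uniform prior p = 1/|vocabulary|; the class, method, and variable names are illustrative:

```java
// Sketch of the m-estimate for P(word | class):
//   P(word | class) = (nWord + m * p) / (nTotal + m)
// nWord:  occurrences of the word in the class's training emails
// nTotal: total word occurrences in that class
// p:      prior estimate for the word (e.g. 1 / vocabulary size)
// m:      equivalent sample size
public class MEstimate {
    static double condProb(int nWord, int nTotal, double p, double m) {
        return (nWord + m * p) / (nTotal + m);
    }

    public static void main(String[] args) {
        double p = 1.0 / 10000; // uniform prior over a 10,000-word vocabulary
        // Small m: estimate stays near the raw frequency 5/1000 = 0.005.
        System.out.println(condProb(5, 1000, p, 1));
        // Huge m: estimate is pulled toward p, smoothing away word differences.
        System.out.println(condProb(5, 1000, p, 1_000_000));
    }
}
```

As m grows, the estimate for every word collapses toward p, which is exactly the loss of discriminative information described above.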
If you were a spammer, how would you modify your emails to beat the classifiers we have learned
above?
If I were a spammer, I would include words in my spam emails that commonly appear in ham
emails, and use them with a frequency similar to that found in ham emails. This would be effective
because it would decrease the likelihood that the probability of an email being spam given the words in
its text exceeds the probability of it being ham given those same words.

3 Neural Networks
4.2
Design a two-input perceptron that implements the Boolean function A ∧ ¬B.
Weights: W0 = -0.8, W1 = 0.5, W2 = -0.5
Vector of input values <X0, X1, X2>, where X0 is the fixed bias input, X1 = A, X2 = B:

X0  X1  X2    (W0*X0) + (W1*X1) + (W2*X2)              Output
 1   1   1    (-0.8*1) + (0.5*1) + (-0.5*1) = -0.8      -1
 1   1  -1    (-0.8*1) + (0.5*1) + (-0.5*-1) = 0.2       1
 1  -1  -1    (-0.8*1) + (0.5*-1) + (-0.5*-1) = -0.8    -1
 1  -1   1    (-0.8*1) + (0.5*-1) + (-0.5*1) = -1.8     -1
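The table above can be reproduced with a small sketch of the perceptron; the class name is illustrative:

```java
// Perceptron for A AND (NOT B) with weights W0 = -0.8, W1 = 0.5, W2 = -0.5,
// using the +1/-1 input encoding from the table; X0 = 1 is the bias input.
public class AndNotPerceptron {
    static final double W0 = -0.8, W1 = 0.5, W2 = -0.5;

    static int output(int a, int b) {
        double net = W0 * 1 + W1 * a + W2 * b;
        return net > 0 ? 1 : -1; // sign-threshold activation
    }

    public static void main(String[] args) {
        // Reproduces the table: only A = 1, B = -1 fires.
        System.out.println(output(1, 1));   // -1
        System.out.println(output(1, -1));  //  1
        System.out.println(output(-1, -1)); // -1
        System.out.println(output(-1, 1));  // -1
    }
}
```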

Design a two-layer network of perceptrons that implements A XOR B.


A XOR B = (A ∧ ¬B) ∨ (¬A ∧ B)
Weights:
Node 1 unit: W0 = -0.8, W1 = 0.5, W2 = -0.5
Node 2 unit: W0 = -0.8, W1 = -0.5, W2 = 0.5
Output unit: W0 = 0.3, W1 = 0.5, W2 = 0.5

Vector of input values <X0, X1, X2>, where X0 is the fixed bias input, X1 = A, X2 = B.

Hidden layer
Node 1 (A ∧ ¬B) output:
X0  X1  X2    (W0*X0) + (W1*X1) + (W2*X2)              Output
 1   1   1    (-0.8*1) + (0.5*1) + (-0.5*1) = -0.8      -1
 1   1  -1    (-0.8*1) + (0.5*1) + (-0.5*-1) = 0.2       1
 1  -1  -1    (-0.8*1) + (0.5*-1) + (-0.5*-1) = -0.8    -1
 1  -1   1    (-0.8*1) + (0.5*-1) + (-0.5*1) = -1.8     -1

Node 2 (¬A ∧ B) output:
X0  X1  X2    (W0*X0) + (W1*X1) + (W2*X2)              Output
 1   1   1    (-0.8*1) + (-0.5*1) + (0.5*1) = -0.8      -1
 1   1  -1    (-0.8*1) + (-0.5*1) + (0.5*-1) = -1.8     -1
 1  -1  -1    (-0.8*1) + (-0.5*-1) + (0.5*-1) = -0.8    -1
 1  -1   1    (-0.8*1) + (-0.5*-1) + (0.5*1) = 0.2       1

Table continues: the output unit takes the hidden-layer outputs as its inputs.
Vector of input values <X0, X1, X2>, where X0 is the bias input, X1 = Node 1 output, X2 = Node 2 output:

X0  X1  X2    (W0*X0) + (W1*X1) + (W2*X2)              Output
 1  -1  -1    (0.3*1) + (0.5*-1) + (0.5*-1) = -0.7      -1
 1   1  -1    (0.3*1) + (0.5*1) + (0.5*-1) = 0.3         1
 1  -1  -1    (0.3*1) + (0.5*-1) + (0.5*-1) = -0.7      -1
 1  -1   1    (0.3*1) + (0.5*-1) + (0.5*1) = 0.3         1
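The full two-layer network can likewise be sketched and checked in code; the class name is illustrative, and the weights are the ones given above:

```java
// Two-layer XOR network: hidden unit h1 computes A AND (NOT B),
// hidden unit h2 computes (NOT A) AND B, and the output unit ORs them.
public class XorNetwork {
    static int sign(double net) { return net > 0 ? 1 : -1; }

    static int xor(int a, int b) {
        int h1 = sign(-0.8 + 0.5 * a - 0.5 * b); // Node 1: A AND (NOT B)
        int h2 = sign(-0.8 - 0.5 * a + 0.5 * b); // Node 2: (NOT A) AND B
        return sign(0.3 + 0.5 * h1 + 0.5 * h2);  // output unit: h1 OR h2
    }

    public static void main(String[] args) {
        System.out.println(xor(1, 1));   // -1
        System.out.println(xor(1, -1));  //  1
        System.out.println(xor(-1, -1)); // -1
        System.out.println(xor(-1, 1));  //  1
    }
}
```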

4.8
Revise the BACKPROPAGATION algorithm in Table 4.2 so that it operates on units using the squashing
function tanh in place of the sigmoid function. That is, assume the output of a single unit is
o = tanh(w · x). Give the weight update rule for the output layer weights and the hidden layer weights.

Since the derivative of tanh(x) is 1 - tanh^2(x), the sigmoid's derivative factor o(1 - o) is replaced
by (1 - o^2). The algorithm should be modified such that step (T4.3) becomes

    δk ← (1 - ok^2)(tk - ok)

and step (T4.4) becomes

    δh ← (1 - oh^2) Σ_{k ∈ outputs} wkh * δk

The weight update rule itself, Δwji = η δj xji, is unchanged for both layers.
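The (1 - o^2) factor rests entirely on the tanh derivative identity; this small sketch (illustrative class name) verifies it numerically against a central finite difference:

```java
// Numerical check that d/dx tanh(x) = 1 - tanh^2(x), the identity
// behind replacing o(1 - o) with (1 - o^2) in steps T4.3 and T4.4.
public class TanhDerivative {
    static double analytic(double x) { return 1 - Math.tanh(x) * Math.tanh(x); }

    // Central finite-difference approximation of the derivative.
    static double numeric(double x) {
        double h = 1e-6;
        return (Math.tanh(x + h) - Math.tanh(x - h)) / (2 * h);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-2, -0.5, 0, 0.5, 2}) {
            System.out.printf("x=%5.1f  analytic=%.6f  numeric=%.6f%n",
                              x, analytic(x), numeric(x));
        }
    }
}
```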

4 Ensemble Methods
Description of the Code
My code implementing bagging on top of my ID3 decision tree algorithm is contained in
ID3Bagging.java. It has a method called createDecisionTrees(), which creates and stores as a field an
array of decision trees (classifiers) that use the given threshold. The method testData() uses these
trees to predict the labels of the examples. It prints the prediction accuracy for 1, 2, ..., 40 decision
trees, calling testDataSetNumSamples(), which actually tests the data, taking the majority vote of a
given number of samples (decision trees), and computes the accuracy. When the program is run, it prints
the training and test accuracies for 1, 2, ..., 40 samplings, once for each threshold value (1.0, 0.05,
0.01).
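The two ingredients described above, sampling with replacement and the per-example vote, can be sketched minimally as follows; the class and method names are illustrative, not the actual ID3Bagging.java API:

```java
import java.util.Random;

// Sketch of the two core bagging steps: bootstrap sampling to train each
// tree, and a majority vote over the trees' predictions at test time.
public class BaggingVote {
    // Bootstrap sample: draw n example indices with replacement, one sample
    // per tree, so each tree trains on a slightly different data set.
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = rng.nextInt(n);
        return idx;
    }

    // votes[i] is the i-th decision tree's prediction (+1 or -1) for one
    // example; the ensemble predicts the sign of the vote sum.
    static int majorityVote(int[] votes) {
        int sum = 0;
        for (int v : votes) sum += v;
        return sum >= 0 ? 1 : -1; // ties broken toward +1 in this sketch
    }

    public static void main(String[] args) {
        // Three of four hypothetical trees vote +1, so the ensemble says +1.
        System.out.println(majorityVote(new int[]{1, 1, -1, 1})); // 1
    }
}
```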

[Chart: Number of Samples vs. Accuracy, Threshold = 1.0 — series: training data w/ sampling, testing data w/ sampling, without sampling]

[Chart: Number of Samples vs. Accuracy, Threshold = 0.05 — series: training data w/ sampling, testing data w/ sampling, w/o sampling]

[Chart: Number of Samples vs. Accuracy, Threshold = 0.01 — series: training data w/ sampling, testing data w/ sampling, w/o sampling]

Is Bagging useful when p != 1.0 ? Explain.


Yes, bagging is useful when p != 1.0, since it clearly increases my accuracies once the number of
samples is large enough. Even though a threshold p != 1.0 (such as 0.05 or 0.01) prunes the tree,
pruning does not eliminate all of the overfitting. Bagging is therefore still helpful, since it
addresses the remaining overfitting as well.
