Evgueni’s part
Question 1
1. As you probably know, a learning problem is said to be well-posed if and only if the class of tasks T, the performance measure P, and the experience E are determined. Please provide one learning problem described in terms of T, P, and E. In addition, provide a possible solution to the problem.
A Possible Answer to Question 1
• Task T: to improve the classification skills of a medical doctor.
• Performance Measure P: the accuracy of the doctor on new patient cases.
• Experience E: previous patient cases considered by the doctor. Each case can be: (1) positive if the doctor's diagnosis was correct, or (2) negative if the doctor's diagnosis was incorrect.
[Figure: decision-tree learning with a validation set]
A Possible Answer to Question 3
If the growing set is large, the learned decision tree is large and, since the data is split between the two sets, the validation set is small. The accuracy estimates for the tree's nodes on such a small validation set are low and unreliable, so more nodes are pruned. Thus, the final decision tree will be small. In addition, the final decision tree will be inaccurate, since a small validation set is unrepresentative.
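This effect can be illustrated with a quick simulation (a hypothetical sketch: the node's true accuracy of 0.70 and the validation-set sizes are invented for illustration). The noisier the validation estimate, the less trustworthy each pruning decision:

```python
import random

random.seed(42)

def estimated_accuracy(true_accuracy, n_validation):
    """Estimate a node's accuracy from n_validation held-out cases."""
    hits = sum(random.random() < true_accuracy for _ in range(n_validation))
    return hits / n_validation

# Repeat the estimate 200 times for a small and a large validation set:
# the small set yields a much wider spread of estimated accuracies.
for n in (10, 1000):
    estimates = [estimated_accuracy(0.70, n) for _ in range(200)]
    print(f"n={n:4d}: estimates span {min(estimates):.2f}..{max(estimates):.2f}")
```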
Question 4
Consider the following data table, describing people,
where ‘class’ (0 or 1) is the class of the instances for
training a classifier.
hair    location   children   size   SIN           class
brown   ottawa     3          big    ‘650786281’   0
              Predicted
              pos    neg
True   pos    tpr    fnr
       neg    fpr    tnr
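These rates follow directly from raw confusion-matrix counts; a minimal sketch (the counts below are invented for illustration):

```python
def rates(tp, fn, fp, tn):
    """Compute tpr, fnr, fpr, tnr from raw confusion-matrix counts."""
    P = tp + fn  # actual positives
    N = fp + tn  # actual negatives
    return {"tpr": tp / P, "fnr": fn / P, "fpr": fp / N, "tnr": tn / N}

# Example with invented counts: 80 true positives, 20 false negatives,
# 30 false positives, 70 true negatives.
print(rates(tp=80, fn=20, fp=30, tn=70))
```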
Answer to Question 5
The classifiers that lie on the diagonal (0,0)-(1,1) have the property that tpr is equal to fpr. Since P = N, we have:

Acc = (tpr·P + tnr·N) / (P + N)
    = (tpr·P + (1 − fpr)·N) / (P + N)     (since tnr = 1 − fpr)
    = (tpr·P + (1 − tpr)·N) / (P + N)     (since tpr = fpr on the diagonal)
    = (tpr·(P − N) + N) / (P + N)
    = N / (P + N)                         (since P = N)
    = 0.5
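The derivation can be checked numerically for any tpr = fpr with P = N; a small sketch (the class counts below are arbitrary, only P = N matters):

```python
def accuracy(tpr, fpr, P, N):
    """Overall accuracy from the true-positive and false-positive rates."""
    tnr = 1 - fpr
    return (tpr * P + tnr * N) / (P + N)

# Every classifier on the diagonal (tpr == fpr) with P == N scores 0.5,
# no matter where on the diagonal it lies.
for t in (0.1, 0.5, 0.9):
    print(round(accuracy(tpr=t, fpr=t, P=500, N=500), 10))  # 0.5 each time
```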
[Figure: ROC space]
Question 6
Assume that we have an instance space described
by two discrete attributes and a training set
consisting of a single instance repeated 100 times.
In 80 of the 100 cases, the instance is labeled as
positive; in the other 20, it is labeled as negative.
What will be the posterior positive-class
probabilities that the Naïve Bayes classifier
provides for this instance, assuming that this
classifier has been trained using the 100-example
data set? Please explain your answer.
Answer to Question 6
• P(+) = 0.8; P(A1|+) = 1.0; P(A2|+) = 1.0
• P(−) = 0.2; P(A1|−) = 1.0; P(A2|−) = 1.0
Thus,
• P(+|A1,A2)
  = P(+) · P(A1,A2|+) / P(A1,A2)
  = P(+) · P(A1,A2|+) / (P(A1,A2|+)·P(+) + P(A1,A2|−)·P(−))     (law of total probability)
  = P(+) · P(A1|+)·P(A2|+) / (P(A1|+)·P(A2|+)·P(+) + P(A1|−)·P(A2|−)·P(−))     (conditional independence)
  = 0.8 / (0.8 + 0.2) = 0.8
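The computation above can be reproduced in a few lines (a sketch of this specific calculation with a hypothetical helper name, not a full Naïve Bayes implementation):

```python
def nb_posterior_positive(prior_pos, liks_pos, prior_neg, liks_neg):
    """P(+|x) via Bayes' rule, multiplying per-attribute likelihoods
    under the naive conditional-independence assumption."""
    score_pos = prior_pos
    for p in liks_pos:
        score_pos *= p
    score_neg = prior_neg
    for p in liks_neg:
        score_neg *= p
    return score_pos / (score_pos + score_neg)

# The single repeated instance: P(+)=0.8, P(-)=0.2, and every
# per-attribute likelihood is 1.0, so the posterior equals the prior.
print(nb_posterior_positive(0.8, [1.0, 1.0], 0.2, [1.0, 1.0]))  # 0.8
```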