ELE 364: HW #8

1. A test is used to predict if a person has a certain disease. The table below shows the test predictions
on 20 subjects. Based on the table, compute the following evaluation measures.

ID  Target  Prediction      ID  Target  Prediction
 1  True    True            11  False   False
 2  False   False           12  True    True
 3  True    True            13  True    False
 4  False   True            14  True    True
 5  False   True            15  False   True
 6  True    True            16  True    True
 7  False   False           17  True    True
 8  True    False           18  False   False
 9  True    True            19  True    True
10  True    True            20  False   False

(a) A confusion matrix and classification accuracy.
(b) The average class accuracy using the arithmetic mean and harmonic mean.
(c) The precision, recall, and F1 measure.
(d) Another test, whose confusion matrix is given below, is also used to detect the same disease.
Which performance metrics would you use to compare the tests? Which test is better?

                  Prediction
                  True   False
Target   True       9      1
         False      5      5
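
The measures in problem 1 can all be derived from the four confusion-matrix counts. The following is a minimal sketch, not a reference solution. It assumes that True is the positive class and that "average class accuracy" means the mean of the per-class recalls (recall on True and recall on False). The function names are illustrative.

```python
from statistics import harmonic_mean

# Target and Prediction columns transcribed from the table in problem 1 (IDs 1 to 20).
T, F = True, False
target     = [T, F, T, F, F, T, F, T, T, T, F, T, T, T, F, T, T, F, T, F]
prediction = [T, F, T, T, T, T, F, F, T, T, F, T, F, T, T, T, T, F, T, F]


def confusion_counts(target, prediction):
    """Return (TP, FN, FP, TN), treating True as the positive class (assumption)."""
    pairs = list(zip(target, prediction))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fn, fp, tn


def evaluation_measures(tp, fn, fp, tn):
    """Compute the measures from parts (a) to (c) from the four counts."""
    recall = tp / (tp + fn)        # accuracy on the positive class (TPR)
    specificity = tn / (tn + fp)   # accuracy on the negative class (TNR)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "avg_class_accuracy_arithmetic": (recall + specificity) / 2,
        "avg_class_accuracy_harmonic": harmonic_mean([recall, specificity]),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


first_test = evaluation_measures(*confusion_counts(target, prediction))
second_test = evaluation_measures(tp=9, fn=1, fp=5, tn=5)  # matrix from part (d)

for name in first_test:
    print(f"{name:32s} test 1: {first_test[name]:.3f}   test 2: {second_test[name]:.3f}")
```

Printing both tests side by side is one way to approach part (d). Which measure matters most depends on the relative cost of a missed disease case versus a false alarm, which the problem leaves to the reader.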

2. The table below shows the predictions made for a continuous target feature by two different
prediction models for a test set.

ID  Target  Model 1 Prediction  Model 2 Prediction
 1  1.200   1.400               1.700
 2  2.100   2.500               2.700
 3  3.700   3.300               3.300
 4  4.100   4.700               4.800
 5  5.300   6.300               4.900
 6  5.700   5.100               5.300
 7  3.200   3.000               2.900
 8  7.100   6.700               6.500
 9  4.300   4.700               4.900
10  4.500   4.000               4.400

(a) Based on these predictions, calculate the sum of squared errors. Which model is better based
on this evaluation measure?
(b) Calculate the R² measure. Which model is better based on the R² measure?
(c) Based on the evaluation measures calculated, which model do you think is performing better
for this dataset?
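
Both measures in problem 2 follow directly from the prediction errors. The sketch below assumes the plain definition of the sum of squared errors, without the factor of 1/2 that some textbooks include, and R² = 1 - SSE / (total sum of squares about the target mean). The factor of 1/2, if used, cancels in R². Function names are illustrative.

```python
# Columns transcribed from the table in problem 2.
target  = [1.2, 2.1, 3.7, 4.1, 5.3, 5.7, 3.2, 7.1, 4.3, 4.5]
model_1 = [1.4, 2.5, 3.3, 4.7, 6.3, 5.1, 3.0, 6.7, 4.7, 4.0]
model_2 = [1.7, 2.7, 3.3, 4.8, 4.9, 5.3, 2.9, 6.5, 4.9, 4.4]


def sum_of_squared_errors(target, predicted):
    """Sum of (target - prediction)^2; no 1/2 factor (assumption)."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted))


def r_squared(target, predicted):
    """1 - SSE / total sum of squares about the mean of the targets."""
    mean_target = sum(target) / len(target)
    total_sum_of_squares = sum((t - mean_target) ** 2 for t in target)
    return 1 - sum_of_squared_errors(target, predicted) / total_sum_of_squares


for name, predicted in [("Model 1", model_1), ("Model 2", model_2)]:
    sse = sum_of_squared_errors(target, predicted)
    r2 = r_squared(target, predicted)
    print(f"{name}: SSE = {sse:.3f}, R^2 = {r2:.4f}")
```

Because both models are scored against the same targets, the total sum of squares is identical for both, so the two measures necessarily rank the models the same way here.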

3. A company develops two different models to predict the behavior of its stock in the market. The
models predict whether the stock will rise or fall. They measure the true positive rate (TPR) and
false positive rate (FPR) based on the prediction scores and resulting predictions of the models.
The table below shows the TPR and FPR calculated at four threshold values. The threshold
values are different for each model.

Model 1                         Model 2
TPR  0.4  0.5  0.6  0.9  1      TPR  0.2  0.3  0.7  1    1
FPR  0.3  0.5  0.7  0.9  1      FPR  0.2  0.4  0.6  0.8  1

(a) Plot the ROC curve for each model. Assume the TPR value between two threshold values is
equal to the TPR value calculated at the higher threshold value. For example, for Model 1,
TPR is 0.4 at FPR = 0.2 and TPR is 0.9 at FPR = 0.8.
(b) Compute the area under the curve (AUC) for each model. Which model performs better?
(c) Compute the Gini coefficient for each model.
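
Under the step assumption in part (a), the area under each ROC curve is a sum of rectangles. The sketch below reads that assumption the way the worked example suggests: on each FPR interval, the TPR equals the value tabulated at the interval's right end (so Model 1 has TPR 0.4 for FPR between 0 and 0.3). That reading, the starting point FPR = 0, and the relation Gini = 2 × AUC - 1 are assumptions of this sketch, and the function names are illustrative.

```python
# (FPR, TPR) rows transcribed from the table in problem 3.
model_1 = {"fpr": [0.3, 0.5, 0.7, 0.9, 1.0], "tpr": [0.4, 0.5, 0.6, 0.9, 1.0]}
model_2 = {"fpr": [0.2, 0.4, 0.6, 0.8, 1.0], "tpr": [0.2, 0.3, 0.7, 1.0, 1.0]}


def step_auc(fpr, tpr):
    """Area under a step ROC curve.

    Each interval (previous FPR, current FPR] contributes a rectangle whose
    height is the TPR tabulated at the current (right-hand) point.
    """
    area = 0.0
    previous_fpr = 0.0  # the curve is assumed to start at FPR = 0
    for x, y in zip(fpr, tpr):
        area += (x - previous_fpr) * y
        previous_fpr = x
    return area


def gini_coefficient(auc):
    """Gini coefficient derived from the AUC as 2 * AUC - 1."""
    return 2 * auc - 1


for name, model in [("Model 1", model_1), ("Model 2", model_2)]:
    auc = step_auc(model["fpr"], model["tpr"])
    print(f"{name}: AUC = {auc:.2f}, Gini = {gini_coefficient(auc):.2f}")
```

A trapezoidal rule would give different areas. The rectangle rule is used here only because the problem statement asks for a piecewise-constant curve.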

4. A news agency is using a model to predict the party affiliation of its subscribers. The table below
shows the prediction frequencies of the model at the time the model was built, for the month after
deployment, and for a month-long period one year after deployment.

Target          Original  Sample 1  Sample 2
Party A         300       160       350
Party B         400       180       200
Non-affiliated  200       120       300

(a) Draw the bar plots of these three sets of prediction frequencies. Does the model need to be
retrained at these points based on the frequency plots?
(b) Calculate the stability index for the periods of Sample 1 and Sample 2, and determine whether
the model should be retrained at these points. Does the change in the prediction distribution
indicate that the model does not work well anymore?
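
The stability index in part (b) compares the proportion of each predicted level at build time with its proportion in a later sample. The sketch below assumes the usual form, the sum over levels of (p - q) × ln(p / q), where p is the original proportion and q the new one. The thresholds in the comments (below 0.1, 0.1 to 0.25, above 0.25) are a commonly quoted rule of thumb rather than something stated in this assignment, and the function name is illustrative.

```python
from math import log

# Prediction frequencies transcribed from the table in problem 4,
# in the order Party A, Party B, Non-affiliated.
original = [300, 400, 200]
sample_1 = [160, 180, 120]
sample_2 = [350, 200, 300]


def stability_index(original_counts, new_counts):
    """Sum over levels of (p - q) * ln(p / q), with p and q as proportions.

    Assumes every level has a nonzero count in both sets; a zero count
    would make the logarithm undefined.
    """
    original_total = sum(original_counts)
    new_total = sum(new_counts)
    index = 0.0
    for a, b in zip(original_counts, new_counts):
        p = a / original_total
        q = b / new_total
        index += (p - q) * log(p / q)
    return index


# Rule of thumb (assumption): < 0.1 little change, 0.1 to 0.25 some change
# worth investigating, > 0.25 a significant shift in the distribution.
for name, sample in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    print(f"{name}: stability index = {stability_index(original, sample):.4f}")
```

The index is computed on proportions, so the different sample sizes (900, 460, and 850 predictions) do not by themselves affect the result.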
