
Assignment 1

Inductive Learning

Team members:

Sunny Bangale - shb170230


Rupali Sahay - rxs173730

1. Consider the problem of gradient descent that was discussed in class - you would
like to predict the number of A grades that a student in the second year of the
M.S. program receives (y) based on the number of A grades that the student
received in the first year of the M.S. program (x). You propose a hypothesis of the
form h(x) = θ0 + θ1x, where θ0 and θ1 are parameters that you want to find. The
data is presented below:

x    y
3    2
1    2
0    1
4    3

You start with an initial choice of parameters as: θ0 = 0 and θ1 = 1. You can assume
that the error function is:

J = (1 / (2m)) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2

where m is the number of training examples. Run at most 5 rounds of the gradient
descent algorithm discussed in class. Does your error go down after 5 rounds?
Show all the steps of your calculation.

Ans.
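A minimal sketch of the five gradient descent rounds in Python, assuming a learning rate of α = 0.1 (the problem does not fix one):

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x on the given data.
# The learning rate alpha = 0.1 is an assumption; the problem does not specify one.

xs = [3, 1, 0, 4]
ys = [2, 2, 1, 3]
m = len(xs)

theta0, theta1 = 0.0, 1.0  # initial parameters given in the problem
alpha = 0.1                # assumed learning rate

def cost(t0, t1):
    # J = (1 / (2m)) * sum over i of (h(x^(i)) - y^(i))^2
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(f"initial J = {cost(theta0, theta1):.4f}")  # 0.5
for r in range(1, 6):
    # Partial derivatives of J with respect to theta0 and theta1
    grad0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
    grad1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    theta0 -= alpha * grad0  # simultaneous update of both parameters
    theta1 -= alpha * grad1
    print(f"round {r}: theta0 = {theta0:.4f}, theta1 = {theta1:.4f}, "
          f"J = {cost(theta0, theta1):.4f}")
```

Under these assumptions the error decreases on every round, from J = 0.5 initially to roughly 0.27 after five rounds, so yes, the error goes down.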

2. Suppose there is a testing machine for a disease that can correctly identify the
disease in 80% of the cases, and in 90% of the cases it can correctly identify those
who do not have the disease.
Identify the False Positive and False Negative rates in percent.

Ans.
The testing machine correctly identifies the disease in 80% of the cases where it is present.
Suppose 100 people have the disease.

According to the first statement,

80 people who actually have the disease are correctly detected as having it.
Thus, True Positive = 80%

The remaining 20 people are not detected as having the disease, even though they actually have it.
Thus, False Negative = 20%

The testing machine correctly identifies 90% of the cases where the disease is absent.
Suppose 100 people do not have the disease.

According to the second statement,

90 people who do not have the disease are correctly detected as disease-free.
Thus, True Negative = 90%

The remaining 10 people are detected as having the disease, even though they do not have it.
Thus, False Positive = 10%
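The same arithmetic as a short check, using the standard names sensitivity and specificity (terms not used in the problem statement):

```python
# False Positive / False Negative rates from the two statements in the problem.
sensitivity = 0.80  # P(test positive | has disease), from the first statement
specificity = 0.90  # P(test negative | no disease), from the second statement

false_negative_rate = 1 - sensitivity  # diseased people the test misses
false_positive_rate = 1 - specificity  # healthy people the test flags

print(f"False Negative = {false_negative_rate:.0%}")  # 20%
print(f"False Positive = {false_positive_rate:.0%}")  # 10%
```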

3. What are the pros and cons of the following?

a. Selecting the most specific hypothesis (S) based on the training data.
b. Selecting the most general hypothesis (G) based on the training data.

Ans.
The pros and cons of selecting the most specific hypothesis (S) based on the training data:
Pros:
1. The most specific hypothesis is consistent with the training data.
2. If multiple hypotheses are consistent with the training examples, FIND-S will find the
most specific one.
3. The most specific hypothesis is very precise.
4. It covers all positive examples in the hypothesis space.
5. Every training example is classified consistently by the most specific hypothesis.

Cons:
1. There is no way to determine whether FIND-S has found the only hypothesis in H consistent
with the data, or whether there are many other consistent hypotheses as well.
2. A learning algorithm that outputs all consistent hypotheses would be preferable, because
at least it could characterize its uncertainty regarding the true identity of the target concept.
3. The most specific hypothesis has no scope for generalization.
4. It ignores negative examples and considers only the positive training cases.

The pros and cons of selecting the most general hypothesis (G) based on the training data:
Pros:
1. The most general hypothesis is consistent with the training data.
2. The most general hypothesis has a lot of scope for generalization.
3. It does not prematurely rule out any hypothesis that could still be consistent with the data.
4. It accounts for both positive and negative examples in the hypothesis space.

Cons:
1. It may produce false positive results.
2. It is very general, which might affect the outcome.
3. It can overgeneralize, since it stretches the hypothesis to accommodate the training data.
4. There is a lot of room for error with a large number of data points, since it tries to
accommodate all the data given.

4. What is a consistent hypothesis, and what is a version space?

A hypothesis is consistent when it agrees with the observed training data. Within the set of
consistent hypotheses, the maximally general members form the general boundary (G) and the
maximally specific members form the specific boundary (S).

A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each
example (x, c(x)) in D.

Consistent(h, D) ≡ (∀ ⟨x, c(x)⟩ ∈ D) h(x) = c(x)

The version space is a representation of all the hypotheses that are more general than (or equal
to) the specific boundary (S) and less general than (or equal to) the general boundary (G).

The version space, denoted VS_{H,D} with respect to hypothesis space H and training examples D,
is the subset of hypotheses from H consistent with the training examples in D.

VS_{H,D} = { h ∈ H | Consistent(h, D) }
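A minimal sketch of these definitions in Python, using conjunctive hypotheses over two boolean attributes; the toy dataset D is a hypothetical example, not from the assignment:

```python
from itertools import product

# A hypothesis is a tuple over {0, 1, '?'}; '?' matches either attribute value.
def matches(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

# Consistent(h, D): h(x) = c(x) for every (x, c(x)) in D.
def consistent(h, data):
    return all(matches(h, x) == bool(c) for x, c in data)

# Hypothetical toy dataset over two boolean attributes.
D = [((1, 1), 1), ((1, 0), 0)]

H = product([0, 1, '?'], repeat=2)       # the full hypothesis space H
VS = [h for h in H if consistent(h, D)]  # the version space VS_{H,D}
print(VS)  # [(1, 1), ('?', 1)]
```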

5. The most general hypothesis has a '?' (don't care) value for each attribute.

6. Consider the ML task of finding an approximation to the job finding problem for
UTD students i.e. the function f : X → Y where X is the set of attributes defined
below and Y is a Boolean output.
X = (x1, x2, x3, x4) such that
x1 is a boolean indicating whether GPA ≥ 3.5
x2 is a boolean indicating whether student has taken CS 6375
x3 is a boolean indicating whether student has taken CS 6350
x4 is a boolean indicating whether Years of Work Experience > 2

For each attribute, there can be three possible choices - 1, 0, or ? (don’t care).
a. How many instances, i.e. |X|, are possible?
b. How many labelings of these instances are possible? (Remember it's a binary
classification problem and each labeling represents a possible hypothesis.)
c. If you would like to limit the classifier to a decision tree of depth 2, how many
hypotheses are possible?
Hint: First choose 2 attributes out of 4, create decision trees out of them, and find
possible ways of labeling

Ans.
a) Number of instances possible: |X| = 3^4 = 81

b) Number of labelings possible = 2^|X| = 2^81

c) For a decision tree of depth 2, there are 4 leaf nodes, and each leaf can take one of the 2
possible outcomes (0, 1), giving 2^4 = 16 labelings.

The 2 splitting attributes can be chosen in 4C2 = 6 ways.
Therefore, total number of hypotheses = 4C2 * 2^4 = 6 * 16 = 96
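A quick sketch of the three counts, following the hint's reading that the two splitting attributes are chosen as an unordered pair and each leaf labeling gives a hypothesis:

```python
from math import comb

num_instances = 3 ** 4              # (a) three choices (0, 1, ?) per attribute
num_labelings = 2 ** num_instances  # (b) each of the 81 instances labeled 0 or 1

leaves = 2 ** 2                     # a depth-2 tree has 4 leaf nodes
num_depth2 = comb(4, 2) * 2 ** leaves  # (c) choose 2 of 4 attributes, label the leaves

print(num_instances)  # 81
print(num_labelings)  # 2^81
print(num_depth2)     # 96
```

Note that this counts labeled trees; some labelings yield the same boolean function, so 96 is an upper bound on the number of distinct hypotheses.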

7. Apply the Find-S algorithm on the following dataset for UTD students. There are 5
attributes
xGPA is a boolean indicating whether GPA > 3.5
xWorkEx is a boolean indicating whether Years of Work Experience > 2
xCS6375 is a boolean indicating whether student has taken CS 6375
xCS6350 is a boolean indicating whether student has taken CS 6350
xJava is a boolean indicating whether student has advanced Java skills

You are given the following dataset along with the class variable, i.e. the outcome variable,
where 1 indicates the student got an internship and 0 means the student didn't. Each data
point is in the form
((xGPA, xWorkEx, xCS6375, xCS6350, xJava ), outcome)

((1, 1, 0, 1, 1), 1)
((0, 1, 0, 1, 1), 0)
((1, 1, 1, 1, 0), 1)
((0, 0, 0, 1, 1), 0)
((1, 1, 1, 1, 1), 1)

Ans. Let Xi denote the i-th instance and Hi the hypothesis after processing it.

Step 1:
X1 = (1, 1, 0, 1, 1) and outcome = 1
H1 = (1, 1, 0, 1, 1)

Step 2:
X2 = (0, 1, 0, 1, 1) and outcome = 0 (negative example, so Find-S leaves the hypothesis unchanged)
H2 = (1, 1, 0, 1, 1)

Step 3:
X3 = (1, 1, 1, 1, 0) and outcome = 1 (positive example; attributes disagreeing with H2 are generalized to '?')
H3 = (1, 1, ?, 1, ?)

Step 4:
X4 = (0, 0, 0, 1, 1) and outcome = 0 (negative example, ignored)
H4 = (1, 1, ?, 1, ?)

Step 5:
X5 = (1, 1, 1, 1, 1) and outcome = 1
H5 = (1, 1, ?, 1, ?)

Thus, the final hypothesis as a result of applying the Find-S algorithm is (1, 1, ?, 1, ?), which
implies the following:
If xGPA = 1, i.e. GPA > 3.5,
and
if xWorkEx = 1, i.e. Years of Work Experience > 2,
and
if xCS6350 = 1, i.e. the student has taken CS 6350,
then the student will get an internship.
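A minimal sketch of Find-S on this dataset, reproducing the trace above:

```python
# Find-S: start from the first positive example and minimally generalize on each
# subsequent positive example; negative examples are ignored by the algorithm.

data = [
    ((1, 1, 0, 1, 1), 1),
    ((0, 1, 0, 1, 1), 0),
    ((1, 1, 1, 1, 0), 1),
    ((0, 0, 0, 1, 1), 0),
    ((1, 1, 1, 1, 1), 1),
]

h = None  # maximally specific: no positive example seen yet
for x, outcome in data:
    if outcome == 1:
        if h is None:
            h = list(x)  # first positive example becomes the hypothesis
        else:
            # Replace every attribute that disagrees with x by '?'
            h = [hv if hv == xv else '?' for hv, xv in zip(h, x)]

print(tuple(h))  # (1, 1, '?', 1, '?')
```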

8. Consider the decision tree shown below. There are two splitting attributes GPA
and years of work experience. The class labels are shown below the leaf nodes.
Write the final hypothesis represented by this decision tree in Disjunctive Normal
Form (DNF).

Figure 1: The labels near the leaf nodes represent class attribute
i.e. outcome

Ans. ((GPA < 3.5) ∧ (Exp ≥ 3)) ∨ ((GPA ≥ 3.5) ∧ (Exp ≥ 1))
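The same hypothesis written as an executable predicate, one disjunct per positive leaf of the tree (function and parameter names are illustrative):

```python
# The decision tree's final hypothesis in DNF form as an executable predicate.
def got_internship(gpa: float, exp: float) -> bool:
    return (gpa < 3.5 and exp >= 3) or (gpa >= 3.5 and exp >= 1)

print(got_internship(3.8, 1))  # True:  high GPA, 1 year of experience
print(got_internship(3.0, 2))  # False: low GPA needs at least 3 years
```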

9. Solve question 2.4 from Tom Mitchell’s book

10. Solve question 2.5 from Tom Mitchell’s book

