Home Assignment 5
The deadline for this assignment is 2 January 2018. You must submit your
individual solution electronically via the Absalon home page.
A solution consists of:
• A PDF file with detailed answers to the questions, which may include graphs
and tables if needed. Do not include your source code in the PDF file.
• A .zip file with all your solution source code (Matlab / R / Python scripts
/ C / C++ / Java / etc.) with comments about the major steps involved in
each question (see below). Source code must be submitted in the original
file format, not as PDF.
• IMPORTANT: Do NOT zip the PDF file, since zipped files cannot be
opened in the SpeedGrader. Zipped PDF submissions will not be graded.
• Your PDF report should be self-contained, i.e., it should be possible to
grade it without opening the .zip file. We do not guarantee that the .zip
file will be opened during grading.
• Your code should be structured such that there is one main file (or one main
file per question) that we can run to reproduce all the results presented in
your report. This main file can, if you like, call other files with functions,
classes, etc.
• Your code should include a README text file describing how to compile
and run your program, as well as a list of all relevant libraries needed for
compiling or using your code.
2. Bound the VC-dimension of H above (i.e., |H| = M).
4. You can use Point 1.4 in Home Assignment 4 to bound m_H(n), as well as
m_H(2n). Show that for n ≥ 2, bounding m_H(2n) directly leads to a tighter
result than bounding it via m_H(n)^2 using Point 3.
2 VC-dimension
1. Let H+ be the class of “positive” circles in R^2 (each h ∈ H+ is defined by
the center of the circle c ∈ R^2 and its radius r ∈ R; all points inside the
circle are labeled positively and all points outside negatively). Prove that
d_VC(H+) ≥ 3.
3. What kind of statement would you have to prove in order to show that
d_VC(H+) ≤ 3? What kind of statement would you have to prove in order
to show that d_VC(H) ≤ 4?
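A proof for Point 1 must exhibit a concrete 3-point set that is shattered by circles. As a sanity check only (it is not a substitute for the written proof), the following Python sketch brute-forces one candidate point set against a grid of hypothetical circles (the grid ranges and the point coordinates are my own choices, not part of the assignment) and verifies that all 2^3 labelings are realizable:

```python
import itertools
import numpy as np

# Candidate 3-point set (a hypothetical choice; any non-collinear set works).
points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])

# Candidate circles: centers on a coarse grid, a handful of radii.
centers = [(x, y) for x in np.arange(-3, 5.5, 0.5)
                  for y in np.arange(-3, 5.5, 0.5)]
radii = np.arange(0.5, 6.5, 0.5)

def realizable(labels):
    """True if some candidate circle induces this labeling
    (+1 strictly inside the circle, -1 outside)."""
    for c in centers:
        d = np.linalg.norm(points - np.array(c), axis=1)
        for r in radii:
            if np.array_equal(np.where(d < r, 1, -1), labels):
                return True
    return False

# The set is shattered iff all 2^3 = 8 labelings are realizable.
shattered = all(realizable(np.array(lab))
                for lab in itertools.product([1, -1], repeat=3))
print("shattered:", shattered)  # prints: shattered: True
```

Finding such a set numerically shows where to look; the proof itself should describe, for each of the eight labelings, a circle achieving it.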
book (Abu-Mostafa et al., 2012). If you would like to cite other sources, it is
advisable to double-check their trustworthiness with us.
In this question we use the following terminology. A high-probability bound is a
bound that holds with high probability for all hypotheses h in the relevant hy-
pothesis space. We call a bound trivial if the statement can be obtained without
making any calculations (for example, if a bound states that the probability of
some event is bounded by 2, then it is a trivial bound, because we know without
any calculations that probabilities are always bounded by 1).
The subpoints of this question progressively build on their predecessors. However,
the question is constructed in such a way that even if you get stuck at one point
you may still be able to proceed with subsequent points, so if this happens to
you, do not give up too early.
4. Assume that you would like to work with K different bandwidth parameters
γ_1, ..., γ_K. Let H = ∪_{i=1}^K H_{γ_i}. Derive a margin-based bound for
learning with H.
5. There are several ways of selecting γ. One way is to select the γ that
minimizes the generalization bound you have derived in the previous point.
Another way is to set aside a validation set S_val and select the γ that
minimizes the validation error L̂(w, S_val). Assume that you have K different
hypotheses w_1, ..., w_K corresponding to K different values of γ and you
pick the one that minimizes the validation error. Let us denote it by w* and
assume that the size of the validation set S_val is m. Derive a generalization
bound for w* in terms of its validation error.
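One standard shape for the bound asked for in Point 5 (a sketch under the assumption of a loss taking values in [0, 1]; K, m, and w* are as defined above, and this is not necessarily the intended derivation) combines Hoeffding's inequality with a union bound over the K candidate hypotheses:

```latex
\mathbb{P}\left(\exists i \in \{1,\dots,K\}:\;
  L(w_i) > \hat{L}(w_i, S_{\mathrm{val}}) + \varepsilon\right)
\le K e^{-2 m \varepsilon^2},
```

so that, with probability at least $1-\delta$, simultaneously for all $i$,

```latex
L(w_i) \le \hat{L}(w_i, S_{\mathrm{val}})
  + \sqrt{\frac{\ln(K/\delta)}{2m}} .
```

Since w* is one of the w_i, the simultaneous bound in particular applies to w*; note how the selection over K candidates enters only through the ln K term.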
In which case(s) do we need to modify the bound so that it takes into
account the selection of C, and in which case(s) do we not? In other words:
if you were required to rederive the bounds in Points 4 and 5 under the
assumption that you are tuning both γ and C, which bound(s) would remain
the same and which bound(s) would have an additional factor M appearing
somewhere inside the bound?
Please explain your answer. Simple yes/no answers will not be accepted.
4 Random Forests
4.1 Normalization
As discussed, normalizing each component to zero mean and variance one (mea-
sured on the training set) is a common preprocessing step, which can remove
undesired biases due to different scaling. Using this normalization affects differ-
ent classification methods differently.
• Is nearest neighbor classification affected by this type of normalization? If
your answer is yes, give an example. If your answer is no, provide convincing
arguments why not.
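The kind of effect the question asks about can be seen in a minimal Python sketch (the two training points, the query, and the feature scales are hypothetical choices made purely for illustration, not data from the assignment):

```python
import numpy as np

# Toy data: feature 1 lives on a much larger scale than feature 2.
X_train = np.array([[0.0, 0.0], [1000.0, 1.0]])
y_train = np.array([0, 1])
query = np.array([400.0, 1.0])

def nn_label(X, y, q):
    """Label of the 1-nearest neighbour under Euclidean distance."""
    return y[np.argmin(np.linalg.norm(X - q, axis=1))]

# On the raw data, feature 1 dominates the distance.
print("raw:", nn_label(X_train, y_train, query))  # prints: raw: 0

# z-score normalization, with statistics measured on the training set.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Xn, qn = (X_train - mu) / sigma, (query - mu) / sigma
print("normalized:", nn_label(Xn, y_train, qn))  # prints: normalized: 1
```

Here the nearest neighbour of the query changes after normalization, because rescaling changes the relative contribution of each feature to the Euclidean distance.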
References
Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin. Learning from Data.
AMLbook, 2012.