## Description

Machine Learning Foundations

Homework #3

Unless granted by the instructor in advance, you must turn in a printed/written copy of your solutions (without the source code) for all problems.

For problems marked with (*), please follow the guidelines on the course website and upload your source code to the designated places. You are encouraged (but not required) to include a README to help the TAs check your source code. Any programming language/platform is allowed.

Discussions on course materials and homework solutions are encouraged. But you should write the final solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but not copied from.

Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework solutions and/or source code to your classmates at any time.

You should write your solutions in English or Chinese with the common math notations introduced in class or in the problems. We do not accept solutions written in any other languages.

This homework set comes with 200 points and 20 bonus points. In general, every homework set comes with a full credit of 200 points, plus possible bonus points.

1. (80 points) Go register for the Coursera version of the second part of the class (https://www.coursera.org/teach/ntumlone-algorithmicfoundations/) and solve its Homework 3. The registration should be totally free. Then, record the highest score that you get within up to 3 trials. Please print out a snapshot of your score as evidence. (Hint: The problems below are simple extensions of the Coursera problems.)

2. (40 points) Consider the error function err(w) = max(0, −y w^T x). Prove or disprove that running SGD on this error function, while 'ignoring' the singular points at which it is not differentiable, results in PLA.
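One way to build intuition before writing the proof is to compare a single SGD step on this error function with a single PLA step. The sketch below is our own illustration in NumPy (the function names and the learning rate `eta` are assumptions for the sketch), not a substitute for the required written argument:

```python
import numpy as np

def sgd_step(w, x, y, eta=1.0):
    # One SGD step on err(w) = max(0, -y * w^T x).
    # Where y * w^T x < 0 the gradient of err is -y * x, so the update is
    # w <- w + eta * y * x; where y * w^T x > 0 the gradient is 0.
    # The non-differentiable point y * w^T x == 0 is "ignored" (no update).
    if y * np.dot(w, x) < 0:
        return w + eta * y * x
    return w

def pla_step(w, x, y):
    # PLA update: correct w only when (x, y) is misclassified by sign(w^T x).
    if np.sign(np.dot(w, x)) != y:
        return w + y * x
    return w
```

For a misclassified example with y w^T x < 0 and eta = 1, both functions produce the same update w + y x, which is the behavior the problem asks you to examine.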

3. (40 points) Write down the derivation steps of Question 17 of Homework 3 on Coursera.

4. (20 points, *) For Questions 19 and 20 of Homework 3 on Coursera, plot a figure that shows Ein(wt) as a function of t for both the gradient descent version and the stochastic gradient descent version on the same figure. Describe your findings. Please print out the figure for grading.

5. (20 points, *) For Questions 19 and 20 of Homework 3 on Coursera, plot a figure that shows Eout(wt) as a function of t for both the gradient descent version and the stochastic gradient descent version on the same figure. Describe your findings. Please print out the figure for grading.
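Problems 4 and 5 call for the same kind of figure, so one script structure covers both. The sketch below uses a randomly generated data set as a hypothetical stand-in for the Coursera data files (your actual solution would load the hw3 training/test sets instead), and the step size, iteration count, and cyclic example-picking for SGD are assumptions modeled on a typical logistic-regression setup, not a prescription:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical synthetic stand-in for the training set
# (replace with loading the actual Coursera data files).
N, d = 200, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # prepend x0 = 1
w_true = rng.normal(size=d + 1)
y = np.sign(X @ w_true)

def zero_one_error(w, X, y):
    # 0/1 classification error of sign(w^T x) on (X, y)
    return float(np.mean(np.sign(X @ w) != y))

def logistic_grad(w, X, y):
    # gradient of the cross-entropy error (1/N) sum ln(1 + exp(-y w^T x))
    theta = 1.0 / (1.0 + np.exp(y * (X @ w)))  # sigmoid(-y w^T x)
    return np.mean((theta * -y)[:, None] * X, axis=0)

eta, T = 0.01, 500
w_gd = np.zeros(d + 1)
w_sgd = np.zeros(d + 1)
ein_gd, ein_sgd = [], []
for t in range(T):
    w_gd -= eta * logistic_grad(w_gd, X, y)                  # full-batch GD
    n = t % N                                                # cyclic pick for SGD
    w_sgd -= eta * logistic_grad(w_sgd, X[n:n+1], y[n:n+1])  # single-example SGD
    ein_gd.append(zero_one_error(w_gd, X, y))
    ein_sgd.append(zero_one_error(w_sgd, X, y))

plt.plot(ein_gd, label="GD")
plt.plot(ein_sgd, label="SGD")
plt.xlabel("t")
plt.ylabel("Ein(wt)")
plt.legend()
plt.savefig("ein_vs_t.png")
```

For problem 5, the same loop applies with the error evaluated on the test set instead of the training set.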

Bonus: Smart ‘Cheating’

6. (Bonus 20 points) For a regression problem, the root-mean-square error (RMSE) of a hypothesis h on a test set {(x_n, y_n)}_{n=1}^{N} is defined as

$$\mathrm{RMSE}(h) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\bigl(y_n - h(x_n)\bigr)^2}.$$
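For concreteness, the definition above translates directly into a few lines of NumPy (the function name `rmse` is ours):

```python
import numpy as np

def rmse(h, X, y):
    # Root-mean-square error of hypothesis h on a test set:
    # sqrt( (1/N) * sum_n (y_n - h(x_n))^2 )
    preds = np.array([h(x) for x in X])
    return float(np.sqrt(np.mean((y - preds) ** 2)))
```

For instance, with labels y = (3, 4) and the all-zero hypothesis, the RMSE is sqrt((9 + 16)/2) = sqrt(12.5).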


Machine Learning Foundations (NTU, Fall 2018) instructor: Hsuan-Tien Lin

Please consider a case of knowing all the (test) x_n, none of the y_n, but being allowed to query RMSE(h) for some h. For any given set of hypotheses {h_1, h_2, ..., h_K}, assume that every RMSE(h_k) = e_k has been queried and is thus known. Also assume that RMSE(h_0) = e_0 is known, where h_0(x) = 0 for any x.

Let H(x) = \sum_{k=1}^{K} w_k h_k(x). Derive a closed-form solution to

$$\min_{w_1, w_2, \cdots, w_K} \mathrm{RMSE}(H)$$

from e_0, e_1, ..., e_K, h_0, h_1, ..., h_K, and x_1, x_2, ..., x_N. (The solution can be used to optimize the "test" performance that aggregates all h_k's together, which is a common trick used in data mining competitions to fight on the leaderboard.)
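The parenthetical claim can be checked numerically: the key observation is that the queried errors alone determine the inner products between the hidden labels and each h_k, since N e_k^2 = sum y_n^2 − 2 sum y_n h_k(x_n) + sum h_k(x_n)^2 and N e_0^2 = sum y_n^2. The sketch below, on randomly generated stand-in data, verifies that the weights recovered this way match the label-using least-squares solution; it illustrates the idea and is not the requested derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3
X = rng.normal(size=(N, 2))
y = rng.normal(size=N)  # hidden test labels (never used directly below, except to check)

# three hypothetical base hypotheses standing in for h_1, ..., h_K,
# evaluated on the known test inputs; columns of H are (h_k(x_n))_n
H = np.stack([X[:, 0], X[:, 1] ** 2, np.sin(X).sum(axis=1)], axis=1)  # shape (N, K)

# quantities the leaderboard would reveal
e0 = np.sqrt(np.mean(y ** 2))                        # RMSE of h_0(x) = 0
e = np.sqrt(np.mean((y[:, None] - H) ** 2, axis=0))  # RMSE(h_k) for each k

# recover the inner products y . h_k from the queried errors alone:
# N e_k^2 = sum y^2 - 2 y.h_k + sum h_k^2  and  N e_0^2 = sum y^2
b = 0.5 * (N * e0 ** 2 + (H ** 2).sum(axis=0) - N * e ** 2)

# minimizing RMSE(H)^2 is a quadratic in w, so w* solves (H^T H) w = b
w_star = np.linalg.solve(H.T @ H, b)

# sanity check against the (label-using) least-squares solution
w_direct, *_ = np.linalg.lstsq(H, y, rcond=None)
```

The two solutions agree because b recovered from the errors equals H^T y exactly, which is the right-hand side of the normal equations.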
