APM 598: Homework 1

Ex 1. [Overfitting]

A dataset {x_i, y_i}_{i=1…N} has been generated from an unknown polynomial P∗(x) (see Figure 1 and the code at the end of this document to load the data 'data_HW1_ex1.csv' in Python), i.e.

y_i = P∗(x_i) + ε_i,

where ε_i is some noise. The goal is to estimate the polynomial P∗ and its degree k∗ (i.e. k∗ = deg(P∗)).

Figure 1: Data points {x_i, y_i}_{i=1…N} generated from a polynomial P∗.

a) For each k = 0…12, estimate the polynomial P̂_k of order k that minimizes the mean-square error (use polyfit):

P̂_k = argmin_{P, deg(P)≤k} ℓ(P)   with   ℓ(P) = (1/N) ∑_{i=1}^N |y_i − P(x_i)|².

Plot the loss ℓ(P̂_k) as a function of k (i.e. plot the points {k, ℓ(P̂_k)}_{k=0…12}).
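A minimal sketch of part a). Since 'data_HW1_ex1.csv' is not reproduced here, a synthetic cubic with noise stands in for the data (the true P∗ and its degree are of course unknown in the homework); with the real data, replace the first lines by the loading code at the end of this document.

```python
import numpy as np

# Synthetic stand-in for the homework data: P*(x) = 1 - 2x + 0.5x^3, degree 3.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.5 * x**3 + 0.1 * rng.standard_normal(x.size)

losses = []
for k in range(13):                       # k = 0, ..., 12
    coeffs = np.polyfit(x, y, k)          # least-squares fit of degree k
    y_hat = np.polyval(coeffs, x)
    losses.append(np.mean((y - y_hat) ** 2))   # mean-square error l(P_k)

# The loss is non-increasing in k: a higher degree can only fit better.
# Plot with: plt.plot(range(13), losses, 'o-')
```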

b) Split the dataset {x_i, y_i}_{i=1…N} into training and test sets using (respectively) 80% and 20% of the dataset. We define two loss functions:

ℓ_train(P) = (1/N_train) ∑_{i in train set} |y_i − P(x_i)|²   (1)

ℓ_test(P) = (1/N_test) ∑_{i in test set} |y_i − P(x_i)|²   (2)

where N_train and N_test denote the number of elements in (respectively) the train and test sets (N_train + N_test = N).

For each k = 0…12, estimate the polynomial P̃_k of order k that minimizes the mean-square error over the training set:

P̃_k = argmin_{P, deg(P)≤k} ℓ_train(P).

Plot the values of both loss functions ℓ_train and ℓ_test at P̃_k (i.e. plot the points {k, ℓ_train(P̃_k)}_{k=0…12} and {k, ℓ_test(P̃_k)}_{k=0…12}).
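A sketch of the split-and-compare procedure, again on a synthetic cubic stand-in for the real data. The last line also anticipates part c): the degree minimizing ℓ_test is the estimate of k∗.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.5 * x**3 + 0.1 * rng.standard_normal(x.size)

# random 80% / 20% split
perm = rng.permutation(x.size)
n_train = int(0.8 * x.size)
train, test = perm[:n_train], perm[n_train:]

train_losses, test_losses = [], []
for k in range(13):
    coeffs = np.polyfit(x[train], y[train], k)   # fit on the training set only
    for idx, losses in ((train, train_losses), (test, test_losses)):
        residuals = y[idx] - np.polyval(coeffs, x[idx])
        losses.append(np.mean(residuals ** 2))

# l_train keeps decreasing with k, while l_test eventually rises again
# once the fit starts chasing the noise (overfitting).
k_star = int(np.argmin(test_losses))   # estimate of the true degree k*
```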

c) Use ℓ_test to guess the order k∗ of the unknown polynomial P∗ and give an estimate of its coefficients.

Ex 2. [Gradient descent]

The goal is to implement and test gradient descent methods. We use linear regression as a toy problem, with the dataset {x_i, y_i}_{i=1…N} from Ex 1, and we would like to minimize the following loss function:

ℓ(a, b) = (1/N) ∑_{i=1}^N |y_i − (a + b x_i)|².

a) Compute the gradient of the loss function ∇ℓ = (∂_a ℓ, ∂_b ℓ).

Deduce (numerically) the minimum (a∗, b∗) of ℓ.
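Differentiating ℓ term by term gives ∂_a ℓ = −(2/N) ∑_i (y_i − a − b x_i) and ∂_b ℓ = −(2/N) ∑_i x_i (y_i − a − b x_i). A sketch of the gradient and of the numerical minimum, using synthetic near-linear data as a stand-in for the Ex 1 dataset:

```python
import numpy as np

# Stand-in data: y ≈ 1 - 2x plus noise, so (a*, b*) should be close to (1, -2).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.1 * rng.standard_normal(x.size)

def grad_loss(a, b):
    """Gradient of l(a, b) = (1/N) sum |y_i - (a + b x_i)|^2."""
    r = y - (a + b * x)                              # residuals
    return np.array([-2 * r.mean(), -2 * (x * r).mean()])

# Setting the gradient to zero gives the normal equations;
# numerically, solve the least-squares problem directly:
A = np.column_stack([np.ones_like(x), x])
a_star, b_star = np.linalg.lstsq(A, y, rcond=None)[0]
```

At (a∗, b∗) the gradient vanishes, which is a quick sanity check on both pieces of code.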

b) Starting from (a_0, b_0) = (1.8, 1.4) and using the constant learning rate η = 0.05, implement the gradient descent algorithm:

(a_{n+1}, b_{n+1}) = (a_n, b_n) − η ∇ℓ(a_n, b_n).

Estimate the rate of convergence (i.e. how fast does ∥(a_n, b_n) − (a∗, b∗)∥ converge to zero?).
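The iteration above can be sketched as follows (same synthetic stand-in data as before). For a quadratic loss the error decays geometrically, ∥θ_n − θ∗∥ ≈ C rⁿ, so the rate r can be read off from the ratio of successive errors or from the slope of log(errors).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.1 * rng.standard_normal(x.size)

def grad_loss(theta):
    a, b = theta
    r = y - (a + b * x)
    return np.array([-2 * r.mean(), -2 * (x * r).mean()])

# exact minimizer, used to measure the error
A = np.column_stack([np.ones_like(x), x])
theta_star = np.linalg.lstsq(A, y, rcond=None)[0]

eta = 0.05
theta = np.array([1.8, 1.4])
errors = []
for n in range(200):
    errors.append(np.linalg.norm(theta - theta_star))
    theta = theta - eta * grad_loss(theta)           # gradient step

rate = errors[-1] / errors[-2]   # asymptotic per-step contraction factor
```

With these stand-in data the slowest mode contracts by the factor 1 − 2η per step, so the measured rate comes out near 0.9.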

c) Implement the momentum and Nesterov algorithms; denoting θ_n = (a_n, b_n):

momentum:  v_{n+1} = γ v_n + η ∇ℓ(θ_n),   θ_{n+1} = θ_n − v_{n+1}

Nesterov:  v_{n+1} = γ v_n + η ∇ℓ(θ_n − γ v_n),   θ_{n+1} = θ_n − v_{n+1}

Estimate the convergence rate for γ = 0.9 and η = 0.05.
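Both update rules differ only in where the gradient is evaluated, so they can share one loop (same synthetic stand-in data as above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.1 * rng.standard_normal(x.size)

def grad_loss(theta):
    a, b = theta
    r = y - (a + b * x)
    return np.array([-2 * r.mean(), -2 * (x * r).mean()])

A = np.column_stack([np.ones_like(x), x])
theta_star = np.linalg.lstsq(A, y, rcond=None)[0]
gamma, eta = 0.9, 0.05

def run(nesterov, n_steps=200):
    theta = np.array([1.8, 1.4])
    v = np.zeros(2)
    errs = []
    for _ in range(n_steps):
        errs.append(np.linalg.norm(theta - theta_star))
        lookahead = theta - gamma * v if nesterov else theta
        v = gamma * v + eta * grad_loss(lookahead)   # velocity update
        theta = theta - v
    return errs

err_momentum = run(nesterov=False)
err_nesterov = run(nesterov=True)
# Compare the decay of err_momentum and err_nesterov (e.g. on a semilog
# plot) to estimate the rates; note the errors oscillate, so fit the
# envelope rather than successive ratios.
```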

extra) Test other gradient descent methods (e.g. Adam, AdaGrad, RMSProp) using the optimizer classes defined in either PyTorch or TensorFlow.
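For instance, with PyTorch the same toy regression can be handed to torch.optim.Adam; only the loss and the parameter tensor need to be defined (synthetic stand-in data again, and the learning rate 0.05 is an arbitrary choice for this sketch):

```python
import torch

torch.manual_seed(0)
x = torch.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.1 * torch.randn(100)

theta = torch.tensor([1.8, 1.4], requires_grad=True)   # (a, b)
opt = torch.optim.Adam([theta], lr=0.05)

for step in range(1000):
    opt.zero_grad()
    loss = ((y - (theta[0] + theta[1] * x)) ** 2).mean()
    loss.backward()          # autograd computes (d_a l, d_b l)
    opt.step()               # Adam update of theta

final_loss = loss.item()
```

Swapping in torch.optim.Adagrad or torch.optim.RMSprop requires changing only the optimizer line.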


Ex 3. [Classification]

The goal is to implement a linear classifier for the Fashion-MNIST dataset (see also the code at the end of this document).

a) Adapt the linear classifier for the MNIST data-set to the new data-set.

b) Train the model using 40 epochs with learning rate η = 10⁻⁴ and the gradient descent method of your choice.

Plot the evolution of the loss at each epoch.

c) Draw the 'template' of each class.
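A minimal sketch of the core of such a classifier: multinomial logistic (softmax) regression trained by full-batch gradient descent. Random 28×28 arrays with an artificial class signal stand in for the Fashion-MNIST images, and the learning rate here is larger than the η = 10⁻⁴ the homework prescribes for the real data, so that the toy problem converges in 40 epochs.

```python
import numpy as np

# Stand-in data: random "images" in 10 classes, with a crude per-class
# signal injected so the problem is learnable (real data: FashionMNIST).
rng = np.random.default_rng(0)
n, d, n_classes = 512, 28 * 28, 10
labels = rng.integers(0, n_classes, size=n)
X = 0.1 * rng.standard_normal((n, d))
X[np.arange(n), labels * 7] += 1.0           # class-dependent pixel bump

W = np.zeros((d, n_classes))
b = np.zeros(n_classes)
eta = 1e-1                                   # toy learning rate (see lead-in)

losses = []
for epoch in range(40):
    scores = X @ W + b
    scores -= scores.max(axis=1, keepdims=True)          # stable softmax
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)
    losses.append(-np.log(p[np.arange(n), labels]).mean())  # cross-entropy
    p[np.arange(n), labels] -= 1.0                       # dLoss/dscores
    W -= eta * X.T @ p / n                               # gradient step
    b -= eta * p.mean(axis=0)

# Part c): each column of W reshaped to 28x28 is the 'template' of a class:
# templates = W.T.reshape(n_classes, 28, 28); plt.imshow(templates[0])
```

Plotting `losses` against the epoch index gives the curve asked for in part b).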

##############################################
##           Exercise 1 (Python)            ##
##############################################
import matplotlib.pyplot as plt
import numpy as np

# get the data
data = np.loadtxt('data_HW1_ex1.csv', delimiter=',')
x, y = data[:, 0], data[:, 1]

# plot
plt.figure(1)
plt.plot(x, y, 'o')
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.show()

##############################################
##           Exercise 3 (Python)            ##
##############################################
from torchvision import datasets
import matplotlib.pyplot as plt

# Download and load
data_collection = datasets.FashionMNIST('data_fashionMNIST', train=True, download=True)

# Illustration
label_fashion = dict([(0,'T-shirt'),(1,'trouser'),(2,'pullover'),(3,'dress'),(4,'coat'),
                      (5,'sandal'),(6,'shirt'),(7,'sneaker'),(8,'bag'),(9,'boot')])
X, y = data_collection[123]    # X is a PIL image, y an integer label
plt.figure(1); plt.clf()
plt.imshow(X, cmap='gray')
plt.title('Example of image with label ' + label_fashion[y])
plt.show()

