CMPT 419/726: Assignment 2

Assignment 2: Classification / Deep learning

1 Softmax for Multi-Class Classification

The softmax function is a multi-class generalization of the logistic sigmoid:

p(C_k|x) = exp(a_k) / Σ_j exp(a_j)    (1)

Consider a case where the activation functions a_j are linear functions of the input. Assume there

are 3 classes (C1, C2, C3), and the input is x = (x1, x2) ∈ R^2:

• a1 = 3x1 + x2 + 1

• a2 = x1 + 3x2 + 2

• a3 = −3x1 + 1.5x2 + 2

The image below shows the 3 decision regions induced by these activation functions, their common

intersection point (in green), and the decision boundaries (in red).
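Before answering, it can help to evaluate the softmax numerically. A minimal sketch (the sample point (0, 0) below is just an arbitrary test input, not the green intersection point):

```python
import numpy as np

def softmax(a):
    a = a - a.max()                 # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum()

def activations(x1, x2):
    # the three linear activations given above
    return np.array([ 3.0*x1 + 1.0*x2 + 1.0,
                      1.0*x1 + 3.0*x2 + 2.0,
                     -3.0*x1 + 1.5*x2 + 2.0])

probs = softmax(activations(0.0, 0.0))   # a = (1, 2, 2) at the origin
```

Since a2 = a3 at the origin, p(C2|x) = p(C3|x) there; the three probabilities always sum to 1.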

Answer the following questions. For 2 and 3, you may provide qualitative answers (i.e. no need to

analyze limits).

1. (3 marks) What are the probabilities p(Ck|x) at the green point?

2. (3 marks) What happens to the probabilities along each of the red lines? What happens as

we move along a red line (away from the green point)?

3. (4 marks) What happens to the probabilities as we move far away from the intersection point,

staying in the middle of one region?


CMPT 419/726: Assignment 2 (Spring 2020) Instructor: Mo Chen

2 Error Backpropagation

We will derive error derivatives using back-propagation on the network below.

Notation: Please use notation following the examples of names for weights given in the figure.

For activations/outputs, the red node would have activation a^(1)_2 = w^(1)_21 x1 + w^(1)_22 x2 + w^(1)_23 x3 and output z^(1)_2 = h(a^(1)_2).

Activation functions: Assume the activation functions h(·) for the hidden layers are logistic sigmoids. For the final output node, assume the activation function is the identity, h(a) = a.

Error function: Assume this network is doing regression, trained using the standard squared error, so that E_n(w) = (1/2) (y(x_n, w) − t_n)^2.

[Figure: a feed-forward network from inputs x1, x2, x3 through two hidden layers to a single output; weight labels shown include w^(1)_11, w^(2)_11, and w^(3)_11.]

Consider the output layer.

• Calculate ∂E_n(w)/∂a^(3)_1. Note that a^(3)_1 is the activation of the output node, and that ∂E_n(w)/∂a^(3)_1 ≡ δ^(3)_1.

• Use this result to calculate ∂E_n(w)/∂w^(3)_12.

Next, consider the penultimate layer of nodes.

• Write an expression for ∂E_n(w)/∂a^(2)_1. Use δ^(3)_1 in this expression.

• Use this result to calculate ∂E_n(w)/∂w^(2)_11.

Finally, consider the weights connecting from the inputs.

• Write an expression for ∂E_n(w)/∂a^(1)_1. Use the set of δ^(2)_k in this expression.

• Use this result to calculate ∂E_n(w)/∂w^(1)_11.
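Once these expressions are derived, they can be checked numerically. The sketch below assumes a small network of this shape (layer sizes and random weights are arbitrary) and compares the backpropagated ∂E_n(w)/∂w^(3)_12 against a finite-difference estimate:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
# assumed sizes: 3 inputs, two logistic hidden layers of 2 units, 1 linear output
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(2, 2))
W3 = rng.normal(size=(1, 2))
x, t = rng.normal(size=3), 0.7

def forward(W1, W2, W3):
    z1 = sigmoid(W1 @ x)            # first hidden layer
    z2 = sigmoid(W2 @ z1)           # second hidden layer
    return z1, z2, (W3 @ z2)[0]     # identity activation at the output

z1, z2, y = forward(W1, W2, W3)

# backpropagated deltas
d3 = y - t                          # delta^(3)_1 = dE/da^(3)_1
d2 = (W3[0] * d3) * z2 * (1 - z2)   # delta^(2)_k; logistic derivative is z(1 - z)
g_w3_12 = d3 * z2[1]                # dE/dw^(3)_12
g_w2_11 = d2[0] * z1[0]             # dE/dw^(2)_11

# finite-difference estimate of dE/dw^(3)_12
eps = 1e-6
W3p = W3.copy(); W3p[0, 1] += eps
fd3 = (0.5 * (forward(W1, W2, W3p)[2] - t) ** 2 - 0.5 * (y - t) ** 2) / eps
```

The two values should agree to several decimal places; the same check works for any of the weight derivatives above.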



3 Logistic Regression

In this question you will examine optimization for logistic regression.

1. Download the assignment 2 code and data from the website. Run the script logistic_regression.py

in the P3 directory. This code performs gradient descent to find the w that minimizes the negative

log-likelihood (i.e. maximizes the likelihood).

Include the final output of Figures 2 and 3 (the plot of the separator path in slope-intercept space,

and the plot of negative log-likelihood over epochs) in your report.

Why do these plots oscillate? Briefly explain in your report.

2. Create a Python script logistic_regression_mod.py for the following.

Modify logistic_regression.py to run gradient descent with the learning rates η =

0.5, 0.3, 0.1, 0.05, 0.01.

Include in your report a single plot comparing negative log-likelihood versus epoch for these

different learning rates.

Compare these results. What are the relative advantages of the different rates?

3. Create a Python script logistic_regression_sgd.py for the following.

Modify this code to perform stochastic gradient descent. Use the learning rates

η = 0.5, 0.3, 0.1, 0.05, 0.01.

Include in your report a new plot comparing negative log-likelihood versus iteration using

stochastic gradient descent.

Is stochastic gradient descent faster than gradient descent? Explain using your plots.
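The core change from batch gradient descent to SGD is updating w after each training example rather than after a full pass over the data. A minimal sketch on synthetic data (the assignment's actual data files and plotting are not assumed here):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
# synthetic 2-class data, with a bias column prepended to the inputs
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])
t = np.concatenate([np.zeros(50), np.ones(50)])

def nll(w):
    y = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)   # clip to avoid log(0)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def batch_gd(eta, epochs=100):
    w = np.zeros(3)
    for _ in range(epochs):
        w -= eta * X.T @ (sigmoid(X @ w) - t)       # one update per full pass
    return w

def sgd(eta, epochs=100):
    w = np.zeros(3)
    for _ in range(epochs):
        for i in rng.permutation(len(t)):           # one update per example
            w -= eta * (sigmoid(X[i] @ w) - t[i]) * X[i]
    return w
```

With the same η, SGD makes 100× more (noisier) updates per epoch, which is why its negative log-likelihood curve typically drops faster early on but oscillates near the minimum.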



4 Fine-Tuning a Pre-Trained Network

In this question you will experiment with fine-tuning a pre-trained network. This is a standard

workflow in adapting existing deep networks to a new task.

We will utilize PyTorch (https://pytorch.org), a machine learning library for Python.

The provided code builds upon ResNet 50, a state-of-the-art deep network for image classification.

ResNet 50 was designed for ImageNet image classification with 1000 output classes.

Here, the ResNet 50 model has been adapted to a different (and simpler) task: classifying an image as

one of 10 classes in the CIFAR10 dataset.

The code imagenet finetune.py does the following:

• Constructs a deep network. This network starts with ResNet 50 up to its average-pooling

layer. Then a small network with 32 hidden nodes followed by 10 output nodes (dense connections)

is added on top.

• Initializes the weights of the ResNet 50 portion with the parameters from training on ImageNet.

• Performs training on only the new layers using the CIFAR10 dataset – all other weights are fixed

to their values learned on ImageNet.
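The freeze-and-add-head pattern described above can be sketched as follows. The real code builds on torchvision's ResNet 50 (whose average-pooling layer outputs 2048 features, loaded via torchvision.models.resnet50(pretrained=True)); a plain Linear layer stands in for it here so the sketch runs without downloading pretrained weights:

```python
import torch
import torch.nn as nn

# Stand-in for ResNet 50 up to its average-pooling layer (2048 output features).
backbone = nn.Sequential(nn.Linear(12, 2048), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False        # freeze the "pretrained" weights

# The new small network: 32 hidden nodes, then 10 output nodes (dense connections).
head = nn.Sequential(nn.Linear(2048, 32), nn.ReLU(), nn.Linear(32, 10))

model = nn.Sequential(backbone, head)
# only the new layers' parameters are handed to the optimizer
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)

out = model(torch.randn(4, 12))    # batch of 4 stand-in inputs
```

Because only head.parameters() reaches the optimizer and the backbone's parameters have requires_grad = False, training updates the new layers while the ImageNet-initialized portion stays fixed.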

The code and data can be found on the course website. For convenience, Anaconda (https:

//www.anaconda.com) environment config files with the latest stable release of PyTorch and

torchvision are provided for Python 2.7 and Python 3.6 for Linux and macOS users. You can use

one of the config files to create virtual environments and test your code. To set up the virtual

environment, install Anaconda and run the following command

conda env create -f CONFIG_FILE.

Replace CONFIG_FILE with the path to the config file you downloaded. To activate the virtual

environment, run the following command

source activate ENV_NAME

replacing ENV_NAME with cmpt419-pytorch-python27 or cmpt419-pytorch-python36,

depending on your Python version.

Windows users should follow the instructions on the PyTorch website (https://pytorch.org)

to install manually. PyTorch only supports Python 3 on Windows!

If you wish to download and install PyTorch by yourself, you will need PyTorch (v 0.4.1), torchvision (v 0.2.1), and their dependencies.

What to do:

Start by running the code provided. It will be *very* slow to train since the code runs on a CPU.

If you have a good GPU and want to accelerate training, you can try figuring out how to change

the code to train on the GPU. Try to do one of the following tasks:



• Write a Python function, to be used at the end of training, that generates HTML output showing

each test image and its classification scores. You could produce an HTML table, for example.

• Run validation of the model every few training epochs on the validation or test set of the dataset,

and save the model with the best validation error.

• Try applying L2 regularization to the coefficients in the small network we added.

• Try running this code on one of the datasets in torchvision.datasets (https://pytorch.

org/docs/stable/torchvision/datasets.html) except CIFAR100. You may

need to change some layers in the network. Try creating a custom dataloader that loads data

from your own dataset, and run the code using your dataloader. (Hints: your own dataset

should not come from torchvision.datasets. A standard approach is to implement your own

torch.utils.data.Dataset and wrap it with torch.utils.data.DataLoader.)

• Try modifying the structure of the new layers that were added on top of ResNet 50.

• Try adding data augmentation for the training data using torchvision.transforms, and then

implement custom image transformation methods not available in torchvision.transforms,

such as Gaussian blur.

• The current code is inefficient because it recomputes the output of ResNet 50 every time a

training/validation example is seen, even though those layers aren’t being trained. Change

this by saving the outputs of ResNet 50 and using them as input to the new layers, rather than

using the dataloader as the code currently does.

• The current code does not train the layers in ResNet 50. After training the new layers for

a while (until good values have been obtained), turn on training for the ResNet 50 layers to

see if better performance can be achieved.
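The feature-caching option above can be sketched like this (a stand-in backbone and random tensors replace ResNet 50 and CIFAR10; the idea is to run the frozen part once, then train the new layers on the cached features):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

backbone = nn.Linear(12, 2048).eval()   # stand-in for the frozen ResNet 50 layers
images = TensorDataset(torch.randn(8, 12), torch.randint(0, 10, (8,)))

feats, labels = [], []
with torch.no_grad():                   # no gradients needed for the frozen part
    for xb, yb in DataLoader(images, batch_size=4):
        feats.append(backbone(xb))
        labels.append(yb)

# the cached features become the training set for the new layers
cached = TensorDataset(torch.cat(feats), torch.cat(labels))
```

Training epochs over `cached` then touch only the small new network, avoiding the expensive backbone forward pass on every example.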

Put your code and a readme file for Problem 4 under a separate directory named P4 in the code.zip

file you submit for this assignment. The readme file should describe what you implemented for

this problem and what each one of your code files does. It should also include the command to run

your code. If you have any figures or tables to show, put them in your report for this assignment

and mention them in your readme file.


Submitting Your Assignment

The assignment must be submitted online at https://courses.cs.sfu.ca. You must submit two files:

1. An assignment report in PDF format, called report.pdf. This report must contain the

solutions to questions 1 and 2, as well as the figures and explanations requested for questions 3 and 4.

(Please take screenshots of your entire screen for the figures requested for questions 3 and 4.)

2. A .zip file of all your code, called code.zip.
