ELEC/COMP 447/546

Assignment 4

Introduction

1.1 Basics of Autograd (5 points)

a. In the provided notebook, fill in the function sin_taylor() with code to

approximate the value of the sine function using the Taylor approximation

(defined here). You can use numpy.math.factorial()to help you.

b. Create a tensor x with value 𝜋/4 . Create a new tensor y = sin_taylor(x).

Use y.backward() to evaluate the gradient of y at x. Is this value a close

approximation to the exact derivative of sine at x?

c. Now, create a NumPy array x_npy of 100 random numbers drawn uniformly

from [−𝜋, 𝜋] (use np.random.uniform). Create a tensor x from that array and

place the tensor onto the GPU. Again, evaluate y = sin_taylor(x). This

time, y is a vector. If you run y.backward(), it will throw an error because

autograd is meant to evaluate the derivative of a scalar output with respect to

input vectors (see tutorial pages above). Instead, run either one of these two

lines (they do the same thing):

y.sum().backward()

y.backward(gradient=torch.ones(100))

What is happening here? We are creating a ‘dummy’ scalar output (let’s call it z),

which contains the sum of values in y, and acts as the final scalar output of our

computation graph. Due to the chain rule of differentiation, dz/dx will yield the

same value as dy/dx.

d. Get the gradient tensor dz/dx and convert that tensor to a Numpy array. Plot

dz/dx vs. x_npy, overlaid on a cosine curve. Confirm that the points fall on the

curve and put this plot in your report.

1.2 Image Denoising (5 points)

In this problem, you will denoise this noisy parrot image, which we denote I. To do so,

you will create a denoising loss function, and use autograd to optimize the pixels of a

new image J, which will be a denoised version of I.

a. In your Colab notebook, implement denoising_loss() to compute the

following loss function:

𝑙𝑜𝑠𝑠 = ‖𝐼 − 𝐽‖1 + 𝛼 (‖

𝑑𝐽

𝑑𝑥 ‖

1

+ ‖

𝑑𝐽

𝑑𝑦 ‖

1

)

The first component is a data term making sure that the predicted image J is not

too far from the original image I. The second term is a regularizer which will

reward J if it is smoother, quantified using J’s spatial derivatives. We have

provided you a function get_spatial_gradients() to compute the gradients.

b. Implement gradient descent to optimize the pixels of J using your loss function

and autograd. Initialize J to be a copy of I. Try different values for the learning

rate and 𝛼 and find a combination that does a good job. Put the smoothed image

J, along with the learning rate and 𝛼 you used in your report.

c. ELEC/COMP 546 Only: Change the loss function to use L2 norms instead of L1.

Does it work better or worse? Why?

Hints (you should use all of these in your solution):

• torch.clone: performs a deep copy of a tensor

• To require storing gradients for a tensor x, use: x.requires_grad_(True)

• In the code, you will see the statement with torch.no_grad():. Any

statements written within that block will not update the computation graph. Put

your gradient descent step within that block, since that operation should not

update the graph.

• Remember to zero out the gradient buffer of J after each step using

J.grad.zero_().

• Remember to normalize the gradient to a unit vector before using it in your

gradient descent step.

• To plot an image in tensor J using matplotlib, you will have to first detach it from

the computation graph (to not track its gradients), move it from the GPU to the

CPU, and convert to a NumPy array. You can do this in one line with:

J = J.detach().cpu().numpy().

2.0 Training an image classifier (10 points)

In this problem, you will create and train your first neural network image classifier!

Before starting this question, please read the following pages about training neural

networks in PyTorch:

1. Data loading

2. Models

3. Training loop

We will be using the CIFAR10 dataset, consisting of 60,000 images of 10 common

classes. Each image is of size 32 x 32 x 3. Download the full dataset as one .npz file

here, and add it to your Google Drive. This file contains three objects: X: array of

images, y: array of labels (specified as integers in [0,9]), and label_names: list of class

names. Please complete the following:

a. Finish implementing the CIFARDataset class. See comments in the code for

further instructions.

b. Add transforms: RandomHorizontalFlip, RandomAffine ([-5, 5] degree

range, [0.8, 1.2] scale range) and ColorJitter ([0.8, 1.2] brightness range,

[0.8, 1.2] saturation range). Don’t forget to apply the ToTensor transform first,

which converts a H x W x 3 image to a 3 x H x W tensor, and normalizes the

pixel range to [0,1]. You will find the transform APIs in this page.

c. Implement a CNN classifier with the structure in the following table. You will find

the APIs for Conv, Linear, ReLU, and MaxPool in this page. The spatial

dimensions of an image should NOT change after a Conv operation (only after

Maxpooling).

Layer Output channels for Conv/output neurons for Linear

3 x 3 Conv + ReLU 50

MaxPool (2 x 2) X

3 x 3 Conv + ReLU 100

MaxPool (2 x 2) X

3 x 3 Conv + ReLU 100

MaxPool (2 x 2) X

3 x 3 Conv + ReLU 100

Linear + ReLU 100

Linear 10

d. Implement the training loop.

e. Train your classifier for 15 epochs. The GPU, if accessible, will result in faster

training. Make sure to save a model checkpoint at the end of each epoch, as you

will use them in part f. Use the following training settings: batch size = 64,

optimizer = Adam, learning rate = 1e-4.

f. Compute validation loss per epoch and plot it. Which model will you choose and

why?

g. Run the best model on your test set and report:

i. Overall accuracy (# of examples correctly classified / # of examples)

ii. Accuracy per class

iii. Confusion matrix: A 10 x 10 table, where the cell at row i and column j

reports the fraction of times an example of class i was labeled by your

model as class j. Please label the rows/columns by the object class name,

not indices.

iv. For the class on which your model has the worst accuracy (part ii), what is

the other class it is most confused with? Show 5-10 test images that your

model confused between these classes and comment on what factors

may have caused the poor performance.

h. ELEC/COMP 546 Only: Change the last two Conv blocks in the architecture to

Residual blocks and report overall accuracy of the best model. Recall that a

residual block has the form:

Submission Instructions

All code must be written using Google Colab (see course website). Every student must

submit a zip file for this assignment in Canvas with 2 items:

1. An organized report submitted as a PDF document. The report should contain all

image results (intermediate and final), and answer any questions asked in this

document. It should also contain any issues (problems encountered, surprises)

you may have found as you solved the problems. Please add a caption for

every image specifying what problem number it is addressing and what it is

showing. The heading of the PDF file should contain:

a. Your name and Net ID.

b. Names of anyone you collaborated with on this assignment.

c. A link to your Colab notebook (remember to change permissions on your

notebook to allow viewers).

## Reviews

There are no reviews yet.