Exercise II AMTH/CPSC 663b

Problem 1
1. Provide a geometric interpretation of gradient descent in the one-dimensional case. (Adapted from the
Nielsen book, chapter 1)
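One way to see the geometry is to trace the update $x \leftarrow x - \eta f'(x)$ numerically. The sketch below is plain Python; the function name gradient_descent_1d and the example $f(x) = (x - 3)^2$ are illustrative choices, not part of the assignment. $f'(x)$ is the slope of the tangent line at $x$, so each step moves against that slope by an amount proportional to its steepness.

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, steps=100):
    """Repeat x <- x - eta * f'(x) and return every iterate."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        # f'(x) is the slope of the tangent line at x; stepping against it
        # moves x toward lower values of f when eta is small enough.
        x = x - eta * f_prime(x)
        xs.append(x)
    return xs

# Illustrative example: f(x) = (x - 3)^2, so f'(x) = 2(x - 3) and the minimum is at x = 3.
trajectory = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(round(trajectory[-1], 6))  # prints 3.0
```

With eta = 1.1 on the same f, each step overshoots the minimum by more than the previous distance, so the iterates oscillate and diverge; this is the one-dimensional picture of a learning rate that is too large.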
The current problem set requires a working installation of PyTorch (v1.4), torchvision (v0.5), and matplotlib (v3.1).
Any formatting that allows the TAs to quickly determine which part
of the problem the code is used for is fine. This means a Jupyter notebook
with headings is allowed, as long as both the *.ipynb file and the PDF from
Jupyter are submitted.
Published: Tuesday, February 22, 2022
Due: Monday, March 7, 2022 – 11:59 PM
Programming assignments should use built-in functions in Python
and PyTorch; in general, you may use the scipy stack [1]. However,
exercises are designed to emphasize the nuances of machine learning and
deep learning algorithms: if a function exists that trivially solves an entire
problem, please consult with the TA before using it.
2. An extreme version of gradient descent is to use a mini-batch size of just 1. This procedure is known as
online or incremental learning. In online learning, a neural network learns from just one training input
at a time (just as human beings do). Name one advantage and one disadvantage of online learning
compared to stochastic gradient descent with a mini-batch size of, say, 20. (Adapted from the Nielsen
book, chapter 1)
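The loop below is a plain-Python sketch that makes the comparison concrete; the function name minibatch_sgd and the toy per-example cost $C_x(w) = (w - x)^2 / 2$ are hypothetical, not taken from the course code. With 100 examples, batch_size=1 performs 100 parameter updates per epoch, each based on a single example, while batch_size=20 performs 5 updates per epoch, each averaging 20 per-example gradients.

```python
import random

def minibatch_sgd(grad_fn, w, data, eta, batch_size, epochs, seed=0):
    """Mini-batch SGD on a single scalar parameter w.

    grad_fn(w, example) returns the gradient of one example's cost.
    batch_size=1 is online (incremental) learning."""
    rng = random.Random(seed)
    data = list(data)
    n_updates = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average the per-example gradients over the mini-batch.
            g = sum(grad_fn(w, ex) for ex in batch) / len(batch)
            w -= eta * g
            n_updates += 1
    return w, n_updates

# Hypothetical toy cost C_x(w) = (w - x)^2 / 2, whose gradient is w - x.
data = [float(i) for i in range(100)]
grad = lambda w, x: w - x
for bs in (1, 20):
    w, n = minibatch_sgd(grad, 0.0, data, eta=0.05, batch_size=bs, epochs=5)
    print(bs, n, round(w, 3))
```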
Compress your solutions into a single zip file titled <lastname and
initials>_assignment2.zip, e.g. for a student named Tom Marvolo Riddle, riddletm_assignment2.zip. Include a single PDF titled
<lastname and initials>_assignment2.pdf and any Python scripts
specified. Any requested plots should be sufficiently labeled for full points.
Problem 2
Figure 1: Simple neural network with initial weights and biases.
Problem 3
1. Backpropagation with a single modified neuron (Nielsen book, chapter 2)
Suppose we modify a single neuron in a feedforward network so that the output from the neuron is
given by $f(\sum_j w_j x_j + b)$, where $f$ is some function other than the sigmoid. How should we modify the
backpropagation algorithm in this case?
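A minimal sketch, assuming the standard hidden-layer error recurrence from Nielsen's chapter 2: each neuron's error is multiplied by the derivative of that neuron's own activation, so the modified neuron contributes $f'(z_j)$ where the others contribute $\sigma'(z_j)$ (the forward pass likewise uses $f$ for that neuron, and the same substitution applies in $\delta^L$ if the neuron is in the output layer). The helper name hidden_delta and the choice $f = \tanh$ are hypothetical illustrations.

```python
import math

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_prime(z):
    # Derivative of the hypothetical replacement activation f = tanh.
    return 1.0 - math.tanh(z) ** 2

def hidden_delta(W_next, delta_next, z, act_primes):
    """delta^l_j = ((W^{l+1})^T delta^{l+1})_j * act_primes[j](z^l_j).

    W_next[i][j] is the weight from neuron j in layer l to neuron i in layer l+1.
    act_primes[j] is the derivative of neuron j's activation function."""
    back = [sum(W_next[i][j] * delta_next[i] for i in range(len(delta_next)))
            for j in range(len(z))]
    return [b * d(z_j) for b, d, z_j in zip(back, act_primes, z)]

# Layer with two neurons; only neuron 1 is modified to use tanh.
delta = hidden_delta(W_next=[[1.0, 2.0]], delta_next=[0.5], z=[0.0, 0.0],
                     act_primes=[sigmoid_prime, tanh_prime])
print(delta)  # prints [0.125, 1.0]
```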
1. It can be difficult at first to remember the respective roles of the $y$s and the $a$s for cross-entropy. It's easy
to get confused about whether the right form is $-[y \ln a + (1 - y) \ln (1 - a)]$ or $-[a \ln y + (1 - a) \ln (1 - y)]$.
What happens to the second of these expressions when $y = 0$ or $1$? Does this problem afflict the first
expression? Why or why not? (Nielsen book, chapter 3)
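To experiment with the two candidate forms, the sketch below evaluates both in plain Python for a sigmoid-like output $a = 0.9$ (an arbitrary value strictly between 0 and 1) and binary targets. The function names ce_first and ce_second are hypothetical; Python's math.log raises ValueError ("math domain error") for an argument of 0.

```python
import math

def ce_first(y, a):
    # -[y ln a + (1 - y) ln(1 - a)]: the logarithms are taken of the output a
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def ce_second(y, a):
    # -[a ln y + (1 - a) ln(1 - y)]: the logarithms are taken of the target y
    return -(a * math.log(y) + (1 - a) * math.log(1 - y))

a = 0.9  # arbitrary output strictly between 0 and 1
for y in (0.0, 1.0):
    print("first form, y =", y, ":", ce_first(y, a))
    try:
        print("second form, y =", y, ":", ce_second(y, a))
    except ValueError as err:  # math.log(0.0) raises "math domain error"
        print("second form, y =", y, ": undefined (", err, ")")
```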
3. Backpropagation with linear neurons (Nielsen book, chapter 2)
Suppose we replace the usual non-linear $\sigma$ function (sigmoid) with $\sigma(z) = z$ throughout the network.
Rewrite the backpropagation algorithm for this case.
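A sketch under two assumptions not fixed by the problem: the quadratic cost $C = \frac{1}{2}\|a^L - y\|^2$, and weight matrices stored as lists of rows. With $\sigma(z) = z$ we have $\sigma'(z) = 1$, so every Hadamard product with $\sigma'(z^l)$ drops out of the backward pass. The function names matvec and backprop_linear are hypothetical.

```python
def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def backprop_linear(x, y, weights, biases):
    """Gradients of C = 0.5 * ||a^L - y||^2 for a network with sigma(z) = z.

    weights[l] maps activations[l] to activations[l + 1]."""
    # Forward pass: a = z = W a_prev + b (no nonlinearity)
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append([z + b_j for z, b_j in zip(matvec(W, activations[-1]), b)])
    # Output error: grad_a C times sigma'(z) = (a^L - y) * 1
    delta = [a - t for a, t in zip(activations[-1], y)]
    grads_W, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        # dC/dw_jk = a_k (previous layer) * delta_j and dC/db_j = delta_j
        grads_W.insert(0, [[a_k * d_j for a_k in activations[l]] for d_j in delta])
        grads_b.insert(0, list(delta))
        if l > 0:
            W = weights[l]
            # Error of the previous layer: W^T delta; the sigma'(z) factor is 1
            delta = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                     for j in range(len(W[0]))]
    return grads_W, grads_b

# Toy 1-1-1 network: C = 0.5 * (w2 * w1 * x - y)^2 with w1 = 2, w2 = 3, x = 1, y = 0.
gW, gb = backprop_linear([1.0], [0.0], weights=[[[2.0]], [[3.0]]], biases=[[0.0], [0.0]])
print(gW, gb)  # prints [[[18.0]], [[12.0]]] [[18.0], [6.0]]
```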
2. Backpropagation with softmax and the log-likelihood cost (Nielsen book, chapter 3)
To apply the backpropagation algorithm for a network containing sigmoid layers to a network with a
softmax layer, we need to figure out an expression for the error $\delta^L_j = \partial C / \partial z^L_j$ in the final layer. Show
that a suitable expression is $\delta^L_j = a^L_j - y_j$.
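A numerical check, not a proof: the sketch compares a central finite-difference estimate of $\partial C / \partial z^L_j$ for the log-likelihood cost $C = -\ln a^L_{y}$ (one example whose correct class index is target) against $a^L_j - y_j$, with $y$ one-hot. The function names and the input values are arbitrary placeholders.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(z, target):
    # C = -ln a^L_target, where target is the index of the correct class
    return -math.log(softmax(z)[target])

def numerical_delta(z, target, h=1e-6):
    # Central finite-difference estimate of dC/dz_j for every j
    grad = []
    for j in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += h
        z_minus[j] -= h
        grad.append((log_likelihood_cost(z_plus, target)
                     - log_likelihood_cost(z_minus, target)) / (2 * h))
    return grad

z, target = [1.0, -0.5, 2.0], 2  # arbitrary placeholder values
a = softmax(z)
analytic = [a_j - (1.0 if j == target else 0.0) for j, a_j in enumerate(a)]
numeric = numerical_delta(z, target)
print(analytic)
print(numeric)
```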
Problem 4
• Finish the provided script to train the model with each of the above loss functions.
• Create a plot of the training accuracy vs epoch for each loss function (2 lines, 1 plot)
• Create a plot of the test accuracy vs epoch for each loss function (2 lines, 1 plot)
3. Given the network in Figure 1, calculate the derivatives of the cost with respect to the weights and
the biases, and the backpropagation error equations (i.e. $\delta^l$ for each layer $l$), for the first iteration using
the cross-entropy cost function. Please use the sigmoid activation function on h1, h2, o1, and o2. Initial
weights are colored in red, initial biases are colored in orange, and the training inputs and desired outputs
are in blue. This problem aims to optimize the weights and biases through backpropagation so that
the network outputs the desired results.
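To check hand calculations, the sketch below runs one forward and backward pass through a 2-2-2 sigmoid network with the cross-entropy cost $C = -\sum_j [y_j \ln a_j + (1 - y_j) \ln(1 - a_j)]$ summed over o1 and o2. It is not tied to Figure 1's particular numbers: x, y, W1, b1, W2, b2 are placeholder values to be replaced with the figure's. It assumes W[j][k] connects neuron k of the previous layer to neuron j, and one bias per neuron; if the figure uses a single shared bias per layer (b1, b2), the derivative with respect to that bias is the sum of the per-neuron entries. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(x, W1, b1, W2, b2):
    a1 = [sigmoid(z + b) for z, b in zip(matvec(W1, x), b1)]   # h1, h2
    a2 = [sigmoid(z + b) for z, b in zip(matvec(W2, a1), b2)]  # o1, o2
    return a1, a2

def cross_entropy(a, y):
    # C = -sum_j [y_j ln a_j + (1 - y_j) ln(1 - a_j)] for one training example
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o) for o, t in zip(a, y))

def backprop(x, y, W1, b1, W2, b2):
    a1, a2 = forward(x, W1, b1, W2, b2)
    # Output layer: sigmoid + cross-entropy gives delta^2 = a^2 - y
    d2 = [o - t for o, t in zip(a2, y)]
    # Hidden layer: delta^1 = ((W^2)^T delta^2) * sigmoid'(z^1), with sigmoid'(z) = a(1 - a)
    back = [sum(W2[i][j] * d2[i] for i in range(len(d2))) for j in range(len(a1))]
    d1 = [s * h * (1 - h) for s, h in zip(back, a1)]
    return {
        "delta1": d1, "delta2": d2,
        # dC/dw_jk = (input to the layer)_k * delta_j ; dC/db_j = delta_j
        "dW1": [[x_k * d_j for x_k in x] for d_j in d1], "db1": d1,
        "dW2": [[h_k * d_j for h_k in a1] for d_j in d2], "db2": d2,
    }

# Placeholder values only; substitute the inputs, targets, weights and biases from Figure 1.
x, y = [0.5, 0.2], [0.0, 1.0]
W1, b1 = [[0.1, 0.2], [0.3, 0.4]], [0.1, 0.1]
W2, b2 = [[0.5, 0.6], [0.7, 0.8]], [0.2, 0.2]
grads = backprop(x, y, W1, b1, W2, b2)
print(grads)
```

A one-step gradient-descent update then replaces each weight w by w - eta * dC/dw for a chosen learning rate eta.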
2. Using the same set-up from prob 4.1, let's now add regularization to the previous network. For the
following experiments, you may use the best-performing loss function from prob 4.1. The generalization
gap below refers to (train accuracy - test accuracy).
1. Download the Python template prob4.py and read through the code, which implements a neural network
in PyTorch trained on MNIST data. Within the provided Python file is a basic scaffold of a model
training workflow. You may use the existing functions in the script for both 4.1 and 4.2. Compare
the squared loss and cross-entropy loss. To do this,
Which loss function converges fastest? Which achieves the highest test accuracy? Provide some rationale
for the observed differences.
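One possible lens on the convergence question is Nielsen's chapter 3 argument for a single sigmoid output neuron: the quadratic cost's gradient with respect to $z$ carries a factor $\sigma'(z)$ that vanishes as the neuron saturates, while the cross-entropy gradient is simply $a - y$. Whether this explains the results depends on the output layer used in prob4.py, which this handout does not describe, so treat the sketch (hypothetical function names, plain Python) as one candidate rationale rather than the answer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dC_dz_quadratic(z, y):
    # C = (a - y)^2 / 2 with a = sigmoid(z)  ->  dC/dz = (a - y) * a * (1 - a)
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def dC_dz_cross_entropy(z, y):
    # C = -[y ln a + (1 - y) ln(1 - a)]  ->  dC/dz = a - y
    return sigmoid(z) - y

# Target y = 0 while the neuron saturates further in the wrong direction.
for z in (0.0, 5.0, 10.0):
    print(z, dC_dz_quadratic(z, 0.0), dC_dz_cross_entropy(z, 0.0))
```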
• Implement L1 regularization and train the model using $\lambda \in \{0.001, 0.005\}$. Create a plot of the
train accuracy, test accuracy, and generalization gap vs epoch for each $\lambda$ (3 plots, 2 lines each).
• Implement L2 regularization and train the model using $\lambda \in \{0.001, 0.01, 0.1\}$. Create a plot of the
train accuracy, test accuracy, and generalization gap vs epoch for each $\lambda$ (3 plots, 3 lines each).
• Apply dropout to both hidden layers and train the model using $p \in \{0.05, 0.1, 0.5\}$. Hint: to
implement dropout, you can use a special type of PyTorch layer included in torch.nn [2]. Create
a plot of the train accuracy, test accuracy, and generalization gap vs epoch for each $p$ value (3
plots, 3 lines each).
• Using the loss data you’ve collected so far, create a plot of the test accuracy vs epoch for each of
the experiments performed for prob 4.2. (8 lines, 1 plot)
Are the final results sensitive to each parameter? Is there any regularization method which performs
best?
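The three regularizers reduce to small formulas, sketched below in plain Python on a flat list of weights. The penalty scaling ($\lambda \sum |w|$ and $\lambda \sum w^2$) is an assumption: some texts use $\lambda/2$ or divide by the training-set size $n$, and the handout does not fix a convention. In prob4.py the same terms would normally be built from tensors (for example by summing over model.parameters()), and torch.nn.Dropout already applies the inverted scaling shown here in training mode and passes inputs through unchanged in eval mode. The function names are hypothetical.

```python
import random

def penalty(weights, lam, kind):
    """Regularization term added to the data loss, for a flat list of weights.

    kind="l1": lam * sum(|w|);  kind="l2": lam * sum(w^2)."""
    if kind == "l1":
        return lam * sum(abs(w) for w in weights)
    if kind == "l2":
        return lam * sum(w * w for w in weights)
    raise ValueError("kind must be 'l1' or 'l2'")

def penalty_grad(weights, lam, kind):
    """Gradient of penalty() with respect to each weight."""
    if kind == "l1":
        # lam * sign(w); the subgradient 0 is used at w == 0
        return [lam * ((w > 0) - (w < 0)) for w in weights]
    if kind == "l2":
        return [2 * lam * w for w in weights]
    raise ValueError("kind must be 'l1' or 'l2'")

def inverted_dropout(activations, p, rng, training=True):
    """Zero each unit with probability p and scale survivors by 1 / (1 - p).

    At evaluation time (training=False) the activations pass through unchanged."""
    if not training or p == 0:
        return list(activations)
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

w = [1.0, -2.0, 0.0]
print(penalty(w, 0.01, "l1"), penalty_grad(w, 0.01, "l1"))
print(penalty(w, 0.01, "l2"), penalty_grad(w, 0.01, "l2"))
print(inverted_dropout([1.0] * 8, p=0.5, rng=random.Random(0)))
```

The scaling by 1 / (1 - p) keeps the expected activation equal to its undropped value, which is why no rescaling is needed at test time.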
2. Show that the cross-entropy is still minimized when $\sigma(z) = y$ for all training inputs
(i.e. even when $y \in (0,1)$). When this is the case, the cross-entropy has the value
$C = -[y \ln y + (1 - y) \ln (1 - y)]$. (Nielsen book, chapter 3)
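A numerical illustration, not a proof: for one input with a non-binary target (here $y = 0.3$, an arbitrary choice), a grid search over outputs $a \in (0, 1)$ finds the minimizer of $C(a) = -[y \ln a + (1 - y) \ln(1 - a)]$ at $a = y$, and the minimum value matches $-[y \ln y + (1 - y) \ln(1 - y)]$, which is positive rather than zero. The function name cross_entropy is hypothetical.

```python
import math

def cross_entropy(y, a):
    # C = -[y ln a + (1 - y) ln(1 - a)] for one input with target y and output a
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

y = 0.3  # a non-binary target, y in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]  # candidate outputs a in (0, 1)
a_best = min(grid, key=lambda a: cross_entropy(y, a))
print(a_best, cross_entropy(y, a_best))
print(-(y * math.log(y) + (1 - y) * math.log(1 - y)))
```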
Bonus
1. Where does the softmax name come from? (Nielsen book, chapter 3)
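A quick experiment that may help with the question: scaling the inputs of a softmax by a constant $c$ and increasing $c$ shows how the output distribution behaves relative to the largest input. The sketch is plain Python; the inputs and the values of $c$ are arbitrary choices.

```python
import math

def softmax(z):
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.0, 2.0, 3.0]
for c in (1.0, 5.0, 50.0):
    # Scaling the inputs by c sharpens the distribution toward the largest entry.
    print(c, [round(v, 4) for v in softmax([c * v for v in z])])
```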
Optional
1. Alternate presentation of the equations of backpropagation (Nielsen book, chapter 2)
Show that $\delta^L = \nabla_a C \odot \sigma'(z^L)$ can be written as $\delta^L = \Sigma'(z^L) \nabla_a C$, where $\Sigma'(z^L)$ is a square matrix
whose diagonal entries are the values $\sigma'(z^L_j)$ and whose off-diagonal entries are zero.
2. Show that $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$ can be rewritten as $\delta^l = \Sigma'(z^l) (w^{l+1})^T \delta^{l+1}$.
3. By combining the results from 1 and 2, show that
$\delta^l = \Sigma'(z^l) (w^{l+1})^T \cdots \Sigma'(z^{L-1}) (w^L)^T \Sigma'(z^L) \nabla_a C$.
References
[1] “The scipy stack specification.” [Online]. Available: https://www.scipy.org/stackspec.html
[2] “PyTorch nn module docs.” [Online]. Available: https://pytorch.org/docs/stable/nn.html
