Assignment 1

CPSC/AMTH/CBB 663

Please include all of your written answers and figures in a single PDF, titled <lastname and initials>_assignment1.pdf. Put this and all other relevant files (most notably, your code) into a folder called <lastname and initials>_assignment1 and then zip this folder into a single zip file. If all goes according to plan, this file should be called <lastname and initials>_assignment1.zip (e.g. for a student named Tom Marvolo Riddle, riddletm_assignment1.zip). Be sure to rename the folder before zipping it, lest it revert to its previous name when uncompressed.
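If you prefer to build the zip from Python, the standard-library `shutil.make_archive` can do it. Below is a minimal sketch; the helper name `zip_submission` is hypothetical. Passing the folder name as `base_dir` stores every entry under `<folder name>/`, so unzipping recreates the renamed folder rather than a set of loose files.

```python
import shutil
from pathlib import Path


def zip_submission(folder):
    """Zip an already-renamed submission folder (e.g. riddletm_assignment1/).

    The archive is written next to the folder as <folder name>.zip, and every
    entry is stored under "<folder name>/", so unzipping recreates the folder.
    """
    folder = Path(folder).resolve()
    archive = shutil.make_archive(
        base_name=str(folder.parent / folder.name),  # ".zip" is appended automatically
        format="zip",
        root_dir=folder.parent,  # paths inside the archive are relative to this directory
        base_dir=folder.name,    # archive only this folder and its contents
    )
    return Path(archive)
```

For example, `zip_submission("riddletm_assignment1")`, run from the directory that contains the renamed folder, would write `riddletm_assignment1.zip` alongside it.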

We’ve provided skeleton code for each major function you’ll write in a file called ps1_functions.py. Please fill in these functions (preserving their names, arguments, and outputs) and include your completed ps1_functions.py in your assignment zip file (this is needed by our grading scripts). Any supplemental code you write (e.g. calling these functions to generate plots, or trying out different parameters) can be handled however you choose. A well-structured Jupyter notebook with neatly produced and labelled figures is an excellent way to compile assignment reports; just be sure to submit a PDF and the separate ps1_functions.py file alongside the notebook. However you produce your report, ensure all your figures are clearly labelled.

Programming assignments should use built-in functions in Python and PyTorch. In general, you may use the scipy stack [1]; however, exercises are designed to emphasize the nuances of machine learning and deep learning algorithms, so if a function exists that trivially solves an entire problem, please consult with the TA before using it.

Problem 1

What are the characteristics of a machine learning algorithm and what is meant by “learning” from data?

Problem 2

Least Squares Solution: $w^* = (X^\top X)^{-1} X^\top y$
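For the polynomial fits in Problem 2, here is a minimal NumPy sketch of the least squares solution, assuming a design matrix with columns $1, x, \dots, x^d$. It solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly (same solution, better numerics) and calls no regression routine. The function names are hypothetical, not the ones in ps1_functions.py. The optional `lam` argument adds $\lambda I$ for L2 regularization; this version penalizes the constant term too, which the lecture slides may handle differently.

```python
import numpy as np


def poly_design_matrix(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)


def fit_polynomial(x, y, degree, lam=0.0):
    """Return w solving (X^T X + lam * I) w = X^T y.

    lam = 0 gives the plain least squares solution w* = (X^T X)^{-1} X^T y.
    lam > 0 adds an L2 (ridge) penalty on every coefficient, bias included.
    """
    X = poly_design_matrix(x, degree)
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ np.asarray(y, dtype=float)
    return np.linalg.solve(A, b)  # solves A w = b without computing inv(A)


def mse(x, y, w):
    """Mean squared error of the polynomial with coefficients w (lowest power first)."""
    pred = poly_design_matrix(x, len(w) - 1) @ w
    return float(np.mean((np.asarray(y, dtype=float) - pred) ** 2))


# Noise-free samples of y = x^2 - 3x + 1 on [-1, 3]; degree 2 recovers roughly [1, -3, 1].
x = np.linspace(-1, 3, 15)
y = x**2 - 3 * x + 1
w = fit_polynomial(x, y, degree=2)
print(np.round(w, 6), mse(x, y, w))
```

For degree 9, $X^\top X$ can be badly conditioned, which is one practical reason a small $\lambda > 0$ changes the fitted weights so visibly.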

L2 Norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{N} x_i^2}$

1. Write code in Python that randomly generates $N$ points sampled uniformly in the interval $x \in [-1, 3]$. Then output the function $y = x^2 - 3x + 1$ for each of the points generated. Then write code that adds zero-mean Gaussian noise with standard deviation $\sigma$ to $y$. Make plots of $x$ and $y$ with $N \in \{15, 100\}$ and $\sigma \in \{0, 0.05, 0.2\}$ (there should be six plots in total). Save the point sets for the following questions. Hint: You may want to check the NumPy library for generating noise.

2. Find the optimal weights (in terms of MSE) for fitting a polynomial function to the data in all 6 cases generated above using a polynomial of degree 1, 2, and 9. Use the least squares analytical solution given above. Do not use built-in methods for regression. Plot the fitted curves on the same plot as the data points (you can plot all 3 polynomial curves on the same plot). Report the fitted weights and the MSE in tables. Qualitatively assess the fit of the curves. Does it look like any of the models overfit, underfit, or appropriately fit the data? Explain your reasoning in one to two sentences (no calculations necessary).

3. Apply L2 norm regularization with a degree-9 polynomial model to the cases with $\sigma = 0.05$ and $N \in \{15, 100\}$. Vary the parameter $\lambda$, and choose three values of $\lambda$ that result in the following scenarios: underfitting, overfitting, and an appropriate fit. Report the fitted weights and the MSE in each of these scenarios. Hint: The least squares solution can also be used for polynomial regression. Check slides of lecture 2 for details on L2 norm regularization.

Problem 3

1. Load the dataset from file assignment1.zip and normalize the features using min-max scaling so that each feature has the same range of values.

2. Write a program that applies a k-NN classifier to the data with $k \in \{1, 5, 10, 15\}$. Calculate the test error using both leave-one-out validation and 5-fold cross validation. Plot the test error as a function of $k$. You may use the existing methods in scikit-learn or other libraries for finding the k-nearest neighbors, but do not use any built-in k-NN classifiers. Any reasonable handling of ties in finding k-nearest neighbors is okay. Also, do not use any existing libraries or methods for cross validation. Do any values of $k$ result in underfitting or overfitting?
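For Problem 3.2, here is a minimal NumPy sketch of min-max scaling, a from-scratch k-NN majority vote, and a hand-rolled cross-validation splitter. Leave-one-out is simply the case with as many folds as points. All function names are hypothetical. Ties in the vote go to the smallest label, which is one of the "reasonable" tie rules the problem allows.

```python
import numpy as np


def min_max_scale(X):
    """Rescale every feature (column) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0 instead of dividing by 0
    return (X - lo) / span


def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training points (Euclidean distance).

    Distance ties keep the lower training index (stable sort); vote ties go to
    the smallest label, because np.argmax returns the first maximum.
    """
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)  # (n_test, n_train)
    neighbors = np.argsort(d2, axis=1, kind="stable")[:, :k]
    labels = np.unique(y_train)
    votes = (y_train[neighbors][:, :, None] == labels).sum(axis=1)  # (n_test, n_labels)
    return labels[np.argmax(votes, axis=1)]


def cv_error(X, y, k, n_folds, seed=0):
    """Cross-validated k-NN error: misclassified held-out points / total points.

    n_folds = len(y) gives leave-one-out; n_folds = 5 gives 5-fold CV.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    wrong = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        pred = knn_predict(X[train_mask], y[train_mask], X[test_idx], k)
        wrong += int(np.sum(pred != y[test_idx]))
    return wrong / len(y)


# Toy data: two well-separated clusters of three points each.
X = min_max_scale([[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
for k in (1, 5):
    print(k, cv_error(X, y, k, n_folds=len(y)), cv_error(X, y, k, n_folds=5))
```

On this six-point toy set, $k = 1$ gives zero leave-one-out error. With $k = 5$, every point is misclassified, because each held-out point's five neighbours include only two from its own cluster. This is an extreme case of choosing $k$ too large.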

3. Apply two other classifiers of your choice to the same data. For these additional classifiers, you may use existing libraries, such as scikit-learn classifiers, but for cross-validation, you should reuse your method from 3.2 or modify it slightly. Possible algorithms include (but are not limited to) logistic regression, QDA, naive Bayes, SVM, and decision trees. Use 5-fold cross-validation to calculate the test error. Report the training and test errors. If any tuning parameters need to be selected, use cross-validation and report the training and test error for several values of the tuning parameters. Which of the classifiers performed best? Did any of them underfit or overfit the data? How do they compare to the k-NN classifiers in terms of performance?

Problem 4

[Figure 1: MLP diagram with inputs x1, x2, x3 and weights w1 to w8]
Figure 1: Multilayer perceptron with three inputs and one hidden layer. Numbers in circles are biases.

1. Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, $c > 0$. Show that the behavior of the network doesn’t change. (Exercise in Ch. 1 of the Nielsen book)

2. Given the same setup of problem 4.1 (a network of perceptrons), suppose that the overall input to the network of perceptrons has been chosen and fixed. Suppose the weights and biases are such that $w \cdot x + b \neq 0$ for the input $x$ to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant $c > 0$. Show that in the limit as $c \to \infty$ the behavior of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when $w \cdot x + b = 0$ for one of the perceptrons? (Exercise in Ch. 1 of the Nielsen book)

3. For each possible input of the MLP in Figure 1, calculate the output, i.e., what is the output if $X = [0,0,0]$, $X = [0,0,1]$, etc.? You should have 8 cases total.

4. If we change the perceptrons in Figure 1 to sigmoid neurons, what are the outputs for the same inputs (e.g., inputs of [0,0,0], [0,0,1], …)?

5. Using perceptrons with appropriate weights and biases, design an adder that does two-bit binary addition. That is, the adder takes as input two two-bit binary numbers (i.e. 4 binary inputs) and adds
them together. Don’t forget to include the carry bit. The resulting output should be the two-bit sum and the carry bit for a total of three binary outputs.

Problem 5

1. The time has come to implement your first fully-connected neural network in PyTorch! For this assignment, we’ll be training the network on the canonical MNIST dataset. After building the network, we’ll experiment with an array of hyperparameters, tweaking the network’s width, depth, learning rate and more in pursuit of the highest classification accuracy we can muster. You might also choose to match wits with your classmates by vying to get your network on the class leaderboard of MNIST scores: https://piazza.com/class/kyoikimyzbz6xj?cid=8

You may find the PyTorch tutorials helpful as you complete this problem: https://pytorch.org/tutorials/beginner/basics/intro.html. If you haven’t yet, we suggest you go through them, especially the tutorial on the optimization loop, which you will need to build more or less from scratch.

• Follow the TODOs in FCNN.py to build a two-layer fully-connected neural network. We’ve provided code to handle the dataset and model initialization, but you need to supply the training logic.
• Try adjusting the learning rate (by making it smaller) if your model is not converging/improving in accuracy. You might also try increasing the number of epochs used.

Here are the experiments:

• Experiment with the optimizer and activation function of your network.
• Try training your network without a non-linearity between the layers (i.e. a “linear activation”). Then try adding a sigmoid non-linearity, first directly on the input to the first layer, then on the input to the second layer. You should experiment with these independently and in combination. Record your test results for each in a table.
• Experiment with the non-linearity used before the middle layer. Here are some activation functions to choose from: relu, softplus, elu, tanh.
• Try changing the width of the hidden layer, keeping the activation function that performs best. Remember to add these results to your table.
• Lastly, try adding additional layers to your network. How do 3-, 4-, and 5-layer networks perform? Is there a point where accuracy stops increasing?

2. Create a plot of the training and test error vs. the number of iterations. How many iterations are sufficient to reach good performance?

3. Print a confusion matrix showing which digits were misclassified, and what they were misclassified as. Which digits are frequently confused with one another by your model? (You may use sklearn’s confusion matrix function to generate the matrix.)
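For Problem 5.3, here is a dependency-light sketch of the bookkeeping involved. It assumes you have already collected integer true labels and predicted labels for the test set; in a PyTorch loop, predictions would typically come from `logits.argmax(dim=1)`, accumulated over batches. `confusion_matrix_np` and `most_confused_pairs` are hypothetical helper names. The matrix follows the same convention as `sklearn.metrics.confusion_matrix`: rows are true digits and columns are predicted digits. When all ten digits appear, either matrix can be passed to the second helper.

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, n_classes=10):
    """cm[i, j] counts test images whose true digit is i and predicted digit is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # unbuffered: repeated pairs all count
    return cm


def most_confused_pairs(cm, top=3):
    """Largest off-diagonal entries as (true digit, predicted digit, count), biggest first."""
    off = np.array(cm, copy=True)
    np.fill_diagonal(off, 0)  # ignore correct predictions
    flat = np.argsort(off, axis=None)[::-1][:top]
    rows, cols = np.unravel_index(flat, off.shape)
    return [(int(r), int(c), int(off[r, c])) for r, c in zip(rows, cols)]


# Tiny made-up example in which 8s are often predicted as 3s.
y_true = [3, 3, 3, 5, 5, 8, 8, 8, 8]
y_pred = [3, 5, 5, 5, 3, 8, 3, 3, 3]
cm = confusion_matrix_np(y_true, y_pred)
print(most_confused_pairs(cm, top=2))  # → [(8, 3, 3), (3, 5, 2)]
```

Reading the largest off-diagonal cells, rather than scanning the whole 10 by 10 grid, is one quick way to answer which digits the model confuses most often.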

4. What was the highest percentage of classification accuracy your fully-connected network achieved? Briefly describe the architecture and training process that produced it. (If you like, you can take part in our friendly class competition by posting your results, along with a short description of your methods, to https://piazza.com/class/kyoikimyzbz6xj?cid=8.)

References

[1] “The SciPy stack specification.” [Online]. Available: https://www.scipy.org/stackspec.html