CSCI 631 Homework 3

Introduction

For this assignment, we will perform binary classifications of facial expressions using a logistic

regressor, a 1-layer neural network, and the SVM classifier. The main goal is to successfully

write the predict and train functions for both classifiers and compare these with values

obtained from other classifiers. The dataset we are using is quite noisy and was obtained

from the very 2013 Kaggle facial expression classification competition1

. Although the original

dataset contained 7 emotional classes (six of which are shown in Figure 1), we will only use

two of them (happy and sad) for this assignment.

Download the homework3.zip file from myCourses Contents→Programming assignments→Homework3;

this contains the instructions, starter code and image data needed for the assignment.

Figure 1: The top row shows three examples of correctly labeled faces from the Kaggle challenge; left to

right – angry, disgust and fear. The bottom row shows three incorrectly labeled faces; left to right – happy,

sad and surprise. Neutral face is not shown.

Requirements

You should perform this assignment using Python along with any image library of your

choice, and it is due on Sunday November 3rd by 11:59pm. You are required to submit

your code in a Jupyter notebook along with a brief report containing short write-ups based

on the question(s) in the assignment. Your solutions should be zipped and uploaded to

myCourses via Assignments (formerly known as Dropbox) before the due date.

Your submitted zipped file for this assignment should be named LastnameFirstname hw3.zip

and should contain at least two files – LastnameFirstname hw3.pdf and LastnameFirstname hw3.ipynb. Feel free to submit any other auxiliary files required to run your code.

We should be able to execute your code for the assignment from your submitted Jupyter

notebbok. Include a Readme file if necessary (especially if using external libraries other than

those used in the starter code).

1https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-\

\challenge/data

1

CSCI 631 Homework 3 October 22, 2019

The Data Files

You are provided with two data files, where the first file fer3and4train.csv contains the

training data with 12,066 data samples and the second file fer3and4test.csv contains the

data that you will be testing your classifier on. Results should be reported on the test dataset

which contains 2000 data samples. The files were created using fer2013.csv from Kaggle

but have been shuffled and augmented to avoid the class imbalance problem.

All the files are stored as comma separated files with three columns each. The first column

contains the label of the emotional expression, where 3=happy and 4=sad; the second column

contains 2304 integer values (between 0 and 255) obtained by vectorizing 48 × 48 grayscale

images of faces; and the last column states in which part of the development process the

data should be used (i.e training or testing).

Problem 1. The logistic regression classifier (Total 30 points)

Figure 2: The logistic regression classifier

The file LRClass.py contains the skeleton code for your logistic regression module. This

is accompanied by an auxiliary file called helper.py containing the names of some helper

functions needed for your module to run. In the provided Jupyter notebook, different parts of

the code have been marked for your implementation. First load the the training data samples

X and their corresponding class labels Y using the helper function getBinaryfer13Data.

Then use the class object to train the data. Call the train function to learn the weights

and bias of the unit. The following occur within the train function:

i Initialize the weights W to small random numbers (variance – zero); also initialize the

bias b to zero. The function init weight and bias is provided to aid in this.

ii Create a loop over the number of epochs specified. Within the loop, the following

occur:

iii Call a forward function to calculate the predictions P(Y |X) also known as pY . The

forward function implements σ(X ·W+b). The argument of this equation can be implemented in numpy as σ(np.dot(X, W)+b) where σ is the sigmoid activation function;

this is provided also as a helper function.

2

CSCI 631 Homework 3 October 22, 2019

iv Next, learn the weights via back-propagation, by performing gradient descent using

the equations below:

W = W − η

∂J

∂W ; W = W − η · (X · (pY − Y )) (1)

Note: When doing matrix computations, the product of the vectors XY can be

written as np.dot(X.T, Y); Also,

b = b − η

∂J

∂b ; b = b − η · (pY − Y ) (2)

v Apply the forward algorithm to predict the new labels for both the training and validation data. Compute the training cost and validation cost at each epoch and append

to a growing array (one training, the other validation). Keep note of the best error

value on the validation data.

vi Print out the best error value on the validation data. Provide this value in your final

report.

vii Plot your training and validation costs to show how the errors are changing over time.

See Figure 4. Dispaly this in your report.

viii Lastly load a new dataset, your test data from the file fer3and4test.csv. Compute

and print the accuracy (or classification rate) for predicting this new dataset from your

trained network. Provide this information in your report.

Figure 3: An example of the loss curves from the logistic regressor after 1000 epochs

Problem 2. The neural network classifier (Total 45 points)

Similar to the previous exercise, the class NNclass in the Jupyter notebook contains the

skeleton code for your neural network module. Again there are some helper functions in

helper.py. Load the training data samples X and their corresponding class labels Y using

the helper function getBinaryfer13Data. Call the train function to learn the weights and

bias of the unit. The following occur within the train function.

3

CSCI 631 Homework 3 October 22, 2019

Figure 4: The single hidden layer neural network with two softmax output nodes

i Initialize the weights W1, W2 to small random numbers (variance – zero); also initialize

the bias b to zero. The function init weight and bias NN is provided to aid in this.

Remember, this time you need to set the number of hidden units in layer 1. This is

your design choice.

Note: The dimensions of these parameters here are different from those in the LRClass.

W1 is D × M1, where M1 is the number of hidden nodes and the dimension of W2 is

M1 × K, where K is the number of output classes.

ii Create a loop over the number of epochs specified. Within the loop, the following

occur:

iii Call a forward function twice to calculate P(Y train|X) also known as pY and Ztrain

(activations at hidden layer); and the other to calculate P(Y valid|Xvalid) and Zvalid

on the validation data. This implies that your forward function needs to return two

values (i) pY , the output of the softmax classifier and (ii) the hidden activations Z,

based on which activation function you choose (tanh, sigmoid, or ReLU ).

iv Now we do a first round of back propagation by first performing gradient descent using

equations (3) and (4) below;

W2 = W2 − η

∂J

∂W2

; W2 = W2 − η · (Z

· (pY − Y )) (3)

b2 = b2 − η

∂J

∂b2

; b2 = b2 − η · (pY − Y ) (4)

v Then we propagate the errors we got from the previous layer W2 to update W1 and b1

via equations (5)-(7):

∂J

∂Z = (pY − Y ) · W2

· (1 − Z

2

) (5)

W1 = W1 − η · X ·

∂J

∂Z (6)

4

CSCI 631 Homework 3 October 22, 2019

b1 = b1 − η ·

∂J

∂Z (7)

Matrix multiplications in numpy will be sufficient to complete the processes.

Note: Only the training data is used for updating weights in gradient descent.

vi Apply the forward algorithm to predict the new labels for the training and validation

data and also compute the sigmoid costs of predicting them compared with the true

labels. Append the resulting cost to the growing arrays.

vii Keep note of the best error value on the validation data. Also compute and print

out the training and validation classification rates. This should be given in your final

report.

viii Display the graphs of both your training and validation errors (created from the above

process) to show how the errors change with time. Display these in your report.

ix Lastly load a new dataset, your test data from the file fer3and4test.csv. Compute

and display the accuracy (or classification rate) for predicting this new dataset from

your trained network. This value should be shown in your final report.

Figure 5: An example of the loss curves from the logistic regressor after 1000 epochs

Problem 3. Adding regularizers (Total 10 points)

After the initial training and testing, go back and add a regularizer to the cost function of

the neural network and retrain. Now report your new error rates and accuracies.

(a) Add the L1, L2 or ElasticNet regularizer using the formula provided in the slides from

class

(b) Print out your new classification rates, both for the training and validation datasets,

as well as the test dataset.

(c) Discuss your observations in working with a regularizer (in your report).

5

CSCI 631 Homework 3 October 22, 2019

Problem 4. Training with SVM (Total 15 points)

Now go back and we will train with SVM. Scitkit-learns is a good machine learning library

with an easy-to-use SVM interface. Similar to the previous problems, train an SVM on a

portion of the training data, test out your results on a validation set and finally run the true

tests on the test data set. Report your results including the accuracies when using :

(a) A linear kernel

(b) A radial-basis-function (RBF) kernel

(c) A polynomial kernel. Play around with the orders of the polynomial to determine

which one gives the best result.

(d) Discuss your results in working with the 3 different kernels in your report

Note: Training an SVM with over 11,000 data samples of dimension 2034 will take a very

long time.

BONUS: (Total 10 points) Use a different activation function for the neural network to

see if the results will improve. Don’t forget to change your derivatives based on the new

activation function you choose. There are some functions to help with this in helper.py.

Add the code into your ipython notebook and discuss your findings in your report.

You should turn in both your code and report in the zipped file to get full credit.

6