BME 590L: Homework 2




5/5 - (2 votes)

BME 590L: Homework 2

Please write-up solutions to all problems (however you’d like) and submit them in class
or via Github by the above due date. It is important that you show all of your steps – points
will be deducted for simply writing the answer without showing any intermediate steps on
how you arrived at your answer. You may work together to solve these problems, but please
write up your own answer in your own way. Bonus problems are not required, but are worth
extra points.
Remember that there is a coding component to this homework assignment as well,
which can be found on Github in the corresponding folder:
Problem 1: The magic of the max operator. In this problem, we’re going to try to
classify a set of 3 different training data points in two dimensions. Here are the example
points and their associated labels:
x1 =


, y1 = +1; x2 =


, y2 = −1; x3 =


, y3 = +1
(a) Let’s try to classify these points with a linear classifier, y
∗ = sign(wT x), where w is
the unknown 3×1 weight vector that we’d like to solve for. Remember, we set the first
entry of each 3×1 data vector x to 1 to fold the usual constant offset term b into the
inner product between w and x. To solve for w, please construct a 3×3 data matrix
X that holds each of our 3 example points, expressed as a vector, in each matrix row.
Then, solve for the optimal linear classifier weights w by computing the pseudo-inverse
of X, and then by multiplying this pseudo-inverse, X†
, with a 3×1 vector containing
the labels. You can either compute X† and the resulting w by hand, or use a computer
to help.
(b) How well do the optimal weights w do at classifying the original 3 data points? To
check this, it is helpful to solve for y
∗ = sign(wTXˆ ), where Xˆ is a matrix that holds
each of the training data points in its 3 columns, and y
∗ ∈ R
is a vector that holds
the three classification score estimates. Please sketch the 3 points in 2D space, and
discuss why the above classification attempt was or was not so successful.
(c) Maybe it’ll help to do a convolutional classification instead of a straight linear classification? To test this, let’s assume that we’ve heard that a good convolutional filter
to mix this data up a bit is c = [3, −1] (written as a row vector here to save space).
First, construct a convolution matrix W1 ∈ R
4×3 with c in each column. Then, we’ll
try out the following classifier,
∗ = sign(w2
TW1x), (1)
where w2 is a 4×1 weight vector of unknown weights. Is it possible to determine a w2
that can correctly classify the three points? Please provide a mathematical argument
for why or why not.
(d) Finally, let’s try to add a non-linearity in there, to see if that will help. Using the same
W1 as above, let’s solve,
∗ = sign 
TReLU(W1Xˆ )

, (2)
where w2 is again a 4×1 weight vector of unknown weights, and the ReLU operation
is the max() operation that sets all values less than 0 equal to 0. Is it possible to
determine a w2 that can accurately classify the three points contained in the columns
of Xˆ ? Please provide a mathematical argument for why or why not.
Problem 2: Cost function for logistic classification. We showed in class that the crossentropy cost function works well for classifying images into outputs that are treated as probability distributions, as opposed to binary ”yes/no” categories. This cost function takes the
following form for the case of logistic classification:
Lin(w) = 1
1 + e
−ynwT xn

(a) Noting that the standard logistic function θ(x) = e
, please compute the gradient
of Lin and show that,
∇wLin(w) = 1
T xn) (4)
(b) Based on this computation, please show that a misclassified input contributes more to
the gradient than a correctly classified input. This is an important thing to remember
– the inputs that are misclassified will be important in driving your gradient descent
(c) (bonus problem) Compute the Hessian of Lin(w) with respect to w. Then, use this
Hessian to write an expression for the optimal step size to use for a steepest descent
solver for the logistic classification problem.
Problem 3: Perceptron Learning Algorithm. A simple iterative method related to solving the logistic classification problem is termed the Perceptron Learning Algorithm. In the
simplest form of this approach, we’ll start by guessing an initial separation boundary between training data points, where the data points are assigned known labels in one of of
two categories. The separation boundary is defined by a vector of weights w, just like we’ve
been considering in class. At iteration t, where t = 1, 2, …, the weight vector will take on
a value w(t). Then, the algorithm will pick one of the currently misclassified training data
points (x(t), y(t)) and will use it to update w(t). Since the example is misclassified, we have
y(t) 6= sign(wT
(t)x(t)). The update rule is,
w(t + 1) = w(t) + y(t)x(t) (5)
The PLA algorithm continues through the iterative lop until there are no longer any misclassified points in the training data set, and will eventually converge to a linear separator
for linearly separable data.
(a) To show that the weight update rule above moves in the direction of correctly classifying
x(t), first show that
(t)x(t) < 0 (6)
Hint: x(t) is misclassified by w(t)
(b) Then, show that,
(t + 1)x(t) > y(t)w
(t)x(t) (7)
Hint: Use the update rule here
(c) As far as classifying x(t) is concerned, argue that that move from w(t) to w(t + 1) is
a move in the right direction. Feel free to draw a picture of a simple example in 2D if
you find that helpful.
Problem Let’s write out a CNN equation!. Consider the CNN network for image classification sketched on the next page. We’re going to write this out as a series of matrix
operations. Fig. 1(a) contains a simple 3-layer CNN network for 2D images, where a first
convolution layer is comprised of 2 convolutional filters with unknown weights (which creates
2 depth slices). After this convolution, a non-linearly is followed by “sum-pooling”, which
is an operation that takes every group of 2 × 2 pixels in the input and adds them together
to form 1 new pixel at the output, effectively shrinking each image by a factor of 2 in either
dimension. These 3 steps are repeated in Layer 2, where now the smaller set of 2 images are
each convolved with a set of 2 convolutional filters with unknown weights to produce 4 depth
slices. The final fully connected layer maps all of the data to a set of 3 possible categories
via a full matrix multiplication.
In Fig. 1(b), I have sketched the same pipeline for a 1D image, to make things a little
simpler. Let’s assume the 1D input image contains 4 pixels: xn ∈ R
. Please write out this
entire CNN pipeline as a set of matrix operations, where you show the correct size and composition of the elements within each matrix (2 convolution matrices, 2 sum-pooling matrices
and one fully connected matrix). You can just write ’ReLU’ where a nonlinearity is applied,
or ignore them in your analysis (you do not and cannot easily write these nonlinearities as
Input Image 2×2 Conv layer:
2 filters
n x n x 2 n x n
n/2 x n/2 x 4
2×2 Conv layer:
4 filters
ReLU 2×2
3 labels
Input Image
2×1 Conv layer:
2 filters
2×1 Conv layer:
4 filters
ReLU 2×1
3 labels
(a) CNN for 2D images
(b) CNN for 1D image
Figure 1: CNN layout for Problem 4
matrix operations). I’d like to see all the matrices written out to establish the mapping
between xn ∈ R
1×4 and yn ∈ R
. You should express the convolutional filters as variables
within each matrix, c
k = (c(1)l
, c(2)l
) for a 2×1 filter, using k to denote the depth slice and
l the layer it applies to. Note: to deal with the creation of multiple depth slices, you should
“stack” matrices on top of one another.