Machine Learning for Signal Processing

(ENGR-E 511; CSCI-B 590)

Homework 4

P1: When to applaud? [5 points]

1. Piano Clap.wav is an audio signal simulating a music concert, particularly the end of a song. Since some of the audience hadn’t heard the song before, they started to applaud before the end of the song (at around 1.5 seconds). Check out the audio.

2. Since I’m so kind, I did the MFCC conversion for you, which you can find in mfcc.mat. If you

load it, you’ll see a matrix X, which holds 962 MFCC vectors, each of which has 12 coefficients.

3. You can find the mean vectors and covariance matrices in MuSigma.mat that I kindly provide, too. Once you load it, you’ll see the matrix mX, which has two column vectors as the mean vectors. The first column vector of mX is a 12-dimensional mean vector of the MFCCs of the piano-only frames. The second column holds another 12-dimensional mean vector of the MFCCs of the claps. In addition to that, Sigma is a 3D array (12×12×2), whose first 2D slice (12×12) is the covariance matrix of the piano part, while the second 2D slice is for the clap sound.

4. Since you have all the parameters you need, you can calculate the p.d.f. of each MFCC vector in X under the two multivariate normal (Gaussian) distributions you can define from the two sets of means and covariance matrices. Go ahead and calculate them. Put them in a 2 × 962 matrix:

P = [ P(X_1|C_1)  P(X_2|C_1)  · · ·  P(X_962|C_1)
      P(X_1|C_2)  P(X_2|C_2)  · · ·  P(X_962|C_2) ].   (1)

Note that C1 is for the piano frames while C2 is for the applause. Normalize this matrix,

so that you can recover the posterior probabilities of belonging to the two classes given an

MFCC frame:

P̃_{c,t} = P(X_t|C_c) / Σ_{c′=1}^{2} P(X_t|C_{c′})   (2)

Plot this 2 × 962 matrix as an image (e.g. using the imagesc function in MATLAB). This

is your detection result. If you see a frame t′ where P̃_{1,t′} < P̃_{2,t′}, it could be the right moment to start clapping.
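As a sanity check, the likelihood-and-posterior computation in (1)–(2) can be sketched in a few lines of NumPy (MATLAB’s mvnpdf does the same job). The toy X, mX, and Sigma below are stand-ins for the real data; in practice you’d load them from mfcc.mat and MuSigma.mat (e.g. with scipy.io.loadmat):

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density evaluated at each column of X (d x T)."""
    d = X.shape[0]
    diff = X - mu[:, None]
    maha = np.sum(diff * (np.linalg.inv(Sigma) @ diff), axis=0)  # Mahalanobis terms
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

def posteriors(X, mX, Sigma):
    """Stack the two class likelihoods (eq. 1) and normalize each column (eq. 2)."""
    P = np.vstack([gaussian_pdf(X, mX[:, c], Sigma[:, :, c]) for c in range(2)])
    return P / P.sum(axis=0, keepdims=True)

# toy stand-ins; with the real data X is 12 x 962 (mfcc.mat),
# while mX (12 x 2) and Sigma (12 x 12 x 2) come from MuSigma.mat
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
mX = np.stack([np.zeros(12), np.ones(12)], axis=1)
Sigma = np.stack([np.eye(12), np.eye(12)], axis=2)
P_tilde = posteriors(X, mX, Sigma)
```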

5. You might not like this result, because it is sensitive to the wrong claps in the middle.

You want to smooth them out. For this, you may want to come up with a transition matrix

with some dominant diagonal elements:

T = [ 0.9  0.1
      0    1  ].   (3)

What it means is that if you see a C1 frame, you’ll want to stay in that state (no clap) in the next frame with probability 0.9, while you’ll want to transition to C2 (clap) with probability 0.1. On the other hand, you absolutely want to stay at C2 once you observe a frame with that label (you don’t want to go back once you start clapping).

Apply this matrix to your P˜ matrix in a recursive way:

P̄_{:,1} = P̃_{:,1}   (4)

b = argmax_c P̄_{c,t}   (5)

P̄_{:,t+1} = (T_{b,:})^⊤ ⊙ P̃_{:,t+1}   (6)

First, you initialize the first column vector of your new posterior prob matrix P¯ (you have

no previous frame to work with at that moment). For a given time frame of interest, t + 1,

you first need to see its previous frame to find which class the frame belongs to (by using

the simple max operation on the posterior probabilities at t). This class index b is going to

be used to pick up the corresponding transition probabilities (one of the row vectors of the

T matrix). Then, the transition probabilities will be multiplied elementwise with your existing posterior probabilities at t + 1.

In the end, you may want to normalize them so that they can serve as the posterior probabilities:

P̄_{1,t+1} = P̄_{1,t+1} / (P̄_{1,t+1} + P̄_{2,t+1}),
P̄_{2,t+1} = P̄_{2,t+1} / (P̄_{1,t+1} + P̄_{2,t+1}).   (8)

Repeat this procedure for all the 962 frames.
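The recursion (4)–(8) is a short loop. Here is a minimal NumPy sketch on a toy 2 × 4 posterior matrix; with the real data you’d feed in your 2 × 962 P̃ instead:

```python
import numpy as np

def smooth(P_tilde, T):
    """Recursive smoothing of the posterior matrix (eqs. 4-8)."""
    P_bar = np.empty_like(P_tilde)
    P_bar[:, 0] = P_tilde[:, 0]                          # eq. (4)
    for t in range(P_tilde.shape[1] - 1):
        b = np.argmax(P_bar[:, t])                       # eq. (5)
        P_bar[:, t + 1] = T[b, :] * P_tilde[:, t + 1]    # eq. (6)
        P_bar[:, t + 1] /= P_bar[:, t + 1].sum()         # eq. (8)
    return P_bar

T = np.array([[0.9, 0.1],
              [0.0, 1.0]])
# toy posterior matrix standing in for the real 2 x 962 one
P_tilde = np.array([[0.9, 0.8, 0.2, 0.6],
                    [0.1, 0.2, 0.8, 0.4]])
P_bar = smooth(P_tilde, T)
```

Note how the single frame with P̃₂ > P̃₁ gets pulled back toward C1 by the transition prior — exactly the smoothing of the spurious mid-song claps you want.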

6. Plot your new smoothed post prob matrix P̄. Do you like it?

7. If you don’t like the naïve smoothing method, there is another option, called the Viterbi algorithm. This time, you’ll calculate your smoothed post prob matrix in this way:

P̄_{:,1} = P̃_{:,1}   (9)

First, initialize the first frame’s post prob as usual. Then, for the (t + 1)-th frame, we will construct P̄_{c,t+1}. For this, we need to look at the t-th frame, but this time we check all paths to pick the best one by calculating all the probabilities of state transitions from c at t to c′ at t + 1. For example, from (c, t) to (c′ = 1, t + 1):

b = argmax_c T_{c,1} P̄_{c,t}.   (10)

This will tell you which of the two previous states c = 1 and c = 2 at t was the best transition

to the current state C1 at t + 1 (we’re seeing only C1 for now). If b = 1, it’s more probable

that the previous state at t was C1 rather than C2. We keep a record of these best choices in our B matrix:

B_{1,t+1} = b   (11)

Eventually, we use this best transition probability to construct the post prob of C1 at t + 1:

P̄_{1,t+1} = T_{b,1} P̄_{b,t} P_{1,t+1}   (12)

In other words, the prior probability (the transition probability times the accumulated post prob in the t-th frame), T_{b,1} P̄_{b,t}, is multiplied with the likelihood P_{1,t+1} to form the new post prob.

We keep these best previous-state sequences. For example, B_{1,102} will hold the more likely previous state (at the 101-th frame) if the 102-th frame’s state is C1, while B_{2,102} records the corresponding 101-th state that could have transitioned to C2.

8. Don’t forget to repeat this for P̄_{2,t+1}.

9. Don’t forget to normalize P¯ as in (8).

10. Once you construct your new post prob matrix from the first frame to the 962-th frame, you can backtrack. Choose the larger of P̄_{1,962} and P̄_{2,962}. Let’s say P̄_{2,962} is larger. Then, refer to B_{2,962} to decide the state of the 961-th frame, and so on.

11. Report this series of states as your backtracking result (e.g. as a plot). This will be the hidden

state sequence you wanted to know. Is the start of the applause making more sense to you?
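Steps 7–10 fit in one function. A NumPy sketch on a toy 2 × 4 matrix follows; note that it multiplies by the normalized P̃ rather than the raw likelihood P in eq. (12), which changes nothing after the per-column normalization (the two differ only by a per-column constant):

```python
import numpy as np

def viterbi_smooth(P_tilde, T):
    """Viterbi-style smoothing (eqs. 9-12) plus the backtracking of step 10."""
    C, n = P_tilde.shape
    P_bar = np.empty_like(P_tilde)
    B = np.zeros((C, n), dtype=int)
    P_bar[:, 0] = P_tilde[:, 0]                        # eq. (9)
    for t in range(n - 1):
        for c in range(C):
            scores = T[:, c] * P_bar[:, t]             # all transitions into c
            B[c, t + 1] = np.argmax(scores)            # eqs. (10)-(11)
            P_bar[c, t + 1] = scores[B[c, t + 1]] * P_tilde[c, t + 1]  # eq. (12)
        P_bar[:, t + 1] /= P_bar[:, t + 1].sum()       # normalize as in eq. (8)
    states = np.empty(n, dtype=int)                    # backtracking (step 10)
    states[-1] = np.argmax(P_bar[:, -1])
    for t in range(n - 2, -1, -1):
        states[t] = B[states[t + 1], t + 1]
    return P_bar, states

T = np.array([[0.9, 0.1],
              [0.0, 1.0]])
P_tilde = np.array([[0.9, 0.6, 0.1, 0.2],   # toy stand-in for the 2 x 962 matrix
                    [0.1, 0.4, 0.9, 0.8]])
P_bar, states = viterbi_smooth(P_tilde, T)
```

With this toy input the backtracked state sequence switches from C1 to C2 exactly once and never switches back, as the zero in T’s second row guarantees.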

P2: Multidimensional Scaling [3 points]

1. MDS pdist.mat holds a 4808 × 4808 square matrix, which contains the squared pairwise Euclidean distances that I constructed from a set of data points. Of course I won’t share the original data samples with you. Instead, you’ll write code for multidimensional scaling so that you can recover the original map out of L. If your MDS routine is successful, you should be able to see what the original map was. The original map was in the 2D space, so you may want to look at the two largest eigenvectors. Report the map you recovered in the form of a scatter plot (dots represent the locations of the samples).


2. Note that the MDS result can be rotated and shifted.
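Classical MDS is just double centering followed by an eigendecomposition. A minimal sketch, verified here on a toy 2D point cloud whose squared distances play the role of the homework matrix; for the assignment you’d apply mds() to the 4808 × 4808 matrix loaded from MDS pdist.mat and scatter-plot the two recovered columns:

```python
import numpy as np

def mds(D2, dims=2):
    """Classical MDS from a squared-Euclidean-distance matrix D2 (n x n)."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    G = -0.5 * J @ D2 @ J                      # Gram matrix of the centered points
    vals, vecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]        # keep the largest ones
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

# toy check: recover a small 2D point cloud from its squared distances
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))
D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
Y_hat = mds(D2)
```

The recovered Y_hat reproduces all pairwise distances of Y exactly, even though it may be a rotated or reflected copy of the original, which is the ambiguity mentioned in point 2.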

P3: Kernel PCA [4 points]

1. concetric.mat contains 152 data points, each of which is a 2-dimensional column vector. If you scatter-plot them on a 2D space, you’ll see that they form the concentric circles you saw in class.

2. Do kernel PCA on this data set to transform it into a new 3-dimensional feature space, i.e. X ∈ R^{2×152} ⇒ Kernel PCA ⇒ Z ∈ R^{3×152}.

3. In theory, you have to do the centering and normalization procedures in the feature space,

but for this particular data set, you can forget about it.

4. You remember what the transformed 3D features look like, right? Now your data set is linearly separable. Train a perceptron that does this linear classification. In other words, your perceptron will take the transformed version of your data (a 3-dimensional vector) and do a linear combination with three weights w = [w_1, w_2, w_3]^⊤ plus the bias term b:

y_i = σ(w^⊤ Z_{:,i} + b),   (14)

where y_i is the scalar output of your perceptron and σ is the activation function you define. If you choose the logistic function as your activation, then you’d better label your samples with 0 or 1. You know, the first 51 samples are for the inner circle, so their label will be 0, while for the other outer-ring samples, I’d label them with 1.

You’ll need to write a backpropagation algorithm to estimate your parameters w and b, which minimize the error between y_i and t_i. There are a lot of choices, but you can use the sum of squared errors: Σ_i ½(y_i − t_i)². Note that this dummy perceptron doesn’t have a hidden layer. Also note that you may want to initialize your parameters with some small (uniform or Gaussian) random numbers centered around zero.

5. Your training procedure will be sensitive to the choice of the learning rate and the initialization

scheme (the range of your random numbers). So, you may want to try out a few different

choices. Good luck!
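Here is one way the whole problem could look in NumPy. The synthetic rings (standing in for concetric.mat), the degree-2 polynomial kernel, and the training hyperparameters are all assumptions of this sketch, not requirements of the assignment — swap in whatever kernel you saw in class:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy concentric circles standing in for concetric.mat (2 x 152, inner circle first)
def ring(radius, n):
    angle = rng.uniform(0, 2 * np.pi, n)
    pts = np.stack([radius * np.cos(angle), radius * np.sin(angle)])
    return pts + rng.normal(scale=0.05, size=(2, n))

X = np.hstack([ring(1.0, 51), ring(3.0, 101)])    # 2 x 152
t = np.hstack([np.zeros(51), np.ones(101)])       # labels: inner 0, outer 1

# kernel PCA: eigendecompose the kernel matrix and keep 3 components
K = (X.T @ X + 1.0) ** 2                          # degree-2 polynomial kernel (an assumption)
vals, vecs = np.linalg.eigh(K)
top = np.argsort(vals)[::-1][:3]
Z = (vecs[:, top] * np.sqrt(np.maximum(vals[top], 1e-12))).T   # 3 x 152
Z /= Z.std(axis=1, keepdims=True)                 # rescale for stable training

# single-layer perceptron (eq. 14), gradient descent on the sum of squared errors
w, b, lr = rng.normal(scale=0.1, size=3), 0.0, 0.5
for _ in range(5000):
    y = 1.0 / (1.0 + np.exp(-(w @ Z + b)))        # logistic activation
    g = (y - t) * y * (1.0 - y)                   # error gradient w.r.t. pre-activation
    w -= lr * (Z @ g) / Z.shape[1]
    b -= lr * g.mean()
accuracy = np.mean((y > 0.5) == t)
```

The rescaling of Z is one concrete answer to point 5: it keeps the gradient magnitudes comparable across the three components, so a single learning rate works.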

P4: Stereo Matching (revisited) [4 points]

1. You remember you did the GMM-based stereo matching, right? If you forgot, revisit Homework 2 Problem 3 and Problem 4.

2. Extend your implementation with the MRF’s smoothing priors using an eight-neighborhood system, e.g. N_{i,j} = {(i−1, j−1), (i−1, j), (i−1, j+1), (i, j−1), (i, j+1), (i+1, j−1), (i+1, j), (i+1, j+1)}. Feel free to choose either ICM or Gibbs sampling. Show me the smoothed results. You can use the Gaussian-kernel-looking prior probability equations discussed in class (L14 S8).
