Homework #1 STA410H1F/2102H1F

1. In this problem, you will use the discrete cosine transform (DCT) to denoise an image. An R function to compute a two-dimensional DCT is available in the R package dtt; this package contains functions to compute a number of "trigonometric" transforms. The function from this package that we will use is mvdct.

(a) In lecture, we defined matrix transforms for vectors of length n, focusing on families of matrices {A_n} satisfying A_n^T A_n = D_n, where D_n is a diagonal matrix.

Suppose that Z is an m × n pixel image, which we can represent as an m × n matrix. Then using {A_n}, we can define a transform of Z as follows:

Ẑ = A_m Z A_n^T

Given the m × n matrix Ẑ, show that we can reconstruct the original image by

Z = D_m^{-1} A_m^T Ẑ A_n D_n^{-1}.
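As a quick numerical sanity check of this identity (not part of the required solution), any family with A^T A diagonal will do; the construction A = OS below, with O orthogonal and S diagonal, is an assumed stand-in for a DCT-type matrix:

```r
# Numerical check of Z = Dm^{-1} Am^T Zhat An Dn^{-1}
# using illustrative matrices with t(A) %*% A diagonal (A = orthogonal x diagonal)
set.seed(1)
make_A <- function(n) {
  O <- qr.Q(qr(matrix(rnorm(n^2), n, n)))  # orthogonal factor
  O %*% diag(seq_len(n))                   # scaled columns -> t(A) %*% A is diagonal
}
m <- 4; n <- 5
Am <- make_A(m); An <- make_A(n)
Dm <- t(Am) %*% Am; Dn <- t(An) %*% An    # the diagonal matrices D_m, D_n
Z <- matrix(rnorm(m * n), m, n)
Zhat <- Am %*% Z %*% t(An)                 # forward transform
Zrec <- solve(Dm) %*% t(Am) %*% Zhat %*% An %*% solve(Dn)  # claimed reconstruction
max(abs(Z - Zrec))                         # numerically zero
```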

(b) To denoise an image using the DCT, we first compute Ẑ and then perform hard- or soft-thresholding to obtain a thresholded (or shrunken) transform Ẑ* (which will typically be "sparser" than Ẑ). Our hope is that the components of Ẑ corresponding to noise in the image are eliminated, so that the denoised image can be obtained by applying the inverse transform to Ẑ*.

Using mvdct, write R functions to perform both hard- and soft-thresholding (dependent on a parameter λ). Your functions should not threshold the (1, 1) component of the DCT matrix. (A simple R function to do hard thresholding will be provided on Quercus; this can be modified to do soft thresholding.)
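A minimal sketch of the two thresholding rules (the function names and matrix-argument interface are my own choices; the version posted on Quercus may differ):

```r
# Hard thresholding: zero out coefficients with |z| <= lambda
hard_threshold <- function(Zhat, lambda) {
  Zstar <- ifelse(abs(Zhat) > lambda, Zhat, 0)
  Zstar[1, 1] <- Zhat[1, 1]  # never threshold the (1,1) component
  Zstar
}

# Soft thresholding: shrink coefficients toward zero by lambda
soft_threshold <- function(Zhat, lambda) {
  Zstar <- sign(Zhat) * pmax(abs(Zhat) - lambda, 0)
  Zstar[1, 1] <- Zhat[1, 1]  # never threshold the (1,1) component
  Zstar
}
```

These operate on the matrix of DCT coefficients; the denoised image is then obtained by applying the inverse DCT to the thresholded matrix.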

(c) The file boats.txt contains a noisy 256 × 256 pixel grayscale image of sailboats. Its entries are numbers between 0 and 1, where 0 represents black and 1 represents white. The data can be read into R and displayed as an image as follows:

> boats <- matrix(scan("boats.txt"), ncol=256, byrow=T)
> image(boats, axes=F, col=grey(seq(0,1,length=256)))

Using your functions from part (b), try to denoise the image as well as possible. (This is quite subjective, but try different methods and parameter values.)

Note: You should not expect your noise-reduced image to be a drastic improvement over the noisy image; in fact, connoisseurs of pointillism may prefer the noisy image. There are two issues here: First, we are applying noise reduction to the whole image rather than dividing the image into smaller sub-images and applying noise reduction to each sub-image. Second, even very good noise reduction tends to wash out some details, thereby rendering the noise-reduced image less visually appealing.

2. Suppose that U and V are independent Poisson random variables with means λ_u and λ_v. We then define X = U + 2V, which is said to have a Hermite distribution with parameters λ_u and λ_v. (The Hermite distribution is the distribution of a sum of two correlated Poisson random variables.)

(a) Show that the probability generating function of X is

g(s) = E(s^X) = exp[λ_u(s − 1) + λ_v(s² − 1)].

(b) The distribution of X can, in theory, be obtained exactly in closed form, although exact computation for given λ_u and λ_v is somewhat more difficult. However, we can approximate the distribution very well by combining the exact probability generating function with the discrete Fourier transform. The key to doing this is to find M such that P(X ≥ M) is very small, so that we can use the discrete Fourier transform to compute (to a very good approximation) P(X = x) for x = 0, 1, …, M − 1.

One approach to determining M is to use the probability generating function of X and Markov's inequality. Specifically, if s > 1 we have

P(X ≥ M) = P(s^X ≥ s^M) ≤ E(s^X)/s^M = exp[λ_u(s − 1) + λ_v(s² − 1)]/s^M

Use this fact to show that for P(X ≥ M) ≤ ε, we can take

M = inf_{s>1} [λ_u(s − 1) + λ_v(s² − 1) − ln(ε)]/ln(s)

(c) Given M (which depends on ε), the algorithm for determining the distribution of X goes as follows:

1. Evaluate the probability generating function g(s) at s = s_k = exp(−2πik/M) for k = 0, …, M − 1; the values of s can be created in R as follows:

> s <- exp(-2*pi*1i*c(0:(M-1))/M)

2. Evaluate P(X = x) by computing the inverse FFT of the sequence {g(s_k) : k = 0, …, M − 1}:

P(X = x) = (1/M) Σ_{k=0}^{M−1} exp(2πixk/M) g(s_k)
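The two steps can be sketched as follows. This is my own sketch, with M passed in as a fixed value for illustration (in the assignment M comes from part (b)); it relies on the fact that R's fft with inverse = TRUE computes the unnormalized sum with the e^{+2πixk/M} sign convention:

```r
# P(X = x) for x = 0, ..., M-1, Hermite distribution, via the inverse FFT
hermite_probs <- function(lambda_u, lambda_v, M) {
  s <- exp(-2 * pi * 1i * (0:(M - 1)) / M)             # s_k on the unit circle
  g <- exp(lambda_u * (s - 1) + lambda_v * (s^2 - 1))  # pgf evaluated at each s_k
  Re(fft(g, inverse = TRUE)) / M                       # inverse DFT, divided by M
}

p <- hermite_probs(1, 5, 64)
p[1]     # P(X = 0) = exp(-lambda_u - lambda_v) = exp(-6)
sum(p)   # ~ 1
```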

Write an R function to implement this algorithm, where M is determined using the method in part (b) with ε = 10^{−5}. Use this function to evaluate the distribution of X for the following two cases:

(i) λ_u = 1 and λ_v = 5;

(ii) λ_u = 0.1 and λ_v = 2.

Note that you do not need to evaluate the bound M with great precision; for example, a simple approach is to take a discrete set of points S = {1 < s_1 < s_2 < ⋯ < s_k} and define

M = min_{s∈S} [λ_u(s − 1) + λ_v(s² − 1) − ln(ε)]/ln(s)

where δ = s_{i+1} − s_i and s_k are determined graphically (that is, by plotting the appropriate function) so that you are convinced that the value of M is close to the actual infimum.
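One possible grid-search version of this bound (the grid endpoints and spacing here are arbitrary choices; plotting the objective against s confirms the minimum lies inside the grid):

```r
# Approximate M = inf_{s>1} [lambda_u*(s-1) + lambda_v*(s^2-1) - log(eps)] / log(s)
# by minimizing over a fine grid of s values and rounding up
bound_M <- function(lambda_u, lambda_v, eps, s = seq(1.01, 10, by = 0.001)) {
  ceiling(min((lambda_u * (s - 1) + lambda_v * (s^2 - 1) - log(eps)) / log(s)))
}

bound_M(1, 5, 1e-5)   # around 40 for these parameters
```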

Supplemental problems (not to hand in):

3. As noted in lecture, catastrophic cancellation in the subtraction x − y can occur when x and y are subject to round-off error. Specifically, if fl(x) = x(1 + u) and fl(y) = y(1 + v), then

fl(x) − fl(y) = x − y + (xu − yv)

where the absolute error |xu − yv| can be very large if both x and y are large; in some cases, this error may swamp the object we are trying to compute, namely x − y, particularly if |x − y| is small relative to |x| and |y|. For example, if we compute the sample variance using the right hand side of the identity

Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} x_i² − (1/n) (Σ_{i=1}^{n} x_i)²,   (1)

a combination of round-off errors from the summations and catastrophic cancellation in the subtraction may result in the computation of a negative sample variance! (In older versions of Microsoft Excel, certain statistical calculations were prone to this unpleasant phenomenon.)

In this problem, we will consider two algorithms for computing the sample variance that avoid this catastrophic cancellation. Both are "one-pass" algorithms, in the sense that we only need to cycle once through the data (as is the case if we use the right hand side of (1)); to use the left hand side of (1), we need two passes, since we must first compute x̄ before computing the sum on the left hand side of (1). In parts (a) and (b) below, define x̄_k to be the sample mean of x_1, …, x_k and note that

x̄_{k+1} = (k/(k+1)) x̄_k + (1/(k+1)) x_{k+1}

with x̄ = x̄_n.

(a) Show that Σ_{i=1}^{n} (x_i − x̄)² can be computed using the recursion

Σ_{i=1}^{k+1} (x_i − x̄_{k+1})² = Σ_{i=1}^{k} (x_i − x̄_k)² + (k/(k+1)) (x_{k+1} − x̄_k)²

for k = 1, …, n − 1. (This is known as West's algorithm.)
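A direct translation of the recursion into a one-pass R function might look like the following (a sketch of my own, updating the running mean and sum of squares together):

```r
# One-pass sum of squared deviations via West's recursion
west_ss <- function(x) {
  xbar <- x[1]  # running mean, initialized at x_1
  ss <- 0       # running sum of squared deviations
  for (k in 1:(length(x) - 1)) {
    d <- x[k + 1] - xbar
    ss <- ss + (k / (k + 1)) * d^2                      # recursion from part (a)
    xbar <- (k / (k + 1)) * xbar + x[k + 1] / (k + 1)   # running-mean update
  }
  ss
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
west_ss(x)   # matches sum((x - mean(x))^2) = 32
```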

(b) A somewhat simpler one-pass method replaces x̄ by some estimate x_0 and then corrects for the error in estimation. Specifically, if x_0 is an arbitrary number, show that

Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} (x_i − x_0)² − n(x_0 − x̄)².
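The identity is easy to check numerically for an arbitrary x_0 (a quick sanity check, not a proof; the choice x_0 = x[1] below is just one convenient fixed shift):

```r
# Check: sum((x - xbar)^2) == sum((x - x0)^2) - n * (x0 - xbar)^2
set.seed(2)
x <- rnorm(100, mean = 1000)   # large mean, where cancellation would bite
x0 <- x[1]                     # any fixed number works for the identity
lhs <- sum((x - mean(x))^2)
rhs <- sum((x - x0)^2) - length(x) * (x0 - mean(x))^2
all.equal(lhs, rhs)            # TRUE
```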

(c) The key in using the formula in part (b) is to choose x_0 to avoid catastrophic cancellation; that is, x_0 should be close to x̄. How might you choose x_0 (without first computing x̄) to minimize the possibility of catastrophic cancellation? Ideally, x_0 should be calculated using o(n) operations.

(An interesting paper on computational algorithms for computing the variance is "Algorithms for computing the sample variance: analysis and recommendations" by Chan, Golub, and LeVeque; this paper is available on Quercus.)

4. (a) Suppose that A, B, C, and D are matrices so that AC and BD are both well-defined. Show that

(AC) ⊗ (BD) = (A ⊗ B)(C ⊗ D)

(Hint: This is easier than it looks; the key is to start with the right hand side of the identity.)

(b) Use the result of part (a) to show that

(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}

for invertible matrices A and B.
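This identity can be sanity-checked numerically with R's built-in kronecker() (random matrices are invertible with probability one, so no special construction is needed):

```r
# Check solve(A %x% B) == solve(A) %x% solve(B) on random invertible matrices
set.seed(3)
A <- matrix(rnorm(9), 3, 3)
B <- matrix(rnorm(4), 2, 2)
lhs <- solve(kronecker(A, B))           # invert the 6 x 6 Kronecker product
rhs <- kronecker(solve(A), solve(B))    # Kronecker product of the inverses
max(abs(lhs - rhs))                     # numerically zero
```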

(c) Suppose that H_{2^k} is a 2^k × 2^k Hadamard matrix. Prove the claim given in lecture:

H_{2^k} = ∏_{j=1}^{k} (I_{2^{j−1}} ⊗ H_2 ⊗ I_{2^{k−j}})

(Hint: Use induction, starting with the fact that the identity holds trivially for k = 1.)
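For k = 2 the claim can be checked directly in R, taking H_4 = H_2 ⊗ H_2 (the standard Sylvester construction; this check is illustration, not the requested induction proof):

```r
# Verify H_4 = (I_1 %x% H_2 %x% I_2) (I_2 %x% H_2 %x% I_1), i.e. the k = 2 case
H2 <- matrix(c(1, 1, 1, -1), 2, 2)
H4 <- kronecker(H2, H2)                                 # Sylvester H_4
factor1 <- kronecker(kronecker(diag(1), H2), diag(2))   # j = 1 term
factor2 <- kronecker(kronecker(diag(2), H2), diag(1))   # j = 2 term
all(H4 == factor1 %*% factor2)                          # TRUE
```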
