Assignment #2 STA410H1F/2102H1F

1. An interesting variation of rejection sampling is the ratio of uniforms method. We start

by taking a bounded function h with h(x) ≥ 0 for all x and Z ∞

−∞

h(x) dx < ∞. We then

define the region

Ch =

(u, v) : 0 ≤ u ≤

q

h(v/u)

and generate (U, V ) uniformly distributed on Ch. We then define the random variable X =

V/U.

(a) The joint density of (U, V ) is

f(u, v) = 1

|Ch|

for (u, v) ∈ Ch

where |Ch| is the area of Ch. Show that the joint density of (U, X) is

g(u, x) = u

|Ch|

for 0 ≤ u ≤

q

h(x)

and that the density of X is γ h(x) for some γ > 0.

(b) The implementation of this method is somewhat complicated by the fact that it is

typically difficult to sample (U, V ) from a uniform distribution on Ch. However, it is usually

possible to find a rectangle of the form Dh = {(u, v) : 0 ≤ u ≤ u+, v− ≤ v ≤ v+} such that

Ch is contained within Dh. Thus to draw (U, V ) from a uniform distribution on Ch, we can

use rejection sampling where we draw proposals (U

∗

, V ∗

) from a uniform distribution on the

rectangle Dh; note that the proposals U

∗ and V

∗ are independent random variables with

Unif(0, u+) and Unif(v−, v+) distributions, respectively. Show that we can define u+, v− and

v+ as follows:

u+ = max

x

q

h(x) v− = min

x

x

q

h(x) v+ = max

x

x

q

h(x).

(Hint: It suffices to show that if (u, v) ∈ Ch then (u, v) ∈ Dh where Dh is defined using u+,

v−, and v+ above.)

(c) Implement (in R) the method above for the standard normal distribution taking h(x) =

exp(−x

2/2). In this case, u+ = 1, v− = −

q

2/e = −0.8577639, and v+ =

q

2/e = 0.8577639.

What is the probability that proposals are accepted?

1

2. Suppose we observe y1, · · · , yn where

yi = θi + εi (i = 1, · · · , n)

where {εi} is a sequence of random variables with mean 0 and finite variance representing

noise. We will assume that θ1, · · · , θn are smooth in the sense that θi = g(i) for some

continuous and differentiable function g. The least squares estimates of θ1, · · · , θn are trivial

— bθi = yi

for all i — but we can modify least squares in a number of ways to accommodate

the “smoothness” assumption on {θi}. In this problem, we will consider estimating {θi} by

minimizing

Xn

i=1

(yi − θi)

2 + λ

nX−1

i=2

(θi+1 − 2θi + θi−1)

2

where λ > 0 is a tuning parameter that controls the smoothness of the estimates bθ1, · · · ,

bθn.

(This method is known as Whittaker graduation in actuarial science and the HodrickPrescott filter in economics.)

(a) Show that if {yi} are exactly linear, i.e. yi = a × i + b for all i then bθi = yi

for all i.

(b) In principal, we could compute bθ1, · · · ,

bθn using ordinary least squares estimation. Show

that bθ = (bθ1, · · · ,

bθn)

T minimizes

ky

∗ − Xθk

2

where y

∗

is a vector of length 2n−2 and X is an (2n−2)×n matrix. What are y

∗ and X?

(c) When n is large, computing bθ1, · · · ,

bθn directly, for example, using the OLS formulation

in part (b) is computationally expensive when n is large. Alternatively, we could use the

Gauss-Seidel algorithm but it converges slowly, particularly for larger values of λ. One

possible alternatively is a randomized modification of the Gauss-Seidel algorithm, which at

each stage solves a p( n) variable least squares problem.

The basic algorithm is as follows:

0. Initialize bθ.

1. Randomly sample a subset w of size p from the integers 1, · · · , n. Define Xw to be

the submatrix of X with column indices w and Xw¯ to be the submatrix of X with

column indices in the complement of w; define θw and θw¯ analogously so that Xθ =

Xwθw + Xw¯θw¯.

2. Define bθw to minimize

y

∗ − Xw¯

bθw¯ − Xwθw

with respect to θw.

2

3. Repeat steps 1 and 2 until convergence.

Show that the objective function is non-increasing from one iteration to the next.

(d) On Quercus, there is a function HP in a file HP.txt, which implements the randomized

block Gauss-Seidel algorithm outlined in part (c). This function can be used as follows

> r <- HP(x,lambda=2000,p=20,niter=100)

where lambda is the value of λ, p is the value of p, and niter specifies the number of

iterations of the algorithm. The output of the function (contained here in r) consists of two

components: the estimates of {θ} in r$xhat and the values of the objective function for each

iteration in r$ss.

Use this function (with λ = 2000) on data on monthly yields on short-term British securities

over a 21 year period (252 months). Try out various values of p between 5 and 50 (using

1000 iterations). Describe how the objective function decreases with each iteration as p

varies between 5 and 50.

(e) (Optional) Methods such as the randomized block Gauss-Seidel algorithm are essential

in problems where the number of unknown parameters is extremely large. The goal is not to

minimize the objective function but merely to find a solution that’s close to the minimum.

In the context of the randomized block Gauss-Seidel algorithm, what factors should you

consider in choosing p, the numbers of parameters being optimized at each step? Keep in

mind that for least squares, the number of floating point operations needed increases with p

like p

2

.

3

Supplemental problems (not to hand in):

3. To generate random variables from some distributions, we need to generate two independent two independent random variables Y and V where Y has a uniform distribution over

some finite set and V has a uniform distribution on [0, 1]. It turns out that Y and V can be

generated from a single Unif(0, 1) random variable U.

(a) Suppose for simplicity that the finite set is {0, 1, · · · , n − 1} for some integer n ≥ 2. For

U ∼ Unif(0, 1), define

Y = bnUc and V = nU − Y

where bxc is the integer part of x. Show that Y has a uniform distribution on the set

{0, 1, · · · , n − 1}, V has a uniform distribution on [0, 1], and Y and V are independent.

(b) What happens to the precision of V defined in part (a) as n increases? (For example, if

U has 16 decimal digits and n = 106

, how many decimal digits will V have?) Is the method

in part (a) particularly feasible if n is very large?

4. One issue with rejection sampling is a lack of efficiency due to the rejection of random

variables generated from the proposal density. An alternative is the acceptance-complement

(A-C) method described below.

Suppose we want to generate a continuous random variable from a density f(x) and that

f(x) = f1(x) + f2(x) (where both f1 and f2 are non-negative) where f1(x) ≤ g(x) for some

density function g. Then the A-C method works as follows:

1. Generate two independent random variables U ∼ Unif(0, 1) and X with density g.

2. If U ≤ f1(X)/g(X) then return X.

3. Otherwise (that is, if U > f1(X)/g(X)) generate X from the density

f

∗

2

(x) = f2(x)

Z ∞

−∞

f2(t) dt

.

Note that we must be able to easily sample from g and f

∗

2

in order for the A-C method to

be efficient; in some cases, they can both be taken to be uniform distributions.

(a) Show that the A-C method generates a random variable with a density f. What is the

probability that the X generated in step 1 (from g) is “rejected” in step 2?

4

(b) Suppose we want to sample from the truncated Cauchy density

f(x) = 2

π(1 + x

2

)

(−1 ≤ x ≤ 1)

using the A-C method with f2(x) = k, a constant, for −1 ≤ x ≤ 1 (so that f

∗

2

(x) = 1/2 is a

uniform density on [−1, 1]) with

f1(x) = f(x) − f2(x) = f(x) − k (−1 ≤ x ≤ 1).

If g(x) is also a uniform density on [−1, 1] for what range of values of k can the A-C method

be applied?

(c) Defining f1, f2, and g as in part (b), what value of k minimizes the probability that X

generated in step 1 of the A-C algorithm is rejected?

5. Suppose we want to generate a random variable X from the tail of a standard normal

distribution, that is, a normal distribution conditioned to be greater than some constant b.

The density in question is

f(x) = exp(−x

2/2)

√

2π(1 − Φ(b))

for x ≥ b

with f(x) = 0 for x < b where Φ(x) is the standard normal distribution function. Consider

rejection sampling using the shifted exponential proposal density

g(x) = b exp(−b(x − b)) for x ≥ b.

(This proposal density is used by the Monty Python algorithm to sample from the tail of

the normal distribution.)

(a) Define Y be an exponential random variable with mean 1 and U be a uniform random

variable on [0, 1] independent of Y . Show that the rejection sampling scheme defines X =

b + Y/b if

−2 ln(U) ≥

Y

2

b

2

.

(Hint: Note that b + Y/b has density g.)

(b) Show the probability of acceptance is given by

√

2πb(1 − Φ(b))

exp(−b

2/2) .

What happens to this probability for large values of b? (Hint: You need to evaluate M =

max f(x)/g(x).)

5

(c) Suppose we replace the proposal density g defined above by

gλ(x) = λ exp(−λ(x − b)) for x ≥ b.

(Note that gλ is also a shifted exponential density.) What value of λ maximizes the probability of acceptance? (Hint: Note that you are trying to solve the problem

min

λ>0

max

x≥b

f(x)

gλ(x)

for λ. Because the density gλ(x) has heavier tails, the minimax problem above will have the

same solution as the maximin problem

max

x≥b

min

λ>0

f(x)

gλ(x)

which may be easier to solve.)

6. (a) Suppose that E1, E2, · · · are independent Exponential random variables with density

f(x) = λ exp(−λx) for x ≥ 0. Then the distribution of Tk = E1 + · · · + Ek is a Gamma

distribution whose density is

gk(x) = λ

kx

k−1

exp(−λx)

(k − 1)! for x ≥ 0.

Show that

P(Tk ≥ 1) = Z ∞

1

gk(x) dx =

k

X−1

j=0

λ

j

exp(−λ)

j!

(Hint: Use integration by parts.)

(b) How might you use the result of part (a) to generate random variables from a Poisson

distribution with mean λ?

6

## Reviews

There are no reviews yet.