Assignment #4 STA410H1F/2102H1F

1. Consider the model

Yi = θi + εi (i = 1, · · · , n)

where {εi} is a sequence of random variables with mean 0 and finite variance representing

noise. We will assume that θ1, · · · , θn are dependent or “smooth” in the sense that the

absolute differences {|θi −θi−1|} are small for most values of i. Rather than penalizing a lack

of smoothness by P

i

(θi − 2θi−1 + θi−2)

2

(as in Assignment #2), we will consider estimating

{θi} given data {yi} by minimizing

Xn

i=1

(yi − θi)

2 + λ

Xn

i=2

|θi − θi−1| (1)

where λ > 0 is a tuning parameter and Xn

i=2

|θi − θi−1| represents the total variation of {θi}.

The resulting estimates bθ1, · · · ,

bθn, are sometimes called fusion estimates and are useful if

{θi} contain “jumps”, that is, θi = g(i/n) where g is a smooth function with a small number

of discontinuities (i.e. jumps).

The non-differentiable part of the objective function in (1) can be made separable by defining

φi = θi − θi−1 for i = 2, · · · , n and then minimizing

Xn

i=1

(yi − θi)

2 + λ

Xn

i=2

|φi

| (2)

where now each θi (for i = 2, · · · , n) will be a function of θ1, φ2, · · · , φi

. The representation of

the objective function in (2) can be used to compute the parameter estimates using coordinate

descent although there is must faster algorithm. However, (2) is useful for deriving properties

of the estimates.

(a) Show that θk = θ1 +

Pk

i=2 φi

for k ≥ 2.

(b) Show that if bθ1, · · · ,

bθn minimize (1) (or (2)) then

Xn

i=1

(yi − bθi) = 0.

(Hint: Use the representation (2) and compute its partial derivative with respect to θ1.)

1

(c) Show that |yi − bθi

| ≤ λ for all i. (Hint: Show that

∂

(

λ

Xn

i=2

|θi − θi−1|

)

⊂ [−2λ, 2λ]

n

for any θ1, · · · , θn.)

(d) For λ sufficiently large, we will have bθ1 = · · · = bθn = ¯y or equivalently bφ2 = · · · = bφn = 0.

How large must λ be in order to have bθ1 = · · · = bθn = ¯y? (Hint: Look at the sub-gradient

of (2) with respect to (θ1, φ2, · · · , φn); when is (0, 0, · · · , 0) an element of this sub-gradient at

(¯y, 0, · · · , 0)?)

Note: An R function tvsmooth is available on Quercus in a file tvsmooth.txt. You may

find it useful to simulate data from a discontinous function with additive noise and estimate

the function using tvsmooth to gain some insight into this method.

2. Suppose that X1, · · · , Xn are sampled from the following truncated Poisson distribution:

Pλ(Xi = x) = exp(−λ)λ

x

x!κλ(r)

for x = r + 1, r + 2, · · ·

for some integer r ≥ 0 where

κλ(r) = X∞

x=r+1

exp(−λ)λ

x

x!

= 1 −

Xr

x=0

exp(−λ)λ

x

x!

.

Such a sample might arise if we were sampling from a Poisson population but were unable

to observe data less than or equal to r.

The EM algorithm can be employed to estimate λ from the observed X1, · · · , Xn. The

key is to think of the observed data as a subset of some larger (“complete”) data set

X1, · · · , Xn, Xn+1, · · · , Xn+M where M ≥ 0 is a random variable and Xn+1, · · · , Xn+M ≤ r;

given M = m, this complete data set is now assumed to be m + n independent observations

from a Poisson distribution with mean λ. The log-likelihood for the complete data is

lnL(λ) = ln(λ)

nX

+m

i=1

xi − (n + m)λ,

which depends on two unknowns

nX

+m

i=n+1

xi and m. To use the EM algorithm, we need to

estimate these two unknowns.

(a) The probability distribution of M is

Pλ(M = m) =

n + m − 1

m

!

(1 − κλ(r))mκλ(r)

n

for m = 0, 1, 2, · · ·

2

Show that Eλ(M) = n(1 − κλ(r))/κλ(r).

(b) Show that

Eλ

nX

+M

i=n+1

Xi

X1 = x1, · · · , Xn = xn

= Eλ(M)Eλ(Xi

|Xi ≤ r).

(Hint: Note that (a) Xn+1, · · · , Xn+M are independent of X1, · · · , Xn and (b) Xn+1, · · · , Xn+M ≤

r.)

(c) Consider the data given in the table below. They represent the accident claims submitted

to La Royale Belge Insurance Company during a single year. A crude model for the number

of claims submitted for a given policy is Poisson. However, the data below does not provide

the number of policies for which no claims were submitted. We want to estimate λ as well

as to impute (estimate) the number M of policies with no claims.

Number of claims 1 2 3 4 5 6 7

Number of policies 1317 239 42 14 4 4 1

Assume a truncated Poisson model for these data taking r = 0 and estimate λ as well as M

using the EM algorithm (which in this case has a particularly simple form). Do you think the

truncated Poisson model is useful for these data? (For example, do you think your estimate

of M is reasonable?)

3

## Reviews

There are no reviews yet.