EECS 126: Probability and Random Processes

Problem Set 11

1. Midterm

Solve all of the problems on the midterm again (including the ones you got correct).

2. Compression of a Random Source

Suppose I'm trying to send a text message to my friend. In general, I know I need $\log_2(26) \approx 4.7$ bits for every letter I want to send, because there are 26 letters in the alphabet. However, it turns out that if I have some information about the distribution of the letters, I can do better. For example, I might give the letter e a shorter bit representation because I know it's the most common. In fact, it turns out the number of bits I need on average is the entropy, and in this problem, we try to show why this is true in general.

Let $(X_i)_{i=1}^{\infty} \overset{\text{i.i.d.}}{\sim} p(\cdot)$, where $p$ is a discrete PMF on a finite set $\mathcal{X}$. We know the entropy of a random variable $X$ is
\[
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x).
\]
Since entropy is really a function of the distribution, we could write the entropy as $H(p)$.
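For concreteness, here is a minimal Python sketch of this formula (the two example PMFs are hypothetical choices for illustration only):

```python
import math

def entropy(p):
    """Entropy H(p) in bits of a discrete PMF given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Uniform over the 26 letters: entropy is log2(26) ≈ 4.70 bits per letter.
print(entropy([1 / 26] * 26))

# A skewed (hypothetical) distribution has strictly lower entropy, ≈ 3.32 bits.
print(entropy([0.5] + [0.5 / 25] * 25))
```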

(a) Show that
\[
-\frac{1}{n} \log_2 p(X_1, \ldots, X_n) \xrightarrow{n \to \infty} H(X_1) \quad \text{almost surely}.
\]
(Here, we are extending the notation $p(\cdot)$ to denote the joint PMF of $(X_1, \ldots, X_n)$: $p(x_1, \ldots, x_n) := p(x_1) \cdots p(x_n)$.)
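This convergence can be checked empirically (not a proof; the three-symbol PMF and the sample sizes below are arbitrary illustrative choices):

```python
import math
import random

symbols, probs = [0, 1, 2], [0.5, 0.3, 0.2]  # hypothetical source
H = -sum(q * math.log2(q) for q in probs)    # ≈ 1.485 bits

random.seed(0)
for n in (100, 10_000, 1_000_000):
    xs = random.choices(symbols, weights=probs, k=n)
    # Independence gives -(1/n) log2 p(X1,...,Xn) = -(1/n) * sum_i log2 p(Xi).
    avg = -sum(math.log2(probs[x]) for x in xs) / n
    print(n, round(avg, 4), "vs H =", round(H, 4))
```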

(b) Fix $\epsilon > 0$ and define $A_\epsilon^{(n)}$ as the set of all sequences $(x_1, \ldots, x_n) \in \mathcal{X}^n$ such that
\[
2^{-n(H(X_1) + \epsilon)} \le p(x_1, \ldots, x_n) \le 2^{-n(H(X_1) - \epsilon)}.
\]
Show that $P((X_1, \ldots, X_n) \in A_\epsilon^{(n)}) > 1 - \epsilon$ for all $n$ sufficiently large. Consequently, $A_\epsilon^{(n)}$ is called the typical set, because the observed sequences lie within $A_\epsilon^{(n)}$ with high probability.
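To get a feel for (b), the probability of landing in $A_\epsilon^{(n)}$ can be estimated by simulation (same hypothetical source as above; $\epsilon = 0.1$ and the trial count are arbitrary choices):

```python
import math
import random

symbols, probs = [0, 1, 2], [0.5, 0.3, 0.2]
H = -sum(q * math.log2(q) for q in probs)
eps = 0.1

random.seed(0)
for n in (10, 100, 1000):
    trials = 2000
    hits = 0
    for _ in range(trials):
        xs = random.choices(symbols, weights=probs, k=n)
        # (x1,...,xn) lies in A_eps^(n) iff |-(1/n) log2 p(x1,...,xn) - H| <= eps.
        if abs(-sum(math.log2(probs[x]) for x in xs) / n - H) <= eps:
            hits += 1
    print(n, hits / trials)
```

The estimated probability climbs toward 1 as $n$ grows, as (b) predicts.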

(c) Show that $(1 - \epsilon) 2^{n(H(X_1) - \epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X_1) + \epsilon)}$ for $n$ sufficiently large. Use the union bound.

Parts (b) and (c) are called the asymptotic equipartition property (AEP) because they say that there are $\approx 2^{nH(X_1)}$ observed sequences, each of which has probability $\approx 2^{-nH(X_1)}$. Thus, by discarding the sequences outside of $A_\epsilon^{(n)}$, we need only keep track of $\approx 2^{nH(X_1)}$ sequences, which means that a length-$n$ sequence can be compressed into $\approx nH(X_1)$ bits, requiring $H(X_1)$ bits per symbol.
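For a sense of scale (the Bernoulli parameter here is an arbitrary illustrative choice): for a binary source with $p(1) = 0.1$,
\[
H(X_1) = -(0.1 \log_2 0.1 + 0.9 \log_2 0.9) \approx 0.469,
\]
so for $n = 100$ only about $2^{46.9}$ of the $2^{100}$ binary strings are typical, an exponentially small fraction that nonetheless carries almost all of the probability.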


(d) Now show that for any $\delta > 0$ and any positive integer $n$, if $B_n \subseteq \mathcal{X}^n$ is a set with $|B_n| \le 2^{n(H(X_1) - \delta)}$, then $P((X_1, \ldots, X_n) \in B_n) \to 0$ as $n \to \infty$.

This says that we cannot compress the observed sequences of length $n$ into any set of size smaller than $2^{nH(X_1)}$.

[Hint: Consider the intersection of $B_n$ and $A_\epsilon^{(n)}$.]

(e) Next we turn towards using the AEP for compression. Recall that encoding an element of a set of size $n$ in binary requires $\lceil \log_2 n \rceil$ bits. Therefore, a naïve encoding requires $\lceil \log_2 |\mathcal{X}| \rceil$ bits per symbol.

From (b) and (d), if we use $\log_2 |A_\epsilon^{(n)}| \approx nH(X_1)$ bits to encode the sequences in $A_\epsilon^{(n)}$, ignoring all other sequences, then the probability of error with this encoding will tend to $0$ as $n \to \infty$, and thus an asymptotically error-free encoding can be achieved using $H(X_1)$ bits per symbol.

Alternatively, we can create an error-free code by using $1 + \lceil \log_2 |A_\epsilon^{(n)}| \rceil$ bits to encode the sequences in $A_\epsilon^{(n)}$ and $1 + n \lceil \log_2 |\mathcal{X}| \rceil$ bits to encode all other sequences, where the first bit indicates whether or not the sequence belongs to $A_\epsilon^{(n)}$. Let $L_n$ be the length of the encoding of $X_1, \ldots, X_n$ using this code; show that $\lim_{n \to \infty} \mathbb{E}[L_n]/n \le H(X_1) + \epsilon$.

In other words, asymptotically, we can compress the sequence so that the number of bits per symbol is arbitrarily close to the entropy.
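A minimal simulation sketch of this two-part code (the binary source, the value of $\epsilon$, and the substitution of the bound $\lceil n(H(X_1) + \epsilon) \rceil \ge \lceil \log_2 |A_\epsilon^{(n)}| \rceil$ from (c) for the exact set size are all assumptions made for illustration):

```python
import math
import random

symbols, probs = [0, 1], [0.9, 0.1]  # hypothetical binary source
H = -sum(q * math.log2(q) for q in probs)  # ≈ 0.469 bits
eps = 0.05

def code_length(xs, n):
    # Typicality test: -(1/n) log2 p(x1,...,xn) within eps of H.
    neg_log_p = -sum(math.log2(probs[x]) for x in xs)
    if abs(neg_log_p / n - H) <= eps:
        # 1 flag bit + at most ceil(n(H + eps)) bits, since |A| <= 2^{n(H + eps)}.
        return 1 + math.ceil(n * (H + eps))
    # 1 flag bit + the naive raw encoding for atypical sequences.
    return 1 + n * math.ceil(math.log2(len(symbols)))

random.seed(0)
for n in (100, 1000, 10_000):
    trials = 200
    total = sum(code_length(random.choices(symbols, weights=probs, k=n), n)
                for _ in range(trials))
    print(n, round(total / trials / n, 3), "vs H + eps =", round(H + eps, 3))
```

As $n$ grows, atypical sequences become rare and the average bits per symbol approaches $H(X_1) + \epsilon$, matching the claim in (e).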

