COMP-424: Artificial Intelligence

Homework 3

General instructions.

Submit a single PDF document containing all pages of your written solution via your McGill myCourses account. You may scan in hand-written pages; if necessary, learn how to combine multiple PDF files into one. You may solve the questions by hand or by writing a program, but if you write a program, you must not rely on existing implementations and must write it from scratch (and submit your code along with the PDF).

Question 1: Designing a Bayesian Network

(Reproduced from Russell & Norvig, Exercise 14.11, pg. 561)

In your local nuclear power station, there is an alarm that senses when a temperature gauge exceeds a given threshold. The gauge measures the temperature of the core. Consider the Boolean variables A (alarm sounds), F_A (alarm is faulty), and F_G (gauge is faulty), and the multivalued nodes G (gauge reading) and T (actual core temperature).

a. Draw a Bayesian network for this domain, given that the gauge is more likely to fail when the core temperature gets too high.

b. Is your network a polytree? Why or why not?

c. Suppose there are just two possible actual and measured temperatures, normal and high; the probability that the gauge gives the correct temperature is x when it is working, but y when it is faulty. Give the conditional probability table associated with G.

d. Suppose the alarm works correctly unless it is faulty, in which case it never sounds. Give the conditional probability table associated with A.

e. Suppose the alarm and gauge are working and the alarm sounds. Calculate an expression for the probability that the temperature of the core is too high, in terms of the various conditional probabilities in the network.

Question 2: Inference in Bayesian Networks

Consider a Bayesian network over the following variables (its edges can be read off the conditional distributions given below: B → A, A → S, and R, B, A → T):

R: Rush Hour

B: Bad Weather

A: Accident

T: Traffic Jam

S: Sirens

We will denote random variables with capital letters (e.g., R), and the binary outcomes with lowercase letters (e.g., r and ¬r).

The network has the following parameters:

P(b) = 0.2

P(r) = 0.15

P(t | r, b, a) = 0.95

P(t | r, ¬b, a) = 0.92

P(t | r, b, ¬a) = 0.90

P(t | r, ¬b, ¬a) = 0.85

P(t | ¬r, b, a) = 0.35

P(t | ¬r, b, ¬a) = 0.40

P(t | ¬r, ¬b, a) = 0.60

P(t | ¬r, ¬b, ¬a) = 0.05

P(s | a) = 0.8

P(s | ¬a) = 0.2

P(a | b) = 0.6

P(a | ¬b) = 0.4

Compute the following terms using basic axioms of probability and the conditional independence properties encoded in the above graph. You can use Bayes Ball properties to simplify the computation, if applicable.

a. P(s, r)

b. P(a, ¬t)

c. P(t|s)
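If you choose the programming route (from scratch, per the general instructions), one way to sanity-check hand computations is brute-force enumeration of the joint P(R, B, A, T, S) = P(R) P(B) P(A|B) P(T|R, B, A) P(S|A), the factorization implied by the parameters above. A minimal sketch (illustrative only, not a worked solution):

```python
from itertools import product

# CPT entries copied from the question.
P_r, P_b = 0.15, 0.2
P_a = {True: 0.6, False: 0.4}    # P(a | B=b), keyed by the value of B
P_s = {True: 0.8, False: 0.2}    # P(s | A=a), keyed by the value of A
P_t = {                          # P(t | R, B, A), keyed by (r, b, a)
    (True, True, True): 0.95,  (True, False, True): 0.92,
    (True, True, False): 0.90, (True, False, False): 0.85,
    (False, True, True): 0.35, (False, True, False): 0.40,
    (False, False, True): 0.60, (False, False, False): 0.05,
}

def joint(r, b, a, t, s):
    """Joint probability of one full assignment, via the chain rule."""
    p = (P_r if r else 1 - P_r) * (P_b if b else 1 - P_b)
    p *= P_a[b] if a else 1 - P_a[b]
    p *= P_t[(r, b, a)] if t else 1 - P_t[(r, b, a)]
    p *= P_s[a] if s else 1 - P_s[a]
    return p

def prob(**evidence):
    """Marginal probability of a partial assignment, summing out the rest."""
    names = ('r', 'b', 'a', 't', 's')
    total = 0.0
    for vals in product([True, False], repeat=5):
        assign = dict(zip(names, vals))
        if all(assign[k] == v for k, v in evidence.items()):
            total += joint(**assign)
    return total

p_a = prob(s=True, r=True)                   # part a: P(s, r)
p_b_ans = prob(a=True, t=False)              # part b: P(a, ¬t)
p_c = prob(t=True, s=True) / prob(s=True)    # part c: P(t | s)
```

Note that enumeration does not by itself exhibit the independence arguments the question asks for (e.g., whether S and R are d-separated); it only confirms the numbers.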

Question 3: Variable Elimination

For the graph above, compute the MAP result of querying P(T|a) using variable elimination with the following order: R, S, B, A, T.

Clearly explain each step. For each of the intermediate factors created, explain what probabilistic function it represents.
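The elimination steps can be cross-checked with a small factor implementation; the sketch below (an illustration, not a substitute for explaining each intermediate factor) restricts on the evidence A = a, then sums out R, S, and B in the given order. T is the query and A is evidence, so neither is eliminated.

```python
from itertools import product

class Factor:
    """A factor: an ordered variable list plus a table keyed by value tuples."""
    def __init__(self, vars_, table):
        self.vars, self.table = list(vars_), dict(table)

def multiply(f, g):
    vars_ = f.vars + [v for v in g.vars if v not in f.vars]
    table = {}
    for vals in product([True, False], repeat=len(vars_)):
        a = dict(zip(vars_, vals))
        table[vals] = (f.table[tuple(a[v] for v in f.vars)] *
                       g.table[tuple(a[v] for v in g.vars)])
    return Factor(vars_, table)

def sum_out(f, var):
    i = f.vars.index(var)
    table = {}
    for vals, p in f.table.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.vars[:i] + f.vars[i + 1:], table)

def restrict(f, var, val):
    if var not in f.vars:
        return f
    i = f.vars.index(var)
    table = {vals[:i] + vals[i + 1:]: p
             for vals, p in f.table.items() if vals[i] == val}
    return Factor(f.vars[:i] + f.vars[i + 1:], table)

# CPTs as factors (child variable listed first).
t_cpt = {(True, True, True): 0.95,  (True, False, True): 0.92,
         (True, True, False): 0.90, (True, False, False): 0.85,
         (False, True, True): 0.35, (False, True, False): 0.40,
         (False, False, True): 0.60, (False, False, False): 0.05}
factors = [
    Factor(['R'], {(True,): 0.15, (False,): 0.85}),
    Factor(['B'], {(True,): 0.2, (False,): 0.8}),
    Factor(['A', 'B'], {(True, True): 0.6, (False, True): 0.4,
                        (True, False): 0.4, (False, False): 0.6}),
    Factor(['S', 'A'], {(True, True): 0.8, (False, True): 0.2,
                        (True, False): 0.2, (False, False): 0.8}),
    Factor(['T', 'R', 'B', 'A'],
           {(t,) + rba: (p if t else 1 - p)
            for rba, p in t_cpt.items() for t in (True, False)}),
]

factors = [restrict(f, 'A', True) for f in factors]   # condition on A = a
for var in ['R', 'S', 'B']:                           # eliminate in order
    related = [f for f in factors if var in f.vars]
    prod = related[0]
    for f in related[1:]:
        prod = multiply(prod, f)
    factors = [f for f in factors if var not in f.vars] + [sum_out(prod, var)]

result = factors[0]
for f in factors[1:]:
    result = multiply(result, f)          # proportional to P(T, a)
z = sum(result.table.values())            # equals P(a)
p_t_given_a = result.table[(True,)] / z   # P(t | a)
map_t = p_t_given_a > 0.5                 # MAP value of T
```

Printing each factor produced by sum_out as the loop runs shows exactly the intermediate functions the question asks you to interpret.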

Question 4: Learning with Bayesian Networks

Consider the following Bayesian network. Assume that the variables are distributed according to Bernoulli distributions.

a. We are given the following dataset with 146 samples, from which we will estimate the parameters of the model.

A  B  C  D  | # Instances
0  0  0  0  |   1
0  0  0  1  |   4
0  0  1  0  |  20
0  0  1  1  |   4
0  1  0  0  |  34
0  1  0  1  |   2
0  1  1  0  |  32
0  1  1  1  |   0
1  0  0  0  |   1
1  0  0  1  |   0
1  0  1  0  |   1
1  0  1  1  |   4
1  1  0  0  |  10
1  1  0  1  |  19
1  1  1  0  |  14
1  1  1  1  |   0

i. Enumerate the parameters that must be learned. Specify the parameter name and the probability that it represents (i.e., for each parameter, write something of the form 𝜃_X = Pr(X)).

ii. Give the maximum likelihood estimate for each parameter.

iii. Give the maximum a posteriori estimate for each parameter after applying Laplace smoothing.
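Which conditional probabilities you need depends on the network's edges, but every estimate in subparts ii and iii reduces to a ratio of counts from the table. A counting sketch that assumes nothing about the structure (the helper names and the example query are hypothetical, not part of the assignment):

```python
# Instance counts from the table, keyed by (a, b, c, d); they sum to 146.
data = {(0, 0, 0, 0): 1,  (0, 0, 0, 1): 4,  (0, 0, 1, 0): 20, (0, 0, 1, 1): 4,
        (0, 1, 0, 0): 34, (0, 1, 0, 1): 2,  (0, 1, 1, 0): 32, (0, 1, 1, 1): 0,
        (1, 0, 0, 0): 1,  (1, 0, 0, 1): 0,  (1, 0, 1, 0): 1,  (1, 0, 1, 1): 4,
        (1, 1, 0, 0): 10, (1, 1, 0, 1): 19, (1, 1, 1, 0): 14, (1, 1, 1, 1): 0}
idx = {'A': 0, 'B': 1, 'C': 2, 'D': 3}

def count(**assign):
    """Number of instances matching a partial assignment, e.g. count(A=1, B=0)."""
    return sum(n for row, n in data.items()
               if all(row[idx[v]] == val for v, val in assign.items()))

def mle(child, **parents):
    """MLE of P(child = 1 | parents): count(child=1, parents) / count(parents)."""
    return count(**{child: 1}, **parents) / count(**parents)

def smoothed(child, **parents):
    """Laplace (add-one) estimate for a Bernoulli: (count + 1) / (total + 2)."""
    return (count(**{child: 1}, **parents) + 1) / (count(**parents) + 2)
```

For example, if the (hypothetical) structure had an edge B → D, then mle('D', B=1) would compute count(D=1, B=1) / count(B=1); a root node's parameter is just mle('A') and similarly for the other roots your structure implies.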

b. Assume that in addition to the data in the table above, you are given the following incomplete data instances:

    A  B  C  D
S1  1  0  1  ?
S2  1  1  ?  0

We will apply the (soft) EM algorithm on these instances. Initialize the model using your parameter estimates from part a, subpart ii (i.e., use the MLE).

i. Show the computation of the first E-step, providing the weights for each possible assignment of the incomplete data for each sample.

ii. What are the parameters obtained for the first M-step? Weight each of the samples from the original dataset and the two new samples equally (i.e., you now have 148 samples).

iii. Show the computation of the second E-step.