HOMEWORK 1

Instructions: This is a background self-test on the type of math we will encounter in class. If you find many

questions intimidating, we suggest you drop 760 and take it again in the future when you are more prepared.

Use this latex file as a template to develop your homework. Submit your homework on time as a single pdf file

to Canvas. There is no need to submit the latex source or any code. Please check Piazza for updates about the

homework.

1 Vectors and Matrices [6 pts]

Consider the matrix X and the vectors y and z below:

X =

3 2

−7 −5

y =

2

1

z =

1

−1

1. Computer y

T Xz

Solution goes here.

2. Is X invertible? If so, give the inverse, and if no, explain why not.

Solution goes here.

2 Calculus [3 pts]

1. If y = e

−x + arctan(z)x

6/z − ln

x

x + 1

, what is the partial derivative of y with respect to x?

Solution goes here.

3 Probability and Statistics [10 pts]

Consider a sequence of data S = (1, 1, 1, 0, 1) created by flipping a coin x five times, where 0 denotes that the

coin turned up heads and 1 denotes that it turned up tails.

1. (2.5 pts) What is the probability of observing this data, assuming it was generated by flipping a biased coin

with p(x = 1) = 0.6?

Solution goes here.

2. (2.5 pts) Note that the probability of this data sample could be greater if the value of p(x = 1) was not 0.6,

but instead some other value. What is the value that maximizes the probability of S? Please justify your

answer.

Solution goes here.

3. (5 pts) Consider the following joint probability table where both A and B are binary random variables:

A B P(A, B)

0 0 0.3

0 1 0.1

1 0 0.1

1 1 0.5

1

Homework 1 CS 760 Machine Learning

(a) What is P(A = 0|B = 1)?

Solution goes here.

(b) What is P(A = 1 ∨ B = 1)?

Solution goes here.

4 Big-O Notation [6 pts]

For each pair (f, g) of functions below, list which of the following are true: f(n) = O(g(n)), g(n) = O(f(n)),

both, or neither. Briefly justify your answers.

1. f(n) = ln(n), g(n) = log2

(n).

Solution goes here.

2. f(n) = log2

log2

(n), g(n) = log2

(n).

Solution goes here.

3. f(n) = n!, g(n) = 2n.

Solution goes here.

5 Probability and Random Variables

5.1 Probability [12.5 pts]

State true or false. Here Ω denotes the sample space and Ac denotes the complement of the event A.

1. For any A, B ⊆ Ω, P(A|B)P(A) = P(B|A)P(B).

Solution goes here.

2. For any A, B ⊆ Ω, P(A ∪ B) = P(A) + P(B) − P(B ∩ A).

Solution goes here.

3. For any A, B, C ⊆ Ω such that P(B ∪ C) > 0,

P (A∪B∪C)

P (B∪C) ≥ P(A|B ∪ C)P(B).

Solution goes here.

4. For any A, B ⊆ Ω such that P(B) > 0, P(Ac

) > 0, P(B|AC ) + P(B|A) = 1.

Solution goes here.

5. If A and B are independent events, then Ac

and Bc

are independent.

Solution goes here.

5.2 Discrete and Continuous Distributions [12.5 pts]

Match the distribution name to its probability density / mass function. Below, |x| = k.

(a) Gamma Solution goes here.

(b) Multinomial Solution goes here.

(c) Laplace Solution goes here.

(d) Poisson Solution goes here.

(e) Dirichlet Solution goes here.

(f) f(x; Σ, µ) = √

1

(2π)

kdet(Σ)

exp

−

1

2

(x − µ)

T Σ

−1

(x − µ)

(g) f(x; n, α) = n

x

α

x

(1 − α)

n−x

for x ∈ {0, . . . , n}; 0

otherwise

(h) f(x; b, µ) = 1

2b

exp

−

|x−µ|

b

(i) f(x; n, α) = n!

Πk

i=1xi!

Πk

i=1α

xi

i

for xi ∈ {0, . . . , n} and

Pk

i=1 xi = n; 0 otherwise

(j) f(x; α, β) = β

α

Γ(α)

x

α−1

e

−βx for x ∈ (0, +∞); 0 otherwise

(k) f(x; α) = Γ(Pk

i=1 αi)

Qk

i=1 Γ(αi)

Qk

i=1 x

αi−1

i

for xi ∈ (0, 1) and

Pk

i=1 xi = 1; 0 otherwise

(l) f(x; λ) = λ

x e

−λ

x!

for all x ∈ Z

+; 0 otherwise

Homework 1 CS 760 Machine Learning

5.3 Mean and Variance [10 pts]

1. Consider a random variable which follows a Binomial distribution: X ∼ Binomial(n, p).

(a) What is the mean of the random variable?

Solution goes here.

(b) What is the variance of the random variable?

Solution goes here.

2. Let X be a random variable and E[X] = 1, Var(X) = 1. Compute the following values:

(a) E[5X]

Solution goes here.

(b) Var(5X)

Solution goes here.

(c) Var(X + 5)

Solution goes here.

5.4 Mutual and Conditional Independence [12 pts]

1. (3 pts) If X and Y are independent random variables, show that E[XY ] = E[X]E[Y ].

Solution goes here.

2. (3 pts) If X and Y are independent random variables, show that Var(X + Y ) = Var(X) + Var(Y ).

Hint: Var(X + Y ) = Var(X) + 2Cov(X, Y ) + Var(Y )

Solution goes here.

3. (6 pts) If we roll two dice that behave independently of each other, will the result of the first die tell us

something about the result of the second die?

Solution goes here.

If, however, the first die’s result is a 1, and someone tells you about a third event — that the sum of the two

results is even — then given this information is the result of the second die independent of the first die?

Solution goes here.

5.5 Central Limit Theorem [3 pts]

Prove the following result.

1. Let Xi ∼ N (0, 1) and X¯ =

1

n

Pn

i=1 Xi

, then the distribution of X¯ satisfies

√

nX¯

n→∞ −→ N (0, 1)

Solution goes here.

6 Linear algebra

6.1 Norms [5 pts]

Draw the regions corresponding to vectors x ∈ R

2 with the following norms:

1. ||x||1 ≤ 1 (Recall that ||x||1 =

P

i

|xi

|)

Solution figure goes here.

2. ||x||2 ≤ 1 (Recall that ||x||2 =

pP

i

x

2

i

) Solution figure goes here.

3. ||x||∞ ≤ 1 (Recall that ||x||∞ = maxi

|xi

|) Solution figure goes here.

3

Homework 1 CS 760 Machine Learning

For M =

5 0 0

0 7 0

0 0 3

, Calculate the following norms.

4. ||M||2 (L2 norm)

Solution goes here.

5. ||M||F (Frobenius norm)

Solution goes here.

6.2 Geometry [10 pts]

Prove the following. Provide all steps.

1. The smallest Euclidean distance from the origin to some point x in the hyperplane wT x + b = 0 is |b|

||w||2

.

You may assume w 6= 0.

Solution goes here.

2. The Euclidean distance between two parallel hyperplane wT x+b1 = 0 and wT x+b2 = 0 is |b1−b2|

||w||2

(Hint:

you can use the result from the last question to help you prove this one).

Solution goes here.

7 Programming Skills [10 pts]

Sampling from a distribution. For each question, submit a scatter plot (you will have 2 plots in total). Make sure

the axes for all plots have the same ranges.

1. Make a scatter plot by drawing 100 items from a two dimensional Gaussian N((1, −1)T

, 2I), where I is an

identity matrix in R

2×2

.

Solution figure goes here.

2. Make a scatter plot by drawing 100 items from a mixture distribution 0.3N

(5, 0)T

,

1 0.25

0.25 1 +

0.7N

(−5, 0)T

,

1 −0.25

−0.25 1 .

Solution figure goes here.

4