Homework IV

Problem 1: Convex Functions (40 points)

Prove the following functions are convex.

• (10’) the log-sum-exponential function: $f(x) = \log \sum_{i=1}^{n} \exp(x_i)$ is a convex function in terms of $x = (x_1, \dots, x_n)^\top$. (Hint: use the second-order condition of convexity.)

• (15’) the objective of logistic regression for binary classification:
\[
f(w) = \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp(-y_i w^\top x_i)\right) + \frac{\lambda}{2} \|w\|_2^2,
\]
where $x_i \in \mathbb{R}^d$ and $y_i \in \{1, -1\}$. (Hint: first prove the convexity of the function $h(s) = \log(1 + \exp(s))$ and then use the rules that preserve convexity.)

• (15’) the objective of the support vector machine:
\[
f(w) = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i w^\top x_i) + \frac{\lambda}{2} \|w\|_2^2,
\]
where $x_i \in \mathbb{R}^d$ and $y_i \in \{1, -1\}$. (Hint: similar to the above.)
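Before writing the proofs, it can help to sanity-check convexity numerically via Jensen's inequality, $f(ta + (1-t)b) \le t f(a) + (1-t) f(b)$. The sketch below spot-checks all three objectives on random points; the data, dimensions, and $\lambda$ are placeholders, not part of the assignment (and a finite sample check is of course not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (placeholders, not from the assignment).
n, d, lam = 20, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

def log_sum_exp(x):
    # Numerically stable log-sum-exp.
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def logistic_obj(w):
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins))) + 0.5 * lam * w @ w

def svm_obj(w):
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + 0.5 * lam * w @ w

# Jensen's inequality must hold everywhere for a convex function;
# we spot-check it at random pairs of points.
for f in (log_sum_exp, logistic_obj, svm_obj):
    for _ in range(1000):
        a, b = rng.normal(size=d), rng.normal(size=d)
        t = rng.uniform()
        assert f(t * a + (1 - t) * b) <= t * f(a) + (1 - t) * f(b) + 1e-9
print("Jensen's inequality holds at all sampled points")
```

A violated inequality at even one sampled triple would disprove convexity; passing the check merely makes it plausible, which is why the problems ask for proofs via the second-order condition and convexity-preserving rules.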

Problem 2: The Constrained Version of Ridge Regression (30 points)

In class, we have mentioned the constrained version of ridge regression:

\[
\min_{w \in \mathbb{R}^d} \; \|\Phi w - y\|_2^2 \quad \text{s.t.} \quad \|w\|_2 \le s,
\]
where $\Phi \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^n$. Answer the following questions:

• (20’) Does strong duality hold? If yes, derive the KKT conditions for the optimal solution $w^\ast$ of the above problem.

• (optional 10’) Does a closed-form solution exist? If yes, derive the closed-form solution. If not, can you propose an algorithm for computing the optimal solution (describe the key steps of your algorithm)?
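One natural candidate algorithm (a sketch, not the required answer) is projected gradient descent: alternate a gradient step on the least-squares objective with a Euclidean projection back onto the ball $\|w\|_2 \le s$, whose projection has a simple rescaling form. The instance below ($\Phi$, $y$, $s$) is a synthetic placeholder:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem instance (placeholders, not given data).
n, d, s = 50, 8, 1.0
Phi = rng.normal(size=(n, d))
y = rng.normal(size=n)

def project_l2_ball(w, radius):
    # Euclidean projection onto {w : ||w||_2 <= radius}:
    # rescale w onto the sphere whenever it lies outside the ball.
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

# Projected gradient descent on ||Phi w - y||_2^2  s.t.  ||w||_2 <= s.
L = 2 * np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(2000):
    grad = 2 * Phi.T @ (Phi @ w - y)
    w = project_l2_ball(w - grad / L, s)

print(np.linalg.norm(w), np.linalg.norm(Phi @ w - y) ** 2)
```

At the limit point, either the constraint is inactive (the unconstrained least-squares solution already lies inside the ball) or $\|w\|_2 = s$ with the gradient pointing along $w$, which is exactly what the KKT conditions from the first part predict.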


Problem 3: The Equivalence Between the Maximum Entropy Model and Logistic Regression (40 points)

In class, we have talked about the maximum entropy model. For learning the posterior probabilities $\Pr(y \mid x) = p(y \mid x)$ for $y = 1, \dots, K$ given a set of training examples $(x_i, y_i)$, $i = 1, \dots, n$, we can maximize the entropy of the posterior probabilities subject to a set of constraints, i.e.,

\[
\max_{p(y|x_i)} \; -\sum_{i=1}^{n} \sum_{y=1}^{K} p(y \mid x_i) \ln p(y \mid x_i)
\]
\[
\text{s.t.} \quad \sum_{y=1}^{K} p(y \mid x_i) = 1, \quad i = 1, \dots, n,
\]
\[
\sum_{i=1}^{n} \frac{\delta(y, y_i)}{n} f_j(x_i) = \sum_{i=1}^{n} \frac{p(y \mid x_i)}{n} f_j(x_i), \quad j = 1, \dots, d, \quad y = 1, \dots, K,
\]

where $\delta(y, y_i)$ equals 1 if $y_i = y$ and 0 otherwise, and $f_j(x_i)$ is a feature function. Let us consider $f_j(x_i) = [x_i]_j$, i.e., the $j$-th coordinate of $x_i$. Please show that the above maximum entropy model is equivalent to the multi-class logistic regression model (without regularization). (Hint: use Lagrangian dual theory.)
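The equivalence can also be observed empirically (this complements, but does not replace, the duality argument): at any stationary point of the unregularized multi-class logistic loss, the gradient condition $X^\top(P - Y) = 0$ is exactly the moment-matching constraint above. The sketch below fits softmax regression by gradient descent on synthetic placeholder data and checks that constraint:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training set (placeholders): n examples, d features, K classes.
n, d, K = 60, 4, 3
X = rng.normal(size=(n, d))
labels = rng.integers(K, size=n)
Y = np.eye(K)[labels]                    # one-hot: Y[i, y] = delta(y, y_i)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Gradient descent on the (unregularized) multi-class logistic loss.
W = np.zeros((d, K))
for _ in range(5000):
    P = softmax(X @ W)                    # P[i, y] = p(y | x_i)
    W -= 0.1 * X.T @ (P - Y) / n          # gradient of mean neg. log-likelihood

# At a stationary point X.T @ (P - Y) / n = 0, i.e. for every j and y:
#   sum_i delta(y, y_i) [x_i]_j / n  =  sum_i p(y | x_i) [x_i]_j / n,
# which is the maximum entropy model's moment-matching constraint.
P = softmax(X @ W)
print(np.abs(X.T @ (Y - P) / n).max())    # should be close to 0
```

In other words, the fitted logistic model satisfies the maximum entropy constraints, and the duality argument requested in the problem shows the converse: the maximum entropy solution necessarily takes the softmax form.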


CSCE 633 Machine Learning
