HW2 Regression Fun Time!





Please run the cell below to import libraries needed for this HW. Please use the autograd numpy, otherwise you will have issues.

import autograd.numpy as np
from autograd import grad
import matplotlib.pyplot as plt
Q1) [10 points] Implement the linear regression model discussed in class below. The function (called model) takes in an array of data points, x, and an array of weights, w, and returns a vector y_predicted containing the linear combination for each of the data points. You can assume each data point in x only has one feature. The length of the vector being returned should be the same as x.

def model(x,w):
    # x holds one data point per column, so x.T is (points, features)
    y_predicted = np.dot(x.T,w[1:])+w[0]
    return y_predicted
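As a quick sanity check of the shapes involved, here is a minimal sketch. It assumes x is stored as a 1 × P array (one row per feature, one column per data point), which matches how the datasets are sliced later in this HW. The toy numbers are made up, and plain numpy stands in for autograd.numpy so the snippet runs on its own.

```python
import numpy as np  # plain numpy here; autograd.numpy exposes the same calls

def model(x, w):
    # x: shape (1, P); w: [bias, slope]
    return np.dot(x.T, w[1:]) + w[0]

x_toy = np.array([[0.0, 1.0, 2.0]])   # three made-up points, one feature
w_toy = np.array([1.0, 2.0])          # bias 1, slope 2
y_hat = model(x_toy, w_toy)
print(y_hat)        # [1. 3. 5.]
print(y_hat.shape)  # (3,)
```

The result has one prediction per data point, which is the length requirement stated in Q1.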
Q2) [10 points] Implement the least squares function discussed in class below. The function should take in an array of weights, w, an array of x’s and an array of ys. It should use the model function implemented above and return a float indicating the total cost.

def least_squares(w,x,y):
    cost = np.sum((model(x,w)-y)**2)/float(y.size)
    return cost
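A small worked example can help confirm the cost behaves as expected. The sketch below uses made-up points that lie exactly on y = 1 + 2x, so the true weights should give a cost of zero, and shifting the bias by 1 should give a cost of exactly 1. The model definition is an assumption (a dot-product form with x shaped 1 × P), and plain numpy is used so the snippet is self-contained.

```python
import numpy as np

def model(x, w):
    # assumed form: x is (1, P), w is [bias, slope]
    return np.dot(x.T, w[1:]) + w[0]

def least_squares(w, x, y):
    return np.sum((model(x, w) - y) ** 2) / float(y.size)

x_toy = np.array([[0.0, 1.0, 2.0]])
y_toy = np.array([[1.0, 3.0, 5.0]])   # lies exactly on y = 1 + 2x

exact = least_squares(np.array([1.0, 2.0]), x_toy, y_toy)
shifted = least_squares(np.array([0.0, 2.0]), x_toy, y_toy)  # every residual is -1
print(exact, shifted)  # 0.0 1.0
```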
Q3) [5 points] This one is a freebie from HW1. Copy and paste your gradient descent function here. Specifically, the one that takes in the cost function as input and returns the weight and cost history. We will be using a fixed alpha for this HW. The only difference is that this function should now also take in as input an array of x and ys, corresponding to our data. The w, x, and y are given as inputs to the cost function and its gradient.

from autograd import grad

def gradient_descent(g,alpha,max_its,w,x,y):
    gradient = grad(g)   ## This is how you use the autograd library to find the gradient of a function
    weight_history = [w]
    starting_cost = g(w,x,y)
    cost_history = [starting_cost]
    for i in range(1,max_its):
        # take one gradient step with the fixed step length alpha
        step = gradient(w,x,y)
        w = w - alpha*step
        curr_cost = g(w,x,y)
        # record the new weights and their cost
        weight_history.append(w)
        cost_history.append(curr_cost)
    return weight_history,cost_history
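To see the update rule w ← w - alpha * gradient at work without autograd, the sketch below swaps in a hand-derived gradient of the least squares cost for a single feature. The extra g_grad parameter and the toy data are assumptions made for illustration only; the HW version keeps the signature above and obtains the gradient from grad(g).

```python
import numpy as np

def least_squares(w, x, y):
    # x, y: shape (1, P); w: [bias, slope]
    return np.sum((w[0] + w[1] * x - y) ** 2) / float(y.size)

def least_squares_grad(w, x, y):
    # hand-derived gradient of the cost above (stand-in for autograd's grad)
    r = w[0] + w[1] * x - y
    return np.array([2.0 * np.mean(r), 2.0 * np.mean(r * x)])

def gradient_descent(g, g_grad, alpha, max_its, w, x, y):
    # same loop as the HW version, but the gradient is passed in explicitly
    weight_history = [w]
    cost_history = [g(w, x, y)]
    for _ in range(1, max_its):
        w = w - alpha * g_grad(w, x, y)
        weight_history.append(w)
        cost_history.append(g(w, x, y))
    return weight_history, cost_history

x_toy = np.array([[0.0, 1.0, 2.0, 3.0]])
y_toy = 1.0 + 2.0 * x_toy             # made-up data on the line y = 1 + 2x
weights, costs = gradient_descent(
    least_squares, least_squares_grad, 0.05, 500, np.zeros(2), x_toy, y_toy
)
print(costs[0], costs[-1])   # cost drops from its starting value toward zero
print(weights[-1])           # approaches [1, 2]
```

Both histories end up with max_its entries: the starting point plus one entry per step.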
Q4) [1 point] Run the code below to import a dataset. Then, plot a scatter plot of the data (x vs y).

# import the dataset
import matplotlib.pyplot as plt
csvname = 'kleibers_law_data.csv'
data = np.loadtxt(csvname,delimiter=',')
x = np.log(data[:-1,:])
y = np.log(data[-1:,:])

# scatter plot of the data (x vs y)
plt.scatter(x,y)

Q5) [5 points] Use your gradient descent function to learn a linear regression model for the x and y above using the following parameters, and plot the cost_history over the 1000 iterations:

g = least_squares function you implemented

w = [w_0, w_1] , where w_0 and w_1 are random numbers between -0.1 and 0.1



w = np.random.uniform(-0.1, 0.1, 2)
gd_results = gradient_descent(least_squares,0.01,1000,w,x,y)

# plot cost over the 1000 iterations
plt.plot(gd_results[1])

Q6) [5 points] Use the learned weights from above (note that the “learned” weights are the ones with the lowest cost) to plot the learned line. You can use the linspace method (shown below) to generate a list of xs that you can use for plotting. You need to generate a y for each of the candidate xs using the learned weights. On the same figure, also plot the scatter plot from Q4.

s = np.linspace(np.min(x),np.max(x))
plt.scatter(x,y) # from Q4

# get y for each x using the learned weights (the ones with the lowest cost)
w_learned = gd_results[0][np.argmin(gd_results[1])]
x2 = s
y2 = (w_learned[1]*x2) + w_learned[0] # mx+b
plt.plot(x2,y2)

Q7) [1 point] Run the code below to import a dataset. Then, plot a scatter plot of the data (x vs y).

# load in dataset
data = np.loadtxt('regression_outliers.csv',delimiter = ',')
x = data[:-1,:]
y = data[-1:,:]

# scatter plot of the data (x vs y)
plt.scatter(x,y)

Q8) [10 points] Implement the least absolute deviations function discussed in class. The function should take in an array of weights, w, an array of x’s and an array of ys. It should use the model function implemented in Q1 and return a float indicating the total cost.

def least_absolute_deviations(w,x,y):
    cost = np.sum(np.absolute(model(x,w)-y))/float(y.size)
    return cost
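The reason to consider this cost on a dataset with outliers can be seen on a tiny made-up example. In the sketch below, three points lie on y = x and a fourth is pushed up by 10. Evaluated at the line y = x, the squared cost is dominated by that single point, while the absolute cost grows only linearly with it. The model form and the data are assumptions for illustration, and plain numpy is used so the snippet runs on its own.

```python
import numpy as np

def model(x, w):
    # assumed form: x is (1, P), w is [bias, slope]
    return np.dot(x.T, w[1:]) + w[0]

def least_squares(w, x, y):
    return np.sum((model(x, w) - y) ** 2) / float(y.size)

def least_absolute_deviations(w, x, y):
    return np.sum(np.absolute(model(x, w) - y)) / float(y.size)

x_toy = np.array([[0.0, 1.0, 2.0, 3.0]])
y_toy = np.array([[0.0, 1.0, 2.0, 13.0]])  # last point is a made-up outlier (+10)
w_line = np.array([0.0, 1.0])              # the line y = x that fits the other three

print(least_squares(w_line, x_toy, y_toy))              # 25.0 (10**2 / 4)
print(least_absolute_deviations(w_line, x_toy, y_toy))  # 2.5  (10 / 4)
```

Because the outlier contributes 100 to the squared sum but only 10 to the absolute sum, minimizing least squares pulls the line toward the outlier much more strongly.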
Q9) [5 points] Use the gradient descent function twice to learn two linear models using the new x and y from Q7 using the following parameters and plot the cost_history for both runs on the same plot. Make the plot for the first run blue and the plot for the second run red.

Run 1) g = least_squares function

w = [1.0,1.0]



Run 2) g = least_absolute_deviations

w = [1.0,1.0]



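w = np.array([1.0,1.0])

grad_des_lsf = gradient_descent(least_squares,0.1,100,w,x,y)

grad_des_lad = gradient_descent(least_absolute_deviations,0.1,100,w,x,y)

# cost histories: run 1 (least squares) in blue, run 2 (least absolute deviations) in red
plt.plot(grad_des_lsf[1],color='blue')
plt.plot(grad_des_lad[1],color='red')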

Q10) [5 points] Use the learned weights from above to plot the two learned lines (use same colors as above). You can use the linspace method again to generate a list of xs that you can use. On the same figure, also plot the scatter plot from Q7. Which of these lines looks like a better fit to you? The red line (least absolute deviations) is the better fit.

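s = np.linspace(np.min(x),np.max(x))
plt.scatter(x,y) # from Q7

# learned weights = the ones with the lowest cost in each run
w_lsf = grad_des_lsf[0][np.argmin(grad_des_lsf[1])]
w_lad = grad_des_lad[0][np.argmin(grad_des_lad[1])]

x_l1 = s
y_l1 = (w_lsf[1]*x_l1) + w_lsf[0] # mx+b
plt.plot(x_l1,y_l1,color='blue')

x_l2 = s
y_l2 = (w_lad[1]*x_l2) + w_lad[0] # mx+b
plt.plot(x_l2,y_l2,color='red')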

Q11) [6 points] Implement the mean squared error (MSE) and the mean absolute deviation functions from class. The functions should take in as input an array of actual ys and an array of predicted ys and return the prediction error.

def MSE(y_actual,y_pred):
    error = np.sum((y_pred-y_actual)**2)/float(y_actual.size)
    return error

def MAD(y_actual,y_pred):
    error = np.sum(np.absolute(y_pred-y_actual))/float(y_actual.size)
    return error
Q12) [4 points] Use the functions above to report the MSE and MAD for the two models learned in Q9, using the x and y from Q7. You should have 4 values total, two for each model. Which model is doing better? (Note that since you are evaluating the model on the training data, this corresponds to the training error.) LAD does better on both metrics, with a lower MSE and a lower MAD than least squares, and the MAD for LAD is the lowest of the four values.

# what are the predicted values? use the learned (lowest-cost) weights from Q9
y_pred_lsf = model(x,grad_des_lsf[0][np.argmin(grad_des_lsf[1])])
y_pred_lad = model(x,grad_des_lad[0][np.argmin(grad_des_lad[1])])

# calculate the values
mse_lsf = MSE(y,y_pred_lsf)
mad_lsf = MAD(y,y_pred_lsf)

mse_lad = MSE(y,y_pred_lad)
mad_lad = MAD(y,y_pred_lad)

print("LSF: MSE = " + str(mse_lsf) + " MAD = " + str(mad_lsf))
print("LAD: MSE = " + str(mse_lad) + " MAD = " + str(mad_lad))
LSF: MSE = 119.69904205469814 MAD = 10.359401368163274
LAD: MSE = 20.77148581018281 MAD = 4.377107611740571
Q13) [6 points] Implement the L1 and L2 regularizers from class. Recall that the regularizers take the weight vector as input and return a score based on the L1 or L2 norm of the weights.

def L2_regularizer(w):
    L2 = np.sum(w**2)
    return L2

def L1_regularizer(w):
    L1 = np.sum(np.absolute(w)) # the L1 norm sums absolute values
    return L1
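A two-element example makes the two scores easy to check by hand. The sketch below uses the standard definitions (sum of squares for L2, sum of absolute values for L1) on a made-up weight vector, and also prints the plain sum to show why the absolute value matters: without it, positive and negative weights cancel.

```python
import numpy as np

def L2_regularizer(w):
    return np.sum(w ** 2)

def L1_regularizer(w):
    return np.sum(np.absolute(w))

w_toy = np.array([3.0, -4.0])   # made-up weights
print(L2_regularizer(w_toy))    # 25.0
print(L1_regularizer(w_toy))    # 7.0
print(np.sum(w_toy))            # -1.0, a plain sum lets the signs cancel
```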
Q14) [12 points] Turn the least squares function implemented in Q2 into the Ridge (L2) and Lasso (L1) least squares (covered in class) using the functions implemented in Q13. Recall that λ is a hyperparameter that controls the smoothness of the learned function: higher λ leads to simpler and smoother functions, whereas lower λ leads to a closer fit to the data. λ=0 is the same as non-regularized least squares.

def ridge(w,x,y,lmbda):
    data = least_squares(w,x,y)
    L2 = L2_regularizer(w)
    cost = data + (lmbda*L2)
    return cost

def lasso(w,x,y,lmbda):
    data = least_squares(w,x,y)
    L1 = L1_regularizer(w)
    cost = data + (lmbda*L1)
    return cost
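Since the gradient descent function from Q3 calls the cost as g(w,x,y), a regularized cost with a fourth lmbda argument cannot be passed in directly. One possible way around this, sketched below, is to fix lmbda in a small wrapper. The helper name with_lambda is hypothetical (not from class), the data is made up, and the single-feature cost is written out inline so the snippet runs on its own.

```python
import numpy as np

def least_squares(w, x, y):
    # single-feature form: x, y are (1, P), w is [bias, slope]
    return np.sum((w[0] + w[1] * x - y) ** 2) / float(y.size)

def ridge(w, x, y, lmbda):
    return least_squares(w, x, y) + lmbda * np.sum(w ** 2)

def with_lambda(cost, lmbda):
    # hypothetical helper: fixes lmbda and returns a cost with signature g(w, x, y)
    return lambda w, x, y: cost(w, x, y, lmbda)

x_toy = np.array([[0.0, 1.0, 2.0]])
y_toy = np.array([[1.0, 3.0, 5.0]])   # made-up data on y = 1 + 2x
w_toy = np.array([1.0, 2.0])

ridge_0 = with_lambda(ridge, 0.0)
ridge_half = with_lambda(ridge, 0.5)
print(ridge_0(w_toy, x_toy, y_toy))     # 0.0, same as plain least squares
print(ridge_half(w_toy, x_toy, y_toy))  # 2.5 = 0.0 + 0.5 * (1 + 4)
```

The wrapped functions have the three-argument form, so they could be handed to a gradient descent routine that expects g(w,x,y).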
The rest of the questions are for bonus points, but highly recommended.
Q15) [2 points] The file ‘weatherHistory.csv’ has 96,454 lines, each one corresponding to a data point. Each row (i.e., data point) has several columns. Read the data file. Note that the first line is the header describing each column.

import csv
file = open("weatherHistory.csv")
csvreader = csv.reader(file)
header = next(csvreader)
print(header) # will use to determine proper indexing
rows = [] # will use later
for row in csvreader:
    rows.append(row)
file.close()
# print(rows)
['Date', 'Summary', 'Precip', 'Temperature', 'Apparent_Temperature', 'Humidity', 'Wind_Speed', 'Wind_Bearing', 'Visibility', 'Loud_Cover', 'Pressure', 'Daily_Summary']
Q16) [5 points] Use the data above to set y to be the temperatures and X to be the following columns (in order): [Apparent_Temperature, Humidity, Wind_Speed, Wind_Bearing, Visibility, Pressure]. Basically, we want to see whether we can predict the temperature using the features in X.

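y = []
x = []
for row in rows:
    x_sub = [row[4],row[5],row[6],row[7],row[8],row[10]]
    x.append([float(v) for v in x_sub]) # csv values are read as strings
    y.append(float(row[3]))             # Temperature column
x = np.array(x)
y = np.array(y)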
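An alternative to positional indexing is to select columns by header name, which avoids counting column positions. The sketch below uses csv.DictReader on two made-up rows held in memory, laid out like the header printed above; the row values are invented for illustration and are not taken from weatherHistory.csv.

```python
import csv
import io

# Two made-up rows in the same column layout as the printed header.
sample = io.StringIO(
    "Date,Summary,Precip,Temperature,Apparent_Temperature,Humidity,"
    "Wind_Speed,Wind_Bearing,Visibility,Loud_Cover,Pressure,Daily_Summary\n"
    "d1,Clear,rain,10.0,9.0,0.80,14.0,250,15.0,0,1015.0,Sunny\n"
    "d2,Cloudy,rain,12.5,12.5,0.65,3.0,90,10.0,0,1012.5,Overcast\n"
)

feature_names = ["Apparent_Temperature", "Humidity", "Wind_Speed",
                 "Wind_Bearing", "Visibility", "Pressure"]

X, y = [], []
for row in csv.DictReader(sample):
    # each row is a dict keyed by header name; values arrive as strings
    X.append([float(row[name]) for name in feature_names])
    y.append(float(row["Temperature"]))

print(X[0])  # [9.0, 0.8, 14.0, 250.0, 15.0, 1015.0]
print(y)     # [10.0, 12.5]
```

For the real file, the io.StringIO object would be replaced by the opened file handle.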
We are now going to use a well-known ML library called sklearn. If you do not have it installed, please do so using this instruction:

sklearn comes with many models already implemented, below we import the standard linear regression, Ridge, and Lasso models from sklearn. We also import a method that can divide our data into train/test sets. Please run the cell below.

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
This library is very easy to use. We briefly went over it in class, but please use the API and user guide to learn exactly how to use this library.

For instance, learning a linear regression model using sklearn can be done in two lines:

linearModel = LinearRegression()
linearModel.fit(X_train, y_train)
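To make the fit pattern concrete, here is a minimal sketch on synthetic data. The names X_train and y_train and the generating weights are assumptions chosen for illustration. Because the target is a noiseless linear function of the features, the fitted intercept_ and coef_ should recover the generating values up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))                          # synthetic features
y_train = 1.0 + 2.0 * X_train[:, 0] - 3.0 * X_train[:, 1]   # noiseless linear target

linearModel = LinearRegression()
linearModel.fit(X_train, y_train)
print(linearModel.intercept_, linearModel.coef_)  # close to 1.0 and [2.0, -3.0]
```

Note that sklearn expects X with one row per data point and one column per feature, which is the transpose of the 1 × P layout used in the first part of this HW.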

Q17) [2 points] Use the train_test_split to divide your modified data from Q16 into 80% train, 20% test.

#Your code here
Q18) [10 points] Use sklearn to train a LinearRegression model using the data above. Report the performance of the model on the test data (use sklearn’s MSE implementation). Note that the .predict method can be used to get the y_predictions for the test xs.

from sklearn.metrics import mean_squared_error
#Your code here
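The split, fit, predict, and score steps fit together as in the sketch below. It runs on synthetic data rather than the weather data, so the variable names, the generating weights, and the random_state value are all assumptions for illustration; test_size=0.2 gives the 80/20 split asked for in Q17.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                    # synthetic features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 80% train, 20% test; random_state only makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

reg = LinearRegression().fit(X_train, y_train)
test_mse = mean_squared_error(y_test, reg.predict(X_test))
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
print(test_mse)                     # small, since the noise level is low
```

Scoring on X_test rather than X_train gives the test error, in contrast to the training error reported in Q12.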
Q19) [10 points] Repeat Q18 but instead of LinearRegression, use the Ridge and Lasso functions. You can keep the default alpha (note that what we called lambda in the class, the hyperparameter for regularization, is called alpha in sklearn. It is the same thing).

#Your code here
Q20) [4 points] Print the learned parameters for the Ridge and Lasso models (using .coef_). Note that the parameters below correspond to the feature vector ([Apparent_Temperature, Humidity, Wind_Speed, Wind_Bearing, Visibility, Pressure]), in order. I.e., the first value corresponds to “Apparent_Temperature”, etc. What is the difference between the ridge and lasso parameters? Which features, if any, have been eliminated by lasso?

#Your code here
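The kind of difference to look for can be previewed on synthetic data. In the sketch below, the target depends only on the first of two made-up features. Both models use sklearn's default alpha of 1.0. Ridge shrinks coefficients but generally leaves them non-zero, while Lasso can set the coefficient of an uninformative feature to exactly zero. The data and names are assumptions for illustration, not results from the weather data.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                          # two synthetic features
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)    # second feature is irrelevant

ridge = Ridge().fit(X, y)   # default alpha=1.0
lasso = Lasso().fit(X, y)   # default alpha=1.0
print(ridge.coef_)  # first coefficient near 3, second small but non-zero
print(lasso.coef_)  # first coefficient shrunk noticeably, second expected to be 0
```

A coefficient of exactly 0 in lasso.coef_ means the corresponding feature has been eliminated from the model.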