ENGR 421 DASC 521

Homework 04: Nonparametric Regression

In this homework, you will implement three nonparametric regression algorithms in R, Matlab,

or Python. Here are the steps you need to follow:

1. Read Section 8.8 from the textbook.

2. You are given a univariate regression data set, which contains 272 data points about the

duration of the eruption and waiting time between eruptions for the Old Faithful geyser in

Yellowstone National Park, Wyoming, USA (https://www.yellowstonepark.com/things-to-do/about-old-faithful), in the file named hw04_data_set.csv.

3. Divide the data set into two parts by assigning the first 150 data points to the training set

and the remaining 122 data points to the test set.

4. Learn a regressogram by setting the bin width parameter to 0.37 and the origin parameter

to 1.5. Draw training data points, test data points, and your regressogram in the same

figure. Your figure should be similar to the following figure.

5. Calculate the root mean squared error (RMSE) of your regressogram for test data points.

The formula for RMSE can be written as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N_{test}} (y_i - \hat{y}_i)^2}{N_{test}}}.$$

[Figure for step 4: training data points, test data points, and the regressogram, with the title "h = 0.37". Horizontal axis: Eruption time (min). Vertical axis: Waiting time to next eruption (min). Legend: training, test.]

Your output should be similar to the following sentence.

Regressogram = RMSE is 5.9626 when h is 0.37
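Steps 4 and 5 can be sketched in Python as follows. This is a minimal, dependency-free sketch under stated assumptions, not a reference solution. The function names (`bin_index`, `fit_regressogram`, `predict_regressogram`, `rmse`) and the toy data are hypothetical. Bins are assumed to be half-open intervals [origin + m*h, origin + (m+1)*h), and a query that falls in a bin with no training points is assumed to yield NaN. The boundary convention used in the textbook or in lectures may differ.

```python
import math

def bin_index(x, origin, h):
    """Index m of the bin [origin + m*h, origin + (m+1)*h) containing x (assumed convention)."""
    return math.floor((x - origin) / h)

def fit_regressogram(x_train, y_train, origin, h):
    """Return a dict mapping each occupied bin index to the mean training response in that bin."""
    sums, counts = {}, {}
    for x, y in zip(x_train, y_train):
        m = bin_index(x, origin, h)
        sums[m] = sums.get(m, 0.0) + y
        counts[m] = counts.get(m, 0) + 1
    return {m: sums[m] / counts[m] for m in sums}

def predict_regressogram(bin_means, x, origin, h):
    """Predict the mean of the bin containing x; NaN if that bin had no training points."""
    return bin_means.get(bin_index(x, origin, h), float("nan"))

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

# Toy values for illustration only; they are NOT taken from hw04_data_set.csv.
x_train = [1.6, 1.7, 2.0, 2.1, 3.0]
y_train = [50.0, 54.0, 60.0, 62.0, 80.0]
x_test = [1.8, 2.05]
y_test = [53.0, 63.0]
origin, h = 1.5, 0.37

bin_means = fit_regressogram(x_train, y_train, origin, h)
y_pred = [predict_regressogram(bin_means, x, origin, h) for x in x_test]
test_rmse = rmse(y_test, y_pred)
print(f"Regressogram = RMSE is {test_rmse:.4f} when h is {h}")
```

With the real data, the same calls would take the first 150 rows of hw04_data_set.csv as the training set and the remaining 122 rows as the test set, for example `x_train, x_test = x[:150], x[150:]` once the two columns have been read into lists `x` and `y` (names assumed here). The toy data above will not reproduce the RMSE value quoted in the handout.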

6. Learn a running mean smoother by setting the bin width parameter to 0.37. Draw training

data points, test data points, and your running mean smoother in the same figure. Your

figure should be similar to the following figure.

7. Calculate the RMSE of your running mean smoother for test data points. Your output

should be similar to the following sentence.

Running Mean Smoother = RMSE is 6.0890 when h is 0.37

8. Learn a kernel smoother by setting the bin width parameter to 0.37. Draw training data

points, test data points, and your kernel smoother in the same figure. Your figure should

be similar to the following figure.

9. Calculate the RMSE of your kernel smoother for test data points. Your output should be

similar to the following sentence.

Kernel Smoother = RMSE is 5.8744 when h is 0.37
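The running mean smoother of step 6 can be sketched as below. This is a hedged illustration: the function name `running_mean_smoother` and the toy data are hypothetical, and the window is assumed to have total width h centered on the query point, so a training point counts when |x - x_t| <= h/2. Some texts define the window as |x - x_t| < h instead, so check the convention in Section 8.8 before relying on this choice.

```python
import math

def running_mean_smoother(x_train, y_train, x, h):
    """Average the training responses whose inputs lie within h/2 of x (assumed window).

    Returns NaN when no training point falls inside the window.
    """
    in_window = [y for xt, y in zip(x_train, y_train) if abs(x - xt) <= h / 2]
    if not in_window:
        return float("nan")
    return sum(in_window) / len(in_window)

# Toy values for illustration only; they are NOT taken from hw04_data_set.csv.
x_train = [1.0, 1.1, 1.2, 2.0]
y_train = [10.0, 20.0, 30.0, 40.0]
h = 0.37

for query in (1.1, 2.0, 3.0):
    print(query, running_mean_smoother(x_train, y_train, query, h))
```

To draw the smoother, one option is to evaluate it on a fine grid of eruption times spanning the plotted range. The test-set predictions can be scored with the RMSE formula from step 5.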

[Figure for step 6: training data points, test data points, and the running mean smoother, with the title "h = 0.37". Horizontal axis: Eruption time (min). Vertical axis: Waiting time to next eruption (min). Legend: training, test.]

[Figure for step 8: training data points, test data points, and the kernel smoother, with the title "h = 0.37". Horizontal axis: Eruption time (min). Vertical axis: Waiting time to next eruption (min). Legend: training, test.]
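For the kernel smoother of step 8, a minimal sketch is given below. The handout does not name the kernel, so a Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2*pi) is assumed here. The function names `gaussian_kernel` and `kernel_smoother` and the toy data are hypothetical. The prediction is the kernel-weighted average of the training responses, with u = (x - x_t) / h.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel (assumed; the handout does not specify the kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_smoother(x_train, y_train, x, h):
    """Kernel-weighted average of the training responses at query point x.

    Returns NaN if every weight underflows to zero (query far from all training data).
    """
    weights = [gaussian_kernel((x - xt) / h) for xt in x_train]
    total = sum(weights)
    if total == 0.0:
        return float("nan")
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Toy values for illustration only; they are NOT taken from hw04_data_set.csv.
x_train = [1.0, 2.0]
y_train = [10.0, 30.0]
h = 0.37

# Midway between the two training inputs, both weights are equal,
# so the prediction is the plain average of the two responses.
midpoint_prediction = kernel_smoother(x_train, y_train, 1.5, h)
print(midpoint_prediction)
```

Unlike the regressogram and the running mean smoother, every training point contributes to each prediction, with a weight that decays smoothly with distance, which is why the fitted curve has no jumps. The test-set predictions can be scored with the RMSE formula from step 5.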

What to submit: You need to submit your source code in a single file (.R file if you are using R, .m file if you are using Matlab, or .py file if you are using Python) and a short report explaining your approach (.doc, .docx, or .pdf file). Put these two files in a single zip file named STUDENTID.zip, where STUDENTID is replaced with your 7-digit student number.

How to submit: Submit the zip file you created to Blackboard. Please follow the naming convention exactly: replace STUDENTID with your own student number, and do not send a zip file literally named STUDENTID.zip.