## Description

Homework 5

Download and install the WEKA library from

http://www.cs.waikato.ac.nz/ml/weka/

The program is written in Java, so it runs on any platform. Preferably download the kit that includes the Java VM. If you have a 64-bit machine, download the 64-bit version, since it can use more memory. In runweka.ini, change the heap size to at least 1024 MB; otherwise you will run out of memory. For the experiments you can use the Weka Explorer, since it has a convenient GUI.
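Depending on the Weka release, the relevant heap-size entry in runweka.ini looks roughly like the following (the exact key name may differ between versions, so treat this as a sketch rather than the definitive setting):

```ini
# runweka.ini, in the Weka installation directory.
# Raise the maximum heap so the full covtype data fits in memory.
maxheap=1024m
```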

1. Use the covtype dataset from Blackboard to compare a number of learning algorithms. Split the data into training and test sets as specified in the syllabus (the training set contains the first 11,340 + 3,780 = 15,120 observations; the test set contains the remaining 565,892 observations). You will have to modify the files to make them compatible with Weka as follows:

• Add a first row containing the variable names (e.g. X1, X2, … Y)

• Change the class labels from numeric (1, 2, 3, 4, …) to literal (e.g. C1, C2, C3, …)
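The preprocessing above (header row, class relabeling, and the train/test split) can be sketched in Python. The file names are placeholders, and the script assumes the covtype file is comma-separated with 54 feature columns and the numeric class label (1–7) in the last column:

```python
# Sketch of the Weka-compatibility preprocessing described above.
# Assumes a comma-separated covtype file with 54 feature columns and the
# numeric class label in the last column; file names are placeholders.

def to_weka_row(line):
    """Turn the numeric class label in the last column into a literal one (5 -> C5)."""
    fields = line.strip().split(",")
    fields[-1] = "C" + fields[-1]
    return ",".join(fields)

def make_header(n_features=54):
    """Variable-name row: X1,...,X54,Y."""
    return ",".join("X%d" % i for i in range(1, n_features + 1)) + ",Y"

def convert(in_path="covtype.data", out_path="covtype_weka.csv"):
    """Write a Weka-readable CSV: header row plus relabeled observations."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(make_header() + "\n")
        for line in src:
            if line.strip():
                dst.write(to_weka_row(line) + "\n")

def split_train_test(path="covtype_weka.csv", n_train=15120):
    """First 11,340 + 3,780 = 15,120 observations go to the training file,
    the rest to the test file; both keep the header row."""
    with open(path) as f:
        header = f.readline()
        rows = f.readlines()
    with open("covtype_train.csv", "w") as f:
        f.writelines([header] + rows[:n_train])
    with open("covtype_test.csv", "w") as f:
        f.writelines([header] + rows[n_train:])
```

The resulting CSV files can be opened directly in the Weka Explorer (or converted to ARFF from there).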

Train the following models on the training set and use the test set for testing. Report in a table the misclassification errors obtained on the training and test sets and the training times (in seconds) of all algorithms.

a) A decision tree (J48). (1 point).

b) A Random Forest with 100 trees and one with 300 trees. (1 point).

c) Logistic Regression. (1 point)

d) Naive Bayes. (1 point)

e) AdaBoost with 20 weak classifiers that are J48 decision trees, and one with 100 trees. (1 point)

f) LogitBoost with 10 decision stumps, and one with 100 stumps. (1 point)

g) LogitBoost with 100 stumps and weight trimming (pruning) at 95%. (1 point)

h) LogitBoost with 25 M5P regression trees. (1 point)

i) An SVM classifier (named SMO in Weka). Use an RBF kernel and try different parameters to obtain the smallest test error. Report the parameters that gave the smallest test error. Note: you should be able to obtain one of the smallest errors among all these methods. (1 point)

j) Using the misclassification error table, draw a scatter plot of the test errors (Y axis) vs. the log training times (seconds, on the X axis) of all the algorithms from above. (1 point)
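The scatter plot can be produced with matplotlib along these lines. The `(algorithm, test error, training time)` rows below are placeholders, not actual measurements; replace them with the values from your table:

```python
import math

# Placeholder results table: (algorithm, test error, training time in seconds).
# These numbers are illustrative only -- substitute your measured values.
results = [
    ("J48", 0.30, 40.0),
    ("RandomForest-100", 0.25, 300.0),
]

def prepare_points(rows):
    """Return (log training times, test errors) for the scatter plot."""
    log_times = [math.log(t) for (_, _, t) in rows]
    errors = [e for (_, e, _) in rows]
    return log_times, errors

def scatter(rows, out_path="errors_vs_logtime.png"):
    """Plot test error vs. log training time, one labeled point per algorithm."""
    import matplotlib.pyplot as plt  # imported here so the helpers above stay standalone
    xs, ys = prepare_points(rows)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    for (name, _, _), x, y in zip(rows, xs, ys):
        ax.annotate(name, (x, y))
    ax.set_xlabel("log training time (s)")
    ax.set_ylabel("test misclassification error")
    fig.savefig(out_path)
```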