Sale!

# COSC 4570/5010 Data Mining Homework #4

\$30.00

Category:

## Description

Page 1 of 2
COSC 4570/5010 Data Mining

Homework #4

Submission guideline You need to submit only one .zip file. Please name the file as “Your Net
id_Homework4.zip”.
1. Problems from the book (Introduction to Data Mining 2nd Edition by Tan,
Steinbach et al.)
Solve the following:
Chapter 7: Problems 18 and 29.
OR
Problems from the book (Introduction to Data Mining 1st Edition by Tan,
Steinbach et al.)
Solve the following:
Chapter 8: Problems 18 and 29.
2. Clustering
Normalized Mutual Information (NMI) is used to evaluate clustering results when the actual
clustering of the data (the number of clusters and the clustering assignments) is known beforehand. NMI can be calculated using Equation 1, where � and ℎ are clusters from two different
clusterings, �\$ and �% are the number of data points in the clusters ℎ and � respectively, �\$,% is
the number of data points in cluster ℎ as well as cluster �, and � is the size of the dataset.
��� = ∑ -.,/ log3 3.,/
3.3/ .,/
45∑ -. log3.
. 3 6 45∑ -/ log3/
/ 3 6
(1)
what are the maximum and minimum values for the NMI? Provide details.
Page 2 of 2
3. Community Detection
[DSCN]1 or from here http://www-personal.umich.edu/~mejn/netdata/karate.zip). The story
behind the data set is quite simple: There was a Karate Club that had an administrator “John A”
and an instructor “Mr. Hi” (both pseudonyms). Then a conflict arose between them, causing the
students (Nodes) to split into two groups. One that followed John and one that followed Mr. Hi.
a) Compute the following statistics describing the datasets:
• number of nodes �
• number of edges �