Sale!

Assignment 2: Association Analysis

Original price was: $35.00.Current price is: $30.00.

Category:
5/5 - (1 vote)

Assignment 2: Association Analysis

Two datasets (Market, Gene) can be found on Piazza. For each dataset, we provide the
transaction-item representation discussed in class—Each row denotes a transaction, and
each transaction consists of a set of items.
In this assignment, you are asked to implement Apriori algorithm that discovers a
collection of frequent itemsets from a transaction database.
Template is for Python 3. You are asked to fill in two functions: apriori_gen and get_freq.
apriori_gen generates new candidate itemsets based on the frequent itemsets found in the
previous iteration, and prunes the itemsets containing any infrequent itemset of size k-1.
get_freq returns the candidate itemsets that meet a minimum support threshold.
Do not change the input and output of the two functions in the template.
Please take the following steps:
1. Implement Apriori algorithm. The pseudo code can be found below.
Apriori (T, minSupport){
// T is the database and minSupport is the minimum support
𝐿1 = {the set of single item whose support is not less than minSupport}
for (k = 2; 𝐿𝑘−1 ≠ ∅; k++ ){
//generate and prune candidate set 𝐶𝑘
𝐶𝑘 is a list of itemsets in which each itemset is formed by merging two
itemsets in 𝐿𝑘−1 if their first k-2 items are identical
Remove an itemset from 𝐶𝑘 if any (k-1)-subset of this candidate itemset is
not in the frequent itemset list 𝐿𝑘−1
//Count the support of each candidate itemset
for each transaction t in database do{
for each candidate 𝑐 in 𝐶𝑘
// increment the support count of all candidate itemsets that are
contained in transaction t
if c is a subset of t then count[c] ← count[c] + 1
}
for each candidate 𝑐 in 𝐶𝑘
// Judge if a candidate itemset is frequent or not
if the support of c is not less than minSupport
then include c in 𝐿𝑘
}
return {𝐿1, 𝐿2, … , 𝐿𝑘}
}
You cannot directly call a function or package that implements Apriori. You need to
implement the algorithm by yourself. If you are not sure about whether it is OK to use a
certain function, please post your question on Piazza.
2. Apply your implemented Apriori on the Market dataset with minimum support
threshold=50%. You should get the following candidate itemsets (Ck) and frequent
itemsets (Lk):
Candidate itemsets:
C2: {Eggs,Key-chain}, {Eggs,Mango}, {Eggs,Onion}, {Eggs,Yo-yo}, {Key-chain,
Mango}, {Key-chain, Onion}, {Key-chain,Yo-yo}, {Mango,Onion}, {Mango,Yo-yo},
{Onion,Yo-yo}
C3: {Eggs,Key-chain,Onion}, {Key-chain,Mango,Onion}
Frequent itemsets:
L1: {Eggs}, {Key-chain}, {Mango}, {Onion}, {Yo-yo}
L2: {Eggs,Key-chain}, {Eggs,Onion}, {Key-chain,Mango}, {Key-chain,Onion}, {Keychain,Yo-yo}, {Mango, Onion}
L3: {Eggs, Key-chain,Onion}
3. If you get the same collection of itemsets at Step 2, you can proceed to apply your
implemented Apriori algorithm on the Gene dataset with minimum support
threshold=50%. You should be able to get 51 length-1 frequent itemsets (L1), 1275
length-2 candidate itemsets (C2), 29 length-2 frequent itemsets (L2), 20 length-3
candidate itemsets (C3) and 2 length-3 frequent itemsets (L3).
4. Prepare your submission. Your final submission should be a zip file named as
Assignment2.zip. In the zip file, you should include:
 A folder “Code”, which contains all the codes used in this assignment. You are
required to fill in the template. Do not change other functions or add other scripts.
 Report: A doc or pdf file named as Assignment2.pdf. The report should consist of
the following parts: 1) The frequent itemsets you obtain on Gene dataset (L1, L2,
L3). 2) The length-3 candidate itemsets generated during Apriori (C3) on Gene
dataset. 3) The codes of your Apriori algorithm implementation.
5. Log in any CSE department server and submit your zip file as follows:
submit_cse469 Assignment2.zip
Please refer to Course Syllabus for late submission policy and academic integrity policy. We will
take the submission time recorded by the server as the time of your submission. This assignment
must be done independently. Running your submitted code should be able to reproduce the results
in your report.

Scroll to Top