Intermediate Data Structures and Algorithms Assignment 8


Rate this product

CMPT 280– Intermediate Data Structures and Algorithms
Assignment 8

Total Marks: 75
1 Submission Instructions
• Assignments must be submitted using Moodle.
• Responses to written (non-programming) questions must be submitted in a PDF file, plain text file
(.txt), Rich Text file (.rtf), or MS Word’s .doc or .docx files. Digital images of handwritten pages
are also acceptable, provided that they are clearly legible.
• Programs must be written in Java.
• If you are using IntelliJ (or similar development environment), do not submit the Module (project).
Hand in only those files identified in Section 5. Export your .java source files from the workspace
and submit only the .java files. Compressed archives are not acceptable.
• No late assignments will be accepted. See the course syllabus for the full late assignment policy for
this class.
2 Background
Nice and short this time!
2.1 The Knapsack Problem
The knapsack problem, like the Travelling Salesperson Problem, cannot be solved by an algorithm on a
silicon computer in less than exponential time (unless P = NP). The problem definition is as follows.
You have a set of n items and each item has a weight and a value. The i-th item has a weight wi and a
value vi
. You also have a knapsack (bag) that can carry a total amount of weight equal to W. The problem
is to find the most valuable set of items that you can fit into the knapsack. In other words, you need to
find the most valuable subset of items whose total weight is less than W.
For the more mathematically inclined, this can be formally stated as:

xivi subject to

xiwi < W.
where xi
is either 0 or 1 (indicating whether or not the i-th item is in the knapsack).
We will investigate backtracking and greedy solutions to the knapsack problem in Questions 2 and 3.
Page 2
3 Your Tasks
Question 1 (30 points):
For this problem you will write a method (or methods) to sort an array of strings using the MSD Radix
Sort. For purposes of this assignment, you may assume that strings contain only the uppercase letters
A through Z.
You have been provided with an IntelliJ module RadixSortMSD-Template which includes a short main
program that will load a data file containing strings to be sorted. There are several files provided
named words-XXXXXX.txt where “XXXXXX” denotes the number of words in the file. The file format
starts with the number of words in the file, followed by one word per line. There is also a file
words-basictest.txt which is a good set of words to use to determine whether your sort is running
The pseudocode for MSD Radix Sort from the notes is duplicated on the next page for your convenience. Note that we are removing the optimization of sorting short lists with insertion sort, as
indicated by the strikethrough text. You may just always recursively radix sort any list with more
than one element on it.1
Complete the following tasks:
1. Write your sort method(s) within the RadixSortMSD class. It should accept an array of strings as
input, and return nothing. When the method returns, the array that was passed in should be in
lexicographic order (i.e. dictionary order).
2. Call your sort at the spot indicated in the main() function.
3. Record in a file called a8q1.txt/doc/pdf the time in milliseconds it takes to sort 50, 100, 500,
1000, 10000, 50000, and 235884 items (there are input files with each of these numbers of words
provided). Include this file in your assignment submission.
4. When you hand in, leave the input file set at words-basictest.txt so that
it is easy for the markers to run your program on this input file to see that it works.
There will be marks allotted to the following aspects of your solution:
Correctness. As always, the solution has to work!
Design and speed of your implementation. Design and speed will be considered together because
they influence each other — a poor design choice may result in a slower runtime. Design-wise,
any reasonable design will be accepted; marks will be deducted for especially poor choices.
Speed-wise, the bar to get full marks here will be fairly low, but if your sort is egregiously slow
we will deduct some points.
Javadoc and inline commenting. As usual, include a javadoc comment with each header, and document meaningful blocks of code with inline comments to enhance understanding of the code.
1You can, however, use the insertion-sort optimization if you want to, but you will probably have to implement your own insertion
sort. If you choose to attempt this optional optimization, you are permitted to use resources from the internet to implement insertion
sort, but only for the insertion sort, and only if you properly attribute any code you use to its original author or website.
Page 3
Implementation Hints
• One of the most important decisions you have to make in your implementation is the choice of
data structure to represent the array of lists used in the sortByDigit() helper method. Choose
carefully! Your choice will have an impact on the speed of your sort, and the ease of implementing it. When considering which data structure to use, you may select from containers in either
lib280 or the standard Java API. Case in point: on my first attempt, I made a bad decision that
caused the sort of 10000 words to take several minutes. Now it takes 24ms (on my computer).
• Although runtimes will vary from machine to machine, on a decent machine you should be able
to sort even the 235884-word input file in less than one second (1000ms). Even on slower machines
if it is taking more than a few seconds, then you’ve done something particularly inefficient.
• Don’t take the pseudocode below too literally. This is very high-level pseudocode which is intended to describe the operation of the algorithm, but intentionally glosses over a lot of details
that become important at implementation-time (that’s what pseudocode is for!). You need to fill
in those details as you go. Don’t be afraid to do what you need to do to get the job done, but that
said, you should not need to write hundreds of lines of code (if you are, you should seek help
and advice from Mark or a TA).
Algoirthm MsdRadixSort ( keys , R )
keys – keys to be sorted
R – the radix
sortByDigit ( keys , R , 0)
Algorithm sortByDigit ( keys , R , i )
keys – keys to be sorted
R – the radix
i – digit on which to partition — i = 0 is the left – most digit
for k = 0 to R -1
list [ k ] = new list // Make a new list for each digit
for each key
if the i – th digit of the key has value k add the key to list k
for k = 0 to R -1
if there is another digit to consider
if list[k] is small
use an insertion sort to sort the items in list[k]
sortByDigit ( list [ k] , i +1)
keys = new list // empty the input list
For k = 0 to R -1
keys = keys append list [ k ]
Page 4
Question 2 (20 points):
This problem deals with backtracking solutions to NP-Complete and NP-hard problems. These are
problems that cannot be solved by a silicon computer in less than exponential time, that is, O(k
) for
some k. The travelling salesperson problem (TSP) is an example of such a problem.
You have been provided with an IntelliJ project, TSP which contains two backtracking solutions to TSP.
The “Naive” solution simply traverses the entire search space and tries every possible solution The
other version prunes the search space by detecting non-feasible partial solutions. If a partial solution
has a cost that exceeds the cost of the best solution found so far, the search immediately backtracks.
This greatly decreases the number of path extensions that are tried. You can see the large difference
when you run the program because both algorithms will count the number of extensions of partial
solutions that are tried and report them. It is possible to do even better than this, if you include
The project contains several files of the from graphXX.dat where “XX” denotes the number of nodes
in the graph. Try changing the input file in the main program to graphs of different size to see the
effect of early detection of non-feasible partial solutions. The number of path extensions will be much
lower compared with trying every solution, and that there is a corresponding increase in efficiency.
However, you will also note that at a certain point, the graph gets too large for either version to solve
the problem in a reasonable amount of time because the savings gained from detecting non-feasible
partial solutions is rapidly outpaced by number of possible solutions. Why? Because the number of
possible solutions increases exponentially with the size of the graph, but the time savings gained by
detection of non-feasible partial solutions increases only sub-exponentially. Try different input files
and try to determine the graph size at which the runtime exceeds 10 or 15 minutes. It will probably
be smaller than you think!
The knapsack is another NP-complete problem that was described in Section 2. You have been provided with a project called Knapsack-template which contains a few things to get you started. In you will find definitions of:
Item class: Item is a non-public class that stores the properties of an item (i.e. it’s weight and value).
KnapsackInstance class: The non-public KnapsackInstance class contains all of the information about
an instance of the knapsack problem to be solved, including the capacity of the knapsack, the
number of items, and the items themselves (stored as an array of Item objects).
Knapsack class: The Knapsack class is where you will write code to solve the knapsack problem. Here
you will find an already completed method called readKnapsackInstance which can read a data
file describing an instance of the knapsack problem and return an instance of KnapsackInstance.
Also in the project, you will find several input files named knapsack-XXXX.dat where the “XXXX”
denotes the number of items in the problem instance. Each contains the data for a different instance of
the knapsack problem. The first line of each file contains the total weight capacity of the knapsack as
a floating-point number. The second line of each file contains the number of items n (an integer). The
subsequent n lines each contain two floating-point numbers which describe a single item: the value
of the item, and the weight the item. There is also a small knapsack-basictest.dat file which you
should use first to ensure your solutions work.
Now, here are your tasks:
1. Study the backtracking solutions to TSP to gain an understanding of how they work.
2. Write a backtracking solution to the Knapsack problem in the Knapsack class. The approach will
be similar to that of the TSP. You need to systematically try every possible solution (i.e. every
subset of the set of items) while using optimizations such as detection of non-feasible solutions
and heuristics where possible to prune the search space. The solution should take the form
of a method that returns the total value of the most valuable set of items that can fit in the
Page 5
knapsack. You may, of course, write other helper methods if you deem it appropriate. Since you
are implementing a backtracking solution it is expected that you should be able to obtain the
optimum solution. The optimum solution for knapsack-basictest.dat is a set of items worth
540.0. Note: you do not have to determine the actual set of items, just the value of the most
valuable set of items that will fit in the knapsack.
Note that the quality of your solution, how you prune the search space, and choice of heuristics will
determine which input files you can process in a few minutes and which you cannot. For example, on
my computer, my solution (which is not especially clever) starts choking when the problem instance
gets up over 500 items. Depending on your solution and your computer, you may do better or worse.
This question will be assessed on correctness of your implementation. While we will not be assessing
your commenting, you should still practice good commenting style to aid the markers in understanding how you designed and implemented your solution. A happy marker is a generous marker!
Page 6
Question 3 (25 points):
The following is a greedy algorithm for solving the knapsack problem:
Given : a set of items with known weight and value ,
and a knapsack with known capacity .
Start with an empty knapsack .
while there is at least one item not in the knapsack that will fit
add to knapsack the most valuable item that can still fit
The idea here is that we repeatedly look at how much weight capacity remains in the knapsack, and
then add the most valuable item that has weight less than or equal to the remaining capacity. This
algorithm will terminate when the remaining capacity in the bag is smaller than the weights of any
remaining items not in the bag. It should be clear that this algorithm can be implemented to run in
polynomial time. However, note that this algorithm is not guaranteed to produce the optimal solution.
Here are the tasks for this question:
1. Implement the greedy algorithm for the knapsack problem given above. Write your solution in (yes, should contain your code for both questions 2 and 3!). Use
comments to make sure that your solution to questions 2 and 3 are clearly labeled so that the
marker can identify them easily. The solution should be in the form of a method that returns the
total value of the items that were packed in the knapsack. Again, you do not have to return the
actual set of items in the knapsack, just the sum of the value of the items in the knapsack. By
luck, the greedy algorithm manages to find the optimum set of items worth 540.0 when given
knapsack-basictest.dat as input. This will not be the case for many of the other given problem
2. For each input file given, compare the result of the greedy algorithm to the result for the backtracking algorithm. In a file called a8q3.txt/doc/pdf:
(a) For each input file, record the total value of the items in the knapsack for each algorithm,
let’s call these Vbacktrack and Vgreedy.
(b) For each input file, record the ratio Vgreedy
. This ratio represents the percentage of the true
optimum value that the greedy algorithm was able to achieve.
(c) Extrapolate or hypothesize, to the extent possible, on the relative performance of the greedy
algorithm vs the backtracking algorithm. Do you think that the greedy algorithm can provide
any guaranteed minimum performance relative to the optimum? Will the greedy algorithm
always choose items with value at least 30% of the optimum? 50%? 90%? Based on your
observations, what do you think?
If you want to generate more problem instances to test your hypothesis for part 2(c), you can use
the RandomKnapsackInstance class that is provided. At the top of the main file you can change the
capacity of the knapsack, the number of items, the maximum weight, and value of an item. Run the
program and it will generate a random datafile that describes an instance of the knapsack problem
that can be read in by the readKnapsackInstance method of the Knapsack class.
This question will be assessed on correctness of your implementation (tasks 1 and 2(a)), and the
observations that you make for task 2(c).
Page 7
4 Files Provided
lib280-asn8: A copy of lib280 which includes:
• solutions to assignment 7;
RadixSortMSD-Template: The project template for question 1.
TSP: Examples of backtracking solutions to the TSP.
Knapsack-Template: The project template for question 2.
5 What to Hand In Your completed radix sort from question 1.
a8a1.txt/doc/pdf: Your timing observations from your sort in question 1. Your backtracking and greedy solutions to the knapsack problem of questions 2 and 3.
a8q3.txt/doc/pdf: Your comparison of the greedy and backtracking algorithms and discussion of your
observations from Question 3, task 2.
Page 8

Open chat
Need help?
Can we help you?