CIS 313, Intermediate Data Structures Programming Assignment 3

0 Instructions

Submit your work through Canvas. You should submit a tar file containing all source files and a README for running your project. Don’t submit any other files (e.g., test case or pyc files).

More precisely, submit on Canvas a tar file named lastname.tar (where lastname is your last name) that contains:

• All source files. You can choose any language that builds and runs on ix-dev.

• A file named README that contains your name and the exact commands for building and running your project on ix-dev. If the commands you provide don’t work on ix-dev, then your project can’t be graded and there will be a significant penalty.

Here is an example of what to submit:

hampton.tar
    README
    my_class1.py
    my_class2.py
    my_class3.py
    problem.py
    another_problem.py
    …

README
    Andrew Hampton
    Problem 1: python problem1.py <input_filename>
    Problem 2: python problem2.py <input_filename>
    …

Note that Canvas might change the name of the file that you submit to something like lastname-N.tar. This

is totally fine!


The grading for the project will be roughly as follows:

Task                                Points
Problem 1                           15
    pass given sample test case     5
    pass small grading test case    5
    pass large grading test case    5
Problem 2                           20
    pass given sample test case     5
    pass small grading test case    5
    pass large grading test case    10
Problem 3                           15
    pass given sample test case     5
    pass small grading test case    5
    pass large grading test case    5
TOTAL                               50


1 Applications

Problem 1. Prioritizing HTTP Requests

Suppose you are working on a web application that receives HTTP requests over a LAN in batches. When

you receive a batch of requests, they have already been preprocessed with an estimate of how long it will

take your application to complete each of them.

You want to serve the requests in order of the estimated service time, with the shortest requests being served

first.

Additionally, your application has two tiers of service: A and B. All of the requests in the A tier should be

served before any of the requests in the B tier.

Write a program that will give the correct service order according to the above criteria. Your program must

use one or more priority queues to accomplish this!

In the event of a tie in service time (and tier), the request that appears first in the input list should be served

first (that is, your sort should be stable).

Your program should take a single command-line argument, which will be a filename. The input file will contain request strings. The first line of the input file will be an integer 0 ≤ N ≤ 10^6 giving the number of requests. Following will be N lines, each containing a string having the following format:

IP_ADDR TIER ESTIMATE

IP_ADDR is an IP address in decimal IPv4 format. TIER is either A or B. ESTIMATE is an integer 0 < X ≤ 10^4 representing a time estimate for processing the request. The separator is a single space.

Output the IP addresses in the order the requests will be served, separated by newlines. Again, in the event

of a tie in service time (and tier), the request that appears first in the input list should be served first.

Example input file:

8

10.31.99.245 B 30

10.16.0.105 A 150

10.16.115.160 B 60

10.30.111.90 B 65

10.16.0.105 A 20

10.30.100.100 A 25

10.16.100.115 A 150

10.111.111.119 B 60

Example output:

10.16.0.105

10.30.100.100

10.16.0.105

10.16.100.115

10.31.99.245

10.16.115.160

10.111.111.119

10.30.111.90
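
The ordering rules above can be sketched with a single priority queue whose keys are tuples. This is only one possible approach, written in Python with the standard heapq module; the function name service_order and the (ip, tier, estimate) tuple layout are assumptions made for illustration, and file reading is left out.

```python
import heapq


def service_order(requests):
    """Return IP addresses in service order.

    `requests` is a list of (ip, tier, estimate) tuples in input order
    (a hypothetical layout chosen for this sketch).
    """
    heap = []
    for index, (ip, tier, estimate) in enumerate(requests):
        # Tuples compare left to right: tier ("A" < "B"), then estimate,
        # then the input position, which breaks ties in favor of the
        # request that appeared first (stability).
        heapq.heappush(heap, (tier, estimate, index, ip))

    order = []
    while heap:
        order.append(heapq.heappop(heap)[3])
    return order
```

Including the input position in the key is what makes the result stable, since a binary heap on its own does not preserve the order of equal keys. Using two separate queues, one per tier, would work as well.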


Problem 2. Rolling Median

Suppose you have streaming (integer) data and want to compute some summary statistics. It’s easy to

calculate the cumulative rolling average: this is a constant time operation (look up the formula if you’re

interested!). What about the cumulative rolling median?
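
For reference, the constant-time update for the cumulative average can be sketched as follows. The function name running_means is an assumption for illustration; the update used is the standard incremental mean, new_mean = old_mean + (x - old_mean) / count.

```python
def running_means(values):
    """Return the cumulative average after each value, in O(1) per update."""
    mean = 0.0
    means = []
    for count, x in enumerate(values, start=1):
        # Move the previous mean toward x by 1/count of the gap.
        mean += (x - mean) / count
        means.append(mean)
    return means
```

No comparably simple update exists for the median, because the new median depends on the ordering of all values seen so far.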

In this problem, you’ll develop an algorithm to compute the cumulative rolling median and test it on simulated streaming data.

We will use the most common definition of median, as described on the Wikipedia page. We can simulate

streaming data by giving a list of integers L of length n and calculating the median on the slice L[1 : i] for

every 0 < i ≤ n.

Your program should take a single command-line argument, which will be a filename. The input file will contain integers, one per line. The first line of the input file will be an integer 2 ≤ N ≤ 10^5 giving the number of integers in the list L. Following will be N lines, each containing an integer 0 ≤ x ≤ 10^6.

You should output the median of the slice L[1 : i] for every 0 < i ≤ N, with a newline between each result.

If a median is not an integer, it should be printed to one decimal place. If a median is an integer, it should

be printed as an integer (i.e., without a decimal point). See the sample output below.

Your solution should have runtime complexity O(n log n).

Hint: use two binary heaps, a maxheap to hold the smaller half of the data and a minheap to hold the larger

half of the data. The median of the data is either at the top of one of the heaps or it’s the average of those

two values.

Example input file:

5

1

8

4

3

2

Example output:

1

4.5

4

3.5

3

The first line of the sample input says that the file contains 5 integers. So, we will read in 5 integers and

with each new integer compute the median of those we have seen so far.

The first integer is 1 and the median of {1} is 1. The next integer is 8 and the median of {1, 8} is 4.5. The

next integer is 4 and the median of {1, 8, 4} is 4. The next integer is 3 and the median of {1, 8, 4, 3} is 3.5.

The final integer is 2 and the median of {1, 8, 4, 3, 2} is 3.
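
The two-heap idea from the hint can be sketched as below. This is a minimal illustration, not a reference solution: it uses Python's heapq module (a min-heap, so the max-heap is simulated by storing negated values), and the names rolling_medians and format_median are assumptions. Check whether a library heap is acceptable for your submission before relying on it.

```python
import heapq


def format_median(twice_median):
    """Format a median given as twice its value, avoiding float rounding."""
    if twice_median % 2 == 0:
        return str(twice_median // 2)      # integer median, no decimal point
    return f"{twice_median / 2:.1f}"       # half-integer, one decimal place


def rolling_medians(values):
    low = []    # max-heap (negated values) holding the smaller half
    high = []   # min-heap holding the larger half
    results = []
    for x in values:
        # Place x in the half it belongs to.
        if low and x > -low[0]:
            heapq.heappush(high, x)
        else:
            heapq.heappush(low, -x)

        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))

        if len(low) > len(high):
            results.append(format_median(2 * -low[0]))
        else:
            results.append(format_median(-low[0] + high[0]))
    return results
```

Each value causes a constant number of heap pushes and pops, each O(log n), which gives the required O(n log n) total.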


2 Implementation

Problem 3. Binary Search Tree

For this problem, you will implement a binary search tree with integer keys. Do not use any built-in tree structures that your language might have. You must implement your own tree class that satisfies the binary search tree property as described in Chapter 12 (p. 287) of the textbook.

Your binary search tree data structure should implement (at least) the following methods with specified

runtime, where h refers to the height of the tree:

insert(X): Inserts a node X into the tree. Runtime: O(h)

remove(X): Removes a node X from the tree. This method should be implemented as described in the textbook (pp. 295-298). In particular, use the in-order successor as the replacement node. Runtime: O(h)

search(X, K): Returns a node in the subtree rooted at node X having key K, if present. Runtime: O(h)

maximum(X): Returns the node in the subtree rooted at node X having the largest key. Runtime: O(h)

minimum(X): Returns the node in the subtree rooted at node X having the smallest key. Runtime: O(h)

to_list_preorder(): Returns a list of the keys in the tree ordered by a pre-order traversal. Runtime: O(n)

to_list_inorder(): Returns a list of the keys in the tree ordered by an in-order traversal. Runtime: O(n)

to_list_postorder(): Returns a list of the keys in the tree ordered by a post-order traversal. Runtime: O(n)

(Depending on the programming language you use, replace a node with a pointer to a node as appropriate.)

Note: It’s important that you implement the remove method as described. The pre- and post-order traversals

will not match the reference output if implemented differently.

Note: The behavior of these methods in exceptional cases is unspecified. You should think about what these

cases might be, and raise appropriate exceptions.

Note: No test case will insert duplicate keys into the tree.
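
A partial sketch of such a tree is shown below, assuming nodes with parent pointers and a transplant-style removal that replaces a two-child node with its in-order successor. The class names Node and BinarySearchTree and the helper _transplant are hypothetical; whether this matches the textbook pages step for step has not been checked here, so compare it against the text. It omits maximum, the in-order and post-order traversals, and all error handling.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, node):
        # Walk down from the root to find the parent of the new leaf.
        parent, current = None, self.root
        while current is not None:
            parent = current
            current = current.left if node.key < current.key else current.right
        node.parent = parent
        if parent is None:
            self.root = node
        elif node.key < parent.key:
            parent.left = node
        else:
            parent.right = node

    def search(self, node, key):
        # Returns None when the key is absent from the subtree.
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def minimum(self, node):
        while node.left is not None:
            node = node.left
        return node

    def _transplant(self, old, new):
        # Replace the subtree rooted at `old` with the one rooted at `new`.
        if old.parent is None:
            self.root = new
        elif old is old.parent.left:
            old.parent.left = new
        else:
            old.parent.right = new
        if new is not None:
            new.parent = old.parent

    def remove(self, node):
        if node.left is None:
            self._transplant(node, node.right)
        elif node.right is None:
            self._transplant(node, node.left)
        else:
            # Two children: the in-order successor takes the node's place.
            successor = self.minimum(node.right)
            if successor.parent is not node:
                self._transplant(successor, successor.right)
                successor.right = node.right
                successor.right.parent = successor
            self._transplant(node, successor)
            successor.left = node.left
            successor.left.parent = successor

    def to_list_preorder(self):
        # Iterative traversal, so a tall (unbalanced) tree does not hit
        # Python's recursion limit.
        keys = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            keys.append(node.key)
            if node.right is not None:
                stack.append(node.right)
            if node.left is not None:
                stack.append(node.left)
        return keys
```

For example, inserting 1, 2, 3, 10, 5 in that order and then removing the node with key 2 leaves a tree whose pre-order traversal is 1 3 10 5.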

Write a driver program that takes a single command-line argument, which will be a filename. The input file will contain instructions for tree operations. The first line of the input file will be an integer 0 ≤ N ≤ 10^6 giving the number of instructions. Following will be N lines, each containing an instruction. The possible instructions are:

insert K, where −10^5 ≤ K ≤ 10^5 is an integer: insert a node with key K into the tree. There is no output.

remove K, where −10^5 ≤ K ≤ 10^5 is an integer: remove a node with key K from the tree. If such a node exists, there is no output. If no such node exists, output TreeError.

search K, where −10^5 ≤ K ≤ 10^5 is an integer: output Found if a node exists with key K. If no such node exists, output NotFound.

max: output the maximum key in the tree. If the tree is empty, output Empty.


min: output the minimum key in the tree. If the tree is empty, output Empty.

preprint: print the keys of the tree according to a pre-order traversal, separated by a single space. If the

tree is empty, output Empty.

inprint: print the keys of the tree according to an in-order traversal, separated by a single space. If the

tree is empty, output Empty.

postprint: print the keys of the tree according to a post-order traversal, separated by a single space. If the

tree is empty, output Empty.

Example input file:

20

inprint

remove 2

max

search 5

insert 1

insert 2

search 1

search 2

insert 3

inprint

insert 10

insert 5

inprint

preprint

postprint

search 5

remove 2

inprint

max

min

Example output:

Empty

TreeError

Empty

NotFound

Found

Found

1 2 3

1 2 3 5 10

1 2 3 10 5

5 10 3 2 1

Found

1 3 5 10

10

1
