Computer Science CSC263H

Homework Assignment #3

• You must submit your assignment as a PDF file, named a3.pdf, of a typed (not handwritten)

document through the MarkUs system by logging in with your CDF account at:

https://markus.teach.cs.toronto.edu/csc263-2019-01

To work with one or two partners, you and your partner(s) must form a group on MarkUs.

• The a3.pdf PDF file that you submit must be clearly legible. To this end, we encourage you to

learn and use the LATEX typesetting system, which is designed to produce high-quality documents

that contain mathematical notation. You can use other typesetting systems if you prefer, but

handwritten documents are not accepted.

• If this assignment is submitted by a group of two or three students, the a3.pdf PDF file that you

submit should contain for each assignment question:

1. The name(s) of the student(s) who wrote the solution to this question, and

2. The name(s) of the student(s) who read this solution to verify its clarity and correctness.

• By virtue of submitting this assignment you (and your partners, if you have any) acknowledge that

you are aware of the homework collaboration policy that is stated in the csc263 course web page:

http://www.cs.toronto.edu/~sam/teaching/263/#HomeworkCollaboration.

• For any question, you may use data structures and algorithms previously described in class, or in

prerequisites of this course, without describing them. You may also use any result that we covered

in class, or is in the assigned sections of the official course textbook, by referring to it.

• Unless we explicitly state otherwise, you should justify your answers. Your paper will be marked

based on the correctness and completeness of your answers, and the clarity, precision, and conciseness of your presentation.

• Your a3.pdf submission should be no more than 6 pages long in a 10pt font.

1

Question 1. (1 marks) In this question, you must use the insertion and deletion algorithms as described

in the “Balanced Search Trees: AVL trees” handout posted on the course web site.

a. Insert keys 0, 25, 19, 5, -2, 28, 13, -5, 2, 6, 14, 7 (in this order) into an initially empty AVL tree,

and show the resulting AVL tree T, including the balance factor of each node (only the final tree should

be shown).

b. Delete key 2 from the above AVL tree T, and show the resulting AVL tree, including the balance

factor of each node.

In each of the above questions, only the final tree should be shown: intermediate trees will be disregarded,

and not given partial credit.

Question 2. (1 marks) You are tasked with designing a data structure that maintains a set of books for

an online bookstore. A book is defined as a 3-tuple (identif ier, price, rating), where

• identif ier is a unique positive integer which identifies the book.

• price is a positive real number which represents the price of the book in dollars.

• rating is a real number in the range [0, 5], which represents how good the book is.

Note that you can not assume that prices and ratings are unique: there may be several books which have

the same price, and there may be several books which have the same rating.

a. Name a standard data structure D that can store the set of books such that each of the following

operations can be done in O(log n) time in the worst-case (where n is the number of books in D):

AddBook(D, x): Insert x which is a new book of the form (identif ier, price, rating) into D.

SearchBook(D, id): Return the price and rating of the book in D whose identif ier is id. If there is

no book in D whose identif ier is id, return nil.

Briefly describe what each node u of your data structure contains, and what is the key that you are using

when adding a new book. Which standard operations of your data structure you are using to implement

AddBook and SearchBook?

In the Parts b, c, d, e below:

• For each tree that you define, describe what each node of your tree contains, and what is the key

that you are using when inserting a new item into this tree.

• For each operation, you should give a clear and concise English description of the algorithm

implementing the operation.

• Also give brief explanations of why each of your algorithms achieve the worst-case time complexity

specified in the subquestion (where n is the number of books in the data structure).

b. Modify D to support the following operation, in addition to the operations of Parts a:

BestBookRating(D, p): Let r be the maximum rating among all books in D whose price is at most p.

Then return r. If no book has price at most p, then return −1.

In addition to the English description of the algorithm implementing the above operation, you must

also give the corresponding pseudocode. Also describe in English any changes you make to the

implementation of the previously defined operations. The worst-case time complexity of each operation

should be O(log n).

Hint: Use an augmented AVL tree, in addition to the data structure used to implement Part a.

2

c. Modify D to support the following operation, in addition to the operations of Parts a, b:

AllBestBooks(D, p): Let r be the maximum rating among all books in D whose price is at most p.

Then return a pointer to a list of all books in D whose rating is r. If no book has price at most p,

then return nil.

Also describe in English any changes you make to the implementation of the previously defined operations

(don’t use pseudo-code). The worst-case time complexity of each operation should be O(log n).

Hint: Use another AVL tree, in addition to the data structures used to implement Parts a, b.

d. Modify D to support the following operation, in addition to the operations of Parts a, b, c:

IncreasePrice(D, p): Increase the price of every book in D by p dollars. Assume that p is a positive

real number.

Also describe in English any changes you make to the implementation of the previously defined operations

(don’t use pseudo-code). The worst-case time complexity of the IncreasePrice operation should be O(1).

The worst-case time complexity of the previously defined operations should remain O(log n).

e. Modify D to support the following operation, in addition to the operations of Parts a, b, c, d:

DeleteBook(D, id): Remove from D the book whose identif ier is id.

Also describe in English any changes you make to the implementation of the previously defined operations (don’t use pseudo-code). The worst-case time complexity of the DeleteBook operation should be

O(log n), where n is the number of books in D. The worst-case time complexity of the previously defined

operations should remain unchanged.

Question 3. (1 marks)

Give an efficient algorithm which, given two sets A and B, outputs every element of A that is not

in B. More precisely, the algorithm’s input are two sets A and B given as unsorted arrays A[1..n]

and B[1..n], each containing n distinct integers. The algorithm must print the elements of the set

A − B = {e | e ∈ A and e /∈ B}.

The expected time complexity of your algorithm should be O(n), under some assumptions that we discussed

in class.

Hint: What data structures or algorithms that we learned in class involve reasoning about probability?

Note: Do not assume that the numbers in the array are small, or they have a small range, e.g., they

could be arbitrary integers. So a solution based on a direct access table or on linear time sorting is not

acceptable.

a. Describe your algorithm clearly and concisely in English, and then give the pseudo-code.

b. Explain why the expected running time of your algorithm is O(n). Explicitly state every assumption

that you need for your complexity analysis.

c. What is the worst-case running time of your algorithm? Use the Θ notation. Justify your answer.

3

[The questions below will not be corrected/graded. They are given here as interesting problems

that use material that you learned in class.]

Question 4. (0 marks) Consider an abstract data type that consists of a set S of integers on which the

following operations can be performed:

Add(i): Adds the integer i to S. If this integer already is in S, then S does not change

Average(t): Returns the average of all elements of S that are less than or equal to the integer t. If all

the elements of S are greater than t, then return 0.

Describe how to implement this abstract data type using an augmented AVL tree T. Each operation should

run in O(log n) worst-case time, where n = |S|. Since this implementation is based on a data structure

and algorithms described in class and in the AVL handout, you should focus on describing the extensions

and modifications needed here.

a. Give a precise and full description of your data structure. In particular, specify what data is associated

with each node, specify what the key is at each node, and specify what the auxiliary information is at

each node. In particular, what is (are) the augmented field(s), and what identity should this (these) fields

satisfy. Illustrate this data structure by giving an example of it (with a small set of your own choice).

b. Describe the algorithm that implements each one of the two operations above, and explain why each

one takes O(log n) time in the worst-case. Your description and explanation should be in clear and concise

English. For the operation Average, you should also give the algorithm’s high-level pseudocode.

Question 5. (0 marks)

The task in this question is to compute the medians of all prefixes of an array. As input we are given the

array A[1 . . n] of distinct integers. Using a heap data structure, design an algorithm that outputs another

array M[1 . . n], so that M[i] is equal to the median of the numbers in the subarray A[1 . . i]. Recall that

when i is odd, the median of A[1 . . i] is the element of rank (i+1)/2 in the subarray, and when i is even, the

median is the average of the elements with ranks i/2 and i/2 + 1. Your algorithm should run in worst-case

time O(n log n).

a. Describe your algorithm in clear and concise English, and also provide the corresponding pseudocode.

Argue that your algorithm is correct.

b. Justify why your algorithm runs in time O(n log n).

Question 6. (0 marks)

Dynamic storage allocation is a resource management task that is required at several levels of computer

systems: for example, an operating system must allocate chunks of main memory to processes, and a file

system must allocate chunks of disk space to files. We can describe the problem of storage management

in general terms as follows. There is a store S of N equal-size cells, indexed as S[i] for i ∈ {1, 2, . . . , N}.

In practice, the cells of the store are individually addressable units of some type of memory. For example,

they may be bytes, words or pages of main memory; or they may be blocks or sectors of disk space.

The storage manager maintains at all times a pool of available fragments. Intuitively, a fragment

is a maximal sequence of consecutive cells that are available to be allocated to a process that needs them.

When a process requests a fragment of size `, the storage manager determines if the pool of available

fragments contains a fragment of size k ≥ `. If it does not, the request is rejected. If the pool contains a

fragment of size k ≥ `, the storage manager removes it from the pool, splits it into one fragment of size

` and one fragment of size k − `, allocates the former to the requesting process, and returns the latter (if

k − ` 6= 0) to the pool. When a process is done with a fragment that it was previously allocated, it returns

4

that fragment to the storage manager. The storage manager returns the fragment to the storage pool after

coalescing it with its adjacent fragments, if such fragments are presently in the pool.

More precisely, a fragment is a pair (a, k), where a, k are positive integers such that a + k − 1 ≤ N; a

is the fragment’s start address (the index of the first cell in the fragment) and k is its size (the number

of cells in the fragment). Thus, fragment (a, k) corresponds to the subarray S[a..a + k − 1] of the store S.

Two fragments F = (a, k) and F

0 = (a

0

, k0

) are adjacent if the start address of one is immediately after

the last cell of the other (i.e., a = a

0 + k

0 or a

0 = a + k); F and F

0 are overlapping if some cell belongs

to both of them (i.e., the intervals [a, a + k − 1] and [a

0

, a0 + k

0 − 1] are not disjoint). The pool of available

fragments is a set of fragments that does not contain any adjacent or overlapping fragments. Initially, the

pool of available fragments consists of a single fragment, namely (1, N).

We can formulate storage management as an abstract data type. The object manipulated by this ADT

is the pool of available fragments, and there are two operations:

• Allocate(P, `): Intuitively, this is a request to allocate a fragment of size `.

P is a pool of available fragments, and ` is an integer such that 1 ≤ ` ≤ N. If P contains no fragment

whose size is at least `, the operation returns “reject”. If P contains some fragment, say (a, k), such

that k ≥ `, the operation returns (a, `), and adjusts P by replacing (a, k) with (a + `, k − `), if k > `

— or replacing it with nothing, if k = `.

• Release(P,(a, `)): Intuitively, this returns a previously allocated fragment to the pool.

P is a pool of available fragments, and (a, `) is a fragment that does not overlap any fragment in P.

The operation adjusts P by returning the fragment (a, `) to it, after coalescing it with any adjacent

fragments.

When the operation Allocate(P, `) is invoked and the pool P contains several fragments of size ` or

more, there are various strategies that the storage manager could use to decide which fragment to use.

Following are three common strategies:

• First-fit: choose the fragment of adequate size that has the smallest possible start address.

• Best-fit: choose a smallest fragment of adequate size.

• Worst-fit: choose a largest fragment of adequate size.

In this question, you must describe how to implement the storage management ADT described above

using an augmented AVL tree (which represents the pool P of currently available fragments). Your implementation should have the property that the Allocate operation uses the first-fit strategy. Furthermore, each operation Allocate and Release should take O(log n) time in the worst-case, where n is the

number of fragments currently in the pool P.

Since this implementation of P is based on a data structure and algorithms described in class and in

the AVL handout, you should focus on describing the extensions and modifications needed here.

1. Give a precise and full description of your data structure. In particular, specify what data is associated

with each node, specify what the key is at each node, and specify what the auxiliary information is

at each node. Illustrate this data structure by giving an example of it (on a small pool P of available

fragments of your own choice).

2. Describe the algorithms for Allocate and Release, and explain why each one takes O(log n) time

in the worst-case.

5

Question 7. (0 marks) You are given a list L of n, not necessarily distinct, integers. Your task is to devise

an algorithm that outputs a list of the distinct integers in L, in order of non-increasing frequency (the

frequency of an element in L is the number of times it occurs in L). For example, if L is 2, 5, 4, 4, 2, 3, 4, 3,

then a suitable output is 4, 3, 2, 5; the only other suitable output for this list is 4, 2, 3, 5. In this example,

L contains only small integers, but this is not always the case. You cannot make any assumption on the

integers in L (e.g., their maximum size, or distribution). In particular, some integers of L can very large,

and so it is not feasible to use tables of size proportional to these integers.

a. Give a Hash Table-based algorithm for this problem whose expected time complexity is O(n), under

some Hash Table assumptions that you must clearly state. Explain why, your algorithm achieves this time

complexity under these assumptions.

What is the worst-case time complexity of your algorithm? Use the Θ notation and justify your answer.

Hint: As part of your algorithm, you may find useful to discover a way to sort n numbers in the range

0 to n in O(n) worst-case time.

b. Give an algorithm for this problem whose worst-case time complexity is O(n log n). Explain why

your algorithm achieves this time complexity.

Question 8. (0 marks)

In this question you will explore different algorithms to perform the join of two relations, a very important

operation in relational databases. We consider a simplified setting where we wish to compute the join of

relation R with attributes A and C, and relation S with attributes B and C. Notice that the two relations

have a common attribute, C. Each relation is a set of pairs. For each pair (a, c) of R, a is a value of

attribute A and c is a value of attribute C. Similarly, for each pair (b, c) of S, b is a value of attribute

B and c is a value of attribute C. It may be useful to think of R as a two-dimensional array, with two

columns, the first corresponding to A and the other corresponding to C: each row of the two-dimensional

array corresponds to a pair (a, c) of R; so the number of rows of the array is equal to the number of pairs

in R. Similar comments apply to S.

The join of R and S, denoted R ./ S, is a relation with attributes A, B and C (i.e., the union of the

attributes of R and S). It consists of the set of all triples (a, b, c) such that (a, c) is a pair in R and (b, c)

is a pair in S. Notice that the two pairs that are being joined to produce the triple (a, b, c) of R ./ S must

have the same value in the common attribute C.

Another way of thinking about R ./ S is as follows: We consider every possible combination of a pair

(a, c) from R and a pair (b, c0

) from S. If c = c

0 — i.e., if the two pairs have the same value in the common

attribute — then this combination contributes the triple (a, b, c) to the join. If c 6= c

0

then this combination

contributes nothing to the join.

For example, suppose that R is the relation Teaches with attributes ProfName and CourseId; and

S is the relation Takes with attributes StudName and CourseId. A pair (p, c) is in Teaches if and only

if professor p teaches course c. A pair (s, c) is in Takes if and only if student s takes course c. The

join Teaches ./ Takes is the relation with attributes ProfName, StudName, CourseId; it contains a triple

(p, s, c) if and only if professor p teaches student s in course c.

The obvious way to compute R ./ S is to consider each combination of pairs from R and S and create

a triple from each such combination where the two pairs have the same value in the common attribute.

Obviously this algorithm takes time Θ(mn), where m is the number of pairs in R and n is the number of

pairs in S.

In a way, this is the best we can hope for: If all pairs in R and all pairs in S have the same value

in the common attribute C, then there will be mn triples in R ./ S, and we can’t create a set of mn

triples faster than in Ω(mn) time! In practice, however, it is very unlikely that every pair in R joins with

every pair in S; in most cases, the size of R ./ S is much smaller than mn. For instance, in our example

of the Teaches and Takes relations, if these happened to describe, respectively, the courses taught by U

6

of T faculty and the courses taken by U of T students in a particular semester, the first relation would

contain about 4,000 pairs (approximately 2,000 professors each teaching 2 courses) and the second relation

would contain about 250,000 pairs (approximately 50,000 students, each taking 5 courses). In this case the

product mn is approximately one billion, while the actual size of Teaches ./ Takes will be approximately

250,000 — more than three orders of magnitude smaller! (There could be more records in Teaches ./ Takes

than in Takes, in case a course is co-taught by several professors, but such instances are rare, so typically

the size of Teaches ./ Takes will be only a little bit larger than that of Takes.)

In view of this example, we would like to have an algorithm that computes R ./ S in time Θ(k+f(m, n)),

where k is the actual size of R ./ S and f(m, n) is as small a function as possible.

In the following two questions you are asked to devise and describe such algorithms. You should

describe each algorithm in clear and precise English, and illustrate the data structure(s) that you use with

simple example(s); no pseudocode is necessary (but you can use pseudo-code as an additional explanation).

To the extent your algorithm makes use of algorithms or data structures discussed in this course or in its

prerequisites, you can simply state what algorithm(s) and data structure(s) you use without describing

them in detail.

In the following, assume that each of R and S is given as a two dimensional array, where each column

corresponds to an attribute and each row corresponds to a pair of the relation. You can not make any

assumption on the size or distribution of values contained in R or S (i.e., the values associated with

attributes A, B, or C). In particular, some of these values may be very large (e.g., they may be 128-bit

integers), and so it is not feasible to use arrays of size proportional to these values.

Let k be the size of the join R ./ S, and N be the size of the larger of the two given relations R and

S, i.e., N = max(n, m).

a. Describe an algorithm that computes R ./ S in worst-case time O(k + N log N). In other words, the

worst-case running time of your algorithm, with respect to all the inputs R and S that have N = max(n, m)

and whose join is of size k, is O(k + N log N). Your should describe your algorithm as we explained above

(start with a few English sentences explaining the high-level idea of your algorithm).

Explain why the worst-case time complexity of your algorithm is O(k + N log N).

b. Describe an algorithm that computes R ./ S in expected time Θ(k + N) under some reasonable

assumptions that you should state. Your should describe your algorithm as we explained above (start with

a few English sentences explaining the high-level idea of your algorithm).

Explain why the expected running time of your algorithm is Θ(k + N).

What is the worst-case time complexity of this algorithm? Justify your answer.

Hint: Note that R or S may have many pairs with the same value c in attribute C. You should think

about this case carefully when you formulate your assumptions, and when you analyse the time complexity

of your algorithm.

7

## Reviews

There are no reviews yet.