B561 Advanced Database Concepts

Assignment 2

This assignment relies on the lectures

• SQL Part 1 and SQL Part 2 (Pure SQL);

• Views;

• Relational Algebra (RA); and

• Joins and semijoins.

Furthermore, the lecture on the correspondence between Tuple Relational Calculus and SQL should help you to solve problems in this assignment.

To turn in your assignment, you will need to upload to Canvas a single file

with name assignment2.sql which contains the necessary SQL statements

that solve the problems in this assignment. The assignment2.sql file must

be so that the AI’s can run it in their PostgreSQL environment. You should

use the Assignment2Script.sql file to construct the assignment2.sql file.

(Note that the data to be used for this assignment is included in this file.)

In addition, you will need to upload a separate assignment2.txt file that

contains the results of running your queries. Finally, you need to upload

a file assignment2.pdf that contains the solutions to the problems that

require it.

The problems that need to be included in the assignment2.sql are

marked with a blue bullet •. The problems that need to be included in the

assignment2.pdf are marked with a red bullet •. (You should aim to use

Latex to construct your .pdf file.)

1

Database schema and instances

For the problems in this assignment we will use the following database

schema:1

Person(pid, pname, city)

Company(cname, headquarter)

Skill(skill)

worksFor(pid, cname, salary)

companyLocation(cname, city)

personSkill(pid, skill)

hasManager(eid, mid)

Knows(pid1, pid2)

In this database we maintain a set of persons (Person), a set of

companies (Company), and a set of (job) skills (Skill). The pname

attribute in Person is the name of the person. The city attribute

in Person specifies the city in which the person lives. The cname attribute in Company is the name of the company. The headquarter

attribute in Company is the name of the city wherein the company has

its headquarter. The skill attribute in Skill is the name of a (job)

skill.

A person can work for at most one company. This information is

maintained in the worksFor relation. (We permit that a person does

not work for any company.) The salary attribute in worksFor specifies

the salary made by the person.

The city attribute in companyLocation indicates a city in which

the company is located. (Companies may be located in multiple cities.)

A person can have multiple job skills. This information is maintained in the personSkill relation. A job skill can be the job skill of

multiple persons. (A person may not have any job skills, and a job skill

may have no persons with that skill.)

A pair (e, m) in hasManager indicates that person e has person m as

one of his or her managers. We permit that an employee has multiple

managers and that a manager may manage multiple employees. (It

is possible that an employee has no manager and that an employee is

1The primary key, which may consist of one or more attributes, of each of these relations

is underlined.

2

not a manager.) We further require that an employee and his or her

managers must work for the same company.

The relation Knows maintains a set of pairs (p1, p2) where p1 and p2

are pids of persons. The pair (p1, p2) indicates that the person with pid

p1 knows the person with pid p2. We do not assume that the relation

Knows is symmetric: it is possible that (p1, p2) is in the relation but

that (p2, p1) is not.

The domain for the attributes pid, pid1, pid2, salary, eid, and

mid is integer. The domain for all other attributes is text.

We assume the following foreign key constraints:

• pid is a foreign key in worksFor referencing the primary key pid

in Person;

• cname is a foreign key in worksFor referencing the primary key

cname in Company;

• cname is a foreign key in companyLocation referencing the primary key cname in Company;

• pid is a foreign key in personSkill referencing the primary key

pid in Person;

• skill is a foreign key in personSkill referencing the primary

key skill in Skill;

• eid is a foreign key in hasManager referencing the primary key

pid in Person;

• mid is a foreign key in hasManager referencing the primary key

pid in Person;

• pid1 is a foreign key in Knows referencing the primary key pid in

Person; and

• pid2 is a foreign key in Knows referencing the primary key pid in

Person

The file Assignment2Script.sql contains the data supplied for

this assignment.

3

Pure SQL and RA SQL

In this assignemt, we distinguish between Pure SQL and RA SQL.

Below we list the only features that are allowed in Pure SQL and in

RA SQL.

In particular notice that

• JOIN, NATURAL JOIN, and CROSS JOIN are not allowed in Pure

SQL.

• The predicates [NOT] IN, SOME, ALL, [NOT] EXISTS are not allowed

in RA SQL.

The only features allowed in Pure SQL

SELECT … FROM … WHERE

WITH …

UNION, INTERSECT, EXCEPT operations

EXISTS and NOT EXISTS predicates

IN and NOT IN predicates

ALL and SOME predicates

VIEWs that can only use the above RA SQL features

The only features allowed in Pure SQL features

SELECT … FROM … WHERE

WITH …

UNION, INTERSECT, EXCEPT operations

JOIN … ON …, NATURAL JOIN, and CROSS JOIN operations

VIEWs that can only use the above RA SQL features

commas in the FROM clause are not allowed

4

1 Formulating queries in Pure SQL with and without set predicates

An important consideration in formulating queries is to contemplate

how they can be written in different, but equivalent, ways. In fact, this

is an aspect of programming in general and, as can expected, is also

true for SQL. A learning outcome of this course is to acquire skills for

writing queries in different ways. One of the main motivations for this is

to learn that different formulations of the same query can dramatically

impact performance, sometimes by orders of magnitude.

For the problems in this section, you will need to formulate queries

in Pure SQL with and without set predicates. You can use the SQL operators INTERSECT, UNION, and EXCEPT, unless otherwise specified. You

are however allowed and encouraged to use views including temporary

and user-defined views.

To illustrate what you need to do, consider the following example.

Example 1 Consider the query “ Find the pid and name of each person who (a) works for a company located in Bloomington and (b) knows

a person who lives in Chicago.”

(a) Formulate this query in Pure SQL by only using the EXISTS or

NOT EXISTS set predicates. You can not use the set operations

INTERSECT, UNION, and EXCEPT.

A possible solution is

select p.pid, p.pname

from Person p

where exists (select 1

from worksFor w

where p.pid = w.pid and

exists (select 1

from companyLocation c

where w.cname = c.cname and c.city = ’Bloomington’)) and

exists (select

from Knows k

where p.pid = k.pid1 and

exists (select 1

from Person p2

where k.pid2 = p2.pid and

p2.city = ’Chicago’));

5

(b) Formulate this query in Pure SQL by only using the IN, NOT IN,

SOME, or ALL set membership predicates. You can not use the set

operations INTERSECT, UNION, and EXCEPT.

A possible solution is

select p.pid, p.pname

from Person p

where p.pid in (select w.pid

from worksFor w

where w.cname in (select c.cname

from companyLocation c

where c.city = ’Bloomington’)) and

p.pid in (select k.pid1

from Knows k

where k.pid2 in (select p2.pid

from Person p2

where p2.city = ’Chicago’));

Another possible solution using the SOME and IN predicates

select p.pid, p.pname

from Person p

where p.pid = some (select w.pid

from worksFor w

where w.cname = some (select c.cname

from companyLocation c

where c.city = ’Bloomington’)) and

p.pid in (select k.pid1

from Knows k

where k.pid2 in (select p2.pid

from Person p2

where p2.city = ’Chicago’));

(c) Formulate this query in Pure SQL without using set predicates. A

possible solution is

select p.pid, p.pname

from Person p, worksFor w, companyLocation c

where p.pid = w.pid and

w.cname = c.cname and

c.city = ’Bloomington’

intersect

select p1.pid, p1.pname

from Person p1, Knows k, Person p2

where k.pid1 = p1.pid and

k.pid2 = p2.pid and

p2.city = ’Chicago’;

6

We now turn to the problems for this section.

1. Consider the query “Find the pid and name of each person who

works for Google and who has a strictly higher salary than some

other person who he or she knows and who also works for Google.”

(a) • Formulate this query in Pure SQL by only using the EXISTS

or NOT EXISTS set predicates.

(b) • Formulate this query in Pure SQL by only using the IN,

NOT IN, SOME, or ALL set membership predicates.

(c) • Formulate this query in Pure SQL without using set predicates.

2. Consider the query “Find the cname of each company with headquarter in Cupertino, but that is not located in Indianapolis, along

with the pid, name, and salary of each person who works for that

company and who has the next-to-lowest salary at that company.”

2

(a) • Formulate this query in Pure SQL by only using the EXISTS

or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.

(b) • Formulate this query in Pure SQL by only using the IN,

NOT IN, SOME, or ALL set membership predicates. You can

not use the set operations INTERSECT, UNION, and EXCEPT.

(c) • Formulate this query in Pure SQL without using set predicates.

3. Consider the query “Find each (c, p) pair where c is the cname

of a company and p is the pid of a person who works for that

company and who is known by all other persons who work for

that company.

(a) • Formulate this query in Pure SQL by only using the EXISTS

or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.

2By definition, a salary s is next-to-lowest if there exists salary s1 with s1 < s, but

there do not exist two salaries s1 and s2 with s2 < s1 < s.

7

(b) • Formulate this query in Pure SQL by only using the IN,

NOT IN, SOME, or ALL set membership predicates. You can

not use the set operations INTERSECT, UNION, and EXCEPT.

(c) • Formulate this query in Pure SQL without using set predicates.

4. Consider the query “Find each skill that is not a jobskill of any

person who works for Yahoo or for Netflix.”

(a) • Formulate this query in Pure SQL using subqueries and set

predicates. You can not use the set operations INTERSECT,

UNION, and EXCEPT.

(b) • Formulate this query in Pure SQL without using predicates.

5. Consider the query “Find the pid and name of each person who

manages all-but-one person who works for Google.

(a) • Formulate this query in Pure SQL using subqueries and set

predicates. You can not use the set operations INTERSECT,

UNION, and EXCEPT.

(b) • Formulate this query in Pure SQL without using set predicates.

8

2 Formulating queries in Relational Algebra and

RA SQL

Reconsider the queries in Section 1. The goal of the problems in this

section is to formulate these queries in Relational Algebra in standard

notation and in RA SQL.

There are some further restrictions:

• You can only use WHERE clauses that use conditions involving constants. For example conditions of the form “t.A θ ’a’” are allowed,

but conditions of the form ’t.A θ s.B’ are not allowed. The latter

conditions can be absorbed in JOIN operations in the FROM clause.

• You can not use commas in any FROM clause. Rather, you should

use JOIN operations.

You can use the following letters, or indexed letters, to denote relation names in RA expressions:

P, P1, P2, · · · Person

C, C1, C2, · · · Company

S, S1, S2, · · · Skill

W, W1, W2, · · · worksFor

cL, cL1, cL2, · · · companyLocation

pS, pS1, pS2, · · · personSkill

M, M1, M2, · · · hasManager

K, K1, K2, · · · Knows

To illustrate what you need to do reconsider the query in Example 1

in Section 1.

Example 2 Consider the query “ Find the pid and name of each person who (a) works for a company located in Bloomington and (b) knows

a person who lives in Chicago.”

(a) Formulate this query in Relational Algebra in standard notation.

A possible solution is

πpid,pname(P erson ./ worksF or ./ πcname(σcity=Bloomington(companyLocation)))∩

πP erson1.pid,P erson1.pname(P erson1 ./P erson1.pid=pid1 Knows ./pid2=P erson2.pid πP erson2.pid(σcity=Chicago(P erson2)))

9

If we use the letters in the above box, this expression becomes more

succinct:

πpid,pname(P ./ W ./ πcname(σcity=Bloomington(cL)))∩

πP1.pid,P1.pname(P1 ./P1.pid=pid1 K ./pid2=P2.pid πP2.pid(σcity=Chicago(P2)))

You are also allowed to introduce letters that denote expressions.

For example, let E and F denote the expression

πpid,pname(P ./ W ./ πcname(σcity=Bloomington(cL)))

and

πP1.pid,P1.pname(P1 ./P1.pid=pid1 K ./pid2=P2.pid πP2.pid(σcity=Chicago(P2))),

respectively. Then we can write the solution as follows:

E ∩ F.

(b) Formulate this query in RA SQL.

A possible solution is

select pid, pname

from Person

NATURAL JOIN worksFor

NATURAL JOIN (select cname

from companyLocation

where city = ’Bloomington’) C

INTERSECT

select P1.pid, P1.pname

from Person P1

JOIN Knows ON (P1.pid = pid1)

JOIN (select pid

from Person

where city = ’Chicago’) P2 ON (pid2 = P2.pid)

order by 1,2;

Observe that the WHERE clauses only use conditions involving constants.

10

We now turn to the problems in this section.

6. Reconsider Problem 1. Find the pid and name of each person

who works for Google and who has a strictly higher salary than

some other person who he or she knows and who also works for

Google.

(a) • Formulate this query in Relational Algebra in standard

notation.

(b) • Formulate this query in RA SQL.

7. Reconsider Problem 2. Find the cname of each company with

headquarter in Bloomington, but not located in Indianapolis,

along with the pid, name, and salary of each person who works

for that company and who has the next-to-lowest salary at that

company.

(a) • Formulate this query in Relational Algebra in standard

notation.

(b) • Formulate this query in RA SQL.

8. Reconsider Problem 3. Find each (c, p) pair where c is the cname

of a company and p is the pid of a person who works for that

company and who is known by all other persons who work for

that company.

(a) • Formulate this query in Relational Algebra in standard

notation.

(b) • Formulate this query in RA SQL.

9. Reconsider Problem 4. Find each skill that is not a jobskill of any

person who works for Yahoo or for Netflix.

(a) • Formulate this query in Relational Algebra in standard

notation.

(b) • Formulate this query in RA SQL.

10. Reconsider Problem 5. Find the pid and name of each person

who manages all-but-one person who works for Google.

11

(a) • Formulate this query in Relational Algebra in standard

notation.

(b) • Formulate this query in RA SQL.

12

3 Formulating constraints using Relational Algebra

Recall that it is possible to express constraints in TRC and as boolean

SQL queries. It is also possible to write constraints using RA expressions. Let E, F, and G denote RA expressions. Then we can write RA

expression comparisons that express constraints:

E 6= ∅ which is true if E evaluates to an non-empty relation

E = ∅ which is true if E evaluates to the empty relation

F ⊆ G which is true if F evaluates to a relation

that is a subset of the relation obtained from G

F 6⊆ G which is true if F evaluates to a relation

that is not a subset of the relation obtained from G

Here are some examples of writing constraints in this manner.

Example 3 “ Some person works for Google.” This constraint can be

written as follows:

πpid(σcname=Google(worksF or)) 6= ∅.

Indeed, the RA expression

πpid(σcname=Google(worksF or))

computes the set of all person pids who work for Google. If this set is

not empty then there are indeed persons who work for Google.

Incidentally, the constraint “ No one works for Google” can be written as follows:

πpid(σcname=Google(worksF or)) = ∅.

Example 4 Each person has at least two skills. This constraint can

be written as follows:

πpid(P) ⊆ πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)).

Indeed,

πpid(P)

computes the set of all person pids and

πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2))

13

computes the set of all pids of persons who have at least two skills.

When the first set is contained in the second we must have that each

person has at least two skills.

Incidentally, the constraint “ Some person has fewer than two skills”

can be written as follows:

πpid(P) 6⊆ πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)).

We now turn to the problems in this section.

Formulate each of the following constraints using RA expressions

as illustrated in Example 3 and Example 4.

11. • Some person has a salary that is strictly lower than that of each

of the persons who he or she manages.

12. • No person knows all persons who work for Google.

13. • Each person knows all of his or her managers.

14. • Each employee and his or her managers work for the same company.

15. • The attribute pid is a primary key of the Person relation.

14

4 Formulating queries in SQL using views

Formulate the following views and queries in SQL. You are allowed to

combine the features of both Pure SQL and RA SQL.

16. • Create a view Triangle that contains each triple of pids of

different persons (p1, p2, p3) that mutually know each other.

Then test your view.

17. • Define a parameterized view SalaryBelow(cname text, salary

integer) that returns, for a given company identified by cname

and a given salary value, the subrelation of Person of persons

who work for company cname and whose salary is strictly below

salary.

Test your view for the parameter values ( ’IBM’,60000), (’IBM’,50000),

and (’Apple’,65000).

18. • Define a parameterized view KnowsPersonAtCompany(p integer,

c text) that returns for a person with pid p the subrelation of

Person of persons who p knows and who work for the company

with cname c.

Test your view for the parameters (1001, ‘Amazon’), (1001,‘Apple’), and (1015,‘Netflix’).

19. • Define a parameterized view KnownByPersonAtCompany(p integer,

c text) that returns the subrelation of Person of persons who

know the person with pid p and who work for the company with

cname c.

Test your view for the parameters (1001, ‘Amazon’), (1001,‘Apple’), and (1015,‘Netflix’).

20. • Let T ree(parent : integer, child : integer) be a rooted parentchild tree. So a pair (n, m) in T ree indicates that node n is a

parent of node m. The sameGeneration(n1, n2) binary relation

is inductively defined using the following two rules:

• If n is a node in T, then the pair (n, n) is in the sameGeneration

relation. (Base rule)

15

• If n1 is the parent of m1 in T ree and n2 is the parent of m2

in T ree and (n1, n2) is a pair in the sameGeneration relation then (m1, m2) is a pair in the sameGeneration relation.

(Inductive Rule)

Write a recursive view for the sameGeneration relation.

Test your view.

16

## Reviews

There are no reviews yet.