Database-Management-Systems

Problem Set 1 [100 points]¶

Deliverables:

Submit your queries (and only those) using the submission_template.txt file that is posted on Canvas. Follow the instructions on the file! Upload the file at Canvas.

Instructions / Notes:

Run the top cell above to load the database hw1.db (make sure the database file, hw1.db, is in the same directory as this IPython notebook is running in)

Some of the problems involve changing this database (e.g. deleting rows)- you can always re-download hw1.db or make a copy if you want to start fresh!

You may create new IPython notebook cells to use for e.g. testing, debugging, exploring, etc.- this is encouraged in fact!- just make sure that your final answer for each question is in its own cell and clearly indicated

When you see In [*]: to the left of the cell you are executing, this means that the code / query is running.

If the cell is hanging- i.e. running for too long: To restart the SQL connection, you must restart the entire python kernel

To restart kernel using the menu bar: “Kernel >> Restart >> Clear all outputs & restart”), then re-execute the sql connection cell at top

You will also need to restart the connection if you want to load a different version of the database file

Remember:

%sql [SQL] is for single line SQL queries

%%sql [SQL] is for multi line SQL queries

Have fun!

Problem 1: Linear Algebra [30 points]

Two random 3×3 ( N=3

) matrices have been provided in tables A and B, having the following schema:

i INT: Row index

j INT: Column index

val INT: Cell value

Note: all of your answers below must work for any square matrix sizes, i.e. any value of N

.

Note how the matrices are represented – why do we choose this format? Run the following queries to see the matrices in a nice format:

%sql SELECT group_concat(val, ” , “) AS “A” FROM A GROUP BY i;

%sql SELECT group_concat(val, ” , “) AS “B” FROM B GROUP BY i;

Part (a): Matrix addition [5 points]

The sum of a matrix A

(having dimensions n×m

) and a matrix B

(having dimensions n×m

) is the matrix C

(of dimension n×m

) having cell at row i

and column j

equal to:

Cij=Aij+Bij

Write a single SQL query to get the sum of A

and B

(in the same format as A

and B

):

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT A.i as i, B.j as j, A.val+B.val as val

FROM A,B

WHERE A.i=B.i and A.j=B.j;

* sqlite:///hw1.db

Done.

i j val

0 0 16

0 1 11

0 2 18

1 0 17

1 1 13

1 2 16

2 0 3

2 1 1

2 2 12

Part (b): Dot product [5 points]

The dot product of two vectors

a=[a1a2…an]

and

b=[b1b2…bn]

is

a⋅b=∑ni=1aibi=a1b1+a2b2+⋯+anbn

Write a single SQL query to take the dot product of the second column of A

and the third column of B

.:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT sum(A.val*B.val) AS DotProduct

FROM A,B

WHERE A.j=1 AND B.j=2 AND A.i = B.i;

* sqlite:///hw1.db

Done.

DotProduct

113

Part (c): Matrix multiplication [10 points]

The product of a matrix A

(having dimensions n×m

) and a matrix B

(having dimensions m×p

) is the matrix C

(of dimension n×p

) having cell at row i

and column j

equal to:

Cij=∑mk=1AikBkj

In other words, to multiply two matrices, get each cell of the resulting matrix C

, Cij

, by taking the dot product of the i

th row of A

and the j

th column of B

.

Write a single SQL query to get the matrix product of A

and B

(in the same format as A

and B

):

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT A.i AS i, B.j AS j, sum(A.val*B.val) AS val

FROM A,B

WHERE A.j=B.i

GROUP BY A.i, B.j;

* sqlite:///hw1.db

Done.

i j val

0 0 106

0 1 80

0 2 171

1 0 146

1 1 109

1 2 212

2 0 23

2 1 17

2 2 55

Part (d): Matrix power [10 points]

The power An

of a matrix A

is defined as the matrix product of n

copies of A

.

Write a single SQL query that computes the third power of matrix A

, in other words, A3=A⋅A⋅A

:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

with A_squared as (

SELECT A1.i AS i, A2.j AS j, sum(A1.val*A2.val) AS val

FROM A as A1, A as A2

WHERE A1.j=A2.i

GROUP BY A1.i, A2.j

)

SELECT A_squared.i as i, A3.j as j, sum(A_squared.val * A3.val) as val

FROM A_squared, A as A3

WHERE A_squared.j=A3.i

GROUP BY A_squared.i, A3.j;

* sqlite:///hw1.db

Done.

i j val

0 0 1767

0 1 1065

0 2 2065

1 0 2396

1 1 1463

1 2 2745

2 0 350

2 1 190

2 2 467

Problem 2: The Sales Database [35 points]

We’ve prepared and loaded a dataset related to sales data from a company. The dataset has the following schema:

Holidays (WeekDate, IsHoliday)

Stores (Store, Type, Size)

TemporalData(Store, WeekDate, Temperature, FuelPrice, CPI, UnemploymentRate)

Sales (Store, Dept, WeekDate, WeeklySales)

Before you start writing queries on the database, find the schema and the constraints (keys, foreign keys).

Part (a): Sales during Holidays [10 points]

Using a single SQL query, find the store(s) with the largest overall sales during holiday weeks. Further requirements:

Use the WITH clause before the main body of the query to compute a subquery if necessary.

Return a relation with schema (Store, AllSales).

Write your query here:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

WITH Holiday_Sales as (

SELECT Stores.store as Store, sum(Sales.WeeklySales) as HolidaySales

FROM Stores, Sales, Holidays

WHERE Sales.Store = Stores.Store and Sales.WeekDate = Holidays.WeekDate and Holidays.IsHoliday = ‘TRUE’

GROUP BY Stores.Store

)

SELECT Store, HolidaySales as AllSales

FROM Holiday_Sales

ORDER BY HolidaySales DESC

LIMIT 1;

* sqlite:///hw1.db

Done.

Store AllSales

20 22490350.81000001

Part (b): When Holidays do not help Sales [15 points]

Using a single SQL query, compute the number of non-holiday weeks that had larger sales than the overall average sales during holiday weeks. Further requirements:

Use the WITH clause before the main body of the query to compute a subquery if necessary.

Return a relation with schema (NumNonHolidays).

Write your query here:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

WITH Non_Holiday_Sales as (

SELECT Sales.WeekDate as date, Sum(Sales.WeeklySales) as NonHolidaySales

FROM Sales, Holidays

WHERE Sales.WeekDate = Holidays.WeekDate and Holidays.IsHoliday = ‘FALSE’

GROUP BY Sales.WeekDate

), Holiday_Sales as (

SELECT Sales.WeekDate as date, Sum(Sales.WeeklySales) as HolidaySales

FROM Sales, Holidays

WHERE Sales.WeekDate = Holidays.WeekDate and Holidays.IsHoliday = ‘TRUE’

GROUP BY Sales.WeekDate

), Holiday_Avg as (

SELECT AVG(Holiday_Sales.HolidaySales) as Average_Sales

FROM Holiday_Sales

)

SELECT COUNT(*) as NumNonHolidays

FROM Non_Holiday_Sales, Holiday_Avg

WHERE Non_Holiday_Sales.NonHolidaySales > Holiday_Avg.Average_Sales;

* sqlite:///hw1.db

Done.

NumNonHolidays

8

Part (c): Total Summer Sales [10 points]

Using a single SQL query, compute the total sales during summer (months 6,7,and 8) for each type of store. Further requirements:

Return a relation with schema (type, TotalSales).

Hint: SQLite3 does not support native operations on the DATE datatype. To create a workaround, you can use the LIKE predicate and the string concatenation operator (||). You can also use the substring operator that SQLite3 supports (substr).

Write your query here:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT Stores.Type as type, SUM(Sales.WeeklySales) as TotalSales

FROM Stores, Sales

WHERE (Sales.WeekDate LIKE ‘%-06-%’ or Sales.WeekDate LIKE ‘%-07-%’ or Sales.WeekDate LIKE ‘%-08-%’) and Stores.Store = Sales.Store

GROUP BY Stores.Type;

* sqlite:///hw1.db

Done.

type TotalSales

A 1211554899.8500097

B 561610722.3999932

C 112555450.65999933

Problem 3: The Traveling SQL Server Salesman Problem [35 points]

SQL Server salespeople are lucky as far as traveling salespeople go- they only have to sell one or two big enterprise contracts, at one or two offices in Wisconsin, in order to make their monthly quota!

Answer the following questions using the table of streets connecting company office buildings.

Note that for convenience all streets are included twice, as A→B

and B→A

. This should make some parts of the problem easier, but remember to take it into account!

%sql SELECT * FROM streets LIMIT 4;

* sqlite:///hw1.db

Done.

id direction A B d

0 F UW-Madison DooHickey Collective 7

0 R DooHickey Collective UW-Madison 7

1 F DooHickey Collective Gizmo Corp 2

1 R Gizmo Corp DooHickey Collective 2

Part (a): One-hop, two-hop, three-hop… [15 points]

Our salesperson has stopped at UW-Madison, to steal some cool new RDBMS technology from CS564-ers, and now wants to go sell it to a company _within 9 miles of UW-Madison and passing through no more than 3 distinct streets. Write a single query, not using WITH (see later on), to find all such companies.

Your query should return the schema (company, distance) where distance is cumulative from UW-Madison.

Write your query here:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT Streets.A as company, Streets.d as distance

FROM Streets

WHERE Streets.B = ‘UW-Madison’ and Streets.d <= 9

UNION

SELECT Streets_1.A as company, (Streets_1.d + Streets_2.d) as distance

FROM Streets as Streets_1, Streets as Streets_2

WHERE Streets_2.B = ‘UW-Madison’

and (Streets_1.B = Streets_2.A)

and NOT(Streets_1.A = Streets_2.B)

and (Streets_1.d + Streets_2.d <= 9)

UNION

SELECT Streets_1.A as company, (Streets_1.d + Streets_2.d + Streets_3.d) as distance

FROM Streets as Streets_1, Streets as Streets_2, Streets as Streets_3

WHERE Streets_3.B = ‘UW-Madison’

and (Streets_1.B = Streets_2.A)

and (Streets_2.B = Streets_3.A)

and NOT(Streets_1.A = Streets_3.B)

and NOT(Streets_1.A = Streets_2.B)

and NOT(Streets_3.B = Streets_2.A)

and (Streets_1.d + Streets_2.d + Streets_3.d <= 9);

* sqlite:///hw1.db

Done.

company distance

DooHickey Collective 7

DooHickey Corp 9

Gadget Collective 9

Gadget Corp 6

Gizmo Corp 9

Part (b): A stop at the Farm [10 points]

Now, our salesperson is out in the field, and wants to see all routes- and their distances- which will take him/her from a company A

to a company B

, with the following constraints:

The route must pass through UW-Madison (in order to pick up new RDBMS tech to sell!)

A

and B

must each individually be within 2 hops of UW-Madison

A

and B

must be different companies

The total distance must be <=15

Do not use WITH

If you return a path A→B

, do not include B→A

in your answer!

In order to make your answer a bit cleaner, you may split into two queries, one of which creates a VIEW. A view is a virtual table based on the output set of a SQL query. A view can be used just like a normal table- the only difference under the hood is that the DBMS re-evaluates the query used to generate it each time a view is queried by a user (thus the data is always up-to date!)

Here’s a simple example of a view:

%%sql

DROP VIEW IF EXISTS short_streets;

CREATE VIEW short_streets AS

SELECT A, B, d FROM streets WHERE d < 3;

SELECT * FROM short_streets LIMIT 3;

* sqlite:///hw1.db

Done.

Done.

Done.

A B d

DooHickey Collective Gizmo Corp 2

Gizmo Corp DooHickey Collective 2

Gizmo Corp Widget Industries 1

Write your query or queries here:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S2.d) as distance

FROM Streets as S1, Streets as S2

Where NOT(S1.A = S2.B)

and (S1.B = ‘UW-Madison’)

and (S2.A = ‘UW-Madison’)

and (S1.d + S2.d <= 15)

UNION

SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S1_Hop.d + S2_Hop.d + S2.d) as distance

FROM Streets as S1, Streets as S2, Streets as S1_Hop, Streets as S2_Hop

Where NOT(S1.A = S2.B)

and (S1.B = S1_Hop.A)

and (S1_Hop.B = ‘UW-Madison’)

and (S2_Hop.A = ‘UW-Madison’)

and (S2_Hop.B = S2.A)

and (S1.d + S1_Hop.d + S2_Hop.d + S2.d <= 15)

Union

SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S1_Hop.d + S2.d) as distance

FROM Streets as S1, Streets as S2, Streets as S1_Hop

Where NOT(S1.A = S2.B)

and (S1.B = S1_Hop.A)

and (S1_Hop.B = ‘UW-Madison’)

and (S2.A = ‘UW-Madison’)

and (S1.d + S1_Hop.d + S2.d <= 15)

UNION

SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S2_Hop.d + S2.d) as distance

FROM Streets as S1, Streets as S2, Streets as S2_Hop

Where NOT(S1.A = S2.B)

and (S1.B = ‘UW-Madison’)

and (S2_Hop.A = ‘UW-Madison’)

and (S2_Hop.B = S2.A)

and (S1.d + S2_Hop.d + S2.d <= 15);

* sqlite:///hw1.db

Done.

company_1 company_2 distance

Gadget Corp DooHickey Collective 13

Gadget Corp DooHickey Corp 15

Gadget Corp Gadget Collective 15

Gizmo Corp Gadget Corp 15

Part (c): Finding Triangles [10 points]

Finally, our salesperson wants to find a route that goes from company A

to company B

to company C

and then back to company A

with the following constraints:

A

, B

, C

must be different companies

Do not use WITH

Output each such route that you find once (use the id’s as a way to break ties)

Output the distance of the route

Write your query here:

“””

Expected output below- don’t re-evaluate this cell!

NOTE: A valid answer must work for ALL inputs of the given type,

not just this example. I.e. do not hardcode around this answer / etc!

“””

%%sql

SELECT S1.A as A, S2.A as B, S3.A as C, (S1.d + S2.d + S3.d) as distance

FROM Streets as S1, Streets as S2, Streets as S3

WHERE (S1.B = S2.A)

and (S2.B = S3.A)

and (S3.B = S1.A)

and (S1.id < S2.id)

and (s2.id < S3.id);

* sqlite:///hw1.db

Done.

A B C distance

Thing Industries Widget Collective GadgetCo 18