5/5 - (3 votes)

CSC 370

This assignment will be accepted electronically, and will be marked using the same schema as
the previous assignment. See the ‘Submission and Evaluation’ section below for details on the
submission process and expected formatting of your answers. For all of the questions below, your
answer must be one SQL query (including a terminating semicolon) which runs without errors
on the or PostgreSQL database servers. Note that
timeout errors (in which the server terminates your query for exceeding the maximum execution
time) are considered errors. Queries which have errors will receive a mark of zero. All queries
without errors will be marked out of two, with full marks given only to queries which produce the
correct output and contain no assumptions besides the data given in the question (see the advice
sections of assignment 2 for more details).
Some of the queries in this assignment involve floating point data. In the expected output shown
below, there might be slight differences in floating point values due to rounding and formatting
issues. You may assume that floating point values in your query results match the expected output
if they are equivalent to four decimal places (e.g. if the expected output is written as ‘0.1234’, the
values ‘0.123398’ and ‘0.1234567’ would both be considered correct, but the value ‘0.123’ would
Question 1: IMDB Queries [4 marks]
Create queries for each of the data retrieval problems below, using the imdb database. In the
questions below, any reference to ‘films’ refers to titles with title_type = ‘movie’.
(a) For each year between 2000 and 2017 (inclusive), list the primary name, production year,
rating and number of votes of the film or films which attained the highest rating among all
movies produced in that year which received at least 10000 votes. Both the rating and number
of votes are stored in the ratings table.
Expected Query Result
primary name year rating votes
Memento 2000 8.5000 942432
Gladiator 2000 8.5000 1095891
The Lord of the Rings: The Fellowship of the Ring 2001 8.8000 1370113
The Lord of the Rings: The Two Towers 2002 8.7000 1221886
The Lord of the Rings: The Return of the King 2003 8.9000 1349934
Anbe Sivam 2003 8.9000 10116
Black Friday 2004 8.6000 13513
Earthlings 2005 8.7000 14649
The Prestige 2006 8.5000 959259
The Lives of Others 2006 8.5000 290278
The Departed 2006 8.5000 975612
Like Stars on Earth 2007 8.5000 113421
The Dark Knight 2008 9.0000 1864795
Home 2009 8.6000 19621
Inception 2010 8.8000 1653611
The Intouchables 2011 8.6000 591006
The Dark Knight Rises 2012 8.4000 1269644
Django Unchained 2012 8.4000 1087769
CM101MMXI Fundamentals 2013 9.3000 38106
Interstellar 2014 8.6000 1120400
RangiTaranga 2015 8.7000 10286
The Mountain II 2016 9.6000 93071
Ayla: The Daughter of War 2017 9.1000 15807
(b) Select the primary name and episode count of each TV series (contained in the tv_series
table) for which at least 6000 episodes have been produced.
Expected Query Result
series name episode count
Days of Our Lives 10240
Ohayou Tokushima 9502
Coronation Street 9316
Neighbours 8911
Six O’Clock News 8646
The Price Is Right 8461
One O’Clock News 8100
Jeopardy! 7599
EastEnders 7393
The Bold and the Beautiful 7236
Home and Away 7081
Wheel of Fortune 6564
Gute Zeiten, schlechte Zeiten 6413
The Tonight Show Starring Johnny Carson 6037
The Young and the Restless 6024
Question 2: BC Ferries Queries [14 marks]
The queries you write below should work correctly on any of the BC Ferries databases (ferries_1month,
ferries_3months, ferries_6months, ferries_9months, ferries_1year or ferries_3years).
For comparison, sample output is shown for both ferries_1month and ferries_3years.
The following definitions apply to all of the parts below unless otherwise stated.
• The ‘duration’ of a sailing is defined to be the number of minutes between the scheduled
departure (not the actual departure) and the arrival of that sailing.
• Any reference to the ‘day of a sailing’ refers to the calendar day of the scheduled departure
(not the actual departure) of that sailing.
(a) For many routes, there are simultaneous sailings in both directions at the same time. For
example, on a typical day at 9:00am, a ferry leaves Tsawwassen for Swartz Bay and a different
ferry leaves Swartz Bay for Tsawwassen. Not all routes or sailings have this property. We will
define two vessels as ‘paired up’ if they have both served the same route number at the
same (scheduled) departure time/date. Construct a query to count the number of times each
distinct pair of ferries have been paired up. Your result must not contain counts for pairs
of vessels which have never been paired up. Each distinct pair of ferries should appear only
once in your result, and in the result rows, the vessel names of each pair must be ordered
alphabetically (so the alphabetically lowest vessel name will be listed first).
Expected Query Result (ferries 1month)
vessel1 vessel2 num pairings
Coastal Inspiration Queen of Alberni 158
Spirit of British Columbia Spirit of Vancouver Island 10
Expected Query Result (ferries 3years)
vessel1 vessel2 num pairings
Coastal Inspiration Queen of Alberni 5449
Spirit of British Columbia Spirit of Vancouver Island 3552
Queen of Cowichan Queen of Oak Bay 3048
Coastal Celebration Queen of New Westminster 2261
Coastal Renaissance Queen of Oak Bay 1825
Coastal Celebration Spirit of British Columbia 1653
Coastal Celebration Spirit of Vancouver Island 1247
Coastal Renaissance Spirit of Vancouver Island 847
Coastal Renaissance Queen of Alberni 814
Coastal Celebration Coastal Renaissance 622
Coastal Renaissance Queen of New Westminster 594
Coastal Inspiration Queen of New Westminster 592
Queen of Coquitlam Queen of Cowichan 548
Queen of Coquitlam Queen of Oak Bay 481
Queen of Alberni Queen of Coquitlam 400
Queen of Coquitlam Queen of New Westminster 152
Coastal Renaissance Spirit of British Columbia 53
Coastal Celebration Coastal Inspiration 24
Queen of New Westminster Spirit of Vancouver Island 17
Coastal Inspiration Spirit of British Columbia 8
Queen of Coquitlam Queen of Surrey 3
Coastal Renaissance Queen of Cowichan 1
(b) The routes table contains the ‘nominal duration’ of each route, which is the expected crossing
time. The nominal duration is determined by BC Ferries based on the average marine and
traffic conditions, along with information like loading times and the speed of the vessels. We
can test the accuracy of this calculation by computing the average time of each crossing.
Construct a query to find, for each route number, the nominal duration (in minutes) and the
average duration (in minutes) of a crossing based on all available data for that route. For this
question, assume that the ‘duration’ of a particular sailing is the time between its scheduled
departure and its arrival.
Expected Query Result (ferries 1month)
route number nominal duration avg duration
1 95 101.2424
3 40 44.9415
4 35 33.2479
8 20 20.5182
30 120 129.0709
Expected Query Result (ferries 3years)
route number nominal duration avg duration
1 95 92.8867
2 100 106.8488
3 40 45.1520
4 35 31.8939
8 20 21.9507
30 120 123.0126
(c) Suppose we define a sailing to be ‘late’ if the duration is at least five minutes longer1
than the
nominal duration in the routes table. Construct a query to find, for each month, the number
of days for which at least one sailing occurred on Route 1 but no late sailings occurred on
route number 1.
Expected Query Result (ferries 1month)
month count
4 1
Expected Query Result (ferries 3years)
month count
1 65
2 53
3 58
4 29
5 19
6 23
7 22
8 18
9 43
10 45
11 51
12 60
(d) Construct a query to find, for each vessel with any sailings, the total number of sailings it
has made, the number of late sailings it has made (which may be zero) and the fraction of
its sailings that were late (that is, the number of late sailings divided by the total number of
sailings). You should be careful with this query: a vessel may be involved in multiple routes,
each with a different nominal duration.
1. A duration which is exactly five minutes longer is still considered late.
Expected Query Result (ferries 1month)
vessel name total sailings late sailings late fraction
Coastal Inspiration 232 166 0.7155
Coastal Renaissance 39 14 0.3590
Queen of Alberni 163 43 0.2638
Queen of Capilano 440 30 0.0682
Queen of Coquitlam 68 16 0.2353
Queen of Surrey 274 100 0.3650
Skeena Queen 234 22 0.0940
Spirit of British Columbia 164 98 0.5976
Spirit of Vancouver Island 94 23 0.2447
Expected Query Result (ferries 3years)
vessel name total sailings late sailings late fraction
Bowen Queen 1268 83 0.0655
Coastal Celebration 6506 406 0.0624
Coastal Inspiration 6545 2502 0.3823
Coastal Renaissance 5637 1220 0.2164
Mayne Queen 1 0 0.0000
Queen of Alberni 6903 1164 0.1686
Queen of Capilano 14277 2806 0.1965
Queen of Coquitlam 6650 2230 0.3353
Queen of Cowichan 7071 2946 0.4166
Queen of Cumberland 974 113 0.1160
Queen of New Westminster 3966 439 0.1107
Queen of Oak Bay 6609 4592 0.6948
Queen of Surrey 12759 4555 0.3570
Quinitsa 8 8 1.0000
Skeena Queen 7312 378 0.0517
Spirit of British Columbia 5539 1013 0.1829
Spirit of Vancouver Island 5916 791 0.1337
(e) Usually, you would expect that when the beginning of a sailing is delayed (that is, when
actual_departure is much later than scheduled_departure), the arrival is also delayed
and the sailing is late. However, in some cases, a vessel that departs late may still arrive on
time. Define a ‘made up sailing’ to be any sailing which leaves at least 15 minutes after its
scheduled departure but arrives less than (or equal to) five minutes late. Write a query to list
the number of made up sailings in the dataset for each vessel. Only vessels with at least one
made up sailing should be listed.
Expected Query Result (ferries 1month)
vessel name made up sailings
Coastal Inspiration 4
Coastal Renaissance 2
Spirit of British Columbia 3
Expected Query Result (ferries 3years)
vessel name made up sailings
Coastal Celebration 109
Coastal Inspiration 70
Coastal Renaissance 53
Queen of Alberni 11
Queen of Coquitlam 7
Queen of Cowichan 1
Queen of New Westminster 28
Queen of Oak Bay 1
Skeena Queen 2
Spirit of British Columbia 57
Spirit of Vancouver Island 75
(f) Define a ‘good day’ for a particular route number to be a day where at least one sailing
occurred and no late sailings occurred. For each route, find the maximum number of good
days in a row over the entire dataset.
Note that the rather unimpressive results for the ferries_1month dataset compared to the
other datasets seem to be the result of COVID-19 related service reductions and extra precautions (which may be increasing the number of late sailings).
Hint: This is probably easiest when you use a large number of CTE expressions (that is,
subqueries in the WITH clause).
Expected Query Result (ferries 1month)
route number max consecutive good days
1 1
2 0
3 1
4 3
8 4
30 0
Expected Query Result (ferries 3years)
route number max consecutive good days
1 15
2 2
3 4
4 27
8 8
30 6
(g) For each route, output all date ranges where the maximum number of consecutive good days
(as defined in part (f) above) was achieved. For some routes, only one date range will meet this
criteria, but for others (e.g. route 2 in the ferries_3years dataset) there may be multiple
date ranges of the same size. The columns containing the start and end dates should contain
values of type DATE (created, for example, by the make_date function).
Expected Query Result (ferries 1month)
route number start date end date max consecutive good days
1 2020-04-26 2020-04-26 1
3 2020-04-15 2020-04-15 1
4 2020-04-19 2020-04-21 3
8 2020-04-29 2020-05-02 4
Expected Query Result (ferries 3years)
route number start date end date max consecutive good days
1 2017-12-21 2018-01-04 15
2 2017-05-20 2017-05-21 2
2 2017-09-26 2017-09-27 2
3 2018-02-24 2018-02-27 4
4 2018-09-02 2018-09-28 27
8 2019-03-06 2019-03-13 8
30 2017-09-23 2017-09-28 6
Question 3: VWSN Queries [10 marks]
Create queries for each of the data retrieval problems below, using the vwsn 1year database.
(a) Find the highest observed temperature in the dataset, along with the station number, station
name and observation time of all cases where that temperature was reported.
Note: The expected output below is correct (that is, the observation shown is actually in the
dataset, even though it appears to be unseasonably warm).
Expected Query Result
station id name temperature observation time
180 Captain Meares Elementary Secondary School 42.6000 2019-10-05 15:11:00
(b) For each station with station ID between 1 and 10 (inclusive), list the station ID, station name,
maximum temperature observed at that station and observation time of all observations in
the dataset in which the maximum temperature was attained at that station. Only include
stations which actually have recorded observations in the dataset.
Expected Query Result
station id name max temperature observation time
1 Ian Stewart Complex/Mt. Douglas High School 29.8000 2019-08-05 16:31:00
3 Strawberry Vale Elementary School 30.2000 2019-06-12 15:15:00
3 Strawberry Vale Elementary School 30.2000 2019-06-12 15:21:00
4 Oaklands Elementary School 30.3000 2019-06-12 14:46:00
4 Oaklands Elementary School 30.3000 2019-06-12 14:52:00
5 Cedar Hill Middle School 30.2000 2019-06-12 17:12:00
6 Marigold Elementary School/Spectrum High School 29.8000 2019-06-12 14:41:00
7 Campus View Elementary 29.6000 2019-06-12 17:22:00
8 Victoria High School 29.7000 2019-06-12 14:41:00
9 Frank Hobbs Elementary School 29.2000 2019-08-05 17:27:00
10 Macaulay Elementary School 27.3000 2019-06-12 12:26:00
(c) Find the IDs and names of all stations which have reported at least one observation at some
point, but which did not report any observations in January 2020.
Expected Query Result
station id name
13 Shoreline Middle School
16 Tillicum Elementary School
31 Sangster Elementary School
32 Colwood Elementary School
36 Hans Helgesen Elementary School
57 Wishart Elementary School
62 Deep Cove Elementary School
70 Parkland Secondary School
81 Braefoot Elementary School
94 Pender Islands Elementary and Secondary School
103 Frances Kelsey Secondary School
105 Port Renfrew Elementary School
108 Alberni Weather
109 Brentwood Elementary School
112 Saturna Elementary School
113 Mayne Island School
117 Saltspring Elementary and Middle Schools
124 Seaview Elementary School
131 Glenlyon Norfolk Junior School
133 Discovery Elementary School
136 Pleasant Valley Elementary School
160 Shawnigan Lake School
161 Bamfield Marine Sciences Centre
166 Maquinna Elementary School
180 Captain Meares Elementary Secondary School
195 North Island Distance Education School
196 Valley View Elementary School
226 Ditidaht Community School
(d) In this question (and the next question), define the ‘daily average temperature’ for a particular
day to be the average of all observations from all stations on that day. Furthermore, for each
month, define the ‘10 hottest days’ to be the top 10 days (by daily average temperature) and
define the ‘10 coolest days’ to be the bottom 10 days (by daily average temperature). For each
month/year pair in the dataset, compute the average daily average temperature across the ten
hottest days and the average daily average temperature across the ten coolest days. Notice
that you are computing an average of averages (first, find the average daily temperatures of
each of the ten hottest days, then average those temperatures to produce the result). Hint:
Use the rank() function with an appropriate over() clause (maybe in multiple places)
Expected Query Result
year month hottest10 average coolest10 average
2019 5 16.8834 12.0071
2019 6 17.8665 13.8339
2019 7 19.0044 15.9400
2019 8 19.7309 16.5326
2019 9 17.0929 12.3717
2019 10 10.8171 6.5475
2019 11 9.2537 3.6983
2019 12 7.3156 4.1378
2020 1 7.8368 0.9990
2020 2 6.3289 3.1054
2020 3 7.3842 3.7168
2020 4 11.5621 7.1426
(e) List the day, month and year (in separate columns) of all days whose daily average temperature (see previous question) was lower than the daily average temperature of each of the
previous 28 days in the dataset. The result should not list any of the first 28 days in the
dataset, since the metric cannot be computed for those days, but the data for those days
should be used to compute the rest of the result. Hint: You may want to use both the min()
and count() aggregation functions in combination with an over() clause that includes both
partitioning and windowing.
Expected Query Result
year month day
2019 8 21
2019 8 23
2019 9 12
2019 9 14
2019 9 15
2019 9 16
2019 9 17
2019 9 23
2019 9 27
2019 9 30
2019 10 1
2019 10 8
2019 10 9
2019 10 10
2019 10 28
2019 10 29
2019 10 30
2019 11 26
2019 11 28
2019 11 29
2020 1 9
2020 1 12
2020 1 13
2020 1 14
2020 3 14
Advice: Don’t Plagiarize
You are encouraged to discuss solution methods with your peers, and even to look up possible
solution ideas on the internet, but all of your submitted queries must be your own work. As a rule
of thumb, to ensure you do not accidentally plagiarize, do not look at anyone else’s queries (or
allow them to see yours). Additionally, if you use a version control system (such as Github) to store
your work, ensure that the repository is private. If your queries are posted in a public setting (even
inadvertantly), you may become entangled in any ensuing academic integrity investigation if your
work is copied by someone else.
Submission and Evaluation
This assignment will be marked through a combination of automated testing (that is, running your
submitted queries and examining the result) and human inspection. To expedite the marking process
(and ensure consistency between different student submissions), you are required to submit your
queries for each question inside of premade query template files. Three empty template files have
been posted to conneX: a4q1_queries.txt, a4q2_queries.txt and a4q3_queries.txt. Please
place your query for each subquestion in the indicated spaces in the templates before submitting
(we will be using an automated system to extract each query individually for marking, so failure to
comply with this requirement will likely result in you receiving a mark of zero for any queries which
do not meet the formatting requirements). Notice that the provided files have the .txt extension
instead of .sql (which would normally be used for SQL queries); this is due to a technical limitation
of conneX (which does not properly handle files with the .sql extension when they are submitted,
probably because it sees them as a security risk).
You are required to submit your answers to each question in three files called a4q1_queries.txt,
a4q2_queries.txt and a4q3_queries.txt, using the provided templates as a starting point. Your
answer for each query must consist of a single SQL statement (which may having a WITH clause
containing multiple subqueries, and/or several SELECT statements joined by set operators). As a
point of reference, your answer is a ‘single SQL statement’ if it contains only one semicolon (at the
end of the query). Although your files will be associated with your account, please ensure that each
submitted file contains a comment with your name and student number.
You are permitted to delete and resubmit your submission as many times as you want before the
due date, but no submissions or resubmissions will be accepted after the due date has passed. You
will receive a mark of zero if you have not officially submitted your assignment (and received a
confirmation email) before the due date.
Only the files that you submit through conneX will be marked. The best way to make sure your
submission is correct is to download it from conneX after submitting and test it. You are not
permitted to revise your submission after the due date, and late submissions will not be accepted,
so you should ensure that you have submitted the correct version of your code before the due date.
conneX will allow you to change your submission before the due date if you notice a mistake. After
submitting your assignment, conneX will automatically send you a confirmation email. If you do
not receive such an email, you did not submit the assignment. If you have problems with
the submission process, send an email to the instructor before the due date.

Scroll to Top