Homework 4




1 (50 points)
Suggested modules: requests, lxml or re
In this part of the assignment, your job is to create a script that will navigate to the FSU CS
department website and print out a directory containing the telephone number, office location,
email, and webpage of every faculty member in the department. The root page for the faculty
listings is
From this page, you can find links to every individual faculty page. Each individual faculty page
lists the links that you need. However, the only link that you may hardcode into your file is the one
above. You also may not hardcode faculty names. All other information from this point must be
retrieved using crawling and scraping methods. Here is some guidance:
1. First, try to gather all of the links to the individual faculty pages.
2. Navigate to the first page and scrape the data to be output to the user.
3. Repeat step 2 until all faculty information has been output to the user.
An example of what your output should look like is shown below. Missing information should be
indicated with an “N/A” entry. Watch for edge cases and inconsistencies; you may need to be a
little creative. Double-check your output against the web pages!
$ python
Name: Sudhir Aggarwal
Office: 263 Love Building
Telephone: (850) 644 0164
E-Mail: sudhir [ at cs dot fsu dot edu ]
Name: Theodore Baker
Office: N/A
Telephone: N/A
E-Mail: baker [ at cs dot fsu dot edu ]
Name: Mike Burmester
Office: 268 Love Building
Telephone: (850) 644-6410
E-Mail: burmeste [ at cs dot fsu dot edu ]
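
The guidance above could be sketched roughly as follows, using only re and the standard library. This is a minimal sketch, not a reference solution: the function names (gather_links, scrape_field, format_entry) are hypothetical, and both regular expressions assume a markup layout (links containing "/faculty/", and label/value pairs in adjacent table cells) that has not been checked against the real pages and will likely need adjusting.

```python
import re
from urllib.parse import urljoin

# Assumption: each faculty link is an <a> tag whose href contains "/faculty/".
# The real listing page may be structured differently.
LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]*/faculty/[^"]+)"', re.IGNORECASE)


def gather_links(listing_html, root_url):
    """Return absolute, de-duplicated faculty page URLs in page order."""
    seen = []
    for href in LINK_RE.findall(listing_html):
        url = urljoin(root_url, href)  # resolve relative links against the root
        if url not in seen:
            seen.append(url)
    return seen


def scrape_field(page_html, label):
    """Return the value next to 'label', or 'N/A' if it is missing or empty."""
    # Assumed markup: <td>Label:</td><td>value</td>
    pattern = (r'<td[^>]*>\s*' + re.escape(label) +
               r':?\s*</td>\s*<td[^>]*>(.*?)</td>')
    match = re.search(pattern, page_html, re.IGNORECASE | re.DOTALL)
    if not match:
        return "N/A"
    text = re.sub(r'<[^>]+>', '', match.group(1))  # strip any nested tags
    text = ' '.join(text.split())                  # collapse whitespace
    return text or "N/A"


def format_entry(name, page_html):
    """Build one directory entry in the layout of the sample output."""
    lines = ["Name: " + name]
    for label in ("Office", "Telephone", "E-Mail"):
        lines.append(label + ": " + scrape_field(page_html, label))
    return "\n".join(lines)
```

In a full script, the HTML strings would come from something like requests.get(url).text, starting from the single hardcoded root link; the faculty name would also have to be scraped from each page rather than passed in by hand.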

2 (50 points)
Suggested modules: requests, json
Your job in this part of the assignment is to sort through Imgur user comment data. Your
program should begin by prompting the user to enter a username of an Imgur account.
An Imgur user page can be found at<username>, but the comments on
this page are loaded dynamically, which can make it very hard to pull data. The dynamically
loaded content, however, is systematic and easy to retrieve. All of the user’s comments can be
found at<username>/index/newest/page/<num>/hit.json?scrolling
where <num> is a counter that starts at 0 and increases as needed. When there are no more
comments to receive, the next page in the sequence will simply contain an empty string. For
example, navigate to the page
Now, increase the page counter in the address by 1. If we set the page counter to 100, for
instance, we’ll see that the page is empty. However, there’s no way to know beforehand how
many pages each user requires.
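
Since the number of pages is not known in advance, one way to handle this is to keep requesting pages until an empty one comes back. The sketch below illustrates that loop under stated assumptions: fetch_all_pages and fetch_page are hypothetical names, and fetch_page stands in for whatever function you write to download page <num> and return its body as a string (for example, a thin wrapper around requests.get(...).text).

```python
import json


def fetch_all_pages(fetch_page):
    """Collect decoded JSON pages, starting at 0, until a page is empty.

    fetch_page is a hypothetical callable supplied by the caller: it takes
    a page number and returns the raw response body as a string.
    """
    pages = []
    num = 0
    while True:
        body = fetch_page(num)
        if not body.strip():        # an empty string marks the end
            break
        pages.append(json.loads(body))
        num += 1
    return pages
```

Passing the download step in as a function keeps the loop easy to try out with canned strings before pointing it at the live site. The layout of each decoded page is not assumed here; inspect a real response to see where the comments live.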
If the username does not exist or has not been used to post any comments, you may simply
print a message and end the program. Otherwise, you should sort through the user’s comment
data to find the top 5 comments (i.e. the comments with the most points). Your output should
list these comments in descending order of points, detailing the post identifier (the “hash”), title
of the post commented on, number of points received, and timestamp of the comment. In the
case of a tie between points, break the tie by comparing “hash” values.
$ python
Enter username: LastAtlas
1. XJ7xbSk
Points: 19
Title: This man pulled, pushed and lifted his disabled twin brother
through an IronMan. Here is a touching picture of them at the finish
Date: 2014-08-24 15:36:54
2. wEF0R
Points: 11
Title: What a guy
Date: 2015-08-02 04:00:30
3. ZsqSJ
Points: 7
Title: MRW I find out I am unknowingly sharing a BF
Date: 2015-01-25 15:43:58
4. HfrBZSJ
Points: 5
Title: This is just magnificent
Date: 2014-02-17 03:42:57
5. NLiuVXC
Points: 5
Title: My wife…
Date: 2014-02-17 09:58:07
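
The ranking step can be sketched with a single sort on a compound key. This is only an illustration: apart from “hash”, the dictionary keys used here ("points", "title", "datetime") are assumed field names, and the ascending tie-break on the hash is inferred from the sample output above, where HfrBZSJ is listed before NLiuVXC. The helper names top_comments and format_ranking are hypothetical.

```python
def top_comments(comments, limit=5):
    """Return the highest-scoring comments, best first.

    Sorts by points in descending order, then by hash in ascending
    order to break ties (direction inferred from the sample output).
    """
    ranked = sorted(comments, key=lambda c: (-c["points"], c["hash"]))
    return ranked[:limit]


def format_ranking(comments):
    """Render ranked comments in the layout of the sample output."""
    lines = []
    for rank, c in enumerate(comments, start=1):
        lines.append(f"{rank}. {c['hash']}")
        lines.append(f"Points: {c['points']}")
        lines.append(f"Title: {c['title']}")
        lines.append(f"Date: {c['datetime']}")  # assumed field name
    return "\n".join(lines)
```

Negating the points inside the key lets one sorted call handle both criteria, so no second pass is needed for ties.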

