CS 202: DATA STRUCTURES

ASSIGNMENT 5

GRAPH THEORY

Ali and Mannan are procrastinating instead of working on their project.

Mannan has come across a play titled “Six Degrees of Separation” written by

American playwriter John Guare that says that every person in world is

connected to everyone else in the world by a chain of no more than six

connections. Mannan asserts that Ali definitely knows Donald Trump by way less

than 6 connections, whereas Ali says even if that is true Mannan will be related

to Donald Trump by lesser connections than him. Things escalate and now they

desperately want to prove their points. Hadi, who is sitting nearby interjects that

this problem can be easily solved using graph theory. Ali and Mannan quickly

bring their Googling skills in use and find a subset of dataset of connections

made public by a popular social media company. But, since they did not pay

attention during their Data Structures classes, they do not know how to make

useful deductions from the dataset. They need your help to settle the feud

between them.

Part 1: Parse the dataset:

You are provided with the mentioned dataset in the form a text file, ‘friends.txt’.

You can explore the dataset provided to you, lines following the first line contains

names of persons. You’ll use node class to represent a person. Lines following the

line Connections is the connection between persons. You will use edge class to

represent a connection between two people.

You have to parse the dataset in two forms:

1) If Boolean variable isUnitLenght is on, all edges must have weight equal to

1

2) If it is off, you must pick the value provided to you with each connection.

This value is called index of friendship. It is measured by how much two

people interact with each other. Lesser this value is for two people,

closer those people are to each other.

Part 2: Test Reachability:

Before finding the route between two people, you must check if these two people

are even connected to each other according to our dataset. You have studied two

Reachability algorithms namely BFS (breadth first search) and DFS (depth first

search). You can implement either of these, but you will have to provide reason

why you went with your chosen algorithm. Your algorithm will take name of first

person (i.e. your starting node) and toFind person (person you are testing

reachability for) as input and must return true if these two people are connected

and false otherwise.

Part 3: Shortest Path Algorithm:

You have studied Dijkstra’s well renowned algorithm for shortest path in class.

You will implement Dijkstra’s algorithm in this part to find shortest path between

two people. It will take name of starting person, last person as input and will

return a vector containing names of all people along the path. The additional

variable d is for path length.

Dijkstra Primer:

To find shortest path, one approach can be to find all the possible paths

and then look for shortest among them. But, this algorithm is not scalable and

have an exponential execution time. So, we move towards a smarter algorithm

called Dijkstra’s Algorithm named after its creator Edsgar Dijkstra.

The idea is to keep a queue of all paths constructed so far, prioritized in terms of

distance (A perfect use for the priority queue). At each stage, you dequeue the

shortest path; if it leads to the destination, your work is done. Otherwise, you use

that path as the basis for constructing new extended paths that add an edge

leading away from the node at the end of the path. You discard the extended

path if you already know a better way to get there (i.e. you have already found a

shorter path to that node), otherwise you enqueue the extended path to be

considered with the others. After updating the queue, you dequeue the next

shortest path and explore from there. At each step, the search expands outward

until you find a path that ends at the destination. The graphs in the data files are

fully connected, meaning, every pair of nodes is connected by some path, and

thus you will eventually find a path. The order of the paths considered by the

algorithm guarantees that you will find the shortest such path first.

Part 4: Minimum Spanning Tree:

MST finds the edges in graph with minimum cumulative weight that connects the

whole graph together. MSTs have a variety of applications such as minimizing

cost of some operation, finding clusters in graphs, finding bottlenecks in graphs

etc. You will use Kruskal’s algorithm to find minimum spanning tree in this part.

This function takes no input and returns a vector of nodes, where each node will

have only the edges corresponding to MST.

Kruskal Primer:

Joseph Kruskal devised one of the simplest MST algorithms in 1956. It is

based on the idea that you consider the edges in the graph in order of increasing

cost. If the nodes at the endpoints of the edge are unconnected, then you

include this edge as part of the spanning tree. If, however, some path already

connects the nodes, you can discard this edge.

Given below is a primer that will help you get started:

The tricky part of this algorithm is determining whether a given edge

should be included. The strategy you will use is based on tracking

connected sets.

Initially, each node is in its own set all by itself. Which means there will be

“n” sets comprising of individual nodes. Each set contains the nodes,

which are connected by some edge.

The algorithm then considers each edge, e, in turn, ordered by increasing

weight. (Use getMin() of the priority queue for this)

If an edge e connects two different sets, then e is added to the set of

edges of the minimum spanning tree, and the two sets of nodes that are

now connected by e are merged into a single set.

If, on the other hand, e connects two vertices/nodes that are already a

path between them using some earlier edge, then e is discarded. (Check

to see if the two end points of the edge e belong in the same set)

Once the algorithm has added enough edges to form a spanning tree

(meaning that only one set of nodes remain and that set contains all the

nodes), it terminates and outputs this tree as the minimum spanning tree.

Part 5: Bonus – Another MST Algorithm:

This part is a bonus. You will use another algorithm to construct an MST.

This algorithm is called Prims Algorithm.

Prims Primer:

Prim’s is a greedy algorithm which works by selecting the cheapest

adjacent vertex while looking at a single vertex. You are encouraged to use

binary heap for selecting the cheapest adjacent vertex. Here, the word

“cheapest” means the one with the smallest edge cost. Pick out a starting point

and keep on adding vertices until the heap is empty. You will need to initialize

keys of all vertices except the starting point to -INF, and update the key of a

given vertex when it is discovered by its neighboring vertex. Take a look at

https://en.wikipedia.org/wiki/Prim%27s_algorithm#Description for more

examples and description of the algorithm.

Part 6: Deductions:

Run all the above algorithms in context of following questions in main of

Graph.cpp and answer following questions. You may answer these questions in

form of comments at end of Graph.cpp file.

1) Check reachability for following nodes and deduce nature of the dataset.

i- Mannan, Ali

ii- Mannan, Trump

iii- Ali, Trump

2) For a graph with unit length weights, how many hops is Ali away from

Trump?

3) What about Mannan?

4) Who has smaller value of path in terms of index of friendship?

5) Run the MST on both unit weight graph and weighted graph. Could there

exist more than one MST for one of the graphs? What implications can you

draw from result of both runs? What benefit can a social media website

have from the MSTs you have produced? Can you think of other

applications of MST in terms of social connection graphs?

Some useful libraries:

Priority Queues in queue: https://www.geeksforgeeks.org/priority-queue-in-cppstl/

Sets: https://www.geeksforgeeks.org/insertion-deletion-stl-set-c/

Vectors: http://www.cplusplus.com/reference/vector/vector/

You are welcome to use templated version of heap you implemented in PA4 as a

replacement for priority queue and your own implementation/replacement of set.

Best of luck! �