Part 8 - Graphs | rezarezvan.com

DAT038_8

In this part we’ll cover so-called graphs and their applications.

Introduction to Graphs

A graph is a set of vertices connected pairwise by edges. These edges can be either undirected or directed.

A node has a so-called degree, which is the number of edges attached to it. In the directed case we split this into ‘out degree’ (the number of outgoing edges) and ‘in degree’ (the number of incoming edges).
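To make the in/out degree idea concrete, here is a minimal Python sketch. It assumes the directed graph is given as a plain list of `(from, to)` pairs; the function name `degrees` is just an illustrative choice.

```python
from collections import Counter

def degrees(edges):
    """Return (out_degree, in_degree) counters for directed (from, to) pairs."""
    out_degree = Counter()
    in_degree = Counter()
    for src, dst in edges:
        out_degree[src] += 1   # the edge leaves src
        in_degree[dst] += 1    # the edge enters dst
    return out_degree, in_degree

edges = [("a", "b"), ("a", "c"), ("b", "c")]
out_degree, in_degree = degrees(edges)
print(out_degree["a"], in_degree["c"])  # prints: 2 2
```

A `Counter` returns 0 for vertices it has never seen, so a vertex without incoming edges simply has in degree 0.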

Another basic term that we’ll encounter is a cycle. A cycle is just what it sounds like: you start in a certain node and, after a walk along the edges, you end up in the same node again.

Also, we can see an undirected graph as a directed graph that simply has edges in both directions. We’ll mainly look at the use cases for directed graphs.

Pseudo Code for Graphs

There are a lot of different ADTs/APIs for graphs; we’ll be using this one:

class Graph<Vertices>:
    // Adds an edge to the graph
    add_edge(e: Edge<Vertices>)

    // Removes an edge from the graph
    remove_edge(e: Edge<Vertices>)

    // Returns true if the edge is present in the graph, otherwise false
    contains_edge(e: Edge<Vertices>) -> boolean

    // Returns all the edges going out from the given vertex
    outgoing_edges(from: Vertices) -> Collection<Edge<Vertices>>

    // Returns the number of vertices present in the graph
    n_vertices() -> Int

    // Returns the number of edges present in the graph
    n_edges() -> Int

Now this depends on the Edge class which will be implemented as:

class Edge<Vertices>:
    from   : Vertices
    to     : Vertices
    weight : float = 1.0
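As a rough Python counterpart of the Edge class, one option is a frozen dataclass. This is only a sketch: the field names `src` and `dst` are my own stand-ins, since `from` is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import Hashable

@dataclass(frozen=True)
class Edge:
    src: Hashable          # the `from` vertex
    dst: Hashable          # the `to` vertex
    weight: float = 1.0    # default weight, as in the pseudocode

e = Edge("a", "b")
print(e.weight)                    # prints: 1.0
print(e == Edge("a", "b", 1.0))    # prints: True
```

Making the dataclass frozen also makes edges hashable, so they can be stored in sets and used as dictionary keys.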

Graph Representation

We can represent a graph as:

A set of edges
An adjacency list
An adjacency matrix

Representation: Set of edges

We can use a set data structure for this. If the graph is undirected we only need to keep one direction of each pair. This is simple, but not very efficient: finding all neighbors of a vertex $v$ means scanning every edge, so the complexity is $\mathcal{O}(E)$, where $E$ is the total number of edges in the graph.

We can look at the edges as tuples, where $(v_1, v_2)$ means that $v_1$ has a connecting edge to $v_2$. This is directed, but as we said, if we have an undirected graph, then it is enough to store just one of the two directions.
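A small Python sketch of the set-of-edges idea, assuming an undirected graph where each edge is stored once as a tuple. The helper name `neighbors` is hypothetical.

```python
def neighbors(edges, v):
    """All vertices adjacent to v in an undirected graph stored as a set of pairs."""
    result = set()
    for a, b in edges:      # we have to scan every edge: O(E)
        if a == v:
            result.add(b)
        elif b == v:
            result.add(a)
    return result

edges = {(1, 2), (2, 3), (1, 3), (3, 4)}
print(neighbors(edges, 4))  # prints: {3}
```

Because each pair is stored in only one direction, the lookup has to check both positions of every tuple.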

Representation: Adjacency list

We maintain a map from vertices to collections of edges. Alternatively, we could map each vertex to a collection of its neighboring vertices.

If the graph is undirected, then we need to include both directions.

The complexity of iterating over all the vertices adjacent to a vertex $v$ is $\mathcal{O}(degree(v))$. So it’s good; note that this assumes the lookup for the key is $\mathcal{O}(1)$.
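Here is a minimal Python sketch of an adjacency list, assuming we map each vertex to a list of neighboring vertices (the simpler of the two variants mentioned above). The function name and the `directed` flag are illustrative assumptions.

```python
from collections import defaultdict

def build_adjacency_list(edges, directed=True):
    """Map every vertex to the list of vertices it has an edge to."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
        if not directed:
            adjacency[dst].append(src)   # undirected: store both directions
    return adjacency

adjacency = build_adjacency_list([(1, 2), (1, 3), (2, 3)], directed=False)
print(adjacency[1])  # prints: [2, 3]
```

Iterating over `adjacency[v]` touches only the neighbors of `v`, which is where the $\mathcal{O}(degree(v))$ bound comes from.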

Representation: Adjacency matrix

We maintain a 2D $V \times V$ boolean array, where true at position $(i, j)$ represents an edge from vertex $i$ to vertex $j$.

The complexity of finding the neighbors of a vertex is quite bad: we need to loop through a whole row, so it becomes $\mathcal{O}(V)$ no matter how few neighbors the vertex actually has.
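The adjacency matrix can be sketched in Python like this, assuming the vertices are numbered `0` to `n - 1`. The helper names are my own.

```python
def build_adjacency_matrix(n, edges):
    """n x n boolean matrix; matrix[i][j] is True if there is an edge i -> j."""
    matrix = [[False] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = True
    return matrix

def matrix_neighbors(matrix, v):
    """Scan the whole row of v: O(V) even if v has only a few neighbors."""
    return [w for w, connected in enumerate(matrix[v]) if connected]

matrix = build_adjacency_matrix(4, [(0, 1), (0, 3), (2, 0)])
print(matrix_neighbors(matrix, 0))  # prints: [1, 3]
```

On the plus side, checking whether one specific edge exists is a single array lookup, `matrix[i][j]`.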

Representation

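In most real-world cases we use adjacency lists, since most real-world graphs are sparse (they have far fewer edges than the $V^2$ that are possible).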

Implementation

So let’s try to implement a directed graph using an adjacency list!

class Graph<V>:
    all_edges    : Map<V, Collection<Edge<V>>>
    n_edges      : int
    all_vertices : Set<V>

    // Returns the edges leaving `from` (an empty collection if there are none)
    outgoing_edges(from: V) -> Collection<Edge<V>>:
        if from not in all_edges:
            return empty collection
        return all_edges.get(from)

    contains_edge(e: Edge<V>) -> boolean:
        return e in outgoing_edges(e.from)

    add_edge(e: Edge<V>):
        // Create the collection for e.from the first time we see it
        if e.from not in all_edges:
            all_edges.put(e.from, new empty collection)
        outgoing = all_edges.get(e.from)

        if e not in outgoing:
            outgoing.add(e)
            n_edges += 1
            all_vertices.add(e.from)
            all_vertices.add(e.to)

    remove_edge(e: Edge<V>):
        outgoing = outgoing_edges(e.from)
        if e in outgoing:
            outgoing.remove(e)
            n_edges -= 1
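For reference, the same adjacency-list graph could look roughly like this in runnable Python. This is a sketch under my own assumptions: edges are stored in sets, the `Edge` fields are called `src`/`dst` (since `from` is reserved in Python), and the private attribute names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Hashable

@dataclass(frozen=True)
class Edge:
    src: Hashable
    dst: Hashable
    weight: float = 1.0

class Graph:
    def __init__(self):
        self._edges = defaultdict(set)   # vertex -> set of outgoing edges
        self._vertices = set()
        self._n_edges = 0

    def outgoing_edges(self, src):
        # .get avoids creating an empty entry for unknown vertices
        return self._edges.get(src, set())

    def contains_edge(self, e):
        return e in self.outgoing_edges(e.src)

    def add_edge(self, e):
        if e not in self._edges[e.src]:
            self._edges[e.src].add(e)
            self._n_edges += 1
            self._vertices.update((e.src, e.dst))

    def remove_edge(self, e):
        if self.contains_edge(e):
            self._edges[e.src].remove(e)
            self._n_edges -= 1

    def n_vertices(self):
        return len(self._vertices)

    def n_edges(self):
        return self._n_edges

g = Graph()
g.add_edge(Edge("a", "b"))
g.add_edge(Edge("a", "b"))   # duplicate, ignored
g.add_edge(Edge("b", "c"))
print(g.n_vertices(), g.n_edges())  # prints: 3 2
```

Like the pseudocode, removing an edge leaves its endpoints in the vertex set; whether that is desirable depends on the application.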

Searching in Graphs

In many real-world applications, graphs will represent some sort of maze, or other structure, where we want to find a path, sometimes the path with the least resistance/weight/length.

There are two primary search algorithms for graphs: Depth-First Search, or DFS for short, and Breadth-First Search, or BFS for short.

Let’s begin with DFS.

Depth-first search

Depth-first search is a way to traverse a graph systematically.

A short pseudocode of a DFS would be:

DFS(to visit a vertex v):
    mark v as visited
    recursively visit all unmarked vertices w adjacent to v

We’ve encountered DFS for trees. The difference here, as we can see, is that we ‘mark’ each vertex we visit, because a graph might contain cycles and we don’t want to get stuck in them.

Typical applications of DFS are finding all vertices connected to a source vertex, or simply finding a path between two vertices.

Implementation

To implement this we need data structures! To store all marked vertices, we will use a Set. To keep track of the path we’ve taken, we’ll use a map, cameFrom, which maps each vertex to the vertex we reached it from.

recursiveDFS(v : V):
    add v to visited
    for edge in outgoing_edges(v):
        w = endpoint of edge
        // Check the endpoint w, not v (v is always visited at this point)
        if w not in visited:
            cameFrom[w] = v
            recursiveDFS(w)
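A runnable Python sketch of the recursive DFS, including how the came-from map can be used to rebuild a path. It assumes the graph is a plain dict from each vertex to a list of neighbors; `path_to` is a hypothetical helper that is not part of the pseudocode above.

```python
def dfs(graph, v, visited, came_from):
    """Recursive DFS over a dict that maps each vertex to a list of neighbors."""
    visited.add(v)
    for w in graph.get(v, []):
        if w not in visited:
            came_from[w] = v
            dfs(graph, w, visited, came_from)

def path_to(came_from, start, goal):
    """Follow the came_from links backwards; None if goal was never reached."""
    if goal != start and goal not in came_from:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(came_from[path[-1]])
    path.reverse()
    return path

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
visited, came_from = set(), {}
dfs(graph, "a", visited, came_from)
print(path_to(came_from, "a", "d"))  # prints: ['a', 'b', 'd']
```

Note that the graph contains the cycle a, b, d, a, and the visited set is what stops the recursion from looping forever. The path found is a valid path, but DFS gives no guarantee that it is the shortest one.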

Breadth-First Search

For our DFS we used an underlying stack for our recursive calls: the call stack.

BFS uses a queue instead of stack, so we have to implement this ourselves.

So a breakdown of what we have to do:

BFS(from a starting vertex s):
    put s into FIFO queue, mark as visited
    while queue not empty:
        dequeue a vertex v
        for each unmarked vertex w adjacent to v:
            enqueue w, mark it as visited

So really, the only difference for BFS in trees compared to graphs is that we have to keep track of the visited vertices.

Implementation

So a pseudocode implementation would look something like:

iterativeBFS(start: V):
    visited = new Set<V>
    agenda = new Queue<V>
    agenda.enqueue(start)

    while agenda is not empty:
        v = agenda.dequeue()
        if v not in visited:
            add v to visited
            for edge in outgoing_edges(v):
                w = endpoint of edge
                if w not in visited:
                    add w to agenda

If we want to retrace our steps, we would need a cameFrom Map as in the DFS.

Also, one thing to note: if we replace the queue with a stack, we’ll just end up with an iterative DFS instead!
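To illustrate the queue-to-stack swap, here is a minimal Python sketch of an iterative DFS. It assumes the graph is a dict of neighbor lists and uses a plain Python list as the stack; the function name is my own.

```python
def iterative_dfs(graph, start):
    """Same loop as the iterative BFS, but the agenda is a stack (LIFO)."""
    order = []          # vertices in the order we visit them
    seen = set()
    agenda = [start]    # a Python list used as a stack
    while agenda:
        v = agenda.pop()            # take the most recently added vertex
        if v not in seen:
            seen.add(v)
            order.append(v)
            for w in graph.get(v, []):
                if w not in seen:
                    agenda.append(w)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(iterative_dfs(graph, "a"))  # prints: ['a', 'c', 'd', 'b']
```

The visit order differs from the recursive version because the stack pops the last neighbor first, but it is still a depth-first traversal.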

BFS properties

In a BFS, what do we actually get? We get a path, yes - but it’s always a shortest path (counted in number of edges) from s to every vertex reachable from s! It actually does this in $\mathcal{O}(E + V)$ time.

Implementation: Calculating the distance

In this implementation we’ll add a map that holds the distance from the start vertex to every vertex we reach. (I also added the cameFrom map that I discussed earlier.)

iterativeBFS(start: V):
    visited = new Set<V>
    agenda = new Queue<V>
    cameFrom = new Map<V,V>
    distTo = new empty Map<V, Int>

    agenda.enqueue(start)
    add start to visited
    distTo[start] = 0

    while agenda is not empty:
        v = agenda.dequeue()
        for edge in outgoing_edges(v):
            w = endpoint of edge
            if w not in visited:
                // Mark w when it is enqueued, so a later and longer
                // route can't overwrite cameFrom[w] and distTo[w]
                add w to visited
                add w to agenda
                cameFrom[w] = v
                distTo[w] = 1 + distTo[v]
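A runnable Python sketch of BFS with distances, assuming the graph is a dict of neighbor lists. Here the `dist_to` map doubles as the visited set, which is a small simplification of my own.

```python
from collections import deque

def bfs_distances(graph, start):
    """Return (dist_to, came_from) for every vertex reachable from start."""
    dist_to = {start: 0}
    came_from = {}
    agenda = deque([start])
    while agenda:
        v = agenda.popleft()
        for w in graph.get(v, []):
            if w not in dist_to:              # first time we reach w
                dist_to[w] = dist_to[v] + 1   # so this is the shortest distance
                came_from[w] = v
                agenda.append(w)
    return dist_to, came_from

graph = {"s": ["a", "b"], "a": ["b", "c"], "b": ["c"], "c": []}
dist_to, came_from = bfs_distances(graph, "s")
print(dist_to)  # prints: {'s': 0, 'a': 1, 'b': 1, 'c': 2}
```

Vertex `b` is reachable both directly from `s` and via `a`. Because a distance is recorded only the first time a vertex is reached, `b` keeps the shorter distance 1.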

Example Problems

So now that we’ve seen DFS, BFS and understood graphs - let’s see what we can do with them.

A very important kind of graph is a DAG, or directed acyclic graph: a directed graph that doesn’t contain any cycles.

This is especially useful for scheduling, for example. But we need to know if a graph contains a cycle - how can we detect a cycle?

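Cycle detection is a very common problem, and there are multiple ways of detecting a cycle in a graph. For example, we can use a DFS: if we encounter a node that is still on the path we’re currently exploring, we know that we have a cycle. In a directed graph, merely reaching an already visited node isn’t enough, since two different paths can lead to the same node without forming a cycle.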
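A minimal Python sketch of DFS-based cycle detection in a directed graph. It assumes a dict of neighbor lists, and it tracks the vertices on the current recursion path in a separate set called `on_path` (my own name) in addition to the visited set.

```python
def has_cycle(graph):
    """True if the directed graph (dict of neighbor lists) contains a cycle."""
    visited = set()   # vertices whose DFS has started
    on_path = set()   # vertices on the current recursion path

    def visit(v):
        visited.add(v)
        on_path.add(v)
        for w in graph.get(v, []):
            if w in on_path:                  # edge back into the current path
                return True
            if w not in visited and visit(w):
                return True
        on_path.remove(v)                     # done with v, leave the path
        return False

    return any(v not in visited and visit(v) for v in list(graph))

diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
loop = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(has_cycle(diamond), has_cycle(loop))  # prints: False True
```

In the diamond graph, `d` is reached twice (via `b` and via `c`), but it is never on the current path when that happens, so no cycle is reported.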

Also, a good note here: if we ever want the preorder, postorder or reverse postorder of the vertices, we only modify our DFS by one line each - we change where the enqueue/push goes.

DFS(v : V):
    visited.add(v)
    preorder.enqueue(v)
    for edge in outgoing_edges(v):
        w = endpoint of edge
        if w not in visited:
            DFS(w)
    postorder.enqueue(v)
    reversePostorder.push(v)

Note that the reversePostorder is a stack, therefore we will get the reverse order.
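One place where the reverse postorder is useful is scheduling on a DAG: it gives an order where every vertex comes before the vertices it points to (a topological order). Here is a Python sketch, assuming the input is acyclic and given as a dict of neighbor lists. The function name is my own.

```python
def topological_order(graph):
    """Reverse DFS postorder of a DAG given as a dict of neighbor lists."""
    visited = set()
    postorder = []

    def dfs(v):
        visited.add(v)
        for w in graph.get(v, []):
            if w not in visited:
                dfs(w)
        postorder.append(v)     # v is finished after everything reachable from it

    for v in graph:             # start a DFS from every unvisited vertex
        if v not in visited:
            dfs(v)
    return postorder[::-1]      # reversing plays the role of the stack

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_order(graph))  # prints: ['a', 'c', 'b', 'd']
```

If the graph contains a cycle, the function still returns a list, but no valid topological order exists, so the input should be checked for cycles first.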

There are also many problems about finding ‘connected components’, and which of DFS and BFS we can use depends on the problem.
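As one example of such a problem, here is a Python sketch that labels the connected components of an undirected graph. It assumes a dict of neighbor lists with both directions stored, and it uses an iterative DFS, although a BFS would work just as well here.

```python
def connected_components(graph):
    """Return (count, component_of) for an undirected graph (dict of neighbor lists)."""
    component_of = {}
    count = 0
    for start in graph:
        if start in component_of:
            continue                      # already labeled by an earlier search
        component_of[start] = count
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph.get(v, []):
                if w not in component_of:
                    component_of[w] = count
                    stack.append(w)
        count += 1
    return count, component_of

graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
count, component_of = connected_components(graph)
print(count)  # prints: 3
```

Two vertices are in the same component exactly when they get the same label in `component_of`.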

Conclusion

So that’s it for graphs - in the next part we’ll cover minimum spanning trees and shortest path algorithms.