Demystifying Graphs

Graph is one of the most significant data structures in Computer Science to help us visualize, represent and solve a lot of real world problems. Yet I find this to be one of the most dreaded topics in Computer Science. I’ve come across a lot of students and even on-the-job engineers take a stab at unraveling the mysteries of a graph data structure as part of their interview preparations (Yes! Graph is a very popular interview topic, at times, by teams who don’t even use them..but let’s keep that story for another post), but fail to analyze whether a given problem can be efficiently solved using a graph data structure or not. I feel one of the reasons we, as engineers, are afraid of this topic is because of our lack of familiarity with this data structure. We might be aware of the latest and greatest graph based algorithms. However unless we can map a real world problem to a graph data structure, all that algorithm knowledge is useless. Although the world of graphs is a sea in itself, let’s limit ourselves to its basics – specifically how to visualize and represent them in our code. More importantly, towards the end, we’ll see how to map a real world problem to a graph, because once that mapping is successfully done, it is usually much easier to tackle the problem.

Definition of a Graph

To put it in simple words, a graph is just a collection of nodes(also called vertices) and edges(connections between those nodes). If I may use Facebook as an example of a social graph, your Facebook profile can be represented as one node. Similarly all of your friend’s profiles are represented as individual nodes. This means that if you have 100 friends in your Facebook friend’s list, then there are 101 nodes(1 node corresponding to your profile + 100 friends’ profiles) in Facebook’s social graph. Now how does Facebook represent that these 100 profiles(nodes) are connected to you(as friends)? That’s where edges come into picture. Facebook will connect your graph node with each of your 100 friends’ node with a single edge(or a line). So they will need 100 edges to represent the friendships between you and your 100 friends. Fig.1 below shows an example representation of a subset of the Facebook’s giant graph database. Note that nodes are numbered 0(yourself), 1 to 100(friends). While the nodes are generally represented as circles, edges are represented as a line or an arrow(as we’ll see very soon).

Fig. 1: A simple example of Facebook’s social graph

Mathematically, a graph is represented as G=(V,E) meaning a graph(G) is a collection of a set of vertices(V) and a set of edges(E). I hope this mathematical representation now makes sense to you.

Types of Graphs

While there are different ways of classifying a graph, we’ll look at two of the most common ways of classifying them.

Directed vs Undirected graph

As the name suggests, a directed graph is one in which all the edges(between the vertices) have an associated direction. The edge in a directed graph is usually represented by an arrow. Eg: A –> B is a directed graph with two vertices(A and B) connected by an edge pointing from vertex A to vertex B. The direction of the edge denotes that the graph can be traversed only from vertex A to vertex B and not in the reverse direction. A practical example for a directed graph can be two houses(A and B) connected via a one-way street.

An undirected graph, on the other hand, is one where the edges don’t have any specific direction(or in other words the edges are bidirectional). The edge in an undirected graph is usually represented by a straight line. If you notice the graph in Fig. 1, you’ll see that it is infact an undirected graph. Eg: A — B is an undirected graph with two vertices(A and B) connected by a bidirectional edge. This simply means that the graph can be traversed both from vertex A to B and from vertex B to A as per our requirement. A practical example for an undirected graph can be two houses(A and B) connected by a two-way street.

Cyclic vs Acyclic graph

A cyclic graph is one which contains atleast one cycle(or a loop). A cycle in a graph is defined as a condition where starting from a vertex, it is possible to traverse the graph(using its edges) and reach the same starting vertex. Fig. 2 shows a cyclic graph. Note that this example graph also happens to be a directed graph(denoted by edges represented as arrows). As a result, you can call it a directed cyclic graph. Similarly, by now, you should be able to draw an undirected cyclic graph as well.

Fig 2: A directed cyclic graph

An acyclic graph, as the name suggests, is one which does not contain any cycle. Note that Fig. 3 is very similar to Fig.2 except that one of its directed edges(between vertex 2 and 4) has been removed, thereby making the graph a directed acyclic graph now.

Fig. 3 : A directed acyclic graph

Here’s a fun fact: If you notice Fig. 3 carefully, does the structure of the graph look very similar to another data structure you might be familiar with?

If your guess was a tree data structure, then you’re absolutely right. But here’s a caveat! Even though Fig.3 looks very similar to a tree, strictly speaking, it is not a tree data structure. That’s because a tree data structure has two important characteristics:

  • The nodes in a tree always have an inherent direction associated to them, namely parentNode –> childNode.
  • A child node must have a single parent node, whereas a parent node can have multiple children.

With this in mind, let’s go back to Fig. 3 for a bit and re-analyse it to see why it’s not a tree. If we consider node 0 as the root of the tree, then based on direction of the arrow, node 0 is a parent of node 1, node 1 is a parent of node 2 and so on. Did you see the problem? Node 1 has two parents(node 0 and node 5).

So, if we reverse the direction of the arrow between node 1 and 5, does the graph become a tree? Not really! If you observe, with the arrow direction reversed, node 5 will then have two parents(node 1 and 4).

Alright, if we now switch the direction of arrow between node 5 and 4 as well, does the new graph(with two arrows reversed) now become a tree? Sure it does! Here is how it will look like.

Fig. 4: A directed acyclic graph (which is also a tree)

The fun fact was just to emphasize on the fact that a tree falls in the category of a directed acyclic graph. Hence all trees are directed acyclic graphs but not vice versa.

How can we represent Graphs?

In the previous section, we noticed how we can represent graphs on paper using circles, lines and arrows. But how can we represent them in our code? Before we jump into the possible representations, let’s spend a minute on the three most important considerations while choosing the appropriate graph representation to use while writing algorithms(on them):

  1. How much memory is needed by the graph representation?
  2. How long does it take to find out whether an edge is present in a graph?
  3. How long does it take to find out the neighbours of a given vertex in a graph?

If you’re familiar with algorithmic complexities, you’ll notice that the 1st criteria above deals with space complexity whereas the 2nd and 3rd ones deal with time complexity. As a result, in a real world application, the algorithm you want to write on your graph data structure will be one of deciding factors in your choice of graph representation.

Alright, now it’s time to look at various ways to programmatically represent a graph data structure. I’ll start off by defining three methods of representation and then introduce a sample graph(in Fig. 5) and go over its three possible representations.

  • Edge list: As the name suggests, an edge list is a list(or array, depending on your programming language of choice) which contains all the edges contained in the graph. The entries in an edge-list are unordered i.e. they are not in any particular order. Note that each entry is an array of two vertices(containing that edge).
  • Adjacency matrix: An adjacency matrix is a two dimensional V*V matrix where V=number of vertices in the graph. The value in each cell in the matrix denotes whether a graph edge exists(1) or not(0). The matrix serves as a look-up table – the value at the intersection of the X’th row and Y’th column(of the matrix) denotes the presence(value=1) or absence(value=0) of an edge between vertex X and vertex Y.
  • Adjacency list: An adjacency list is a combination of the previous two representations. It is an array of linked list where the index to the array is the graph vertex number(0,1,2,…etc.) and the linked list contains all the vertices which are adjacent(neighbours) to that indexed vertex.

Alright..I know without an example these definitions will not make too much sense. So, let’s take a look at a sample undirected cyclic graph in Fig. 5.

Representations of a sample graph

Fig. 5 : A sample undirected cyclic graph

The figure above shows a cyclic graph containing 10 vertices(numbered 0 to 9) and 15 undirected edges.

Edge list representation

Since there are 15 edges in the graph, an edge-list representation should contain an array with 15 entries where each entry denotes an edge. And how do we denote an edge? Simple…we just list down the two vertex numbers which contain that edge. So, here’s one of the possible edge list representations of the graph in Fig. 5.

// Note that the first entry [0,1] represents an edge between 
// vertex 0 and vertex 1. Since the graph is undirected, we 
// could as well have written that edge as [1,0]. Hence I said
// that this is one of the many possible edge-list representations.
[ [0,1], [0,6], [0,8], [1,4], [1,6], [1,9], [2,4], [2,6], [3,4], [3,5],
[3,8], [4,5], [4,9], [7,8], [7,9] ]

Adjacency matrix representation

Since there are 10 vertices, this representation uses a 10*10 matrix with each cell’s value being either a zero(0) or one(1) denoting the presence or absence of an edge respectively. The adjacency matrix(G) representation of the graph in Fig. 5 looks like this:

G = [ [0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
      [1, 0, 0, 0, 1, 0, 1, 0, 0, 1],
      [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
      [0, 0, 0, 0, 1, 1, 0, 0, 1, 0],
      [0, 1, 1, 1, 0, 1, 0, 0, 0, 1],
      [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
      [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
      [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
      [0, 1, 0, 0, 1, 0, 0, 1, 0, 0] ]

If you notice, the edge between vertex 2 and 4, for example, is denoted by specifying a value of 1(i.e. edge present) for matrix cell G[2][4].

Another interesting point to note is that this matrix is symmetrical along its main diagonal(running from top-left to bottom-right). This means that if there is an undirected edge from vertex x to y , then (due to the bidirectional nature of an undirected edge) an implicit edge also exists from vertex y to x. So, for undirected graphs, if G[x][y] = 1 then G[y][x] will also be equal to 1.

If you give it a little thought, you’ll soon conclude that an adjacency matrix representation of a directed graph will not be symmetrical along its main diagonal.

Adjacency list representation

Since there are 10 vertices in the graph, the array will contain 10 rows. The value at each row will be a linked-list. The adjacency list representation of the graph in Fig. 5 looks like this:

// Note that the arrows here denote pointers to the next node 
// in the linked-list.
[ [1 --> 6 --> 8 --> null],
  [0 --> 4 --> 6 --> 9 --> null],
  [4 --> 6 --> null],
  [4 --> 5 --> 8 --> null],
  [1 --> 2 --> 3 --> 5 --> 9 --> null],
  [3 --> 4 --> null],
  [0 --> 1 --> 2 --> null],
  [8 --> 9 --> null],
  [0 --> 3 --> 7 --> null],
  [1 --> 4 --> 7 --> null] ]

If you notice the graph in Fig. 5, you’ll see that the vertex 0 is connected to vertices 1, 6 and 8. Hence the first linked-list in the adjacency list array denotes all those neighbouring vertices(1 –> 6 –> 8) for vertex 0. Let me also add that 1–> 6 –> 8–> null is just one possible representation. Another valid one could be 6–>1–>8–>null.

Hope these representations for an undirected graph make a lot more sense after working through this example. With this background, it should be easy to write down all the three representations for a sample directed graph. I’ll leave that as an exercise for you.

Let’s talk about complexities…

Now that we know about various possible representations of a graph data structure, how do we find out which one we should use to represent a given problem? Is there one which trumps all the other representations? Just like every similar question in Computer Science, the answer is..It really depends! The choice of representation will depend on the constraints of the system we are trying to design. Some systems might be low in memory but high on CPU or vice versa. In some cases, the graph of the input data might be really dense(lots of edges between the vertices). Whereas in other scenarios, we might be working with a sparse graph(with very limited edges between the vertices). As a result, in order for our graph algorithms to be most efficient, our choice of graph representation will vary for each use case that we’re trying to solve.

To give you a sense of how various scenarios might affect our design decisions, let me go back and analyze the three questions I had mentioned earlier in this post.

Let’s start with the first question:

How much memory is needed by each graph representation?

This question brings us to evaluate the space complexity of each graph representation.

I’ll limit myself to a directed graph with V vertices and E edges to calculate the space and time complexities below.

Space complexity

Space complexity for each graph representation is a measure of the amount of space needed to store the graph in memory.

Edge list: Since this is a list of all the E edges, the amount of space needed to store the list would be proportional to the number of edges i.e. Θ(E). Edge list is good for sparse graphs(with few edges) but might not be such a great choice for representing dense graphs.

Adjacency matrix: It is a V*V matrix containing 0’s and 1’s and so its size depends on the number of vertices and not in the number of edges, unlike an edge-list. It has a large overhead for sparse graphs since most of the entries in the matrix would contain a value of 0(i.e. absent edge). The space complexity for this representation is thus Θ(V^2).

Adjacency list: This is a list with V rows, where entries in each row correspond to the neighbours of that vertex. In other words, each entry in a row represents an edge. Upon a closer look, it will become clear that all entries in all the rows is a collection of all graph edges. As a result, the space complexity of an adjacency list for our graph is Θ(V+E).

Summarizing our analysis of space complexities in the form of a table, here is how it looks:

SPACE COMPLEXITYΘ(E)Θ(V+E)Θ(V^2)
Edge listYESNONO
Adjacency matrixNONOYES
Adjacency listNOYESNO

Time complexity

The next two questions deal with time complexities of algorithms working over a graph with V vertices and E directed edges. Let’s look at the second question.

How long does it take to find out whether an edge is present in a graph?

Basically the question is asking us to search for an edge in a given graph. I’ll analyze each of the representations one by one.

Edge list: Since the size of the list is E, in order to search for an edge(which might be a possible entry in the list), we will have to iterate over the entire list since the list is unordered. Hence the worst case time complexity of the edge search algorithm is O(E).

Adjacency matrix: It is a V*V matrix containing 0’s and 1’s acting as a look up table to search for an edge in the graph. Great!!! That’s exactly what we want as part of this question. Searching for an edge between vertex X and vertex Y is simply indexing into the matrix array G at G[X][Y] to check whether its a 0(edge absent) or 1(edge present). Hence the time complexity of search if O(1) or a simple look-up.

Adjacency list: Searching for a directed edge(from say vertex 0 to vertex 1) in an adjacency list would involve indexing into the array at index 0 to obtain all the neighbours of vertex 0. Next up, we’ll have to search among the nodes in the linked list at index 0 to see if vertex 1 exists in it. Note that the worst case complexity of this inner search is O(number-of-neighbours-of-vertex) or O(d) where d is defined as the degree(or number of neighbours) of a vertex. Can you figure out the worst case value of d in terms of known variables V and E? The worst case value of d would be when a vertex is connected to all other vertices(excluding itself) i.e. V-1.

Summarizing our analysis of time complexities for edge search in the form of a table, here is how it looks:

Edge Search: TIME COMPLEXITYO(1)O(d)O(E)
Edge listNONOYES
Adjacency matrixYESNONO
Adjacency listNOYESNO

Alright..let’s head over to the third question and give it a try.

How long does it take to find out the neighbours of a given vertex in a graph?

Edge list: To find out the neighbours of a given vertex(say X), we’ll have to iterate over every element in the list and check if the element(edge) is connected to vertex X or not. Essentially the worst case time complexity for getting all neighbours would be O(E).

Adjacency matrix: Getting all neighbours of vertex X would mean indexing into the matrix row corresponding to vertex X. This is an O(1) operation. Once indexed, we will have to iterate over all the column entries(corresponding to that row) to see which of those are 1. The entries which are 1 will be the neighbours of the vertex X. This second iteration step is O(V) since there are V columns. Overall the time complexity for getting all neighbours is O(V).

Adjacency list: Getting a list of all neighbours in an adjacency list would involve indexing into the array corresponding to vertex X in O(1) time. Next up, to list down all the neighbours, we simply iterate over the linked-list stored at the row we just indexed. The complexity of iteration is O(d) and since worst case value for d is (V-1), the overall worst case comlexity for getting all neighbours would be O(V-1) or asymptotically it would be O(V).

Summarizing our analysis of time complexities(in a table format) for getting all neighbours of a given vertex, here is how it looks:

Get all neighbours of a vertex: TIME COMPLEXITYO(E)O(V)
Edge listYESNO
Adjacency matrixNOYES
Adjacency listNOYES

Conclusion

I hope, with the help of this post, I was able to provide you with a decent understanding of a graph data structure, its possible representations and the thought process to keep in mind in order to decide the best representation given the constraints of your system. I don’t consider this post as complete yet since it is important to look at a practical example and try to convert it into a graph problem. But since this post is long already, I will come up with a follow up post(with the leftover in part 2) very soon.

Until then..have fun exploring the world of graphs!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Design a site like this with WordPress.com
Get started