| home pub


-- Thoughts on data analysis, software development and innovation management. Comments are welcome

Post 83

Down to the nitty-gritty of graph adjacency implementation


The core implementation of the adjacency scheme in a graphical abstract data type is a rich source of discussion. Common knowledge, i.e., Stackoverflow knowledge, indicates that a matrix is usually good if the graph in question is dense (squared space complexity is worth it at the advantage of random-access time complexity). Instead, if the graph is sparse, a linked list is more suitable (linear space complexity at the expense of linear-access time complexity). But is this actually a fair tradeoff?

Part of these arguments is due to considering all of its elements internally or managing some of them externally. Matrix-based graphs are only fast to access if it is assumed that the storage positions for the nodes are already known. Since computers don't know (nor assume) things per se, they generally have to be instructed step by step. Thus, the matrix-based approach (i.e., the fastest to access) needs some sort of lookup table to keep track of the node identifiers and their associated storage positions (be it maintained internally or externally). Bearing in mind that storing and searching this dictionary introduces some complexity offsets, its impact must be taken into account in any scenario. Moreover, the list-based graph also needs this lookup table to be able to reference all its nodes unless it is asserted that the graph is always connected, so that every node is accessible from a single root node (which not very realistic, especially for directed graphs). Therefore, the choice of implementation does not simply seem to come down to a tradeoff between graph time and space complexities, but to a somewhat more blurry scenario involving the complexities of the auxiliary data structs. Let's dig into this.

The goal of conducting this asymptotic analysis of graph adjacency implementation is to validate the following hypothesis:

Matrix-based graphs are more effective than list-based graphs for densely connected networks, whereas list-based graphs are more effective than matrix-based graphs for sparsely connected networks. This is what is usually taken for granted, but it only works if the complexities introduced by the auxiliary (and necessary) data structs of the implementation are of a lesser order of magnitude than the default graph complexities.

In order to carry out some experimentation, time and space complexities are taken for proxy effectiveness rates. A matrix-based and a list-based approaches are evaluated on some graph properties and functionalities (storage, node adjacency and node neighborhood functions) and on two topologies with different levels of connectivity. The sparse scenario is defined to be a tree and the dense scenario is defined to be a fully meshed network. Considering that V is the set of nodes (aka vertexes) and E is the set of edges, the aforementioned adjacencies are related as follows:

Network topology Number of edges |E|
Sparse graph (tree) |V|-1
Dense graph (full mesh) |V| (|V|-1)/2

Now the typical complexities of the two implementations of graph adjacency are shown as follows:

Complexity Matrix Linked-list
Storage O(|V|^2) O(|V|+|E|)
Node adjacency O(1) O(deg|V|)
Node neighborhood O(|V|) O(deg|V|)

Just for the record: without loss of generality, node degrees for a sparse graph are similar to 1, whereas node degrees for a dense graph are similar to the number of nodes |V|.

And the complexities of some implementations tackling the dictionary problem:

Dictionary implementation Space complexity Time complexity
Hash table O(|V|) O(1)
Double-list (unordered keys) O(|V|) O(|V|)
Double-list (ordered keys) O(|V|) O(log|V|)

It is to note that the dictionary implementation of use to relate node identifiers with storage positions introduces a diverse scenario of complexities. The implementation based on a hash table is the least intrusive with the graph complexities as the storage complexity is equivalent (or of lesser order) and it shows a random-access time complexity. Instead, the remaining two approaches do introduce some equivalent changes worsening the performance of the graph functions (the linear search for the double-list with unordered keys and the binary search for the ordered keys).

Common approaches to deal with the graph adjacency scheme are based on a matrix and on a linked list. The former is usually suitable for densely connected graphs while the latter is more effective for sparse graphs. However, both these implementations require an additional dictionary to associate node identifiers with their storage positions, and the time and space complexity of this auxiliary structure must be taken into account in the whole graphical abstract data type. From here on, the dictionary shaped as a hash table stands as the only approach that allows appreciating the differences between the matrix-based and the liked list-based approaches because it operates unobtrusively with the fundamental adjacency structures. The other two dictionary approaches based on linked lists (with ordered or unordered keys) introduce a complexity burden that hinders the advantages of the adjacency implementations wrt the level of connectivity of the graph in question.

Anyhow, in order to prevent premature optimization, TADTs implements a graphical abstract data type with a dictionary based on a linked list. This is good enough for most real-world scenarios.

All contents © Alexandre Trilla 2008-2017