Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 83
Down to the nitty-gritty of graph adjacency implementation
02-Sep-2013
The core implementation of the adjacency scheme in a graphical abstract
data type is a rich source of discussion. Common knowledge, i.e.,
Stackoverflow knowledge,
indicates that a matrix is usually good if the graph in question is
dense (squared space complexity is worth it at the advantage of
random-access time complexity). Instead, if the graph is sparse,
a linked list is more suitable (linear space complexity at the expense
of linear-access time complexity). But is this actually a fair tradeoff?
Part of these arguments is due to
considering all of its elements internally or managing some of them externally.
Matrix-based graphs are only fast to access if it is assumed that
the storage positions for the nodes are already known.
Since computers don't know (nor assume) things per se, they generally have
to be instructed step by step. Thus, the matrix-based approach (i.e., the
fastest to access) needs some sort of lookup table to
keep track of the node identifiers and their associated storage positions
(be it maintained internally or externally). Bearing in mind that
storing and searching this dictionary introduces some complexity offsets,
its impact must be taken into account in any scenario.
Moreover, the list-based graph also needs
this lookup table to be able to reference all its nodes unless it
is asserted that the graph is always connected, so that every node is
accessible from a single root node (which not very realistic, especially
for directed graphs). Therefore, the choice of implementation does not
simply seem to come down to a tradeoff between graph time and space
complexities, but to a somewhat more blurry scenario involving the
complexities of the auxiliary data structs. Let's dig into this.
The goal of conducting this asymptotic analysis of graph adjacency
implementation is to validate the following hypothesis:
Matrix-based graphs are more effective than list-based graphs
for densely connected networks, whereas list-based graphs are
more effective than matrix-based graphs for sparsely connected
networks. This is what is usually taken for granted, but it only
works if the complexities introduced by the auxiliary (and
necessary) data structs of the implementation are of a lesser
order of magnitude than the default graph complexities.
In order to carry out some experimentation, time and space complexities are
taken for proxy effectiveness rates. A matrix-based and a list-based
approaches are evaluated on some graph properties and functionalities
(storage, node adjacency and node neighborhood functions)
and on two topologies with different levels of connectivity.
The sparse scenario is defined to be a tree and the dense scenario is
defined to be a fully meshed network. Considering that V is the set of
nodes (aka vertexes) and E is the set of edges, the aforementioned
adjacencies are related as follows:
Network topology | Number of edges |E| |
Sparse graph (tree) | |V|-1 |
Dense graph (full mesh) | |V| (|V|-1)/2 |
Now the typical complexities of the two implementations of graph
adjacency are shown as follows:
Complexity | Matrix | Linked-list |
Storage | O(|V|^2) | O(|V|+|E|) |
Node adjacency | O(1) | O(deg|V|) |
Node neighborhood | O(|V|) | O(deg|V|) |
Just for the record: without loss of generality, node degrees for a sparse
graph are similar to 1, whereas node degrees for a dense graph are similar
to the number of nodes |V|.
And the complexities of some implementations tackling the dictionary
problem:
Dictionary implementation | Space complexity | Time complexity |
Hash table | O(|V|) | O(1) |
Double-list (unordered keys) | O(|V|) | O(|V|) |
Double-list (ordered keys) | O(|V|) | O(log|V|) |
It is to note that the dictionary implementation of use to relate node
identifiers with storage positions introduces a diverse scenario of
complexities. The implementation based on a hash table is the least
intrusive with the graph complexities as the storage complexity is
equivalent (or of lesser order) and it shows a random-access time
complexity. Instead, the remaining two approaches do introduce some
equivalent changes worsening the performance of the graph
functions (the linear search for the
double-list with unordered keys and the binary search for the ordered
keys).
Conclusion
Common approaches to deal with the graph adjacency scheme are based on a
matrix and on a linked list. The former is usually suitable for densely
connected graphs while the latter is more effective for sparse graphs.
However, both these implementations require
an additional dictionary to associate node identifiers with their storage
positions, and the time and space complexity of this auxiliary structure
must be taken into account in the whole graphical abstract data type.
From here on, the dictionary shaped as a hash table stands as
the only approach that allows appreciating the differences between
the matrix-based and the liked list-based approaches because it operates
unobtrusively with the fundamental adjacency structures. The other two
dictionary approaches based on linked lists (with ordered or unordered
keys) introduce a complexity burden that hinders the advantages of the
adjacency implementations wrt the level of connectivity of the graph in
question.
Anyhow, in order to prevent premature optimization,
TADTs
implements a
graphical abstract data type with a dictionary based on a linked list.
This is good enough for most
real-world scenarios.
|