An Efficient Algorithm for Obtaining Low Memory Approximation Models of Large-Scale Networks
SESSION: Student Poster Reception
EVENT TYPE: Poster, ACM Student Poster
TIME: 5:15PM - 7:00PM
ABSTRACT: The study of real world networks has important applications in many disciplines including social sciences and bioinformatics. These networks are unstructured and extremely large, for example of the order of 1 million vertices and 10 million edges. Due to this large size, analysis of real-world networks is both computation and memory intensive.
In this poster, we present a scalable parallel algorithm for obtaining low memory approximations (synopsis) of large-scale networks that preserves important combinatorial characteristics, such as the presence of cliques, degree distribution, and betweenness-centrality. In particular, we demonstrate how maximal chordal subgraph can be used to approximate large-scale networks. A maximal chordal subgraph is a spanning subgraph of the original network where there are no cycles of length larger than three and such subgraphs preserve many topological features such as the number of cliques and the lower bound on the chromatic number. Chordal graphs can be used to construct efficient linear time algorithms for problems, such as minimum vertex coloring and maximum clique size, which are non-polynomial on general graphs. This property can used to design faster analysis algorithms on the chordal subgraphs of a network.
Our parallel algorithm (on distributed memory systems) for obtaining the maximal chordal subgraph of a large-scale network is based on using a variation of maximum cardinality search on the network. A brief description of the algorithm is as follows;
(i) The network will be partitioned over p processors using a graph partitioning software like METIS. (ii) Within each processor, the maximal chordal subgraph of the local portion of the network will be computed in parallel. (iii)The subnetworks across processors will then be combined and inspected for cycles of length larger than three. (iv) For all cycles larger than three, the edges whose end lies in different processors will be deleted. This will reduce the communication costs for the analysis algorithms.
In our results, we demonstrate how maximal chordal subgraphs can provide a better visual representation of the modifications to the graph through a proportionate change in network properties like degree distribution. We also show how these subgraphs can be used to identify modifications in the gene expression of the hypothalamus of a mouse brain. We will show how maximal chordal subgraphs preserve characteristics like betweenness centrality, which are important in community detection. The size (number of edges) and connectivity structure of the chordal subgraph is influenced by the order in which the vertices are traversed. To maintain betweeness centrality, we modify the algorithm, by selecting vertices in appropriate order, such that the routes of most of the shortest paths in the network are preserved. We also provide scalability results pertaining to the performance of our parallel algorithm.
Kanimathi Duraisamy - University of Nebraska-Omaha