parallel_enum

A tool for parallel and distributed enumeration of cliques and diameter-two k-plexes.

Contents

This repository contains implementations of algorithms for maximal subgraph enumeration, including the algorithm for maximal cliques and the algorithm for k-plexes of diameter 2 (which include all maximal k-plexes of size at least 2k-1), presented in the following papers:

ICALP16 - Sublinear-Space Bounded-Delay Enumeration for Massive Network Analytics: Maximal Cliques (ICALP 2016, https://drops.dagstuhl.de/opus/volltexte/2016/6292/)

KDD18 - D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes (KDD 2018, https://dl.acm.org/citation.cfm?doid=3219819.3220093)

Building

To build the project, just run the command ./build.sh. It will ask whether you want to build the project with support for distributed memory, for example to run the code on a cluster of homogeneous interconnected computing nodes. If this is the case, please read the "MPI" section below; otherwise you can jump directly to the Using section.
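In the common case, building is a single command, run from the directory containing the script:

./build.sh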

MPI

MPI is the de facto standard for implementing high-performance applications running on a cluster of computing nodes. If you need to run the code on a cluster, you need to specify, while building the project, the path where MPI is installed. The build script will try to infer this path by itself; if the inferred path is wrong, you need to specify the correct one manually. If you do not have MPI installed on your machine, please install it before running the build script, for example by following this guide. The script assumes that the folder you specify contains an include subfolder with the MPI headers and a lib subfolder with the MPI library. If this is not the case, you may need to modify either the folder structure or the build.py script. The code was only tested with OpenMPI, but it should work with other MPI installations as well.
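For reference, the expected layout of the specified folder is the standard one (here /path/to/mpi is just a placeholder for your installation prefix):

/path/to/mpi
├── include/   (MPI headers, e.g. mpi.h)
└── lib/       (MPI libraries)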

Graph format

By default, the application accepts graphs in the NDE (Nodes-Degrees-Edges) format. The graph file should have the .nde extension and contain the following information:

  • The first line should contain the number of nodes.
  • Then, for each node, one line containing the index of the node and its degree (indices should start from 0).
  • Then, for each edge, one line containing the indices of its endpoints.
  • Numbers on the same line should be separated by a space.
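For example, a triangle on the nodes 0, 1, 2 (each node having degree 2) would be encoded as:

3
0 2
1 2
2 2
0 1
0 2
1 2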

The OLY format is also available by using the flag -graph_format="oly". In this case the graph file should have the .oly extension and contain the following information:

  • The first line should contain the number of nodes and edges.
  • Then, for each edge, one line containing the indices of its endpoints (indices should start from 0, or from 1 if using the flag -one_based).
  • Numbers on the same line should be separated by a space.
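For example, the same triangle in the OLY format (3 nodes, 3 edges) would be encoded as:

3 3
0 1
0 2
1 2

If your graph is only available as a plain edge list, converting it to NDE is mechanical. Below is a minimal sketch of such a converter; the file name edgelist_to_nde.cpp is hypothetical (this helper is not part of the repository), and it assumes a zero-based edge list with each undirected edge listed exactly once:

// edgelist_to_nde.cpp -- hypothetical helper, not part of this repository.
// Converts a zero-based, whitespace-separated edge list (one "u v" pair per
// line, each undirected edge listed once) into the NDE format described above.
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <utility>
#include <vector>

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: " << argv[0] << " edges.txt graph.nde\n";
    return 1;
  }
  std::ifstream in(argv[1]);
  std::vector<std::pair<std::size_t, std::size_t>> edges;
  std::size_t u, v, n = 0;
  while (in >> u >> v) {
    edges.emplace_back(u, v);
    n = std::max(n, std::max(u, v) + 1);  // nodes are numbered from 0
  }
  std::vector<std::size_t> degree(n, 0);
  for (const auto& e : edges) {
    ++degree[e.first];   // each undirected edge contributes to both endpoints
    ++degree[e.second];
  }
  std::ofstream out(argv[2]);
  out << n << "\n";  // first line: number of nodes
  for (std::size_t i = 0; i < n; ++i)
    out << i << " " << degree[i] << "\n";  // one line per node: index, degree
  for (const auto& e : edges)
    out << e.first << " " << e.second << "\n";  // one line per edge: endpoints
  return 0;
}

Compile and run it with something like g++ -O2 -o edgelist_to_nde edgelist_to_nde.cpp followed by ./edgelist_to_nde edges.txt graph.nde.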

Using

To execute the code, run the text_ui executable produced by the build, passing the graph file as an argument:

./build/text_ui graph.nde

This executable accepts the following optional parameters:

  • -system: What should be enumerated. This can assume one of the following values:

    • d2kplex: Default value. To enumerate maximal, diameter-2 k-plexes. These include all maximal k-plexes of size at least 2k-1 (this is the algorithm in KDD18).
    • clique: To enumerate maximal cliques (this is the algorithm in ICALP16).
  • -enumerator: This can assume one of the following values:

    • sequential: To run the code sequentially. This is the default value.
    • parallel: To run the code in parallel on a multicore computing node.
    • distributed: To run the code on a cluster of multicore computing nodes. If you specify this value, please refer to the Using with MPI section for more information on how to run the code.
  • -k: This is the k value for k-plex enumeration. By default it is set to 2.

  • -q: This is the minimum size of the d2kplexes to be found. By default it is set to 1.

  • -n: Number of cores to use when enumerator=parallel. By default this is set to the number of cores available on the computing node.

  • -graph_format: The format of the input graph. It can assume one of the following values (node indices should start from 0, unless the oly format is used together with -one_based):

    • nde: This is the default format. See the description in the Graph format section above.
    • oly: One line containing the number of nodes and edges, then one line for each edge containing the indices of its endpoints.
  • -fast_graph: Use the faster but more memory-hungry graph representation. By default it is set to true.

  • -huge_graph: Use 64-bit integers to count nodes. By default it is set to false.

  • -one_based: Whether the graph is one-based. Used only by the oly format. By default it is set to false.

  • -quiet: Do not show any non-fatal output. By default it is set to false.

  • -print: Print the solutions, one per line, on stdout. Use together with -quiet to print only the solutions.

  • -enable_pivoting: Enables pivoting in d2kplex enumeration. By default it is set to true.

For example, to count maximal 3-plexes of size at least 2 on the 'graph.nde' graph, in parallel using 10 cores, you should execute the following command:

./build/text_ui -k 3 -q 2 -enumerator="parallel" -n 10 graph.nde
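Similarly, still assuming the graph is in graph.nde, maximal cliques can be enumerated sequentially with:

./build/text_ui -system="clique" graph.nde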

Using with MPI

To run this code on a cluster of nodes, first of all we assume that these nodes use NFS, so that all of them share the same directory structure. Before executing the code, you need to write a hosts.txt file. For example, supposing that you have 32 nodes (from node00 to node31), the file should look like the following one:

node00 slots=1 max-slots=1
node01 slots=1 max-slots=1
node02 slots=1 max-slots=1
...
node30 slots=1 max-slots=1
node31 slots=1 max-slots=1

Basically, we have one row for each computing node. Each row has three space-separated fields: the first field is the hostname of the node, while the second and third fields force MPI to run only one process on each node. This is needed to avoid oversubscribing the node, since each process will spawn multiple threads (as many as specified through the -n parameter).

ATTENTION: This is the format for OpenMPI installations. Other MPI implementations use a similar format; please refer to the corresponding user manual to check how to force a single process per node.

After you have created the hosts.txt file, you need to run the application through the mpiexec tool.

For example, to count maximal 3-plexes of size at least 2 on the 'graph.nde' graph, using 15 nodes of the cluster, each of which uses 10 cores, you should execute the following command:

mpiexec --hostfile hosts.txt -np 15 ./build/text_ui -k 3 -q 2 -enumerator="distributed" -n 10 graph.nde

In a nutshell, you need to run the same command you would run in the sequential/parallel case, setting the enumerator flag to distributed and prepending mpiexec --hostfile hosts.txt -np 15, where the value of the -np flag is the number of computing nodes you want to use.
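In general, the invocation follows this pattern (angle brackets denote placeholders to fill in):

mpiexec --hostfile hosts.txt -np <number_of_nodes> ./build/text_ui -enumerator="distributed" -n <cores_per_node> <other flags> graph.nde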

Macros

The following macros can be specified at compile time, but they are mostly for internal/debug use.

  • DEGENERACY: If specified, the graph is permuted into degeneracy order.
  • PARALLEL_NOPIN: Disables thread pinning.
  • PRINT_PROGRESS: Prints, once per second, the current number of cliques/d2kplexes found.
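These are ordinary C++ preprocessor macros, so with a typical toolchain they are enabled by passing -D flags to the compiler (for example -DPRINT_PROGRESS); whether and how build.sh forwards such flags to the compiler depends on your setup and should be verified against it.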
