This repository contains the source code for MOCHI: Motif-based Community Search over Large Heterogeneous Information Networks. In this paper, we investigate the problem of motif-based community search over heterogeneous information networks (MOCHI). We devise a novel motif density modularity (MDM) to assess the motif cohesiveness within communities and propose three algorithms to solve the MOCHI problem.
We implement all algorithms in C++. All experiments are performed on a Linux machine with 2.1 GHz CPU and 128 GB memory. The following files contain the source code of these algorithms:
1.Naive_origin/NaiveOriginPeeling.cpp
: The basic algorithm for MOCHI.
2.AdvancedHIN/AdvancedPeeling.cpp
: The MW-HIN-based algorithm for MOCHI.
3.LocalHIN/localHIN.cpp
: The motif-distance-based algorithm for MOCHI.
We use RapidMatch [1] to find all the motif instances, which can be found here.
[1] Shixuan Sun, Xibo Sun, Yulin Che, Qiong Luo, and Bingsheng He. Rapidmatch: A holistic approach to subgraph query processing. Proceedings of the VLDB Endowment, 14(2):176–188, 2020.
We use four real-world HINs in our experiments, including CIDeRplus, TMDB, Freebase, DBpedia. The processed data can be found here. The statistics and references of these datasets can be found in our paper.
An example format of the input data for MOCHI is shown in the folder data/example
.
-
data/example
: An example of the HIN. -
data/example/motifs
: Examples of the motif. -
data/example/motifs/i_j/p/q.txt
: The q-th query with size p for j-th motif with size i. -
vertex.txt
: Each line starts with the vertex id, followed by the vertex type. -
edge.txt
: Each line starts with the edge id, followed by the edge type. -
graph.txt
: Each line starts with the vertex id, followed by a list of neighbor vertex id and edge id. -
queryGraph.txt/matchGraph.txt
: Input format of the graph in RapidMatch. It starts witht |V| |E|
, where V is the vertex set and E is the edge set. A vertex and an edge are formatted asv {vertex id} {vertex type} {vertex degree}
ande {vertex id} {vertex id}
respectively. The vertex id should start from 0 and the range should be limited within[0,|V|-1]
.
The running examples of three algorithms can be found at BasicRunner.cpp
, MWRunner.cpp
, and MDRunner.cpp
. Please compile the project with the CMakeLists.txt
and run BasicRunner
, MWRunner
, and MDRunner
, respectively, to execute the basic algorithm, MW algorithm, and MD algorithm. You can set the dataset and path in Config/config.h
.
The algorithm for generating motifs and query vertices can be found at GenerateQuery.cpp
. The generated queries will be stored at data/xxx(your dataset name)/RandomMotif
. Please move them to the folder data/xxx(your dataset name)/motifs
when executing queries.
- GCC 11.2.1
- CMake 3.29.2