|dc.description.abstract||Subgraph pattern matching is a fundamental operation for many applications, and it is exhaustively studied in its classical forms. Nevertheless, there are newly emerging applications, like analyzing hyperlinks of the web graph and analyzing associations in a social network, that need to process massive graphs in a timely manner. Regarding the extremely large size of these graphs and knowledge they represent, not only new computing platforms are needed, but also old models and algorithms should be revised. In recent years, a few pattern matching models have been introduced that can promise a new avenue for pattern matching research on extremely massive graphs. In this research, we study a family of subgraph pattern matching models called graph simulation, and propose two new models, called strict and tight simulation, to increase their efficiency while preserving the quality of their results. Moreover, we propose a new set of conditions, namely cardinality restriction, that can improve the expressiveness of most models in this family.
Several graph processing frameworks like Pregel have recently sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging on these frameworks because this problem does not naturally align with a vertex-centric programming paradigm. We design and implement novel distributed algorithms based on the vertex-centric programming paradigm for efficient subgraph pattern matching. Our algorithms are fine tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we present an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms.
Regarding the fact that pattern matching can be considered as an important type of queries for a graph database- either centralized or distributed- we study the problem of pattern containment and caching techniques specified for subgraph pattern matching. The proposed caching technique works based on the tight simulation model. Nevertheless, it is also possible to use it for subgraph isomorphic queries. We identify the main challenges of such a system, and our experiments show the effectiveness of the proposed solutions.||