• Login
    View Item 
    •   Athenaeum Home
    • University of Georgia Theses and Dissertations
    • University of Georgia Theses and Dissertations
    • View Item
    •   Athenaeum Home
    • University of Georgia Theses and Dissertations
    • University of Georgia Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    A fast algorithm for subgraph pattern matching on large labeled graphs

    Thumbnail
    Date
    2013-08
    Author
    Saltz, Matthew Wyatt
    Metadata
    Show full item record
    Abstract
    In recent years, the robustness of graphs for representing complex data has led to a proliferation of research on graph databases and analytics. One important topic in the field is graph pattern matching, which can be used, for example, in the processing of queries in graph databases. Though fast querying is highly desirable, pattern matching algorithms are hindered by the NP-completeness of the subgraph isomorphism problem. This paper presents a conceptually simple, memory-efficient, pruning-based algorithm for the subgraph isomorphism problem that outperforms commonly used algorithms by orders of magnitude on large labeled graphs. This speedup is due in large part to the effectiveness of the pruning algorithm, known as dual simulation, which in many cases removes a large percentage of the vertices not found in isomorphic matches. In this paper, the runtime of the algorithm is tested on synthetic graphs of up to 10 million vertices and 250 million edges and on two real life datasets, comparing when possible to an adjacency list version of Ullmann’s algorithm and to the VF2 algorithm. To the best of our knowledge, this is the first paper to test a centralized subgraph isomorphism algorithm on graphs of this magnitude. The algorithm is tested extensively to determine the effects of label density, edge density, data graph size, degree distribution, and query graph size and type on runtime. The effectiveness of the algorithm is then demonstrated on two large real life graphs. The algorithm is easily extendable to graphs with multiple attributes on vertices and edges, making it an ideal candidate to serve as the backbone of a query processing engine for a graph database.
    URI
    http://purl.galileo.usg.edu/uga_etd/saltz_matthew_w_201308_ms
    http://hdl.handle.net/10724/29646
    Collections
    • University of Georgia Theses and Dissertations

    About Athenaeum | Contact Us | Send Feedback
     

     

    Browse

    All of AthenaeumCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister

    About Athenaeum | Contact Us | Send Feedback