Jagalpure, Aniruddha Girish
RDF data forms a labeled directed graph, and SPARQL is the RDF query language used to extract information from that graph. Several RDF engines exist, including Sesame, RDF-3X, OWLIM, and Jena. Jena is the most popular framework and is widely used. However, the Jena in-memory model cannot scale to large RDF datasets, while Jena SDB and Jena TDB have high latencies. In this thesis we propose a new system, ‘RGIS’ (RDF Graph Split and Index), for processing SPARQL queries on RDF data. RGIS is not only scalable but also faster than Jena and OWLIM-SE (BigOWLIM). RGIS uses a custom data format and a novel indexing technique to store RDF data. The custom format splits the RDF data into separate files based on the Classes and Object Properties present in the data. Each file is assigned an index, and each instance within a file is given a unique index value. We have also developed an RDF structure-aware Query Planner that uses the topology of the RDF graph to intelligently schedule query operations. When compared with Jena TDB, OWLIM, and Mulgara on LUBM datasets, RGIS not only had faster response times but also lower memory overhead.
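The idea of splitting by Classes and Object Properties can be illustrated with a minimal in-memory sketch. This is not the RGIS on-disk format or its actual indexing scheme, which the abstract does not specify; the function name split_and_index, the "rdf:type" shorthand, and the sample triples are assumptions made purely for illustration.

```python
from collections import defaultdict

# Assumed shorthand for the rdf:type predicate (illustrative only).
RDF_TYPE = "rdf:type"


def split_and_index(triples):
    """Partition triples by class and by object property.

    Returns two mappings:
      class_members:  class name -> {instance: index within that class}
      property_edges: property name -> list of (subject, object) pairs
    """
    class_members = defaultdict(dict)
    property_edges = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            members = class_members[obj]
            # Give each instance the next free index in its class group;
            # an instance seen before keeps its existing index.
            members.setdefault(subject, len(members))
        else:
            property_edges[predicate].append((subject, obj))
    return class_members, property_edges


# Hypothetical sample data, loosely in the spirit of a university dataset.
triples = [
    ("alice", RDF_TYPE, "Student"),
    ("bob", RDF_TYPE, "Student"),
    ("db101", RDF_TYPE, "Course"),
    ("alice", "takesCourse", "db101"),
    ("bob", "takesCourse", "db101"),
]

class_members, property_edges = split_and_index(triples)

# A query touching only takesCourse reads one group, not every triple.
students_in_db101 = [s for s, o in property_edges["takesCourse"] if o == "db101"]
```

In this sketch, a query pattern that mentions a single class or property only needs to read the matching group, which is the intuition behind partitioning the data before indexing it.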