An agent based architecture for query planning and cost modeling of web sources
Query planning and optimization in mediators that integrate wrapped web sources pose new challenges, because both the kinds of queries and the costs of executing them on wrapped sources differ from those in structured database systems. In this thesis, our aim is to develop a query planner with cost-based optimization for web-based mediators that produces high-quality query plans. In particular, the user of the system poses a query against a global "view" to an information broker in order to retrieve data from several heterogeneous resources; the broker in turn sends appropriate subqueries to the relevant resources. To achieve good overall system performance, efficient global query planning and optimization techniques have to be employed. The main contributions are the development of a cost model and a query cost estimation technique for wrapped web sources. Traditional database systems use factors such as the average number of tuples, the number of block accesses, the blocking factor, and indexed columns to estimate the cost of executing a query. However, such factors are not relevant for wrapped web sources, where the cost of executing a query depends on other factors, such as the number of pages retrieved, the time to retrieve a single page from the server, and the organization of the information to be extracted from the pages of the web source.
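As a rough illustration of the kind of cost estimate described above, the following Python sketch combines the number of pages retrieved with the time to retrieve a single page, plus a per-page extraction overhead standing in for how the information is organized on the page. This is not the thesis's actual cost model: the names `WebSourceStats`, `estimate_cost`, and `cheapest_plan`, the linear formula, and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class WebSourceStats:
    """Hypothetical per-subquery statistics for one wrapped web source."""
    pages_retrieved: int                       # pages the wrapper must fetch
    seconds_per_page: float                    # average time to retrieve one page
    extraction_seconds_per_page: float = 0.0   # assumed parsing overhead per page


def estimate_cost(stats: WebSourceStats) -> float:
    """Assumed linear model: every page costs retrieval time plus extraction time."""
    per_page = stats.seconds_per_page + stats.extraction_seconds_per_page
    return stats.pages_retrieved * per_page


def cheapest_plan(plans: dict[str, list[WebSourceStats]]) -> str:
    """Return the name of the plan whose subqueries have the lowest summed cost."""
    return min(plans, key=lambda name: sum(estimate_cost(s) for s in plans[name]))


# Illustrative numbers only; they are not measurements from the thesis.
example_plans = {
    "plan_a": [WebSourceStats(10, 0.5, 0.25)],
    "plan_b": [WebSourceStats(4, 1.0, 0.5), WebSourceStats(2, 0.25)],
}
best = cheapest_plan(example_plans)
```

A planner along these lines would compare candidate plans by summing the estimated cost of each subquery; a real model would likely also account for parallel retrieval and for pages shared between subqueries, which this sketch ignores.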