Thumbnail Image

Query Planning in Mediator Based Information Systems

Leser, Ulf

Information integration has gained new importance since the widespread success of the World Wide Web. The simplicity of data publishing on the web and the promises of the emerging eCommerce markets pose a strong incentive for data providers to offer their services on the Internet. Due to the exponential growth rate of the number of web sites, users are already faced with an overwhelming amount of accessible information. Finding the desired piece of information is difficult and time-consuming due to the inherently chaotic organisation of the Web. For this reason, information integration services are becoming increasingly important. The idea of such a service is to offer to a user a single point of access that provides him or her ex-actly with the information he or she is interested in. To achieve this goal, the service dynami-cally integrates and customises data from various data providers. For instance, a business in-formation service would integrate news tickers, specialised business databases, and stock in-formation. However, integration services face serious technical problems. Two of them are particularly hard to overcome: The heterogeneity between data sources and the high volatility that interfaces to data providers on the web typically exhibit. Mediator based information systems (MBIS) offer remedies for those problems. A MBIS tackles heterogeneity on two levels: Mediators carry out structural and semantic integration of information stemming from different origins, whereas wrappers solve technical and syntacti-cal problems. Users only communicate with mediators, which use wrappers to access data sources. To this end, mediators and wrappers are connected by declarative rules that semanti-cally describe the content of data sources. This decoupling supports the stability of interfaces and thus increases the maintainability of the overall system. This thesis discusses query-centred MBIS, i.e., MBIS in which mediators represent their domain through a schema. The core functionality of a mediator is to receive and answer user queries against its schema. Two ingredients are essential to accomplish this task: First, it re-quires a powerful language for specifying the rules that connect mediators and wrappers. The higher the expressiveness of this these rules, the more types of heterogeneity can be overcome declaratively. Second, the mediator must be equipped with algorithms that are - guided by the semantic rules - capable of efficiently rewriting user queries into queries against wrappers. We contribute to both issues. We introduce query correspondence assertions (QCA) as a flexible and expressive language to describe the content of heterogeneous data sources with respect to a given schema. QCAs are able to bridge more types of conflicts between schemas than previous languages. We describe algorithms that rewrite queries against a mediator into sequences of queries against wrappers, based on the knowledge encoded in QCAs. Our algo-rithms are considerably more efficient than previously published algorithms for query rewrit-ing in MBIS. Furthermore, we define a formal semantics for queries in MBIS, which allows us to derive statements about properties of rewriting algorithms. Based on this semantics, we prove that our algorithm is sound and complete. Finally, we show how to reduce the main cost factor of query answering in MBIS, i.e., the number of accesses to remote data sources. To this end, we device algorithms that are capable of detecting and removing redundant remote accesses.