Optimierung kontinuierlicher Anfragen auf Basis statistischer Metadaten (Optimization of Continuous Queries Based on Statistical Metadata)

The continuous processing of actively provided data, such as sensor data or stock prices, is increasingly becoming a necessity in many industries. Because queries over such data generally have a very long lifetime and therefore tie up their resources for a correspondingly long...

Bibliographic Details
Main Author: Riemenschneider, Tobias
Contributors: Seeger, Bernhard (Prof. Dr.) (Thesis advisor)
Format: Doctoral Thesis
Language: German
Published: Philipps-Universität Marburg 2008

The continuous processing of actively provided data, such as sensor data or stock prices, is increasingly becoming a necessity in many industries. Because queries over such data generally have a very long lifetime and therefore tie up their resources for a correspondingly long time, the optimization of these so-called continuous queries is a central prerequisite for processing them efficiently. The present work is devoted to exactly this topic.

The task of query optimization is to plan the computation of the results of a more or less abstractly defined query as efficiently as possible. The query plan that results from query optimization defines exactly how the incoming data are to be processed and in what order the data from several data sources are to be combined. To find this optimal query plan, the query optimizer generates a large number of different query plans that all compute the same result. It then estimates, for each generated plan, the costs incurred in computing its results and returns the cheapest plan as the result of the optimization.

In the context of continuous queries, however, estimating the costs of a query plan exactly proves problematic, so a basis for selecting the optimal query plan is missing. To provide such a basis, this work proposes simulating the query plans. The simulation executes the query on artificial data sources whose data are processed by the query plan under evaluation. The costs of the plan can then be determined simply by measurement and used for the evaluation.

A further problem of continuous queries stems from the behavior of the processed data. A query plan chosen by query optimization often incurs, after some time, much higher costs than originally expected. This can happen, for example, when the underlying data sources suddenly deliver their data in larger quantities and at a much higher rate. In this case, the query optimizer must check whether the costs of query processing can be reduced by using a different query plan. To keep the effort of this regularly repeated, so-called re-optimization within limits, this work introduces a method that archives the optimizations of a continuous query. When a re-optimization is pending, this method makes it possible to check beforehand whether an explicit re-optimization can be avoided by reusing an archived query plan.
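
The two ideas in the abstract, choosing a plan by simulating it on artificial data sources and reusing archived plans instead of re-optimizing, can be illustrated with a small sketch. It is not the method from the thesis: every name in it (`make_source`, `simulate_cost`, `optimize`, `PlanArchive`) is hypothetical, a simple in-memory equi-join on one key stands in for continuous stream processing, the measured cost is assumed to be the number of tuples produced by all join steps, and the archive is assumed to index plans by the order of magnitude of each source's observed size.

```python
import random
from collections import Counter
from itertools import permutations


def make_source(size, domain, seed):
    """Artificial data source: `size` join keys drawn uniformly from range(domain)."""
    rng = random.Random(seed)
    return Counter(rng.randrange(domain) for _ in range(size))


def simulate_cost(order, sources):
    """Execute a left-deep equi-join in the given order on the artificial data
    and measure its cost as the total number of tuples all join steps produce."""
    intermediate = sources[order[0]]
    cost = 0
    for name in order[1:]:
        nxt = sources[name]
        # Multiplicity of each key after the join is the product of multiplicities.
        intermediate = Counter(
            {key: count * nxt[key] for key, count in intermediate.items() if key in nxt}
        )
        cost += sum(intermediate.values())
    return cost


def optimize(sources):
    """Enumerate all join orders (they compute the same result) and return
    the one with the lowest measured cost."""
    return min(permutations(sources), key=lambda order: simulate_cost(order, sources))


class PlanArchive:
    """Archive of earlier optimizations, indexed by coarse source statistics
    (assumption: the number of decimal digits of each source's size)."""

    def __init__(self):
        self._plans = {}
        self.hits = 0

    @staticmethod
    def _key(stats):
        return tuple(sorted((name, len(str(size))) for name, size in stats.items()))

    def reoptimize(self, stats, domain=50):
        key = self._key(stats)
        if key in self._plans:
            # Similar statistics were seen before: reuse the archived plan.
            self.hits += 1
            return self._plans[key]
        # Otherwise run an explicit optimization on simulated sources.
        sources = {
            name: make_source(size, domain, seed=i)
            for i, (name, size) in enumerate(sorted(stats.items()))
        }
        plan = optimize(sources)
        self._plans[key] = plan
        return plan


archive = PlanArchive()
first = archive.reoptimize({"A": 100, "B": 1000, "C": 10})   # explicit optimization
second = archive.reoptimize({"A": 150, "B": 2000, "C": 12})  # same buckets: archived plan
print(first, second, archive.hits)
```

Measuring cost by execution avoids the need for an analytical cost formula, at the price of running every candidate plan once. The archive lookup is what lets a later call with similar statistics skip that work entirely.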