Due to technological progress, there has been an enormous increase in the number of continuous data streams from which valuable information has to be derived as quickly as possible. Data stream management systems have therefore emerged as a new technology for processing continuous queries over data streams. In contrast to database systems, they operate primarily in main memory and are optimized for processing continuous queries over data streams. During the development of these systems, adopting the database-theory concept of relational operator graphs has proven valuable, provided that the graphs are executed in a data-driven rather than a demand-driven manner. Assigning validity intervals to the elements of the data streams solves the problem of processing potentially unbounded data streams with bounded resources.
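The following minimal sketch shows why validity intervals bound an operator's memory. It assumes half-open integer intervals [start, end) and inputs that arrive in start-timestamp order. Under that ordering, an element whose interval has ended by the start of the newest arrival cannot overlap anything that follows, so it can be discarded. The names `Element`, `SweepArea`, `insert` and `purge` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Element:
    """A stream element with a half-open validity interval [start, end)."""
    value: Any
    start: int
    end: int


class SweepArea:
    """Hypothetical operator state holding only elements that may still be needed."""

    def __init__(self) -> None:
        self.elements: List[Element] = []

    def insert(self, e: Element) -> None:
        self.elements.append(e)

    def purge(self, now: int) -> None:
        # Assumption: inputs arrive ordered by start timestamp, so no future
        # element starts before `now`. Anything that expired at or before
        # `now` can never overlap a future element and is dropped.
        self.elements = [e for e in self.elements if e.end > now]


area = SweepArea()
for e in [Element("a", 0, 5), Element("b", 2, 4), Element("c", 6, 9)]:
    area.purge(e.start)  # the arrival's start timestamp acts as the current time
    area.insert(e)
print([e.value for e in area.elements])  # ['c']
```

The state stays small even though the input could be unbounded. Its size depends on how many intervals overlap at any instant, not on how many elements have arrived.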
The capabilities of such systems depend considerably on the availability of efficient and well-defined techniques for combining information from different data streams. The objective of this thesis is therefore to transfer the proven concept of the relational join to data-driven data stream processing based on validity intervals.
For this purpose, the semantics of the join operation for data streams is derived from that of the extended relational algebra using the concept of snapshot-reducibility. Several join algorithms are presented and proven to comply with these semantics. Consistently parameterizing the techniques with respect to the data structures used to store the join state makes it possible to support a wide variety of join predicates. Well-known join processing techniques based on nested loops, hashing, or indexing are adapted to data stream processing. In addition, the Temporal Progressive-Merge-Join is introduced, an algorithm that computes the join by sorting the data stream elements by value. Several further optimizations are also proposed, including the generalization of all presented algorithms to more than two input streams.
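The following sketch shows how a data-driven join over interval-stamped elements can be parameterized by the structure that stores its state. It is not the thesis's own implementation. It assumes half-open integer intervals [start, end) and inputs already sorted by start timestamp. It also assumes that a result pair is valid exactly on the intersection of its inputs' intervals. This is one way to obtain snapshot-reducibility: at every instant, the result equals the ordinary relational join of the two input snapshots. `ListStatus` (nested loops, arbitrary predicate), `HashStatus` (equi-join) and `symmetric_join` are hypothetical names.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from operator import eq
from typing import Any


@dataclass(frozen=True)
class Element:
    value: Any
    start: int  # inclusive
    end: int    # exclusive


class ListStatus:
    """Hypothetical state for arbitrary predicates: probing scans all entries (nested loops)."""

    def __init__(self, pred):
        self.items, self.pred = [], pred  # pred(stored_value, incoming_value)

    def insert(self, e):
        self.items.append(e)

    def probe(self, e):
        return [s for s in self.items if self.pred(s.value, e.value)]

    def purge(self, now):
        self.items = [s for s in self.items if s.end > now]


class HashStatus:
    """Hypothetical state for equi-joins: probing reads a single hash bucket."""

    def __init__(self, key):
        self.buckets, self.key = defaultdict(list), key

    def insert(self, e):
        self.buckets[self.key(e.value)].append(e)

    def probe(self, e):
        return list(self.buckets.get(self.key(e.value), []))

    def purge(self, now):
        for k in list(self.buckets):
            self.buckets[k] = [s for s in self.buckets[k] if s.end > now]
            if not self.buckets[k]:
                del self.buckets[k]


def symmetric_join(left, right, make_status):
    """Data-driven join; make_status(side) builds the state for side 0 (left) or 1 (right)."""
    status = [make_status(0), make_status(1)]
    # Process both inputs in start-timestamp order (each input must already be sorted).
    events = heapq.merge(((e.start, 0, e) for e in left),
                         ((e.start, 1, e) for e in right),
                         key=lambda ev: ev[:2])
    out = []
    for now, side, e in events:
        other = status[1 - side]
        other.purge(now)  # expired elements cannot overlap e or any later arrival
        for s in other.probe(e):
            l, r = (e, s) if side == 0 else (s, e)
            start, end = max(l.start, r.start), min(l.end, r.end)
            if start < end:  # result is valid on the intersection of both intervals
                out.append(Element((l.value, r.value), start, end))
        status[side].insert(e)
    return out


left = [Element(1, 0, 10), Element(2, 5, 8)]
right = [Element(1, 3, 6), Element(2, 7, 12)]
# Left state is probed by right elements and right state by left elements,
# so the right-side predicate swaps its arguments back to (left, right).
nested = symmetric_join(left, right,
                        lambda side: ListStatus(eq if side == 0 else (lambda r, l: eq(l, r))))
hashed = symmetric_join(left, right, lambda side: HashStatus(key=lambda v: v))
print([(x.value, x.start, x.end) for x in nested])  # [((1, 1), 3, 6), ((2, 2), 7, 8)]
```

Both state structures produce the same result here. The hash-based state only inspects the matching bucket, which restricts it to equality predicates. The list-based state scans every stored element but accepts any predicate. Swapping the state structure without touching the join loop is the kind of parameterization the paragraph above describes.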
To enable the automated choice and parameterization of concrete implementations based on a forecast of their resource usage, the techniques are embedded in a detailed cost model. Often, several kinds of metadata about the data streams to be processed are unavailable when the continuous queries are registered and may, moreover, change during their long execution time. It is therefore important to be able to gather detailed information about the data streams and the system state at any time and, if necessary, to adapt the processing strategy accordingly. A main part of the thesis therefore addresses the question of how to provide dynamic metadata within a data stream management system. For that purpose, a user-friendly framework is presented for efficiently obtaining consistent dynamic metadata. This process is investigated in several experiments and is also used to evaluate the presented join techniques. Furthermore, an approach for restructuring continuous queries at runtime is presented, and it is shown to be applicable to join processing.