Statistische Auswertung und Interpretation von hochdimensionalen molekularbiologischen Datensätzen (Statistical analysis and interpretation of high-dimensional molecular biological datasets)

Bibliographic Details
Main Author: Samans, Birgit
Contributors: Eilers, Martin (Prof. Dr.) (Thesis advisor)
Format: Doctoral Thesis
Published: Philipps-Universität Marburg 2008
Online Access: PDF Full Text

High-throughput methods have become an essential tool in molecular biology research for elucidating complex cellular processes. An important aspect of using high-throughput technologies is the implementation of appropriate algorithms for data analysis. The lack of standardised data validation and quality assurance processes has previously been recognised as one of the major obstacles to the successful implementation of high-throughput experimental technologies (Kaul 2005). These processes are therefore particularly important for the success of such experiments. For DNA chip technology, which has been used to study gene expression since the early 1990s, standardised methods for data analysis have already been established; chapter 2 gives an overview of these methods. However, even with efficient data analysis, interpreting these data in the context of the biological question remains difficult. In recent years, the focus has therefore shifted towards bioinformatic methods for the functional interpretation of the data. Chapter 3 describes some of these methods and applies them to the results of a microarray experiment studying c-Myc-dependent gene expression in T lymphocytes of transgenic mice. The applied methods were shown to be suitable for the functional interpretation of gene lists. Because different methods lead to different results and focus on different aspects, it is advisable to combine them to maximise the efficiency of the interpretation.
One aspect of the functional analysis of a set of co-regulated genes is their possible regulation by a common transcription factor. Within this work, a method was established for detecting significantly over-represented cis-regulatory motifs in the promoter sequences of a set of co-regulated genes.
This method comprises the following steps:
• Establishment of a database of orthologous promoter sequences (mouse/human)
• Masking of repetitive regions of the sequences
• Alignment of the orthologous sequences
• Examination of the conserved sequences for transcription factor binding sites (TFBS)
• Correction of the number of binding sites for the length of the conserved promoter sequences
• Testing a group of co-regulated genes for enrichment of TFBS in their promoter sequences compared with a background set
This method was applied to the microarray dataset already used in chapter 3. Position weight matrices (PWMs) containing E-box motifs, which have already been described as binding motifs for c-Myc, were identified as significantly over-represented. In addition, the PWM for the transcription factor YY1 was found to be enriched; the constitutive repressive activity of YY1 is reduced by over-expression of c-Myc (Austen, Cerni et al. 1998).
Besides the regulation of gene expression, for example by transcription factors, a number of other factors play an important role in the regulation of cellular processes. miRNAs are one such factor; over recent years they have become increasingly important in molecular biology research. miRNAs show, for example, time- and tissue-specific expression patterns during plant and animal development and are involved in regulating physiological processes such as apoptosis, cell division and cell differentiation. Because a single process is often regulated by many miRNAs, the expression of different miRNAs must be measured simultaneously to find the corresponding pattern. miRNA microarrays can be used to measure these expression patterns. In this work, algorithms for analysing data from a miRNA microarray platform were established on the basis of the methods described in chapter 2.
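The last two steps of the promoter analysis (correcting binding-site counts for conserved sequence length, then testing a gene group against a background set) can be illustrated with a short Python sketch. The abstract does not name the statistical test used in the thesis; as an assumption, this sketch pools site counts and applies a one-sided Poisson test, in which the background rate of sites per base pair of conserved sequence, multiplied by the group's total conserved length, gives the length-corrected expected count. The `Promoter` record, the function names and the example numbers are hypothetical, and the site counts are assumed to come from an earlier PWM scan of the repeat-masked, aligned sequences.

```python
import math
from dataclasses import dataclass


@dataclass
class Promoter:
    """Hypothetical record for one gene's conserved promoter region."""
    gene: str
    conserved_length: int  # aligned, repeat-masked bp conserved between mouse and human
    site_count: int        # predicted binding sites of one PWM in that region


def poisson_upper_tail(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam)."""
    if observed <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def log_pmf(k: int) -> float:
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    if observed <= lam:
        # Tail probability is large here, so 1 - P(X < observed) is numerically safe.
        lower = sum(math.exp(log_pmf(k)) for k in range(observed))
        return max(0.0, 1.0 - lower)
    # observed > lam: the pmf decreases from k = observed onwards, so sum the
    # upper tail directly to keep precision for very small p-values.
    total, k = 0.0, observed
    term = math.exp(log_pmf(k))
    while term > 0.0 and term > total * 1e-17:
        total += term
        k += 1
        term *= lam / k
    return min(total, 1.0)


def tfbs_enrichment(group, background):
    """Length-corrected enrichment of one PWM: (observed, expected, p-value)."""
    bg_length = sum(p.conserved_length for p in background)
    if bg_length == 0:
        raise ValueError("background has no conserved sequence")
    rate = sum(p.site_count for p in background) / bg_length  # sites per conserved bp
    expected = rate * sum(p.conserved_length for p in group)   # length-corrected expectation
    observed = sum(p.site_count for p in group)
    return observed, expected, poisson_upper_tail(observed, expected)


# Hypothetical example: background rate is 30 sites / 3000 bp = 0.01 per bp
background = [Promoter("bg1", 1000, 10), Promoter("bg2", 1000, 12), Promoter("bg3", 1000, 8)]
group = [Promoter("g1", 600, 15), Promoter("g2", 400, 10)]
obs, expected_count, p = tfbs_enrichment(group, background)
print(obs, round(expected_count, 2), p < 0.001)  # prints: 25 10.0 True
```

Pooling counts treats binding sites as independent, which overstates significance when sites cluster in a few promoters; a per-gene comparison of site densities with a rank test would be a more conservative alternative.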
The miRNA microarray analysis algorithms were applied to two experiments analysing N-Myc-dependent miRNA expression in vivo and in vitro. The results of the two experiments show a clear overlap in miRNA expression patterns. Both the preliminary test and the experiments showed that the miRNA platform is suitable for experimental use and that the data analysis routine yields reasonable results. However, it also became apparent that improving the current array design would allow other normalisation techniques, which could reduce systematic errors more effectively.
RNAi screening is another high-throughput technology; it allows the direct interaction of different gene products to be examined. These screens generate very large datasets that often show high variability, which again requires efficient data analysis. In this work, I describe the implementation of a data analysis routine for RNAi screening, using as an example an shRNA screen that examines the influence of different kinases on the stability of c-Myc.
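The abstract does not detail the RNAi screening routine. As a hedged illustration of one common way to cope with high screen variability, the Python sketch below scores each shRNA on a plate with a robust z-score (median and median absolute deviation, MAD, instead of mean and standard deviation) and flags wells beyond a cut-off. The plate layout, the readout (a hypothetical normalised c-Myc stability signal), the function names and the threshold of 3 are assumptions, not details taken from the thesis.

```python
import statistics


def robust_z_scores(values):
    """(x - median) / (1.4826 * MAD) for each value on one plate."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 1.4826 scales the MAD so that it estimates the standard deviation for normal data
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def call_hits(plate, threshold=3.0):
    """Return {shRNA id: robust z} for wells with |z| >= threshold."""
    ids = list(plate)
    scores = robust_z_scores([plate[i] for i in ids])
    return {i: z for i, z in zip(ids, scores) if abs(z) >= threshold}


# Hypothetical normalised readouts for one plate (1.0 = no effect on the signal)
plate = {"sh1": 1.00, "sh2": 1.10, "sh3": 0.90, "sh4": 1.05, "sh5": 0.95, "sh6": 3.00}
hits = call_hits(plate)
print(sorted(hits))  # prints: ['sh6']
```

Scoring per plate absorbs plate-to-plate shifts, and because the median and MAD are barely affected by a few strong hits, the true hits do not inflate the scale used to detect them.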