Function-preserving, integrative gene selection: A method for reducing disease-related gene sets to their key components

Owing to technical progress in recent years, ever larger volumes of data with thousands upon thousands of features are being collected in ever shorter time [Stańczyk/Jain, 2017], [H. Liu/Motoda, 2012]. To put this unmanageably large flood of data to productive use, computer-aided...

Bibliographic Details
Main Author: Lippmann, Catharina
Contributors: Ultsch, Alfred (Prof. Dr.) (Thesis advisor)
Format: Doctoral Thesis
Published: Philipps-Universität Marburg 2020
Online Access: PDF Full Text

Owing to technical progress in recent years, ever larger amounts of data with thousands upon thousands of features are being collected in ever shorter time [Stańczyk/Jain, 2017], [H. Liu/Motoda, 2012]. To make productive use of this unmanageable flood of data, computer-aided evaluation methods are needed that support scientists in extracting useful information or knowledge [Fayyad et al., 1996]. One such approach is "feature selection". In the present work, a feature selection algorithm is developed, by way of example, for sets of genes identified from current knowledge about the genetic architecture of traits or diseases. Numerical measurements from experiments are not required for the individual genes, because an integrative approach is pursued: the criterion for selecting the most important genes is based on the Gene Ontology knowledge base [Ashburner et al., 2000]. The function-preserving, integrative gene selection presented here reduces a set of genes to its most important elements by calculating, for each gene, a score that describes its importance. This score is determined from the annotations of the genes to the significant biological processes in the polyhierarchically organized Gene Ontology knowledge base. The resulting directed acyclic graph (DAG) of significant biological processes describes the functions of the gene set. The gene score allows the genes to be ranked by importance. The first k∗ genes of this ranking form an optimal subset, where k∗ is chosen so that the subset has the best function-preserving property. Preservation of function is evaluated by precision and recall, and by their combination into the F1 measure, with respect to how well the selected subset reproduces the entire DAG.
With the function-preserving, integrative gene selection, the original DAG could be reproduced with recall and precision of about 70% for each of the examined data sets, using only about 5% of the original genes. The main results of this thesis have already been published after peer review [Lippmann et al., 2019].
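The rank-then-select step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several parts are assumptions made only for illustration: the gene score is taken to be the number of significant terms a gene is annotated to, the "reproduced DAG" is approximated by the plain union of the selected genes' annotation terms (the abstract does not say how the DAG is recomputed for a subset), and the names `f1_against_reference`, `select_top_k`, and the toy gene and term identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def f1_against_reference(reproduced: Set[str],
                         reference: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of a reproduced term set vs. the reference."""
    hits = len(reproduced & reference)
    precision = hits / len(reproduced) if reproduced else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def select_top_k(annotations: Dict[str, Set[str]],
                 reference: Set[str]) -> Tuple[List[str], float]:
    """Rank genes by score and keep the prefix of the ranking with the best F1."""
    # ASSUMPTION: score = number of reference (significant) terms per gene.
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(annotations,
                    key=lambda g: (-len(annotations[g] & reference), g))
    best_genes: List[str] = []
    best_f1 = 0.0
    covered: Set[str] = set()
    for k, gene in enumerate(ranked, start=1):
        # ASSUMPTION: the subset "reproduces" every term its genes annotate to.
        covered |= annotations[gene]
        _, _, f1 = f1_against_reference(covered, reference)
        if f1 > best_f1:  # a strict comparison keeps the smallest best k
            best_genes, best_f1 = ranked[:k], f1
    return best_genes, best_f1


# Toy data (hypothetical): T* are significant terms, X* are other terms.
toy_annotations = {
    "G1": {"T1", "T2", "T3"},
    "G2": {"T3", "T4"},
    "G3": {"T1", "X1"},
    "G4": {"X2", "X3"},
}
toy_reference = {"T1", "T2", "T3", "T4"}

genes, f1 = select_top_k(toy_annotations, toy_reference)
print(genes, f1)  # ['G1', 'G2'] 1.0
```

In this toy case two of four genes already cover all four reference terms, and adding G3 or G4 only lowers precision by bringing in terms outside the reference. This mirrors, in simplified form, the trade-off that the F1 measure is used to balance when choosing k∗.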