Development of a database and knowledge-based prediction methods for studying water molecules in protein structures and their role in protein-ligand binding

This thesis deals with the construction of the first database for characterizing water molecules in protein structures. It was designed as a module of the receptor-ligand database Relibase+ and covers all X-ray structures in the Protein Data Bank (PDB). This data collection...

Bibliographic Details
Main Author: Günther, Judith
Contributors: Klebe, Gerhard (Thesis advisor)
Format: Doctoral Thesis
Published: Philipps-Universität Marburg 2003
Online Access: PDF Full Text
Table of Contents: This study is dedicated to the development of the first database for characterizing water molecules in protein structures. It was designed as a module of the receptor-ligand database Relibase+ and comprises all X-ray structures in the Protein Data Bank (PDB). The data collection was subsequently used to develop and validate knowledge-based methods for predicting water sites. Of particular interest are water molecules buried in the protein-ligand interface, since they defy the simple lock-and-key principle that most rational drug design methods rely on. Chapter 2 reviews the knowledge on water molecules in protein structures gathered to date against the background of the underlying experimental methods. Chapter 3 describes the conception of the water database and some application examples for the implemented tools. The tools developed for comparative analysis of solvation patterns allow different references to be used: either the structural similarity of ligands or the sequence relationships of proteins can serve as the reference for superimposing the respective structures in three-dimensional space. Although recurrent solvation patterns (conservation of water molecules) are predominantly determined by the physicochemical properties exposed on the protein surface, the influence of the ligand should not be underestimated. Moreover, apart from classical hydrogen bonds, weaker interactions such as CH hydrogen bonds can play a relevant role. The water database also includes a tool for detecting crystallographically misassigned water molecules, which enhances known methods (see Chapter 7). The method estimates whether particles assigned as water molecules might instead represent sodium (or magnesium) ions.
The algorithm combines a set of descriptors, including the coordination geometry of the particle, its B-factor, and its electrostatic valence, which is derived from the contact distances to the atoms in the local neighbourhood. In Chapter 4, hydration structures in proteins are examined by statistical methods. This analysis reveals important conditions that predictive methods in rational drug design must meet to be promising. Water molecules buried in the protein-ligand interface are mostly conserved with respect to the ligand-free structure, and a significant shift of their positions upon ligand binding is only rarely observed. Thus, water sites in a ligand-free structure indicate where feasible water sites in a protein-ligand complex can be found. However, the degree of conservation among water sites from two sequence-identical protein pockets depends significantly on the structural similarity of the two bound ligands. Chapter 5 deals with the prediction of conserved water molecules in different scenarios by means of a genetic algorithm/k-nearest-neighbour (GA/kNN) approach. Using the descriptors developed with the water database, a prediction accuracy of 82% was achieved for discriminating crystallographically determined water sites from non-solvated positions on a protein surface. In this application scenario, the approach outperformed all previously reported methods. A similar improvement over existing methods was achieved for classifying conserved versus non-conserved water molecules in a comparison of different structures of the same protein (prediction accuracy 78%). To this end, it was necessary, first, to consider as many structural comparisons as possible for a reference protein when compiling the knowledge base and, second, to account for the proven bias introduced by the individual crystallographers who authored the structures.
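To illustrate the nearest-neighbour part of such a classifier, here is a minimal Python sketch. It is not the thesis implementation: the two descriptors, the toy data, and the function names are hypothetical. The feature weights are fixed by hand here, whereas in a GA/kNN scheme a genetic algorithm would typically search for them (that step is omitted).

```python
import math
from collections import Counter

def knn_predict(train, query, weights, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train   : list of (feature_vector, label) pairs
    weights : per-feature weights (hand-set here; a GA would normally tune them)
    """
    def dist(a, b):
        # Weighted Euclidean distance between two descriptor vectors
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, weights, k=3):
    """Fraction of points classified correctly when each is held out in turn."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, weights, k) == label:
            hits += 1
    return hits / len(data)

# Invented toy descriptors: (number of polar contacts, burial score)
TRAIN = [
    ((3.0, 0.9), "conserved"),
    ((4.0, 0.8), "conserved"),
    ((3.0, 0.7), "conserved"),
    ((1.0, 0.2), "non-conserved"),
    ((0.0, 0.3), "non-conserved"),
    ((1.0, 0.1), "non-conserved"),
]
WEIGHTS = (1.0, 1.0)

if __name__ == "__main__":
    print(knn_predict(TRAIN, (3.5, 0.85), WEIGHTS))
    print(leave_one_out_accuracy(TRAIN, WEIGHTS))
```

A prediction accuracy such as those quoted in the abstract corresponds to the fraction of correctly classified sites; the leave-one-out estimate shown is only one of several possible validation schemes.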
For discriminating water molecules that are conserved upon ligand binding from those that are not, a prediction accuracy of 73% was obtained. In this scenario, it is primarily the influence of the individual bound ligand that limits the performance of the algorithm. Depending on the protein binding pocket, pre-placing selected water molecules at fixed positions in the setup of, e.g., a virtual screening can therefore be inappropriate. Chapter 6 thus focuses on a methodical enhancement of the Particle Concept implemented in FlexX. This approach allows water molecules to be placed flexibly during the build-up of each ligand in the binding pocket. Implementing an enhanced version of the scoring function DrugScore that takes water molecules into account significantly improved the energy ranking of the generated solutions. The new scoring scheme not only outperforms the originally implemented empirical function by Böhm but also improves the recognition of near-native binding modes at the top rank (RMSD ≤ 1.0 Å) by 15% compared with the standard DrugScore version.
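The top-rank criterion mentioned above (RMSD ≤ 1.0 Å) can be made concrete with a short Python sketch. It assumes that poses are given as matched lists of atom coordinates in the receptor frame and compared without superposition, as is common in docking evaluation; the function names and toy coordinates are hypothetical and not taken from the thesis.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation (in Å) between matched atoms, no superposition."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for p, r in zip(pose, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pose))

def top_rank_success(cases, cutoff=1.0):
    """Fraction of complexes whose best-scored pose is within `cutoff` Å of the native pose.

    cases : list of (ranked_poses, native_pose); ranked_poses[0] is the top-ranked pose
    """
    hits = sum(1 for poses, native in cases if rmsd(poses[0], native) <= cutoff)
    return hits / len(cases)

# Invented two-atom toy ligand
NATIVE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
GOOD = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]   # shifted by 0.5 Å
BAD = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]    # shifted by 3.0 Å

CASES = [
    ([GOOD, BAD], NATIVE),  # scoring function ranks the near-native pose first
    ([BAD, GOOD], NATIVE),  # scoring function ranks a wrong pose first
]

if __name__ == "__main__":
    print(rmsd(GOOD, NATIVE))
    print(top_rank_success(CASES))
```

Comparing this fraction for two scoring functions on the same set of complexes is one way to express how much better one of them recognizes near-native binding modes at the top rank.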