(1) Analyzing protein-protein interactions at the atomic level is critical for understanding the principles that govern protein-protein recognition. For this purpose, descriptors that explain the nature of different protein-protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification (EPIC), a framework that handles the preparation, processing, and analysis of protein-protein complexes for classification with machine learning algorithms. We applied four machine learning algorithms, Support Vector Machines (SVM), C4.5 Decision Trees, K Nearest Neighbors (KNN), and Naïve Bayes (NB), in combination with three feature selection methods, Filter (Relief F), Wrapper, and Genetic Algorithms (GA), to extract discriminating features from the protein-protein complexes. To compare protein-protein complexes with each other, we represented the physicochemical characteristics of their interfaces in four different ways: two different atomic contact vectors (ACVs), DrugScore pair potential vectors (DPV), and SFCscore descriptor vectors (SDV). We classified two datasets: (A) 172 protein-protein complexes comprising 96 monomers that form contacts enforced by the crystallographic packing environment (crystal contacts) and 76 biologically functional homodimer complexes; (B) 345 protein-protein complexes comprising 147 permanent complexes and 198 transient complexes. We classified up to 94.8% of the packing-enforced/functional complexes and up to 93.6% of the permanent/transient complexes correctly. Furthermore, we extracted relevant features from the different protein-protein complexes and introduce an approach for scoring the importance of the extracted features.
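As an illustration of the classification step described in (1), the following is a minimal sketch of a K Nearest Neighbors classifier applied to interface descriptor vectors. It is not the EPIC implementation: the three-component vectors, the Euclidean distance, the value k=3, and the leave-one-out evaluation are all assumptions made for this example, and the toy data are invented rather than taken from either dataset.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to `query`.

    `train` is a list of (vector, label) pairs; distance is Euclidean (an assumption).
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Classify each sample using all other samples and return the hit rate."""
    hits = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, vector, k) == label:
            hits += 1
    return hits / len(samples)


# Invented toy descriptor vectors, NOT data from the study.
TOY_COMPLEXES = [
    ((0.10, 0.20, 0.05), "crystal"),
    ((0.15, 0.25, 0.10), "crystal"),
    ((0.12, 0.18, 0.08), "crystal"),
    ((0.80, 0.70, 0.90), "biological"),
    ((0.85, 0.75, 0.95), "biological"),
    ((0.78, 0.72, 0.88), "biological"),
]

if __name__ == "__main__":
    print(knn_predict(TOY_COMPLEXES, (0.11, 0.21, 0.07)))
    print(leave_one_out_accuracy(TOY_COMPLEXES))
```

Any of the four interface representations named in the abstract could in principle take the place of the toy vectors, provided all complexes are encoded with vectors of the same length.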
(2) Since protein-protein interactions play a pivotal role in communication at the molecular level in virtually every biological system and process, the search for and design of modulators of such interactions is of utmost interest. In recent years, many inhibitors of specific protein-protein interactions have been developed; however, in only a few cases are small, druglike molecules able to interfere with the complex formation of proteins. On the other hand, several small molecules are known to modulate protein-protein interactions by stabilizing an already assembled complex. To achieve this, a ligand binds to a rim-exposed pocket at the interface of the interacting proteins. One example is the phytotoxin Fusicoccin, which stabilizes the interaction of plant H+-ATPase and 14-3-3 protein by nearly a factor of 100. To suggest alternative leads, we performed a virtual screening campaign to discover new molecules that putatively stabilize this complex. Furthermore, we screened a dataset of 198 transient recognition protein-protein complexes for rim-exposed cavities at their interfaces. We provide evidence for high similarity between such rim-exposed cavities and the usual ligand-accommodating active sites of enzymes. This analysis suggests that rim-exposed cavities at protein-protein interfaces are druggable targets. Therefore, stabilizing protein-protein interactions seems to be a promising alternative to competitive inhibition of such interactions by small molecules.
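To put the "factor of 100" in (2) into energetic terms, the sketch below converts a fold change in binding strength into a free energy difference using ddG = -R*T*ln(fold change). This rests on two assumptions not stated in the abstract: that the factor refers to a change in the dissociation constant of the complex, and that the temperature is 298.15 K. The function name is hypothetical.

```python
from math import log

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def stabilization_free_energy(fold_change, temperature_k=298.15):
    """Free energy change in kcal/mol for a given fold change in affinity.

    Assumes the fold change applies to the dissociation constant.
    Negative values mean the complex is stabilized.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return -R_KCAL * temperature_k * log(fold_change)


if __name__ == "__main__":
    ddg = stabilization_free_energy(100.0)
    print(f"{ddg:.2f} kcal/mol")
```

Under these assumptions, a 100-fold stabilization corresponds to roughly -2.7 kcal/mol.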
(3) AffinDB is a database of affinity data for structurally resolved protein-ligand complexes from the PDB. It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources that describe the original experimental affinity determination and from secondary references that report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein-ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); the 2D drawing, SMILES code, and molecular weight of the ligand; links to other databases; and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH of the measurement, ligand molecular weight, and publication data (author, journal, year). Search results can be saved as tabular reports in text files. The database is intended as a resource for researchers interested in biomolecular recognition and in the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.
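The combined query described in (3) can be pictured as a conjunction of optional range filters. The sketch below works on an in-memory list and is not the AffinDB interface or schema: the `AffinityEntry` fields, the units (nM, K), the `query` function, and the placeholder PDB codes are all hypothetical, and the values are invented. Treating entries with a missing measurement condition as non-matching when that condition is filtered on is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AffinityEntry:
    pdb_code: str                   # placeholder codes below, not real PDB entries
    affinity_nm: float              # affinity value in nM (assumed unit)
    temperature_k: Optional[float]  # None if not reported in the reference
    ph: Optional[float]             # None if not reported in the reference
    ligand_mw: float                # ligand molecular weight in g/mol
    year: int                       # publication year


def in_range(value, bounds):
    """True if no bounds are given, or if value lies within the inclusive bounds."""
    if bounds is None:
        return True
    if value is None:
        return False  # a missing measurement cannot satisfy a requested range
    low, high = bounds
    return low <= value <= high


def query(entries, *, affinity_nm=None, temperature_k=None, ph=None,
          ligand_mw=None, year=None):
    """Return entries matching every supplied (low, high) range."""
    criteria = [
        ("affinity_nm", affinity_nm),
        ("temperature_k", temperature_k),
        ("ph", ph),
        ("ligand_mw", ligand_mw),
        ("year", year),
    ]
    return [
        entry for entry in entries
        if all(in_range(getattr(entry, name), bounds) for name, bounds in criteria)
    ]


# Invented example records, NOT taken from AffinDB.
TOY_ENTRIES = [
    AffinityEntry("0AAA", 12.0, 298.0, 7.4, 350.4, 2001),
    AffinityEntry("0AAB", 4500.0, 310.0, 6.5, 212.2, 1998),
    AffinityEntry("0AAC", 85.0, None, 7.0, 498.6, 2004),
    AffinityEntry("0AAD", 0.9, 298.0, None, 610.7, 2003),
]

if __name__ == "__main__":
    hits = query(TOY_ENTRIES, affinity_nm=(0.0, 100.0), ph=(7.0, 7.5))
    print([entry.pdb_code for entry in hits])
```

Keyword-only range arguments keep each criterion optional, which mirrors the "any combination" wording of the abstract.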