Graph-Based Approaches to Protein StructureComparison - From Local to Global Similarity

The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level. Regarding the function of an en...

Ausführliche Beschreibung

Gespeichert in:
1. Verfasser: Mernberger, Marco
Beteiligte: Hüllermeier, Eyke (Prof.) (BetreuerIn (Doktorarbeit))
Format: Dissertation
Veröffentlicht: Philipps-Universität Marburg 2011
Mathematik und Informatik
Online Zugang:PDF-Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!


2. J.P. Vert. The optimal assignment kernel is not positive definite. Technical report, Centre for Computational Biology, Mines Paris Tech, 2008. 41

3. M. Shatsky, R. Nussinov, and H.J. Wolfson. FlexProt: alignment of flexible protein structures without a predefinition of hinge regions. Journal of Computational Biology, 11(1):83–106, 2004. 20, 83

4. J. Liang, H. Edelsbrunner, P. Fu, P.V. Sudhakar, and S. Subra- maniam. Analytical shape computing of macromolecules II: identification and computation of inaccessible cavities inside proteins. Proteins, 33(1):18–29, 1998a. 25

5. Y. Tian and J. M. Patel. TALE: a tool for approximate large graph matching. In ICDE'08: 24th International Conference on Data Engineering. Proceedings, pages 963–972, Cancun, Mexico, April 2008. 30

6. F. Glaser, Y. Rosenberg, A. Kessel, T. Pupko, and N. Ben-Tal. The ConSurf-HSSP database: the mapping of evolutionary conser- vation among homologs onto PDB structures. Proteins, 58(3): 610–617, 2005. 24

7. D. Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bulletin of Mathematical Biol- ogy, 55(1):141–154, 1993. ISSN 0092-8240. 51

8. X. Yan and J. Han. gSpan: graph-based substructure pattern min- ing. In ICDM'08: IEEE International Conference on Data Min- ing. Proceedings, pages 721–724, Pisa, Italy, December 2002. 33

9. J.A. Capra, R.A. Laskowski, J.M. Thornton, M. Singh, and T.A. Funkhouser. Predicting protein ligand binding sites by com- bining evolutionary sequence conservation and 3D structure. PLoS Computational Biology, 5(12):e1000585, 2009. 25

10. L. Xu and I. King. A PCA approach for fast retrieval of structural patterns in attributed graphs. IEEE Transactions on Systems, Man, and Cybernetics, 31(5):812–817, 2001. ISSN 1083-4419. 41

11. Y. Shinano, T. Fujie, Y. Ikebe, and R. Hirabayashi. Solving the maximum clique problem using PUBB. In IPPS'98: Interna- tional Parallel Processing Symposium. Proceedings, pages 326– 332, Orlando, Florida, March 2002. 35

12. D. Gold and R.M. Jackson. SitesBase: a database for structure- based protein-ligand binding site comparisons. Nucleic Acids Research, 34(Database Issue):D231–D234, 2006. ISSN 0305- 1048. 27

13. C. Yeats, M. Maibaum, R. Marsden, M. Dibley, D. Lee, S. Addou, and C.A. Orengo. Gene3D: modelling protein structure, func- tion and evolution. Nucleic Acids Research, 34(Database Issue): D281–D284, 2006. 17

14. A. Rangarajan and E.D. Mjolsness. A Lagrangian relaxation net- work for graph matching. IEEE Transactions on Neural Net- works, 7(6):1365–1381, 2002. ISSN 1045-9227. 37

15. S.S. Hannenhalli and R.B. Russell. Analysis and prediction of func- tional sub-types from protein sequence alignments. Journal of Molecular Biology, 303(1):61–76, 2000. 2

16. S. Medasani, R. Krishnapuram, and Y.S. Choi. Graph matching by relaxation of fuzzy assignments. IEEE Transactions on Fuzzy Systems, 9(1):173–182, 2002. ISSN 1063-6706. 37

17. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5):695–703, 1988. 41

18. S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996. 37

19. K. Sjölander. Phylogenomic inference of protein molecular func- tion: advances and challenges. Bioinformatics, 20(2):170–179, 2004. 2

20. G.J. Kleywegt. Recognition of spatial motifs in protein structures. Journal of Molecular Biology, 285(4):1887–1897, 1999. 22

21. B. Huet and E.R. Hancock. Shape recognition from large image li- braries by inexact graph matching. Pattern Recognition Letters, 20(11-13):1259–1269, 1999. ISSN 0167-8655. 37

22. W. Spears, K. De Jong, T. B " ack, D. Fogel, and H. De Garis. An overview of evolution- ary computation. LNCS. Machine Learning: ECML-93, 667: 442–459, 1993. 69, 70

23. REFERENCES H. Bunke. Graph matching: theoretical foundations, algorithms, and applications. In VI'00: Vision Interface Conference. Pro- ceedings, pages 82–88, Montréal, Canada, May 2000. 31

24. C.A. Wilson, J. Kreychman, and M. Gerstein. Assessing annota- tion transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. Journal of Molecular Biology, 297(1): 233–249, 2000. 16, 21

25. J.M. Chandonia and S.E. Brenner. The impact of structural ge- nomics: expectations and outcomes. Science, 311(5759):347– 351, 2006. 1

26. S. Kosinov and T. Caelli. Inexact multisubgraph matching using graph eigenspace and clustering models. LNCS. Structural, Syn- tactic, and Statistical Pattern Recognition, 2396/2002:455–473, 2009. 41

27. G. Verbitsky, R. Nussinov, and H. Wolfson. Flexible structural comparison allowing hinge-bending, swiveling motions. Pro- teins, 34(2):232–254, 1999. 83

28. C. Guda, E.D. Scheeff, P.E. Bourne, and I.N. Shindyalov. A new algorithm for the alignment of multiple protein structures us- ing Monte Carlo optimization. In PSB'01: Pacific Symposium of Biocomputing. Proceedings, volume 6, pages 275–286, Hawai, USA, January 2001. 19

29. M. Neuhaus and H. Bunke. A random walk kernel derived from graph edit distance. LNCS. Structural, Syntactic, and Statistical Pattern Recognition, 4109:191–199, 2006b. 40

30. H. Edelsbrunner, M. Facello, and J. Liang. On the definition and the construction of pockets in macromolecules. Discrete Applied Mathematics, 88(1-3):83–102, 1998. ISSN 0166-218X. 25

31. V. Spriggs, P.J. Artymiuk, and P. Willett. Searching for patterns of amino acids in 3D protein structures. Journal of Chemical Information and Computer Sciences, 43(2):412–421, 2003. 22, 30, 34

32. M. Shatsky, A. Shulman-Peleg, R. Nussinov, and H.J. Wolfson. The multiple common point set problem and Its application to molecule binding pattern detection. Journal of Computational Biology, 13(2):407–428, 2006. 28, 53

33. K. Tsuda, T. Kin, and K. Asai. Marginalized kernels for biological sequences. Bioinformatics, 18(Suppl. 1):S268–5276, 2002. 40

34. M. Shatsky, R. Nussinov, and H.J. Wolfson. Flexible protein align- ment and hinge detection. Proteins, 48(2):24–256, 2002a. 20

35. K. Wang and R. Samudrala. FSSA: a novel method for identify- ing functional signatures from structural alignments. Bioinfor- matics, 21(13):2969–2977, 2005. 21

36. R.J. Najmanovich, A. Allali-Hassani, R.J. Morris, L. Dombrovsky, P.W. Pan, M. Vedadi, A.N. Plotnikov, A. Edwards, C. Arrow- smith, and J.M. Thornton. Analysis of binding site similarity, small-molecule similarity and experimental binding profiles in the human cytosolic sulfotransferase family. Bioinformatics, 23 (2):e104–e109, 2007. 28

37. R. Najmanovich, N. Kurbatova, and J. Thornton. Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics, 24(16): i105–i111, 2008. 83

38. M. Veeramalai and D. Gilbert. A novel method for comparing topological models of protein structures enhanced with ligand information. Bioinformatics, 24(23):2698–2705, 2008. 20

39. C. Micheletti and H. Orland. MISTRAL: a tool for energy-based multiple structural alignment of proteins. Bioinformatics, 25 (20):2663–2669, 2009. 19, 21

40. Z. Wang, J. Eickholt, and J. Cheng. MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP 8. Bioinformatics, 26(7):882–888, 2010. 2

41. K. Mizuguchi and N. Go. Comparison of spatial arrangements of secondary structural elements in proteins. Protein Engineering, Design and Selection, 8(4):353–362, 1995. 21

42. X. Yan, F. Zhu, J. Han, and P.S. Yu. Searching substructures with superimposed distance. In ICDE'06: 22nd International Conference on Data Engineering. Proceedings, volume 88, pages 88–88, Atlanta, USA, April 2006. 30

43. S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph index- ing method. In ICDE'07: 23rd IEEE International Confer- ence on Data Engineering. Proceedings, pages 966–975, Istanbul, Turkey, April 2007. 30

44. R. Myers, R.C. Wilson, and E.R. Hancock. Bayesian graph edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6):628–635, 2000. 37

45. D. Justice and A. Hero. A binary linear programming formula- tion of the graph edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8):1200–1214, 2006. 31, 38

46. M. Neuhaus and H. Bunke. Self-organizing maps for learning the edit costs in graph matching. IEEE Transactions on Systems, Man and Cybernetics, Part B, 35(3):503–514, 2005. 38

47. X. Yan, P.S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD'05: ACM SIGMOD International Conference on Management of Data. Proceedings, pages 766–777, Baltimore, USA, June 2005. 30

48. O. Sander, T. Sing, I. Sommer, A.J. Low, P.K. Cheung, P.R. Harri- gan, T. Lengauer, and F.S. Domingues. Structural descriptors of gp120 V3 loop for the prediction of HIV-1 coreceptor usage. PLoS Computational Biology, 3(3):e58–e68, 2007. xii, 122, 193, 194, 195, 196

49. I. Koch. Enumerating all connected maximal common subgraphs in two graphs. Theoretical Computer Science, 250(1-2):1–30, 2001. 34

50. J.E. Thomas, C.M. Rylett, A. Carhan, N.D. Bland, R.J. Bing- ham, A.D. Shirras, A.J. Turner, and R.E. Isaac. Drosophila melanogaster NEP2 is a new soluble member of the neprilysin family of endopeptidases with implications for reproduction and renal function. Biochemical Journal, 386(2):357–366, 2005. 184

51. A. Grant, D. Lee, and C. Orengo. Progress towards mapping the universe of protein folds. Genome Biology, 5(5):107–116, 2004. 3, 4, 17

52. C. Schalon, J.S. Surgand, E. Kellenberger, and D. Rognan. A sim- ple and fuzzy method to align and compare druggable ligand- binding sites. Proteins, 71(4):1755–1778, 2008. 29

53. J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hub- bard, and C. Chothia. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology, 284(4):1201– 1210, 1998. 16

54. M. Lazarescu, H. Bunke, and S. Venkatesh. Graph matching: fast candidate elimination using machine learning techniques. LNCS. Advances in Pattern Recognition, 1876/2000:236–245, 2000. 35

55. R. Potestio, T. Aleksiev, F. Pontiggia, S. Cozzini, and C. Micheletti. ALADYN: a web server for aligning proteins by matching their large-scale motion. Nucleic Acids Research, 38(2):W41–W45, 2010. 20

56. M. Neuhaus and H. Bunke. A convolution edit kernel for error- tolerant graph matching. In ICPR'06: 18th IAPR International Conference on Pattern Recognition. Proceedings, volume 4, pages 220–223, Hong-Kong, China, August 2006a. 31, 40

57. A. Sanfeliu and K.S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13(3):353–362, 1983. 36, 68

58. J. Fodor and M. Roubens. Fuzzy Preference Modelling and Multi- criteria Decision Support. Kluwer Academic Publishers, 1994. 114

59. A. Moustafa. JAligner: open source Java implementation of Smith- Waterman. available at, 2005. 117 A.G. Murzin, S.E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investiga- tion of sequences and structures. Journal of Molecular Biology, 247(4):536–540, 1995. 18, 120, 164

60. L. David, L.R. de Beer Tjaart, T. Janet, and O. Christine. 1,000 structures and more from the MCSG. BMC Structural Biology, 11(2):2, 2011. ISSN 1472-6807. 1

61. K. Schädler and F. Wysotzki. A connectionist approach to struc- tural similarity determination as a basis of clustering, classifi- cation and feature detection. LNCS. Principles of Data Mining and Knowledge Discovery, 1263/1997:254–264, 1997. 35

62. D.C. Schmidt and L.E. Druffel. A fast backtracking algorithm to test directed graphs for isomorphism using distance matrices. Journal of the ACM, 23(3):433–445, 1976. 35

63. S.B. Needleman and C.D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970. 3, 16, 90

64. E.H. Davidson, J.P. Rast, P. Oliveri, A. Ransick, C. Calestani, C.H. Yuh, T. Minokawa, G. Amore, V. Hinman, C. Arenas- Mena, Otim O., Brown C.T., Livi C.B., Lee P.Y., Revilla R., Rust A.G., Pan Z., Schilstra M.J., Clarke P.J., Arnone M.I., Rowen L., Cameron R.A., McClay D.R., Hood L., and Bolouri H. A genomic regulatory network for development. Science, 295(5560):1669–1678, 2002. 30

65. H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3- 4):255–259, 1998. 33

66. M.L. Fernández and G. Valiente. A Graph distance metric com- bining maximum common subgraph and minimum common su- pergraph. Pattern Recognition Letters, 22(6-7):753–758, 2001. 33

67. W. Floyd. Algorithm 97: shortest path. Communications of the ACM, 5(6):345–345, 1962. ISSN 0001-0782. 90

68. D. Shasha, J.T.L. Wang, and R. Giugno. Algorithmics and ap- plications of tree and graph searching. In PODS'02: 21th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Proceedings, pages 39–52, Madison, USA, June 2002. 30

69. F. Glaser, R.J. Morris, R.J. Najmanovich, R.A. Laskowski, and J.M. Thornton. A method for localizing ligand binding pock- ets in protein structures. Proteins, 62(2):479–488, 2006. 24

70. L.G. Shapiro and R.M. Haralick. A metric for comparing rela- tional descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(1):90–94, 2009. ISSN 0162-8828. 36

71. J.R. Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM, 23(1):31–42, 1976. 34

72. A. Jagota, M. Pelillo, and A. Rangarajan. A new deterministic annealing algorithm for maximum clique. In IJCNN'00: In- ternational Joint Conference on Neural Networks. Proceedings., pages 505–508, Como, Italy, July 2000. 38 B.J. Jain, P. Geibel, and F. Wysotzki. SVM learning with the Schur-Hadamard inner product for graphs. Neurocomputing, 64:93–105, 2005. 40

73. B. Bustos, D. Keim, D. Saupe, T. Schreck, and D. Vranic. An experimental comparison of feature-based 3D retrieval meth- ods. In 3DPVT'04: International Symposium on 3D Data Pro- cessing Visualization and Transmission. Proceedings., pages 215– 222, Thessaloniki, Greece, September 2004. 156 A.A. Canutescu, A.A. Shelenkov, and R.L. Dunbrack Jr. A graph- theory algorithm for rapid protein side-chain prediction. Pro- tein Science, 12(9):2001–2014, 2003. ISSN 1469-896X. 122

74. J.E. Hopcroft and R.M. Karp. An n 5/2 algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing, 2 (4):225–231, 1973. 106

75. A. Stark and R.B. Russell. Annotation in three dimensions. PINTS: patterns in non-homologous tertiary structures. Nu- cleic Acids Research, 31(13):3341–3344, 2003. 23, 63, 112 P.N. Suganthan. Attributed relational graph matching by neural- gas networks. In NNSP'02: Neural Networks for Signal Pro- cessing X, 2000. Proceedings, volume 1, pages 366–374, Sydney, Australia, December 2002. ISBN 0780362780. 38

76. H. Huson and D. Bryant. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2): 254–267, 2006. 30

77. M. Neuhaus and H. Bunke. A probabilistic approach to learning costs for graph edit distance. In ICPR'04: 17th IAPR Interna- tional Conference on Pattern Recognition. Proceedings, volume 3, pages 389–393, Cambridge, UK, August 2004. 38

78. Z.X. Wang. A re-estimation for the total numbers of protein folds and superfamilies. Protein Engineering Design and Selection, 11 (8):621–626, 1998. 3, 4, 17 P.P. Wangikar, A.V. Tendulkar, S. Ramya, D.N. Mali, and S. Sarawagi. Functional sites in protein families uncovered via an objective and automated graph theoretic approach. Journal of Molecular Biology, 326(3):955–978, 2003. 22

79. W. Kabsch. A solution of the best rotation to relate two sets of vectors. Acta Crystallographica, 32(8):922–923, 1976. 119

80. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 26(10): 1367–1372, 2004. 34 A.D.J. Cross, R.C. Wilson, and E.R. Hancock. Genetic search for structural matching. In ECCV'96: Fourth European Confer- ence on Computer Vision-Volume I. Proceedings, pages 514–525, Camebridge, UK, April 1996. 38

81. A. Falicov and F.E. Cohen. A surface of minimum area metric for the structural comparison of proteins. Journal of Molecular Biology, 258(5):871–892, 1996. 20

82. Gärtner. A survey of kernels for structured data. SIGKKD Explorations, 5(1):49–58, 2003. 40, 86, 87, 117, 178

83. M.O. Dayhoff, R.M. Schwartz, and B.C. Orcutt. Atlas of protein sequence and structure. National Biomedical Research Founda- tion, 5(Suppl. 3):Dayhoff, M.O., 1978. 28

84. J. McGregor. Backtrack search algorithms and the maximal com- mon subgraph problem. Software -Practice and Experience, 12 (1):23–34, 1982. 35 B.D. McKay. Practical Graph Isomorphism. Citeseer, 1981. 35

85. Diplom, Biologie, Philipps-University, Marburg, Deutschland Titel der Diplomarbeit: " Biotinylierung -Eine Methode zur Detektion infek- tionsbedingter Konformationsänderungen in Membranproteinen am Beispiel von Bande 3 in P.falciparum-infizierten Erythrozyten " 10/98-10/99

86. A. Kryshtafovych, K. Fidelis, and J. Moult. CASP8 results in context of previous experiments. Proteins: Structure, Function, and Bioinformatics, 77(S9):217–228, 2009. ISSN 1097-0134. 2, 193

87. C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, and J.M. Thornton. CATH -a hierarchic classification of pro- tein domain structures. Structure, 5(8):1093–1108, 1997. 18 C.A. Orengo, A.E. Todd, and J.M. Thornton. From protein struc- ture to function. Current Opinion in Structural Biology, 9(3): 374–382, 1999. 4, 21 A.R. Ortiz, C.E.M. Strauss, and O. Olmea. MAMMOTH (match- ing molecular models obtained from theory): an automated method for model comparison. Protein Science, 11(11):2606– 2621, 2002. 19, 20 REFERENCES P.R.J. ¨ Ostergård. A new algorithm for the maximum-weight clique problem. In TW'99: 6th Twente Workshop on Graphs and Com- binatorial Optimization. Proceedings, volume 3, pages 153–156, Enschede, The Netherlands, May 1999. 28

88. Villerbu, A.M. Gaben, G. Redeuilh, and J. Mester. Cellular ef- fects of purvalanol A: A specific inhibitor of cyclin-dependent kinase activities. International Journal of Cancer, 97(6):761– 769, 2002. 159

89. X. Yan and J. Han. CloseGraph: mining closed frequent graph patterns. In KDD'03: Ninth ACM SIGKDD International Con- ference on Knowledge Discovery and Data Mining. Proceedings, pages 286–295, Washington, DC, USA, August 2003. 31, 33

90. J. Kittler and E.R. Hancock. Combining evidence in probabilis- tic relaxation. International Journal of Pattern Recognition and Artificial Intelligence, 3(1):29–51, 1989. 37

91. E.B. Krissinel and K. Henrick. Common subgraph isomorphism detection by backtracking search. Software Practice and Expe- rience, 34(6):591–607, 2004b. 18

92. M. Gerstein and M. Levitt. Comprehensive assessment of au- tomatic structural alignment against a manual standard, the SCOP classification of proteins. Protein Science, 7(2):445–456, 1998. 18, 20

93. J. Mestres. Computational chemogenomics approaches to system- atic knowledge-based drug discovery. Current Opinion in Drug Discovery & Development, 7(3):304, 2004. ISSN 1367-6733. 7

94. G. Schneider and U. Fechner. Computer-based de novo design of drug-like molecules. Nature Reviews Drug Discovery, 4(8):649– 663, 2005. ISSN 1474-1776. 5

95. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman & Co, 1979. 33, 65

96. REFERENCES A.C.M. Dumay, R.J. van der Geest, J.J. Gerbrands, E. Jansen, and J.H.C. Reiber. Consistent inexact graph matching applied to labelling coronary segments in arteriograms. In ICPR'92: 11th IAPR International Conference on Pattern Recognition. Proceed- ings, pages 439–442, The Hague, Netherlands, August 1992. 37

97. J. Larrosa and G. Valiente. Constraint satisfaction algorithms for graph pattern matching. Mathematical Structures in Computer Science, 12(4):403–422, 2002. ISSN 0960-1295. 34

98. L. Holm and C. Sander. Dali: a network tool for protein structure comparison. Trends in Biochemical Sciences, 20(11):478–480, 1995. 18

99. L. Holm and J. Park. DaliLite workbench for protein structure comparison. Bioinformatics, 16(6):566–567, 2000. 18

100. G. Vriend and C. Sander. Detection of common three-dimensional substructures in proteins. Proteins, 11(1):52–58, 1991. 20

101. R.B. Russell. Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution. Journal of Molecular Biology, 279(5):1211–1227, 1998. 23

102. M.L. Williams, R.C. Wilson, and E.R. Hancock. Deterministic search for relational graph matching. Pattern Recognition, 32 (7):1255–1271, 1999. 38

103. S.D. Copley, W.R.P. Novak, and P.C. Babbitt. Divergence of func- tion in the thioredoxin fold suprafamily: evidence for evolution of peroxiredoxins from a thioredoxin-like ancestor. Biochem- istry, 43(44):13981–13995, 2004. 4, 21

104. J. Hartshorn, M.L. Verdonk, G. Chessari, S.C. Brewerton, W.T.M. Mooij, P.N. Mortenson, and C.W. Murray. Diverse, high-quality test set for the validation of protein-ligand dock- ing performance. Journal of Medicinal Chemistry, 50(4):726– 741, 2007. 121

105. S. Pérot, O. Sperandio, M.A. Miteva, A.C. Camproux, and B.O. Villoutreix. Druggable pockets and binding site centric chem- ical space: a paradigm shift in drug discovery. Drug Discovery Today, 15(15-16):656–667, 2010. ISSN 1359-6446. 2

106. P.T. Corbett, J. Leclaire, L. Vial, K.R. West, J.L. Wietor, J.K.M. Sanders, and S. Otto. Dynamic combinatorial chemistry. Chemical Reviews, 106(9):3652–3711, 2006. 5

107. J. Huan, W. Wang, and J. Prins. Efficient mining of frequent sub- graphs in the presence of isomorphism. In ICDM'03: Third IEEE International Conference on Data Mining. Proceedings, pages 549–561, Melbourne, USA, November 2003. 33

108. T. Wang and J. Zhou. EMCSS: A new method for maximal com- mon substructure search. Journal of Chemical Information and Computer Sciences, 37(5):828–834, 1997. 35

109. Rost. Enzyme function less conserved than anticipated. Journal of Molecular Biology, 318(2):595–608, 2002. 2, 17, 21

110. REFERENCES B.T. Messmer and H. Bunke. Error-correcting graph isomorphism using decision trees. International Journal of Pattern Recogni- tion and Artificial Intelligence, 12(6):721–742, 1998. 38 B.T. Messmer and H. Bunke. A decision tree approach to graph and subgraph isomorphism detection. Pattern Recognition, 32 (12):1979–1998, 1999. 35 B.T. Messmer and H. Bunke. Efficient subgraph isomorphism detection: a decomposition approach. IEEE Transactions on Knowledge and Data Engineering, 12(2):307–323, 2002. 35

111. W.H. Tsai and K.S. Fu. Error-correcting isomorphisms of at- tributed relational graphs for pattern analysis. IEEE Trans- actions on Systems, Man and Cybernetics, 9(12):757–768, 1979. 37 W.H. Tsai and K.S. Fu. Subgraph error-correcting isomorphism for syntactic pattern recognition. IEEE Transactions on Systems, Man and Cybernetics, SMC-13((1)):48–62, 1983. 37

112. P. Adams and V.V. Brantner. Estimating the cost of new drug development: is it really 802 million dollars? Health Affairs, 25(2):420–428, 2006. 7

113. S. Govindarajan, R. Recabarren, and R.A. Goldstein. Estimat- ing the total number of protein folds. Proteins, 35(4):408–414, 1999. 3, 4, 17

114. European School of genetic medicine: " 9th Course in Bioinformatics and System Biology for Molecular Biologists " , Bertinoro, Italien 06/07

115. R. Sánchez and A. ˇ Sali. Evaluation of comparative protein struc- ture modeling by MODELLER-3. Proteins, 29(S1):50–58, 1997. ISSN 1097-0134. 21

116. I. Rechenberg and M. Eigen. Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution. Frommann-Holzboog, 1973. 73 O.C. Redfern, A. Harrison, T. Dallman, F.M. Pearl, and C.A. Orengo. CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain pro- tein structures. PLoS Computational Biology, 3(11):2334–2347, 2007. 18, 20, 116

117. Gärtner. Exponential and geometric kernels for graphs. In NIPS'02: Workshop on Unreal Data: Principles of Modeling Nonvectorial Data. Proceedings, pages 27–31, Whistler, Canada, December 2002. 40

118. P. Mahé, N. Ueda, T. Akutsu, J.L. Perret, and J.P. Vert. Ex- tensions of marginalized graph kernels. In ICML'04: 21st In- ternational Conference on Machine Learning. Proceedings, pages 70–78, Banff, Canada, July 2004. 40 C.D. Manning, P. Raghavan, and H. Sch " utze. Introduction to Information Retrieval, volume 1. Cam- bridge University Press Cambridge, 2008. 151, 156, 157, 158

119. @BULLET Marco Mernberger, Thomas Fober, Eyke Hüllermeier Efficient Similarity Retrieval for Protein Structures based on Histogram Com- parison German Conference on Bioinformatics, Braunschweig, Germany, September, 2010

120. @BULLET Marco Mernberger, Thomas Fober, Gerhard Klebe, Eyke Hüllermeier Evolutionary Construction of Multiple Graph Alignments for the Structural Anal- ysis of Biomolecules Oxford Bioinformatics 25(16): 2110 -2117, August, 2009

121. @BULLET Marco Mernberger, Thomas Fober, Vitalik Melnikov, Ralph Moritz, Eyke Hüllermeier Extension and Empirical Comparison of Graph-Kernels for the Analysis of Pro- tein Active Sites Knowledge Discovery, Data Mining, und Maschinelles Lernen, Darmstadt, Germany, Septem- ber, 2009

122. @BULLET Marco Mernberger, Thomas Fober, Ralph Moritz, Eyke Hüllermeier Graph-Kernels for the Comparative Analysis of Protein Active Sites German Conference on Bioinformatics, Halle (Saale), Germany, September, 2009

123. @BULLET Marco Mernberger, Gerhard Klebe, Eyke Hüllermeier SEGA -A Semi-Global Approach to Graph Alignment for Approximate Molec- ular Structure Comparison IEEE/ACM Transactions on Computational Biology and Bioinformatics, February 2011

124. @BULLET Imen Boukhris, Zied Elouedi, Thomas Fober, Marco Mernberger, Eyke Hüllermeier Similarity Analysis of Protein Binding Sites: A Generalization of the Maximum Common Subgraph Measure based on Quasi-Clique Detection International Conference on Intelligent Systems Design and Applications, Pisa, Italy, Novem- ber, 2009

125. X. Wang and J.T. Wang. Fast similarity search in three- dimensional structure databases. Journal of Chemical Infor- mation and Computer Sciences, 40(2):442–451, 2000. 36

126. Ye and A. Godzik. Flexible structure alignment by chain- ing aligned fragment pairs allowing twists. Bioinformatics, 19 (Suppl. 2):ii246–ii255, 2003. 20

127. M.L. Moss and F.H. Rasmussen. Fluorescent substrates for the proteinases ADAM17, ADAM10, ADAM8, and ADAM12 useful for high-throughput inhibitor screening. Analytical Biochem- istry, 366(2):144–148, 2007. 184

128. M. Thornton. From genome to function. Science, 292(5524): 2095–2097, 2001. 4, 17, 21

129. S. Schmitt, M. Hendlich, and G. Klebe. From structure to func- tion: a new approach to detect functional similarity among proteins independent from sequence and fold homology. Ange- wandte Chemie International Edition, 40(17):3141–3146, 2001. 27, 116

130. J.M. Thornton, A.E. Todd, D. Milburn, N. Borkakoti, and C.A. Orengo. From structure to function: approaches and limita- tions. Nature Structural & Molecular Biology, 7:991–994, 2000. 1

131. D. Kuhn, N. Weskamp, S. Schmitt, E. Hüllermeier, and G. Klebe. From the similarity analysis of protein cavities to the func- tional classification of protein families using Cavbase. Journal of Molecular Biology, 359(4):1023–1044, 2006. 11, 55, 56, 57, 122 H.W. Kuhn. The hungarian method for the assignment problem. Naval Research Logistics, 52(1):7–21, 2005. 103, 106, 112, 116, 194 REFERENCES M. Kuramochi and G. Karypis. Discovering frequent geometric subgraphs. Information Systems, 32(8):1101–1120, 2007. 31, 33

132. S.B. Pandit and J. Skolnick. Fr-TM-align: a new protein struc- tural alignment method based on fragment alignments and the TM-score. BMC Bioinformatics, 9(1):531–542, 2008. 19, 20 A.N. Papadopoulos and Y. Manolopoulos. Structure-based Simi- larity Search with Graph Histograms. In DEXA'99: Tenth In- ternational Workshop on Database and Expert Systems Applica- tions. Proceedings, pages 174–178, Florence, Italy, August 1999. 39 P.M. Pardalos, J. Rappe, and M.G.C. Resende. An Exact Parallel Algorithm for the Maximum Clique Problem. Citeseer, 1998. 35

133. E.D. Scheeff and J.L. Fink. Fundamentals of protein structure. In Structural Bioinformatics, chapter 2, pages 15–40. John Whiley & Sons, 2009. 4

134. K. Wang, K.C. Fan, and J.T. Horng. Genetic-based search for error-correcting graph isomorphism. IEEE Transactions on Sys- tems, Man and Cybernetics, Part B, 27(4):588–597, 1997. 38

135. Y. Lamdan and H.J. Wolfson. Geometric hashing: a general and efficient model-based recognition scheme. In ICCV'88: 2nd In- ternational Conference on Computer Vision. Proceedings, pages 238–249, Tampa, USA, December 1988. 22, 36

136. W.D. Wallis, P. Shoubridge, M. Kraetz, and D. Ray. Graph dis- tances using graph union. Pattern Recognition Letters, 22(6-7): 701–704, 2001. 33

137. A. Robles-Kelly and E.R. Hancock. Graph edit distance from spec- tral seriation. IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 27(3):365–378, 2005. 41 B.P. Roques, M.C. Fournie-Zaluski, E. Soroca, J.M. Lecomte, B. Malfroy, C. Llorens, and J.C. Schwartz. The enkephalinase inhibitor thiorphan shows antinociceptive activity in mice. Na- ture, 288(5788):286–288, 1980. 184

138. X. Yan, P.S. Yu, and J. Han. Graph indexing: a frequent structure- based approach. In SIGMOD'04: ACM SIGMOD International Conference on Management of Data. Proceedings, pages 335–346, Paris, France, June 2004. 30

139. S.V.N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K.M. Borgwardt. Graph kernels. Journal of Machine Learning Re- search, 11:1201–1242, 2010. 30, 41

140. L. Ralaivola, S.J. Swamidass, H. Saigo, and P. Baldi. Graph ker- nels for chemical informatics. Neural Networks, 18(8):1093– 1110, 2005. 31, 41

141. H. Bunke and X. Jiang. Graph Matching and Similarity. Kluwer Academic Publishers, 2000. 8, 31, 33

142. C. Irniger and H. Bunke. Graph matching: filtering large databases of graphs using decision trees. In GbR'01: 3rd IAPR-TC15 Workshop on Graph-based Representation in Pattern Recognition. Proceedings, pages 239–249, Ischia, Italy, May 2001. 35 V.A. Ivanisenko, S.S. Pintus, D.A. Grigorovich, and N.A. Kolchanov. PDBSiteScan: a program for searching for ac- tive, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Research, 32(Web Server Issue):W549–W554, 2004. 23

143. J.L. Gross and J. Yellen. Graph Theory and its Applications. CRC Press, 2006. ISBN 158488505X. 52

144. Geburtsort: Herborn Familienstand: verheiratet Email: Beruf 03/07-heute Wissenschaftlicher Mitarbeiter, AG KEBI, Prof. Hüllermeier, FB Mathe- matik und Informatik, Philipps-Universität, Marburg, Deutschland 11/05-05/09

145. J.W. Raymond, E.J. Gardiner, and P. Willett. Heuristics for simi- larity searching of chemical graphs using a maximum common edge subgraph algorithm. Journal of Chemical Information and Computer Sciences, 42(2):305–316, 2002. 31

146. R. Eddy. Hidden Markov models. Current Opinion in Structural Biology, 6(3):361–365, 1996. 16 S.R. Eddy. Profile hidden Markov models. Bioinformatics, 14(9): 755–763, 1998. 16, 28

147. A. Krogh and M. Brown. Hidden Markov models in computational biology. Journal of Molecular Biology, 235(2):1501–1531, 1994. 16

148. W. Tian and J. Skolnick. How well is enzyme function conserved as a function of pairwise sequence identity? Journal of Molecular Biology, 333(4):863–882, 2003. 2, 17

149. J. De Jong, J. Goudsmit, W. Keulen, B. Klaver, W. Krone, M. Tersmette, and A. De Ronde. Human immunodeficiency virus type 1 clones chimeric for the envelope V3 domain differ in syncytium formation and replication capacity. Journal of Virology, 66(2):757–765, 1992. 194 E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959. 19

150. T.F. Smith and M.S. Waterman. Identification of common molec- ular subsequences. Journal of Bolecular Biology, 147:195–197, 1981. 4, 16, 117 A.J. Smola and R. Kondor. Kernels and regularization on graphs. In COLT'03: 16th Annual Conference on Learning Theory and 7th Kernel Workshop. Proceedings, pages 144–158, Washington, DC, USA, August 2003. 40

151. K. Kinoshita and H. Nakamura. Identification of protein biochem- ical functions by similarity search using the molecular surface database eF-site. Protein Science, 12(8):1589–1595, 2003. 7, 8, 27, 30

152. Y.Y. Tseng and W.H. Li. Identification of protein functional sur- faces by the concept of a split pocket. Proteins, 76(4):959–976, 2009. 25

153. M. Silberstein, S. Dennis, L. Brown, T. Kortvelyesi, K. Clodfelter, and S. Vajda. Identification of substrate binding sites in en- zymes by computational solvent mapping. Journal of Molecular Biology, 332(5):1095–1113, 2003. 25

154. K. Kinoshita and H. Nakamura. Identification of the ligand bind- ing sites on the molecular surface of proteins. Protein Science, 14(3):711–718, 2005. 27, 30, 34, 65, 116

155. L. Xu and E. Oja. Improved simulated annealing, boltzmann ma- chine, and attributed graph matching. In EURASIP'90: Work- shop on Neural Networks. Proceedings, pages 151–160, Sesimbra, Portugal, February 1990. 38 R.R. Yager. On ordered weighted averaging aggregation operators inmulticriteria decisionmaking. IEEE Transactions on Systems, Man and Cybernetics, 18(1):183–190, 1988. 82

156. L. Jaroszewski, A. Godzik, and L. Rychlewski. Improving the quality of twilight-zone alignments. Protein Science, 9(8):1487– 1496, 2000. ISSN 1469-896X. 21 L.J. Jensen, R. Gupta, H.H. Staerfeldt, and S. Brunak. Predic- tion of human protein function according to Gene Ontology categories. Bioinformatics, 19(5):635–642, 2003. 2

157. J.D. Watson, G.J. Bartlett, and J.M. Thornton. Inferring protein function from structure. In Structural Bioinformatics, chap- ter 21, pages 515–537. John Whiley & Sons, 2009. 4, 5, 8, 16 E.C. Webb. Enzyme Nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochem- istry and Molecular Biology on the Nomenclature and Classifica- tion of Enzymes. Academic Press, 1992. 164

158. J. Lafferty and G. Lebanon. Information diffusion kernels. Ad- vances in Neural Information Processing Systems, 500(2):391– 398, 2003. 40

159. D. Grobelny, L. Poncz, and R.E. Galardy. Inhibition of human skin fibroblast collagenase, thermolysin, and Pseudomonas aerugi- nosa elastase by peptide hydroxamic acids. Biochemistry, 31 (31):7152–7154, 1992. 184

160. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004. 39

161. H. Kashima, K. Tsuda, and A. Inokuchi. Kernels for graphs. In Kernel Methods in Computational Biology, chapter 7, pages 155– 170. MIT Press, 2004. 40

162. T. Sing, N. Beerenwinkel, and T. Lengauer. Learning mixtures of localized rules by maximizing the area under the ROC curve. In ROCAI'04: First International Workshop on ROC Analysis in Artificial Intelligence. Proceedings, volume 22, pages 96–98, Valencia, Spain, August 2004. 194 REFERENCES A.P. Singh and D.L. Brutlag. Hierarchical protein structure su- perposition using both secondary structure and atomic repre- sentations. In ISMB'97: 16th International Conference on In- telligence Systems for Molecular Biology. Proceedings, volume 5, pages 284–293, Halkidiki, Greece, June 1997. 19, 20

163. Umeyama. Least-squares estimation of transformation parame- ters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4):376–380, 1991. ISSN 0162-8828. 27

164. M. Hendlich, F. Rippmann, and G. Barnickel. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. Journal of Molecular Graphics and Modelling, 15(6): 359–363, 1997. 11, 25, 53

165. J.E. Hopcroft and J.K. Wong. Linear time algorithm for isomor- phism of planar graphs. In STOC'74: Sixth Annual ACM Sym- posium on Theory of Computing. Proceedings, pages 172–184, Seattle, Washington, July 1974. 33

166. J. Berg and M. Lässig. Local graph alignment and motif search in biological networks. PNAS, 101(41):14689–14694, 2004. 9, 30 H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne. The protein data bank. Nucleic Acids Research, 28(1):235–242, 2000. 1, 2, 22, 163, 164, 165

167. C. Tonnelier, P. Jauffret, T. Hanser, and G. Kaufmann. Machine learning of generic reactions: 3. an efficient algorithm for max- imal common substructure determination. Tetrahedron Com- puter Methodology, 3(6):351–358, 1990. 34

168. K. Katoh, K. Misawa, K. Kuma, and T. Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14):3059–3066, 2002. 16

169. K. Mason, N.M. Patel, A. Ledel, C.C. Moallemi, and E.A. Wint- ner. Mapping protein pockets through their potential small- molecule binding volumes: QSCD applied to biological protein structures. Journal of Computer-aided Molecular Design, 18(1): 55–70, 2004. 26

170. @BULLET Thomas Fober, Eyke Hüllermeier, Marco Mernberger Evolutionary Construction of Multiple Graph Alignments for the Structural Anal- ysis of Biomolecules German Conference on Bioinformatics, Dresden, Germany, September, 2008

171. H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels be- tween labeled graphs. In ICML'03: 20th International Confer- ence on Machine Learning. Proceedings, pages 321–328, Wash- ington, DC, USA, August 2003. 40

172. O. Dror, H. Benyamini, R. Nussinov, and H. Wolfson. MASS: multiple structural alignment by secondary structures. Bioin- formatics, 19(1):i95–104, 2003. 18, 21 A.Z. Dudek, T. Arodz, and J. Galvez. Computational meth- ods in developing quantitative structure-activity relationships (QSAR): a review. Combinatorial Chemistry & High-Throughput Screening, 9(3):213–228, 2006. ISSN 1386-2073. 5

173. M. Moll and L.E. Kavraki. Matching of structural motifs using hashing on residue labels and geometric filtering for protein function prediction. In CSB'08: 7th Conference on Computa- tional Systems Bioinformatics. Proceedings, pages 157–169, Palo Alto, USA, August 2008. 23 S.D. Mooney, M.H.P. Liang, R. DeConde, and R.B. Altman. Struc- tural characterization of proteins using residue environments. Proteins, 61(4):741–747, 2005. 26 G.M. Morris, D.S. Goodsell, R.S. Halliday, R. Huey, W.E. Hart, R.K. Belew, and A.J. Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry, 19(14): 1639–1662, 1998. ISSN 1096-987X. 6

174. J.W. Raymond and P. Willett. Maximum common subgraph iso- morphism algorithms for the matching of chemical structures. Journal of Computer-Aided Molecular Design, 16(7):521–533, 2002. 33

175. J.S. Fetrow and J. Skolnick. Method for prediction of protein func- tion from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T˜T˜1 ribonucleases. Journal of Molecular Biology, 281(5):949– 968, 1998. 22

176. M. Karplus and J.A. McCammon. Molecular dynamics simula- tions of biomolecules. Nature Structural & Molecular Biology, 9 (9):646–652, 2002. 5

177. T.R. Hagadone. Molecular substructure similarity searching: effi- cient retrieval in two-dimensional structure databases. Journal of Chemical Information and Computer Sciences, 32(5):515–521, 1992. 35

178. H. Leonov, J.S.B. Mitchell, and I.T. Arkin. Monte Carlo estima- tion of the number of possible protein folds: effects of sampling bias and folds distributions. Proteins, 51(3):352–359, 2003. 3, 4, 17 V.I. Levenshtein. Binary codes capable of correcting deletions, in- sertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966. 90

179. R.B. Russell and G.J. Barton. Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins, 14(2):309–323, 1992. ISSN 1097-0134. 20

180. CE-MC: a multiple protein structure alignment server. Nucleic Acids Research, 32(Web Server Issue):W100–W103, 2004. 19, 21

181. M. Shatsky, R. Nussinov, and H.J. Wolfson. MultiProt -a multiple protein structural alignment algorithm. LNCS. Algorithms in Bioinformatics, 2452/2002:235–250, 2002b. 19

182. N. Leibowitz, R. Nussinov, and H.J. Wolfson. MUSTA -a general, efficient, automated method for multiple structure alignment and detection of common motifs: application to proteins. Jour- nal of Computational Biology, 8(2):93–121, 2001. 19, 21

183. W.P. Russ, D.M. Lowery, P. Mishra, M.B. Yaffe, and R. Ran- ganathan. Natural-like function in artificial WW domains. Na- ture, 437(7058):579–583, 2005. 2, 17

184. M. Pellecchia, D.S. Sem, and K. W " uthrich. NMR in drug discovery. Nature Reviews Drug Discov- ery, 1(3):211–219, 2002. ISSN 1474-1776. 1

185. H. Nakamura and S. Nishida. Numerical calculations of electro- static potentials of protein-solvent systems by the self consis- tent boundary method. Journal of the Physical Society of Japan, 56:1609–1622, 1987. 27

186. N. Nagano, C.A. Orengo, and J.M. Thornton. One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions. Journal of Molecular Biology, 321(5):741–765, 2002. ISSN 0022- 2836. 4, 21

187. Y. Lamdan, J.T. Schwartz, and H.J. Wolfson. On recognition of 3D objects from 2D images. In ICRA'88: 1988 IEEE Inter- national Conference on Robotics and Automation. Proceedings, pages 1407–1413, Philadelphia, USA, April 1988. 22

188. H. Bunke, X. Jiang, and A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1):13–25, 2000. 33

189. H. Fröhlich, J.K. Wegner, F. Sieker, and A. Zell. Optimal as- signment kernels for attributed molecular graphs. In ICML'05: 22nd International Conference on Machine Learning. Proceed- ings, volume 119, pages 225–232, Bonn, Germany, August 2005. 41 N. Funabiki and J. Kitamichi. A two-stage discrete optimization method for largest common subgraph problems. IEICE Trans- actions on Information and Systems, 82(8):1145–1153, 1999. 35

190. E. Takimoto and M.K. Warmuth. Path kernels and multiplicative updates. The Journal of Machine Learning Research, 4:773–818, 2003. 40

191. H. Umezawa, T. Aoyagi, H. Morishima, M. Matsuzaki, and M. Hamada. Pepstatin, a new pepsin inhibitor produced by actinomycetes. Journal of Antibiotics, 23(5):259–262, 1970. 163

192. T. Langer and R.D. Hoffmann. Pharmacophores And Pharma- cophore Searches. Vch Verlagsgesellschaft Mbh, 2006. ISBN 3527312501. 5, 8

193. D.G. Levitt and L.J. Banaszak. POCKET: A computer graphies method for identifying and displaying protein cavities and their surrounding amino acids. Journal of Molecular Graphics, 10(4): 229–234, 1992. 24

194. M. Weisel, E. Proschak, and G. Schneider. PocketPicker: analy- sis of ligand binding-sites with shape descriptors. Chemistry Central Journal, 1(7):1–17, 2007. 25

195. J.A. Capra and M. Singh. Predicting functionally important residues from sequence conservation. Bioinformatics, 23(15): 1875–1882, 2007. ISSN 1367-4803. 26

196. D. Lee, O. Redfern, and C. Orengo. Predicting protein function from sequence and structure. Nature Reviews Molecular Cell Biology, 8(12):995–1005, 2007. 2, 17

197. J.C. Whisstock and A.M. Lesk. Prediction of protein function from protein sequence and structure. Quarterly Reviews of Bio- physics, 36(3):307–340, 2004. 2, 17

198. M. Gribskov, A.D. McLachlan, and D. Eisenberg. Profile anal- ysis: detection of distantly related proteins. PNAS, 84(13): 4355–4358, 1987. 16 REFERENCES H.M. Grindley, P.J. Artymiuk, D.W. Rice, and P. Willett. Iden- tification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm. Journal of Molecular Biology, 229(3):707–721, 1993. 18, 30

199. J.M. Thornton, C.A. Orengo, A.E. Todd, and F.M.G. Pearl. Pro- tein folds, functions and evolution. Journal of Molecular Biol- ogy, 293(2):333–342, 1999. 21

200. M.L. Verdonk, P.N. Mortenson, R.J. Hall, M.J. Hartshorn, and C.W. Murray. Protein-ligand docking against non-native pro- tein conformers. Journal of Chemical Information and Modeling, 48(11):2214–2225, 2008. ISSN 1549-9596. 120, 121

201. I.N. Shindyalov and P.E. Bourne. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering Design and Selection, 11(9):739–747, 1998. 19 I.N. Shindyalov and P.E. Bourne. A database and tools for 3-D protein structure comparison and alignment using the combi- natorial extension (CE) algorithm. Nucleic Acids Research, 29 (1):228–229, 2001. 19

202. Owen, L.R. Thompson, I. Tran, M.F. Tutt, and T. Young. Rapid assessment of a novel series of selective CB2 agonists us- ing parallel synthesis protocols: a lipophilic efficiency (LipE) analysis. Bioorganic & Medicinal Chemistry Letters, 19(15): 4406–4409, 2009. ISSN 0960-894X. 7

203. H. Bunke and BT Messmer. Recent advances in graph matching. In Spatial Computing: Issues in Vision, Multimedia and Visu- alization Technologies, pages 169–204. World Scientific, 1997. ISBN 9810229240. 35

204. Shulman-Peleg, R. Nussinov, and H.J. Wolfson. Recognition of functional sites in protein structures. Journal of Molecular Biology, 339(3):607–633, 2004. xii, 7, 27, 28, 65, 121, 149, 150, 152, 154, 207

205. P.N. Suganthan and H. Yan. Recognition of handprinted Chinese characters by constrained graph matching. Image and Vision Computing, 16(3):191–201, 1998. ISSN 0262-8856. 38

206. T. Benchetrit, M.C. Fournié-Zaluski, and B.P. Roques. Relation- ship between the inhibitory potencies of thiorphan and retroth- iorphan enantiomers on thermolysin and neutral endopeptidase 24.11 and their interactions with the thermolysin active site by computer modelling. Biochemical and Biophysical Research Communications, 147(3):1034–1040, 1987. 184

207. M. Hendlich, A. Bergner, J. Günther, and G. Klebe. Relibase: de- sign and development of a database for comprehensive analysis of protein-ligand interactions. Journal of Molecular Biology, 326 (2):607–620, 2003. ISSN 0022-2836. 119

208. F. Teichert, U. Bastolla, and M. Porto. SABERTOOTH: protein structural alignment based on a vectorial structure representa- tion. BMC Bioinformatics, 8(1):425–442, 2007. 19, 20

209. Y. Tian, R.C. McEachin, C. Santos, D.J. States, and J.M. Patel. SAGA: a subgraph matching tool for biological graphs. Bioin- formatics, 23(2):232–239, 2007. 30 A.E. Todd, C.A. Orengo, and J.M. Thornton. Evolution of function in protein superfamilies, from a structural perspective. Journal of Molecular Biology, 307(4):1113–1143, 2001. 2, 17, 21

210. Y. Zhang and J. Skolnick. Scoring function for automated assess- ment of protein structure template quality. Proteins, 57(4): 702–710, 2004. 19

211. W.R. Pearson. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics, 11(3):635–650, 1991. 2, 16

212. E. Krissinel and K. Henrick. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimen- sions. Acta Crystallographica Section D: Biological Crystallogra- phy, 60(1:12):2256–2268, 2004a. 18, 20

213. J.H. Chan, J.S. Hong, L.F. Kuyper, D.P. Baccanari, S.S. Joyner, R.L. Tansik, C.M. Boytos, and S.K. Rudolph. Selective in- hibitors of Candida albicans dihydrofolate reductase: activity and selectivity of 5-(arylthio)-2, 4-diaminoquinazolines. Jour- nal of Medicinal Chemistry, 38(18):3608–3616, 1995. 160

214. S. Günter and H. Bunke. Self-organizing map for clustering in the graph domain. Pattern Recognition Letters, 23(4):405–417, 2002. 38

215. H.X. Zou, X.L. Xie, U. Linne, X.D. Zheng, and S.M. Li. Simultane- ous C7-and N1-prenylation of cyclo-l-Trp-l-Trp catalyzed by a prenyltransferase from Aspergillus oryzae. Organic & Biomolec- ular Chemistry, 8(13):3037–3044, 2010. ISSN 1477-0520. 3

216. S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994. 9

217. M.L. Connolly. Solvent-accessible surfaces of proteins and nucleic acids. Science, 221(4612):709–713, 1983. 27

218. A. Orengo and W.R. Taylor. SSAP: sequential structure align- ment program for protein structure comparison. Methods in Enzymology, 266:617–35, 1996. 18, 20, 128

219. T. Naumann and H. Matter. Structural classification of protein kinases using 3D molecular interaction field analysis of their li- gand binding sites: target family landscapes. Journal of Medic- inal Chemistry, 45(12):2366–2378, 2002. 7

220. L.G. Shapiro and R.M. Haralick. Structural descriptions and in- exact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3(5):504–519, 1981. 37

221. R.C. Wilson and E.R. Hancock. Structural matching by discrete relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):634–648, 1997. 37 REFERENCES T.C. Wood and W.R. Pearson. Evolution of protein sequences and structures. Journal of Molecular Biology, 291(4):977–995, 1999. 16

222. W.J. Christmas, J. Kittler, and M. Petrou. Structural match- ing in computer vision using probabilistic relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (8):749–764, 1995. 37

223. J. Zhu, H. Fan, H. Liu, and Y. Shi. Structure-based ligand design for flexible proteins: application of new F-DycoBlock. Jour- nal of Computer-aided Molecular Design, 15(11):979–996, 2001. ISSN 0920-654X. 6

224. M.A. Marti-Renom, E. Capriotti, I.N. Shindyalov, and P.E. Bourne. Structure comparison and alignment. In Structural Bioinformatics, chapter 16, pages 397–418. John Whiley & Sons, 2009. 8

225. M. Jost, G. Zocher, S. Tarcz, M. Matuschek, X. Xie, S.M. Li, and T. Stehle. Structure-function analysis of an enzymatic prenyl transfer reaction identifies a reaction chamber with modifiable specificity. Journal of the American Chemical Society, 132(50): 1788–1798, 2010. ISSN 0002-7863. 3

226. T. Fober, S. Glinca, G. Klebe, and E. Hüllermeier. Superposition and alignment of labeled point clouds. IEEE/ACM Transac- tions on Computational Biology and Bioinformatics, 99:in print, 2011. ISSN 1545-5963. 53

227. REFERENCES R.B. Russell, P.D. Sasieni, and M.J.E. Sternberg. Supersites within superfolds. Binding site similarity in the absence of homology. Journal of Molecular Biology, 282(4):903–918, 1998. ISSN 0022- 2836. 21

228. M.L. Verdonk, J.C. Cole, P. Watson, V. Gillet, and P. Willett. Su- perstar: improved knowledge-based interaction fields for pro- tein binding sites. Journal of Molecular Biology, 307(3):841– 859, 2001. 25

229. Ferre, G. Ausiello, A. Zanzoni, and M. Helmer-Citterich. SUR- FACE: a database of protein surface regions for functional an- notation. Nucleic Acids Research, 32(Suppl.1):240–244, 2004. 28

230. R.A. Laskowski. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. Journal of Molecular Graphics, 13(5):323–330, 1995. 24

231. J.M. Sasin, A. Godzik, and J.M. Bujnicki. SURF'S UP!—protein classification by surface comparisons. Journal of Biosciences, 32(1):97–100, 2007. 29

232. J.F. Gibrat, T. Madej, and S.H. Bryant. Surprising similarities in structure comparison. Current Opinion in Structural Biology, 6 (3):377–385, 1996. 18, 20

233. C. Notredame, D.G. Higgins, and J. Heringa. T-coffee: a novel method for fast and accurate multiple sequence alignments. Journal of Molecular Biology, 302(1):205–217, 2000. 2, 16

234. K.P. Peters, J. Fauck, and C. Frommel. The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria. Journal of Molecular Biology, 256(1):201–214, 1996. 6, 24, 29, 67 B.J. Polacco and P.C. Babbitt. Automated discovery of 3D motifs for protein function annotation. Bioinformatics, 22(6):723–730, 2006. 21

235. C.T. Porter, G.J. Bartlett, and J.M. Thornton. The catalytic site atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Research, 32 (Database issue):129–133, 2004. 22, 171

236. M. Wagener and J. Gasteiger. The determination of maximum common substructures by a genetic algorithm: application in synthesis design and for the structural analysis of biological activity. Angewandte Chemie International Edition in English, 33(11):1189–1192, 1994. 35 A.C. Wallace, N. Borkakoti, and J.M. Thornton. TESS: a geomet- ric hashing algorithm for deriving 3D coordinate templates for searching structural databases. application to enzyme active sites. Protein Science, 6(11):2308–2323, 1997. 22

237. L. Holm and C. Sander. The FSSP database: fold classification based on structure-structure alignment of proteins. Nucleic Acids Research, 24(1):206–209, 1996. 18

238. R.C. Read and D.G. Corneil. The graph isomorphism disease. Journal of Graph Theory, 1(1):339–363, 1977. 33

239. J. Köbler, U. Schöning, and J. Torán. The Graph Isomorphism Problem: Its Structural Complexity. Birkhauser Verlag, 1994. 33

240. R. Regoes and S. Bonhoeffer. The HIV coreceptor switch: a population dynamical perspective. TRENDS in Microbiology, 13(6):269–277, 2005. 193

241. M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. The KEGG resource for deciphering the genome. Nucleic Acids Research, 32(90001):277–280, 2004. 9, 30 R.M. Karp. Reducibility among combinatorial problems. In M. Jünger, T.M. Liebling, G.L. Naddef, D. Nemhauser, W.R.

242. X. Liu, K. Fan, and W. Wang. The number of protein folds and their distribution over families in nature. Proteins, 54(3):491– 499, 2004. 3, 4, 17

243. L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998, 1998. 9

244. R. Wang, X. Fang, Y. Lu, and S. Wang. The pdbbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. Journal of Medici- nal Chemistry, 47(12):2977–2980, 2004. 165

245. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, E. De Castro, P.S. Langendijk-Genevaux, M. Pagni, and C.J.A. Sigrist. The PROSITE database. Nucleic Acids Research, 34(Suppl. 1): D227–D230, 2006. ISSN 0305-1048. 28

246. H.M. Berman, J.D. Westbrook, M.J. Gabanyi, W. Tao, R. Shah, A. Kouranov, T. Schwede, K. Arnold, F. Kiefer, L. Bordoli, J. Kopp, M. Podvinec, P.D. Adams, L.G. Carter, W. Minor, R. Nair, and J. La Baer. The protein structure initiative structural genomics knowledgebase. Nucleic Acids Research, 37 (Suppl. 1):D365–D368, 2009. ISSN 0305-1048. 2

247. R.H. Lathrop. The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Engineer- ing Design and Selection, 7(9):1059–1068, 1994. 18 A.T.R. Laurie and R.M. Jackson. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bio- informatics, 21(9):1908–1916, 2005. 25

248. R. Kondor and K.M. Borgwardt. The skew spectrum of graphs. In ICML'08: 25th International Conference on Machine Learning. Proceedings, pages 496–503, Helsinki, Finnland, July 2008. 41 R.I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. In ICML'02: 19th International Conference on Machine Learning. Proceedings, pages 315–322, Sydney, Aus- tralia, July 2002. 40

249. M. Jambon, O. Andrieu, C. Combet, G. Deleage, F. Delfaud, and C. Geourjon. The SuMo server: 3D search for protein func- tional sites. Bioinformatics, 21(20):3929–3930, 2005. 22

250. UniProt-Consortium. The universal protein resource (UniProt) 2009. Nucleic Acids Research, 37:D169–d174, 2009. 15, 180

251. D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(3):265–298, 2004. 9, 32

252. T. Madej, J.F. Gibrat, and S.H. Bryant. Threading a database of protein cores. Proteins, 23(3):356–369, 1995. 18, 20, 30

253. D. Fischer, H. Wolfson, S.L. Lin, and R. Nussinov. Three- dimensional, sequence order-independent structural compari- son of a serine protease against the crystallographic database reveals active site similarities: potential implications to evo- lution and to protein folding. Protein Science, 3(5):769–778, 1994. 22

254. M. Rarey, B. Kramer, and T. Lengauer. Time-efficient docking of flexible ligands into active sites of proteins. In ISMB'95: 3rd International Conference on Intelligence Systems in Molecu- lar Biology. Proceedings, volume 3, pages 300–308, Cambridge, UK, July 1995. 25 N.D. Rawlings, A.J. Barrett, and A. Bateman. Merops: the pepti- dase database. Nucleic Acids Research, 38(suppl 1):D227–D233, 2010. 163

255. Chen, ZL Ji, and Y.Z. Chen. TTD: therapeutic target database. Nucleic Acids Research, 30(1):412–415, 2002. ISSN 0305-1048. 6

256. F. Zhu, B.C. Han, P. Kumar, X.H. Liu, X.H. Ma, X.N. Wei, L. Huang, Y.F. Guo, L.Y. Han, C.J. Zheng, and Y. Chen. Up- date of TTD: therapeutic target database. Nucleic Acids Re- search, 38(Database issue):D787–D791, 2010. ISSN 0305-1048. 6

257. L. Gregory and J. Kittler. Using graph search techniques for con- textual colour retrieval. LNCS. Structural, Syntactic, and Sta- tistical Pattern Recognition, 2396/2002:193–213, 2002. 37

258. M. Gerstein and M. Levitt. Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures. In ISMB'96: 8th International Conference on In- telligent Systems for Molecular Biology. Proceedings, volume 4, pages 59–67, La Jolla, USA, August 1996. 18, 20

259. B.Y. Chen and B. Honig. VASP: a volumetric analysis of sur- face properties yields insights into protein-ligand binding speci- ficity. PLoS Computational Biololgy, 6(8):e1000881, 2010. 29 B.Y. Chen, V.Y. Fofanov, D.H. Bryant, B.D. Dodson, D.M. Kris- tensen, A.M. Lisewski, M. Kimmel, O. Lichtarge, and L.E. Kavraki. The MASH pipeline for protein function prediction and an algorithm for the geometric refinement of 3D motifs. Journal of Computational Biology, 14(6):791–816, 2007. 23

260. K. Shearer, S. Venkatesh, and H. Bunke. Video sequence matching via decision tree path following. Pattern Recognition Letters, 22 (5):479–492, 2001. 35

261. R.C. Edgar. MUSCLE: multiple sequence alignment with high ac- curacy and high throughput. Nucleic Acids Research, 32(5): 1792–1797, 2004. 2, 16, 122

262. T. Kawabata and K. Nishikawa. Protein structure comparison us- ing the Markov transition model of evolution. Proteins, 41(1): 108–122, 2000. 19, 20

263. Levi. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo, 9(1972):341–352, 1973. 49

264. Knapp. Connectivity independent protein-structure alignment: a hierarchical approach. BMC Bioinformatics, 7(1):510–530, 2006. 19, 21

265. Xie and P.E. Bourne. A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics, 8(Suppl. 4):S9, 2007. ISSN 1471-2105. 28

266. M. Veeramalai, Y. Ye, and A. Godzik. TOPS++ FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model. BMC Bioinformatics, 9(1):358–370, 2008. 20

267. C. Chothia and A.M. Lesk. The relation between the divergence of sequence and structure in proteins. The EMBO journal, 5 (4):823–826, 1986. 16

268. Hark Gan, R.A. Perlow, S. Roy, J. Ko, M. Wu, J. Huang, S. Yan, A. Nicoletta, J. Vafai, D. Sun, L. Wang, J.E. Noah, S. Pasquali, and T. Schlick. Analysis of protein sequence/structure simi- larity relationships. Biophysical Journal, 83(5):2781–2791, 2002. 16

269. M. Westby, M. Lewis, J. Whitcomb, M. Youle, A.L. Pozniak, I.T. James, T.M. Jenkins, M. Perros, and E. Van Der Ryst. Emer- gence of CXCR4-using human immunodeficiency virus type 1 (HIV-1) variants in a minority of HIV-1-infected patients fol- lowing treatment with the CCR5 antagonist maraviroc is from a pretreatment CXCR4-using virus reservoir. Journal of Virol- ogy, 80(10):4909–4920, 2006. 194

270. R.A. Laskowski, N.M. Luscombe, M.B. Swindells, and J.M. Thorn- ton. Protein clefts in molecular recognition and function. Pro- tein Science, 5(12):2438–2452, 1996. 6, 24, 67

271. J. Liang, H. Edelsbrunner, and C. Woodward. Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Science, 7(9):1884– 1897, 1998b. ISSN 0961-8368. 25 C.A. Lipinski, F. Lombardo, B.W. Dominy, and P.J. Feeney. Ex- perimental and computational approaches to estimate solubil- ity and permeability in drug discovery and development set- tings. Advanced Drug Delivery Reviews, 23(1-3):3–25, 1997. ISSN 0169-409X. 7

272. Xie and P.E. Bourne. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile–profile alignments. PNAS, 105(14):5441–5446, 2008. 28, 30, 34

273. C. Huang, M. Tang, M.Y. Zhang, S. Majeed, E. Montabana, R.L. Stanfield, D.S. Dimitrov, B. Korber, J. Sodroski, I.A. Wilson, R. Wyatt, and P.D. Kwong. Structure of a v3-containing hiv-1 gp120 core. Science, 310(5750):1025–1028, 2005. 122

274. A. Guerler and E.W. Knapp. Novel protein folds and their nonse- quential structural analogs. Protein Science, 17(8):1374–1382, 2008. 18

275. B. Qian, S. Raman, R. Das, P. Bradley, A.J. McCoy, R.J. Read, and D. Baker. High-resolution structure prediction and the crystallographic phase problem. Nature, 450(7167):259–264, 2007. 2

276. Y.Y. Tseng, J. Dundas, and J. Liang. Predicting protein function and binding profile via matching of local evolutionary and ge- ometric surface patterns. Journal of Molecular Biology, 387(2): 451–464, 2009. 27

277. T.I. Zarembinski, L.W. Hung, H.J. Mueller-Dieckmann, K.K. Kim, H. Yokota, R. Kim, and S.H. Kim. Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. PNAS, 95(26):15189–15193, 1998. 17, 21

278. H.H. Guo, J. Choe, and L.A. Loeb. Protein tolerance to random amino acid change. PNAS, 101(25):9205, 2004. 2, 17

279. R. Nussinov and H.J. Wolfson. Efficient detection of three- dimensional structural motifs in biological macromolecules by computer vision techniques. PNAS, 88(23):10495–10499, 1991. 18, 22

280. I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.M. Kim, and D. Eisenberg. DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interac- tions. Nucleic Acids Research, 30(1):303–305, 2002. 30

281. J. Gough and C. Chothia. SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, align- ments and genome assignments. Nucleic Acids Research, 30(1): 268–272, 2002. 18

282. S. Schmitt, D. Kuhn, and G. Klebe. A new method to detect related function among proteins independent of sequence and fold homology. Journal of Molecular Biology, 323(2):387–406, 2002. 7, 11, 25, 27, 30, 34, 49, 55, 57, 60, 62, 65, 116, 119, 122, 128, 163

283. I.K. McDonald and J.M. Thornton. Satisfying hydrogen bonding potential in proteins. Journal of Molecular Biology, 238(5):777– 793, 1994. ISSN 0022-2836. 54

284. M. Jambon, A. Imberty, G. Deleage, and C. Geourjon. A new bioinformatic approach to detect common 3D sites in protein structures. Proteins, 52(2):137–145, 2003. 22, 30