Publication Server of the Marburg University Library

Title: Scientific Workflows for Metabolic Flux Analysis
Author: Dalman, Tolga
Contributors: Freisleben, Bernd (Prof. Dr.)
Published: 2017
URI: https://archiv.ub.uni-marburg.de/diss/z2017/0249
URN: urn:nbn:de:hebis:04-z2017-02499
DOI: https://doi.org/10.17192/z2017.0249
DDC: 004 Computer science
Title (translated): Scientific Workflows für die metabolische Stoffflussanalyse
Publication date: 2017-05-22
License: https://rightsstatements.org/vocab/InC-NC/1.0/

Keywords:
scientific workflows, cloud computing, web services, workflow scheduling, software, metabolic flux analysis, systems biology

Summary:
Metabolic engineering is a highly interdisciplinary research domain that interfaces biology, mathematics, computer science, and engineering. Metabolic flux analysis with carbon tracer experiments (13C-MFA) is a particularly challenging metabolic engineering application that consists of several tightly interwoven building blocks such as modeling, simulation, and experimental design. Although several general-purpose workflow solutions have emerged in recent years to support the realization of complex scientific applications, these approaches are only partially transferable to 13C-MFA workflows. Whereas problems in other research fields (e.g., bioinformatics) are primarily centered around scientific data processing, 13C-MFA workflows have more in common with business workflows. For instance, many bioinformatics workflows are designed to identify, compare, and annotate genomic sequences by "pipelining" them through standard tools like BLAST. Typically, the next workflow task in the pipeline can be determined automatically from the outcome of the previous step. Five computational challenges have been identified in conducting 13C-MFA studies: organization of heterogeneous data, standardization of processes and unification of tools and data, interactive workflow steering, distributed computing, and service orientation. The outcome of this thesis is a scientific workflow framework (SWF) that is custom-tailored to the specific requirements of 13C-MFA applications. The proposed approach, namely designing the SWF as a collection of loosely coupled modules that are glued together with web services, eases the realization of 13C-MFA workflows by offering several features. By design, existing tools are integrated into the SWF using web service interfaces and foreign programming language bindings (e.g., Java or Python).
Although the attributes "easy-to-use" and "general-purpose" are rarely associated with distributed computing software, the presented use cases show that the proposed Hadoop MapReduce framework eases the deployment of computationally demanding simulations on cloud and cluster computing resources. An important building block for enabling interactive, researcher-driven workflows is the ability to track all data needed to understand and reproduce a workflow. The standardization of 13C-MFA studies using a folder structure template and the corresponding services and web interfaces improves the exchange of information within a group of researchers. Finally, several auxiliary tools were developed in the course of this work to complement the SWF modules, ranging from simple helper scripts to visualization and data conversion programs. This solution distinguishes itself from other scientific workflow approaches by offering a system of loosely coupled components that are flexibly arranged to match the typical requirements of the metabolic engineering domain. Because the SWF is a modern, service-oriented software framework, new applications are easily composed by reusing existing components.

Summary (translated from the German abstract):
Metabolic engineering is a highly interdisciplinary scientific domain that links biology, mathematics, computer science, and engineering. Metabolic flux analysis with 13C-labeled isotopes (13C-MFA) is a particularly challenging metabolic engineering application that consists of many closely interwoven building blocks, such as modeling, simulation, and experimental design. Although a large number of general-purpose workflow solutions for realizing complex scientific applications have been developed in recent years, these approaches can only partially be transferred to 13C-MFA workflows. Whereas problems in other branches of science (such as bioinformatics) are primarily concerned with data processing, 13C-MFA workflows are more comparable to business workflows. For example, many bioinformatics workflows are designed so that genome sequences are identified, compared, and annotated by "pipelining" them through standard tools such as BLAST. Typically, the next workflow step in the pipeline can be determined automatically from the result of the preceding step. Five computational challenges were identified in the effort to conduct 13C-MFA studies: organization of heterogeneous data, standardization of processes and unification of tools and data, interactive workflow steering, distributed computing, and service orientation. The result of this dissertation is a scientific workflow framework (SWF) tailored to the specific requirements of 13C-MFA applications. The approach presented here, namely designing the SWF as a collection of loosely coupled modules that interact via web services, eases the realization of 13C-MFA workflows through several distinctive features.
Existing tools are integrated into the SWF through web service interfaces and programming language bindings (e.g., for Java or Python). Although the attributes "ease of use" and "generality" are rarely associated with distributed computing, the presented use cases show that the proposed Hadoop MapReduce framework simplifies running compute-intensive simulations on cloud and cluster computing resources. An important building block for enabling interactive, researcher-friendly workflows is the ability to track all data needed to understand and reproduce a workflow. Standardizing 13C-MFA studies by means of a folder structure template and the associated web services and interfaces improves the exchange of information with other researchers. Finally, a large number of auxiliary programs that complement the actual SWF modules were developed in the course of this work. They range from simple helper scripts to visualization and data conversion programs. The solution presented in this work differs from other scientific workflow approaches by offering a system of loosely coupled components that are flexibly arranged to meet the typical requirements of the metabolic engineering domain. The modern software architecture and service orientation of the SWF ease the development of new applications through the composition and reuse of existing components.

Bibliographie / References

  1. Cao, B., B. Plale, G. Subramanian, E. Robertson, and Y. Simmhan (2009). “Provenance Information Model of Karma Version 3”. In: Proceedings of the 2009 Congress on Services - I. Washington, DC, USA: IEEE Computer Society, pp. 348-351. doi: 10.1109/SERVICES-I.2009.54.
  2. Zamboni, N., S.-M. Fendt, M. Rühl, and U. Sauer (2009). “13C-based metabolic flux analysis”. In: Nature Protocols 4 (6), pp. 878-92.
  3. Yang, T. H. (2013). “13C-based metabolic flux analysis: fundamentals and practice”. In: Systems Metabolic Engineering: Methods in Molecular Biology 985, pp. 297-334.
  4. “13CFLUX2 - high-performance software suite for 13C-metabolic flux analysis”. In: Bioinformatics 29 (1), pp. 143-145 T. Dalman, T. Dörnemann, E. Juhnke, M. Weitzel, K. Nöh, W. Wiechert, and B. Freisleben (2013). “Cloud MapReduce for Monte Carlo bootstrap applied to metabolic flux analysis”. In: Future Generation Computer Science 29 (2), pp. 582-590.
  5. Beste, D., K. Nöh, S. Niedenführ, T. Mendum, N. Hawkins, J. Ward, M. Beale, W. Wiechert, and J. McFadden (2013). “13C-flux spectral analysis of host-pathogen metabolism reveals a mixed diet for intracellular Mycobacterium tuberculosis”. In: Chemistry & biology 20, pp. 1012-1021.
  6. - (2001). “13C metabolic flux analysis”. In: Metababolic Engineering 3 (3), pp. 195-206.
  7. Beste, D., B. Bonde, N. Hawkins, J. Ward, M. Beale, S. Noack, K. Nöh, N. Kruger, R. Ratcliffe, and J. McFadden (2011). “13C metabolic flux analysis identifies an unusual route for pyruvate dissimilation in mycobacteria which requires isocitrate lyase and carbon dioxide fixation”. In: PLoS PATHOGENS 7, e1002091.
  8. Bente, D. A., J. Friesen, K. White, J. Koll, and G. P. Kobinger (2011). “A computerized data-capture system for animal biosafety level 4 laboratories”. In: Journal of the American Association for Laboratory Animal Science (JAALAS) 50 (5), pp. 660-664.
  9. Bowers, S. and B. Ludäscher (2005). “Actor-oriented design of scientific workflows”. In: Conceptual Modeling ER 2005. Vol. 3716. Lecture Notes in Computer Science (LNCS). Springer-Verlag Berlin, Heidelberg, pp. 369-384.
  10. Raue, A., C. Kreutz, T. Maiwald, U. Klingmüller, and J. Timmer (2011). “Addressing parameter identifiability by model-based experimentation”. In: IET Systems Biology 5 (2), pp. 120-130.
  11. Folk, M., A. Cheng, and K. Yates (1999). “HDF5: A file format and I/O library for high performance computing applications”. In: Proceedings of Supercomputing. Vol. 99.
  12. Dalman, T., M. Weitzel, B. Freisleben, W. Wiechert, and K. Nöh (2011). “A hybrid parallelization approach for cloud-enabled metabolic flux analysis simulation workflows”. In: Proceedings of 4th GRID4TS Workshop, pp. 30-31.
  13. Cieslik, M. and C. Mura (2011). “A lightweight, flow-based toolkit for parallel and distributed bioinformatics piplines”. In: BMC Bioinformatics 12 (1), pp. 61+.
  14. Palankar, M. R., A. Iamnitchi, M. Ripeanu, and S. Garfinkel (2008). “Amazon S3 for science grids: a viable solution?” In: DADC '08: Proceedings of the 2008 International Workshop on Data-aware Distributed Computing. Boston, MA, USA: ACM, pp. 55-64.
  15. Niedenführ, S. (2014). “Analyzing the fluxome of P. chrysogenum in an industrial environment - workflows for 13C metabolic flux analysis in complex systems”. PhD thesis. RWTH Aachen, 293 p.
  16. Cortassa, S., M. A. Aon, A. A. Iglesias, J. C. Aon, and D. Lloyd (2012). An introduction to metabolic and cellular engineering. 2nd ed. World Scientific Publishing Company, Incorporated.
  17. Pacheco, P. S. (2011). An introduction to parallel programming. Morgan Kaufman.
  18. Efron, B. and R. J. Tibshirani (1993). An introduction to the bootstrap. Chapman & Hall/CRC.
  19. Dalman, T., M. Weitzel, W. Wiechert, B. Freisleben, and K. Nöh (2011). “An online provenance service for workflows for distributed metabolic flux analysis”. In: Proceedings of IEEE 9th European Conference on Web Services (ECOWS). IEEE Press, pp. 91-98.
  20. Bowen, R. and K. Coar (2008). Apache cookbook. 2nd. O'Reilly & Associates, Inc.
  21. Joshi, S. B. (2012). “Apache Hadoop performance-tuning methodologies and best practices”. In: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering. ICPE '12. New York, NY, USA: ACM, pp. 241-242.
  22. Kruchten, P. (1995). “Architectural blueprints: the 4+1 view model of architecture”. In: IEEE Software 12 (6), pp. 42-50.
  23. See http://www.omixT. Dalman, W. Wiechert, and K. Nöh (2016). “A scientific workflow framework for 13C metabolic flux analysis”. In: Journal of Biotechnology 232. Bioinformatics for Biotechnology and Biomedicine, pp. 12-24
  24. Wiechert, W., M. Möllney, S. Petersen, and A. A. de Graaf (2001). “A universal framework for 13C metabolic flux analysis”. In: Metabolic Engineering 3 (3), pp. 265-283.
  25. Romano, P. (2008). “Automation of in-silico data analysis processes through workflow management systems”. In: Briefings in Bioinformatics 9 (1), pp. 57-68.
  26. Altschul, S., W. Gish, W. Miller, E. Myers, and D. Lipman (1990). “Basic local alignment search tool”. In: Journal of Molecular Biology 215, pp. 403-410.
  27. Scott, S. L., A. W. Blocker, and F. V. Bonassi (2016). “Bayes and big data: the consensus Monte Carlo algorithm”. In: International Journal of Management Science and Engineering Management (to appear).
  28. Hohmann, L. (2003). Beyond software architecture: creating and sustaining winnig solutions. 1st ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
  29. Möllney, M., W. Wiechert, D. Kownatzki, and A. A. de Graaf (1999). “Bidirectional reaction steps in metabolic networks: IV. optimal design of isotopomer labeling experiments”. In: Biotechnology and Bioengineering 66 (2), pp. 86-103.
  30. Hope, J. (2008). Biobazaar: the open source revolution and biotechnology. Harvard University Press.
  31. Lamprecht, A.-L., T. Margaria, and B. Steffen (2009). “Bio-jETI: a framework for semantics-based service composition”. In: BMC Bioinformatics 10 Suppl 10, S8.
  32. Eronen, L. and H. Toivonen (2012). “Biomine: predicting links between biological entities using network models of heterogeneous databases”. In: BMC Bioinformatics 13 (1), pp. 1-21.
  33. Cock, P. J., T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, and M. J. de Hoon (2009). “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. In: Bioinformatics (Oxford, England) 25 (11), pp. 1422-1423.
  34. Chernick, M. R. (2007). Bootstrap methods: a guide for practitioners and researchers. 2nd ed. Wiley-Interscience.
  35. Newman, S. (2015). Building microservices - designing fine-grained systems. O'Reilly Media.
  36. Curbera, F., Y. N. Doganata, A. Martens, N. Mukhi, and A. Slominski (2008). “Business Provenance - A Technology to Increase Traceability of End-to-End Operations.” In: OTM Conferences (1)'08, pp. 100-119.
  37. Miebach, S. (2012). “Charakterisierung und Validierung der 13C-Stoffflussanalyse im Parallelansatz (in german)”. PhD thesis. Bielefeld, University.
  38. Matsunaga, A., M. Tsugawa, and J. Fortes (2008). “CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications”. In: ESCIENCE '08: Proceedings of the 2008 4th IEEE International Conference on eScience. Washington, DC, USA: IEEE Computer Society, pp. 222-229.
  39. Buyya, R., J. Broberg, and A. Gościński (2011). Cloud computing: principles and paradigms. Wiley Series on Parallel and Distributed Computing. John Wiley & Sons.
  40. - (2013). “Cloud MapReduce for Monte Carlo bootstrap applied to metabolic flux analysis”. In: Future Generation Computer Science 29 (2), pp. 582-590.
  41. Larkin, M., G. Blackshields, N. Brown, R. Chenna, P. McGettigan, H. McWilliam, F. Valentin, I. Wallace, A. Wilm, R. Lopez, J. Thompson, T. Gibson, and D. Higgins (2007). “Clustal W and Clustal X version 2.0”. In: Bioinformatics 23 (21), pp. 2947- 2948.
  42. Dörnemann, T., M. Smith, and B. Freisleben (2008). “Composition and execution of secure workflows in WSRF-grids”. In: Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid '08). IEEE Press, pp. 122-129.
  43. Giorgino, T. (2009). “Computing and visualizing dynamic time warping alignments in R: the dtw package”. In: Journal of Statistical Software 31 (7), pp. 1-24.
  44. Matsuoka, Y., S. Ghosh, and H. Kitano (2009). “Consistent design schematics for biological systems: standardization of representation in biological engineering”. In: J. R. Soc. Interface 6 Suppl. 4, pp. 393-404.
  45. Ravikirthi, P., P. F. Suthers, and C. D. Maranas (2011). “Construction of an E. coli genome-scale atom mapping model for MFA calculations”. In: Biotechnology and Bioengineering 108 (6), pp. 1372-1382.
  46. Ferguson, N., B. Schneier, and T. Kohno (2010). Cryptography engineering - design principles and practical applications. Wiley.
  47. Droste, P. (2011). “Customizable visualization in the context of metabolic networks”. PhD thesis. Siegen, University.
  48. Droste, P., E. von Lieres, W. Wiechert, and K. Nöh (2010). “Customizable visualization on demand for hierarchically organized information in biochemical networks”. In: Computational Modeling of Objects Represented in Images. Ed. by R. P. Barneva, V. E. Brimkov, H. A. Hauptman, R. M. N. Jorge, and J. M. R. S. Tavares. Vol. 6026. Lecture Notes in Computer Science. Springer-Verlag Berlin, Heidelberg, pp. 163-174.
  49. Krishnan, S., L. Clementi, J. Ren, P. Papadopoulos, and W. Li (2009). “Design and evaluation of Opal2: a toolkit for scientific software as a service”. In: IEEE Congress on Services.
  50. Johnson, R. E. and B. Foote (1988). “Designing reusable classes”. In: Object-Oriented Programming 1 (2).
  51. Schuster, S., T. Dandekar, and D. Fell (1999). “Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering”. In: Trends Biotechnology 17 (2), pp. 53-60.
  52. Antoniewicz, M. R., J. K. Kelleher, and G. N. Stephanopoulos (2006). “Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements”. In: Metabolic Engineering, pp. 324-337.
  53. - (2007). “Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions”. In: Metabolic Engineering 9 (1), pp. 68-86.
  54. Hohpe, G. and B. Woolf (2003). Enterprise integration patterns: designing, building, and deploying messaging solutions. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
  55. Rubinger, A. L. and B. Burke (2010). Enterprise JavaBeans 3.1. 6th. O'Reilly Media.
  56. Runkel, J. (2009). “Entwurf und Implementierung einer Datenbank- und SicherheitsMiddleware für ein Scientific Workflow System (in german)”. Diploma Thesis. Univerität Siegen, Fachbereich Elektrotechnik & Informatik.
  57. Akram, A., D. Meredith, and R. Allan (2006). “Evaluation of BPEL to scientific workflows”. In: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID 06). Vol. 1. IEEE, pp. 269-274.
  58. Schwender, J. (2011). “Experimental flux measurements on a network scale”. In: Front in Plant Science 2 (63).
  59. Joshi, M., A. Seidel-Morgenstern, and A. Kremling (2006). “Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamical systems”. In: Metabolic Engineering 8 (5), pp. 447-455.
  60. Missier, P., N. Paton, and K. Belhajjame (2010). “Fine-grained and efficient lineage querying of collection-based workflow provenance”. In: Procs. EDBT. Lausanne, Switzerland.
  61. Ebert, B. E., A.-L. Lamprecht, B. Steffen, and L. M. Blank (2012). “Flux-P: automating metabolic flux analysis”. In: Metabolites 2 (4), pp. 872-890.
  62. Goecks, J., A. Nekrutenko, J. Taylor, and T. G. Team (2010). “Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences”. In: Genome Biology 11 (8), R86.
  63. Milstein, S., J. Biersdorfer, and M. MacDonald (2006). Google: the missing manual. 2nd. Missing manual. Sebastopol, CA, USA: O'Reilly & Associates, Inc.
  64. Sammer, E. (2012). Hadoop operations. O'Reilly Media.
  65. White, T. (2009). Hadoop: the definitive guide. 1st ed. O'Reilly Media.
  66. Weitzel, M. (2009). “High performance algorithms for metabolic flux analysis”. PhD thesis. University of Siegen, Germany.
  67. Niedenführ, S., W. Wiechert, and K. Nöh (2015). “How to measure metabolic flux: a taxonomic guide for 13C fluxomics”. In: Current opinion in biotechnology 34, pp. 82- 90.
  68. Zaharia, M., A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica (2008). “Improving MapReduce performance in heterogeneous environments”. In: OSDI. Ed. by R. Draves and R. van Renesse. USENIX Association, pp. 29-42.
  69. In: Proceedings of 4th GRID4TS Workshop, pp. 30-31 M. Weitzel, K. Nöh, T. Dalman, S. Niedenführ, B. Stute, and W. Wiechert (2013).
  70. Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein (2009). Introduction to Algorithms. 3rd. The MIT Press.
  71. Kumar, V., A. Grama, A. Gupta, and G. Karypis (2003). Introduction to parallel computing. 2nd ed. Addison-Wesley.
  72. Petersen, S. (2001). “Investigating the in vivo activity of anaplerotic pathways in Corynebacterium glutamicum with 13C-tracer technology (in german)”. Institut für Biotechnologie, Berichte des Forschungszentrums Jülich, 3875. PhD thesis. University of Düsseldorf.
  73. Jamae, J. and P. Johnson (2009). JBoss in action: configuring the JBoss application server. Manning Publications Co.
  74. Martín-Requena, V., J. Ríos, M. García, S. Ramírez, and O. Trelles (2010). “jORCA: easily integrating bioinformatics web services”. In: Bioinformatics 26 (4), pp. 553-559.
  75. Berthold, M. R., N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel (2009). “KNIME - the Konstanz information miner: version 2.0 and beyond”. In: SIGKDD Explor. Newsl. 11 (1), pp. 26-31.
  76. Kanehisa, M. and S. Goto (2000). “KEGG: Kyoto encyclopedia of genes and genomes”. In: Nucleic acids research 28 (1), pp. 27-30.
  77. Juhnke, E., D. Seiler, T. Stadelmann, T. Dörnemann, and B. Freisleben (2009). “LCDL: an extensible framework for wrapping legacy code”. In: Proceedings of ERPAS'2009, pp. 646-650.
  78. Ibrahim, S., H. Jin, L. Lu, S. Wu, B. He, and L. Qi (2010). “LEEN: locality/fairness-aware key partitioning for MapReduce in the cloud”. In: CloudCom. IEEE, pp. 17-24.
  79. Love, R. (2013). Linux system programming: talking directly to the kernel and C library. 2nd ed. O'Reilly Media.
  80. Hart, W. E. (2011). Managing scientific workflows in Python with pyutilib.workflow. url: https : / / software . sandia . gov / trac / pyutilib / export / 2215 / pyutilib . workflow/trunk/doc/workflow/workflow.pdf.
  81. Miner, D. and A. Shook (2012). MapReduce design patterns. O'Reilly Media.
  82. Dean, J. and S. Ghemawat (2004). “MapReduce: simplified data processing on large clusters”. In: OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation. San Francisco, CA: USENIX Association, pp. 107- 113.
  83. Haugwitz, M. von (2016). “Mass spectrometric data processing for metabolomics and fluxomics - a flexible evaluation framework with quality awareness”. RWTH Aachen, Diss., 2015. Dr. RWTH Aachen, 291 p.
  84. Gross, J. H. (2011). Mass spectrometry - a textbook. 2nd ed. Springer-Verlag Berlin Heidelberg.
  85. Neuweger, H., S. P. Albaum, M. Dondrup, M. Persicke, T. Watt, K. Niehaus, J. Stoye, and A. Goesmann (2008). “MeltDB: a software platform for the analysis and integration of metabolomics experiment data”. In: Bioinformatics 24 (23), pp. 2726-2732.
  86. Dalman, T., T. Dörnemann, E. Juhnke, M. Weitzel, K. Nöh, W. Wiechert, and B. Freisleben (2010). “Metabolic flux analysis in the cloud”. In: Proceedings of IEEE 6th International Conference on e-Science, pp. 57-64.
  87. Fuhrmann, S. (2010). Metabolische Datenbanken in der Stoffflussanalyse: Entwurf und Implementierung (in german). VDM Publishing.
  88. Wiechert, W. (1996). Metabolische Kohlenstoff-Markierungssysteme: Modellierung, Simulation, Analyse, Datenauswertung (in german). Vol. 3301. Zentralbibliothek Forschungszentrum Jülich GmbH.
  89. Novère, N. L., A. Finney, M. Hucka, U. S. Bhalla, F. Campagne, J. Collado-Vides, E. J. Crampin, M. Halstead, E. Klipp, P. Mendes, P. Nielsen, H. Sauro, B. Shapiro, J. L. Snoep, H. D. Spence, and B. L. Wanner (2005). “Minimum information requested in the annotation of biochemical models (MIRIAM)”. In: Nature Biotechnology 23 (12), pp. 1509-1515.
  90. Aalst, W. van der and C. Stahl (2011). Modeling business processes: a Petri net-oriented approach. Cooperative information systems. MIT Press.
  91. Pratx, G. and L. Xing (2011). “Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce”. In: J. Biomed. Opt. 16 (12).
  92. - (2014). Next generation SOA: a concise introduction to service technology & serviceorientation. Prentice Hall.
  93. Edlich, S., A. Friedland, J. Hampe, and B. Brauer (2010). NoSQL - Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken (in german). Hanser.
  94. 3 Meanwhile, OmixTM has become a commercially maintained project. visualization.com/ for more information.
  95. Dörnemann, T., E. Juhnke, and B. Freisleben (2009). “On-demand resource provisioning for BPEL workflows using Amazon's elastic compute cloud”. In: Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid '09). IEEE Press, pp. 140-147.
  96. Quek, L. E., C. Wittmann, L. Nielsen, and J. Kromer (2009). “OpenFLUX: efficient modelling software for 13C-based metabolic flux analysis”. In: Microbial Cell Factories 8 (1), pp. 25+.
  97. Peterson, J. L. and A. Silberschatz (1985). Operating system concepts. 2nd. AddisonWesley.
  98. Atkinson, A. C. and A. N. Donev (1992). Optimum experimental designs. Vol. 8. Oxford Statistical Science Series. Clarendon Press, Oxford.
  99. Fowler, M. (2002). Patterns of enterprise application architecture. Boston, MA, USA: Addison-Wesley Longman Publishing, Inc.
  100. Deelman, E., G. Singh, M.-h. Su, J. Blythe, A. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz (2005). “Pegasus: a framework for mapping complex scientific workflows onto distributed systems”. In: Scientific Programming Journal 13, pp. 219-237.
  101. Bissyandé, T. F., F. Thung, D. Lo, L. Jiang, and L. Réveillère (2013). “Popularity, interoperability, and impact of programming languages in 100,000 open source projects”. In: 37th Annual International Computer Software & Applications Conference (COMPSAC 2013), Kyoto. IEEE, pp. 303-312.
  102. Obe, R. O. and L. S. Hsu (2014). Postgresql: up and running: a practical introduction to the advanced open source database. O'Reilly Media.
  103. Chopra, V., S. Li, and J. Genender (2007). Professional Apache Tomcat 6. Wrox Professional Guides. Wiley.
  104. Churches, D., G. Gombás, A. Harrison, J. Maassen, C. R. M. Shields, I. Taylor, and I. Wang (2006). “Programming scientific and distributed workflow with Triana services”. In: Concurrency and Computation: Practice and Experience 18 (10), pp. 1021-1037.
  105. Davidson, S. B. and J. Freire (2008). “Provenance and scientific workflows: challenges and opportunities”. In: Proceedings of ACM SIGMOD, pp. 1345-1350.
  106. Altintas, I., O. Barney, and E. Jaeger-Frank (2006). “Provenance collection support in the Kepler scientific workflow system”. In: Provenance and Annotation of Data: International Provenance and Annotation Workshop (IPAW 2006), Revised Selected Papers. Vol. 4145. Lecture Notes in Computer Science (LNCS). Springer-Verlag Berlin, Heidelberg, pp. 118-132.
  107. Crown, S. B. and M. R. Antoniewicz (2013). “Publishing 13C metabolic flux analysis studies: a review and future perspectives”. In: Metab Eng. 20, pp. 42-48.
  108. Collette, A. (2013). Python and HDF5. O'Reilly Media.
  109. Goyvaerts, J. and S. Levithan (2012). Regular expressions cookbook. O'Reilly Media.
  110. Pitkänen, E., A. Åkerlund, A. Rantanen, P. Jouhten, and E. Ukkonen (2008). “ReMatch: a web-based tool to construct, store and share stoichiometric metabolic models with carbon maps for metabolic flux analysis”. In: J. Integrative Bioinformatics 5 (2).
  111. Shoshani, A. and D. Rotem (2009). Scientific data management: challenges, technology, and deployment. Chapman & Hall/CRC. N. L. Novère (2011). “Minimum information about a simulation experiment (MIASE)”. In: PLoS Comput Biol 7 (4), e1001122.
  112. Barker, A. and J. V. Hemert (2008). “Scientific workflow: a survey and research directions”. In: Proceedings of the 7th International Conference on Parallel Processing and Applied Mathematics (PPAM 2007). Vol. 4967. Lecture Notes in Computer Science (LNCS). Springer-Verlag Berlin, Heidelberg, pp. 746-753.
  113. Ludäscher, B., I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao (2006). “Scientific workflow management and the Kepler system”. In: Concurrency and Computation: Practice and Experience 18 (10), pp. 1039-1065.
  114. Curcin, V. and M. Ghanem (2008). “Scientific workflow systems - can one size fit all?” In: Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International, pp. 1-9.
  115. Edelhofer, T. (2014). Selecting a MATLAB application deployment strategy. url: https: / / www . mathworks . com / company / newsletters / articles / selecting - a - matlab - application-deployment-strategy.html.
  116. Erl, T. (2004). Service-Oriented Architecture: a field guide to integrating XML and web services. Prentice Hall PTR.
  117. Dalman, T., E. Juhnke, T. Dörnemann, M. Weitzel, K. Nöh, W. Wiechert, and B. Freisleben (2010). “Service workflows and distributed computing methods for 13C metabolic flux analysis”. In: Proceedings of 7th EUROSIM Congress on Modelling and Simulation, pp. 1-7.
  118. Josuttis, N. M. (2007). SOA in practice: the art of distributed system design. O'Reilly Media.
  119. Senger, M., P. Rice, A. Bleasby, T. Oinn, and M. Uludag (2008). “Soaplab2: more reliable Sesame door to bioinformatics programs”. In: 9th annual Bioinformatics Open Source Conference.
  120. Hick, J. and J. Shalf (2009). “Storage Technology”. In: Scientific data management. Ed. by A. Shoshani and D. Rotem. Computational Science Series. Chapman & Hall. Chap. 1.
  121. Dörnemann, T. (2013). “Supporting quality of service in scientific workflows”. PhD thesis. Marburg, University.
  122. Kitano, H. (2002). “Systems Biology: brief overview”. In: Science 295 (5560), pp. 1662- 1664.
  123. Palsson, B. (2011). Systems Biology: simulation of dynamic network states. Cambridge University Press.
  124. Hull, D., K. Wolstencroft, R. Stevens, C. Goble, M. Pocock, P. Li, and T. Oinn (2006). “Taverna: a tool for building and running workflows of services”. In: Nucleic Acids Research 34, pp. 729-732.
  125. Oinn, T., P. Li, D. B. Kell, C. Goble, A. Goderis, M. Greenwood, D. Hull, R. Stevens, D. Turi, and J. Zhao (2006). “Taverna/myGrid: aligning a workflow system with the life sciences community”. In: Workflows for e-Science: scientific workflows for grids. Springer-Verlag New York, Inc. Chap. 19.
  126. Anand, M. K., S. Bowers, and B. Ludäscher (2010). “Techniques for efficiently querying scientific workflow provenance graphs”. In: Proceedings of the 13th International Conference on Extending Database Technology (EDBT '10). ACM, pp. 287-298.
  127. Raymond, E. S. (1999). The cathedral & the bazaar. 1st ed. Sebastopol, CA, USA: O'Reilly & Associates, Inc.
  128. Jin, H., S. Ibrahim, L. Qu, H. Cao, S. Wu, and X. Shi (2011). “The MapReduce programming model and implementations”. In: Cloud computing: principles and paradigms. John Wiley & Sons. Chap. 14.
  129. Mell, P. and T. Grance (2011). The NIST definition of cloud computing (NIST special publication 800-145). National Institue of Standards and Technology, Computer Security Division.
  130. Moreau, L., B. Clifford, J. Freire, J. Futrelle, Y. Gil, P. Groth, N. Kwasnikowska, S. Miles, P. Missier, J. Myers, B. Plale, Y. Simmhan, E. Stephan, and J. V. den Bussche (2011). “The Open Provenance Model core specification (v1.1)”. In: Future Generation Computer Systems 27 (6), pp. 743-756.
  131. Jiang, D., B. C. Ooi, L. Shi, and S. Wu (2010). “The performance of MapReduce: an in-depth study”. In: Proc. VLDB Endow. 3 (1-2), pp. 472-483.
  132. Moreau, L., P. Groth, S. Miles, J. Vazquez-Salceda, J. Ibbotson, S. Jiang, S. Munroe, O. Rana, A. Schreiber, V. Tan, and L. Varga (2008). “The provenance of electronic data”. In: Commun. ACM 51 (4), pp. 52-58.
  133. Hucka, M., A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, A. P. Arkin, B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles, M. Ginkel, V. Gor, I. I. Goryanin, W. J. Hedley, T. C. Hodgman, J.-H. Hofmeyr, P. J. Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. L. Novère, L. M. Loew, D. Lucio, P. Mendes, E. Minch, E. D. Mjolsness, Y. Nakayama, M. R. Nelson, P. F. Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling, K. Takahashi, M. Tomita, J. Wagner, and J. Wang (2003). “The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models”. In: Bioinformatics 19 (4), pp. 524-531.
  134. Mulder, S. and Z. Yaar (2006). The user is always right: a practical guide to creating and using personas for the web. 1st ed. Thousand Oaks, CA, USA: New Riders Publishing.
  135. Sears, R., C. V. Ingen, and J. Gray (2006). To BLOB or not to BLOB: large object storage in a database or a filesystem. Tech. rep. MSR-TR-2006-45. Microsoft Research, p. 10.
  136. Fowler, M. (2003). UML distilled: a brief guide to the standard object modeling language. 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
  137. Sehgal, S., M. Erdelyi, A. Merzky, and S. Jha (2011). “Understanding application-level interoperability: scaling-out MapReduce over high-performance grids and clouds”. In: Future Generation Computer Systems 27 (5), pp. 590-599.
  138. Lamprecht, A.-L. (2013). User-level workflow design - a bioinformatics perspective. Vol. 8311. Lecture Notes in Computer Science (LNCS). Springer.
  139. Collins-Sussman, B., B. W. Fitzpatrick, and C. M. Pilato (2008). Version control with Subversion. 2nd ed. O'Reilly Media.
  140. Nöh, K., P. Droste, and W. Wiechert (2015). “Visual workflows for 13C-metabolic flux analysis”. In: Bioinformatics 31 (3).
  141. Banks, T. (2006). Web services resource framework (WSRF) - primer v1.2. url: http://docs.oasis-open.org/wsrf/wsrf-primer-1.2-primer-cd-02.pdf.
  142. Aalst, W. M. P. van der and K. M. van Hee (2002). Workflow management: models, methods, and systems. MIT Press.
  143. Deelman, E., D. Gannon, M. Shields, and I. Taylor (2009). “Workflows and e-science: an overview of workflow system features and capabilities”. In: Future Generation Computer Systems (FGCS) 25 (5), pp. 528-540.
  144. Barseghian, D., I. Altintas, M. B. Jones, D. Crawl, N. Potter, J. Gallagher, P. Cornillon, M. Schildhauer, E. T. Borer, E. W. Seabloom, and P. R. Hosseini (2010). “Workflows and extensions to the Kepler scientific workflow system to support environmental sensor data access and analysis”. In: Ecological Informatics 5 (1), pp. 42-50.
  145. Gannon, D., E. Deelman, M. Shields, and I. Taylor (2006). “Introduction”. In: Workflows for e-Science: scientific workflows for grids. Springer-Verlag New York, Inc. Chap. 1.
  146. Dalman, T., P. Droste, M. Weitzel, W. Wiechert, and K. Nöh (2010). “Workflows for metabolic flux analysis: data integration and human interaction”. In: Proceedings of the 4th International Conference on Leveraging Applications of Formal Methods, Verification, and Validation (ISoLA). Vol. 6415. Lecture Notes in Computer Science (LNCS). Springer-Verlag Berlin, Heidelberg, pp. 261-275.
  147. Pelleg, D. and A. W. Moore (2000). “X-means: extending K-means with efficient estimation of the number of clusters”. In: ICML. Ed. by P. Langley. Morgan Kaufmann, pp. 727-734.
  148. Hintjens, P. (2013). ZeroMQ: messaging for many applications. O'Reilly Media.
  149. Evans, B. (2014). What every Java developer needs to know about Java 9. O'Reilly Radar (online); http://radar.oreilly.com/2014/09/what-every-java-developer-needs-to-know-about-java-9.html.
  150. Hudson, G. (2002). Notes on keeping version histories of files. http://web.mit.edu/ghudson/thoughts/file-versioning; last accessed: 2016-02-07.
  151. Ortel, J., J. Noehr, and N. V. Gheem (last accessed: Mar 2016). SUDS library web page. https://fedorahosted.org/suds/.
  152. Andrews, T., F. Curbera, H. Dholakia, Y. Goland, J. Klein, F. Leymann, K. Liu, D. Roller, D. Smith, S. Thatte, I. Trickovic, and S. Weerawarana (2003). Business process execution language for web services version 1.1. url: https://www.oasis-open.org/committees/download.php/2046/BPEL%20V1-1%20May%205%202003%20Final.pdf.


* The document is freely accessible on the Internet - notes on usage rights