Luis Ferreira, Fabiano Lucchese, Tomoari Yasuda, Chin Yau Lee, Carlos Alexandre Queiroz, Elton Minetto, Antonio Mungioli
ibm.com/redbooks
International Technical Support Organization

Grid Computing in Research and Education

April 2005
SG24-6649-00
Note: Before using this information and the product it supports, read the information in Notices on page xiii.
First Edition (April 2005) This edition applies to the capability of the IBM, ISVs, and open source products used to build a grid computing solution.
© Copyright International Business Machines Corporation 2005. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Figures  ix
Tables  xi
Notices  xiii
Trademarks  xiv
Preface  xv
The team that wrote this redbook  xvi
Become a published author  xxii
Comments welcome  xxiii

Part 1. Introduction  1

Chapter 1. Introduction to grid concepts  3
1.1 Beginning of the grid concept  4
1.1.1 Research and education on grid context  6
1.2 Applicability  7
1.2.1 Why use grids in research and education?  7
1.2.2 Leveraging research activities with grids  9
1.2.3 Leveraging educational activities with grids  10
1.3 What will the future bring?  11
1.3.1 What exists today  11
1.3.2 What is the potential for grids  12
1.3.3 What is likely to happen  13

Chapter 2. How to implement a grid  15
2.1 Introduction  16
2.1.1 The main difficulties  16
2.1.2 Approaches  17
2.2 Basic requirements  18
2.2.1 Hardware requirements  19
2.2.2 Software requirements  20
2.2.3 Human-resource requirements  21
2.3 Setting up grid environments  22
2.3.1 Defining the architecture  22
2.3.2 Hardware setup  23
2.3.3 Software setup  24
2.4 Setting up grid applications  25
2.4.1 Deploying an application  25
2.4.2 Making application data available  26
2.5 Maintaining grids  27
2.5.1 Grid platform administration tasks  27
2.5.2 Grid application administration tasks  28

Part 2. Grid by examples  31

Chapter 3. Introducing the examples  33
3.1 What you will find in these chapters  34

Chapter 4. Scientific simulation  37
4.1 Introduction  38
4.1.1 Business context  38
4.1.2 Business needs  39
4.2 Case analysis  39
4.2.1 Requirements  39
4.2.2 Use-cases  40
4.3 Case design  42
4.3.1 Component model diagram  42
4.3.2 Component model description  43
4.3.3 Architectural decisions and product selection  44
4.4 Implementation  44
4.5 Conclusion  46

Chapter 5. Medical images  47
5.1 Introduction  48
5.1.1 Business context  48
5.1.2 Business needs  48
5.2 Case analysis  49
5.2.1 Requirements  49
5.2.2 Use-cases  50
5.3 Case design  52
5.3.1 Component model diagram  52
5.3.2 Component model description  52
5.3.3 Architectural decisions and product selection  54
5.4 Implementation  55
5.5 Conclusion  56

Chapter 6. Computer-Aided Drug Discovery  57
6.1 Introduction  58
6.1.1 Business context  58
6.1.2 Business needs  58
6.2 Case analysis  59
6.2.1 Requirements  59
6.2.2 Use-cases  60
6.3 Case design  62
6.3.1 Component model diagram  62
6.3.2 Component model description  62
6.3.3 Architectural decisions and product selection  63
6.4 Implementation  64
6.5 Conclusion  64

Chapter 7. Big Science  67
7.1 Introduction  68
7.1.1 Business context  68
7.1.2 Business needs  69
7.2 Case analysis  69
7.2.1 Requirements  70
7.2.2 Use-cases  70
7.3 Case design  72
7.3.1 Component model diagram  72
7.3.2 Component model description  74
7.3.3 Architectural decisions and product selection  74
7.4 Implementation  76
7.5 Conclusion  76

Chapter 8. e-Learning  79
8.1 Introduction  80
8.1.1 Business context  80
8.1.2 Business needs  81
8.2 Case analysis  82
8.2.1 Requirements  82
8.2.2 Use-cases  83
8.3 Case design  86
8.3.1 Component model diagram  86
8.3.2 Component model description  88
8.3.3 Architectural decisions and product selection  89
8.4 Implementation  91
8.5 Conclusion  91

Chapter 9. Visualization  93
9.1 Introduction  94
9.1.1 Business context  94
9.1.2 Business needs  96
9.2 Case analysis  96
9.2.1 Requirements  96
9.2.2 Use-cases  98
9.3 Case design  99
9.3.1 Component model diagram  100
9.3.2 Component model description  101
9.3.3 Architectural decisions and product selection  101
9.4 Implementation  101
9.5 Conclusion  102

Chapter 10. Microprocessor design  103
10.1 Introduction  104
10.1.1 Business context  104
10.1.2 Business needs  104
10.2 Case analysis  105
10.2.1 Requirements  105
10.2.2 Use-cases  106
10.3 Case design  107
10.3.1 Component model diagram  107
10.3.2 Component model description  107
10.3.3 Architectural decisions and product selection  108
10.4 Implementation  108
10.5 Conclusion  109

Part 3. Appendixes  111

Appendix A. TeraGrid  113
Introduction  114
Organization  114
Beneficiaries  119
How to join  120

Appendix B. Research oriented grid  121
Introduction  122
Business requirements  122
High level design  124
Products used  131
Conclusion  131

Glossary  133

Related publications  137
IBM Redbooks  137
Other publications  137
Online resources  138
How to get IBM Redbooks  143
Help from IBM  143

Index  145
Figures

1-1  Heterogeneous and independent computing resources  5
2-1  How a grid should expand  18
4-1  Use-cases diagram  41
4-2  Component model diagram  43
5-1  Use-cases diagram  50
5-2  Component model diagram  52
6-1  Use-cases diagram  60
6-2  Component model diagram  62
7-1  Use-cases diagram  71
7-2  Software component architecture  73
7-3  Component model diagram  73
8-1  Use-cases diagram  84
8-2  e-learning framework schema  87
8-3  Software components architecture  87
9-1  A user's point of view  95
9-2  Use-case diagram  98
9-3  Diagram model  100
10-1  Use-case diagram  106
10-2  Component model diagram  107
A-1  TeraGrid overview  115
A-2  Layers diagram  117
A-3  Typical connection between sites and the TeraGrid backplane  119
B-1  Virtual environment  124
B-2  Virtualization organization  125
B-3  High level component diagram  126
B-4  Globus Toolkit and meta-scheduler  126
B-5  Submitting a job through Community Scheduler Framework  127
B-6  Job sequencer and gridport  128
B-7  Overall architecture diagram  129
B-8  Workflow of a researcher working on the grid  130
Tables

1-1  Types of grid that drive the grid solution for each area (shaded cells)  9
3-1  Examples of grid computing  34
4-1  A typical product selection  44
4-2  A typical product selection  45
5-1  Architectural decisions and product selection  54
6-1  Architectural decisions and product selection  63
7-1  Architectural decisions and product selection  74
8-1  Architectural decisions and product selection  89
9-1  Architectural decisions and product selection  101
10-1  A typical product selection  108
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: AFS, AIX, DB2, DFS, eServer, ibm.com, IBM, Lotus, OS/2, OS/390, POWER4, pSeries, Redbooks, Redbooks (logo), TCS, Tivoli, WebSphere, xSeries, and zSeries.
The following terms are trademarks of other companies: Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.
Preface
This IBM Redbook, Grid Computing in Research and Education, belongs to a series of documents related to grid computing that IBM is presenting to the community to enrich the IT industry and all its players: customers, industry leaders, emerging enterprises, universities, and producers of technology. The book is mainly oriented to IT architects and to those responsible for analyzing the capabilities needed to build a grid solution. The book is organized into the following parts.
e-Learning, on page 79: Presents a network grid implementation supporting an e-learning infrastructure that embraces many of the requirements for exchanging information in the educational and research fields.

Visualization, on page 93: Presents a grid implementation to support the field of advanced scientific visualization.
Fabiano Lucchese is the business director of Sparsi Computing in Grid (http://www.sparsi.com) and works as a grid computing consultant in a number of nation-wide projects. In 1994, Fabiano was admitted to the Computer Engineering undergraduate course of the State University of Campinas, Brazil, and in mid-1997, he moved to France to finish his undergraduate studies at the Central School of Lyon. Also in France, he pursued graduate-level studies in Industrial Automation. Back in Brazil, he joined Unisoma Mathematics for Productivity, where he worked as a software engineer on the development of image processing and optimization systems. From 2000 to 2002, he joined the Faculty of Electrical and Computer Engineering of the State University of Campinas as a graduate student and acquired a Master of Science degree in Computer Engineering for developing a task scheduling algorithm for balancing processing loads on heterogeneous grids. Fabiano has also taken part in the publishing of the IBM Redbook, Grid Services Programming and Application Enablement, SG24-6100-00. Tomoari Yasuda is an IBM Certified IT Specialist for Distributed Computing in IBM Japan. After getting a Master's degree in Mechanical Engineering at the graduate school of Keio University, he joined IBM and worked for digital media customers in Japan for 3 years as a consultant and a developer with the WebSphere family. He has a deep knowledge of the digital media industry. Since then, he has focused on offering new solutions to several cross-industry customers. In 2004, he was certified in IBM Grid Computing Technical Sales, and has been in charge of technical sales support for grid computing. Chin Yau Lee works as an Advisory Technical Specialist in grid computing for IBM ASEAN/South Asia. He holds an Honours degree in Computing and Information System from the University of Staffordshire. He has been using Linux since 1996 and had a few years of experience as a UNIX and Linux engineer before joining IBM. 
His areas of expertise include High Performance Linux and UNIX, UNIX Systems Administration, High Availability solutions, Internet-based solutions, and grid computing architectures, which he has been actively working on for the last 4 years. He is also an IBM Certified Advanced Technical Expert on AIX, a Sun Certified System/Network Administrator, and a Red Hat Certified Engineer. He is also a co-author of the IBM Redbook, Deploying Linux on IBM eServer pSeries clusters, SG24-7014-00. Carlos Alexandre Queiroz is an independent consultant working for Alex Microsystems. He has been working with grid computing, JINI, and J2EE technologies since 2000. Currently, he is pursuing a Master's degree at Universidade de São Paulo as a Distributed Systems and Network Specialist. He has published articles at several congresses, such as Middleware 2003, SBRC, and grid computing and parallel applications events. Carlos is an active developer of the Web site, http://gsd.ime.usp.br/integrade.
Elton Minetto is a professor at Universidade Comunitária Regional de Chapecó, Brazil, teaching programming, networking, and operating systems courses. He also works as a System Analyst and Network Administrator at the same institution, supporting Linux, Oracle, PHP, Java, and Python. Elton holds a Bachelor's degree in Computer Science from Universidade Comunitária Regional de Chapecó, and a Lato Sensu graduate degree in Computer Sciences from UNOESC/UFSC, Brazil. Elton is an active member of the open software community, collaborating on various projects. Antonio Saverio Rincon Mungioli is an electrical engineer and professor at Escola de Engenharia Mauá, São Paulo, Brazil. He also works as a System Analyst in the computing center at Universidade de São Paulo, and as a Technical Consultant to IBM Business Partners in Brazil. Antonio holds a Master of Science degree from Escola Politécnica of Universidade de São Paulo, Brazil.
Acknowledgements
Thanks to the following people for their contributions:
Joanne Luedtke, Lupe Brown, Cheryl Pecchia, Arzu Gucer, Chris Blatchley, Wade Wallace, Ella Buslovich, Yvonne Lyon, International Technical Support Organization, IBM
Tony White, Worldwide Grid Computing Technical Sales Business Unit Executive, IBM
Ronald Watkins, Worldwide Grid Computing Business Development Executive, Public Sector, IBM
Chris McMahon, Americas Sales Executive, Grid Computing, Higher Education, IBM
Dr. Martin F. Maldonado, Sr. Technical Architect, Grid Computing, Higher Education and Research, IBM
Joe Catani, Grid Computing in Higher Education, Public Sector, IBM
Lori Southworth, Market Manager, Education Industry, IBM
Al Hamid, Executive IT Architect and STSM, Grid/OSS Worldwide Leader, BCS, IBM
Chris Reech, Jeff Mausolf, IBM Global Services / e-Technology Center, Grid Computing Initiative, IBM
Nina Wilner, Grid Technology - IT Technical Architect, LifeSciences, IBM
Elizabeth B Davis, Education Client Representative, IBM
Wolfgang Roesner, Verification Tools, eCLipz Verification, IBM
John Reysa, Processor Simulation and Infrastructure, IBM
Ross Aiken, HPC Technical Solutions Architect, IBM
Nam Keung, Senior Technical Consultant, IBM
Lee B Wilson, Technical Sales Specialist, IBM
Takanori Seki, Distinguished Engineer, IBM Japan
Ryuhichi Nakata, ICP TS - Higher Education Industry, IBM Japan
Hideyuki Yokoyama, EBO Support Technical Competency, IBM Japan
Shu Shimizu, Tokyo Research Laboratory, IBM Japan
Naritoh Yamada, Michitaka Kamimura, LifeSciences, IBM Japan
Yoshihiko Itoh, GEO Sales Lead, Grid Business AP, IBM Japan
Fumiki Negishi, Grid Computing Business, IBM Japan
Stephen Chu, Grid Computing Executive, IBM China
Al Min Zhu, University Relations, IBM China
Jian Jiong Zhuang, IBM China
Jing Hui Li
IBM China
Li Yang Zhou
Grid Computing, IBM China
Linda Lin
IT Architect, IBM China
Jean-Yves Girard
Grid Computing Specialist, IBM France
Yann Guerin
EMEA Grid Computing TSM, IBM France
Sebastien Fibra
IT Specialist, IBM France
Jean-Pierre Prost
EMEA Design Center for on demand business, IBM France
Dr. Luigi Brochard
Distinguished Engineer, IBM Deep Computing, IBM France
Mariano Batista
IT Architect, IBM Argentina
Ruth Harada
Alliances Manager, IBM Brasil
Katia Pessanha
Universities Alliances Manager, IBM Brasil
Jose Carlos Duarte Goncalves
Executive IT Architect, IBM Brasil
Joao Marques dos Santos
Account Manager, Public Sector, IBM Brasil
Luiz Roberto Rocha
Grid Computing Technical Sales, IBM Brasil
Joao Almeida
IT Specialist, IBM Portugal
Srikrishnan Sundararajan
IBM India Software Labs
Clive Harris
Senior Architect, IBM UK
John Easton
Senior Consulting IT Specialist, IBM UK
Dr. Victor Alessandrini
IDRIS - CNRS - DEISA
Gisele S. Craveiro, Rogerio Iope, Liria Sata, Sérgio Kofuji
Universidade de São Paulo, Brasil
Edward Walker, Ph.D., Tina Romanella de Marquez, Chris Hempel
Texas Advanced Computing Center, The University of Texas at Austin
Trish L. Barker, Karen Green
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Alex Tropsha and his team
Director, Molecular Modeling Lab, University of North Carolina at Chapel Hill
Terry O'Brien, Dr. Anne Aldous, Scott Oloff
University of North Carolina Project - IBM
Madhu Gombar, LS Solutions Architect, Healthcare/Life Sciences Solutions Development, provided a case study on the pilot engagement conducted in the cheminformatics arena with the molecular modeling lab of UNC-Chapel Hill, NC. This was accompanied by a multimedia, interactive Flash demo developed by her to highlight the application of IBM middleware in drug discovery.
Chinese Ministry of Education (MOE), China
Peking University, China
Tsinghua University, China
Huazhong University of Science & Technology, China
Shanghai Jiao Tong University, China
Xi'an Jiao Tong University, China
Southeast University, China
Northeastern University, China
Sun Yat-Sen University, China
South China University of Technology, China
Shandong University, China
Beijing University of Aeronautics and Astronautics, China
National University of Defense Technology, China
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways: Use the online Contact us review redbook form found at:
ibm.com/redbooks
Mail your comments to: IBM Corporation, International Technical Support Organization Dept. JN9B Building 003 Internal Zip 2834 11400 Burnet Road Austin, Texas 78758-3493
Part 1
Introduction
This part of the book includes the following chapters:
Chapter 1, Introduction to grid concepts on page 3
Chapter 2, How to implement a grid on page 15
Chapter 1. Introduction to grid concepts
Types of grids
Ideally, a grid should provide full-scale integration of heterogeneous computing resources of any type: processing units, storage units, communication units, and so on. However, as the technology hasn't yet reached maturity, real-world grid implementations are more specialized and generally focus on integrating certain types of resources. As a result, nowadays we have different types of grids, which we describe as follows:

Computational grid
A computational grid is a grid that has processing power as the main computing resource shared among its nodes. This is the most common type of grid, and it has been used to perform high-performance computing to tackle processing-demanding tasks.

Data grid
Just as a computational grid has processing power as the main computing resource shared among its nodes, a data grid has data storage capacity as its main shared resource. Such a grid can be regarded as a massive data storage system built up from portions of a large number of storage devices.

Network grid
This type is known as either a network grid or a delivery grid. Its main purpose is to provide fault-tolerant and high-performance communication services. In this sense, each grid node works as a data router between two communication points, providing data-caching and other facilities to speed up the communications between those points. In this sense, the WWW can be regarded as an embryonic communication grid that does not (yet) satisfy the third requirement of the grid definition [see 1.1, Beginning of the grid concept on page 4].
Note: There is no clear boundary between these grid types. Every computational grid has data and network components; likewise for a data grid and a network grid. As such, there is really just one sort of grid, biased towards one or more of these aspects. Although grids are a new field of research and development, a number of bibliographic references comprehensively describe the concept of grid computing and its applicability. See Related publications on page 137 for a comprehensive list of such references.
Knowledge-oriented activities are performed in a variety of environments: schools, high schools, universities, research institutes, large corporations, and so on. On the other hand, grid implementations only make sense in environments where a meaningful number of computing resources can be integrated to form a higher-performance system, which tends to be rather restrictive. In this book, we consider the implementation of grid systems in environments equipped with a rather large number of computing resources, which can benefit greatly from grid technologies. Each type of grid may be more or less suitable for each type of institution. The following list presents some comments about what may be best in each case:

Universities
Here, all grid types may be used for leveraging research and educational activities; either a computational grid or a data grid would probably be focused on the area of research, while a network grid would better fit educational purposes.
Research institutes
Just as for universities, it is easy to see how research activities performed in institutes can benefit from a computational grid or a data grid. A network grid might be useful in some particular cases, as shown in Part 2, Grid by examples on page 31.

Schools
In the case of grade schools through high schools, these institutions would probably invest in a network grid for leveraging their educational activities.
In the next section we discuss some issues related to the applicability of grid computing in research and education.
1.2 Applicability
This section presents a brief discussion on which types of research and educational activities could benefit from grid computing technologies.
Briefly stated, a computational grid provides high-performance computing; a data grid provides large storage capacity; and a network grid provides high-throughput communication that may be useful for a variety of applications, such as virtual conferences. With this in mind, we can list the main reasons for using grid computing as follows:
Improve efficiency/reduce costs
Exploit under-utilized resources
Enable collaborations
Virtualize resources and virtual organizations (VOs)
Increase capacity and productivity
Parallel processing capacity
Support heterogeneous systems
Provide reliability/availability
Access to additional resources
Resource balancing
Reduce time to results
When these reasons are regarded in the light of scientific research, it is easy to understand why scientists are so keen on grids: they believe that the use of grids will transform the practice of their science. As stated in Needs Assessment Workshop for Grid Techniques in Introductory Physics Classroom Projects, by Bardeen et al, grids are a tool for:
1. Sharing the costs and burdens of immense computing needs
2. Supporting the participation of scientists worldwide in large collaborations in particle physics, astronomy, cosmology, fusion and nuclear physics, medicine, and life science
Grids have the potential to change how people work together to make scientific discoveries. On the side of education, it is important to note that grids can play a major role because, according to QuarkNet Cosmic Ray Studies and the Grid: Probing Extensive Showers, by Bardeen et al, grids represent:
An opportunity for a new style of collaborative learning
An aid to online posters and discussions with students at other schools
An easy way to present and review results
An easy way to conduct peer-to-peer discussions
A rapporteur of presentations and discussions
A single portal to distributed resources
Distance education and higher education are the fields most directly touched by grid applications in education.
Performing meteorological forecasts, calculating the aerodynamic behavior of an airplane, assembling the genome of an organism, analyzing the elementary particles in an accelerator, virtualizing computing resources, and data-mining several terabytes of data: these actions all require extensive calculations and the handling of enormous amounts of data. This is the perfect scenario for grid computing technologies. In Part 2, Grid by examples on page 31, a number of examples are analyzed and, for each one, a graph is used to represent the proportion of computing, data, and communication features that a specific implementation has.
Thus, as stated in the article ITR: Distance Collaboration - Education and Training on the Access Grid, by Morton et al, grid application in education represents an ambitious venture in a direction that substantially increases the ability of groups to cooperate and achieve a sense of collaborative community even though they are distributed across the planet.... Using one of the ... collaborative tools in the realm of information technology, ... the project investigators will launch an initiative to advance the state of the art (in a social and technical sense) in geographically distributed project-oriented collaborations. An interesting scenario that can be drawn from these ideas is, as presented in the Needs Assessment Workshop for Grid Techniques in Introductory Physics Classroom Projects, by Bardeen et al, one in which educators become interested in and excited about the potential that grid tools and techniques bring to data-based classroom projects and, as a result, use the grid as a hosting environment in which inquiry-based projects are standards-based, visually appealing, use common tools and data formats, allow for levels and scale of use, and provide support materials for educators and students. In such a scenario, some teachers will come to the projects as experienced users with a great deal of knowledge about the research and experience with inquiry-based learning using online resources. Others will be emerging or beginning users. Most classroom users will analyze data from the Web tools. Some will be interested in learning what's under the hood, exploring grid portals, and a few will become developers of grid skins or transforms.
Another interesting example, described in GRASP - Grid Accessed Data and Computational Scientific Portal, by Sharly, shows how to develop a scientific portal for a learning community that can access computational servers, streaming servers, digital libraries, and course materials from different servers spread across the globe, or use mathematical packages, Computer-Aided Design (CAD), simulation packages, and so on.
Here are some other interesting issues regarding the future of grid computing:
The grid expansion may embrace multiple media types; thus, radio, television, and phone networks will also be available as grid services.
Personal and home-based offices will become a reality; this may change the way that small and large corporations are conceived.
These are some of the possibilities that might arise from the grid world, and there is no doubt that they will change the way we deal with information in our personal and professional activities.
Chapter 2. How to implement a grid
2.1 Introduction
Knowing what a grid is and what it can do for you is essential when you plan to use this technology to tackle your most demanding computational problems. However, when going through the process of implementing a grid computing environment, many other issues arise that may require special attention. This chapter offers a brief discussion of how to implement a grid computing environment and, as such, covers the following topics:
Basic requirements for setting up a grid computing environment
How to set up an initial grid
How to maintain and expand the grid
The following topics are not covered:
Which software or hardware, in particular, should be used for implementing grid environments
Which companies can best provide grid implementation services
More information about the various grid topics can be found in the bibliographic references presented in Related publications on page 137.
In the future, we expect that, as the technology evolves and the grid concept becomes commonplace, implementing a grid will be as simple as installing a certain software application on a number of computers. Until then, grid implementors should be aware of the traps into which they may fall.
2.1.2 Approaches
There are two basic engineering approaches that we have chosen to adopt when implementing a grid environment: bottom-up implementation and incremental growing.
Bottom-up implementation
To understand what is meant by bottom-up, a system such as a grid should be regarded as having multiple levels of abstraction. Here, the lowest level of abstraction is the one that takes into consideration the details of the hardware that builds up the grid. As the level of abstraction increases, we move our focus to the software layer and, finally, to the human-factor layer. With this in mind, performing a bottom-up implementation means making sure that everything in a certain layer of abstraction is working properly before moving to an upper layer. This may sound obvious, but it is not: there are very specific conditions under which a layer has to work, and we will try to depict these conditions here.
Incremental growing
The bottom-up implementation philosophy refers to the way that a group of nodes should be set up to become part of the grid, but it does not address the way that sets of nodes should be integrated into the grid. In these circumstances, the order in which nodes are set up does matter and, for this, we recommend adopting an incremental-growing philosophy.
The combination of these two ideas can be represented by the diagram in Figure 2-1.
In this figure, each tank represents a group of grid nodes, and the level of water inside a tank represents the level of abstraction at which we are working to set those nodes up for the grid. This figure suggests that:
A group of nodes should be integrated into the grid only after previous nodes have been fully integrated.
The order in which nodes are integrated into the grid depends on the underlying physical and logical structures that connect them.
In terms of this figure, implementing a grid is the same as filling interconnected tanks with water. In the following sections, we show how this interconnection should be accomplished and into which tank the water should first be poured.
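The tank analogy can be sketched as a breadth-first traversal of the network of node groups, starting from a seed group and integrating each neighboring group only after the groups before it. The group names and link map below are purely illustrative, not part of any particular grid.

```python
from collections import deque

def expansion_order(links, seed):
    """Return the order in which node groups should be integrated,
    expanding outward from the seed group (breadth-first)."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        group = queue.popleft()
        order.append(group)
        for neighbour in links.get(group, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Illustrative topology: the data centre is the seed "tank";
# integration effort flows outward to directly connected groups.
links = {
    "datacentre": ["physics-lab", "library"],
    "physics-lab": ["dorm-network"],
    "library": [],
    "dorm-network": [],
}
print(expansion_order(links, "datacentre"))
# ['datacentre', 'physics-lab', 'library', 'dorm-network']
```

Groups closest (in network terms) to the seed are filled first, which mirrors pouring water into the central tank and letting it flow to the connected ones.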
The performance of a data grid depends heavily on its communication links, but it is very difficult to express grid performance as a function of link quality. A worst-case estimate can be found by calculating the time it takes to exchange a data record between the two nodes whose communication performs worst.
For a network grid: the hardware requirements for these grids are even more difficult to determine due to the on-demand nature of their functionality. As a rule of thumb, the average data throughput provided by such a grid between two points can be estimated as the average data throughput of the best communication path between those nodes.
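The worst-case estimate just described can be computed directly: take the slowest pairwise link and divide the record size by its sustained throughput. The node names, record size, and throughput figures below are illustrative assumptions only.

```python
def worst_case_exchange_time(record_bytes, links):
    """links maps a (node_a, node_b) pair to its sustained throughput
    in bytes per second; return the time needed to move one data
    record over the slowest link (the worst-case estimate)."""
    slowest = min(links.values())
    return record_bytes / slowest

# Hypothetical three-node data grid.
links = {
    ("n1", "n2"): 100e6,   # ~100 MB/s on a fast LAN segment
    ("n1", "n3"): 1e6,     # ~1 MB/s over a congested WAN hop
    ("n2", "n3"): 10e6,
}
t = worst_case_exchange_time(50e6, links)  # a 50 MB data record
print(f"{t:.0f} s")  # 50 s, dominated by the 1 MB/s link
```

The estimate is pessimistic by construction: it assumes every record must cross the single worst link, which is exactly the bound the text proposes.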
Remotely and automatically upgrading the grid platform and the code for its applications: it is impossible to rely on manual software upgrades when talking about dozens of computers (not to say hundreds or thousands).
Remotely monitoring the computing resources: the grid platform must provide real-time information about the state of its computing resources, such as whether they are working properly or have failed, how efficiently they are executing application tasks, and so on.
Storing logging information about all the activities performed on the platform: historical information about grid performance is essential when tuning applications. For this, the grid platform should provide a way for developers to analyze this information.
Controlling access to the platform: for obvious reasons, there must be a way to control access to the platform.
Securing the data exchanged within the platform: application developers will not run their applications on the grid unless they are assured that sensitive data can be secured.
Once these requirements are satisfied, we can move to the next level of abstraction: the human-resource requirements.
In addition to these two analyst roles, it is recommended that there be one analyst able to help application developers develop and test their applications. Finally, here is an important remark concerning the human factor: the grid software execution should be as transparent as possible when performed on ordinary desktop computers, because users tend to interrupt every running program that they do not recognize as useful or that they believe to be a source of overhead. Two generally good options are screen savers and system services.
We assume here that, in a grid implementation, most of the integrated computing resources will not be dedicated to the grid. This means that they are already set in place and are part of a physical architecture that was previously defined for other purposes. In general, a grid implementation does not address complex physical architecture issues and, more importantly, it does not depend on such issues being resolved. Defining the logical architecture of a grid implementation, for example, separating computing resources into grid groups, is something we expect to be transparent in the future. Future-generation grid platforms will hopefully be able to automatically map a given physical architecture into the best possible logical architecture by performing network tests and benchmarks. As these platforms are still to come, there are some simple rules that might be useful when implementing a grid:
Computing resources that are interconnected by a high-speed network and that are physically close to each other are the best candidates for building up a logical grid group; such a group works very much like a cluster of computers, exchanging data among its members at high rates and with other groups at low rates.
If a logical group has been set up for computers that are directly connected to each other by a high-speed network, they will need a local computer to bridge all the inter-group data exchange. The natural choice for this particular computer is the one that has the role of network gateway, for efficiency and security reasons.
Setting logical links between logical groups of computing resources is referred to as defining the high-level architecture of the grid. As groups are not expected to exchange huge amounts of data, the performance of the communication links should not be a concern as long as there are no converging points of communication and/or coordination; a master-slave high-level architecture has this drawback.
Inter-group links have to be stable. If stability cannot be assured, dynamic high-level architectures should be considered. Dynamic architectures depend much more on the grid platform and, while they can offer flexibility and robustness, they are harder to maintain and have not yet reached maturity in terms of standardization.
With these rules in mind, one should still remember the diagram in Figure 2-1, How a grid should expand on page 18, when planning the architecture of the grid. This means:
Defining the point from which the grid will expand is crucial; this is where the administrative infrastructure of the grid will be located and where the fault-tolerant parts of the system will be installed. In general, this place has to have fast and stable down and up links.
Defining the directions in which the grid will expand is crucial as well; the grid's growth should never compromise its performance since, in theory, it is infinitely scalable.
Even if the platform you acquire claims to be hardware independent, there may be certain hardware configurations on which the platform performs better; this probably relates to the way tasks are distributed across servers and how those servers are interconnected.
If you are not setting up a general-purpose grid, pay special attention when choosing the hardware; the performance of some applications may vary dramatically depending on the type of machine they run on; in particular, applications that perform intensive memory access or math processing belong to this category.
Well-behaved grid applications have a high processing-to-communication ratio. This means that communication hardware should not be as much of an issue as processing hardware; briefly, you should prefer faster processors over faster networks.
Make sure that your hardware meets your performance expectations before moving to the software set-up. Perform memory access, math processing, and communications benchmarks, and generate reports about the results.
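The benchmarking step can start as simply as timing a math-heavy loop and a memory-heavy copy on each candidate machine and recording the results in a report. The workloads and sizes below are illustrative stand-ins; a real survey would use an established benchmark suite.

```python
import time

def bench(fn, *args):
    """Time a single run of fn and return the elapsed seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def math_work(n):
    # Floating-point-heavy loop: a crude math-processing benchmark.
    x = 1.0001
    for _ in range(n):
        x = x * 1.0000001 + 0.5
    return x

def memory_work(n):
    # Allocate and copy a large buffer: a crude memory-access benchmark.
    buf = bytearray(n)
    return bytes(buf)

# Per-machine report; run this on each candidate node and compare.
report = {
    "math_s": bench(math_work, 200_000),
    "memory_s": bench(memory_work, 10_000_000),
}
print(report)
```

Comparing such reports across machines exposes the kind of variation the text warns about: a node that looks fine for math-bound work may still be a poor fit for memory-bound applications.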
Check if the plug-ins that might be required to integrate the grid platform with the applications are available. Perform integration tests if this is possible.
Data deployment
Unfortunately, data deployments are more time-consuming, more frequent, and, while they are being performed, the application stands idle waiting for the data to arrive. For this reason, a few points are worth mentioning when discussing application deployments:
Some grid platforms make it possible for multiple applications to be executed simultaneously; in this case, application deployments do not cause much impact, as the grid does not have to be idle while they are performed.
A few applications are capable of dealing with streaming data, and some grid platforms support this sort of application (the application starts processing the data as soon as it arrives at the nodes). If a single-application grid is to be set up and its application works this way, adopting a streaming-enabled grid platform is something to consider.
Deployments should ultimately be performed by the system administrator, but the platform might provide facilities for application developers to submit their application code and data so that every deployment is correctly logged and assigned to its developer.
Application deployments should not be a serious concern in terms of performance because, for a well-behaved grid application, the processing time has to be much greater than the communication time. However, special attention should be paid so that such deployments do not cause overall system performance to deteriorate in case of malicious or accidental user behavior.
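The streaming idea mentioned above can be sketched with a generator: the node consumes each data chunk as it arrives instead of sitting idle until the whole deployment completes. The chunk contents and the byte-counting "work" are purely illustrative.

```python
def incoming_chunks():
    # Stand-in for data arriving over the network during a deployment.
    yield b"atom coordinates batch 1\n"
    yield b"atom coordinates batch 2\n"
    yield b"atom coordinates batch 3\n"

def process_stream(chunks):
    """Consume data as it arrives; no idle wait for the full data set."""
    processed = 0
    for chunk in chunks:
        processed += len(chunk)   # a real application would parse/compute here
    return processed

print(process_stream(incoming_chunks()))  # total bytes processed
```

With a non-streaming deployment, the loop body could not start until every chunk had landed on the node; here, work overlaps with data arrival.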
Web publishing
Probably the simplest way to make data available to grid applications is to publish it on ordinary Web sites or FTP servers. There is a whole generation of systems and tools to help developers accomplish this task efficiently, but this approach has some major drawbacks.
While publishing the data itself is easy, getting it to the processing nodes may not be; the grid application programmer will have to deal with network programming to build the application, which is not desirable. Additionally, depending on how the application is designed, it can suffer from scalability problems, as every node may try to access the data at once. This happens because the responsibility for distributing the data across the grid lies with the application designer, not with the grid platform. To sum up, this might be a good option when fast, short-term applications are to be developed, but one should not rely on this type of publishing for long-term and complex applications.
Part 2
Grid by examples
This part of the book includes the following chapters:
Chapter 3, Introducing the examples on page 33
Chapter 4, Scientific simulation on page 37
Chapter 5, Medical images on page 47
Chapter 6, Computer-Aided Drug Discovery on page 57
Chapter 7, Big Science on page 67
Chapter 8, e-Learning on page 79
Chapter 9, Visualization on page 93
Chapter 10, Microprocessor design on page 103
Chapter 3. Introducing the examples
Type of grid implementation
e-Learning: In this example, we present a grid environment to support many of the educational and research requirements for exchanging information. Knowing the main ways that education can benefit from grid technology, we can deduce the basic technological needs associated with the development of e-learning. The e-learning infrastructure presented in this chapter is based on the Access Grid.
Visualization: In this example, we present a grid implementation to support the field of advanced scientific visualization. The area of visualization is evolving as it addresses emerging and continuing issues, such as interactive and batch rendering of terascale data sets through remote visualization. At the same time, universities in general have a lot of heterogeneity, using many low-cost resources from different suppliers. This includes running different systems alongside advanced computing resources, such as supercomputers, advanced visualization systems, and so on. Most of these resources are segregated in specific departments for local access only.
Microprocessor design: In this example, we present a computational grid solution that helps to reduce the microprocessor development cycle and also allows design centers to share their resources more efficiently. Microprocessor design and verification simulation require massive computational power.
The order in which these projects are presented was chosen so that chapters describing grid implementations with similar features are grouped together. The chapters describing the projects contain the following information:
Business context: This describes the current situation of the customer or organization.
Business needs: This describes the motivations to do something, the compelling reasons to act, how the context is changing, and what the customer has to do to improve its business context, move towards a new business context, or adapt to something that is changing.
Case analysis
Functional requirements: This describes what the system is supposed to do and what the users want.
Non-functional requirements: This describes the attributes of a system/architecture/solution: qualities, all those things that are not specifically requested by a user as a function to be performed by the system, but which the technology has to provide anyway.
Use-cases: This describes the users, roles, and use-cases.
Case design: This describes the component model and architectural decisions, such as product selection.
Implementation: This presents the current implementation status.
Chapter 4.
Scientific simulation
In this chapter we discuss the following topic: A computational grid implementation to provide the execution of complex system simulations in the areas of physics, chemistry, and biology
4.1 Introduction
In this example, we present a grid implementation to provide for the execution of complex system simulations in the areas of physics, chemistry, and biology. The implementation tackles the problem of intensive calculations, which demand high-performance computing and typically require large computational infrastructures such as clusters.
Note: A similar solution has already been set in place in a number of research institutions in Japan, such as the National Institute of Advanced Industrial Science and Technology (AIST):
http://www.aist.go.jp/index_en.html
4.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To satisfy the requirements described above, these are the recommendations:
Provide high-performance computing power sufficient for accomplishing compute-intensive tasks (on the order of teraflops).
Do this at a rather low cost (a high cost would be at least some hundreds of thousands of dollars, which is the cost of low-profile high-performance computers).
Do this in a way that upgrading the computing infrastructure does not pose major difficulties.
Non-functional
The main non-functional requirements for this solution are as follows:
Ease of use: High-performance computing systems are likely to become part of everyday research activities. For this reason, the use of such systems should be accomplished through intuitive and commonplace interfaces, hiding from users the inner details of the process of executing an application.
Management: The system must not require large resources, either technical or human, to be maintained, as this can seriously compromise the cost of the solution.
4.2.2 Use-cases
Considering all the requirements presented in the previous section, which were basically drawn from the need for high-performance computing, we can draw up the following set of use-cases for such an infrastructure.
Use-cases diagram
The use-cases diagram is presented in Figure 4-1.
[Figure 4-1 shows the use-cases diagram: the Researcher and Administrator actors, and use-cases including user authentication, submitting jobs to the system, and fetching and analyzing results.]
Users description
These are the various roles involved:
Researcher: This is the user who performs the basic tasks with the processing jobs. This role is normally filled by a university professor or a research institution's technical staff.
Administrator: This is the user responsible for the management tasks on the system. This role is typically filled by a network or system analyst or a database administrator.
Use-cases description
Here we describe the various use-cases:
User authentication: This use-case represents the procedure that all users must go through before using the system resources. It is typically accomplished by typing in a username and password combination, and may be done transparently when the user logs in to a workstation.
Submit jobs to the system: This use-case actually contains several sub-use-cases that describe the installation and management of grid applications, and will not be described here at length. In simple terms, the life-cycle of a grid application is: development, local test, installation on the grid, execution, and result analysis.
Fetch and analyze results: When an application finishes executing, its results have to be fetched and analyzed by the researchers. The analysis itself is generally performed using specific visualization tools, but before that, the system has to be able to make such results available. This is normally accomplished by making data files, generated during the application execution, available through a networked file system or through some specialized interface, such as the Web. In this example, we consider Web-based interfaces as the default option.
Manage resources utilization: In this use-case, the administrator performs monitoring and performance-tuning tasks to make sure that the grid facility is working in an optimized way. Typical activities include checking activity logs, analyzing usage history, checking average system load, and setting caching parameters, among others.
Manage security and authentication: This use-case includes all the security-related issues of administrating the grid, such as managing user accounts, defining security policies, configuring network software and hardware components, and so on.
(Figure: architecture overview. Web browsers access a portal site; the OGSA Toolkit, together with a scheduler, integrates the existing clusters into a single set of computing resources.)
Web browser: This provides easy access to the grid. Job submission and result retrieval can be performed through this component.
OGSA Toolkit: This can integrate existing clusters, ensuring the delivery of nontrivial qualities of service. Its basic role is to receive jobs upon user requests and submit them to the scheduler.
Clusters: These are conventional clusters made up of dozens of networked computers. In general, each cluster works on an exclusive high-speed local area network.
Integrated computing resources: This structure is the virtual computer made up of the multiple clusters that are part of the grid.
Portal site: Grid System Gateway, a portal solution developed by IBM for implementing a computational grid environment. For more information about grid computing solutions, refer to:
http://www.ibm.com/grid/jp/solutions/portal.shtml
OGSA Toolkit: Globus Toolkit.
Scheduler: Platform LSF, an intelligent, policy-driven batch application workload processing middleware from Platform Computing. For more detail, see:
http://www.platform.com/
4.4 Implementation
The solution presented in this chapter is a reasonably standard computational grid implementation. This section presents the implementation level already achieved, as well as some additional information about the technologies adopted.
Implementation status
One of the grid implementations taken into account has adopted the products listed in Table 4-2.
Table 4-2 A typical product selection
Component: Computing nodes. Chosen product: IBM eServer xSeries.
Component: Operating system. Chosen product: Red Hat Linux.
Component: Application. Chosen product: BLAST.
In this example, a set of xSeries-based clusters was integrated into the grid. These servers run Red Hat Linux due to its reliability, cost-effectiveness, technical-support availability, and full compatibility with the Globus Toolkit. The killer application in place is BLAST.
BLAST (Basic Local Alignment Search Tool) provides a method for rapid homology searching, such as searches of nucleotide and protein databases.
The current implementation comprises:
A 4-node computing cluster made of gridMathematica servers
An 8-node computing cluster made of BLAST servers managed by Platform LSF
A 4-node management cluster made of Globus Toolkit servers
A 1000BaseT network
The grid administrators expect to expand this computational grid to other campuses so that several computing clusters can be integrated into this system. The total computing power expected for the end of 2004 is about 10 teraflops.
4.5 Conclusion
Nowadays, performing research on cutting-edge technologies quite often implies making use of high-performance computing infrastructures. Such infrastructures have traditionally been implemented with expensive and inflexible computing systems that only a small fraction of research institutions could afford. In this chapter, we presented a grid-based implementation of a high-performance computing infrastructure suitable for most research demands. This infrastructure should be able to deliver high-performance computing at a much lower cost, which is what makes grid computing so appealing to research institutions. We recognize that a number of compute-intensive applications might not benefit from this technology, but we are sure that the grid philosophy will greatly affect the way research is performed by such institutions.
Chapter 5.
Medical images
In this chapter we discuss the following topic: A joint use of a data grid and a computational grid in a medical-image storage and processing framework
5.1 Introduction
In this example we present a data and computational grid in a medical-image storage and processing framework. The example tackles the problem of storing and processing large images, which typically requires large computational infrastructures such as distributed databases and clusters. The solution allows the use of idle storage and processing capacities in machines of the grid to store and process large amounts of medical digital images. Note: A similar solution has already been set in place in the eDiaMoND project. This is a collaborative project funded by grants from the Engineering and Physical Sciences Research Council (EPSRC), which is the UK Government's leading funding agency for research and training in engineering and the physical sciences, Department of Trade and Industry (DTI), and IBM. It is strictly a research project which has the ambitious aim of proving the benefits of grid technology to eHealth, in this case for Breast Imaging in the UK. More information about the eDiaMoND project can be found at:
http://www.ediamond.ox.ac.uk/
To improve breast cancer screening and epidemiology applications, it is necessary to develop a system able to provide large-scale digital-image storage and analysis services; make it possible for medical sites to store, process, and data-mine medical images; manage mammograms as digital images; and make such images available to other sites, such as clinics, hospitals, universities, and research institutes. One obstacle for a solution supporting digital imaging is the space necessary to store these images. A digitized A4-size mammogram at the minimum resolution needed for an effective analysis occupies around 32 MB of storage space. Usually, four images are taken, using around 128 MB of space per exam. These requirements make grid computing a good candidate.
5.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To satisfy the requirements described above, these are the recommendations:
Provide capacity to store thousands of medical images, each one having a size of approximately 32 MB, with around 128 MB per patient.
Provide capacity to store the non-image data about the patients, such as personal identification, test dates, responsible doctors, and treatments, for thousands of patients.
Provide computing power to process the medical images, searching for patterns that can indicate a cancer.
Provide access to patients' images and information for hospitals, clinics, universities, and so on.
Non-functional
The main non-functional requirements for this solution are as follows:
Logging: Provide ways to log all access to patients' images, data, and all related diagnostics.
Scalability: The system must be able to store up to 8 million medical images per year, and it must be able to accommodate the entrance of new universities and hospitals.
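The storage figures behind these requirements can be verified with a quick calculation. The sketch below uses only numbers stated in this chapter (32 MB per image, four images per exam, 8 million images per year); the variable names are illustrative:

```python
# Back-of-the-envelope storage sizing for the mammography grid.
MB_PER_IMAGE = 32            # digitized A4 mammogram at diagnostic resolution
IMAGES_PER_EXAM = 4          # four views taken per exam
IMAGES_PER_YEAR = 8_000_000  # the stated scalability target

mb_per_exam = MB_PER_IMAGE * IMAGES_PER_EXAM              # 128 MB per exam
tb_per_year = IMAGES_PER_YEAR * MB_PER_IMAGE / 1_000_000  # decimal terabytes

print(f"{mb_per_exam} MB per exam, about {tb_per_year:.0f} TB per year")
```

At the 8-million-image target, the image store alone must absorb roughly 256 TB per year, which is consistent with the implementation status reported later in this chapter.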
5.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 5-1 shows the use-cases diagram.
Users description
These are the various roles involved:
Technician: This person is responsible for the image acquisition. This role is normally personified by a specialist in the operation of computers in general, x-ray machines, and other medical devices.
Radiologist: This person is responsible for making a diagnosis based on the patient's images and data. This role is personified by a specialist in radiology and oncology.
Administrator: This person is responsible for performing high-level management tasks in the screening process. This role is normally personified by an analyst with knowledge of the workload and capacity of the participant hospitals.
Researcher: This person uses the large collection of images for research purposes, such as new technologies and methodologies for diagnostics, new treatments, drugs to treat diseases, and systems that perform image processing and recognition.
Use-cases description
Here we describe the various use-cases:
User authentication: This is the process of ascertaining the identity of the originator of a request to the system. All users must be authenticated before using the system.
Grid nodes: The grid nodes are the resource providers of the grid infrastructure, defining the dimension of the grid. Each participant site of the grid, such as a hospital or a university, can add servers, increasing the capacity to store, manipulate, and process the patients' images and data. Beyond a toolkit to build the grid infrastructure, each server has the following components:
Image storage: In this component the DICOM image files are effectively stored. A content-manager software component is used to manage these files.
Federated database: A relational database server is used to store the patients' data and the image metadata that describes those files. This information is federated across all databases installed in the hospitals and universities that form the grid.
Image session workstation: Using this component, radiologists and researchers retrieve the images and related data to perform diagnoses and research.
Management portal: This component, installed in a central location, is used by the administrator to perform management tasks, such as managing the system workload and capacity.
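The federated-database idea, the same query running against every participating site's database with the results merged, can be sketched as follows. This is only an illustration: in-memory SQLite databases stand in for the sites' relational servers, and the table and column names are hypothetical, not taken from the actual system:

```python
import sqlite3

def site_db(site, rows):
    """Create an in-memory database standing in for one site's metadata store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE image_meta (patient_id TEXT, image_file TEXT, site TEXT)")
    db.executemany("INSERT INTO image_meta VALUES (?, ?, ?)",
                   [(p, f, site) for p, f in rows])
    return db

hospital = site_db("hospital_a", [("P001", "p001_l.dcm"), ("P002", "p002_r.dcm")])
university = site_db("univ_b", [("P001", "p001_followup.dcm")])

def federated_query(patient_id, sites):
    """Run the same query against every site's database and merge the results."""
    results = []
    for db in sites:
        cur = db.execute(
            "SELECT image_file, site FROM image_meta WHERE patient_id = ?",
            (patient_id,))
        results.extend(cur.fetchall())
    return results

print(federated_query("P001", [hospital, university]))
```

In the real solution this merging is done by the federation layer of the database product and exposed through OGSA-DAI rather than by application code, but the effect is the same: one logical query sees the metadata held at every site.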
For application development, the implementation is oriented toward a Service-Oriented Architecture (SOA) and open standards such as the Open Grid Services Architecture (OGSA), including Open Grid Services Architecture - Data Access and Integration (OGSA-DAI). OGSA-DAI is a project developed by the UK Database Task Force whose objective is to provide a standard interface through which a distributed query processing system can access data in different databases. As shown in Figure 5-2 on page 52, both the image and non-image data are accessed by the components of the grid through the OGSA-DAI standard implementation. For more information about the grid standards, refer to the following Web sites:
http://www.ogsadai.org.uk http://www.ggf.org http://www.oasis-open.org
Another important issue is security and privacy. When the images are accessed across the institutions that use the grid, between a hospital and a university, for example, the data must be protected using security techniques such as cryptography and the other security strategies defined in the OGSA standard.
(Table fragment: the product-selection table also lists components for the relational database, the management portal, the servers, and the operating systems.)
Chosen product: IBM VisualAge is chosen due to its productivity and because it is a prerequisite for Content Manager.
Another important reason for choosing IBM Content Manager is the possibility of integration with other components using OGSA. OGSA-DAI already supports relational and XML data sources and provides a flexible framework into which other data sources can be plugged. Since the query language for Content Manager is XPath, it is possible to expose Content Manager as another XML data source.
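Because the query language is XPath, any standard XPath-capable client can work against metadata exposed as XML. The sketch below illustrates the idea with Python's standard library; the element and attribute names are a hypothetical metadata layout, not Content Manager's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of image metadata, as an XML data source
# plugged into OGSA-DAI might expose it; names are illustrative only.
doc = ET.fromstring("""
<images>
  <image patient="P001" modality="MG"><file>p001_l.dcm</file></image>
  <image patient="P002" modality="MG"><file>p002_r.dcm</file></image>
</images>
""")

# ElementTree supports a limited XPath subset; select one patient's images.
files = [img.findtext("file")
         for img in doc.findall(".//image[@patient='P001']")]
print(files)
```

A client of the grid would issue the same kind of XPath expression through the OGSA-DAI interface instead of parsing the document locally.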
5.4 Implementation
This section presents the implementation level already achieved and the next steps to be followed for a complete deployment.
Implementation status
Here is a summary of the current implementation level:
There are 4 screening centers.
There are 5 universities.
Approximately 30-35 staff are involved.
The system stores around 256 TB per year.
The system stores and processes only mammograms.
Approximately 1.5 million women are screened each year in the U.K. Women screened for breast cancer currently have one view taken per breast, meaning 3 million mammograms per year.
Next steps
The goals for a complete deployment are as follows:
Plan to have 92 screening centers and 230 radiologists.
Expand to create a worldwide digital mammography grid by linking up with screening programs being developed in France, Germany, Japan, and the United States.
Expand to store and process other types of medical images, not restricted to mammograms or even to cancer in general.
Data-mining technology will be able to search the database for images that are similar to the one being examined and that have a known diagnosis. The plan is to move, over the next 2-5 years, to two views per breast, meaning more than 6 million mammograms per year.
5.5 Conclusion
Cancer has been regarded as one of the most challenging research topics for medical institutions. As a definite cure seems not to be within the researchers' sight, preventive examinations and early diagnosis have been the best weapons with which doctors fight this disease. In this context, the analysis of medical images plays a major role in cancer diagnosis and prevention. In this chapter, we presented a grid-based solution aimed at leveraging medical-image analysis and management. Such a solution provides huge storage capacity, where a very large number of images can be stored, and high-performance computing for the automatic analysis of massive quantities of images, at a rather low cost. We strongly believe that this application of grid computing can greatly improve the way medical research is performed and, ultimately, provide a better quality of life for humanity as a whole.
Chapter 6.
Computer-Aided Drug Discovery
6.1 Introduction
In this chapter, we describe a grid implementation example that tackles the problem of Computer-Aided Drug Discovery (CADD), a workflow that can be used to increase the hit rate in screening chemical compound databases and thus speed up drug discovery. This approach demands high-performance computing and typically requires large computational infrastructures such as clusters, mainframes, or supercomputers. Note: A similar solution has been set in place at the Molecular Modeling Laboratory (MML) at the University of North Carolina (UNC) at Chapel Hill School of Pharmacy. More information about this project can be found at the following Web site:
http://www.ibm.com/software/ebusiness/jstart/casestudies/uncmodel.shtml
The main obstacles are:
Lack of automation
The need for large computing resources to efficiently perform model generation
The development of validated and predictive QSAR models can require building thousands of models per dataset. Assuming that a 100-compound dataset requires about 10 minutes to generate one model on a single CPU, and that frequently about 10,000 models need to be built, that equates to roughly 70 days of continuous computation.
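That estimate follows directly from the stated figures, assuming uninterrupted single-CPU execution:

```python
# Time to build a full set of QSAR models on one CPU,
# using the per-model estimate given in the text.
MINUTES_PER_MODEL = 10   # one model for a 100-compound dataset
MODELS_NEEDED = 10_000   # models typically built per dataset

total_minutes = MINUTES_PER_MODEL * MODELS_NEEDED
days = total_minutes / (60 * 24)
print(f"{days:.1f} days on a single CPU")
```

A grid that runs the model builds in parallel divides this wall-clock time roughly by the number of available CPUs, which is the motivation for the solution in this chapter.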
6.2.1 Requirements
This section describes the functional and non-functional technical requirements for the proposed solution.
Functional
To satisfy the requirements described above, these are the recommendations:
Provide ways to automate the execution of QSAR models, creating a workflow.
Create an integrated and easy-to-use interface where researchers and users with little technical knowledge can submit and manage jobs.
Create an integrated file system that allows various platforms (AIX, Linux, UNIX, and so on) to access data and create QSAR models regardless of the machine type.
Utilize grid computing capabilities to reduce the total processing time required to generate numerous QSAR models.
Non-functional
The main non-functional requirements for this solution are as follows:
Easy updates: Tasks such as workflow manipulation, data-source modifications, and the addition of new modeling algorithms must be performable by administrators and researchers with limited computing skills.
Integration: Make possible the integration of visualization and analysis tools from third-party vendors, allowing users to easily analyze their results.
6.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 6-1 shows the use-cases diagram.
Users description
These are the various roles involved:
Public user: This user makes use of the public version of the system. The user creates his or her account and uses the system with limited privileges. This role is normally personified by a student.
Private user: This user has more privileges in system utilization. This role is normally personified by a pharmaceutical company researcher or another business partner.
Researcher: The researcher conducts complex analyses based on the results of all jobs stored by all users. This role is personified by a professor or scientist.
Administrator: This user is responsible for performing high-level management tasks on the server and grid system.
Use-cases description
Here we describe the various use-cases:
User authentication: Users must provide a valid username and password to access the portal and its built-in applications.
Create a new user profile: In this use-case, a public user requests the creation of a new user profile. This request is approved by the administrator based on the user information provided.
Submit/resubmit jobs: In this use-case, valid users can submit new jobs based on parameters entered into the portal. They can also resubmit a completed job using parameters copied from a previous job.
View job status: In this use-case, a user checks the status of a submitted job, which may be running, failed, or completed.
Delete jobs: In this use-case, a user deletes submitted or completed jobs.
Run visualization tools: In this use-case, a user makes use of visualization tools to analyze a job's results, generating charts or other graphical representations.
Store results: Researchers and private users can store their job results in the data grid.
Analyze results: Researchers can utilize the stored job results to perform complex analyses and data mining to correlate information.
Portal: This is the component through which all users interact with the system. Users authenticate in the portal and a defined role is assigned to them. According to this role, the user can execute activities such as job submission, job management, visualization and analysis of results, and so on. The administrators also use the portal to perform administration tasks.
Application server: This component uses data or pointers entered into the portal and executes the model-generation process on the grid nodes.
Workflow: Workflows are the stepwise execution of various QSAR development programs to produce a QSAR model.
Grid nodes: The model-generation process started by the user is divided into smaller workflows that each build a single QSAR model, identified in Figure 6-2 as QSAR Model Mini-workflow. These compute jobs are sent to the grid nodes for processing.
Database: A relational database is used to provide quick retrieval of run-time data and users' profiles.
GFS: The GFS, or Global File System, is a high-performance shared-disk file-system standard that provides data access for multiple nodes running different operating systems. GFS allows multiple systems of various types to access and write to the same file space.
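The fan-out of one model-generation request into independent per-model mini-workflows can be sketched as follows. This is an illustrative sketch only: a local thread pool stands in for the grid nodes, and the scoring function is a hypothetical placeholder for the real QSAR modeling programs:

```python
from concurrent.futures import ThreadPoolExecutor

def build_qsar_model(descriptor_subset):
    """Stand-in for one QSAR mini-workflow.

    A real mini-workflow would run the modeling programs on a grid node;
    here we just return a hypothetical score for the descriptor subset."""
    return (descriptor_subset, sum(descriptor_subset) % 7)

def generate_models(descriptor_subsets, workers=4):
    """Fan the request out into independent jobs and run them in parallel."""
    with ThreadPoolExecutor(workers) as pool:
        return list(pool.map(build_qsar_model, descriptor_subsets))

if __name__ == "__main__":
    for subset, score in generate_models([(1, 2), (3, 4), (5, 6, 7)]):
        print(subset, score)
```

Because each mini-workflow is independent, the application server can dispatch them to as many grid nodes as are available, which is what collapses the model-generation time from days to hours.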
Component: GFS. Chosen product: Avaki Data Grid, which provides a global file system with a unified namespace but with a smaller system footprint. Additionally, it is simpler than other options, such as AFS.
Component: Grid middleware. Chosen product: Globus Toolkit, chosen due to its open-source nature.
6.4 Implementation
This section presents the current implementation level achieved, and the next steps to be followed for a complete deployment.
Implementation status
Here is a summary of the current implementation level:
A full-featured grid portal acts as a single point of interaction, allowing a great number of users to use the solution.
Modeling tools are integrated in the form of a workflow.
Better automation of the process decreases the need for human interaction.
GFS creates a single view of the file systems, which makes it easier to execute the QSAR applications on a compute grid.
QSAR model-generation time decreases from days to hours.
Next steps
This solution can be expanded by including new grid nodes or by interconnecting with other existing grids, increasing its processing capabilities and availability.
6.5 Conclusion
The development of new drugs is a slow and expensive process that comprises several research phases and experiments to identify which chemical compounds can be developed as drugs.
The utilization of CADD solutions and mathematical models like QSAR can help speed up this process. By making use of these solutions, research institutes, universities, and pharmaceutical industries can accomplish complex simulations and determine important pharmaceutical characteristics of chemical compounds. When implemented properly, this can greatly decrease the costs and time required to deliver new drugs to the market. Grid computing can play an important role in this situation, acting as an acceleration factor in the modeling process. These chemical modeling applications generally need large compute power to supply satisfactory results in a short time period. Utilizing the distributed and parallel processing capabilities provided by grids, simulations that previously took several days can now be completed in a few hours. The solution presented in this chapter also has the advantage of integrating several modeling tools into an automated workflow. Additionally, with the integrated and easy-to-use interface provided by the portal, a greater number of research institutes, pharmaceutical industries, universities, and students can interact and share their knowledge.
Chapter 7.
Big Science
In this chapter we discuss the following topic: Implementation of a data grid and computational grid to support government-sponsored laboratory projects (also known as Big Science)
7.1 Introduction
In this example we present an implementation of a data grid and computational grid to support government-sponsored laboratory projects (also known as Big Science). The system addresses the problem of storing huge quantities of data, which demands high storage capacity and typically requires large, parallel computational infrastructures. The data grid implementation is based on the IBM General Parallel File System (GPFS). Note: A similar solution has already been set in place in DEISA, a consortium of leading national supercomputing centers in Europe aiming to jointly build and operate a distributed terascale supercomputing facility. More information about the DEISA project can be found at the following Web site:
http://www.deisa.org/
In the last few decades, the use of computers in these research institutions has increased considerably, due to their capacity to aid the researchers in processing data and storing results.
7.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
Considering a typical scientific application similar to the ones cited in the introduction of this chapter, the following functional requirements were drawn up:
A storage capacity to accumulate at least 10 petabytes/year is needed.
A single system image: the users and applications must be able to access the same data independently of the node they are using, by either:
A mapping function, locating the proper data across multiple disks on different nodes
Message passing, shipping data between nodes
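The mapping-function option above can be illustrated with a simple deterministic placement scheme: every client computes the same block-to-node mapping, so any node can locate data without a central lookup. This is only a sketch under assumed names and block sizes; real parallel file systems use far more sophisticated allocation and recovery logic:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical storage nodes
BLOCK_SIZE = 4 * 1024 * 1024                       # assumed 4 MB file blocks

def block_location(path, block_index, nodes=NODES):
    """Map one block of a file to a storage node, deterministically.

    Hashing (path, block index) spreads a file's blocks across nodes
    while letting every client compute the same answer independently."""
    key = f"{path}:{block_index}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]

def file_layout(path, size_bytes):
    """Return the node holding each block of a file."""
    n_blocks = -(-size_bytes // BLOCK_SIZE)   # ceiling division
    return [block_location(path, i) for i in range(n_blocks)]

# A 10 MB file occupies three 4 MB blocks, placed across the nodes.
print(file_layout("/experiments/run42/results.dat", 10 * 1024 * 1024))
```

The message-passing alternative instead leaves data where it was written and ships blocks between nodes on demand; the single-system-image requirement can be met either way.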
Non-functional
The main non-functional requirements for this solution are as follows:
Performance: The system must deliver a raw recording rate of 0.1 to 1 gigabytes/sec. Consider the caching system (coherence, aging, swapping, data shipping, and so on).
Scalability: The system has to be able to grow until its storage capacity reaches dozens of petabytes, without significant losses in performance.
Robustness: The system must make efficient use of its data-storage resources so that the information stored is maintained regardless of individual device faults. Consider data replication.
Portability: The system should run on various platforms.
Security: The system must be able to operate across firewalls and provide robust security models.
7.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 7-1 presents the use-cases diagram.
(Figure 7-1: use-cases diagram showing the Researcher and Administrator actors and the use-cases, including user authentication.)
Users description
These are the various roles involved:
Researcher: This is the user who stores experiment results in the grid and analyzes them. This role is normally personified by a scientist at one of the participating research institutions.
Administrator: This is the user responsible for the management tasks on the system. This role is typically personified by a network or system analyst or a database administrator.
Use-cases description
Here we describe the various use-cases:
User authentication: This use-case represents the procedure that all users must go through before using the system resources. It is typically accomplished by typing in a username and password combination, and it may be done transparently when the user logs in to a workstation.
Store experiment results: In this use-case, researchers store the results of their research in the grid. This task is accomplished as if researchers were saving a file (or a set of files) to their local file system.
Analyze experiment results: In this use-case, the researcher fetches the data from the grid and analyzes it. As in the previous case, this task is accomplished as if researchers were fetching a file (or a set of files) from their local file system.
Manage resources utilization: In this use-case, the administrator performs monitoring and performance-tuning tasks to make sure that the grid facility is working in an optimized way. Typical activities that make up part of these tasks include checking activity logs, analyzing usage history, checking average system load, and setting caching parameters, among others.
Manage security and authentication: This use-case includes all the security-related issues of administrating the grid, such as managing user accounts, defining security policies, configuring network software and hardware components, and so on.
The reason we have chosen to base our grid implementation on GFS is that it partially solves some of the issues associated with data-grid implementations. Through a GFS layer, it is possible to provide a single file-system view to the system applications and services, so that only a minor amount of control has to be added by the data-grid platform. This considerably eases the task of providing data management services that satisfy all the basic grid requirements.
(Figure 7-3: layered architecture. An applications layer sits on top of an OGSA layer; OGSA services run over GFS on the grid nodes, GFS storage nodes contribute their disks, and all nodes are connected by the network.)
GFS storage nodes: These are the computers whose storage devices take part in the GFS schema. A virtual disk is mounted and made available to the remaining computing resources through the data network.
Grid nodes: These are the computers where the OGSA software runs to provide grid services over the GFS schema. In addition to assuring nontrivial qualities of service over the data-storage facilities provided by GFS, the OGSA layer may also provide computing-grid and communication-grid services.
There may be computers that both contribute to the GFS system and provide OGSA-level grid services, as we can see in the bottom portion of Figure 7-3.
Component: GFS. Candidate products: Andrew File System (AFS), Avaki Data Grid, and IBM GPFS. Chosen product: IBM GPFS.
Component: Operating system. Products: Linux and IBM AIX v5.
The key to the high performance and scalability of the General Parallel File System (GPFS) implementation is its intrinsic parallelism. Files are not localized but striped across many disks on the file system. A computing node accessing a file can use multiple network paths to perform the access, avoiding network bottlenecks and increasing the availability of the data.
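Striping can be sketched as follows. This is purely an illustration of the round-robin placement idea, assuming hypothetical disk names and chunk sizes; GPFS's actual allocation, caching, and recovery logic is far richer:

```python
def stripe(data, disks, stripe_size):
    """Distribute a byte string round-robin across disks in stripe_size chunks."""
    placed = [[] for _ in disks]
    for i in range(0, len(data), stripe_size):
        placed[(i // stripe_size) % len(disks)].append(data[i:i + stripe_size])
    return placed

def reassemble(placed, total_chunks):
    """Read the stripes back, cycling across disks.

    In a real system these per-disk reads proceed over multiple network
    paths in parallel, which is where the performance gain comes from."""
    out = []
    for chunk in range(total_chunks):
        out.append(placed[chunk % len(placed)][chunk // len(placed)])
    return b"".join(out)

data = bytes(range(20))
disks = stripe(data, ["disk0", "disk1", "disk2"], stripe_size=4)
assert reassemble(disks, total_chunks=5) == data
```

Because consecutive chunks live on different disks, a sequential read keeps several disks and network paths busy at once instead of serializing on one device.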
Another important characteristic of GPFS is its compliance with the Portable Operating System Interface (POSIX) standard. POSIX is a series of standards developed by the IEEE that specify a portable operating system interface (the IX denotes the UNIX heritage of these standards). Its most important definition is a set of programming-interface standards governing how to write application source code so that applications are portable between operating systems.
7.4 Implementation
This section presents the implementation level already achieved in this site and the next steps to be followed for a complete deployment.
Implementation status
In the first step of the implementation of the grid solution, four research institutions that have IBM equipment only are being integrated. A high-throughput network was built between the institutions as basic infrastructure. The following resources are available on the grid:
Over 4000 nodes
Integrated peak performance of 24 teraflops
125 racks spread over 3 countries (Germany, France, Italy)
Capacity to accumulate 5 to 8 petabytes/year
Next steps
The next steps in the implementation of this grid will be as follows:
It will be expanded to other research institutions and universities using equipment from other vendors.
An upgrade of the dedicated network interconnect (from 1 Gb/s to 10 Gb/s) is also scheduled.
Tens of petabytes are planned by 2007-2008, with an exabyte approximately 5-7 years later.
7.5 Conclusion
Nowadays, performing research on cutting-edge technologies quite often implies dealing with massive quantities of data stored on high-capacity storage infrastructures. Such infrastructures have traditionally been implemented with expensive and inflexible computing systems that only a small fraction of research institutions could afford.
In this chapter, we presented a grid-based implementation of a high-capacity storage infrastructure suitable for most of these research demands. This infrastructure should be able to deliver high-performance computing at a much lower cost, which is what makes grid computing so appealing to research institutions. In addition, this particular implementation made use of the GFS standard, which greatly simplified its implementation.
Chapter 8.
e-Learning
In this chapter we discuss the following topic: Implementation of a network grid supporting an e-learning infrastructure that embraces many of the requirements for exchanging information in the educational and research fields
8.1 Introduction
In this example we present a grid environment to support many educational and research requirements for exchanging information. Knowing the main ways in which education can benefit from a grid, we can draw up the basic technological needs associated with the development of e-learning. The e-learning infrastructure presented in this chapter is based on the Access Grid. Note: The Access Grid is an ensemble of resources including multimedia large-format displays, presentation and interactive environments, and interfaces to grid middleware and to visualization environments. The AG technology was developed by the Futures Laboratory at Argonne National Laboratory and is deployed by the NCSA PACI Alliance. For more information about the Access Grid, refer to the following Web site:
http://www.accessgrid.org/
This chapter discusses how grid technologies can be deployed to tackle the problem of building such a framework.
Having these needs in mind, we can design a grid-based solution for leveraging e-learning.
8.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
The main functional requirements for this solution are as follows:
Simplex video broadcasting: When broadcasting a lecture, transmitting a high-quality image of the lecturer along with a blackboard is generally a desired requirement. Although one can argue that such visual information can be substituted with written and graphical content, such as formulas and maps, the audience tends to lose focus on the presentation more easily when they are unable to associate the content being presented with the lecturer. On the other hand, such visual contact does not need to be established from the audience to the lecturer, which means that the video streaming goes only one way.
In other collaborative environments, such as virtual conferences, all the parties may broadcast video signal, but this case is not covered in this example (although all its implications can be easily deduced from the context presented here).
Storage capacity: Off-line content also plays a major role in e-learning; this includes pre-recorded video and/or audio lectures, tutorials, articles, books, and so on. As mentioned before, this requires a huge storage capacity that may easily scale to several terabytes.
Unified access to stored content: The system must provide users with a comprehensive and unified listing of all the content stored in its databases. This not only eases access to the content, but also makes storage more efficient, as it avoids unnecessary data replication.
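The deduplication idea behind this requirement can be sketched as follows. This is an illustrative sketch only; the ContentCatalog class and its methods are our own invention, not part of any product discussed in this chapter.

```python
import hashlib

class ContentCatalog:
    """Toy unified catalog: one listing, duplicates detected by content hash."""

    def __init__(self):
        self._by_hash = {}   # content hash -> canonical entry
        self._listing = []   # unified, user-visible listing

    def add(self, title, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_hash:
            # Same bytes already stored: reuse the existing entry
            # instead of replicating the data.
            return self._by_hash[digest]
        entry = {"title": title, "hash": digest, "size": len(data)}
        self._by_hash[digest] = entry
        self._listing.append(entry)
        return entry

    def titles(self):
        return [e["title"] for e in self._listing]

catalog = ContentCatalog()
catalog.add("Lecture 1 (video)", b"...video bytes...")
duplicate = catalog.add("Lecture 1 (copy)", b"...video bytes...")
```

A second submission with identical bytes is mapped back to the existing entry, so the unified listing stays free of replicas.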
Non-functional
The main non-functional requirements for this solution are as follows:
Performance: The system must provide good audio and video quality for the live lectures, and good network throughput for downloading off-line content. A minimum requirement is a stable flow of at least 50 kilobytes/sec from the lecturer to the audience and 10 kilobytes/sec in the other direction.
8.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 8-1 presents the use-cases diagram for this example:
Figure 8-1 use-cases: schedule lecture, present live lecture, store off-line content, user authentication, subscribe to a live lecture, participate in a live lecture, manage content organization, manage security, monitor and maintain system, assist live lectures. Actors: Operator, Professor, Administrator, Student.
Users description
These are the various roles involved:
Professor: This is the person responsible for presenting lectures and preparing off-line content. This person is normally a professor or, particularly for lectures, one of their assistants.
Student: These are the people who attend lectures and download educational material from the system. They can be anyone interested in obtaining education from the system, including professors.
Administrator: This is the person responsible for performing the basic management tasks in the system. This role is typically filled by a system analyst or a network analyst.
Use-cases description
Here we describe the various use-cases:
User authentication: This use-case represents the procedure that all users must go through before using the system resources. It is typically accomplished by typing in a username and password combination, and may be done transparently when the user logs in to a workstation.
Schedule lecture: In this use-case, the professor schedules a live lecture using the system's interface. The process of scheduling a lecture is bound by restrictions such as the availability of broadcasting rooms, the lecturer, and the target audience. Once the lecture is scheduled, it can be published to the target community.
Present the lecture: In this use-case, the professor presents a lecture in a broadcasting room, assisted by an operator. Besides presenting the subject to the audience using an ordinary blackboard, the professor may use interactive tools to include electronic graphics and text in the presentation.
Participate in a live lecture: Depending on the lecture setup and/or the number of students attending, students can join by using a personal computer or by going to a classroom where the lecture is being broadcast.
Assist live lectures: Operators may assist both professors in broadcasting rooms and students in classrooms. In either case, the operator's role is to ensure that the technical infrastructure is correctly set up for the lecture session.
Store off-line content: Professors may also store educational material for download by students. Such material might include recorded live lectures, audio classes, documents, tutorials, and so on. The submission itself has to be performed through an interactive interface so that the material is correctly documented and made available to the right audience.
Manage content organization: The off-line content has to be organized into subject-oriented forums and disciplines. The creation of such a structure should be requested from and performed by the administrator, who is responsible for maintaining it.
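The room-availability restriction mentioned under the schedule lecture use-case can be sketched as a simple interval-overlap check. This is an illustrative sketch; the function and its data layout are assumptions, not part of the described system.

```python
def room_is_free(bookings, room, start, end):
    """Return True if `room` has no booking overlapping [start, end).

    `bookings` is a list of (room, start, end) tuples, with times as
    comparable values (for example, minutes since midnight).
    """
    for booked_room, s, e in bookings:
        # Two half-open intervals overlap iff each starts before the
        # other ends.
        if booked_room == room and s < end and start < e:
            return False
    return True

bookings = [("Room A", 9 * 60, 11 * 60)]   # Room A busy 09:00-11:00
overlapping = room_is_free(bookings, "Room A", 10 * 60, 12 * 60)
back_to_back = room_is_free(bookings, "Room A", 11 * 60, 13 * 60)
```

A request overlapping an existing booking is rejected, while a back-to-back request starting exactly when the previous lecture ends is accepted.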
Monitor and maintain system: The administrator is responsible for monitoring and tuning the system so that bottlenecks are detected and eliminated. Additionally, this person has to provide technical support for the users and make available all the tools needed to participate in the community.
Figure 8-3 presents the software component architecture used in this implementation.
Figure 8-3 layers (top to bottom): Collaborative Applications; Basic Communication and Storage Grid Services; OGSA.
Individual site: This denotes the places from which students may join the conference using personal computing resources. These sites might be widely scattered and must provide all the facilities needed to participate in the lecture.
Broadcasting room: This is the place from which the professor broadcasts the lecture to all the students sitting in classrooms or at individual sites. The technical resources are the same as those found in classrooms, except that a video camera is imperative. An operator assists the professor with the management of this equipment (just as in the classrooms).
Grid portal: This is the portal by which users perform activities such as submitting lectures, subscribing to lectures, and uploading and downloading off-line content.
These are the components for the software component architecture:
Physical layer / Operating system: This layer comprises all the computers used to build up the e-learning framework, as well as the basic operating systems they run.
OGSA: This is the layer where the basic grid platform sits. This software is responsible for providing the grid infrastructure upon which the basic grid services, for high-performance storage and communication, are implemented. Thus, it offers the basic tools for open-standard communication and storage throughout the grid.
Basic Communication and Storage Grid Services: These are the grid services that implement the high-level storage and communication functionality. Thus, they offer the collaborative applications a standard interface for storing content and for receiving and/or sending streaming audio and video signals.
OGSA Toolkit
Products: Globus Toolkit
Chosen product: Globus Toolkit

Grid Services
Products: none
Collaborative Applications
Products: IBM Lotus Learning Management System (LMS); Distributed PowerPoint and Remote PowerPoint; Peer-to-Group Media Broadcast or Kontiki
Chosen products: A number of collaborative applications might be put in place, depending on the requirements of the specific lectures to be presented. The products listed are for managing off-line content, live presentations, and collaborative virtual conferences.

Grid Portal
Products: IBM WebSphere Application Server; BEA WebLogic Server; Grid System Gateway; JBoss; Tomcat
Chosen product: Grid System Gateway was chosen due to the reliability and technical-support availability requirements.
It is important to mention that the integration between the independent tools and products chosen in this design was possible thanks to the Open Grid Services Architecture Data Access and Integration (OGSA-DAI) project. OGSA-DAI is developed by the UK Database Task Force, and its objective is to provide a standard interface through which a distributed query processing system can access data in different data sources. For more information, refer to the following Web sites:
http://www.ogsa-dai.org.uk http://www.ibm.com/software/data/cm/
Among the collaborative applications adopted is an IBM solution for Learning Management Systems (LMS). With this tool, administrators can manage several tasks related to the e-learning process, besides creating portals that ease users' access to information. For more information about Lotus Learning Management System, go to:
http://www.lotus.com/learning
For more information on the other collaborative tools, refer to the following Web sites:
http://www.accessgrid.org/agdp/guide/dppt.html http://scv.bu.edu/accessgrid/seminars/rppt.html
http://www-mice.cs.ucl.ac.uk/multimedia/software/nte/
8.4 Implementation
This section presents the implementation status achieved so far and the next steps toward a complete deployment.
Implementation status
The implementation level reached so far embraces:
4 educational institutions
Over 100 users (professors and regular students)
2 Mbps of average bandwidth for broadcasting lectures
5 terabytes of storage space for off-line educational content
Next steps
In the next two years, this implementation is expected to scale to:
Around a dozen participating institutions across Europe
Over 1,000 users
5 Mbps of average bandwidth for broadcasting lectures
25 terabytes of storage space for off-line educational content
8.5 Conclusion
In this chapter we presented an example of how grid technologies can be used to build an e-learning framework capable of connecting a potentially unbounded number of professors and students. In this case, we believe that grid technologies are appealing because the requirements for building such a framework match very closely what a grid can offer in terms of computational resources. Additionally, a grid offers a much more cost-effective solution, as it employs existing low-cost and non-specialized computing resources to do the job. Finally, we strongly believe that the technological revolution that the grid is about to bring will definitely change the way people deal with information and, ultimately, knowledge. In this sense, grid computing and e-learning form a perfect match.
Chapter 9.
Visualization
In this chapter we discuss the following topic: Grid implementation to support the field of advanced scientific visualization
9.1 Introduction
In this example we present a grid implementation to support the field of advanced scientific visualization. The area of visualization is evolving as it addresses emerging and continuing issues, from interactive and batch rendering of terascale data sets through remote visualization. At the same time, universities in general have highly heterogeneous environments. This is because they use many low-cost resources from different suppliers running different systems, alongside advanced computing resources such as supercomputers and advanced visualization systems, with most of them segregated in specific departments for local access only.

Note: This example of a grid implementation is inspired by the scientific visualization requirements of America's largest campus grid, at the University of Texas at Austin. More information about this campus grid can be found at the following Web sites:
http://www.tacc.utexas.edu/ http://www.tacc.utexas.edu/projects/grid_vis.php http://www.ibm.com/grid/grid_press/pr_ut.shtml
Through a grid portal, users have a single point from which to submit jobs, verify job status, and submit input data, as can be seen in Figure 9-1.
9.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To create a single view of the entire set of IT resources, we must use and implement the following grid technologies:
A portal to enable a single view of all resources:
  A virtualized view of the computing power devices
  A virtualized view of the data storage devices
A scheduler to manage the distribution of jobs across the resources
An engine to manage the distribution of data across the resources
Integration and leveraging of visualization resources:
  Forward results to the specified visualization resource
  Support advanced reservations on the visualization resource
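A minimal sketch of the portal idea, with a naive round-robin stand-in for the scheduler, might look like the following. All class and method names here are illustrative assumptions, not an actual portal API; a real portal would delegate to grid middleware rather than a Python dictionary.

```python
import itertools

class GridPortal:
    """Toy single point of entry: submit jobs, check status, stage input data."""

    def __init__(self, resources):
        self.resources = resources         # virtualized view of compute resources
        self.jobs = {}
        self._ids = itertools.count(1)
        self._next = itertools.cycle(resources)

    def submit(self, command, input_data=None):
        job_id = next(self._ids)
        self.jobs[job_id] = {
            "command": command,
            "input": input_data,
            "resource": next(self._next),  # "scheduler": naive round-robin
            "status": "QUEUED",
        }
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]["status"]

portal = GridPortal(["cluster-a", "cluster-b"])
jid = portal.submit("render frame_001", input_data=b"mesh")
```

The user sees only the portal; which cluster actually runs the job is decided behind the single view.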
Non-functional
The main non-functional requirements for this solution are as follows:
Performance: As far as visualization is concerned, the system should manage the data in a way that minimizes the latency between the end of a computation process and the analysis of the results. This normally implies that the system has to provide a persistent data store and a means to replicate that data store or pre-stage data, providing the capability to place the data close to the consuming application for optimal performance.
Scalability: The system must support the ability to add resources, thereby increasing computing and/or storage capacity without significant loss of performance. The system architecture has to be scalable to support arbitrarily large virtual organizations.
Security: The system has to allow authenticated and authorized users access to grid resources without requiring them to authenticate on each resource. The system has to prevent unauthorized access to data.
Data integrity: Changes in system data have to be propagated automatically to any replicas within the grid.
Availability: The system must remain available for job submission in the event of failure of any of its resources. The system must support the ability to restart failed jobs and run them to completion on a similar resource in the event of a single system failure.
Reliability: The system must be reliable in the correct execution of a model in the event of server and/or disk failures. Also, the system must hold jobs in the queue until notified of successful completion, thus providing the ability to restart failed jobs.
Maintainability: The system must allow the use of heterogeneous hardware platforms and operating systems, as well as geographically distributed resources.
9.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 9-2 illustrates the main users and how they interact with the main use-cases.
Figure 9-2 (excerpt): use-cases include Logon and Submit Jobs; the actors are Grid User and Grid Admin.
Users description
These are the various roles involved:
Grid Portal User: This role refers to the researchers who perform the tasks of everyday research.
Grid Administrator: This is the person responsible for performing management tasks, such as user account creation and deletion, grid resource management, and so on.
Use-cases description
The use-cases can be briefly described as follows:
Logon: The user logs on to the grid portal. As a precondition, the user has to have an account and a certificate signed by a trusted CA.
Submit job: The user (after a successful logon) submits a job to the grid through the grid portal.
Verify job status: The user may verify his job's status; whether it is done or not, for example.
Cancel job: The user cancels a specific job.
Visualize results: The user, through a visualization tool, visualizes the results.
Create user account: The administrator (after a successful logon) creates a user account, defining the profile and which resources the user is able to access.
Delete user account: The administrator can also delete a user account when this becomes necessary.
Manage resources: The administrator performs similar actions on numerous resources. The actions would typically include viewing resource information, adding an additional resource, editing resource information, or deleting a resource.
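The verify and cancel use-cases imply a small job life cycle. One way to sketch the legal state transitions (the state names and transition table are illustrative, not taken from any product in this book):

```python
VALID_TRANSITIONS = {
    "QUEUED":    {"RUNNING", "CANCELLED"},
    "RUNNING":   {"DONE", "FAILED", "CANCELLED"},
    "DONE":      set(),      # terminal states: no further transitions
    "FAILED":    set(),
    "CANCELLED": set(),
}

def advance(state, new_state):
    """Move a job between states, rejecting impossible transitions
    (for example, cancelling a job that has already finished)."""
    if new_state not in VALID_TRANSITIONS[state]:
        raise ValueError(f"cannot go from {state} to {new_state}")
    return new_state

s = advance("QUEUED", "RUNNING")
s = advance(s, "DONE")
```

The "verify job status" use-case simply reads the current state; "cancel job" is an advance to CANCELLED, which the table allows only while the job is queued or running.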
Architecture diagram (excerpt): the Grid User interacts with the Grid Portal, which is built on GRAM, GridFTP, GSI, and GIS; a scheduler dispatches work to the clusters.
9.4 Implementation
The solution provided in this chapter is not yet implemented. The architecture was developed based on user demands and the organization's resources. After implementation, the time needed for users (researchers) to submit and visualize a simulation is expected to decrease significantly.
Next steps
The next steps for this project are its implementation and deployment.
9.5 Conclusion
Nowadays, performing research on cutting-edge technologies quite often implies making use of heterogeneous tools and computing resources. Such diversity of building blocks has always imposed major difficulties on researchers and technical staff who need to accomplish time-critical and/or time-consuming tasks on a daily basis. In this chapter, we presented a grid-based implementation for integrating computing resources so that the everyday tasks performed in research institutions can be carried out more easily and efficiently. In particular, we analyzed the impact of such an implementation on scientific visualization tasks, which traditionally are performed on heterogeneous software and hardware environments. Finally, we are confident that the grid philosophy will greatly affect the way research is performed in research institutions worldwide.
Chapter 10.
Microprocessor design
In this chapter we discuss the following topic: A computational grid implementation that helps to reduce the microprocessor design cycle and also allows the design centers to share their resources more efficiently
10.1 Introduction
In this example we present a computational grid solution that helps to reduce the microprocessor development cycle and also allows the design centers to share their resources more efficiently. Microprocessor design and microprocessor verification simulation require massive computational power.

Note: A similar solution has been in place for more than 10 years in the Microprocessor Design Group at IBM Austin, TX. The group designs chips for IBM Eserver high-performance systems, running thousands of simulations to verify timing closure. More information about this grid solution can be found at the following Web site:
http://www.ibm.com/software/success/cssdb.nsf/CS/BEMY-645U26?OpenDocument&Site=software
10.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To satisfy the requirements described above, here are the recommendations:
Store 5-10 terabytes of data on a distributed file system.
Achieve peak computing power of approximately 2 teraflops.
Provide tools to improve the bug-removal rate.
Provide tools to reduce the human-resources cost needed to submit 100,000+ tests a day.
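To give a feel for the last figure, a rough capacity calculation follows. The 30-minute average test duration is an assumed value for illustration, not a figure from the source.

```python
# Rough capacity arithmetic behind the "100,000+ tests a day" figure.
tests_per_day = 100_000
seconds_per_day = 24 * 60 * 60            # 86,400

# Average test completions needed per second across the whole grid:
tests_per_second = tests_per_day / seconds_per_day        # ~1.16

# If an average verification test runs for ~30 CPU-minutes (an assumed
# figure), this many processors must be busy concurrently, on average:
assumed_test_minutes = 30
busy_processors = tests_per_second * assumed_test_minutes * 60   # ~2,083
```

Under this assumption, a few thousand processors are needed just to keep pace with submissions, which is consistent with running the workload on a large shared grid rather than a single server.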
Non-functional
Non-functional requirements define the value-added goals of the project that are not captured in the use-cases, such as these:
Robustness: The system should remain available for job submission in the event of failure of any computing resource in the grid.
Security: The system should prevent unauthorized access to data.
Scalability: The system should support the ability to add resources, thereby increasing computing and/or storage capacity. Also, the system architecture should be scalable to support arbitrarily large virtual organizations.
10.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 10-1 shows the use-case diagram; its elements are the Submitter and Debugger actors and the Simulation use-case.
Users description
These are the various roles involved:
Submitter: This is the person responsible for defining the submission control, the job requirements, the pass/fail thresholds, and the tests, setting up the parameters and test stimulus. Additionally, the submitter is responsible for submitting job requests.
Debugger: This is the engineer responsible for analyzing failed jobs to track down bugs and bottlenecks. This person is also responsible for rerunning models.
Use-cases description
The use-cases can be briefly described as follows:
Simulation: The simulation action receives the simulation request, which carries a number of parameters such as submission control, tests, and model location, and dispatches the request to the available machines for processing.
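The dispatch step can be sketched as follows. The machine names and the model path are hypothetical, and a real dispatcher would consult a scheduler rather than a simple busy flag.

```python
def dispatch(request, machines):
    """Send a simulation request to the first available machine.

    `request` carries the parameters named above (submission control,
    tests, model location); `machines` maps machine name -> busy flag.
    """
    for name, busy in machines.items():
        if not busy:
            machines[name] = True        # mark the machine as claimed
            return {"machine": name, **request}
    return None                          # nothing free: leave it queued

machines = {"sim-01": True, "sim-02": False}
job = dispatch({"tests": ["timing_closure"], "model": "/models/cpu-v2"},
               machines)
```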
Architecture diagram (excerpt): a computational grid spanning the Austin, Burlington, and POK sites.
Software client: Eclipse
File systems
10.4 Implementation
This implementation involves several physical locations, and its first version was delivered in the mid-1980s. It has been evolving, and today more than 7,000 processors participate in this computational grid.
Next steps
These are the next steps for implementation of this project:
Add Eclipse plug-ins to enhance usability.
Investigate replacing current servers and database with a WebSphere/DB2 implementation.
Investigate expanding the scope of the plug-ins to include other simulation tools.
10.5 Conclusion
Nowadays, performing research and development on cutting-edge technologies quite often implies making use of high-performance computing infrastructures. Such infrastructures have traditionally been implemented with expensive and inflexible computing systems that only a small fraction of institutions could afford. In this chapter, we presented a grid-based implementation of a high-performance computing infrastructure suitable for a highly technological application: the development of microprocessor chips. This infrastructure is able to deliver high-performance computing at a considerably lower cost than an equivalent supercomputer. We recognize that a number of compute-intensive applications might not benefit from this technology, but we are sure that the grid philosophy will greatly affect the way research and development are performed by industries and research institutions.
Part 3
Appendixes
This part of the book includes the following appendixes:
Appendix A, TeraGrid, on page 113
Appendix B, Research oriented grid, on page 121
Appendix A.
TeraGrid
In this appendix we present the following topic: An overview of the TeraGrid project
Introduction
This cyber-infrastructure aims to meet the needs of emerging terascale applications. It encompasses computing-intensive applications that require multiple-teraflop computing systems (HPC); data-intensive systems that need to create or mine multi-terabyte data archives to extract insights (visualization); and applications that must be coupled to scientific instruments, such as microscopes and telescopes (remote instrumentation).

Note: TeraGrid is an effort to build the world's largest and fastest grid environment, launched by the NSF in 2001. For more information about TeraGrid, refer to the following Web sites:
http://www.teragrid.org/
http://www.nsf.gov/
http://www.ibm.com/press/PressServletForm.wss?MenuChoice=pressreleases&TemplateName=ShowPressReleaseTemplate&SelectString=t1.docunid=1137&TableName=DataheadApplicationClass&SESSIONKEY=any&WindowTitle=Press+Release&STATUS=publish
The present status of the TeraGrid project is a combination of three programs within the NSF (National Science Foundation) Terascale initiative: Terascale Computing System (TCS), Distributed Terascale Facility (DTF), and Extensible Terascale Facility (ETF). It attempts to create an infrastructure of unbounded capability and scope connecting universities and organizations via a cross-country network backbone, among the fastest research networks currently in existence. It enables rapid access to remote resources and allows users to hide latency via aggressive data staging.
Organization
The project currently integrates nine major supercomputing sites across the US, as seen in Figure A-1.
Figure A-1 (excerpt): the TeraGrid sites shown include ANL, Purdue, CACR, NCSA, IU, and PSC.
Each of these sites contributes resources and expertise to create a cyber-infrastructure for scientific research. They are:

The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign: It offers 10 teraflops of capability computing through its IBM Linux cluster, which consists of 1,776 Itanium 2 processors. In addition to this processing capability, NCSA also provides 600 terabytes of secondary storage and 2 petabytes of archival storage capacity.

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego: It leads the TeraGrid data and knowledge management effort. It provides a data-intensive IBM Linux cluster based on Itanium processors that reaches over 4 teraflops, plus 540 terabytes of network disk storage. In addition, a portion of SDSC's 10-teraflop IBM supercomputer is assigned to the TeraGrid, and an IBM HPSS archive currently stores a petabyte of data.

Argonne National Laboratory: It provides users with high-resolution rendering and remote visualization capabilities via a 1-teraflop IBM Linux cluster with parallel visualization hardware.

The Center for Advanced Computing Research (CACR) at the California Institute of Technology (Caltech): It provides online access to very large scientific data collections in astronomy and high-energy physics. In addition, it provides application expertise in the fields of geophysics and neutron science.
The Pittsburgh Supercomputing Center (PSC): It provides computational power via its 3,000-processor HP AlphaServer system, TCS-1, which offers 6 teraflops of capability uniquely coupled to a 21-node visualization system. It also provides a 128-processor, 512-gigabyte shared-memory HP Marvel system, a 150-terabyte disk cache, and a mass storage system with a capacity of 2.4 petabytes.

Oak Ridge National Laboratory (ORNL): In this case, ORNL is more a user than a provider. Users of its neutron science facilities (the High Flux Isotope Reactor and the Spallation Neutron Source) will be able to access TeraGrid resources and services for their data storage, analysis, and simulation.

Purdue and Indiana University (IU): They provide 6 teraflops of computing capability, 400 terabytes of data storage capacity, visualization resources, access to life science data sets, and a connection to the Purdue Terrestrial Observatory.

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin: It provides a 1,024-processor Cray/Dell Xeon-based Linux cluster and a 128-processor Sun E25K terascale visualization machine with 512 gigabytes of shared memory, for a total of 6.75 teraflops of computing/visualization capacity, in addition to a 50-terabyte Sun storage area network. Only half of the cycles produced by these resources are available to TeraGrid users.

In addition, major companies such as IBM contribute expertise and products to the project. IBM, specifically, is the most important contributor, providing expertise on high-performance computing, GPFS, Linux clusters, and Power4 processors. Through its nine resource sites, the TeraGrid offers advanced computational, visualization, instrumentation, and data resources. Currently the sites are interconnected through a 40 Gbps network backbone at rates from 10 to 30 Gbps.
Together, the sites provide more than 40 teraflops of computing power and more than 1 petabyte of disk-accessible data storage. This infrastructure is enabling scientists to work on advanced research such as:
Real-time brain mapping
Earthquake modeling
Molecular dynamics simulation
MCell, a Monte Carlo simulation of cellular microphysiology
The Encyclopedia of Life
Software stack (table excerpt):
Functionality: super schedulers, MPICH-G2
Implementation: SRB, MPICH-G2, distributed accounting
Basic Grid Services: authentication, access resource allocation, Resource Information Service
Management
The TeraGrid sites are autonomously managed, but issues such as distributed accounting, authentication, certificates, sign-on, and distributed application management are handled by the Coordinated TeraGrid Software and Services (CTSS) software. This provides a common user environment across the heterogeneous resources in the TeraGrid, as well as supporting grid-based capabilities.
Security
TeraGrid uses an X.509 certificate-based authentication scheme based on the Grid Security Infrastructure (GSI) protocol. The TeraGrid project evaluated both a centralized approach (all users must obtain a TeraGrid authentication certificate from a central Certificate Authority, or CA) and an approach that accepts certificates from any approved CA. For better scalability, the TeraGrid did not set up a TeraGrid-specific CA, but instead defines TeraGrid certificate policy requirements and accepts certificates from CAs that meet those requirements.
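The policy decision described here, accepting certificates only from approved CAs rather than from a single TeraGrid CA, can be sketched as a lookup against a policy list. The issuer names below are hypothetical, and real GSI code would also verify the certificate's signature chain, validity dates, and revocation status.

```python
APPROVED_CAS = {                      # hypothetical policy list
    "/C=US/O=Example Grid/CN=Example CA",
}

def certificate_accepted(issuer_dn, approved=APPROVED_CAS):
    """Accept a user certificate iff its issuer is an approved CA.

    This mirrors only the policy decision; cryptographic verification
    of the certificate itself is out of scope for this sketch.
    """
    return issuer_dn in approved

ok = certificate_accepted("/C=US/O=Example Grid/CN=Example CA")
rejected = certificate_accepted("/C=US/O=Untrusted/CN=Other CA")
```

Adding a new CA to the federation is then a policy change (extending the approved set), not a re-issuance of every user's credentials, which is the scalability advantage noted above.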
Network Infrastructure
The design is based on a 10 Gbps system as a minimum requirement. The main backbone is connected at 40 Gbps rates and spans from Los Angeles to Chicago. Sites connected to this backbone must follow certain rules regarding their network infrastructure: they must have an aggregation switch connected to the border router, where three such channels (10 Gbps each) are passed through to the TeraGrid backplane. The aggregation switch and border routers are separate for two main reasons:
1. This separation allows local configuration changes, outages, or experiments to be carried out without affecting the operation of the TeraGrid backplane.
2. The requirements for switching and routing network traffic over LANs versus WANs are quite different. Enterprise IP routers (such as those used for the border routers and internal routers) are designed to handle the buffering and associated requirements of long-delay, high-bandwidth wide area networks. LAN switches, on the other hand, are optimized for short-delay, low-latency connectivity, as would be expected in a LAN environment.
Figure A-3 shows how a site connects to the TeraGrid.
Figure A-3 Typical connection between sites and the TeraGrid backplane
Beneficiaries
The TeraGrid infrastructure and its terascale computing system will enable scientists to study drug interactions with cancer cells, and thereby develop better cancer drugs. It will allow them to further study the human genome and how the brain works, and allow scientists to analyze weather data so quickly that they will be able to create real-time weather forecasts that can predict, down to the precise region, where a tornado or other severe storm is likely to hit. It will help engineers design better aircraft by allowing them to do realistic simulations of new designs, and it will help scientists understand the properties of our universe and how it formed. Several institutions across the US are already set to benefit from the TeraGrid infrastructure, such as:
The Center for Imaging Science (CIS) at Johns Hopkins University: It has deployed its shape-based morphometric tools on the TeraGrid to support the Biomedical Informatics Research Network, a National Institutes of Health initiative involving 15 universities and 22 research groups whose work centers on brain imaging of human neurological disorders and associated animal models. For more information, refer to:
http://cis.jhu.edu
California Institute of Technology in Pasadena: It has a project to investigate the efficiency of detecting the decay of the Higgs boson into two energetic photons. The work involves generating, simulating, reconstructing, and analyzing tens of millions of proton-proton collisions. For more information, refer to:
http://cmsinfo.cern.ch/outreach
University of California, San Diego: It has a project to simulate the evolution of the universe through an adaptive mesh refinement code for cosmology simulations (Enzo). Once the code is ported to the TeraGrid, the simulation will run from shortly after the Big Bang, through the formation of gas clouds and galaxies, all the way to the present era. For more information, refer to:
http://casswww.ucsd.edu
University of Illinois, Urbana-Champaign: It has a project that uses massive parallelism on the TeraGrid for major advances in the understanding of membrane proteins. Another project is also harnessing the TeraGrid to attack problems in the mechanisms of bioenergetic proteins, the recognition and regulation of DNA by proteins, the molecular basis of lipid metabolism, and the mechanical properties of cells. For more information, refer to:
http://www.ks.uiuc.edu/#res
How to join
From the technical point of view, there are two issues to be addressed in order to join the TeraGrid infrastructure.
Software perspective
Prospective TeraGrid sites must implement the interfaces specified for using TeraGrid resources, or define interfaces, within TeraGrid specifications, that enable other sites to use their resources. Sites are encouraged to use the NMI software release to implement their interfaces and meet TeraGrid requirements.
Network perspective
Prospective TeraGrid sites must follow the network architecture defined in Figure A-3 on page 119. This means that sites must have a separate network infrastructure to join the TeraGrid backplane. This is explained in more detail in Network Infrastructure on page 118. A complete document can be found at:
http://www.teragrid.org/about/TeraGrid-Primer-Sept-02.pdf
Appendix B.
Introduction
This is an example of an institutional research organization that has several Research and Development sites worldwide. Each site has its own IT infrastructure in place, which can vary in terms of network topology, server platform, operating systems (such as Linux, Mac OS X, AIX, Windows, and OS/390), directory servers, and so on. At each site, a number of research projects are under way that pose varying degrees of computing demand, from fast processing to huge storage capacity. Because the sites are based on diverse platforms, it is not an easy task to provide comprehensive sharing of resources among them. Grid computing is taken here as the technology that fills this gap, so that sites are no longer limited to their own capacity when performing their research. The goal of this appendix is to provide architects and technologists in general with meaningful information about applying grid computing technologies to a typical scenario. To that end, all necessary steps are presented and discussed in detail.
Note: The grid infrastructure presented in this appendix is based on concepts adapted from the University of Texas at Austin, which has multiple different platforms, schedulers, and heterogeneous cluster types. More information about this campus grid can be found at the following Web sites:
http://www.tacc.utexas.edu/
http://www.tacc.utexas.edu/projects/grid_vis.php
http://www.ibm.com/grid/grid_press/pr_ut.shtml
The next sections cover the following aspects of the grid architecture:
In Business requirements on page 122, we describe the business context in which the problem arises.
In High level design on page 124, we present the design of the solution that fulfills the requirements.
In Products used on page 131, we present the list of products used in this example.
Business requirements
It is the goal of the institution or company to integrate the numerous divisions and resources within the organization so that they share a common infrastructure for research and computation. The grid to be designed will unify and simplify the usage of the diverse computational, storage, visualization, data, and instrument resources of the organization, to facilitate new, powerful paradigms for research and development. This will include resources from the following hypothetical research centers:
Tokyo Research Lab
Singapore Research Lab
Lisbon Research Lab
Paris Research Lab
Today, each lab works in a largely independent fashion: each has its own budget, its own technical staff, and its own computational infrastructure, managed by local administrators and used for the research activities that take place locally. For these sites to be fully integrated, a grid platform must be implemented that is capable of virtualizing both the storage capacity and the computing power of highly scattered computing resources. Some non-functional requirements of such a solution, such as scalability and performance, are discussed in the next section.
Non-functional requirements
This section describes the non-functional technical requirements for the solution.
Scalability: The system must be able to grow without limit and without significant loss of performance.
Availability: The system must be available on a 24/7 basis, even when individual resources are unavailable.
Reliability: The system must not become unavailable when one or more resources fail, regardless of the reason.
Maintainability: The system should be fully manageable in a scalable way, meaning that its growth does not imply a proportional growth in its management complexity.
Security: The system should provide services for user authentication and authorization, as well as secure information exchange between computing resources.
Current status
In the current environment, the users make use of the following infrastructure:
IBM eServer pSeries 690 clusters based on IBM POWER4, running IBM AIX
IBM eServer pSeries 655 nodes based on IBM POWER4+, running IBM AIX
Intel Xeon based nodes running the Linux operating system
Intel Pentium III nodes running the Linux operating system
Figure B-1 illustrates the different clusters that will be pulled together to form the grid environment.
[Figure B-1: the Tokyo and Singapore clusters, together with planned future sites, are pulled into a single virtual grid environment. A companion figure shows grid users reaching that environment through a grid portal behind a firewall.]
Because all the clusters and their schedulers must be managed simultaneously, a common mechanism is needed to coordinate job submissions to all schedulers. This is where the meta-scheduler comes in. In this case, the portal and the nodes interact with a meta-scheduler that retrieves the resource requirements for each job using the information providers bundled with the Globus Toolkit, which has been adopted in this solution. Figure B-4 illustrates how the information is pulled from a cluster and queried by the meta-scheduler.
[Figure B-4: GPIR and the Globus Index Service collect information from the PBS and Condor clusters, which the meta-scheduler then queries.]
The Community Scheduler Framework (CSF) is an open source add-on to the Globus Toolkit Version 3.0, donated by Platform Computing, for the development of community schedulers. Community schedulers, commonly referred to as meta-schedulers, accept user requests to run jobs and map them to the available resources. CSF provides intelligent, policy-based meta-scheduling for building grids where multiple types of job schedulers are involved, and it also suits environments preparing for growth. CSF is a grid meta-scheduling middleware solution that preserves local control over how resources are shared, while providing transparency and interoperability among the various job schedulers already in place at the research centers. It is an OGSI-compliant scheduling framework compatible with the Globus Toolkit version 3.0. Figure B-5 shows the design and the interfaces between the various schedulers used in the research organization.
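The mapping of a job request onto the best-fitting cluster can be sketched in a few lines of Python. This is a toy model with hypothetical names; the real CSF is an OGSI-compliant framework, and real policies are far richer than "most free CPUs":

```python
# Toy sketch of meta-scheduling in the spirit of CSF (hypothetical
# names; the real CSF is an OGSI-compliant scheduling framework).

class Cluster:
    """A cluster fronted by one local scheduler (LoadLeveler, PBS, Condor...)."""
    def __init__(self, name, scheduler, free_cpus):
        self.name = name
        self.scheduler = scheduler
        self.free_cpus = free_cpus

class MetaScheduler:
    """Accepts job requests and maps them onto the best-fitting cluster."""
    def __init__(self, clusters):
        self.clusters = clusters

    def submit(self, job_name, cpus_needed):
        # Ask the "information providers" which clusters can satisfy
        # the job's resource requirement.
        candidates = [c for c in self.clusters if c.free_cpus >= cpus_needed]
        if not candidates:
            raise RuntimeError(f"no cluster can run {job_name}")
        # Simple policy: pick the cluster with the most free CPUs.
        best = max(candidates, key=lambda c: c.free_cpus)
        best.free_cpus -= cpus_needed
        return best.name

grid = MetaScheduler([
    Cluster("tokyo-ll", "LoadLeveler", 64),
    Cluster("singapore-pbs", "PBS", 128),
    Cluster("lisbon-condor", "Condor", 32),
])
print(grid.submit("blast-run", 100))   # dispatched to singapore-pbs
```

The point of the sketch is the separation of concerns: the meta-scheduler applies the placement policy, while each local scheduler keeps control of how its own resources are actually shared.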
[Figure B-5: the grid user portal drives the CSF Job Service and Queue Service on top of the Globus Toolkit; the Index Service and queues front the LoadLeveler, PBS, and Condor schedulers at each site.]
Apart from the broader availability of computing resources, such a solution can also improve the way researchers handle their everyday computing activities. In most cases, researchers do not run just a single job; they have a sequence of jobs to run to complete their research. Usually they wait for the first job to complete, then submit the second job, and so on. With a workflow of the jobs to be run, researchers need to submit only once; the workflow then executes the jobs in sequence, or according to the order defined when it was created. As part of this solution, GridPort's job sequencer is used, with which researchers can create a workflow of the different jobs to run. The job sequencer is a portlet within GridPort that creates and manages sequences of tasks to be submitted. A sample sequence could consist of a job submission, a file transfer, and resubmission of the job to another resource. The sequencer uses GSI authentication to execute the jobs submitted by the user. Figure B-6 illustrates the job sequencer.
[Figure B-6: the GridPort job sequencer in the portal dispatches the steps of a sequence to the LoadLeveler, PBS, and Condor clusters.]
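A job sequence of this kind is essentially an ordered list of tasks executed one after another. The following minimal Python sketch illustrates the idea; the function names and task shapes are hypothetical, not the GridPort portlet's actual API:

```python
# Minimal sketch of a job-sequencer workflow (hypothetical names; the
# actual GridPort sequencer is a portlet that runs under GSI credentials).

def submit_job(job):
    # In a real grid this would go through the meta-scheduler.
    return f"submitted {job}"

def transfer_file(src, dst):
    # In a real grid this would use GridFTP.
    return f"copied {src} -> {dst}"

def run_sequence(steps):
    """Run each (function, args) task in order; an exception in any
    task aborts the rest of the workflow."""
    results = []
    for func, args in steps:
        results.append(func(*args))
    return results

# A sample sequence: submit a job, transfer its output, resubmit elsewhere.
workflow = [
    (submit_job, ("analysis.ll",)),
    (transfer_file, ("tokyo:/out.dat", "lisbon:/in.dat")),
    (submit_job, ("analysis.pbs",)),
]
for line in run_sequence(workflow):
    print(line)
```

The benefit described in the text falls out directly: the researcher defines the list once and no longer has to wait for each job before manually submitting the next.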
Putting this together, Figure B-7 illustrates the different components in the design for this organization. The grid portal interacts with the Community Scheduler Framework to submit and manage jobs and to retrieve job information. The CSF interacts with MMJFS within the Globus Toolkit to query the clusters in the grid infrastructure for load information. Users can also build a workflow of chained jobs to be executed via GridPort, using the job sequencer.
[Figure B-7: the overall design — GPIR, the Index Service and its information providers, GRAM, the CSF queue, and the scheduler plug-ins connecting to each local resource manager.]
User experience
With the clusters placed under the management of a grid portal, it is imperative that users feel comfortable using it, because they are no longer allowed to access the servers directly to run their jobs. The scheduler runs each job on the user's behalf via the grid security proxy. With all the components mentioned above, the workflow that a user follows is illustrated in Figure B-8.
[Figure B-8: user workflow — access the portal, create a job, submit it, let the grid select an appropriate resource, and terminate once the job completes successfully.]
Products used
The grid infrastructure is based on multiple products, summarized in the following list.
Globus: Globus Toolkit 3.0 is used as the middleware through which the portal and other components retrieve resource information. GRAM is used to handle resource allocation across the different clusters within the infrastructure. The GSI component within Globus provides the single sign-on infrastructure.
Portal: The portal could be implemented with products such as IBM WebSphere or Tomcat. It interfaces with Globus to retrieve information about the grid environment and to manage and use its resources interactively.
GridPort: GridPort, a software package based on JBoss and designed to aid in the development of science portals and application interfaces on a computational grid, is used in the solution. Two of its components are used: the GridPort Information Repository (GPIR) and the job sequencer. GPIR caches grid- and portal-related data (for example, those captured by MDS) and makes them available through standard Web services. If the data are not found, or newer data are required, GPIR retrieves them on demand. The job sequencer within GridPort allows a sequence of jobs to be created so that users do not need to resubmit each job manually after the previous one completes.
Community Scheduler Framework (CSF): CSF is used as the meta-scheduler that interfaces with the different types of schedulers used by the different clusters. It also interfaces with GridPort and Globus to retrieve grid resource information.
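GPIR's behavior of serving cached data and refetching when an entry is missing or stale is a classic cache-with-fallback pattern. A small Python sketch of that pattern follows; the class and function names are hypothetical, not GPIR's real interface:

```python
# Sketch of GPIR-style caching (hypothetical interface): serve grid
# data from a cache, falling back to the live source (e.g. MDS) when
# the entry is missing or stale.

import time

class InfoCache:
    def __init__(self, fetch, max_age=60.0):
        self.fetch = fetch          # function that queries the live source
        self.max_age = max_age      # seconds before an entry is stale
        self.entries = {}           # key -> (timestamp, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry and time.time() - entry[0] < self.max_age:
            return entry[1]                  # fresh cache hit
        value = self.fetch(key)              # miss or stale: refetch
        self.entries[key] = (time.time(), value)
        return value

calls = []
def query_mds(key):
    """Stand-in for a live MDS query; records each call it receives."""
    calls.append(key)
    return f"load for {key}"

cache = InfoCache(query_mds)
cache.get("tokyo")      # first call hits the live source
cache.get("tokyo")      # second call is served from the cache
print(len(calls))       # prints 1: the live source was queried only once
```

The design choice is the usual trade-off: the portal gets fast answers from the cache at the cost of data being up to `max_age` seconds old.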
Conclusion
In this appendix, we presented a brief overview of a grid-based solution aimed at integrating scattered computing resources from the regional labs of a worldwide corporation.
Glossary
AFS. The Andrew File System is a distributed networked file system developed by Carnegie Mellon University as part of its Andrew Project. It is named for Andrew Carnegie and Andrew Mellon. Its primary use is in distributed computing.
BLAST. The Basic Local Alignment Search Tool is an algorithm for rapid homology searching of nucleotide and protein databases.
BRICK. Emerging business countries: Brazil, Russia, India, China, and Korea.
CA. A Certificate Authority is (1) an instance or external institute that issues certificates identifying the certificate holder as entitled to use certain services; (2) in e-commerce, an organization that issues certificates, authenticates the certificate owner's identity and the services that the owner is authorized to use, issues new certificates, renews existing certificates, and revokes certificates belonging to users who no longer exist.
CADD. Computer-Aided Drug Discovery.
CADe. Computer-Aided Detection systems are created to aid doctors in the task of disease diagnosis.
CLI. Command Line Interface.
DEISA. A consortium of leading national supercomputing centers in Europe.
DFS. Distributed File System.
DNS. The Domain Name System is a system that stores information about host names and domain names on networks, such as the Internet.
DPPT. The Distributed PowerPoint application provides a mechanism by which a presenter can control a Microsoft PowerPoint presentation on multiple sites from a single machine.
FC/IP. Fibre Channel over IP is an Internet Protocol-based storage networking technology developed by the Internet Engineering Task Force. FC/IP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network (SAN) facilities over IP networks.
GFS. The Global File System provides a system for storing files on a computer. It functions as a shared-storage journaled cluster file system.
GGF. The Global Grid Forum was founded in 2001, when the merger of regional grid organizations created a single worldwide one.
Globus. A collaborative project centered at Argonne National Laboratory that is focused on enabling the application of grid concepts to computing.
GPFS. The General Parallel File System is a mountable networked file system.
GridFTP. A high-performance, secure, robust data transfer mechanism.
GSI. The Grid Security Infrastructure contains components to secure your grid network.
H.323. H.323 is a recommendation from the ITU-T that defines the protocols to provide audio-visual communication sessions on any packet network.
OGSA. The Open Grid Services Architecture is a standard setting the base for communication in grids across virtual organizations. OGSA marries open standards and grid computing protocols with Web services, bringing together the ability to share computing resources with the ability to provide application interoperability over the Internet.
OGSA-DAI. Open Grid Services Architecture - Data Access and Integration, a project developed by the UK Database Task Force whose objective is to provide a standard interface for a distributed query processing system to access data in different databases.
P2G. Peer-to-Group, one of the topologies derived from the peer-to-peer network. The group is dynamically formed with logical or topographical maps.
PACI. Partnerships for an Advanced Computational Infrastructure is a program of the National Science Foundation's Directorate for Computer and Information Science and Engineering.
Particle accelerator. A particle accelerator is a piece of scientific equipment that uses electric fields to propel charged particles to great energies. Everyday applications are found in TV sets and X-ray generators.
QoS. Quality of Service, a term used in a Service Level Agreement that denotes a guaranteed level of performance (for example, response times of less than one second).
QSAR. Quantitative Structure-Activity Relationships are mathematical models that represent the relationship between a given property and the structural attributes of a chemical compound.
RPPT. The Remote PowerPoint application, or RPPT, provides a mechanism by which a presenter can control a Microsoft PowerPoint presentation on multiple sites from a single machine.
SAN. A Storage Area Network is a high-speed, special-purpose network (or subnetwork) that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users.
TeraGrid. TeraGrid is a multi-year effort to build and deploy the world's largest, most comprehensive, distributed infrastructure for open scientific research. The TeraGrid project was launched by the National Science Foundation in August 2001 with $53 million in funding to four sites: the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Urbana-Champaign; the San Diego Supercomputer Center (SDSC) at the University of California, San Diego; Argonne National Laboratory in Argonne, IL; and the Center for Advanced Computing Research (CACR) at the California Institute of Technology in Pasadena.
Virtual Organization. A virtual entity whose users and servers are geographically apart but share their resources collectively as a larger grid. The users of the grid can be organized dynamically into a number of virtual organizations, each with different policy requirements.
VPN. Virtual Private Network, a network that is constructed by using public wires to connect nodes, using encryption and other security mechanisms to ensure that only authorized users can access the network and that the data cannot be intercepted.
WAN. A Wide Area Network is a computer network covering a wide geographical area.
Web services. A way of providing computational capabilities using standard Internet protocols and architectural elements.
X.509. In cryptography, X.509 is a standard for public key infrastructure. X.509 specifies, among other things, standard formats for public key certificates and a certification path validation algorithm.
XML. Extensible Markup Language is a W3C recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet.
XPath. The XML Path Language is a terse (non-XML) syntax for addressing portions of an XML document.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see How to get IBM Redbooks on page 143. Note that some of the documents referenced here may be available in softcopy only.
A CICS-to-Linux Grid Implementation, REDP-3758-00
An Information Grid Proof of Concept using Avaki Data Grid Software, REDP-3853-00
Configure Grid Security in the IBM Grid Toolbox using the Globus Certificate Service, TIPS0409
Enabling Applications for Grid Computing with Globus, SG24-6936-00
Fundamentals of Grid Computing, REDP-3613-00
Globus Toolkit 3.0 Quick Start, REDP-3697-00
GPFS: A Parallel File System, SG24-5165-00
Grid Computing with the IBM Grid Toolbox, SG24-6332-00
Grid Services Programming and Application Enablement, SG24-6100-00
Introduction to Grid Computing with Globus, SG24-6895-01
Other publications
These publications are also relevant as further information sources:
Abbas, Ahmar. Grid Computing: A Practical Guide to Technology and Applications. Charles River Media, 2004. ISBN 1584502762
Cockburn, Alistair. Writing Effective Use Cases. Addison-Wesley, 2001. ISBN 0201702258
Berman, Fran; Fox, Geoffrey; Hey, Anthony J.G. (editors). Grid Computing: Making the Global Infrastructure a Reality. Wiley Series in Communications, Networking & Distributed Systems, 2003. ISBN 0470853190
Schmuck, Frank; Haskin, Roger. "GPFS: A Shared-Disk File System for Large Computing Clusters." In Proceedings of the Conference on File and Storage Technologies, 2002
Foster, Ian; Kesselman, Carl (editors), et al. The Grid 2: Blueprint for a New Computing Infrastructure. Elsevier Science, 2004. ISBN 1558609334
Brochard, Luigi. "IBM involvement in DEISA." IBM Deep Computing, 2004
Dongarra, Jack; Foster, Ian; Fox, Geoffrey; Kennedy, Ken; White, Andy; Torczon, Linda; Gropp, William (editors). The Sourcebook of Parallel Computing. Elsevier Science, 2003. ISBN 1558608710
Joseph, Joshy; Fellenstein, Craig. Grid Computing (On Demand Series). IBM Press. ISBN 0131456601
Seeley, Richard. "Linux on zSeries." Z JOURNAL, April/May 2003
Online resources
These Web sites and URLs are also relevant as further information sources: AccessGrid
http://www.accessgrid.org
Audience Penetration
http://www.mediainfocenter.org/compare/penetration/
ChinaGrid
http://www.chinagrid.edu.cn
DICOM
http://medical.nema.org
eDiamond Project
http://www.ediamond.ox.ac.uk/index.html
GridPort
http://gridport.net
INTERNET GROWTH
http://www.internetworldstats.com/emarketing.htm
OASIS
http://oasis-open.org
OpenH323 Project
http://www.openh323.org/
AccessGrid Description
http://www.csm.ornl.gov/~bernhold/tcf/ag-info.html
ActiveSpaces on the Grid: The Construction of Advanced Visualization and Interaction Environments
http://www-unix.mcs.anl.gov/fl/publications/activespaces-pdc.pdf
Everything you always wanted to know about the Grid and never dared to ask
http://www.grid2002.org/pgclasssummer03/PGGridPart2jul03.ppt
How To Install and Configure AG 2.1 on a Single Machine Node (PIG), for Windows
http://www.accessgrid.org/agdp/howto/ag2-0-install-pig/1.3/html/book1.html
Human Factors
http://charlie.dgrc.crc.ca/cgi-bin/Sylvie/Blog/casarch.pl?2004/00/23/9.txt
Introduction to Grid Computing and Overview of the EU DataGrid Project, 2004
http://www.twgrid.org/event/isgc2003/ISGC_pdf/The_Architecture_of_EDG.pdf
Needs Assessment Workshop for Grid Techniques in Introductory Physics Classroom Projects
http://www-ed.fnal.gov/uueo/documents/Grid_Report_04.pdf
QuarkNet Cosmic Ray Studies & the Grid: Probing Extensive Showers
http://www.opensciencegrid.org/events/meetings/NSF-OSG-091703/Bardeen-QuarknetGrid.pdf
Tutorial 1: How to Build and Install an Access Grid Node (AGN) An Elementary Guide for Technical Users
http://www.apan.net/home/training/ag/Tutorial1/T1.htm
Videoconferencing update
http://hepwww.rl.ac.uk/sysman/july2004/talks/hepsysman-2004-07-videoconf.ppt
Back cover
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.