Luis Ferreira, Fabiano Lucchese, Tomoari Yasuda, Chin Yau Lee, Carlos Alexandre Queiroz, Elton Minetto, Antonio Mungioli
ibm.com/redbooks
International Technical Support Organization

Grid Computing in Research and Education

April 2005
SG24-6649-00
Note: Before using this information and the product it supports, read the information in Notices on page xiii.
First Edition (April 2005) This edition applies to the capability of the IBM, ISVs, and open source products used to build a grid computing solution.
© Copyright International Business Machines Corporation 2005. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Figures  ix
Tables  xi
Notices  xiii
Trademarks  xiv
Preface  xv
The team that wrote this redbook  xvi
Become a published author  xxii
Comments welcome  xxiii

Part 1. Introduction  1

Chapter 1. Introduction to grid concepts  3
1.1 Beginning of the grid concept  4
1.1.1 Research and education on grid context  6
1.2 Applicability  7
1.2.1 Why use grids in research and education?  7
1.2.2 Leveraging research activities with grids  9
1.2.3 Leveraging educational activities with grids  10
1.3 What will the future bring?  11
1.3.1 What exists today  11
1.3.2 What is the potential for grids  12
1.3.3 What is likely to happen  13

Chapter 2. How to implement a grid  15
2.1 Introduction  16
2.1.1 The main difficulties  16
2.1.2 Approaches  17
2.2 Basic requirements  18
2.2.1 Hardware requirements  19
2.2.2 Software requirements  20
2.2.3 Human-resource requirements  21
2.3 Setting up grid environments  22
2.3.1 Defining the architecture  22
2.3.2 Hardware setup  23
2.3.3 Software setup  24
2.4 Setting up grid applications  25
2.4.1 Deploying an application  25
2.4.2 Making application data available  26
2.5 Maintaining grids  27
2.5.1 Grid platform administration tasks  27
2.5.2 Grid application administration tasks  28

Part 2. Grid by examples  31

Chapter 3. Introducing the examples  33
3.1 What you will find in these chapters  34

Chapter 4. Scientific simulation  37
4.1 Introduction  38
4.1.1 Business context  38
4.1.2 Business needs  39
4.2 Case analysis  39
4.2.1 Requirements  39
4.2.2 Use-cases  40
4.3 Case design  42
4.3.1 Component model diagram  42
4.3.2 Component model description  43
4.3.3 Architectural decisions and product selection  44
4.4 Implementation  44
4.5 Conclusion  46

Chapter 5. Medical images  47
5.1 Introduction  48
5.1.1 Business context  48
5.1.2 Business needs  48
5.2 Case analysis  49
5.2.1 Requirements  49
5.2.2 Use-cases  50
5.3 Case design  52
5.3.1 Component model diagram  52
5.3.2 Component model description  52
5.3.3 Architectural decisions and product selection  54
5.4 Implementation  55
5.5 Conclusion  56

Chapter 6. Computer-Aided Drug Discovery  57
6.1 Introduction  58
6.1.1 Business context  58
6.1.2 Business needs  58
6.2 Case analysis  59
6.2.1 Requirements  59
6.2.2 Use-cases  60
6.3 Case design  62
6.3.1 Component model diagram  62
6.3.2 Component model description  62
6.3.3 Architectural decisions and product selection  63
6.4 Implementation  64
6.5 Conclusion  64

Chapter 7. Big Science  67
7.1 Introduction  68
7.1.1 Business context  68
7.1.2 Business needs  69
7.2 Case analysis  69
7.2.1 Requirements  70
7.2.2 Use-cases  70
7.3 Case design  72
7.3.1 Component model diagram  72
7.3.2 Component model description  74
7.3.3 Architectural decisions and product selection  74
7.4 Implementation  76
7.5 Conclusion  76

Chapter 8. e-Learning  79
8.1 Introduction  80
8.1.1 Business context  80
8.1.2 Business needs  81
8.2 Case analysis  82
8.2.1 Requirements  82
8.2.2 Use-cases  83
8.3 Case design  86
8.3.1 Component model diagram  86
8.3.2 Component model description  88
8.3.3 Architectural decisions and product selection  89
8.4 Implementation  91
8.5 Conclusion  91

Chapter 9. Visualization  93
9.1 Introduction  94
9.1.1 Business context  94
9.1.2 Business needs  96
9.2 Case analysis  96
9.2.1 Requirements  96
9.2.2 Use-cases  98
9.3 Case design  99
9.3.1 Component model diagram  100
9.3.2 Component model description  101
9.3.3 Architectural decisions and product selection  101
9.4 Implementation  101
9.5 Conclusion  102

Chapter 10. Microprocessor design  103
10.1 Introduction  104
10.1.1 Business context  104
10.1.2 Business needs  104
10.2 Case analysis  105
10.2.1 Requirements  105
10.2.2 Use-cases  106
10.3 Case design  107
10.3.1 Component model diagram  107
10.3.2 Component model description  107
10.3.3 Architectural decisions and product selection  108
10.4 Implementation  108
10.5 Conclusion  109

Part 3. Appendixes  111

Appendix A. TeraGrid  113
Introduction  114
Organization  114
Beneficiaries  119
How to join  120

Appendix B. Research oriented grid  121
Introduction  122
Business requirements  122
High level design  124
Products used  131
Conclusion  131

Glossary  133

Related publications  137
IBM Redbooks  137
Other publications  137
Online resources  138
How to get IBM Redbooks  143
Help from IBM  143

Index  145
Figures

1-1  Heterogeneous and independent computing resources  5
2-1  How a grid should expand  18
4-1  Use-cases diagram  41
4-2  Component model diagram  43
5-1  Use-cases diagram  50
5-2  Component model diagram  52
6-1  Use-cases diagram  60
6-2  Component model diagram  62
7-1  Use-cases diagram  71
7-2  Software component architecture  73
7-3  Component model diagram  73
8-1  Use-cases diagram  84
8-2  e-learning framework schema  87
8-3  Software components architecture  87
9-1  A user's point of view  95
9-2  Use-case diagram  98
9-3  Diagram model  100
10-1  Use-case diagram  106
10-2  Component model diagram  107
A-1  TeraGrid overview  115
A-2  Layers diagram  117
A-3  Typical connection between sites and the TeraGrid backplane  119
B-1  Virtual environment  124
B-2  Virtualization organization  125
B-3  High level component diagram  126
B-4  Globus Toolkit and meta-scheduler  126
B-5  Submitting a job through Community Scheduler Framework  127
B-6  Job sequencer and gridport  128
B-7  Overall architecture diagram  129
B-8  Workflow of a researcher working on the grid  130
Tables

1-1  Types of grid that drive the grid solution for each area (shaded cells)  9
3-1  Examples of grid computing  34
4-1  A typical product selection  44
4-2  A typical product selection  45
5-1  Architectural decisions and product selection  54
6-1  Architectural decisions and product selection  63
7-1  Architectural decisions and product selection  74
8-1  Architectural decisions and product selection  89
9-1  Architectural decisions and product selection  101
10-1  A typical product selection  108
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: AFS, AIX, DB2, DFS, eServer, ibm.com, IBM, Lotus, OS/2, OS/390, POWER4, pSeries, Redbooks, Redbooks (logo), TCS, Tivoli, WebSphere, xSeries, and zSeries.
The following terms are trademarks of other companies: Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.
Preface
This IBM Redbook, Grid Computing in Research and Education, belongs to a series of documents related to grid computing that IBM is presenting to the community to enrich the IT industry and all its players: customers, industry leaders, emerging enterprises, universities, and producers of technology. The book is mainly oriented to IT architects and to those responsible for analyzing the capabilities needed to build a grid solution. The book is organized into the following parts.
e-Learning, on page 79: Presents a network grid implementation supporting an e-learning infrastructure that embraces many of the requirements for exchanging information in the educational and research fields.

Visualization, on page 93: Presents a grid implementation to support the field of advanced scientific visualization.
Fabiano Lucchese is the business director of Sparsi Computing in Grid (http://www.sparsi.com) and works as a grid computing consultant in a number of nation-wide projects. In 1994, Fabiano was admitted to the Computer Engineering undergraduate course of the State University of Campinas, Brazil, and in mid-1997, he moved to France to finish his undergraduate studies at the Central School of Lyon. Also in France, he pursued graduate-level studies in Industrial Automation. Back in Brazil, he joined Unisoma Mathematics for Productivity, where he worked as a software engineer on the development of image processing and optimization systems. From 2000 to 2002, he joined the Faculty of Electrical and Computer Engineering of the State University of Campinas as a graduate student and acquired a Master of Science degree in Computer Engineering for developing a task scheduling algorithm for balancing processing loads on heterogeneous grids. Fabiano has also taken part in the publishing of the IBM Redbook, Grid Services Programming and Application Enablement, SG24-6100-00. Tomoari Yasuda is an IBM Certified IT Specialist for Distributed Computing in IBM Japan. After getting a Master's degree in Mechanical Engineering at the graduate school of Keio University, he joined IBM and worked for digital media customers in Japan for 3 years as a consultant and a developer with the WebSphere family. He has a deep knowledge of the digital media industry. Since then, he has focused on offering new solutions to several cross-industry customers. In 2004, he was certified in IBM Grid Computing Technical Sales, and has been in charge of technical sales support for grid computing. Chin Yau Lee works as an Advisory Technical Specialist in grid computing for IBM ASEAN/South Asia. He holds an Honours degree in Computing and Information System from the University of Staffordshire. He has been using Linux since 1996 and had a few years of experience as a UNIX and Linux engineer before joining IBM. 
His areas of expertise include High Performance Linux and UNIX, UNIX Systems Administration, High Availability solutions, Internet-based solutions, and grid computing architectures, which he has been actively working on for the last 4 years. He is also an IBM Certified Advanced Technical Expert on AIX, a Sun Certified System/Network Administrator, and a Red Hat Certified Engineer. He is also a co-author of the IBM Redbook, Deploying Linux on IBM eServer pSeries clusters, SG24-7014-00. Carlos Alexandre Queiroz is an independent consultant working for Alex Microsystems. He has been working with grid computing, JINI, and J2EE technologies since 2000. Currently, he is pursuing a Master's degree at Universidade de São Paulo as a Distributed Systems and Network Specialist. He has published articles at several congresses, such as Middleware 2003, SBRC, and grid computing and parallel applications events. Carlos is an active developer of the Web site, http://gsd.ime.usp.br/integrade.
Elton Minetto is a professor at Universidade Comunitária Regional de Chapecó, Brazil, teaching programming, networking, and operating systems courses. He also works as a System Analyst and Network Administrator at the same institution, supporting Linux, Oracle, PHP, Java, and Python. Elton holds a Bachelor's degree in Computer Science from Universidade Comunitária Regional de Chapecó, and a Lato Sensu graduate degree in Computer Sciences from UNOESC/UFSC, Brazil. Elton is an active member of the open software community, collaborating on various projects. Antonio Saverio Rincon Mungioli is an electrical engineer and professor at Escola de Engenharia Mauá, São Paulo, Brazil. He also works as a System Analyst in the computing center at Universidade de São Paulo, and as a Technical Consultant to IBM Business Partners in Brazil. Antonio holds a Master of Science degree from Escola Politécnica of Universidade de São Paulo, Brazil.
Acknowledgements
Thanks to the following people for their contributions:
Joanne Luedtke, Lupe Brown, Cheryl Pecchia, Arzu Gucer, Chris Blatchley, Wade Wallace, Ella Buslovich, Yvonne Lyon, International Technical Support Organization, IBM
Tony White, Worldwide Grid Computing Technical Sales Business Unit Executive, IBM
Ronald Watkins, Worldwide Grid Computing Business Development Executive, Public Sector, IBM
Chris McMahon, Americas Sales Executive, Grid Computing, Higher Education, IBM
Dr. Martin F. Maldonado, Sr. Technical Architect, Grid Computing, Higher Education and Research, IBM
Joe Catani, Grid Computing in Higher Education, Public Sector, IBM
Lori Southworth, Market Manager, Education Industry, IBM
Al Hamid, Executive IT Architect and STSM, Grid/OSS Worldwide Leader, BCS, IBM
Chris Reech, Jeff Mausolf, IBM Global Services / e-Technology Center, Grid Computing Initiative, IBM
Nina Wilner, Grid Technology - IT Technical Architect, LifeSciences, IBM
Elizabeth B Davis, Education Client Representative, IBM
Wolfgang Roesner, Verification Tools, eCLipz Verification, IBM
John Reysa, Processor Simulation and Infrastructure, IBM
Ross Aiken, HPC Technical Solutions Architect, IBM
Nam Keung, Senior Technical Consultant, IBM
Lee B Wilson, Technical Sales Specialist, IBM
Takanori Seki, Distinguished Engineer, IBM Japan
Ryuhichi Nakata, ICP TS - Higher Education Industry, IBM Japan
Hideyuki Yokoyama, EBO Support Technical Competency, IBM Japan
Shu Shimizu, Tokyo Research Laboratory, IBM Japan
Naritoh Yamada, Michitaka Kamimura, LifeSciences, IBM Japan
Yoshihiko Itoh, GEO Sales Lead, Grid Business AP, IBM Japan
Fumiki Negishi, Grid Computing Business, IBM Japan
Stephen Chu, Grid Computing Executive, IBM China
Al Min Zhu, University Relations, IBM China
Jian Jiong Zhuang, IBM China
Jing Hui Li
IBM China
Li Yang Zhou
Grid Computing, IBM China
Linda Lin
IT Architect, IBM China
Jean-Yves Girard
Grid Computing Specialist, IBM France
Yann Guerin
EMEA Grid Computing TSM, IBM France
Sebastien Fibra
IT Specialist, IBM France
Jean-Pierre Prost
EMEA Design Center for on demand business, IBM France
Dr. Luigi Brochard
Distinguished Engineer, IBM Deep Computing, IBM France
Mariano Batista
IT Architect, IBM Argentina
Ruth Harada
Alliances Manager, IBM Brasil
Katia Pessanha
Universities Alliances Manager, IBM Brasil
Jose Carlos Duarte Goncalves
Executive IT Architect, IBM Brasil
Joao Marques dos Santos
Account Manager, Public Sector, IBM Brasil
Luiz Roberto Rocha
Grid Computing Technical Sales, IBM Brasil
Joao Almeida
IT Specialist, IBM Portugal
Srikrishnan Sundararajan
IBM India Software Labs
Clive Harris
Senior Architect, IBM UK
John Easton
Senior Consulting IT Specialist, IBM UK
Dr. Victor Alessandrini
IDRIS - CNRS - DEISA
Gisele S. Craveiro, Rogerio Iope, Liria Sata, Sérgio Kofuji
Universidade de São Paulo, Brasil
Edward Walker, Ph.D., Tina Romanella de Marquez, Chris Hempel
Texas Advanced Computing Center, The University of Texas at Austin
Trish L. Barker, Karen Green
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Alex Tropsha and his team
Director, Molecular Modeling Lab, University of North Carolina at Chapel Hill
Terry O'Brien, Dr. Anne Aldous, Scott Oloff
University of North Carolina Project - IBM
Madhu Gombar, LS Solutions Architect, Healthcare/Life Sciences Solutions Development, provided a case study on the pilot engagement conducted in the cheminformatics arena with the molecular modeling lab of UNC-Chapel Hill, NC. This was accompanied by a multimedia, interactive Flash demo developed by her to highlight the application of IBM middleware in drug discovery.
Chinese Ministry of Education (MOE), China
Peking University, China
Tsinghua University, China
Huazhong University of Science & Technology, China
Shanghai Jiao Tong University, China
Xi'an Jiao Tong University, China
Southeast University, China
Northeastern University, China
Sun Yat-Sen University, China
South China University of Technology, China
Shandong University, China
Beijing University of Aeronautics and Astronautics, China
National University of Defense Technology, China
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways: Use the online Contact us review redbook form found at:
ibm.com/redbooks
Mail your comments to: IBM Corporation, International Technical Support Organization Dept. JN9B Building 003 Internal Zip 2834 11400 Burnet Road Austin, Texas 78758-3493
Part 1
Introduction
This part of the book includes the following chapters:
Chapter 1, Introduction to grid concepts on page 3
Chapter 2, How to implement a grid on page 15
Chapter 1. Introduction to grid concepts
Types of grids
Ideally, a grid should provide full-scale integration of heterogeneous computing resources of any type: processing units, storage units, communication units, and so on. However, as the technology hasn't yet reached maturity, real-world grid implementations are more specialized and generally focus on integrating certain types of resources. As a result, nowadays we have different types of grids, which we describe as follows:

Computational grid
A computational grid is a grid that has processing power as the main computing resource shared among its nodes. This is the most common type of grid, and it has been used to perform high-performance computing to tackle processing-demanding tasks.

Data grid
Just as a computational grid has processing power as the main computing resource shared among its nodes, a data grid has data storage capacity as its main shared resource. Such a grid can be regarded as a massive data storage system built up from portions of a large number of storage devices.

Network grid
This type is known as either a network grid or a delivery grid. Its main purpose is to provide fault-tolerant and high-performance communication services. In this sense, each grid node works as a data router between two communication points, providing data-caching and other facilities to speed up the communications between those points. In this sense, the WWW can be regarded as an embryonic communication grid that does not (yet) satisfy the third requirement of the grid definition [see 1.1, Beginning of the grid concept on page 4].
Note: There is no clear boundary between these grid types. Every computational grid has data and network components; likewise for a data grid and a network grid. As such, there is really just one sort of grid, biased towards one or more of these aspects. Although grids are a new field of research and development, a number of bibliographic references comprehensively describe the concept of grid computing and its applicability. See Related publications on page 137 for a comprehensive list of such references.
Knowledge-oriented activities are performed in a variety of environments: schools, high schools, universities, research institutes, large corporations, and so on. On the other hand, grid implementations only make sense in environments where a meaningful number of computing resources can be integrated to form a higher-performance system, which tends to be rather restrictive. In this book, we consider the implementation of grid systems in environments equipped with a rather large number of computing resources, which can benefit greatly from grid technologies. Each type of grid may be more or less suitable for each type of institution. The following list presents some comments about what may be best in each case:

Universities
Here, all grid types may be used for leveraging research and educational activities; either a computational grid or a data grid would probably be focused on the area of research, while a network grid would better fit educational purposes.
Research institutes
Just as for universities, it is easy to see how research activities performed in institutes can benefit from a computational grid or a data grid. A network grid might be useful in some particular cases, as shown in Part 2, Grid by examples on page 31.

Schools
In the case of grade schools through high schools, these institutions would probably invest in a network grid for leveraging their educational activities.
In the next section we discuss some issues related to the applicability of grid computing in research and education.
1.2 Applicability
This section presents a brief discussion on which types of research and educational activities could benefit from grid computing technologies.
Briefly stated, a computational grid provides high-performance computing; a data grid provides large storage capacity; and a network grid provides high-throughput communication that may be useful for a variety of applications, such as virtual conferences. With this in mind, we can list the main reasons for using grid computing as follows:
Improve efficiency/reduce costs
Exploit under-utilized resources
Enable collaborations
Virtualize resources and virtual organizations (VOs)
Increase capacity and productivity
Parallel processing capacity
Support heterogeneous systems
Provide reliability/availability
Access to additional resources
Resource balancing
Reduce time to results
When these reasons are regarded in the light of scientific research, it is easy to understand why scientists are so keen on grids: they believe that the use of grids will transform the practice of their science. As stated in Needs Assessment Workshop for Grid Techniques in Introductory Physics Classroom Projects, by Bardeen et al, grids are a tool for:
1. Sharing the costs and burdens of immense computing needs
2. Supporting the participation of scientists worldwide in large collaborations in particle physics, astronomy, cosmology, fusion and nuclear physics, medicine, and life science
Grids have the potential to change how people work together to make scientific discoveries. On the side of education, it is important to note that grids can play a major role because, according to QuarkNet Cosmic Ray Studies and the Grid: Probing Extensive Showers, by Bardeen et al, grids represent:
An opportunity for a new style of collaborative learning
An aid to online posters and discussions with students at other schools
An easy way to present and review results
An easy way to conduct peer-to-peer discussions
A rapporteur of presentations and discussions
A single portal to distributed resources
Distance education and higher education are the fields most directly touched by grid applications in education.
Performing meteorological forecasts, calculating the aerodynamic behavior of an airplane, assembling the genome of an organism, analyzing the elementary particles in an accelerator, virtualizing computing resources, and data-mining several terabytes of data: these actions all require extensive calculations and the handling of enormous amounts of data. This is the perfect scenario for grid computing technologies. In Part 2, Grid by examples on page 31, a number of examples are analyzed and, for each one, a graph is used to represent the proportion of computing, data, and communication features that a specific implementation has.
Thus, as stated in the article ITR: Distance Collaboration - Education and Training on the Access Grid, by Morton et al, grid application in education represents an ambitious venture in a direction that substantially increases the ability of groups to cooperate and achieve a sense of collaborative community even though they are distributed across the planet.... Using one of the ... collaborative tools in the realm of information technology, ... the project investigators will launch an initiative to advance the state of the art (in a social and technical sense) in geographically distributed project-oriented collaborations. An interesting scenario that can be drawn from these ideas is, as presented in the Needs Assessment Workshop for Grid Techniques in Introductory Physics Classroom Projects, by Bardeen et al, one in which educators become interested in and excited about the potential that grid tools and techniques bring to data-based classroom projects and, as a result, use the grid as a hosting environment in which inquiry-based projects are standards-based, visually appealing, use common tools and data formats, allow for levels and scale of use, and provide support materials for educators and students. In such a scenario, some teachers will come to the projects as experienced users with a great deal of knowledge about the research and experience with inquiry-based learning using online resources. Others will be emerging or beginning users. Most classroom users will analyze data from the Web tools. Some will be interested in learning what's under the hood, exploring grid portals, and a few will become developers of grid skins or transforms.
Another interesting example, described in GRASP - Grid Accessed Data and Computational Scientific Portal, by Sharly, shows how to develop a scientific portal for a learning community that can access computational servers, streaming servers, digital libraries, and course materials from different servers spread across the globe, or use mathematical packages, Computer-Aided Design (CAD), simulation packages, and so on.
Here are some other interesting issues regarding the future of grid computing:
The grid expansion may embrace multiple media types; thus, radio, television, and phone networks will also be available as grid services.
Personal and home-based offices will become a reality; this may change the way that small and large corporations are conceived.
These are some of the possibilities that might arise from the grid world, and there is no doubt that they will change the way we deal with information in our personal and professional activities.
Chapter 2. How to implement a grid
2.1 Introduction
Knowing what a grid is and what it can do for you is essential when you plan to use this technology to tackle your most demanding computational problems. However, when going through the process of implementing a grid computing environment, many other issues arise that may require special attention. This chapter offers a brief discussion of how to implement a grid computing environment and, as such, covers the following topics:
Basic requirements for setting up a grid computing environment
How to set up an initial grid
How to maintain and expand the grid
The following topics are not covered:
Which software or hardware, in particular, should be used for implementing grid environments
Which companies can best provide grid implementation services
More information about the various grid topics can be found in the bibliographic references presented in Related publications on page 137.
In the future, we expect that, as the technology evolves and the grid concept becomes commonplace, implementing a grid will be as simple as installing a certain software application on a number of computers. Until then, grid implementors should be aware of the traps into which they may fall.
2.1.2 Approaches
There are two basic engineering approaches that we have chosen to adopt when implementing a grid environment: bottom-up implementation and incremental growing.
Bottom-up implementation
To understand what is meant by bottom-up, a system such as a grid should be regarded as having multiple levels of abstraction. Here, the lowest level of abstraction is the one that takes into consideration the details of the hardware that builds up the grid. As the level of abstraction increases, we move our focus to the software layer and, finally, to the human-factor layer. With this in mind, performing a bottom-up implementation means making sure that everything in a certain layer of abstraction is working properly before moving to an upper layer. This may sound obvious, but it is not: there are very specific conditions under which a layer has to work, and we will try to depict these conditions here.
Incremental growing
The bottom-up implementation philosophy refers to the way that a group of nodes should be set up to become part of the grid, but it does not address the way that sets of nodes should be integrated into the grid. In these circumstances, the order in which nodes are set up does matter and, for this, we recommend adopting an incremental-growing philosophy.
The combination of these two ideas can be represented by the diagram in Figure 2-1.
In this figure, each tank represents a group of grid nodes, and the level of water inside a tank represents the level of abstraction at which we are working to set those nodes up for the grid. This figure suggests that:
A group of nodes should be integrated into the grid only after previous nodes have been fully integrated.
The order in which nodes are integrated into the grid depends on the underlying physical and logical structures that connect them.
In terms of this figure, implementing a grid is the same as filling interconnected tanks with water. In the following sections, we show how this interconnection should be accomplished and into which tank the water should first be poured.
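The tank analogy can be sketched as a breadth-first traversal of the network of node groups, starting from a seed group and integrating each neighboring group only after the groups before it. The group names and link map below are purely illustrative, not part of any particular grid.

```python
from collections import deque

def expansion_order(links, seed):
    """Return the order in which node groups should be integrated,
    expanding outward from the seed group (breadth-first)."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        group = queue.popleft()
        order.append(group)
        for neighbour in links.get(group, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Illustrative topology: the data centre is the seed "tank";
# integration effort flows outward to directly connected groups.
links = {
    "datacentre": ["physics-lab", "library"],
    "physics-lab": ["dorm-network"],
    "library": [],
    "dorm-network": [],
}
print(expansion_order(links, "datacentre"))
# ['datacentre', 'physics-lab', 'library', 'dorm-network']
```

Groups closest (in network terms) to the seed are filled first, which mirrors pouring water into the central tank and letting it flow to the connected ones.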
The performance of a data grid depends heavily on its communication links, but it is very difficult to express grid performance as a function of link quality. A worst-case estimate can be found by calculating the time it takes to exchange a data record between the two nodes whose communication performs worst.
For a network grid: the hardware requirements for these grids are even more difficult to determine due to the on-demand nature of their functionality. As a rule of thumb, the average data throughput provided by such a grid between two points can be estimated as the average data throughput of the best communication path between those nodes.
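The worst-case estimate just described can be computed directly: take the slowest pairwise link and divide the record size by its sustained throughput. The node names, record size, and throughput figures below are illustrative assumptions only.

```python
def worst_case_exchange_time(record_bytes, links):
    """links maps a (node_a, node_b) pair to its sustained throughput
    in bytes per second; return the time needed to move one data
    record over the slowest link (the worst-case estimate)."""
    slowest = min(links.values())
    return record_bytes / slowest

# Hypothetical three-node data grid.
links = {
    ("n1", "n2"): 100e6,   # ~100 MB/s on a fast LAN segment
    ("n1", "n3"): 1e6,     # ~1 MB/s over a congested WAN hop
    ("n2", "n3"): 10e6,
}
t = worst_case_exchange_time(50e6, links)  # a 50 MB data record
print(f"{t:.0f} s")  # 50 s, dominated by the 1 MB/s link
```

The estimate is pessimistic by construction: it assumes every record must cross the single worst link, which is exactly the bound the text proposes.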
Remotely and automatically upgrading the grid platform and the code for its applications: it is impossible to rely on manual software upgrades when talking about dozens of computers (not to say hundreds or thousands).
Remotely monitoring the computing resources: the grid platform must provide real-time information about the state of its computing resources, such as whether they are working properly or have failed, how efficiently they are executing application tasks, and so on.
Storing logging information about all the activities performed on the platform: historical information about grid performance is essential when tuning applications. For this, the grid platform should provide a way for developers to analyze this information.
Controlling access to the platform: for obvious reasons, there must be a way to control access to the platform.
Securing the data exchanged within the platform: application developers will not run their applications on the grid unless they are assured that sensitive data can be secured.
Once these requirements are satisfied, we can move to the next level of abstraction: the human-resource requirements.
In addition to these two analyst roles, it is recommended that there be one analyst able to help application developers develop and test their applications. Finally, here is an important remark concerning the human factor: the grid software execution should be as transparent as possible when performed on ordinary desktop computers, because users tend to interrupt every running program that they do not recognize as useful or that they believe to be a source of overhead. Two generally good options are screen savers and system services.
We assume here that, in a grid implementation, most of the integrated computing resources will not be dedicated to the grid. This means that they are already set in place and are part of a physical architecture that was previously defined for other purposes. In general, a grid implementation does not address complex physical architecture issues and, more importantly, it does not depend on such issues being resolved. Defining the logical architecture of a grid implementation, for example, separating computing resources into grid groups, is something we expect to be transparent in the future. Future-generation grid platforms will hopefully be able to automatically map a given physical architecture into the best possible logical architecture by performing network tests and benchmarks. As these platforms are still to come, there are some simple rules that might be useful when implementing a grid:
Computing resources that are interconnected by a high-speed network and that are physically close to each other are the best candidates for building up a logical grid group; such a group works very much like a cluster of computers, exchanging data among its members at high rates and with other groups at low rates.
If a logical group has been set up for computers that are directly connected to each other by a high-speed network, they will need a local computer to bridge all the inter-group data exchange. The natural choice for this particular computer is the one that has the role of network gateway, for efficiency and security reasons.
Setting logical links between logical groups of computing resources is referred to as defining the high-level architecture of the grid. As groups are not expected to exchange huge amounts of data, the performance of the communication links should not be a concern as long as there are no converging points of communication and/or coordination; a master-slave high-level architecture has this drawback.
Inter-group links have to be stable. If stability cannot be assured, dynamic high-level architectures should be considered. Dynamic architectures depend much more on the grid platform and, while they can offer flexibility and robustness, they are harder to maintain and have not yet reached maturity in terms of standardization.
With these rules in mind, one should still remember the diagram in Figure 2-1, How a grid should expand on page 18, when planning the architecture of the grid. This means:
Defining the point from which the grid will expand is crucial; this is where the administrative infrastructure of the grid will be located and where the fault-tolerant parts of the system will be installed. In general, this place has to have fast and stable down and up links.
Defining the directions in which the grid will expand is crucial as well; the grid's growth should never compromise its performance since, in theory, it is infinitely scalable.
Even if the platform you acquire claims to be hardware independent, there may be certain hardware configurations on which the platform performs better; this probably relates to the way tasks are distributed across servers and how those servers are interconnected.
If you are not setting up a general-purpose grid, pay special attention when choosing the hardware; the performance of some applications may vary dramatically depending on the type of machine they run on; in particular, applications that perform intensive memory access or math processing belong to this category.
Well-behaved grid applications have a high processing-to-communication ratio. This means that communication hardware should not be as much of an issue as processing hardware; briefly, you should prefer faster processors over faster networks.
Make sure that your hardware meets your performance expectations before moving to the software set-up. Perform memory access, math processing, and communications benchmarks, and generate reports about the results.
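The benchmarking step can start as simply as timing a math-heavy loop and a memory-heavy copy on each candidate machine and recording the results in a report. The workloads and sizes below are illustrative stand-ins; a real survey would use an established benchmark suite.

```python
import time

def bench(fn, *args):
    """Time a single run of fn and return the elapsed seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def math_work(n):
    # Floating-point-heavy loop: a crude math-processing benchmark.
    x = 1.0001
    for _ in range(n):
        x = x * 1.0000001 + 0.5
    return x

def memory_work(n):
    # Allocate and copy a large buffer: a crude memory-access benchmark.
    buf = bytearray(n)
    return bytes(buf)

# Per-machine report; run this on each candidate node and compare.
report = {
    "math_s": bench(math_work, 200_000),
    "memory_s": bench(memory_work, 10_000_000),
}
print(report)
```

Comparing such reports across machines exposes the kind of variation the text warns about: a node that looks fine for math-bound work may still be a poor fit for memory-bound applications.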
Check if the plug-ins that might be required to integrate the grid platform with the applications are available. Perform integration tests if this is possible.
Data deployment
Unfortunately, data deployments are more time-consuming, more frequent, and, while they are being performed, the application stands idle waiting for the data to arrive. For this reason, a few points are worth mentioning when discussing application deployments:
Some grid platforms make it possible for multiple applications to be executed simultaneously; in this case, application deployments do not cause much impact, as the grid does not have to be idle while they are performed.
A few applications are capable of dealing with streaming data, and some grid platforms support this sort of application (the application starts processing the data as soon as it arrives at the nodes). If a single-application grid is to be set up and its application works this way, adopting a streaming-enabled grid platform is something to consider.
Deployments should ultimately be performed by the system administrator, but the platform might provide facilities for application developers to submit their application code and data so that every deployment is correctly logged and assigned to its developer.
Application deployments should not be a serious concern in terms of performance because, for a well-behaved grid application, the processing time has to be much greater than the communication time. However, special attention should be paid so that such deployments do not cause overall system performance to deteriorate in case of malicious or accidental user behavior.
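The streaming idea mentioned above can be sketched with a generator: the node consumes each data chunk as it arrives instead of sitting idle until the whole deployment completes. The chunk contents and the byte-counting "work" are purely illustrative.

```python
def incoming_chunks():
    # Stand-in for data arriving over the network during a deployment.
    yield b"atom coordinates batch 1\n"
    yield b"atom coordinates batch 2\n"
    yield b"atom coordinates batch 3\n"

def process_stream(chunks):
    """Consume data as it arrives; no idle wait for the full data set."""
    processed = 0
    for chunk in chunks:
        processed += len(chunk)   # a real application would parse/compute here
    return processed

print(process_stream(incoming_chunks()))  # total bytes processed
```

With a non-streaming deployment, the loop body could not start until every chunk had landed on the node; here, work overlaps with data arrival.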
Web publishing
Probably the simplest way to make data available to grid applications is to publish it on ordinary Web sites or FTP servers. There is a whole generation of systems and tools to help developers accomplish this task efficiently, but this approach has some major drawbacks.
While publishing the data itself is easy, getting it to the processing nodes may not be; the grid application programmer will have to deal with network programming to build the application, which is not desirable. Additionally, depending on how the application is designed, it can suffer from scalability problems, as every node may try to access the data at once. This happens because the responsibility for distributing the data across the grid lies with the application designer, not with the grid platform. To sum up, this might be a good option when fast, short-term applications are to be developed, but one should not rely on this type of publishing for long-term and complex applications.
Part 2
Grid by examples
This part of the book includes the following chapters:
Chapter 3, Introducing the examples on page 33
Chapter 4, Scientific simulation on page 37
Chapter 5, Medical images on page 47
Chapter 6, Computer-Aided Drug Discovery on page 57
Chapter 7, Big Science on page 67
Chapter 8, e-Learning on page 79
Chapter 9, Visualization on page 93
Chapter 10, Microprocessor design on page 103
Chapter 3. Introducing the examples
Type of grid implementation
e-Learning: In this example, we present a grid environment to support many of the educational and research requirements for exchanging information. Knowing the main ways that education can benefit from grid technology, we can deduce the basic technological needs associated with the development of e-learning. The e-learning infrastructure presented in this chapter is based on the Access Grid.
Visualization: In this example, we present a grid implementation to support the field of advanced scientific visualization. The area of visualization is evolving as it addresses emerging and continuing issues, such as interactive and batch rendering of terascale data sets through remote visualization. At the same time, universities in general have a lot of heterogeneity, using many low-cost resources from different suppliers. This includes running different systems alongside advanced computing resources, such as supercomputers, advanced visualization systems, and so on. Most of these resources are segregated in specific departments for local access only.
Microprocessor design: In this example, we present a computational grid solution that helps to reduce the microprocessor development cycle and also allows design centers to share their resources more efficiently. Microprocessor design and verification simulation require massive computational power.
The order in which these projects are presented was chosen so that chapters describing grid implementations with similar features are grouped together. The chapters describing the projects contain the following information:
Business context: This describes the current situation of the customer or organization.
Business needs: This describes the motivations to do something, the compelling reasons to act, how the context is changing, and what the customer has to do to improve its business context, move towards a new business context, or adapt to something that is changing.
Case analysis
Functional requirements: This describes what the system is supposed to do and what the users want.
Non-functional requirements: This describes the attributes of a system/architecture/solution: qualities, all those things that are not specifically requested by a user as a function to be performed by the system, but which the technology has to provide anyway.
Use-cases: This describes the users, roles, and use-cases.
Case design: This describes the component model and architectural decisions, such as product selection.
Implementation: This presents the current implementation status.
Chapter 4.
Scientific simulation
In this chapter we discuss the following topic: A computational grid implementation to provide the execution of complex system simulations in the areas of physics, chemistry, and biology
4.1 Introduction
In this example, we present a grid implementation to provide for the execution of complex system simulations in the areas of physics, chemistry, and biology. The implementation tackles the problem of intensive calculations, which demand high-performance computing and typically require large computational infrastructures such as clusters.
Note: A similar solution has already been set in place in a number of research institutions in Japan, such as the National Institute of Advanced Industrial Science and Technology (AIST):
http://www.aist.go.jp/index_en.html
4.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To satisfy the requirements described above, these are the recommendations:
Provide high-performance computing power sufficient for accomplishing compute-intensive tasks (on the order of teraflops).
Do this at a rather low cost (a high cost would be at least some hundreds of thousands of dollars, which is the cost of low-profile high-performance computers).
Do this in a way that upgrading the computing infrastructure does not pose major difficulties.
Non-functional
The main non-functional requirements for this solution are as follows:
Ease of use: High-performance computing systems are likely to become part of everyday research activities. For this reason, the use of such systems should be accomplished through intuitive and commonplace interfaces, hiding from users the inner details of the process of executing an application.
Management: The system must not require large resources, either technical or human, to be maintained, as this can seriously compromise the cost of the solution.
4.2.2 Use-cases
Considering all the requirements presented in the previous section, which were basically drawn from the need for high-performance computing, we can draw up the following set of use-cases for such an infrastructure.
Use-cases diagram
The use-cases diagram is presented in Figure 4-1.
[Figure 4-1 shows the use-cases diagram: the Researcher and Administrator actors, and use-cases including user authentication, submitting jobs to the system, and fetching and analyzing results.]
Users description
These are the various roles involved:
Researcher: This is the user who performs the basic tasks with the processing jobs. This role is normally filled by a university professor or a research institution's technical staff.
Administrator: This is the user responsible for the management tasks on the system. This role is typically filled by a network or system analyst or a database administrator.
Use-cases description
Here we describe the various use-cases:
User authentication: This use-case represents the procedure that all users must go through before using the system resources. It is typically accomplished by typing in a username and password combination, and may be done transparently when the user logs in to a workstation.
Submit jobs to the system: This use-case actually contains several sub-use-cases that describe the installation and management of grid applications, and will not be described here at length. In simple terms, the life-cycle of a grid application is: development, local test, installation on the grid, execution, and result analysis.
Fetch and analyze results: When an application finishes executing, its results have to be fetched and analyzed by the researchers. The analysis itself is generally performed using specific visualization tools, but before that, the system has to be able to make such results available. This is normally accomplished by making data files, generated during the application execution, available through a networked file system or through some specialized interface, such as the Web. In this example, we consider Web-based interfaces as the default option.
Manage resources utilization: In this use-case, the administrator performs monitoring and performance-tuning tasks to make sure that the grid facility is working in an optimized way. Typical activities include checking activity logs, analyzing usage history, checking average system load, and setting caching parameters, among others.
Manage security and authentication: This use-case includes all the security-related issues of administrating the grid, such as managing user accounts, defining security policies, configuring network software and hardware components, and so on.
(Figure: architecture overview. Web browsers access a portal site; the OGSA Toolkit, together with a scheduler, integrates the existing clusters into a single set of computing resources.)
Web browser: This provides easy access to the grid. Job submission and result retrieval can be performed through this component.
OGSA Toolkit: This can integrate existing clusters, ensuring the delivery of nontrivial qualities of service. Its basic role is to receive jobs upon user requests and submit them to the scheduler.
Clusters: These are conventional clusters made up of dozens of networked computers. In general, each cluster works on an exclusive high-speed local area network.
Integrated computing resources: This structure is the virtual computer made up of the multiple clusters that are part of the grid.
Portal site: Grid System Gateway, a portal solution developed by IBM for implementing a computational grid environment. For more information about grid computing solutions, refer to:
http://www.ibm.com/grid/jp/solutions/portal.shtml
OGSA Toolkit: Globus Toolkit.
Scheduler: Platform LSF, an intelligent, policy-driven batch application workload processing middleware from Platform Computing. For more detail, see:
http://www.platform.com/
4.4 Implementation
The solution presented in this chapter is a reasonably standard computational grid implementation. This section presents the implementation level already achieved, as well as some additional information about the technologies adopted.
Implementation status
One of the grid implementations taken into account has adopted the products listed in Table 4-2.
Table 4-2 A typical product selection
Component: Computing nodes. Chosen product: IBM eServer xSeries.
Component: Operating system. Chosen product: Red Hat Linux.
Component: Application. Chosen product: BLAST.
In this example, a set of xSeries-based clusters was integrated into the grid. These servers run Red Hat Linux due to its reliability, cost-effectiveness, technical-support availability, and full compatibility with the Globus Toolkit. The killer application in place is BLAST.
BLAST (Basic Local Alignment Search Tool) provides a method for rapid homology searching, such as searches of nucleotide and protein databases.
The current implementation comprises:
A 4-node computing cluster made of gridMathematica servers
An 8-node computing cluster made of BLAST servers managed by Platform LSF
A 4-node management cluster made of Globus Toolkit servers
A 1000BaseT network
The grid administrators expect to expand this computational grid to other campuses so that several computing clusters can be integrated into this system. The total computing power expected for the end of 2004 is about 10 teraflops.
4.5 Conclusion
Nowadays, performing research on cutting-edge technologies quite often implies making use of high-performance computing infrastructures. Such infrastructures have traditionally been implemented with expensive and inflexible computing systems that only a small fraction of research institutions could afford. In this chapter, we presented a grid-based implementation of a high-performance computing infrastructure suitable for most research demands. This infrastructure should be able to deliver high-performance computing at a much lower cost, which is what makes grid computing so appealing to research institutions. We recognize that a number of compute-intensive applications might not benefit from this technology, but we are sure that the grid philosophy will greatly affect the way research is performed by such institutions.
Chapter 5.
Medical images
In this chapter we discuss the following topic: A joint use of a data grid and a computational grid in a medical-image storage and processing framework
5.1 Introduction
In this example we present a data and computational grid in a medical-image storage and processing framework. The example tackles the problem of storing and processing large images, which typically requires large computational infrastructures such as distributed databases and clusters. The solution allows the use of idle storage and processing capacities in machines of the grid to store and process large amounts of medical digital images. Note: A similar solution has already been set in place in the eDiaMoND project. This is a collaborative project funded by grants from the Engineering and Physical Sciences Research Council (EPSRC), which is the UK Government's leading funding agency for research and training in engineering and the physical sciences, Department of Trade and Industry (DTI), and IBM. It is strictly a research project which has the ambitious aim of proving the benefits of grid technology to eHealth, in this case for Breast Imaging in the UK. More information about the eDiaMoND project can be found at:
http://www.ediamond.ox.ac.uk/
To improve breast cancer screening and epidemiology applications, it is necessary to develop a system able to provide large-scale digital-image storage and analysis services; make it possible for medical sites to store, process, and data-mine medical images; manage mammograms as digital images; and make such images available to other sites, such as clinics, hospitals, universities, and research institutes. One obstacle for a solution supporting digital imaging is the space necessary to store these images. A digitized A4-size mammogram at the minimum resolution needed for an effective analysis occupies around 32 MB of storage space. Usually, four images are taken, using around 128 MB of space per exam. These requirements make grid computing a good candidate.
5.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To satisfy the requirements described above, these are the recommendations:
Provide capacity to store thousands of medical images, each one having a size of approximately 32 MB, with around 128 MB per patient.
Provide capacity to store the non-image data about the patients, such as personal identification, test dates, responsible doctors, and treatments, for thousands of patients.
Provide computing power to process the medical images, searching for patterns that can indicate a cancer.
Provide access to patients' images and information for hospitals, clinics, universities, and so on.
Non-functional
The main non-functional requirements for this solution are as follows:
Logging: Provide ways to log all access to patients' images, data, and all related diagnostics.
Scalability: The system must be able to store up to 8 million medical images per year, and it must be able to accommodate the entrance of new universities and hospitals.
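The storage figures behind these requirements can be verified with a quick calculation. The sketch below uses only numbers stated in this chapter (32 MB per image, four images per exam, 8 million images per year); the variable names are illustrative:

```python
# Back-of-the-envelope storage sizing for the mammography grid.
MB_PER_IMAGE = 32            # digitized A4 mammogram at diagnostic resolution
IMAGES_PER_EXAM = 4          # four views taken per exam
IMAGES_PER_YEAR = 8_000_000  # the stated scalability target

mb_per_exam = MB_PER_IMAGE * IMAGES_PER_EXAM              # 128 MB per exam
tb_per_year = IMAGES_PER_YEAR * MB_PER_IMAGE / 1_000_000  # decimal terabytes

print(f"{mb_per_exam} MB per exam, about {tb_per_year:.0f} TB per year")
```

At the 8-million-image target, the image store alone must absorb roughly 256 TB per year, which is consistent with the implementation status reported later in this chapter.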
5.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 5-1 shows the use-cases diagram.
Users description
These are the various roles involved:
Technician: This person is responsible for the image acquisition. This role is normally personified by a specialist in the operation of computers in general, x-ray machines, and other medical devices.
Radiologist: This person is responsible for making a diagnosis based on the patient's images and data. This role is personified by a specialist in radiology and oncology.
Administrator: This person is responsible for performing high-level management tasks in the screening process. This role is normally personified by an analyst with knowledge of the workload and capacity of the participant hospitals.
Researcher: This person uses the large collection of images for research purposes, such as new technologies and methodologies for diagnostics, new treatments, drugs to treat diseases, and systems that perform image processing and recognition.
Use-cases description
Here we describe the various use-cases:
User authentication: This is the process of ascertaining the identity of the originator of a request to the system. All users must be authenticated before using the system.
Grid nodes: The grid nodes are the resource providers of the grid infrastructure, defining the dimension of the grid. Each participant site of the grid, such as a hospital or a university, can add servers, increasing the capacity to store, manipulate, and process the patients' images and data. Beyond a toolkit to build the grid infrastructure, each server has the following components:
Image storage: In this component the DICOM image files are effectively stored. A content-manager software component is used to manage these files.
Federated database: A relational database server is used to store the patients' data and the image metadata that describes those files. This information is federated across all databases installed in the hospitals and universities that form the grid.
Image session workstation: Using this component, radiologists and researchers retrieve the images and related data to perform diagnoses and research.
Management portal: This component, installed in a central location, is used by the administrator to perform management tasks, such as managing the system workload and capacity.
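The federated-database idea, the same query running against every participating site's database with the results merged, can be sketched as follows. This is only an illustration: in-memory SQLite databases stand in for the sites' relational servers, and the table and column names are hypothetical, not taken from the actual system:

```python
import sqlite3

def site_db(site, rows):
    """Create an in-memory database standing in for one site's metadata store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE image_meta (patient_id TEXT, image_file TEXT, site TEXT)")
    db.executemany("INSERT INTO image_meta VALUES (?, ?, ?)",
                   [(p, f, site) for p, f in rows])
    return db

hospital = site_db("hospital_a", [("P001", "p001_l.dcm"), ("P002", "p002_r.dcm")])
university = site_db("univ_b", [("P001", "p001_followup.dcm")])

def federated_query(patient_id, sites):
    """Run the same query against every site's database and merge the results."""
    results = []
    for db in sites:
        cur = db.execute(
            "SELECT image_file, site FROM image_meta WHERE patient_id = ?",
            (patient_id,))
        results.extend(cur.fetchall())
    return results

print(federated_query("P001", [hospital, university]))
```

In the real solution this merging is done by the federation layer of the database product and exposed through OGSA-DAI rather than by application code, but the effect is the same: one logical query sees the metadata held at every site.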
For application development, the implementation is oriented toward a Service-Oriented Architecture (SOA) and open standards such as the Open Grid Services Architecture (OGSA), including Open Grid Services Architecture - Data Access and Integration (OGSA-DAI). OGSA-DAI is a project developed by the UK Database Task Force whose objective is to provide a standard interface through which a distributed query processing system can access data in different databases. As shown in Figure 5-2 on page 52, both the image and non-image data are accessed by the components of the grid through the OGSA-DAI standard implementation. For more information about the grid standards, refer to the following Web sites:
http://www.ogsadai.org.uk http://www.ggf.org http://www.oasis-open.org
Another important issue is security and privacy. When the images are accessed across the institutions that use the grid, between a hospital and a university, for example, the data must be protected using security techniques such as cryptography and the other security strategies defined in the OGSA standard.
(Table fragment: the product-selection table also lists components for the relational database, the management portal, the servers, and the operating systems.)
Chosen product: IBM VisualAge is chosen due to its productivity and because it is a prerequisite for Content Manager.
Another important reason for choosing IBM Content Manager is the possibility of integration with other components using OGSA. OGSA-DAI already supports relational and XML data sources and provides a flexible framework into which other data sources can be plugged. Since the query language for Content Manager is XPath, it is possible to expose Content Manager as another XML data source.
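Because the query language is XPath, any standard XPath-capable client can work against metadata exposed as XML. The sketch below illustrates the idea with Python's standard library; the element and attribute names are a hypothetical metadata layout, not Content Manager's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of image metadata, as an XML data source
# plugged into OGSA-DAI might expose it; names are illustrative only.
doc = ET.fromstring("""
<images>
  <image patient="P001" modality="MG"><file>p001_l.dcm</file></image>
  <image patient="P002" modality="MG"><file>p002_r.dcm</file></image>
</images>
""")

# ElementTree supports a limited XPath subset; select one patient's images.
files = [img.findtext("file")
         for img in doc.findall(".//image[@patient='P001']")]
print(files)
```

A client of the grid would issue the same kind of XPath expression through the OGSA-DAI interface instead of parsing the document locally.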
5.4 Implementation
This section presents the implementation level already achieved and the next steps to be followed for a complete deployment.
Implementation status
Here is a summary of the current implementation level:
There are 4 screening centers.
There are 5 universities.
Approximately 30-35 staff are involved.
The system stores around 256 TB per year.
The system stores and processes only mammograms.
Approximately 1.5 million women are screened each year in the U.K. Women screened for breast cancer currently have one view taken per breast, meaning 3 million mammograms per year.
Next steps
The goals for a complete deployment are as follows:
Plan to have 92 screening centers and 230 radiologists.
Expand to create a worldwide digital mammography grid by linking up with screening programs being developed in France, Germany, Japan, and the United States.
Expand to store and process other types of medical images, not restricted to mammograms or even to cancer in general.
Data-mining technology will be able to search the database for images that are similar to the one being examined and that have a known diagnosis. The plan is to move, over the next 2-5 years, to two views per breast, meaning more than 6 million mammograms per year.
5.5 Conclusion
Cancer has been regarded as one of the most challenging research topics for medical institutions. As a definite cure seems not to be within the researchers' sight, preventive examinations and early diagnosis have been the best weapons with which doctors fight this disease. In this context, the analysis of medical images plays a major role in cancer diagnosis and prevention. In this chapter, we presented a grid-based solution aimed at leveraging medical-image analysis and management. Such a solution provides huge storage capacity, where a very large number of images can be stored, and high-performance computing for the automatic analysis of massive quantities of images, at a rather low cost. We strongly believe that this application of grid computing can greatly improve the way medical research is performed and, ultimately, provide a better quality of life for humanity as a whole.
Chapter 6.
Computer-Aided Drug Discovery
6.1 Introduction
In this chapter, we describe a grid implementation example that tackles the problem of Computer-Aided Drug Discovery (CADD), a workflow that can be used to increase the hit rate in screening chemical compound databases and thus speed up drug discovery. This approach demands high-performance computing and typically requires large computational infrastructures such as clusters, mainframes, or supercomputers. Note: A similar solution has been set in place at the Molecular Modeling Laboratory (MML) at the University of North Carolina (UNC) at Chapel Hill School of Pharmacy. More information about this project can be found at the following Web site:
http://www.ibm.com/software/ebusiness/jstart/casestudies/uncmodel.shtml
The main obstacles are:
Lack of automation
The need for large computing resources to efficiently perform model generation
The development of validated and predictive QSAR models can require building thousands of models per dataset. Assuming that a 100-compound dataset requires about 10 minutes to generate one model on a single CPU, and that frequently about 10,000 models need to be built, that equates to roughly 70 days of continuous computation.
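That estimate follows directly from the stated figures, assuming uninterrupted single-CPU execution:

```python
# Time to build a full set of QSAR models on one CPU,
# using the per-model estimate given in the text.
MINUTES_PER_MODEL = 10   # one model for a 100-compound dataset
MODELS_NEEDED = 10_000   # models typically built per dataset

total_minutes = MINUTES_PER_MODEL * MODELS_NEEDED
days = total_minutes / (60 * 24)
print(f"{days:.1f} days on a single CPU")
```

A grid that runs the model builds in parallel divides this wall-clock time roughly by the number of available CPUs, which is the motivation for the solution in this chapter.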
6.2.1 Requirements
This section describes the functional and non-functional technical requirements for the proposed solution.
Functional
To satisfy the requirements described above, these are the recommendations:
Provide ways to automate the execution of QSAR models, creating a workflow.
Create an integrated and easy-to-use interface where researchers and users with little technical knowledge can submit and manage jobs.
Create an integrated file system that allows various platforms (AIX, Linux, UNIX, and so on) to access data and create QSAR models regardless of the machine type.
Utilize grid computing capabilities to reduce the total processing time required to generate numerous QSAR models.
Non-functional
The main non-functional requirements for this solution are as follows:
Easy updates: Tasks such as workflow manipulation, data-source modifications, and the addition of new modeling algorithms must be performable by administrators and researchers with limited computing skills.
Integration: Make possible the integration of visualization and analysis tools from third-party vendors, allowing users to easily analyze their results.
6.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 6-1 shows the use-cases diagram.
Users description
These are the various roles involved:
Public user: This user makes use of the public version of the system. The user creates his or her account and uses the system with limited privileges. This role is normally personified by a student.
Private user: This user has more privileges in system utilization. This role is normally personified by a pharmaceutical company researcher or another business partner.
Researcher: The researcher conducts complex analyses based on the results of all jobs stored by all users. This role is personified by a professor or scientist.
Administrator: This user is responsible for performing high-level management tasks on the server and grid system.
Use-cases description
Here we describe the various use-cases:
User authentication: Users must provide a valid username and password to access the portal and its built-in applications.
Create a new user profile: In this use-case, a public user requests the creation of a new user profile. This request is approved by the administrator based on the user information provided.
Submit/resubmit jobs: In this use-case, valid users can submit new jobs based on parameters entered into the portal. They can also resubmit a completed job using parameters copied from a previous job.
View job status: In this use-case, a user checks the status of a submitted job, which may be running, failed, or completed.
Delete jobs: In this use-case, a user deletes submitted or completed jobs.
Run visualization tools: In this use-case, a user makes use of visualization tools to analyze a job's results, generating charts or other graphical representations.
Store results: Researchers and private users can store their job results in the data grid.
Analyze results: Researchers can utilize the stored job results to perform complex analyses and data mining to correlate information.
Portal: This is the component through which all users interact with the system. Users authenticate in the portal and a defined role is assigned to them. According to this role, the user can execute activities such as job submission, job management, visualization and analysis of results, and so on. The administrators also use the portal to perform administration tasks.
Application server: This component uses data or pointers entered into the portal and executes the model-generation process on the grid nodes.
Workflow: Workflows are the stepwise execution of various QSAR development programs to produce a QSAR model.
Grid nodes: The model-generation process started by the user is divided into smaller workflows that each build a single QSAR model, identified in Figure 6-2 as QSAR Model Mini-workflow. These compute jobs are sent to the grid nodes for processing.
Database: A relational database is used to provide quick retrieval of run-time data and users' profiles.
GFS: The GFS, or Global File System, is a high-performance shared-disk file-system standard that provides data access for multiple nodes running different operating systems. GFS allows multiple systems of various types to access and write to the same file space.
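The fan-out of one model-generation request into independent per-model mini-workflows can be sketched as follows. This is an illustrative sketch only: a local thread pool stands in for the grid nodes, and the scoring function is a hypothetical placeholder for the real QSAR modeling programs:

```python
from concurrent.futures import ThreadPoolExecutor

def build_qsar_model(descriptor_subset):
    """Stand-in for one QSAR mini-workflow.

    A real mini-workflow would run the modeling programs on a grid node;
    here we just return a hypothetical score for the descriptor subset."""
    return (descriptor_subset, sum(descriptor_subset) % 7)

def generate_models(descriptor_subsets, workers=4):
    """Fan the request out into independent jobs and run them in parallel."""
    with ThreadPoolExecutor(workers) as pool:
        return list(pool.map(build_qsar_model, descriptor_subsets))

if __name__ == "__main__":
    for subset, score in generate_models([(1, 2), (3, 4), (5, 6, 7)]):
        print(subset, score)
```

Because each mini-workflow is independent, the application server can dispatch them to as many grid nodes as are available, which is what collapses the model-generation time from days to hours.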
Component: GFS. Chosen product: Avaki Data Grid, which provides a global file system with a unified namespace but with a smaller system footprint. Additionally, it is simpler than other options, such as AFS.
Component: Grid middleware. Chosen product: Globus Toolkit, chosen due to its open-source nature.
6.4 Implementation
This section presents the current implementation level achieved, and the next steps to be followed for a complete deployment.
Implementation status
Here is a summary of the current implementation level:
A full-featured grid portal acts as a single point of interaction, allowing a great number of users to use the solution.
Modeling tools are integrated in the form of a workflow.
Better automation of the process decreases the need for human interaction.
GFS creates a single view of the file systems, which makes it easier to execute the QSAR applications on a compute grid.
QSAR model-generation time decreases from days to hours.
Next steps
This solution can be expanded by including new grid nodes or by interconnecting with other existing grids, increasing its processing capabilities and availability.
6.5 Conclusion
The development of new drugs is a slow and expensive process that comprises several research phases and experiments to identify which chemical compounds can be developed as drugs.
The utilization of CADD solutions and mathematical models like QSAR can help speed up this process. By making use of these solutions, research institutes, universities, and pharmaceutical industries can accomplish complex simulations and determine important pharmaceutical characteristics of chemical compounds. When implemented properly, this can greatly decrease the costs and time required to deliver new drugs to the market. Grid computing can play an important role in this situation, acting as an acceleration factor in the modeling process. These chemical modeling applications generally need large compute power to supply satisfactory results in a short time period. Utilizing the distributed and parallel processing capabilities provided by grids, simulations that previously took several days can now be completed in a few hours. The solution presented in this chapter also has the advantage of integrating several modeling tools into an automated workflow. Additionally, with the integrated and easy-to-use interface provided by the portal, a greater number of research institutes, pharmaceutical industries, universities, and students can interact and share their knowledge.
Chapter 7.
Big Science
In this chapter we discuss the following topic: Implementation of a data grid and computational grid to support government-sponsored laboratory projects (also known as Big Science)
7.1 Introduction
In this example we present an implementation of a data grid and computational grid to support government-sponsored laboratory projects (also known as Big Science). The system addresses the problem of storing huge quantities of data, which demands high storage capacity and typically requires large, parallel computational infrastructures. The data grid implementation is based on the IBM General Parallel File System (GPFS). Note: A similar solution has already been set in place in DEISA, a consortium of leading national supercomputing centers in Europe aiming to jointly build and operate a distributed terascale supercomputing facility. More information about the DEISA project can be found at the following Web site:
http://www.deisa.org/
In the last few decades, the use of computers in these research institutions has increased considerably, due to their capacity to aid the researchers in processing data and storing results.
7.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
Considering a typical scientific application similar to the ones cited in the introduction of this chapter, the following functional requirements were drawn up:
A storage capacity to accumulate at least 10 petabytes/year is needed.
A single system image: the users and applications must be able to access the same data independently of the node they are using, by either:
A mapping function, locating the proper data across multiple disks on different nodes
Message passing, shipping data between nodes
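The mapping-function option above can be illustrated with a simple deterministic placement scheme: every client computes the same block-to-node mapping, so any node can locate data without a central lookup. This is only a sketch under assumed names and block sizes; real parallel file systems use far more sophisticated allocation and recovery logic:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical storage nodes
BLOCK_SIZE = 4 * 1024 * 1024                       # assumed 4 MB file blocks

def block_location(path, block_index, nodes=NODES):
    """Map one block of a file to a storage node, deterministically.

    Hashing (path, block index) spreads a file's blocks across nodes
    while letting every client compute the same answer independently."""
    key = f"{path}:{block_index}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]

def file_layout(path, size_bytes):
    """Return the node holding each block of a file."""
    n_blocks = -(-size_bytes // BLOCK_SIZE)   # ceiling division
    return [block_location(path, i) for i in range(n_blocks)]

# A 10 MB file occupies three 4 MB blocks, placed across the nodes.
print(file_layout("/experiments/run42/results.dat", 10 * 1024 * 1024))
```

The message-passing alternative instead leaves data where it was written and ships blocks between nodes on demand; the single-system-image requirement can be met either way.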
Non-functional
The main non-functional requirements for this solution are as follows:
Performance: The system must deliver a raw recording rate of 0.1 to 1 gigabytes/sec. Consider the caching system (coherence, aging, swapping, data shipping, and so on).
Scalability: The system has to be able to grow until its storage capacity reaches dozens of petabytes, without significant losses in performance.
Robustness: The system must make efficient use of its data-storage resources so that the information stored is maintained regardless of individual device faults. Consider data replication.
Portability: The system should run on various platforms.
Security: The system must be able to operate across firewalls and provide robust security models.
7.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 7-1 presents the use-cases diagram.
(Figure 7-1: use-cases diagram showing the Researcher and Administrator actors and the use-cases, including user authentication.)
Users description
These are the various roles involved:
Researcher: This is the user who stores experiment results in the grid and analyzes them. This role is normally personified by a scientist at one of the participating research institutions.
Administrator: This is the user responsible for the management tasks on the system. This role is typically personified by a network or system analyst or a database administrator.
Use-cases description
Here we describe the various use-cases:
User authentication: This use-case represents the procedure that all users must go through before using the system resources. It is typically accomplished by typing in a username and password combination, and it may be done transparently when the user logs in to a workstation.
Store experiment results: In this use-case, researchers store the results of their research in the grid. This task is accomplished as if researchers were saving a file (or a set of files) to their local file system.
Analyze experiment results: In this use-case, the researcher fetches the data from the grid and analyzes it. As in the previous case, this task is accomplished as if researchers were fetching a file (or a set of files) from their local file system.
Manage resources utilization: In this use-case, the administrator performs monitoring and performance-tuning tasks to make sure that the grid facility is working in an optimized way. Typical activities that make up part of these tasks include checking activity logs, analyzing usage history, checking average system load, and setting caching parameters, among others.
Manage security and authentication: This use-case includes all the security-related issues of administrating the grid, such as managing user accounts, defining security policies, configuring network software and hardware components, and so on.
The reason we have chosen to base our grid implementation on GFS is that it partially solves some of the issues associated with data-grid implementations. Through a GFS layer, it is possible to provide a single file-system view to the system applications and services, so that only a minor amount of control has to be added by the data-grid platform. This considerably eases the task of providing data management services that satisfy all the basic grid requirements.
(Figure 7-3: layered architecture. An applications layer sits on top of an OGSA layer; OGSA services run over GFS on the grid nodes, GFS storage nodes contribute their disks, and all nodes are connected by the network.)
GFS storage nodes: These are the computers whose storage devices take part in the GFS schema. A virtual disk is mounted and made available to the remaining computing resources through the data network.
Grid nodes: These are the computers where the OGSA software runs to provide grid services over the GFS schema. In addition to assuring nontrivial qualities of service over the data-storage facilities provided by GFS, the OGSA layer may also provide computing-grid and communication-grid services.
There may be computers that both contribute to the GFS system and provide OGSA-level grid services, as we can see in the bottom portion of Figure 7-3.
Component: GFS. Candidate products: Andrew File System (AFS), Avaki Data Grid, and IBM GPFS. Chosen product: IBM GPFS.
Component: Operating system. Products: Linux and IBM AIX v5.
The key to the high performance and scalability of the General Parallel File System (GPFS) implementation is its intrinsic parallelism. Files are not localized but striped across many disks on the file system. A computing node accessing a file can use multiple network paths to perform the access, avoiding network bottlenecks and increasing the availability of the data.
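Striping can be sketched as follows. This is purely an illustration of the round-robin placement idea, assuming hypothetical disk names and chunk sizes; GPFS's actual allocation, caching, and recovery logic is far richer:

```python
def stripe(data, disks, stripe_size):
    """Distribute a byte string round-robin across disks in stripe_size chunks."""
    placed = [[] for _ in disks]
    for i in range(0, len(data), stripe_size):
        placed[(i // stripe_size) % len(disks)].append(data[i:i + stripe_size])
    return placed

def reassemble(placed, total_chunks):
    """Read the stripes back, cycling across disks.

    In a real system these per-disk reads proceed over multiple network
    paths in parallel, which is where the performance gain comes from."""
    out = []
    for chunk in range(total_chunks):
        out.append(placed[chunk % len(placed)][chunk // len(placed)])
    return b"".join(out)

data = bytes(range(20))
disks = stripe(data, ["disk0", "disk1", "disk2"], stripe_size=4)
assert reassemble(disks, total_chunks=5) == data
```

Because consecutive chunks live on different disks, a sequential read keeps several disks and network paths busy at once instead of serializing on one device.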
Another important characteristic of GPFS is its compliance with the Portable Operating System Interface (POSIX) standard. POSIX is a series of standards developed by the IEEE that specify a portable operating system interface (the IX denotes the UNIX heritage of these standards). Its most important definition is a set of programming-interface standards governing how to write application source code so that applications are portable between operating systems.
7.4 Implementation
This section presents the implementation level already achieved in this site and the next steps to be followed for a complete deployment.
Implementation status
In the first step of the implementation of the grid solution, four research institutions that have IBM equipment only are being integrated. A high-throughput network was built between the institutions as basic infrastructure. The following resources are available on the grid:
Over 4000 nodes
Integrated peak performance of 24 teraflops
125 racks spread over 3 countries (Germany, France, Italy)
Capacity to accumulate 5 to 8 petabytes/year
Next steps
The next steps in the implementation of this grid will be as follows:
It will be expanded to other research institutions and universities using equipment from other vendors.
An upgrade of the dedicated network interconnect (from 1 Gb/s to 10 Gb/s) is also scheduled.
Tens of petabytes are planned by 2007-2008, with an exabyte approximately 5-7 years later.
7.5 Conclusion
Nowadays, performing research on cutting-edge technologies quite often implies dealing with massive quantities of data stored on high-capacity storage infrastructures. Such infrastructures have traditionally been implemented with expensive and inflexible computing systems that only a small fraction of research institutions could afford.
In this chapter, we presented a grid-based implementation of a high-capacity storage infrastructure suitable for most of these research demands. This infrastructure should be able to deliver high-performance computing at a much lower cost, which is what makes grid computing so appealing to research institutions. In addition, this particular implementation made use of the GFS standard, which greatly simplified its implementation.
Chapter 8.
e-Learning
In this chapter we discuss the following topic: Implementation of a network grid supporting an e-learning infrastructure that embraces many of the requirements for exchanging information in the educational and research fields
8.1 Introduction
In this example we present a grid environment to support many educational and research requirements for exchanging information. Knowing the main ways in which education can benefit from a grid, we can draw up the basic technological needs associated with the development of e-learning. The e-learning infrastructure presented in this chapter is based on the Access Grid. Note: The Access Grid is an ensemble of resources including multimedia large-format displays, presentation and interactive environments, and interfaces to grid middleware and to visualization environments. The AG technology was developed by the Futures Laboratory at Argonne National Laboratory and is deployed by the NCSA PACI Alliance. For more information about the Access Grid, refer to the following Web site:
http://www.accessgrid.org/
This chapter discusses how grid technologies can be deployed to tackle the problem of building such a framework.
Having these needs in mind, we can design a grid-based solution for leveraging e-learning.
8.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
The main functional requirements for this solution are as follows:
Simplex video broadcasting: When broadcasting a lecture, transmitting a high-quality image of the lecturer along with a blackboard is generally a desired requirement. Although one can argue that such visual information can be substituted with written and graphical content, such as formulas and maps, the audience tends to lose focus on the presentation more easily when they are unable to associate the content being presented with the lecturer. On the other hand, such visual contact does not need to be established from the audience to the lecturer, which means that the video streaming goes only one way.
In other collaborative environments, such as virtual conferences, all the parties may broadcast video signal, but this case is not covered in this example (although all its implications can be easily deduced from the context presented here).
Storage capacity: Off-line content also plays a major role in e-learning; this includes pre-recorded video and/or audio lectures, tutorials, articles, books, and so on. As mentioned before, this requires a huge storage capacity that may easily scale to several terabytes.
Unified access to stored content: The system must provide users with a comprehensive and unified listing of all the content stored in its databases. This not only eases access to the content, but also makes storage more efficient, as it avoids unnecessary data replication.
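The deduplication idea behind this requirement can be sketched as follows. This is an illustrative sketch only; the ContentCatalog class and its methods are our own invention, not part of any product discussed in this chapter.

```python
import hashlib

class ContentCatalog:
    """Toy unified catalog: one listing, duplicates detected by content hash."""

    def __init__(self):
        self._by_hash = {}   # content hash -> canonical entry
        self._listing = []   # unified, user-visible listing

    def add(self, title, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_hash:
            # Same bytes already stored: reuse the existing entry
            # instead of replicating the data.
            return self._by_hash[digest]
        entry = {"title": title, "hash": digest, "size": len(data)}
        self._by_hash[digest] = entry
        self._listing.append(entry)
        return entry

    def titles(self):
        return [e["title"] for e in self._listing]

catalog = ContentCatalog()
catalog.add("Lecture 1 (video)", b"...video bytes...")
duplicate = catalog.add("Lecture 1 (copy)", b"...video bytes...")
```

A second submission with identical bytes is mapped back to the existing entry, so the unified listing stays free of replicas.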
Non-functional
The main non-functional requirements for this solution are as follows:
Performance: The system must provide good audio and video quality for the live lectures, and good network throughput for downloading off-line content. A minimum requirement is a stable flow of at least 50 kilobytes/sec from the lecturer to the audience and 10 kilobytes/sec in the other direction.
8.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 8-1 presents the use-cases diagram for this example:
Figure 8-1 use-cases: schedule lecture, present live lecture, store off-line content, user authentication, subscribe to a live lecture, participate in a live lecture, manage content organization, manage security, monitor and maintain system, assist live lectures. Actors: Operator, Professor, Administrator, Student.
Users description
These are the various roles involved:
Professor: This is the person responsible for presenting lectures and preparing off-line content. This person is normally a professor or, particularly for lectures, one of their assistants.
Student: These are the people who attend lectures and download educational material from the system. They can be anyone interested in obtaining education from the system, including professors.
Administrator: This is the person responsible for performing the basic management tasks in the system. This role is typically filled by a system analyst or a network analyst.
Use-cases description
Here we describe the various use-cases:
User authentication: This use-case represents the procedure that all users must go through before using the system resources. It is typically accomplished by typing in a username and password combination, and may be done transparently when the user logs in to a workstation.
Schedule lecture: In this use-case, the professor schedules a live lecture using the system's interface. The process of scheduling a lecture is bound by restrictions such as the availability of broadcasting rooms, the lecturer, and the target audience. Once the lecture is scheduled, it can be published to the target community.
Present the lecture: In this use-case, the professor presents a lecture in a broadcasting room, assisted by an operator. Besides presenting the subject to the audience using an ordinary blackboard, the professor may use interactive tools to include electronic graphics and text in the presentation.
Participate in a live lecture: Depending on the lecture setup and/or the number of students attending, students can join by using a personal computer or by going to a classroom where the lecture is being broadcast.
Assist live lectures: Operators may assist both professors in broadcasting rooms and students in classrooms. In either case, the operator's role is to ensure that the technical infrastructure is correctly set up for the lecture session.
Store off-line content: Professors may also store educational material for download by students. Such material might include recorded live lectures, audio classes, documents, tutorials, and so on. The submission itself has to be performed through an interactive interface so that the material is correctly documented and made available to the right audience.
Manage content organization: The off-line content has to be organized into subject-oriented forums and disciplines. The creation of such a structure should be requested from and performed by the administrator, who is responsible for maintaining it.
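The room-availability restriction mentioned under the schedule lecture use-case can be sketched as a simple interval-overlap check. This is an illustrative sketch; the function and its data layout are assumptions, not part of the described system.

```python
def room_is_free(bookings, room, start, end):
    """Return True if `room` has no booking overlapping [start, end).

    `bookings` is a list of (room, start, end) tuples, with times as
    comparable values (for example, minutes since midnight).
    """
    for booked_room, s, e in bookings:
        # Two half-open intervals overlap iff each starts before the
        # other ends.
        if booked_room == room and s < end and start < e:
            return False
    return True

bookings = [("Room A", 9 * 60, 11 * 60)]   # Room A busy 09:00-11:00
overlapping = room_is_free(bookings, "Room A", 10 * 60, 12 * 60)
back_to_back = room_is_free(bookings, "Room A", 11 * 60, 13 * 60)
```

A request overlapping an existing booking is rejected, while a back-to-back request starting exactly when the previous lecture ends is accepted.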
Monitor and maintain system: The administrator is responsible for monitoring and tuning the system so that bottlenecks are detected and eliminated. Additionally, this person has to provide technical support for the users and make available all the tools needed to participate in the community.
Figure 8-3 presents the software component architecture used in this implementation.
Figure 8-3 layers (top to bottom): Collaborative Applications; Basic Communication and Storage Grid Services; OGSA.
Individual site: This denotes the places from which students may join the conference using personal computing resources. These sites might be widely scattered and must provide all the facilities needed to participate in the lecture.
Broadcasting room: This is the place from which the professor broadcasts the lecture to all the students sitting in classrooms or at individual sites. The technical resources are the same as those found in classrooms, except that a video camera is imperative. An operator assists the professor with the management of this equipment (just as in the classrooms).
Grid portal: This is the portal by which users perform activities such as submitting lectures, subscribing to lectures, and uploading and downloading off-line content.
These are the components for the software component architecture:
Physical layer / Operating system: This layer comprises all the computers used to build up the e-learning framework, as well as the basic operating systems they run.
OGSA: This is the layer where the basic grid platform sits. This software is responsible for providing the grid infrastructure upon which the basic grid services, for high-performance storage and communication, are implemented. Thus, it offers the basic tools for open-standard communication and storage throughout the grid.
Basic Communication and Storage Grid Services: These are the grid services that implement the high-level storage and communication functionality. Thus, they offer the collaborative applications a standard interface for storing content and for receiving and/or sending streaming audio and video signals.
OGSA Toolkit
Products: Globus Toolkit
Chosen product: Globus Toolkit

Grid Services
Products: none
Collaborative Applications
Products: IBM Lotus Learning Management System (LMS); Distributed PowerPoint and Remote PowerPoint; Peer-to-Group Media Broadcast or Kontiki
Chosen products: A number of collaborative applications might be put in place, depending on the requirements of the specific lectures to be presented. The products listed are for managing off-line content, live presentations, and collaborative virtual conferences.

Grid Portal
Products: IBM WebSphere Application Server; BEA WebLogic Server; Grid System Gateway; JBoss; Tomcat
Chosen product: Grid System Gateway was chosen due to the reliability and technical-support availability requirements.
It is important to mention that the integration between the independent tools and products chosen in this design was possible thanks to the Open Grid Services Architecture Data Access and Integration (OGSA-DAI) project. OGSA-DAI is developed by the UK Database Task Force, and its objective is to provide a standard interface through which a distributed query processing system can access data in different data sources. For more information, refer to the following Web sites:
http://www.ogsa-dai.org.uk http://www.ibm.com/software/data/cm/
Among the collaborative applications adopted is an IBM solution for Learning Management Systems (LMS). With this tool, administrators can manage several tasks related to the e-learning process, besides creating portals that ease users' access to information. For more information about Lotus Learning Management System, go to:
http://www.lotus.com/learning
For more information on the other collaborative tools, refer to the following Web sites:
http://www.accessgrid.org/agdp/guide/dppt.html http://scv.bu.edu/accessgrid/seminars/rppt.html
http://www-mice.cs.ucl.ac.uk/multimedia/software/nte/
8.4 Implementation
This section presents the implementation status achieved so far and the next steps toward a complete deployment.
Implementation status
The implementation level reached so far embraces:
4 educational institutions
Over 100 users (professors and regular students)
2 Mbps of average bandwidth for broadcasting lectures
5 terabytes of storage space for off-line educational content
Next steps
In the next two years, this implementation is expected to scale to:
Around a dozen participating institutions across Europe
Over 1,000 users
5 Mbps of average bandwidth for broadcasting lectures
25 terabytes of storage space for off-line educational content
8.5 Conclusion
In this chapter we presented an example of how grid technologies can be used to build an e-learning framework capable of connecting a potentially unbounded number of professors and students. In this case, we believe that grid technologies are appealing because the requirements for building such a framework match very closely what a grid can offer in terms of computational resources. Additionally, a grid offers a much more cost-effective solution, as it employs existing low-cost and non-specialized computing resources to do the job. Finally, we strongly believe that the technological revolution that the grid is about to bring will definitely change the way people deal with information and, ultimately, knowledge. In this sense, grid computing and e-learning form a perfect match.
Chapter 9.
Visualization
In this chapter we discuss the following topic: Grid implementation to support the field of advanced scientific visualization
9.1 Introduction
In this example we present a grid implementation to support the field of advanced scientific visualization. The area of visualization is evolving as it addresses emerging and continuing issues, from interactive and batch rendering of terascale data sets through remote visualization. At the same time, universities in general have highly heterogeneous environments. This is because they use many low-cost resources from different suppliers running different systems, alongside advanced computing resources such as supercomputers and advanced visualization systems, with most of them segregated in specific departments for local access only.

Note: This example of a grid implementation is inspired by the scientific visualization requirements of America's largest campus grid, at the University of Texas at Austin. More information about this campus grid can be found at the following Web sites:
http://www.tacc.utexas.edu/ http://www.tacc.utexas.edu/projects/grid_vis.php http://www.ibm.com/grid/grid_press/pr_ut.shtml
Through a grid portal, users have a single point from which to submit jobs, verify job status, and submit input data, as can be seen in Figure 9-1.
9.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To create a single view of the entire set of IT resources, we must use and implement the following grid technologies:
A portal to enable a single view of all resources:
  A virtualized view of the computing power devices
  A virtualized view of the data storage devices
A scheduler to manage the distribution of jobs across the resources
An engine to manage the distribution of data across the resources
Integration and leveraging of visualization resources:
  Forward results to the specified visualization resource
  Support advanced reservations on the visualization resource
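A minimal sketch of the portal idea, with a naive round-robin stand-in for the scheduler, might look like the following. All class and method names here are illustrative assumptions, not an actual portal API; a real portal would delegate to grid middleware rather than a Python dictionary.

```python
import itertools

class GridPortal:
    """Toy single point of entry: submit jobs, check status, stage input data."""

    def __init__(self, resources):
        self.resources = resources         # virtualized view of compute resources
        self.jobs = {}
        self._ids = itertools.count(1)
        self._next = itertools.cycle(resources)

    def submit(self, command, input_data=None):
        job_id = next(self._ids)
        self.jobs[job_id] = {
            "command": command,
            "input": input_data,
            "resource": next(self._next),  # "scheduler": naive round-robin
            "status": "QUEUED",
        }
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]["status"]

portal = GridPortal(["cluster-a", "cluster-b"])
jid = portal.submit("render frame_001", input_data=b"mesh")
```

The user sees only the portal; which cluster actually runs the job is decided behind the single view.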
Non-functional
The main non-functional requirements for this solution are as follows:
Performance: As far as visualization is concerned, the system should manage the data in a way that minimizes the latency between the end of a computation process and the analysis of the results. This normally implies that the system has to provide a persistent data store and a means to replicate that data store or pre-stage data, providing the capability to place the data close to the consuming application for optimal performance.
Scalability: The system must support the ability to add resources, thereby increasing computing and/or storage capacity without significant loss of performance. The system architecture has to be scalable to support arbitrarily large virtual organizations.
Security: The system has to allow authenticated and authorized users access to grid resources without requiring them to authenticate on each resource. The system has to prevent unauthorized access to data.
Data integrity: Changes in system data have to be propagated automatically to any replicas within the grid.
Availability: The system must remain available for job submission in the event of failure of any of its resources. The system must support the ability to restart failed jobs and run them to completion on a similar resource in the event of a single system failure.
Reliability: The system must be reliable in the correct execution of a model in the event of server and/or disk failures. Also, the system must hold jobs in the queue until notified of successful completion, thus providing the ability to restart failed jobs.
Maintainability: The system must allow the use of heterogeneous hardware platforms and operating systems, as well as geographically distributed resources.
9.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 9-2 illustrates the main users and how they interact with the main use-cases.
Figure 9-2 (excerpt): use-cases include Logon and Submit Jobs; the actors are Grid User and Grid Admin.
Users description
These are the various roles involved:
Grid Portal User: This role refers to the researchers who perform the tasks of everyday research.
Grid Administrator: This is the person responsible for performing management tasks, such as user account creation and deletion, grid resource management, and so on.
Use-cases description
The use-cases can be briefly described as follows:
Logon: The user logs on to the grid portal. As a precondition, the user has to have an account and a certificate signed by a trusted CA.
Submit job: The user (after a successful logon) submits a job to the grid through the grid portal.
Verify job status: The user may verify his job's status; whether it is done or not, for example.
Cancel job: The user cancels a specific job.
Visualize results: The user, through a visualization tool, visualizes the results.
Create user account: The administrator (after a successful logon) creates a user account, defining the profile and which resources the user is able to access.
Delete user account: The administrator can also delete a user account when this becomes necessary.
Manage resources: The administrator performs similar actions on numerous resources. The actions would typically include viewing resource information, adding an additional resource, editing resource information, or deleting a resource.
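The verify and cancel use-cases imply a small job life cycle. One way to sketch the legal state transitions (the state names and transition table are illustrative, not taken from any product in this book):

```python
VALID_TRANSITIONS = {
    "QUEUED":    {"RUNNING", "CANCELLED"},
    "RUNNING":   {"DONE", "FAILED", "CANCELLED"},
    "DONE":      set(),      # terminal states: no further transitions
    "FAILED":    set(),
    "CANCELLED": set(),
}

def advance(state, new_state):
    """Move a job between states, rejecting impossible transitions
    (for example, cancelling a job that has already finished)."""
    if new_state not in VALID_TRANSITIONS[state]:
        raise ValueError(f"cannot go from {state} to {new_state}")
    return new_state

s = advance("QUEUED", "RUNNING")
s = advance(s, "DONE")
```

The "verify job status" use-case simply reads the current state; "cancel job" is an advance to CANCELLED, which the table allows only while the job is queued or running.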
Architecture diagram (excerpt): the Grid User interacts with the Grid Portal, which is built on GRAM, GridFTP, GSI, and GIS; a scheduler dispatches work to the clusters.
9.4 Implementation
The solution provided in this chapter is not yet implemented. The architecture was developed based on user demands and the organization's resources. After implementation, the time needed for users (researchers) to submit and visualize a simulation is expected to decrease significantly.
Next steps
The next steps for this project are its implementation and deployment.
9.5 Conclusion
Nowadays, performing research on cutting-edge technologies quite often implies making use of heterogeneous tools and computing resources. Such diversity of building blocks has always imposed major difficulties on researchers and technical staff who need to accomplish time-critical and/or time-consuming tasks on a daily basis. In this chapter, we presented a grid-based implementation for integrating computing resources so that the everyday tasks performed in research institutions can be carried out more easily and efficiently. In particular, we analyzed the impact of such an implementation on scientific visualization tasks, which traditionally are performed on heterogeneous software and hardware environments. Finally, we are confident that the grid philosophy will greatly affect the way research is performed in research institutions worldwide.
Chapter 10.
Microprocessor design
In this chapter we discuss the following topic: A computational grid implementation that helps to reduce the microprocessor design cycle and also allows the design centers to share their resources more efficiently
10.1 Introduction
In this example we present a computational grid solution that helps to reduce the microprocessor development cycle and also allows the design centers to share their resources more efficiently. Microprocessor design and microprocessor verification simulation require massive computational power.

Note: A similar solution has been in place for more than 10 years in the Microprocessor Design Group at IBM Austin, TX. The group designs chips for IBM Eserver high-performance systems, running thousands of simulations to verify timing closure. More information about this grid solution can be found at the following Web site:
http://www.ibm.com/software/success/cssdb.nsf/CS/BEMY-645U26?OpenDocument&Site=software
10.2.1 Requirements
This section describes the functional and non-functional technical requirements for the solution proposed.
Functional
To satisfy the requirements described above, here are the recommendations:
Store 5-10 terabytes of data on a distributed file system.
Achieve peak computing power of approximately 2 teraflops.
Provide tools to improve the bug-removal rate.
Provide tools to reduce the human-resources cost needed to submit 100,000+ tests a day.
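To give a feel for the last figure, a rough capacity calculation follows. The 30-minute average test duration is an assumed value for illustration, not a figure from the source.

```python
# Rough capacity arithmetic behind the "100,000+ tests a day" figure.
tests_per_day = 100_000
seconds_per_day = 24 * 60 * 60            # 86,400

# Average test completions needed per second across the whole grid:
tests_per_second = tests_per_day / seconds_per_day        # ~1.16

# If an average verification test runs for ~30 CPU-minutes (an assumed
# figure), this many processors must be busy concurrently, on average:
assumed_test_minutes = 30
busy_processors = tests_per_second * assumed_test_minutes * 60   # ~2,083
```

Under this assumption, a few thousand processors are needed just to keep pace with submissions, which is consistent with running the workload on a large shared grid rather than a single server.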
Non-functional
Non-functional requirements define the value-added goals of the project that are not captured in the use-cases, such as these:
Robustness: The system should remain available for job submission in the event of failure of any computing resource in the grid.
Security: The system should prevent unauthorized access to data.
Scalability: The system should support the ability to add resources, thereby increasing computing and/or storage capacity. Also, the system architecture should be scalable to support arbitrarily large virtual organizations.
10.2.2 Use-cases
This section presents the use-cases that define and illustrate the use of the proposed solution.
Use-cases diagram
Figure 10-1 shows the use-case diagram; its elements are the Submitter and Debugger actors and the Simulation use-case.
Users description
These are the various roles involved:
Submitter: This is the person responsible for defining the submission control, the job requirements, the pass/fail thresholds, and the tests, setting up the parameters and test stimulus. Additionally, the submitter is responsible for submitting job requests.
Debugger: This is the engineer responsible for analyzing failed jobs to track down bugs and bottlenecks. This person is also responsible for rerunning models.
Use-cases description
The use-cases can be briefly described as follows:
Simulation: The simulation action receives the simulation request, which carries a number of parameters such as submission control, tests, and model location, and dispatches the request to the available machines for processing.
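The dispatch step can be sketched as follows. The machine names and the model path are hypothetical, and a real dispatcher would consult a scheduler rather than a simple busy flag.

```python
def dispatch(request, machines):
    """Send a simulation request to the first available machine.

    `request` carries the parameters named above (submission control,
    tests, model location); `machines` maps machine name -> busy flag.
    """
    for name, busy in machines.items():
        if not busy:
            machines[name] = True        # mark the machine as claimed
            return {"machine": name, **request}
    return None                          # nothing free: leave it queued

machines = {"sim-01": True, "sim-02": False}
job = dispatch({"tests": ["timing_closure"], "model": "/models/cpu-v2"},
               machines)
```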
Architecture diagram (excerpt): a computational grid spanning the Austin, Burlington, and POK sites.
Software client: Eclipse
File systems
10.4 Implementation
This implementation involves several physical locations, and its first version was delivered in the mid-1980s. It has been evolving, and today more than 7,000 processors participate in this computational grid.
Next steps
These are the next steps for implementation of this project:
Add Eclipse plug-ins to enhance usability.
Investigate replacing current servers and database with a WebSphere/DB2 implementation.
Investigate expanding the scope of the plug-ins to include other simulation tools.
10.5 Conclusion
Nowadays, performing research and development on cutting-edge technologies quite often implies making use of high-performance computing infrastructures. Such infrastructures have traditionally been implemented with expensive and inflexible computing systems that only a small fraction of institutions could afford. In this chapter, we presented a grid-based implementation of a high-performance computing infrastructure suitable for a highly technological application: the development of microprocessor chips. This infrastructure is able to deliver high-performance computing at a considerably lower cost than an equivalent supercomputer. We recognize that a number of compute-intensive applications might not benefit from this technology, but we are sure that the grid philosophy will greatly affect the way research and development are performed by industries and research institutions.
Part 3
Appendixes
This part of the book includes the following appendixes:
Appendix A, TeraGrid, on page 113
Appendix B, Research oriented grid, on page 121
Appendix A.
TeraGrid
In this appendix we present the following topic: An overview of the TeraGrid project
Introduction
This cyber-infrastructure aims to meet the needs of emerging terascale applications. It encompasses computing-intensive applications that require multiple-teraflop computing systems (HPC); data-intensive systems that need to create or mine multi-terabyte data archives to extract insights (visualization); and applications that must be coupled to scientific instruments, such as microscopes and telescopes (remote instrumentation).

Note: TeraGrid is an effort to build the world's largest and fastest grid environment, launched by the NSF in 2001. For more information about TeraGrid, refer to the following Web sites:
http://www.teragrid.org/
http://www.nsf.gov/
http://www.ibm.com/press/PressServletForm.wss?MenuChoice=pressreleases&TemplateName=ShowPressReleaseTemplate&SelectString=t1.docunid=1137&TableName=DataheadApplicationClass&SESSIONKEY=any&WindowTitle=Press+Release&STATUS=publish
The present status of the TeraGrid project is a combination of three programs within the NSF (National Science Foundation) Terascale initiative: Terascale Computing System (TCS), Distributed Terascale Facility (DTF), and Extensible Terascale Facility (ETF). It attempts to create an infrastructure of unbounded capability and scope connecting universities and organizations via a cross-country network backbone, among the fastest research networks currently in existence. It enables rapid access to remote resources and allows users to hide latency via aggressive data staging.
Organization
The project currently integrates nine major supercomputing sites across the US, as seen in Figure A-1.
Figure A-1 (excerpt): the TeraGrid sites shown include ANL, Purdue, CACR, NCSA, IU, and PSC.
Each of these sites contributes resources and expertise to create a cyber-infrastructure for scientific research. They are:

The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign: It offers 10 teraflops of capability computing through its IBM Linux cluster, which consists of 1,776 Itanium 2 processors. In addition to this processing capability, NCSA also provides 600 terabytes of secondary storage and 2 petabytes of archival storage capacity.

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego: It leads the TeraGrid data and knowledge management effort. It provides a data-intensive IBM Linux cluster based on Itanium processors that reaches over 4 teraflops, plus 540 terabytes of network disk storage. In addition, a portion of SDSC's 10-teraflop IBM supercomputer is assigned to the TeraGrid, and an IBM HPSS archive currently stores a petabyte of data.

Argonne National Laboratory: It provides users with high-resolution rendering and remote visualization capabilities via a 1-teraflop IBM Linux cluster with parallel visualization hardware.

The Center for Advanced Computing Research (CACR) at the California Institute of Technology (Caltech): It provides online access to very large scientific data collections in astronomy and high-energy physics. In addition, it provides application expertise in the fields of geophysics and neutron science.
The Pittsburgh Supercomputing Center (PSC): It provides computational power via its 3,000-processor HP AlphaServer system, TCS-1, which offers 6 teraflops of capability uniquely coupled to a 21-node visualization system. It also provides a 128-processor, 512-gigabyte shared-memory HP Marvel system, a 150-terabyte disk cache, and a mass storage system with a capacity of 2.4 petabytes.

Oak Ridge National Laboratory (ORNL): In this case, ORNL is more a user than a provider. Users of its neutron science facilities (the High Flux Isotope Reactor and the Spallation Neutron Source) will be able to access TeraGrid resources and services for their data storage, analysis, and simulation.

Purdue and Indiana University (IU): They provide 6 teraflops of computing capability, 400 terabytes of data storage capacity, visualization resources, access to life science data sets, and a connection to the Purdue Terrestrial Observatory.

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin: It provides a 1,024-processor Cray/Dell Xeon-based Linux cluster and a 128-processor Sun E25K terascale visualization machine with 512 gigabytes of shared memory, for a total of 6.75 teraflops of computing/visualization capacity, in addition to a 50-terabyte Sun storage area network. Only half of the cycles produced by these resources are available to TeraGrid users.

In addition, major companies such as IBM contribute expertise and products to the project. IBM, specifically, is the most important contributor, providing expertise on high-performance computing, GPFS, Linux clusters, and Power4 processors. Through its nine resource sites, the TeraGrid offers advanced computational, visualization, instrumentation, and data resources. Currently the sites are interconnected through a 40 Gbps network backbone at rates from 10 to 30 Gbps.
Together, the sites provide more than 40 teraflops of computing power and more than 1 petabyte of disk-accessible data storage. This infrastructure is enabling scientists to work on advanced research such as:
Real-time brain mapping
Earthquake modeling
Molecular dynamics simulation
MCell, a Monte Carlo simulation of cellular microphysiology
The Encyclopedia of Life
Software stack (table excerpt):
Functionality: super schedulers, MPICH-G2
Implementation: SRB, MPICH-G2, distributed accounting
Basic Grid Services: authentication, access resource allocation, Resource Information Service
Management
The TeraGrid sites are autonomously managed, but issues such as distributed accounting, authentication, certificates, sign-on, and distributed application management are handled by the Coordinated TeraGrid Software and Services (CTSS) software. This provides a common user environment across the heterogeneous resources in the TeraGrid, as well as supporting grid-based capabilities.
Security
TeraGrid uses an X.509 certificate-based authentication scheme based on the Grid Security Infrastructure (GSI) protocol. The TeraGrid project evaluated both a centralized approach (all users must obtain a TeraGrid authentication certificate from a central Certificate Authority, or CA) and an approach that accepts certificates from any approved CA. For better scalability, the TeraGrid did not set up a TeraGrid-specific CA, but instead defines TeraGrid certificate policy requirements and accepts certificates from CAs that meet those requirements.
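The policy decision described here, accepting certificates only from approved CAs rather than from a single TeraGrid CA, can be sketched as a lookup against a policy list. The issuer names below are hypothetical, and real GSI code would also verify the certificate's signature chain, validity dates, and revocation status.

```python
APPROVED_CAS = {                      # hypothetical policy list
    "/C=US/O=Example Grid/CN=Example CA",
}

def certificate_accepted(issuer_dn, approved=APPROVED_CAS):
    """Accept a user certificate iff its issuer is an approved CA.

    This mirrors only the policy decision; cryptographic verification
    of the certificate itself is out of scope for this sketch.
    """
    return issuer_dn in approved

ok = certificate_accepted("/C=US/O=Example Grid/CN=Example CA")
rejected = certificate_accepted("/C=US/O=Untrusted/CN=Other CA")
```

Adding a new CA to the federation is then a policy change (extending the approved set), not a re-issuance of every user's credentials, which is the scalability advantage noted above.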
Network Infrastructure
The design is based on a 10 Gbps system as a minimum requirement. The main backbone is connected at 40 Gbps rates and spans from Los Angeles to Chicago. Sites connected to this backbone must follow certain rules regarding their network infrastructure: they must have an aggregation switch connected to the border router, where three such channels (10 Gbps each) are passed through to the TeraGrid backplane. The aggregation switch and border routers are separate for two main reasons:
1. This separation allows local configuration changes, outages, or experiments to be carried out without affecting the operation of the TeraGrid backplane.
2. The requirements for switching and routing network traffic over LANs versus WANs are quite different. Enterprise IP routers (such as those used for the border routers and internal routers) are designed to handle the buffering and associated requirements of long-delay, high-bandwidth wide area networks. LAN switches, on the other hand, are optimized for short-delay, low-latency connectivity, as would be expected in a LAN environment.
Figure A-3 shows how a site connects to the TeraGrid.
Figure A-3 Typical connection between sites and the TeraGrid backplane
Beneficiaries
The TeraGrid infrastructure and its terascale computing system will enable scientists to study drug interactions with cancer cells, and thereby develop better cancer drugs. It will allow them to further study the human genome and how the brain works, and allow scientists to analyze weather data so quickly that they will be able to create real-time weather forecasts that can predict, down to the precise region, where a tornado or other severe storm is likely to hit. It will help engineers design better aircraft by allowing them to do realistic simulations of new designs, and it will help scientists understand the properties of our universe and how it formed. Several institutions across the US are already set to benefit from the TeraGrid infrastructure, such as:
The Center for Imaging Science (CIS) at Johns Hopkins University: It has deployed its shape-based morphometric tools on the TeraGrid to support the Biomedical Informatics Research Network, a National Institutes of Health initiative involving 15 universities and 22 research groups whose work centers on brain imaging of human neurological disorders and associated animal models. For more information, refer to:
http://cis.jhu.edu
California Institute of Technology in Pasadena: It has a project to investigate the efficiency of detecting the decay of the Higgs boson into two energetic photons. The work involves generating, simulating, reconstructing, and analyzing tens of millions of proton-proton collisions. For more information, refer to:
http://cmsinfo.cern.ch/outreach
University of California, San Diego: It has a project to simulate the evolution of the universe through an adaptive mesh refinement code for cosmology simulations (Enzo). Once the code is ported to the TeraGrid, the simulation will run from shortly after the Big Bang, through the formation of gas clouds and galaxies, all the way to the present era. For more information, refer to:
http://casswww.ucsd.edu
University of Illinois, Urbana-Champaign: It has a project that uses massive parallelism on the TeraGrid for major advances in the understanding of membrane proteins. Another project is also harnessing the TeraGrid to attack problems in the mechanisms of bioenergetic proteins, the recognition and regulation of DNA by proteins, the molecular basis of lipid metabolism, and the mechanical properties of cells. For more information, refer to:
http://www.ks.uiuc.edu/#res
How to join
From the technical point of view, there are two issues to be addressed in order to join the TeraGrid infrastructure.
Software perspective
Prospective TeraGrid sites must implement the interfaces specified for using TeraGrid resources, or define interfaces, within TeraGrid specifications, that enable other sites to use their resources. Sites are encouraged to use the NMI software release to implement their interfaces and meet TeraGrid requirements.
Network perspective
Prospective TeraGrid sites must follow the network architecture defined in Figure A-3 on page 119. This means that sites must have a separate network infrastructure to join the TeraGrid backplane. This is explained in more detail in Network Infrastructure on page 118. A complete document can be found at:
http://www.teragrid.org/about/TeraGrid-Primer-Sept-02.pdf
Appendix B.
Introduction
This is an example of an institutional research organization that has several Research and Development sites worldwide. Each site has its own IT infrastructure in place, which can vary in terms of network topology, server platform, operating systems (such as Linux, Mac OS X, AIX, Windows, and OS/390), directory servers, and so on. At each site, a number of research projects are under way that pose varying degrees of computing demand, from fast processing to huge storage capacity. Because the sites are based on diverse platforms, it is not an easy task to provide comprehensive sharing of resources among them. Grid computing is taken here as the technology that fills this gap, so that sites are no longer limited to their own capacity when performing their research. The goal of this appendix is to provide architects and technologists in general with meaningful information about applying grid computing technologies to a typical scenario. To that end, all necessary steps are presented and discussed in detail.
Note: The grid infrastructure presented in this appendix is based on concepts adapted from the University of Texas at Austin, which has multiple different platforms, schedulers, and heterogeneous cluster types. More information about this campus grid can be found at the following Web sites:
http://www.tacc.utexas.edu/
http://www.tacc.utexas.edu/projects/grid_vis.php
http://www.ibm.com/grid/grid_press/pr_ut.shtml
The next sections cover the following aspects of the grid architecture:
In Business requirements on page 122, we describe the business context in which the problem arises.
In High level design on page 124, we present the design of the solution that fulfills the requirements.
In Products used on page 131, we present the list of products used in this example.
Business requirements
It is the goal of the institution or company to integrate the numerous divisions and resources within the organization so that they share a common infrastructure for research and computation. The grid to be designed will unify and simplify the usage of the diverse computational, storage, visualization, data, and instrument resources of the organization, to facilitate new, powerful paradigms for research and development. This will include resources from the following hypothetical research centers:
Tokyo Research Lab
Singapore Research Lab
Lisbon Research Lab
Paris Research Lab
Today, each lab works in a largely independent fashion: each has its own budget, its own technical staff, and its own computational infrastructure, managed by local administrators and used for the research activities that take place locally. For these sites to be fully integrated, a grid platform must be implemented that is capable of virtualizing both the storage capacity and the computing power of highly scattered computing resources. Some non-functional requirements of such a solution, such as scalability and performance, are discussed in the next section.
Non-functional requirements
This section describes the non-functional technical requirements for the solution.
Scalability: The system must be able to grow without limit and without significant loss of performance.
Availability: The system must be available on a 24/7 basis, even when individual resources are unavailable.
Reliability: The system must not become unavailable when one or more resources fail, regardless of the reason.
Maintainability: The system should be fully manageable in a scalable way, meaning that its growth does not imply a proportional growth in its management complexity.
Security: The system should provide services for user authentication and authorization, as well as secure information exchange between computing resources.
Current status
In the current environment, the users make use of the following infrastructure:
IBM eServer pSeries 690 clusters based on IBM POWER4, running IBM AIX
IBM eServer pSeries 655 nodes based on IBM POWER4+, running IBM AIX
Intel Xeon based nodes running the Linux operating system
Intel Pentium III nodes running the Linux operating system
Figure B-1 illustrates the different clusters that will be pulled together to form the grid environment.
[Figure B-1: the Tokyo and Singapore clusters, together with planned future sites, are pulled into a single virtual grid environment. A companion figure shows grid users reaching that environment through a grid portal behind a firewall.]
Because all the clusters and their schedulers must be managed simultaneously, a common mechanism is needed to coordinate job submissions to all schedulers. This is where the meta-scheduler comes in. In this case, the portal and the nodes interact with a meta-scheduler that retrieves the resource requirements for each job using the information providers bundled with the Globus Toolkit, which has been adopted in this solution. Figure B-4 illustrates how the information is pulled from a cluster and queried by the meta-scheduler.
[Figure B-4: GPIR and the Globus Index Service collect information from the PBS and Condor clusters, which the meta-scheduler then queries.]
The Community Scheduler Framework (CSF) is an open source add-on to the Globus Toolkit Version 3.0, donated by Platform Computing, for the development of community schedulers. Community schedulers, commonly referred to as meta-schedulers, accept user requests to run jobs and map them to the available resources. CSF provides intelligent, policy-based meta-scheduling for building grids where multiple types of job schedulers are involved, and it also suits environments preparing for growth. CSF is a grid meta-scheduling middleware solution that preserves local control over how resources are shared, while providing transparency and interoperability among the various job schedulers already in place at the research centers. It is an OGSI-compliant scheduling framework compatible with the Globus Toolkit version 3.0. Figure B-5 shows the design and the interfaces between the various schedulers used in the research organization.
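The mapping of a job request onto the best-fitting cluster can be sketched in a few lines of Python. This is a toy model with hypothetical names; the real CSF is an OGSI-compliant framework, and real policies are far richer than "most free CPUs":

```python
# Toy sketch of meta-scheduling in the spirit of CSF (hypothetical
# names; the real CSF is an OGSI-compliant scheduling framework).

class Cluster:
    """A cluster fronted by one local scheduler (LoadLeveler, PBS, Condor...)."""
    def __init__(self, name, scheduler, free_cpus):
        self.name = name
        self.scheduler = scheduler
        self.free_cpus = free_cpus

class MetaScheduler:
    """Accepts job requests and maps them onto the best-fitting cluster."""
    def __init__(self, clusters):
        self.clusters = clusters

    def submit(self, job_name, cpus_needed):
        # Ask the "information providers" which clusters can satisfy
        # the job's resource requirement.
        candidates = [c for c in self.clusters if c.free_cpus >= cpus_needed]
        if not candidates:
            raise RuntimeError(f"no cluster can run {job_name}")
        # Simple policy: pick the cluster with the most free CPUs.
        best = max(candidates, key=lambda c: c.free_cpus)
        best.free_cpus -= cpus_needed
        return best.name

grid = MetaScheduler([
    Cluster("tokyo-ll", "LoadLeveler", 64),
    Cluster("singapore-pbs", "PBS", 128),
    Cluster("lisbon-condor", "Condor", 32),
])
print(grid.submit("blast-run", 100))   # dispatched to singapore-pbs
```

The point of the sketch is the separation of concerns: the meta-scheduler applies the placement policy, while each local scheduler keeps control of how its own resources are actually shared.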
[Figure B-5: the grid user portal drives the CSF Job Service and Queue Service on top of the Globus Toolkit; the Index Service and queues front the LoadLeveler, PBS, and Condor schedulers at each site.]
Apart from the broader availability of computing resources, such a solution can also improve the way researchers handle their everyday computing activities. In most cases, researchers do not run just a single job; they have a sequence of jobs to run to complete their research. Usually they wait for the first job to complete, then submit the second job, and so on. With a workflow of the jobs to be run, researchers need to submit only once; the workflow then executes the jobs in sequence, or according to the order defined when it was created. As part of this solution, GridPort's job sequencer is used, with which researchers can create a workflow of the different jobs to run. The job sequencer is a portlet within GridPort that creates and manages sequences of tasks to be submitted. A sample sequence could consist of a job submission, a file transfer, and resubmission of the job to another resource. The sequencer uses GSI authentication to execute the jobs submitted by the user. Figure B-6 illustrates the job sequencer.
[Figure B-6: the GridPort job sequencer in the portal dispatches the steps of a sequence to the LoadLeveler, PBS, and Condor clusters.]
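A job sequence of this kind is essentially an ordered list of tasks executed one after another. The following minimal Python sketch illustrates the idea; the function names and task shapes are hypothetical, not the GridPort portlet's actual API:

```python
# Minimal sketch of a job-sequencer workflow (hypothetical names; the
# actual GridPort sequencer is a portlet that runs under GSI credentials).

def submit_job(job):
    # In a real grid this would go through the meta-scheduler.
    return f"submitted {job}"

def transfer_file(src, dst):
    # In a real grid this would use GridFTP.
    return f"copied {src} -> {dst}"

def run_sequence(steps):
    """Run each (function, args) task in order; an exception in any
    task aborts the rest of the workflow."""
    results = []
    for func, args in steps:
        results.append(func(*args))
    return results

# A sample sequence: submit a job, transfer its output, resubmit elsewhere.
workflow = [
    (submit_job, ("analysis.ll",)),
    (transfer_file, ("tokyo:/out.dat", "lisbon:/in.dat")),
    (submit_job, ("analysis.pbs",)),
]
for line in run_sequence(workflow):
    print(line)
```

The benefit described in the text falls out directly: the researcher defines the list once and no longer has to wait for each job before manually submitting the next.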
Putting this together, Figure B-7 illustrates the different components in the design for this organization. The grid portal interacts with the Community Scheduler Framework to submit and manage jobs and to retrieve job information. The CSF interacts with MMJFS within the Globus Toolkit to query the clusters in the grid infrastructure for load information. Users can also build a workflow of chained jobs to be executed via GridPort, using the job sequencer.
[Figure B-7: the overall design — GPIR, the Index Service and its information providers, GRAM, the CSF queue, and the scheduler plug-ins connecting to each local resource manager.]
User experience
With the clusters placed under the management of a grid portal, it is imperative that users feel comfortable using it, because they are no longer allowed to access the servers directly to run their jobs. The scheduler runs each job on the user's behalf via the grid security proxy. With all the components mentioned above, the workflow that a user follows is illustrated in Figure B-8.
[Figure B-8: user workflow — access the portal, create a job, submit it, let the grid select an appropriate resource, and terminate once the job completes successfully.]
Products used
The grid infrastructure is based on multiple products, summarized in the following list.
Globus: Globus Toolkit 3.0 is used as the middleware through which the portal and other components retrieve resource information. GRAM is used to handle resource allocation across the different clusters within the infrastructure. The GSI component within Globus provides the single sign-on infrastructure.
Portal: The portal could be implemented with products such as IBM WebSphere or Tomcat. It interfaces with Globus to retrieve information about the grid environment and to manage and use its resources interactively.
GridPort: GridPort, a software package based on JBoss and designed to aid in the development of science portals and application interfaces on a computational grid, is used in the solution. Two of its components are used: the GridPort Information Repository (GPIR) and the job sequencer. GPIR caches grid- and portal-related data (for example, those captured by MDS) and makes them available through standard Web services. If the data are not found, or newer data are required, GPIR retrieves them on demand. The job sequencer within GridPort allows a sequence of jobs to be created so that users do not need to resubmit each job manually after the previous one completes.
Community Scheduler Framework (CSF): CSF is used as the meta-scheduler that interfaces with the different types of schedulers used by the different clusters. It also interfaces with GridPort and Globus to retrieve grid resource information.
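GPIR's behavior of serving cached data and refetching when an entry is missing or stale is a classic cache-with-fallback pattern. A small Python sketch of that pattern follows; the class and function names are hypothetical, not GPIR's real interface:

```python
# Sketch of GPIR-style caching (hypothetical interface): serve grid
# data from a cache, falling back to the live source (e.g. MDS) when
# the entry is missing or stale.

import time

class InfoCache:
    def __init__(self, fetch, max_age=60.0):
        self.fetch = fetch          # function that queries the live source
        self.max_age = max_age      # seconds before an entry is stale
        self.entries = {}           # key -> (timestamp, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry and time.time() - entry[0] < self.max_age:
            return entry[1]                  # fresh cache hit
        value = self.fetch(key)              # miss or stale: refetch
        self.entries[key] = (time.time(), value)
        return value

calls = []
def query_mds(key):
    """Stand-in for a live MDS query; records each call it receives."""
    calls.append(key)
    return f"load for {key}"

cache = InfoCache(query_mds)
cache.get("tokyo")      # first call hits the live source
cache.get("tokyo")      # second call is served from the cache
print(len(calls))       # prints 1: the live source was queried only once
```

The design choice is the usual trade-off: the portal gets fast answers from the cache at the cost of data being up to `max_age` seconds old.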
Conclusion
In this appendix, we presented a brief overview of a grid-based solution aimed at integrating scattered computing resources from the regional labs of a worldwide corporation.
Glossary
AFS. The Andrew File System is a distributed networked file system developed by Carnegie Mellon University as part of its Andrew Project. It is named for Andrew Carnegie and Andrew Mellon. Its primary use is in distributed computing.
BLAST. The Basic Local Alignment Search Tool is an algorithm for rapid homology searching of nucleotide and protein databases.
BRICK. Emerging business countries: Brazil, Russia, India, China, and Korea.
CA. A Certificate Authority is (1) an instance or external institute that issues certificates identifying the certificate holder as entitled to use certain services; (2) in e-commerce, an organization that issues certificates, authenticates the certificate owner's identity and the services that the owner is authorized to use, issues new certificates, renews existing certificates, and revokes certificates belonging to users who no longer exist.
CADD. Computer-Aided Drug Discovery.
CADe. Computer-Aided Detection systems are created to aid doctors in the task of disease diagnosis.
CLI. Command Line Interface.
DEISA. A consortium of leading national supercomputing centers in Europe.
DFS. Distributed File System.
DNS. The Domain Name System is a system that stores information about host names and domain names on networks, such as the Internet.
DPPT. The Distributed PowerPoint application provides a mechanism by which a presenter can control a Microsoft PowerPoint presentation on multiple sites from a single machine.
FC/IP. Fibre Channel over IP is an Internet Protocol-based storage networking technology developed by the Internet Engineering Task Force. FC/IP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network (SAN) facilities over IP networks.
GFS. The Global File System provides a system for storing files on a computer. It functions as a shared-storage journaled cluster file system.
GGF. The Global Grid Forum was founded in 2001, when the merger of regional grid organizations created a single worldwide one.
Globus. A collaborative project centered at Argonne National Laboratory that is focused on enabling the application of grid concepts to computing.
GPFS. The General Parallel File System is a mountable networked file system.
GridFTP. A high-performance, secure, robust data transfer mechanism.
GSI. The Grid Security Infrastructure contains components to secure your grid network.
H.323. H.323 is a recommendation from the ITU-T that defines the protocols to provide audio-visual communication sessions on any packet network.
OGSA. The Open Grid Services Architecture is a standard setting the base for communication in grids across virtual organizations. OGSA marries open standards and grid computing protocols with Web services, bringing together the ability to share computing resources with the ability to provide application interoperability over the Internet.
OGSA-DAI. Open Grid Services Architecture - Data Access and Integration, a project developed by the UK Database Task Force whose objective is to provide a standard interface for a distributed query processing system to access data in different databases.
P2G. Peer-to-Group, one of the topologies derived from the peer-to-peer network. The group is dynamically formed with logical or topographical maps.
PACI. Partnerships for an Advanced Computational Infrastructure is a program of the National Science Foundation's Directorate for Computer and Information Science and Engineering.
Particle accelerator. A particle accelerator is a piece of scientific equipment that uses electric fields to propel charged particles to great energies. Everyday applications are found in TV sets and X-ray generators.
QoS. Quality of Service, a term used in a Service Level Agreement that denotes a guaranteed level of performance (for example, response times of less than one second).
QSAR. Quantitative Structure-Activity Relationships are mathematical models that represent the relationship between a given property and the structural attributes of a chemical compound.
RPPT. The Remote PowerPoint application, or RPPT, provides a mechanism by which a presenter can control a Microsoft PowerPoint presentation on multiple sites from a single machine.
SAN. A Storage Area Network is a high-speed, special-purpose network (or subnetwork) that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users.
TeraGrid. TeraGrid is a multi-year effort to build and deploy the world's largest, most comprehensive, distributed infrastructure for open scientific research. The TeraGrid project was launched by the National Science Foundation in August 2001 with $53 million in funding to four sites: the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Urbana-Champaign; the San Diego Supercomputer Center (SDSC) at the University of California, San Diego; Argonne National Laboratory in Argonne, IL; and the Center for Advanced Computing Research (CACR) at the California Institute of Technology in Pasadena.
Virtual Organization. A virtual entity whose users and servers are geographically apart but share their resources collectively as a larger grid. The users of the grid can be organized dynamically into a number of virtual organizations, each with different policy requirements.
VPN. Virtual Private Network, a network that is constructed by using public wires to connect nodes, using encryption and other security mechanisms to ensure that only authorized users can access the network and that the data cannot be intercepted.
WAN. A Wide Area Network is a computer network covering a wide geographical area.
Web services. A way of providing computational capabilities using standard Internet protocols and architectural elements.
X.509. In cryptography, X.509 is a standard for public key infrastructure. X.509 specifies, among other things, standard formats for public key certificates and a certification path validation algorithm.
XML. Extensible Markup Language is a W3C recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet.
XPath. The XML Path Language is a terse (non-XML) syntax for addressing portions of an XML document.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see How to get IBM Redbooks on page 143. Note that some of the documents referenced here may be available in softcopy only.
A CICS-to-Linux Grid Implementation, REDP-3758-00
An Information Grid Proof of Concept using Avaki Data Grid Software, REDP-3853-00
Configure Grid Security in the IBM Grid Toolbox using the Globus Certificate Service, TIPS0409
Enabling Applications for Grid Computing with Globus, SG24-6936-00
Fundamentals of Grid Computing, REDP-3613-00
Globus Toolkit 3.0 Quick Start, REDP-3697-00
GPFS: A Parallel File System, SG24-5165-00
Grid Computing with the IBM Grid Toolbox, SG24-6332-00
Grid Services Programming and Application Enablement, SG24-6100-00
Introduction to Grid Computing with Globus, SG24-6895-01
Other publications
These publications are also relevant as further information sources:
Abbas, Ahmar. Grid Computing: A Practical Guide to Technology and Applications. Charles River Media, 2004. ISBN 1584502762
Cockburn, Alistair. Writing Effective Use Cases. Addison-Wesley, 2001. ISBN 0201702258
Berman, Fran; Fox, Geoffrey; Hey, Anthony J.G. (editors). Grid Computing: Making the Global Infrastructure a Reality. Wiley Series in Communications, Networking & Distributed Systems, 2003. ISBN 0470853190
Schmuck, Frank; Haskin, Roger. "GPFS: A Shared-Disk File System for Large Computing Clusters." In Proceedings of the Conference on File and Storage Technologies, 2002
Foster, Ian; Kesselman, Carl (editors), et al. The Grid 2: Blueprint for a New Computing Infrastructure. Elsevier Science, 2004. ISBN 1558609334
Brochard, Luigi. "IBM involvement in DEISA." IBM Deep Computing, 2004
Dongarra, Jack; Foster, Ian; Fox, Geoffrey; Kennedy, Ken; White, Andy; Torczon, Linda; Gropp, William (editors). The Sourcebook of Parallel Computing. Elsevier Science, 2003. ISBN 1558608710
Joseph, Joshy; Fellenstein, Craig. Grid Computing (On Demand Series). IBM Press. ISBN 0131456601
Seeley, Richard. "Linux on zSeries." Z JOURNAL, April/May 2003
Online resources
These Web sites and URLs are also relevant as further information sources: AccessGrid
http://www.accessgrid.org
Audience Penetration
http://www.mediainfocenter.org/compare/penetration/
ChinaGrid
http://www.chinagrid.edu.cn
DICOM
http://medical.nema.org
eDiamond Project
http://www.ediamond.ox.ac.uk/index.html
GridPort
http://gridport.net
INTERNET GROWTH
http://www.internetworldstats.com/emarketing.htm
OASIS
http://oasis-open.org
OpenH323 Project
http://www.openh323.org/
AccessGrid Description
http://www.csm.ornl.gov/~bernhold/tcf/ag-info.html
ActiveSpaces on the Grid: The Construction of Advanced Visualization and Interaction Environments
http://www-unix.mcs.anl.gov/fl/publications/activespaces-pdc.pdf
Everything you always wanted to know about the Grid and never dared to ask
http://www.grid2002.org/pgclasssummer03/PGGridPart2jul03.ppt
How To Install and Configure AG 2.1 on a Single Machine Node (PIG), for Windows
http://www.accessgrid.org/agdp/howto/ag2-0-install-pig/1.3/html/book1.html
Human Factors
http://charlie.dgrc.crc.ca/cgi-bin/Sylvie/Blog/casarch.pl?2004/00/23/9.txt
Introduction to Grid Computing and Overview of the EU DataGrid Project, 2004
http://www.twgrid.org/event/isgc2003/ISGC_pdf/The_Architecture_of_EDG.pdf
Needs Assessment Workshop for Grid Techniques in Introductory Physics Classroom Projects
http://www-ed.fnal.gov/uueo/documents/Grid_Report_04.pdf
QuarkNet Cosmic Ray Studies & the Grid: Probing Extensive Showers
http://www.opensciencegrid.org/events/meetings/NSF-OSG-091703/Bardeen-QuarknetGrid.pdf
Tutorial 1: How to Build and Install an Access Grid Node (AGN) An Elementary Guide for Technical Users
http://www.apan.net/home/training/ag/Tutorial1/T1.htm
Videoconferencing update
http://hepwww.rl.ac.uk/sysman/july2004/talks/hepsysman-2004-07-videoconf.ppt
Back cover
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.