Andrzej Ryl
Outline
• What is Globus? Why was it created?
○ VO
○ current solutions’ drawbacks
• Globus architecture
• Grid services
• Network protocols
• Globus Nexus
○ features
○ architecture
○ use cases
○ evaluation
• Future plans
• Summary
• References
What is Globus? Why was it created?
Globus toolkit features[1][2][5][6]:
● sharing TBs of data
● authentication
● registry of resources
● access to broad range of services
● durability
Virtual Organization (VO) - a set of individuals defined by specific sharing rules, such as access to
computers, software and other resources that help in collaborative research, controlled by some
access rules[1]
[1]I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations (2001)
[2]I. Foster, C. Kesselman, S. Tuecke, J. Nick: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration (06/22/2002)
[5]C. Severance: Ian Foster and the Globus Project (2014)
[6]C. Kesselman, I. Foster: Globus: A Metacomputing Infrastructure Toolkit (1997)
Grid architecture/Globus features[1]
Fabric - the resources that users want to access: computational resources, storage resources, network
resources, code repositories and catalogs. The Globus toolkit mainly uses the fabric layer provided by
vendors. However, it can supply this functionality itself if the vendor does not provide it
Connectivity - everything related to communication and authentication. Communication is based on
TCP/IP. Authentication provides single sign-on, delegation[2] (programs run by a user can ‘touch’ only
that user's resources), integration (with existing cryptographic solutions) and trust relationships (a user
can hold a ‘tree’ of security credentials without having to ask for permissions all the time). The Globus
toolkit uses TLS, Kerberos and X.509 certificates for security
Resource - builds on the Connectivity layer and provides an abstraction over it through APIs and SDKs.
It also provides an abstraction over the Fabric layer in order to simplify and unify access to data. In the
Globus toolkit these are GRIP (a protocol for modeling data), GRAM (an HTTP-based protocol for
computational resources), GridFTP (pure data access) and LDAP (catalog access). The APIs are written
in C and Java
Collective - ‘merges’ the hooks of multiple Resource layers, providing APIs and SDKs for collections of
data (between VOs as well)
Grid services[2]
“OGSA defines what we call a Grid service: a Web service that provides a set of well-defined interfaces
and that follows specific conventions”[2]
● standard interfaces
○ GridService - lets the user query information about the service instance itself (its state, keys
etc.) and terminate it (or set its termination time)
○ notification source - the user can subscribe to notifications from a specific service
○ notification sink - a service that receives the notifications delivered to a subscriber
○ registry - the user can register or unregister a service handle
○ factory - the user can create a new service instance
○ handleMap - maps service handles to service references. The user can get a reference to a
service based on a handle obtained from the registry
Grid services - different approaches[2]
Grid services can be created and used with the help of the Factory, Registry, GridService and HandleMap
services.
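The interplay of these four interfaces can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class and method names are Python-style stand-ins chosen for illustration, the `gsh://example/...` handle format is hypothetical, and nothing here is the Globus toolkit's actual API. A factory creates an instance and returns a handle, the handle is registered, and a client later resolves it to a live reference through the handle map.

```python
import itertools
import time


class GridService:
    """Toy service instance: queryable state plus a termination time."""

    def __init__(self, name, lifetime_s):
        self.name = name
        self.termination_time = time.time() + lifetime_s

    def find_service_data(self):
        # Expose information about this instance (illustrative fields only).
        return {"name": self.name, "termination_time": self.termination_time}

    def set_termination_time(self, when):
        self.termination_time = when

    def is_alive(self, now=None):
        now = time.time() if now is None else now
        return now < self.termination_time


class HandleMap:
    """Maps a stable handle to a reference to the live instance."""

    def __init__(self):
        self._refs = {}

    def bind(self, handle, service):
        self._refs[handle] = service

    def find_by_handle(self, handle):
        return self._refs[handle]


class Registry:
    """Keeps the set of registered service handles."""

    def __init__(self):
        self._handles = set()

    def register(self, handle):
        self._handles.add(handle)

    def unregister(self, handle):
        self._handles.discard(handle)

    def handles(self):
        return sorted(self._handles)


class Factory:
    """Creates service instances and returns a fresh handle for each."""

    def __init__(self, handle_map):
        self._handle_map = handle_map
        self._ids = itertools.count(1)

    def create_service(self, name, lifetime_s=60.0):
        # Hypothetical handle format, invented for this sketch.
        handle = f"gsh://example/{name}/{next(self._ids)}"
        self._handle_map.bind(handle, GridService(name, lifetime_s))
        return handle


handle_map = HandleMap()
registry = Registry()
factory = Factory(handle_map)

handle = factory.create_service("simulation")
registry.register(handle)

# A client discovers the handle in the registry, then resolves it.
service = handle_map.find_by_handle(registry.handles()[0])
print(service.find_service_data()["name"])
```

Keeping handles separate from references means a client can hold on to a handle while the instance behind it is moved or recreated; only the handle map entry has to change.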
[3]K. Chard, S. Tuecke, I. Foster: Globus: Recent Enhancements and Future Plans
Globus Nexus[4]
● finally a REST API instead of SOAP WS
● the service is stateless
● the entire state is kept in Amazon RDS
● all communication goes through several fail-over mechanisms to
make data access fail-proof
● the single sign-on mechanism poses a lot of challenges
(scientists have many data providers which have different
security policies and authentication mechanisms)
● another problem is the engine responsible for ‘roles’ and
profiles in the system. Users can share data on many
different levels. It’s handled similarly to the UNIX group UUIDs
mechanism
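Group-based sharing of this kind can be sketched as follows. The sketch assumes that each group carries a UUID, that members hold a role inside the group, and that a resource lists the permissions granted to each group. All class, field and function names are hypothetical and do not come from the Globus Nexus API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    # Assumption: each group is identified by a random UUID.
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    # Maps username -> role within this group ("admin" or "member").
    members: dict = field(default_factory=dict)


@dataclass
class SharedResource:
    path: str
    # Maps group id -> set of permissions granted to that group.
    acl: dict = field(default_factory=dict)


def add_member(group, actor, new_user, role="member"):
    """Only a group admin may change membership."""
    if group.members.get(actor) != "admin":
        raise PermissionError(f"{actor} is not an admin of {group.name}")
    group.members[new_user] = role


def can_access(user, resource, groups, permission):
    """Return True if any group containing `user` grants `permission`."""
    for group in groups:
        granted = resource.acl.get(group.id, set())
        if user in group.members and permission in granted:
            return True
    return False


lab = Group("climate-lab", members={"alice": "admin", "bob": "member"})
reviewers = Group("reviewers", members={"carol": "member"})
groups = [lab, reviewers]

dataset = SharedResource(
    "/data/run42",
    acl={lab.id: {"read", "write"}, reviewers.id: {"read"}},
)

print(can_access("bob", dataset, groups, "write"))    # True
print(can_access("carol", dataset, groups, "write"))  # False
print(can_access("dave", dataset, groups, "read"))    # False
```

As with UNIX groups, permissions attach to the group rather than to individual users, so changing who can reach a dataset means editing group membership instead of touching every resource.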
[4]K. Chard, M. Lidman, B. McCollam, J. Bryan, R. Ananthakrishnan, S. Tuecke, I. Foster: Globus Nexus: A Platform-as-a-Service provider of research identity, profile, and group management
Globus Nexus - security identities[4]
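Single sign-on across providers with different authentication mechanisms can be pictured as linking several external identities to one primary account. The following is a minimal sketch of that idea only. The `IdentityStore` class, the provider names and the account names are all hypothetical, and it does not reproduce how Globus Nexus stores identities.

```python
class IdentityStore:
    """Links identities from several providers to one primary account."""

    def __init__(self):
        # (provider, external_id) -> primary account name
        self._primary_of = {}

    def link(self, primary, provider, external_id):
        key = (provider, external_id)
        owner = self._primary_of.get(key)
        if owner is not None and owner != primary:
            raise ValueError(f"{provider}:{external_id} already linked to {owner}")
        self._primary_of[key] = primary

    def resolve(self, provider, external_id):
        """Return the primary account for an external identity, or None."""
        return self._primary_of.get((provider, external_id))

    def identities(self, primary):
        """List every external identity linked to `primary`."""
        return sorted(k for k, v in self._primary_of.items() if v == primary)


store = IdentityStore()
# Hypothetical providers and identifiers, for illustration only.
store.link("alice", "university-idp", "alice@uni.example")
store.link("alice", "lab-ssh", "alice1")

print(store.resolve("lab-ssh", "alice1"))
print(store.identities("alice"))
```

Whichever provider a user signs in through, the lookup yields the same primary account, so group membership and permissions only need to be recorded once.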
Future plans (2016)[3]
● Advanced data search
○ as the number of active users increases, the amount of data to be queried grows hugely. That
poses a problem with the time needed to go through this data
○ Globus wants to provide the possibility of searching not only through file metadata but through the
internal structure of files as well (users are used to that)
○ scientific files are often complex, so a powerful indexing mechanism is needed
● Policy-based data collection model
○ currently access policies are based on groups and shared endpoints
○ a user cannot give permission to only one specific file or just a part of it
○ Globus wants to integrate access policies with collections in order to give users full control over
the permissions needed to access files in their collections
● Active data management
○ many users write their own scripts using the Globus API to automate their work (scripts for data
archiving, transfer, analysis etc.)
○ Globus wants to build a data management environment which will enable users to specify their
own rules
○ rules could be expressed via periodic tasks
○ rules could be expressed via tasks invoked by events produced by Globus services
○ rules are to be easy to set up (many default rules will be available for use or further development)
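The two kinds of rules described for active data management, periodic and event-driven, can be sketched with a tiny rule engine. This is an illustration under assumptions, not a planned Globus interface: the `Rule` and `RuleEngine` classes and the `transfer.succeeded` event name are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    action: Callable[[dict], str]
    every_s: Optional[float] = None   # periodic trigger, in seconds
    on_event: Optional[str] = None    # event-type trigger
    last_run: float = float("-inf")   # never run yet


class RuleEngine:
    def __init__(self):
        self.rules = []
        self.log = []

    def add(self, rule):
        self.rules.append(rule)

    def tick(self, now):
        """Run every periodic rule whose interval has elapsed."""
        for rule in self.rules:
            if rule.every_s is not None and now - rule.last_run >= rule.every_s:
                rule.last_run = now
                self.log.append(rule.action({"time": now}))

    def emit(self, event_type, payload):
        """Run every rule subscribed to this event type."""
        for rule in self.rules:
            if rule.on_event == event_type:
                self.log.append(rule.action(payload))


engine = RuleEngine()
engine.add(Rule("nightly-archive",
                lambda ctx: f"archive at t={ctx['time']}",
                every_s=86400))
# Hypothetical event name; real services would define their own.
engine.add(Rule("analyze-on-arrival",
                lambda ev: f"analyze {ev['path']}",
                on_event="transfer.succeeded"))

engine.tick(now=0)        # first tick runs the periodic rule
engine.tick(now=3600)     # one hour later: interval not yet elapsed
engine.emit("transfer.succeeded", {"path": "/data/run42.h5"})
engine.tick(now=86400)    # one day later: periodic rule runs again

print(engine.log)
# ['archive at t=0', 'analyze /data/run42.h5', 'archive at t=86400']
```

Default rules could then be shipped as ready-made `Rule` objects that users copy and adjust, instead of each user writing a separate automation script.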
Summary
Globus has come a long way, full of changes and crucial decisions. At first the system was built purely to
make use of a virtually unlimited number of PCs to offload scientific calculations. Later on, probably
around the time of the I-WAY experiment, it evolved into a grid platform allowing users to access
supercomputers and share their results with each other.
Grid technology allowed the Globus toolkit to become a huge and complex platform that makes it possible
for millions of scientists to share their data and cooperate on many experiments. SOAP WS served as a
gateway between different computing environments and were crucial to how the system worked.
However, when REST became popular, the authors of the Globus toolkit decided to rewrite everything,
which shows that they are open to change. That is a good thing for such a complex system.
A huge advantage of the system is its architecture based on micro-services. Thanks to that, whenever
a given technology becomes obsolete, the authors can rewrite the services that use it while the system
stays active all the time. There is no need to change the entire code base, just the code behind the
affected service.
Another feature crucial for scientists is single sign-on. With a constantly increasing number of data
providers, running experiments on shared data would be a nightmare if users had to ask for
permissions all the time. Globus’s solution merges all identities into one, allowing scientists to
access all their data with just one authentication session.
The Globus toolkit has many more advanced features, such as fast and secure data transfer and easy role
management. However, there is still a long way ahead of it: for example, its authors have to come up
with a better mechanism for running complex queries on such large amounts of data.
References