Evolution of the Globus system - Ian Foster

Andrzej Ryl

Department of Computer Science


AGH University of Science and Technology
Kraków, Poland

Outline
• What is Globus? Why was it created?
○ VO
○ current solutions’ drawbacks
• Globus architecture
• Grid services
• Network protocols
• Globus Nexus
○ features
○ architecture
○ use cases
○ evaluation
• Future plans
• Summary
• References

What is Globus? Why was it created?
Globus toolkit features[1][2][5][6]:
● sharing TBs of data
● authentication
● registry of resources
● access to a broad range of services
● durability

Virtual Organization (VO) - a set of individuals defined by specific sharing rules (access to computers, software and other resources that support collaborative research), governed by access rules[1]

Problems with current solutions[1]:
● resource sharing only inside one organization
● no single sign-on
● if you want to outsource your services to somebody, they most likely have to be connected to your network via VPN
● no easy way to make use of standard PCs (different environments)

Project I-WAY[6]:
An experiment in 1995 connected supercomputers and other sites providing scientific resources into one metacomputer on which scientists could run experiments. It was the first incentive for building Globus (back then supercomputers were out of reach for most scientists).

[1] I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations (2001)
[2] I. Foster, C. Kesselman, S. Tuecke, J. Nick: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration (06/22/2002)
[5] C. Severance: Ian Foster and the Globus Project (2014)
[6] C. Kesselman, I. Foster: Globus: A Metacomputing Infrastructure Toolkit (1997)
Grid architecture/Globus features[1]
Collective - combines the hooks of multiple Resource-layer services and provides APIs and SDKs for working with collections of data (across VOs as well)

Applications - applications that access and modify the provided data using the APIs and SDKs of the Collective and Resource layers, or the Fabric layer directly. Globus currently offers a set of web applications

Fabric - the resources that users want to access: computational resources, storage resources, network resources, code repositories and catalogs. The Globus toolkit mainly relies on the fabric that vendors provide, but it can supply this functionality itself where a vendor does not

Connectivity - everything related to communication and authentication. Communication is based on TCP/IP. Authentication provides single sign-on, delegation[2] (programs run by a user can access only the resources that user is authorized to use), integration (with existing cryptographic solutions) and trust relationships (a user can hold a ‘tree’ of security credentials without having to ask for permission every time). The Globus toolkit uses TLS, Kerberos and X.509 certificates for security

Resource - builds on the Connectivity layer and abstracts it behind APIs and SDKs. It also abstracts the Fabric layer in order to simplify and unify access to data. The Globus toolkit provides GRIP (resource information protocol and information model), GRAM (HTTP-based protocol for computational resources), GridFTP (data access) and LDAP (catalog access). APIs are written in C and Java
Grid services[2]
“OGSA defines what we call a Grid service: a Web service that provides a set of well-defined interfaces
and that follows specific conventions”[2]

● upgradeability and transport
○ a service has to be easily upgradeable. Globus therefore versions services and exposes each service’s compatibility with the others. Every upgrade has to complete without crashing user applications
○ services communicate with each other via messages. Each message is received by a given service exactly once or not at all (the transport protocols make sure that messages do not get duplicated)

● standard interfaces
○ grid service - lets the user gather information about the service itself (its state, keys etc.) and terminate it (or set a termination time)
○ notification source - the user can subscribe to notifications from any specific service
○ notification sink - a special service which delivers notifications to subscribers
○ registry - the user can register or unregister a service handle
○ factory - the user can create their own service
○ handleMap - serves as a store of services. The user can obtain a reference to a service from a handle gathered from the registry
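
The "exactly once or not at all" delivery rule above can be illustrated with a small sketch. It assumes that every message carries a unique ID and that the receiver remembers the IDs it has already handled; the class and method names are hypothetical and are not part of the Globus toolkit.

```python
class AtMostOnceReceiver:
    """Passes each message ID to the handler at most once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # IDs of messages that were already delivered

    def receive(self, message_id, payload):
        # A retransmitted message carries the same ID and is dropped here.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self._handler(payload)
        return True


delivered = []
receiver = AtMostOnceReceiver(delivered.append)
receiver.receive("m1", "create job")
receiver.receive("m1", "create job")   # duplicate caused by a sender retry
receiver.receive("m2", "destroy job")
```

After these three calls `delivered` holds two payloads, because the retried "m1" is ignored. A message that never arrives is simply never delivered, which is the "not at all" half of the rule.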

Grid services - different approaches[2]
Grid services can be created and used with the help of the Factory, Registry, GridService and HandleMap services.

Simple hosting env
● one set of resources
● one registry
● one HandleMap service
● multiple factories
The user can issue a request to create a Grid service, which causes a factory to create an env-specific instance, assign it a handle, register it in the registry and make it available via the HandleMap.

Virtual hosting env
● geographically distributed resources
● multiple higher-level factories sending requests to lower-level ones
● a higher-level registry that knows about the lower-level ones
The logic is the same but based on higher-level elements.

Collective services
● factories can create higher-level services composed of services produced by lower-level factories
● the higher-level registry knows about lower-level and higher-level services
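
The creation flow of the simple hosting environment (create an instance, assign a handle, register it, expose it through the HandleMap) can be sketched as below. This is only an in-memory illustration: the class names, the `create_service` method and the `gsh://example-host/...` handle format are assumptions, not the OGSA interfaces themselves.

```python
import itertools


class Registry:
    """Keeps track of the handles of all registered services."""

    def __init__(self):
        self.handles = set()

    def register(self, handle):
        self.handles.add(handle)

    def unregister(self, handle):
        self.handles.discard(handle)


class HandleMap:
    """Resolves a service handle to a usable service reference."""

    def __init__(self):
        self._references = {}

    def bind(self, handle, reference):
        self._references[handle] = reference

    def resolve(self, handle):
        return self._references[handle]


class Factory:
    """Creates service instances and publishes them."""

    def __init__(self, registry, handle_map, prefix):
        self._registry = registry
        self._handle_map = handle_map
        self._prefix = prefix
        self._counter = itertools.count(1)

    def create_service(self, service_class, *args):
        instance = service_class(*args)                      # env-specific instance
        handle = f"{self._prefix}/{next(self._counter)}"     # assign a handle
        self._registry.register(handle)                      # register it
        self._handle_map.bind(handle, instance)              # make it resolvable
        return handle


class EchoService:
    """Hypothetical service used only for this example."""

    def __init__(self, greeting):
        self.greeting = greeting

    def invoke(self, name):
        return f"{self.greeting}, {name}"


registry = Registry()
handle_map = HandleMap()
factory = Factory(registry, handle_map, prefix="gsh://example-host/echo")

handle = factory.create_service(EchoService, "Hello")
service = handle_map.resolve(handle)
result = service.invoke("grid")
```

In a virtual hosting environment the same pattern would repeat one level up, with a higher-level factory delegating `create_service` calls to lower-level factories.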
Network protocols[2]
Requirements
Reliable transport
Data travelling through the grid is often huge (on the order of TBs), so problems such as dropped connections have to be addressed and solved
Authentication and delegation
Users can access only specific data. However, if they already have access to data outside the VO they are currently working in, no additional permissions should have to be requested
Ubiquity
The goal is to make any pair of services able to interact with each other. This is possible thanks to the unified messages on which service communication is built
GSR Format
A special service description format is required for unification. When [2] was written, a WSDL document (as in SOAP web services) was the standard
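
The reliable transport requirement can be illustrated with a chunked transfer that resumes from the last confirmed offset after a dropped connection. This is a minimal sketch of the idea only, not GridFTP; `transfer`, `FlakyLink` and their parameters are hypothetical names.

```python
def transfer(source, send_chunk, chunk_size=4, max_retries=5):
    """Send source in chunks, resuming from the last confirmed offset."""
    offset = 0
    retries = 0
    while offset < len(source):
        chunk = source[offset:offset + chunk_size]
        try:
            send_chunk(offset, chunk)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            continue  # retry the same offset; earlier chunks are not resent
        offset += len(chunk)
    return retries


class FlakyLink:
    """Hypothetical link that drops the connection on chosen call numbers."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0
        self.received = bytearray()

    def send(self, offset, chunk):
        self.calls += 1
        if self.calls in self.fail_on:
            raise ConnectionError("connection dropped")
        # Writing at an explicit offset makes a repeated chunk harmless.
        self.received[offset:offset + len(chunk)] = chunk


link = FlakyLink(fail_on={2, 3})
data = b"terabytes of data"
retries = transfer(data, link.send)
```

Only the chunk that failed is sent again, which is what matters when a transfer of several TBs is interrupted near its end.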
Protocols
GRAM - protocol allowing the user to securely create and use/manage services[1]
GridFTP - extended version of FTP (includes security from the Connectivity layer)[1]
MDS - Meta Directory Service allowing the user to search for a specific service[1]
HTTP - web protocol[1]
FTP - file transfer protocol[1]
LDAP - directory access protocol (in Globus used for discovery of services)[1]
TLS/GSI - security protocols[1]
IP - basic communication protocol of the Internet[1]
Globus evolution[3]

[3] K. Chard, S. Tuecke, I. Foster: Globus: Recent Enhancements and Future Plans
Globus nexus[4]
● REST API instead of SOAP web services
● the service is stateless
● all state is kept in Amazon RDS
● all communication is backed by several fail-over mechanisms to keep data access available when components fail
● single sign-on poses many challenges (scientists use many data providers with different security policies and authentication mechanisms)
● another challenge is the engine responsible for ‘roles’ and profiles in the system. Users can share data on many different levels. This is handled similarly to the UNIX group UUIDs mechanism

[4]K. Chard, M. Lidman, B. McCollam, J. Bryan, R. Ananthakrishnan, S. Tuecke, I. Foster: Globus Nexus: A Platform-as-a-Service provider of research identity, profile, and group management
Globus nexus - security identities[4]

- security guidelines are fulfilled (password strength, uniqueness of login)
- one identity (the Globus one) gives access to resources from different providers
- LDAP & OAuth 2 allow third-party solutions to confirm a user’s identity via Globus
- easy group management allowing users to bind multiple identities
- branded site deployment (admin panel tailored to the given project)
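
Binding several provider identities to one Globus identity can be pictured as a lookup table keyed by provider and external login. The sketch below is an assumption made for illustration; `IdentityStore`, the provider names and the user names are hypothetical and do not reflect the Globus Nexus API.

```python
class IdentityStore:
    """Maps (provider, external ID) pairs to a single Globus identity."""

    def __init__(self):
        self._linked = {}  # (provider, external_id) -> Globus username

    def link(self, globus_user, provider, external_id):
        key = (provider, external_id)
        # setdefault keeps an existing owner, so a conflict is detectable.
        owner = self._linked.setdefault(key, globus_user)
        if owner != globus_user:
            raise ValueError(f"{provider}:{external_id} is already linked to {owner}")

    def resolve(self, provider, external_id):
        # Returns None when the external identity is not linked to anyone.
        return self._linked.get((provider, external_id))


store = IdentityStore()
store.link("alice", "campus-ldap", "alice.k")
store.link("alice", "lab-oauth", "ak42")

resolved = [
    store.resolve("campus-ldap", "alice.k"),
    store.resolve("lab-oauth", "ak42"),
]
unknown = store.resolve("lab-oauth", "nobody")
```

Whichever provider the user logs in through, the lookup ends at the same Globus identity, so permissions need to be granted only once.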
Globus nexus - performance[4]

Future plans (2016)[3]
● Advanced data search
○ as the number of active users increases, the amount of data to be queried grows hugely, and so does the time needed to go through it
○ Globus wants to support searching not only file metadata but the internal structure of files as well (users are used to that)
○ scientific files are often complex, so a powerful indexing mechanism is needed
● Policy-based data collection model
○ currently access policies are based on groups and shared endpoints
○ a user cannot give permission to just one specific file or to just a part of it
○ Globus wants to integrate access policies with collections in order to give users full control over the permissions needed to access files in their collections
● Active data management
○ many users write their own scripts using the Globus API to automate their work (data archiving, transfer, analysis etc.)
○ Globus wants to build a data management environment which will enable users to specify their own rules
○ rules could be expressed via periodic tasks
○ rules could be expressed via tasks invoked by events produced by Globus services
○ rules are to be easy to set up (many default rules will be available for use or further development)
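
The two kinds of rules listed under active data management, periodic tasks and tasks invoked by events, can be sketched with a tiny rule engine. Everything here is hypothetical (the `RuleEngine` class, the `transfer_succeeded` event name, the file name); it only shows the shape such rules could take.

```python
from dataclasses import dataclass, field


@dataclass
class RuleEngine:
    # Each periodic rule is stored as [interval_s, next_run_s, action].
    periodic: list = field(default_factory=list)
    # Maps an event name to the list of actions it triggers.
    on_event: dict = field(default_factory=dict)

    def every(self, interval_s, action):
        self.periodic.append([interval_s, interval_s, action])

    def when(self, event_name, action):
        self.on_event.setdefault(event_name, []).append(action)

    def tick(self, now_s):
        """Run every periodic rule that became due up to time now_s."""
        for rule in self.periodic:
            interval_s, next_run_s, action = rule
            while next_run_s <= now_s:
                action()
                next_run_s += interval_s
            rule[1] = next_run_s

    def emit(self, event_name, payload):
        """Run all rules subscribed to the given event."""
        for action in self.on_event.get(event_name, []):
            action(payload)


log = []
engine = RuleEngine()
engine.every(3600, lambda: log.append("archive old files"))
engine.when("transfer_succeeded", lambda name: log.append(f"analyze {name}"))

engine.tick(now_s=7200)                         # two hourly runs are due
engine.emit("transfer_succeeded", "run42.h5")   # event-driven rule fires
engine.emit("transfer_failed", "run43.h5")      # no rule subscribed
```

Default rules could then be shipped as ready-made actions that users register with `every` or `when` and adapt as needed.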

Summary
Globus has come a long way, full of changes and crucial decisions. Initially the system was built purely to harness large numbers of PCs for offloading scientific calculations. Later on, probably around the time of the I-WAY experiment, it turned into a grid platform allowing users to access supercomputers and share their results with each other.

Grid technology allowed the Globus toolkit to become a huge and complex platform that lets a large community of scientists share their data and cooperate on many experiments. SOAP web services served as a gateway between different computing environments and were crucial to how the system worked. However, when REST became popular, the authors of the Globus toolkit decided to rewrite everything, which shows that they are open to change. That is a good thing for such a complex system.

A huge advantage of the system is its architecture based on micro-services. Thanks to it, whenever a given technology becomes obsolete, the authors can rewrite the services that use it while the system stays active the whole time. There is no need to change the entire code base, only the code behind the affected service.

Another feature crucial for scientists is single sign-on. With a constantly increasing number of data providers, running experiments on shared data would be a nightmare if users had to ask for permissions all the time. Globus merges all identities into one, giving scientists access to all their data with just one authentication session.

The Globus toolkit has many more advanced features, such as fast and secure data transfer and easy role management. However, there is still a long way ahead of it: for example, it needs a better mechanism for running complex queries on such large amounts of data.

References

1. I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations (2001)
2. I. Foster, C. Kesselman, S. Tuecke, J. Nick: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration (06/22/2002)
3. K. Chard, S. Tuecke, I. Foster: Globus: Recent Enhancements and Future Plans
4. K. Chard, M. Lidman, B. McCollam, J. Bryan, R. Ananthakrishnan, S. Tuecke, I. Foster: Globus Nexus: A Platform-as-a-Service provider of research identity, profile, and group management (01/04/2015)
5. C. Severance: Ian Foster and the Globus Project (2014)
6. C. Kesselman, I. Foster: Globus: A Metacomputing Infrastructure Toolkit (1997)
7. C. Kesselman, I. Foster: The History of the Grid (2011)
