

The Vision Of Autonomic Computing

Article  in  Computer · February 2003


DOI: 10.1109/MC.2003.1160055 · Source: IEEE Xplore


2 authors, including:

Jeffrey O. Kephart
IBM


COVER FEATURE

The Vision of Autonomic Computing

Systems manage themselves according to an administrator’s goals. New components integrate as effortlessly as a new cell establishes itself in the human body. These ideas are not science fiction, but elements of the grand challenge to create self-managing computing systems.

Jeffrey O. Kephart
David M. Chess
IBM Thomas J. Watson Research Center

0018-9162/03/$17.00 © 2003 IEEE. Published by the IEEE Computer Society. January 2003.

In mid-October 2001, IBM released a manifesto observing that the main obstacle to further progress in the IT industry is a looming software complexity crisis [1]. The company cited applications and environments that weigh in at tens of millions of lines of code and require skilled IT professionals to install, configure, tune, and maintain.

The manifesto pointed out that the difficulty of managing today’s computing systems goes well beyond the administration of individual software environments. The need to integrate several heterogeneous environments into corporate-wide computing systems, and to extend that beyond company boundaries into the Internet, introduces new levels of complexity. Computing systems’ complexity appears to be approaching the limits of human capability, yet the march toward increased interconnectivity and integration rushes ahead unabated.

This march could turn the dream of pervasive computing (trillions of computing devices connected to the Internet) into a nightmare. Programming language innovations have extended the size and complexity of systems that architects can design, but relying solely on further innovations in programming methods will not get us through the present complexity crisis.

As systems become more interconnected and diverse, architects are less able to anticipate and design interactions among components, leaving such issues to be dealt with at runtime. Soon systems will become too massive and complex for even the most skilled system integrators to install, configure, optimize, maintain, and merge. And there will be no way to make timely, decisive responses to the rapid stream of changing and conflicting demands.

AUTONOMIC OPTION
The only option remaining is autonomic computing: computing systems that can manage themselves given high-level objectives from administrators. When IBM’s senior vice president of research, Paul Horn, introduced this idea to the National Academy of Engineers at Harvard University in a March 2001 keynote address, he deliberately chose a term with a biological connotation. The autonomic nervous system governs our heart rate and body temperature, thus freeing our conscious brain from the burden of dealing with these and many other low-level, yet vital, functions.

The term autonomic computing is emblematic of a vast and somewhat tangled hierarchy of natural self-governing systems, many of which consist of myriad interacting, self-governing components that in turn comprise large numbers of interacting, autonomous, self-governing components at the next level down. The enormous range in scale, starting with molecular machines within cells and extending to human markets, societies, and the entire world socioeconomy, mirrors that of computing systems, which run from individual devices to the entire Internet. Thus, we believe it will be profitable to seek inspiration in the self-governance of social and economic systems as well as purely biological ones.

Clearly then, autonomic computing is a grand
challenge that reaches far beyond a single organization. Its realization will take a concerted, long-term, worldwide effort by researchers in a diversity of fields. A necessary first step is to examine this vision: what autonomic computing systems might look like, how they might function, and what obstacles researchers will face in designing them and understanding their behavior.

SELF-MANAGEMENT
The essence of autonomic computing systems is self-management, the intent of which is to free system administrators from the details of system operation and maintenance and to provide users with a machine that runs at peak performance 24/7.

Like their biological namesakes, autonomic systems will maintain and adjust their operation in the face of changing components, workloads, demands, and external conditions and in the face of hardware or software failures, both innocent and malicious. The autonomic system might continually monitor its own use, and check for component upgrades, for example. If it deems the advertised features of the upgrades worthwhile, the system will install them, reconfigure itself as necessary, and run a regression test to make sure all is well. When it detects errors, the system will revert to the older version while its automatic problem-determination algorithms try to isolate the source of the error. Figure 1 illustrates how this process might work for an autonomic accounting system upgrade.

Figure 1. Problem diagnosis in an autonomic system upgrade. The upgrade introduces five software modules (blue), each an autonomic element. Minutes after installation, regression testers find faulty output in three of the new modules (red outlines), and the system immediately reverts to its old version. A problem determiner, an autonomic element, obtains information about interelement dependencies (lines between elements) from a dependency analyzer, another autonomic element that probes the system periodically (not shown). Taking into account its knowledge of interelement dependencies, the problem determiner analyzes log files and infers which of the three potentially bad modules is the culprit (red X). It generates a problem ticket containing diagnostic information and sends it to a software developer, who debugs the module and makes it available for future upgrades.

IBM frequently cites four aspects of self-management, which Table 1 summarizes. Early autonomic systems may treat these aspects as distinct, with different product teams creating solutions that address each one separately. Ultimately, these aspects will be emergent properties of a general architecture, and distinctions will blur into a more general notion of self-maintenance.

The journey toward fully autonomic computing will take many years, but there are several important and valuable milestones along the path. At first, automated functions will merely collect and aggregate information to support decisions by human administrators. Later, they will serve as advisors, suggesting possible courses of action for humans to consider. As automation technologies improve, and our faith in them grows, we will entrust autonomic systems with making, and acting on, lower-level decisions. Over time, humans will need to make relatively less frequent predominantly higher-level decisions, which the system will carry out automatically via more numerous, lower-level decisions and actions.

Ultimately, system administrators and end users will take the benefits of autonomic computing for granted. Self-managing systems and devices will seem completely natural and unremarkable, as will automated software and middleware upgrades. The detailed migration patterns of applications or data will be as uninteresting to us as the details of routing a phone call through the telephone network.

Self-configuration
Installing, configuring, and integrating large, complex systems is challenging, time-consuming, and error-prone even for experts. Most large Web sites and corporate data centers are haphazard accretions of servers, routers, databases, and other technologies on different platforms from different vendors. It can take teams of expert programmers months to merge two systems or to install a major e-commerce application such as SAP.

Autonomic systems will configure themselves automatically in accordance with high-level policies, representing business-level objectives, for
example, that specify what is desired, not how it is to be accomplished. When a component is introduced, it will incorporate itself seamlessly, and the rest of the system will adapt to its presence, much like a new cell in the body or a new person in a population. For example, when a new component is introduced into an autonomic accounting system, as in Figure 1, it will automatically learn about and take into account the composition and configuration of the system. It will register itself and its capabilities so that other components can either use it or modify their own behavior appropriately.

Table 1. Four aspects of self-management as they are now and would be with autonomic computing.

Concept | Current computing | Autonomic computing
Self-configuration | Corporate data centers have multiple vendors and platforms. Installing, configuring, and integrating systems is time consuming and error prone. | Automated configuration of components and systems follows high-level policies. Rest of system adjusts automatically and seamlessly.
Self-optimization | Systems have hundreds of manually set, nonlinear tuning parameters, and their number increases with each release. | Components and systems continually seek opportunities to improve their own performance and efficiency.
Self-healing | Problem determination in large, complex systems can take a team of programmers weeks. | System automatically detects, diagnoses, and repairs localized software and hardware problems.
Self-protection | Detection of and recovery from attacks and cascading failures is manual. | System automatically defends against malicious attacks or cascading failures. It uses early warning to anticipate and prevent systemwide failures.

Self-optimization
Complex middleware, such as WebSphere, or database systems, such as Oracle or DB2, may have hundreds of tunable parameters that must be set correctly for the system to perform optimally, yet few people know how to tune them. Such systems are often integrated with other, equally complex systems. Consequently, performance-tuning one large subsystem can have unanticipated effects on the entire system.

Autonomic systems will continually seek ways to improve their operation, identifying and seizing opportunities to make themselves more efficient in performance or cost. Just as muscles become stronger through exercise, and the brain modifies its circuitry during learning, autonomic systems will monitor, experiment with, and tune their own parameters and will learn to make appropriate choices about keeping functions or outsourcing them. They will proactively seek to upgrade their function by finding, verifying, and applying the latest updates.

Self-healing
IBM and other IT vendors have large departments devoted to identifying, tracing, and determining the root cause of failures in complex computing systems. Serious customer problems can take teams of programmers several weeks to diagnose and fix, and sometimes the problem disappears mysteriously without any satisfactory diagnosis.

Autonomic computing systems will detect, diagnose, and repair localized problems resulting from bugs or failures in software and hardware, perhaps through a regression tester, as in Figure 1. Using knowledge about the system configuration, a problem-diagnosis component (based on a Bayesian network, for example) would analyze information from log files, possibly supplemented with data from additional monitors that it has requested. The system would then match the diagnosis against known software patches (or alert a human programmer if there are none), install the appropriate patch, and retest.

Self-protection
Despite the existence of firewalls and intrusion-detection tools, humans must at present decide how to protect systems from malicious attacks and inadvertent cascading failures.

Autonomic systems will be self-protecting in two senses. They will defend the system as a whole against large-scale, correlated problems arising from malicious attacks or cascading failures that remain uncorrected by self-healing measures. They also will anticipate problems based on early reports from sensors and take steps to avoid or mitigate them.

ARCHITECTURAL CONSIDERATIONS
Autonomic systems will be interactive collections of autonomic elements: individual system constituents that contain resources and deliver services to humans and other autonomic elements. Autonomic elements will manage their internal behavior and their relationships with other autonomic elements in accordance with policies that humans or other elements have established. System self-management will arise at least as much from the myriad
interactions among autonomic elements as it will from the internal self-management of the individual autonomic elements, just as the social intelligence of an ant colony arises largely from the interactions among individual ants. A distributed, service-oriented infrastructure will support autonomic elements and their interactions.

As Figure 2 shows, an autonomic element will typically consist of one or more managed elements coupled with a single autonomic manager that controls and represents them. The managed element will essentially be equivalent to what is found in ordinary nonautonomic systems, although it can be adapted to enable the autonomic manager to monitor and control it. The managed element could be a hardware resource, such as storage, a CPU, or a printer, or a software resource, such as a database, a directory service, or a large legacy system.

At the highest level, the managed element could be an e-utility, an application service, or even an individual business. The autonomic manager distinguishes the autonomic element from its nonautonomic counterpart. By monitoring the managed element and its external environment, and constructing and executing plans based on an analysis of this information, the autonomic manager will relieve humans of the responsibility of directly managing the managed element.

Figure 2. Structure of an autonomic element. Elements interact with other elements and with human programmers via their autonomic managers. (The diagram shows an autonomic manager, with Monitor, Analyze, Plan, and Execute components arranged around shared Knowledge, sitting above a managed element.)

Fully autonomic computing is likely to evolve as designers gradually add increasingly sophisticated autonomic managers to existing managed elements. Ultimately, the distinction between the autonomic manager and the managed element may become merely conceptual rather than architectural, or it may melt away, leaving fully integrated, autonomic elements with well-defined behaviors and interfaces, but also with few constraints on their internal structure.

Each autonomic element will be responsible for managing its own internal state and behavior and for managing its interactions with an environment that consists largely of signals and messages from other elements and the external world. An element’s internal behavior and its relationships with other elements will be driven by goals that its designer has embedded in it, by other elements that have authority over it, or by subcontracts to peer elements with its tacit or explicit consent. The element may require assistance from other elements to achieve its goals. If so, it will be responsible for obtaining necessary resources from other elements and for dealing with exception cases, such as the failure of a required resource.

Autonomic elements will function at many levels, from individual computing components such as disk drives to small-scale computing systems such as workstations or servers to entire automated enterprises in the largest autonomic system of all: the global economy.

At the lower levels, an autonomic element’s range of internal behaviors and relationships with other elements, and the set of elements with which it can interact, may be relatively limited and hard-coded. Particularly at the level of individual components, well-established techniques (many of which fall under the rubric of fault tolerance) have led to the development of elements that rarely fail, which is one important aspect of being autonomic. Decades of developing fault-tolerance techniques have produced such engineering feats as the IBM zSeries servers, which have a mean time to failure of several decades.

At the higher levels, fixed behaviors, connections, and relationships will give way to increased dynamism and flexibility. All these aspects of autonomic elements will be expressed in more high-level, goal-oriented terms, leaving the elements themselves with the responsibility for resolving the details on the fly.
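The structure in Figure 2, an autonomic manager that monitors, analyzes, plans, and executes around shared knowledge while controlling a managed element, can be sketched as a small control loop. The Python below is a minimal illustration under stated assumptions, not the authors' design: the `ManagedElement` server pool, its `read_sensors` and `apply` methods, the load thresholds, and the `run_until_stable` helper are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ManagedElement:
    """Hypothetical managed resource: a server pool of adjustable size."""
    servers: int = 2
    load: float = 150.0  # total requests per second arriving at the pool

    def read_sensors(self) -> Dict[str, float]:
        # Sensor interface: the metrics the manager is allowed to observe.
        return {"load_per_server": self.load / self.servers}

    def apply(self, action: str) -> None:
        # Effector interface: the only way the manager changes the element.
        if action == "add_server":
            self.servers += 1
        elif action == "remove_server" and self.servers > 1:
            self.servers -= 1


@dataclass
class AutonomicManager:
    """Runs monitor-analyze-plan-execute passes over one managed element."""
    element: ManagedElement
    # Knowledge: a high-level policy (assumed thresholds) plus past readings.
    policy: Dict[str, float] = field(
        default_factory=lambda: {"max_load": 50.0, "min_load": 10.0}
    )
    history: List[Dict[str, float]] = field(default_factory=list)

    def monitor(self) -> Dict[str, float]:
        reading = self.element.read_sensors()
        self.history.append(reading)
        return reading

    def analyze(self, reading: Dict[str, float]) -> str:
        if reading["load_per_server"] > self.policy["max_load"]:
            return "overloaded"
        if reading["load_per_server"] < self.policy["min_load"]:
            return "underused"
        return "ok"

    def plan(self, symptom: str) -> List[str]:
        plans = {"overloaded": ["add_server"], "underused": ["remove_server"]}
        return plans.get(symptom, [])

    def execute(self, actions: List[str]) -> None:
        for action in actions:
            self.element.apply(action)

    def step(self) -> str:
        symptom = self.analyze(self.monitor())
        self.execute(self.plan(symptom))
        return symptom


def run_until_stable(manager: AutonomicManager, max_steps: int = 20) -> int:
    """Repeat the loop until a pass reports 'ok'; return passes that acted."""
    for step in range(max_steps):
        if manager.step() == "ok":
            return step
    return max_steps


if __name__ == "__main__":
    element = ManagedElement(servers=2, load=150.0)
    manager = AutonomicManager(element)
    corrective_passes = run_until_stable(manager)
    print(corrective_passes, element.servers)
```

In this sketch the policy thresholds stand in for the high-level objectives an administrator would supply, and the manager touches the element only through its sensor and effector methods, which mirrors the separation between autonomic manager and managed element described above.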
Hard-coded behaviors will give way to behaviors expressed as high-level objectives, such as “maximize this utility function,” or “find a reputable message translation service.” Hardwired connections among elements will give way to increasingly less direct specifications of an element’s partners: from specification by physical address to specification by name and finally to specification by function, with the partner’s identity being resolved only when it is needed.

Hard-wired relationships will evolve into flexible relationships that are established via negotiation. Elements will automatically handle new modes of failure, such as contract violation by a supplier, without human intervention.

While service-oriented architectural concepts like Web and grid services [2, 3] will play a fundamental role, a sufficient foundation for autonomic computing requires more. First, as service providers, autonomic elements will not unquestioningly honor requests for service, as would typical Web services or objects in an object-oriented environment. They will provide a service only if providing it is consistent with their goals. Second, as consumers, autonomic elements will autonomously and proactively issue requests to other elements to carry out their objectives.

Finally, autonomic elements will have complex life cycles, continually carrying on multiple threads of activity, and continually sensing and responding to the environment in which they are situated. Autonomy, proactivity, and goal-directed interactivity with their environment are distinguishing characteristics of software agents. Viewing autonomic elements as agents and autonomic systems as multiagent systems makes it clear that agent-oriented architectural concepts will be critically important [4].

ENGINEERING CHALLENGES
Virtually every aspect of autonomic computing offers significant engineering challenges. The life cycle of an individual autonomic element or of a relationship among autonomic elements reveals several challenges. Others arise in the context of the system as a whole, and still more become apparent at the interface between humans and autonomic systems.

Life cycle of an autonomic element
An autonomic element’s life cycle begins with its design and implementation; continues with test and verification; proceeds to installation, configuration, optimization, upgrading, monitoring, problem determination, and recovery; and culminates in uninstallation or replacement. Each of these stages has special issues and challenges.

Design, test, and verification. Programming an autonomic element will mean extending Web services or grid services with programming tools and techniques that aid in managing relationships with other autonomic elements. Because autonomic elements both consume and provide services, representing needs and preferences will be just as important as representing capabilities. Programmers will need tools that help them acquire and represent policies (high-level specifications of goals and constraints, typically represented as rules or utility functions) and map them onto lower-level actions. They will also need tools to build elements that can establish, monitor, and enforce agreements.

Testing autonomic elements and verifying that they behave correctly will be particularly challenging in large-scale systems because it will be harder to anticipate their environment, especially when it extends across multiple administrative domains or enterprises. Testing networked applications that require coordinated interactions among several autonomic elements will be even more difficult.

It will be virtually impossible to build test systems that capture the size and complexity of realistic systems and workloads. It might be possible to test newly deployed autonomic elements in situ by having them perform alongside more established and trusted elements with similar functionality.

The element’s potential customers may also want to test and verify its behavior, both before establishing a service agreement and while the service is provided. One approach is for the autonomic element to attach a testing method to its service description.

Installation and configuration. Installing and configuring autonomic elements will most likely entail a bootstrapping process that begins when the element registers itself in a directory service by publishing its capabilities and contact information. The element might also use the directory service to discover suppliers or brokers that may provide information or services it needs to complete its initial configuration. It can also use the service to seek out potential customers or brokers to which it can delegate the task of finding customers.

Monitoring and problem determination. Monitoring will be an essential feature of autonomic elements. Elements will continually monitor themselves to ensure that they are meeting their own objectives, and they will log this information to serve as the
basis for adaptation, self-optimization, and reconfiguration. They will also continually monitor their suppliers to ensure that they are receiving the agreed-on level of service and their customers to ensure that they are not exceeding the agreed-on level of demand. Special sentinel elements may monitor other elements and issue alerts to interested parties when they fail.

When coupled with event correlation and other forms of analysis, monitoring will be important in supporting problem determination and recovery when a fault is found or suspected. Applying monitoring, audit, and verification tests at all the needed points without burdening systems with excessive bandwidth or processing demands will be a challenge. Technologies to allow statistical or sample-based testing in a dynamic environment may prove helpful.

The vision of autonomic systems as a complex supply web makes problem determination both easier and harder than it is now. An autonomic element that detects poor performance or failure in a supplier may not attempt a diagnosis; it may simply work around the problem by finding a new supplier.

In other situations, however, it will be necessary to determine why one or more elements are failing, preferably without shutting down and restarting the entire system. This requires theoretically grounded tools for tracing, simulation, and problem determination in complex dynamic environments. Particularly when autonomic elements (or applications based on interactions among multiple elements) have a large amount of state, recovering gracefully and quickly from failure or restarting applications after software has been upgraded or after a function has been relocated to new machines will be challenging. David Patterson and colleagues at the University of California, Berkeley, and Stanford University have made a promising start in this direction [5].

Upgrading. Autonomic elements will need to upgrade themselves from time to time. They might subscribe to a service that alerts them to the availability of relevant upgrades and decide for themselves when to apply the upgrade, possibly with guidance from another element or a human. Alternatively, the system could create afresh entirely new elements as part of a system upgrade, eliminating outmoded elements only after the new ones establish that they are working properly.

Managing the life cycle. Autonomic elements will typically be engaged in many activities simultaneously: participating in one or more negotiations at various phases of completion, proactively seeking inputs from other elements, and so on. They will need to schedule and prioritize their myriad activities, and they will need to represent their life cycle so that they can both reason about it and communicate it to other elements.

Relationships among autonomic elements
In its most dynamic and elaborate form, the service relationship among autonomic elements will also have a life cycle. Each stage of this life cycle engenders its own set of engineering challenges and standardization requirements.

Specification. An autonomic element must have associated with it a set of output services it can perform and a set of input services that it requires, expressed in a standard format so that other autonomic elements can understand it. Typically, the element will register with a directory service such as Universal Description, Discovery, and Integration [6] or an Open Grid Services Architecture (OGSA) registry [3], providing a description of its capabilities and details about addresses and the protocols other elements or people can use to communicate with it.

Establishing standard service ontologies and a standard service description syntax and semantics that are sufficiently expressive for machines to interpret and reason about is an area of active research. The US Defense Advanced Research Projects Agency’s semantic Web effort [7] is representative.

Location. An autonomic element must be able to locate input services that it needs; in turn, other elements that require its output services must be able to locate that element.

To locate other elements dynamically, the element can look them up by name or function in a directory service, possibly using a search process that involves sophisticated reasoning about service ontologies. The element can then contact one or more potential service providers directly and converse with them to determine whether they can provide exactly the service it requires.

In many cases, autonomic elements will also need to judge the likely reliability or trustworthiness of potential partners, an area of active research with many unsolved fundamental problems.

Negotiation. Once an element finds potential providers of an input service, it must negotiate with them to obtain that service.

We construe negotiation broadly as any process by which an agreement is reached. In demand-for-service negotiation, the element providing a service is subservient to the one requesting it, and the
provider must furnish the service unless it does not course, the parties agree to terminate it, free-
have sufficient resources to do so. Another simple ing their internal resources for other uses and System-level
form of negotiation is first-come, first-served, in terminating agreements for input services
which the provider satisfies all requests until it runs that are no longer needed. The parties may engineering issues
into resource limitations. In posted-price negotia- record pertinent information about the ser- include security,
tion, the provider sets a price in real or artificial vice relationship locally, or store it in a data- privacy, and trust,
currency for its service, and the requester must take base a reputation element maintains. and new types of
it or leave it.
services to serve
More complex forms of negotiation include Systemwide issues
bilateral or multilateral negotiations over multiple Other important engineering issues that the needs of other
attributes, such as price, service level, and priority, arise at the system level include security, pri- autonomic
involving multiple rounds of proposals and coun- vacy, and trust, and the emergence of new elements.
terproposals. A third-party arbiter can run an auc- types of services to serve the needs of other
tion or otherwise assist these more complex autonomic elements.
negotiations, especially when they are multilateral. Autonomic computing systems will be sub-
Negotiation will be a rich source of engineering ject to all the security, privacy, and trust issues that
and scientific challenges for autonomic computing. traditional computing systems must now address.
Elements need flexible ways to express multiat- Autonomic elements and systems will need to both
tribute needs and capabilities, and they need mech- establish and abide by security policies, just as
anisms for deriving these expressions from human human administrators do today, and they will need
input or from computation. They also need effec- to do so in an understandable and fail-safe manner.
tive negotiation strategies and protocols that estab- Systems that span multiple administrative
lish the rules of negotiation and govern the flow of domains—especially those that cross company
messages among the negotiators. There must be languages for expressing service agreements (the culmination of successful negotiation) in their transient and final forms.

Efforts to standardize the representation of agreements are under way, but mechanisms for negotiating, enforcing, and reasoning about agreements are lacking, as are methods for translating them into action plans.

Provision. Once two elements reach an agreement, they must provision their internal resources. Provision may be as simple as noting in an access list that a particular element can request service in the future, or it may entail establishing additional relationships with other elements, which become subcontractors in providing some part of the agreed-on service or task.

Operation. Once both sides are properly provisioned, they operate under the negotiated agreement. The service provider’s autonomic manager oversees the operation of its managed element, monitoring it to ensure that the agreement is being honored; the service requester might similarly monitor the level of service.

If the agreement is violated, one or both elements would seek an appropriate remedy. The remedy may be to assess a penalty, renegotiate the agreement, take technical measures to minimize any harm from the failure, or even terminate the agreement.

boundaries, will face many of the challenges that now confront electronic commerce. These include authentication, authorization, encryption, signing, secure auditing and monitoring, nonrepudiation, data aggregation and identity masking, and compliance with complex legal requirements that vary from state to state or country to country.

The autonomic systems infrastructure must let autonomic elements identify themselves, verify the identities of other entities with which they communicate, verify that a message has not been altered in transit, and ensure that unauthorized parties do not read messages and other data. To satisfy privacy policies and laws, elements must also appropriately protect private and personal information that comes into their possession. Measures that keep data segregated according to its origin or its purpose must be extended into the realm of autonomic elements to satisfy policy and legal requirements.

Autonomic systems must be robust against new and insidious forms of attack that use self-management based on high-level policies to their own advantage. By altering or otherwise manipulating high-level policies, an attacker could gain much greater leverage than is possible in nonautonomic systems. Preventing such problems may require a new subfield of computer security that seeks to thwart fraud and the fraudulent persuasion of autonomic elements.
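As a minimal sketch of two of the infrastructure requirements named above (an element identifying itself, and a receiver checking that a message has not been altered in transit), the following Python example attaches an HMAC tag to each message. The shared key, the envelope layout, and the names sign_message and verify_message are assumptions made for illustration; the article does not prescribe any particular mechanism. An HMAC only shows that the message came from some holder of the shared key and was not modified. It does not provide confidentiality or nonrepudiation, which would need encryption and public-key signatures respectively.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for illustration only; a real deployment would
# obtain keys from some key-management facility rather than hard-coding them.
SHARED_KEY = b"example-key-for-illustration-only"


def sign_message(sender_id: str, payload: dict, key: bytes = SHARED_KEY) -> dict:
    """Serialize the sender identity and payload, then attach an HMAC-SHA256 tag."""
    body = json.dumps({"sender": sender_id, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_message(envelope: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag over the received body and compare in constant time."""
    expected = hmac.new(
        key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


# One element signs a request; the receiver accepts it only if the tag matches.
envelope = sign_message("element-A", {"request": "storage", "gigabytes": 50})
accepted = verify_message(envelope)

# A message altered in transit no longer matches its tag and is rejected.
tampered = dict(envelope, body=envelope["body"].replace("50", "500"))
rejected = not verify_message(tampered)
```

In this sketch the sender identity is part of the signed body, so changing either the claimed sender or the payload invalidates the tag.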
Termination. When the agreement has run its
On a larger scale, autonomic elements will be

January 2003 47
agents, and autonomic systems will in effect be multiagent systems built on a Web services or OGSA infrastructure. Autonomic systems will be inhabited by middle agents⁸ that serve as intermediaries of various types, including directory services, matchmakers, brokers, auctioneers, data aggregators, dependency managers (for detecting, recording, and publicizing information about functional dependencies among autonomic elements), event correlators, security analysts, time-stampers, sentinels, and other types of monitors that assess the health of other elements or of the system as a whole.

Traditionally, many of these services have been part of the system infrastructure; in a multiagent, autonomic world, moving them out of the infrastructure and representing them as autonomic elements themselves will be more natural and flexible.

Goal specification

While autonomic systems will assume much of the burden of system operation and integration, it will still be up to humans to provide those systems with policies: the goals and constraints that govern their actions. The enormous leverage of autonomic systems will greatly reduce human errors, but it will also greatly magnify the consequences of any error humans do make in specifying goals.

The indirect effect of policies on system configuration and behavior exacerbates the problem because tracing and correcting policy errors will be very difficult. It is thus critical to ensure that the specified goals represent what is really desired. Two engineering challenges stem from this mandate: Ensure that goals are specified correctly in the first place, and ensure that systems behave reasonably even when they are not.

In many cases, the set of goals to be specified will be complex, multidimensional, and conflicting. Even a goal as superficially simple as “maximize utility” will require a human to express a complicated multiattribute utility function. A key to reducing error will be to simplify and clarify the means by which humans express their goals to computers. Psychologists and computer scientists will need to work together to strike the right balance between overwhelming humans with too many questions or too much information and underempowering them with too few options or too little information.

The second challenge, ensuring reasonable system behavior in the face of erroneous input, is another facet of robustness: Autonomic systems will need to protect themselves from input goals that are inconsistent, implausible, dangerous, or unrealizable with the resources at hand. Autonomic systems will subject such inputs to extra validation, and when self-protective measures fail, they will rely on deep-seated notions of what constitutes acceptable behavior to detect and correct problems. In some cases, such as resource overload, they will inform human operators about the nature of the problem and offer alternative solutions.

SCIENTIFIC CHALLENGES

The success of autonomic computing will hinge on the extent to which theorists can identify universal principles that span the multiple levels at which autonomic systems can exist, from systems to enterprises to economies.

Behavioral abstractions and models

Defining appropriate abstractions and models for understanding, controlling, and designing emergent behavior in autonomic systems is a challenge at the heart of autonomic computing. We need fundamental mathematical work aimed at understanding how the properties of self-configuration, self-optimization, self-maintenance, and robustness arise from or depend on the behaviors, goals, and adaptivity of individual autonomic elements; the pattern and type of interactions among them; and the external influences or demands on the system.

Understanding the mapping from local behavior to global behavior is a necessary but insufficient condition for controlling and designing autonomic systems. We must also discover how to exploit the inverse relationship: How can we derive a set of behavioral and interaction rules that, if embedded in individual autonomic elements, will induce a desired global behavior? The nonlinearity of emergent behavior makes such an inversion highly nontrivial.

One plausible approach couples advanced search and optimization techniques with parameterized models of the local-to-global relationship and the likely set of environmental influences to which the system will be subjected. Melanie Mitchell and colleagues⁹ at the Santa Fe Institute have pioneered this approach, using genetic algorithms to evolve the local transformation rules of simple cellular automata to achieve desired global behaviors. At NASA, David Wolpert and colleagues¹⁰ have studied algorithms that, given a high-level global objective, derive individual goals for individual agents. When each agent selfishly follows its goals, the desired global behavior results.

These methods are just a start. We have yet to understand fundamental limits on what classes of

global behavior can be achieved, nor do we have practical methods for designing emergent system behavior. Moreover, although these methods establish the rules of a system at design time, autonomic systems must deal with shifting conditions that can be known only at runtime. Control-theoretic approaches may prove useful in this capacity; some autonomic managers may use control systems to govern the behavior of their associated managed elements.

The greatest value may be in extending distributed or hierarchical control theories, which consider interactions among independently or hierarchically controlled elements, rather than focusing on an individual controlled element. Newer paradigms for control may be needed when there is no clear separation of scope or time scale.

Robustness theory

A related challenge is to develop a theory of robustness for autonomic systems, including definitions and analyses of robustness, diversity, redundancy, and optimality and their relationship to one another. The Santa Fe Institute recently began a multidisciplinary study on this topic (http://discuss.santafe.edu/robustness).

Learning and optimization theory

Machine learning by a single agent in relatively static environments is well studied, and it is well supported by strong theoretical results. However, in more sophisticated autonomic systems, individual elements will be agents that continually adapt to their environment, an environment that consists largely of other agents. Thus, even with stable external conditions, agents are adapting to one another, which violates the traditional assumptions on which single-agent learning theories are based.

There are no guarantees of convergence. In fact, interesting forms of instability have been observed in such cases.¹¹ Learning in multiagent systems is a challenging but relatively unexplored problem, with virtually no major theorems and only a handful of empirical results.

Just as learning becomes a more challenging problem in multiagent systems, so does optimization. The root cause is the same: whether it is because they are learning or because they are optimizing, agents are changing their behavior, making it necessary for other agents to change their behavior, potentially leading to instabilities. Optimization in such an environment must deal with dynamics created by a collective mode of oscillation rather than a drifting environmental signal. Optimization techniques that assume a stationary environment have been observed to fail pathologically in multiagent systems,¹² so they must either be revamped or replaced with new methods.

Negotiation theory

A solid theoretical foundation for negotiation must take into account two perspectives. From the perspective of individual elements, we must develop and analyze algorithms and negotiation protocols and determine what bidding or negotiation algorithms are most effective.

From the perspective of the system as a whole, we must establish how overall system behavior depends on the mixture of negotiation algorithms that various autonomic elements use and establish the conditions under which multilateral (as opposed to bilateral) negotiations among elements are necessary or desirable.

Automated statistical modeling

Statistical models of large networked systems will let autonomic elements or systems detect or predict overall performance problems from a stream of sensor data from individual devices. At long time scales, during which the configuration of the system changes, we seek methods that automate the aggregation of statistical variables to reduce the dimensionality of the problem to a size that is amenable to adaptive learning and optimization techniques that operate on shorter time scales.

Is it possible to meet the grand challenge of autonomic computing without magic and without fully solving the AI problem? We believe it is, but it will take time and patience. Long before we solve many of the more challenging problems, less automated realizations of autonomic systems will be extremely valuable, and their value will increase substantially as autonomic computing technology improves and earns greater trust and acceptance.

A vision this large requires that we pool expertise in many areas of computer science as well as in disciplines that lie far beyond computing’s traditional boundaries. We must look to scientists studying nonlinear dynamics and complexity for new theories of emergent phenomena and robustness. We must look to economists and e-commerce researchers for ideas and technologies about negotiation and supply webs. We must look to psychologists and human factors researchers for new goal-definition and visualization paradigms and

for ways to help humans build trust in autonomic systems. We must look to the legal profession, since many of the same issues that arise in the context of e-commerce will be important in autonomic systems that span organizational or national boundaries.

Bridging the language and cultural divides among the many disciplines needed for this endeavor and harnessing the diversity to yield successful and perhaps universal approaches to autonomic computing will perhaps be the greatest challenge. It will be interesting to see what new cross-disciplines develop as we begin to work together to solve these fundamental problems.

Acknowledgments

We are indebted to the many people who influenced this article with their ideas and thoughtful criticisms. Special thanks go to David Chambliss for contributing valuable thoughts on human-computer interface issues. We also thank Bill Arnold, David Bantz, Rob Barrett, Peter Capek, Alan Ganek, German Goldszmidt, James Hanson, Joseph Hellerstein, James Kozloski, Herb Lee, Charles Peck, Ed Snible, and Ian Whalley for their helpful comments, and the members of an IBM Academy of Technology team for their extensive written and verbal contributions: Lisa Spainhower and Kazuo Iwano (co-leaders), William H. Tetzlaff (Technology Council contact), Robert Abrams, Sam Adams, Steve Burbeck, Bill Chung, Denise Y. Dyko, Stuart Feldman, Lorraine Herger, Mark Johnson, James Kaufman, David Kra, Ed Lassettre, Andreas Maier, Timothy Marchini, Norm Pass, Colin Powell, Stephen A. Smithers, Daniel Sturman, Mark N. Wegman, Steve R. White, and Daniel Yellin.

References

1. IBM, “Autonomic Computing: IBM’s Perspective on the State of Information Technology”; http://www-1.ibm.com/industries/government/doc/content/resource/thought/278606109.html.
2. H. Kreger, “Web Services Conceptual Architecture,” v. 1.0, 2001; http://www-4.ibm.com/software/solutions/webservices/pdf/WSCA.pdf.
3. I. Foster et al., “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration,” Feb. 2002; http://www.globus.org/research/papers/ogsa.pdf.
4. N.R. Jennings, “On Agent-Based Software Engineering,” Artificial Intelligence, vol. 117, no. 2, 2000, pp. 277-296.
5. D. Patterson et al., Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies, tech. report CSD-02-1175, Computer Science Dept., Univ. of Calif., Berkeley, Calif., Mar. 2002.
6. Ariba, IBM, and Microsoft, “UDDI Technical White Paper,” 2000; http://www.uddi.org/whitepapers.html.
7. T. Berners-Lee, J. Hendler, and O. Lassila, “The Semantic Web,” Scientific American, May 2001, pp. 28-37.
8. H. Wong and K. Sycara, “A Taxonomy of Middle Agents for the Internet,” Proc. 4th Int’l Conf. Multiagent Systems, IEEE CS Press, 2000, pp. 465-466.
9. R. Das et al., “Evolving Globally Synchronized Cellular Automata,” Proc. 6th Int’l Conf. Genetic Algorithms, L. Eshelman, ed., Morgan Kaufmann, 1995, pp. 336-343.
10. D. Wolpert, K. Wheeler, and K. Tumer, Collective Intelligence for Control of Distributed Dynamical Systems, tech. report NASA-ARC-IC-99-44, NASA Ames Research Center, Moffett Field, Calif., 1999.
11. J.O. Kephart and G.J. Tesauro, “Pseudo-Convergent Q-Learning by Competitive Pricebots,” Proc. 17th Int’l Conf. Machine Learning, Morgan Kaufmann, 2000, pp. 463-470.
12. J.O. Kephart et al., “Pricing Information Bundles in a Dynamic Environment,” Proc. 3rd ACM Conf. Electronic Commerce, ACM Press, 2001, pp. 180-190.

Jeffrey O. Kephart manages the Agents and Emergent Phenomena group at the IBM Thomas J. Watson Research Center. His research focuses on the application of analogies from biology and economics to massively distributed computing systems, particularly in the domains of autonomic computing, e-commerce, and antivirus technology. Kephart received a BS from Princeton University and a PhD from Stanford University, both in electrical engineering. Contact him at kephart@us.ibm.com.

David M. Chess is a research staff member at the IBM Thomas J. Watson Research Center, working in autonomic computing and computer security. He received a BA in philosophy from Princeton University and an MS in computer science from Pace University. Contact him at chess@us.ibm.com.

