
Feedback Control Real-Time Scheduling

A Dissertation
Presented to the Faculty of the School of Engineering and Applied Science
University of Virginia

In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Computer Science

by

Chenyang Lu
May 2001













Copyright by
Chenyang Lu
All Rights Reserved
May 2001









Approvals
This dissertation is submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Computer Science

__________________________________
Chenyang Lu

Approved:

__________________________________
John A. Stankovic (Advisor)
__________________________________
Sang H. Son (Chair)
__________________________________
Tarek F. Abdelzaher
__________________________________
Marty Humphrey
__________________________________
Jörg Liebeherr
__________________________________
Gang Tao (Minor Representative)

Accepted by the School of Engineering and Applied Science:

__________________________________
Richard W. Miksad (Dean)

May 2001







Abstract
We develop Feedback Control real-time Scheduling (FCS) as a unified framework to
provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-
business servers on the Internet). FCS includes four major components. First, novel
scheduling architectures provide performance control to a new category of QoS critical
systems that cannot be addressed by traditional open loop scheduling paradigms. Second,
we derive dynamic models for computing systems for the purpose of performance
control. These models provide a theoretical foundation for adaptive performance control.
Third, we apply established control methodology to design scheduling algorithms with
proven performance guarantees, which is in contrast with existing heuristics-based
solutions relying on laborious design/tuning/testing iterations. Fourth, a set of control-
based performance specifications characterizes the efficiency, accuracy, and robustness
of QoS guarantees.
The generality and strength of FCS are demonstrated by its instantiations in three
important applications with significantly different characteristics. First, we develop real-
time CPU scheduling algorithms that guarantee low deadline miss ratios in systems
where task execution times may deviate from their estimates at run-time. We solve the
saturation problems of real-time CPU scheduling systems with a novel integrated control
structure. Second, we develop an adaptive web server architecture to provide relative and
absolute delay guarantees to different service classes with unpredictable workloads. The
adaptive architecture has been implemented by modifying an Apache web server.
Evaluation experiments on a testbed of networked Linux PCs demonstrate that our server
provides robust relative/absolute delay guarantees despite instantaneous changes in the
user population. Third, we develop a data migration executor for networked storage
systems that migrates data on-line while guaranteeing the specified I/O throughput of
concurrent applications.







Acknowledgements
First, thanks to my advisor, Jack Stankovic, for being a great mentor to me both
personally and professionally. His encouragement, support, and advice are greatly
appreciated. My thanks go to Tarek Abdelzaher, Sang Son, and Gang Tao for sharing
their ideas and insights on research.
Thanks to Guillermo Alvarez, John Wilkes, Michael Hobbs, Ralph Becker-Szendy,
Simon Towers, and all other members of the storage systems program at HP Labs for
offering me a great research environment and their collaborations during my internship at
HP Labs.
Thanks to Jörg Liebeherr and Marty Humphrey for serving on my dissertation
committee and their valuable suggestions on my dissertation.
Thanks to Jörgen Hansson, Victor Lee, Michael Marley, John Regehr, and all other
members of the real-time systems group for interesting and stimulating discussions.
Thanks to all of my friends for providing invaluable moral support. I want to
especially thank Hainan Lin for helping me through the years in Charlottesville.
Last but not least, I want to thank my parents and my wife for their understanding
and support of my research endeavors and for accompanying me through all the happy and
sad days.







Table of Contents

1. Introduction............................................................................................................ 15
1.1. Motivation................................................................................................. 15
1.2. Contributions............................................................................................. 19
2. Related Work.......................................................................................................... 26
2.1. Classical Real-Time Scheduling............................................................... 27
2.2. Real-Time Scheduling for Embedded Digital Control Systems ............... 28
2.3. QoS Adaptation......................................................................................... 28
2.4. Service Delay Guarantee in Web Servers................................................. 30
2.5. Data Migration in Storage Systems .......................................................... 31
3. Feedback Control Real-Time Scheduling Framework........................................ 32
3.1. Feedback Control Scheduling Architecture.............................................. 33
3.1.1. Control Related Variables..................................................................... 33
3.1.2. Feedback Control Loop......................................................................... 35
3.2. Performance Specifications and Metrics .................................................. 36
3.2.1. Performance Profile .............................................................................. 37
3.2.2. Load Profile .......................................................................................... 39
3.3. Control Theory Based Design Methodology............................................ 42
4. Real-Time CPU Scheduling.................................................................................. 45
4.1. Feedback Control Real-Time Scheduling Architecture............................ 47
4.1.1. Task Model ........................................................................................... 48







4.1.2. Control Related Variables..................................................................... 49
4.1.3. Feedback Control Loop......................................................................... 51
4.1.4. Basic Scheduler..................................................................................... 52
4.2. Performance Specifications and Metrics .................................................. 53
4.2.1. Performance Profile .............................................................................. 53
4.2.2. Load Profile .......................................................................................... 55
4.3. Modeling the Controlled Real-Time System............................................ 56
4.4. Design of FC-RTS Algorithms ................................................................. 60
4.4.1. Design of the Controller........................................................................ 61
4.4.2. Closed-Loop System Model ................................................................. 62
4.4.3. Control Tuning and Analysis................................................................ 64
4.4.4. FC-RTS Algorithms.............................................................................. 73
4.5. Experiments .............................................................................................. 80
4.5.1. FECSIM Real-Time System Simulator ................................................ 81
4.5.2. Scheduling Policy of the Basic Scheduler ............................................ 81
4.5.3. Workload............................................................................................... 82
4.5.4. QoS Actuator ........................................................................................ 84
4.5.5. Profiling the Controlled Real-Time Systems........................................ 85
4.5.6. Controller Parameters ........................................................................... 87
4.5.7. Performance References ....................................................................... 88
4.5.8. Evaluation Experiment A: Arrival Overload........................................ 90
4.5.9. Evaluation Experiment B: Arrival/Internal Overload........................... 96
4.6. Comparison of Real-Time Scheduling Algorithms in Overload ............ 108







4.7. Summary................................................................................................. 109
5. Web Server with Delay Guarantees..................................................................... 111
5.1. Introduction............................................................................................. 111
5.2. Background............................................................................................. 116
5.3. Semantics of Service Delay Guarantees ................................................. 118
5.4. A Feedback Control Architecture for Web Server QoS ......................... 120
5.4.1. Connection Scheduler ......................................................................... 121
5.4.2. Server Processes.................................................................................. 123
5.4.3. Monitor ............................................................................................... 123
5.4.4. Controllers........................................................................................... 123
5.5. Design of the Controller.......................................................................... 127
5.5.1. Performance Specifications ................................................................ 128
5.5.2. Modeling the Web Server: A System Identification Approach.......... 129
5.5.3. Root-Locus Design ............................................................................. 133
5.6. Implementation ....................................................................................... 136
5.7. Experimentation...................................................................................... 138
5.7.1. Comparing Connection Delays and Response Times......................... 139
5.7.2. System Identification .......................................................................... 141
5.7.3. Evaluation of the Adaptive Web Server ............................................. 143
5.8. Summary................................................................................................. 150
6. Online Data Migration in Storage Systems........................................................ 152
6.1. Introduction and Motivations.................................................................. 152
6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration 156







6.2.1. Migration Planner ............................................................................... 156
6.2.2. LV Mover............................................................................................ 157
6.2.3. QoS guarantees ................................................................................... 158
6.2.4. The Feedback Control Loop ............................................................... 160
6.2.5. The Monitor ........................................................................................ 161
6.2.6. The Controller..................................................................................... 161
6.2.7. The Actuator ....................................................................................... 163
6.3. Design and Analysis of the Controller.................................................... 163
6.3.1. The Dynamic Model ........................................................................... 164
6.3.2. Controller Tuning and Analysis.......................................................... 165
6.4. Implementation ....................................................................................... 168
6.5. Experiments ............................................................................................ 169
6.5.1. Experiment Configurations................................................................. 170
6.5.2. Migration Penalty................................................................................ 171
6.5.3. System Profiling.................................................................................. 175
6.5.4. Performance Evaluation...................................................................... 177
6.6. Conclusion and Future Work.................................................................. 185
7. General Issues...................................................................................................... 187
7.1. Granularity of Performance Control ....................................................... 187
7.2. Sampling Period and Overhead .............................................................. 189
7.3. Robustness of Linear Models and PI Control ......................................... 191
8. Conclusions and Future Work............................................................................ 193
References................................................................................................................ 197







List of Figures
Figure 3.1 The FCS Architecture .................................................................................... 33
Figure 3.2 Control Theory based Design Methodology for FCS Algorithms ................. 41
Figure 4.1 Feedback Control Real-Time Scheduling Architecture.................................. 47
Figure 4.2 The Model of the Controlled System.............................................................. 57
Figure 4.3 Closed-Loop System Model for Real-Time CPU Scheduling ........................ 62
Figure 4.4 System Response to Reference Input .............................................................. 69
Figure 4.5 System Response to Disturbance Input ........................................................... 70
Figure 4.6 Settling Time vs. Process Gain........................................................................ 72
Figure 4.7 The FC-UM Algorithm.................................................................................... 76
Figure 4.8 The FECSIM Simulator................................................................................... 81
Figure 4.9 Controlled Variables vs. Total Requested Utilization..................................... 86
Figure 4.10 Response to Arrival Overload SL(0, 150%) (DM/PA).................................. 89
Figure 4.11 Response to Arrival Overload SL(0, 150%) (EDF/P) ................................... 90
Figure 4.12 Execution Time Factor G_a in Experiment B .............................................. 96
Figure 4.13 Response to Arrival/Internal Overload (DM/PA) ......................................... 97
Figure 4.14 Response to Arrival/Internal Overload (EDF/P) ........................................... 98
Figure 4.15 Average Performance of FC-RTS algorithms and the Baseline.................. 107
Figure 5.1 The Feedback-Control Architecture for Delay Guarantees .......................... 120
Figure 5.2 Architecture for system identification .......................................................... 131
Figure 5.3 The Root Locus of the web server model ..................................................... 136
Figure 5.4 Connection delay and response time............................................................. 139







Figure 5.5 System identification results for Relative Delay .......................................... 141
Figure 5.6 System Identification Results for Absolute Delay........................................ 143
Figure 5.7 Evaluation Results of Relative Delay Guarantees between Two Classes..... 146
Figure 5.8 Evaluation Results of Relative Delay Guarantees for Three Classes ........... 147
Figure 5.9 Evaluation of Absolute Delay Guarantees.................................................... 150
Figure 6.1 Aqueduct: The Feedback Control Architecture for Data Migration............. 160
Figure 6.2 Step Response of Aqueduct ......................................................................... 167
Figure 6.3 Device iops during data migration................................................................ 172
Figure 6.4 Migration Penalty in Experiment 1 .............................................................. 173
Figure 6.5 Migration Penalty in Experiment 2 .............................................................. 173
Figure 6.6 Relationship between migration speed and migration penalty .................... 176
Figure 6.7 Device iops and control input of Aqueduct .................................................. 180
Figure 6.8 Average iops of AFAP and Aqueduct, and Aqueduct in steady state .......... 181
Figure 6.9 QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State ... 183
Figure 6.10 QoS violation ratio using 0.98 I_S .............................................................. 183
Figure 6.11 Worst QoS Violations of AFAP, Aqueduct, and Aqueduct in steady state . 184
Figure 6.12 Execution Time of Migration Plan ............................................................. 185








List of Tables
Table 4.1 Testing Configurations.................................................................................... 82
Table 4.2 Controller Parameters of FC-RTS Algorithms ............................................... 87
Table 4.3 Performance References of FC-RTS Algorithms ........................................... 88
Table 4.4 The Performance Profiles of FC-U in Experiment B.................................... 100
Table 4.5 The Performance Profiles of FC-M in Experiment B ................................... 103
Table 4.6 The Performance Profiles of FC-UM in Experiment B ................................ 105
Table 4.7 Comparison of Real-Time Scheduling Paradigms in Overload Conditions . 109
Table 5.1 Variables and Parameters of the Absolute Delay Controller CA_k ............... 124
Table 5.2 Variables and Parameters of the Relative Delay Controller CR_k ................ 126








List of Symbols
C(k)              a controlled variable
C_S               a performance reference
U(k)              a manipulated variable
T_S               the settling time
C_O               the overshoot
E_SC              the steady-state error
S_P               the sensitivity with regard to a system parameter P
SL(L_n, L_m)      the step load that increases instantaneously from L_n to L_m
RL(L_n, L_m, T_R) the ramp load that increases linearly from L_n to L_m within T_R sec
D_i[j]            the relative deadline of task i at QoS level j
EE_i[j]           the estimated execution time of task i at QoS level j
AE_i[j]           the actual execution time of task i at QoS level j
V_i[j]            the value of task i at QoS level j
P_i[j]            the invocation period of periodic task i at QoS level j
EI_i[j]           the estimated inter-arrival-time of aperiodic task i at QoS level j
AI_i[j]           the average inter-arrival-time of aperiodic task i at QoS level j
B_i[j]            the estimated CPU utilization of task i at QoS level j
A_i[j]            the actual CPU utilization of task i at QoS level j
G_a(k)            the utilization ratio in the k-th sampling period
G_A               the worst-case utilization ratio
G_m(k)            the miss ratio factor in the k-th sampling period
G_M               the worst-case miss ratio factor
A_th(k)           the schedulable utilization threshold in the k-th sampling period
W_k               the absolute or relative connection delay guarantee of service class k
C_k(m)            the connection delay of class k in the m-th sampling period
B_k(m)            the process budget of class k in the m-th sampling period
R_m(k)            the inter-submove-time in the k-th sampling period
I_i(k)            the number of I/Os per second of device i in the k-th sampling period







List of Abbreviations
FCS Feedback Control real-time Scheduling
RM Rate Monotonic scheduling policy
EDF Earliest Deadline First scheduling policy
DM Deadline Monotonic scheduling policy









Chapter 1
Introduction
1.1. Motivation
Real-time scheduling algorithms fall into two categories: static and dynamic scheduling.
In static scheduling, the scheduling algorithm has complete knowledge of the task set and
its constraints, such as deadlines, computation times, precedence constraints, and future
release times. The Rate Monotonic (RM) algorithm and its extensions [40][48] are static
scheduling algorithms and represent one major paradigm for real-time scheduling. In
dynamic scheduling, however, the scheduling algorithm does not have the complete
knowledge of the task set or its timing constraints. For example, new task activations, not
known to the algorithm when it is scheduling the current task set, may arrive at a future
unknown time. Dynamic scheduling can be further divided into two categories:
scheduling algorithms that work in resource sufficient environments and those that work
in resource insufficient environments. Resource sufficient environments are systems
where the system resources are sufficient to a priori guarantee that, even though tasks
arrive dynamically, at any given time all the tasks are schedulable. Under certain
conditions, Earliest Deadline First (EDF) [48][71] is an optimal dynamic scheduling
algorithm in resource sufficient environments. EDF is a second major paradigm for real-
time scheduling. While real-time system designers try to design the system with
sufficient resources, because of cost and unpredictable environments, it is sometimes
impossible to guarantee that the system resources are sufficient. In this case, EDF's
performance degrades rapidly in overload situations. The Spring scheduling algorithm
[79] can dynamically guarantee incoming tasks via on-line admission control and
planning and thus is applicable in resource insufficient environments. Many other
algorithms [71] have also been developed to operate in this way. These admission-
control-based algorithms represent the third major paradigm for real-time scheduling.
However, despite the significant body of results in these three paradigms of real-time
scheduling, many real world problems are not easily supported. While algorithms such as
EDF, RM and the Spring scheduling algorithm can support sophisticated task set
characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.),
they are all "open loop" scheduling algorithms. Open loop refers to the fact that once
schedules are created they are not "adjusted" based on continuous feedback. While open-
loop scheduling algorithms can perform well in predictable environments in which the
workloads can be accurately modeled (e.g., traditional process control systems), they can
perform poorly in unpredictable environments, i.e., systems whose workloads cannot be
accurately modeled. For example, the Spring scheduling algorithm assumes complete
knowledge of the task set except for their future release times. Systems with open-loop
schedulers such as the Spring scheduling algorithm are usually designed based on worst-
case workload parameters. When accurate system workload models are not available,
such an approach can result in a highly underutilized system based on extremely
pessimistic estimation of workload.
In recent years, a new category of soft real-time applications executing in open and
unpredictable environments is rapidly growing [69]. Examples include open systems on
the Internet such as online trading and e-business servers, and data-driven systems such
as smart spaces, agile manufacturing, and many defense applications such as C4I. For
example, in an e-business server, neither the resource requirements nor the arrival rate of
service requests are known a priori. However, performance guarantees are required in
these applications. Failure to meet performance guarantees may result in loss of
customers, financial damage, liability violations, or even mission failures. For these
applications, a system design based on open loop scheduling and estimation of worst-case
resource requirements can result in an extremely expensive and underutilized system.
As a cost-effective approach to achieve performance guarantees in unpredictable
environments, several adaptive scheduling algorithms have been recently developed (e.g.,
[5][8][9][24][44][46][55]). While early research on real-time scheduling was concerned
with guaranteeing complete avoidance of undesirable effects such as overload and
deadline misses, adaptive real-time systems are designed to handle such effects
dynamically. There remain many open research questions in adaptive real-time
scheduling. In particular, how can a system designer specify the performance requirement
of an adaptive real-time system? And how can he systematically design a scheduling
algorithm to satisfy its performance specifications? The design methodology for
automatic adaptive systems has been developed in feedback control theory [32][34].
However, feedback control theory has been mostly applied in mechanical and electrical
systems. The modeling, analysis and implementation of adaptive real-time systems lead
to significant research challenges.
Recently, several works applied control theory to computing systems. For example,
several papers [4][13][22][23][28][58][63][66][73][75] presented flexible or adaptive
real-time (CPU) scheduling techniques to improve digital control system performance.
These techniques are tailored to the specific characteristics of digital control systems
instead of general adaptive real-time computing systems. Several other papers [6][19]
[44][63][64][74] presented adaptive CPU scheduling algorithms or QoS management
architectures for computing systems such as multimedia and communication systems.
Transient and steady state performance of adaptive real-time systems has received special
attention in recent years. For example, Brandt et al. [19] evaluated a dynamic QoS
manager by measuring the transient performance of applications in response to QoS
adaptations. Rosu et al. [64] proposed a set of performance metrics to capture the
transient responsiveness of adaptations and its impact on applications. Their proposed
metrics are similar to the settling time and steady-state error metrics found in
control theory.
However, to the best of our knowledge, no unified framework exists to date for designing an
adaptive system from performance specifications of desired dynamic response. In this
thesis, we establish feedback control real-time scheduling (FCS) [53], a unified
framework of adaptive real-time systems based on feedback control theory. Our control
theoretical framework includes the following elements:

Feedback control scheduling architectures that map the feedback control structure
to adaptive resource scheduling in real-time systems [52],
A set of performance specifications and metrics to characterize transient and
steady state performance of adaptive real-time systems [51], and
A control theory based design methodology for resource scheduling algorithms to
satisfy their performance specifications [50][53].

In contrast with ad hoc approaches that rely on laborious design/tuning/testing
iterations, our framework enables system designers to systematically design adaptive
real-time systems with established analytical methods to achieve desired performance
guarantees in unpredictable environments.
1.2. Contributions
Specifically, the main contributions of this thesis work are as follows:
A control-theoretical foundation for adaptive real-time systems: We apply
control theory to provide a theoretical foundation for adaptive real-time
scheduling. In contrast with some existing scheduling algorithms that utilize
feedback control in an ad hoc manner, we provide theoretical understanding of
feedback control scheduling and develop a systematic design methodology for
adaptive real-time systems with analytically proven performance guarantees in
unpredictable environments.

Design methodology for real-time systems in unpredictable environments:
While traditional design methods for real-time systems depend on a priori
known workload parameters (e.g., worst-case execution times, worst-case arrival
rates, and blocking factors due to resource contentions), our control theory based
design methodology provides robust performance guarantees when accurate
characterizations of the workloads are not available. This feature makes our
design framework especially valuable for performance critical systems in
unpredictable environments, e.g., open systems on the Internet such as online
trading and e-business servers, and data-driven systems such as smart spaces, agile
manufacturing, and many defense applications.

Software architecture for feedback performance control: We develop a
general software architecture for adaptive performance control in unpredictable
environments. Our architecture facilitates control theory based design and
analysis of an adaptive real-time system by mapping it to the structure of
feedback control systems. This architecture includes a set of control-related
variables (performance references, controlled variables and manipulated
variables), and software components such as monitor, actuator, and controller.
Our architecture has been implemented as three instances tailored to the specific
characteristics and performance requirements of different applications including
real-time CPU scheduling, a web server, and data migration in networked storage
systems. These successful instantiations demonstrate the general applicability of
our architecture in software systems in unpredictable environments.

Performance specifications and guarantees: While hard real-time systems
require absolute guarantees, such guarantees are infeasible and unnecessary for
many soft real-time systems in unpredictable environments. We adopt a set of
performance metrics and specifications in control theory to characterize the
transient and steady state performance of adaptive real-time systems. Transient
state performance (including settling time and overshoot) of an adaptive system
represents the responsiveness and efficiency of adaptation in response to
environmental variations, and steady-state performance (including stability,
steady state error, and sensitivity) describes a system's long-term performance. In
contrast, traditional metrics such as average miss-ratio cannot capture the
transient behavior of the system in response to load variations.

Modeling real-time computing systems: Unlike traditional control systems such
as electrical and mechanical systems, real-time computing systems do not have
readily available differential/difference equations that can be used in control
analysis. In this thesis work, we apply analytical approach and system
identification techniques to the modeling of three computing systems, a generic
CPU-bound real-time system, a modified Apache web server, and a networked
storage system. In the analytical approach, a system designer describes a system
directly with mathematical equations based on the knowledge of the system
dynamics. When such knowledge is not available (as in the case of the Apache
web server), we use system identification [11] to estimate the system model based
on system input/output from profiling experiments. This modeling methodology
and established analytical models provide a basis for the application of control
theory to adaptive real-time scheduling.








Handling non-linearities of real-time systems: The control design of an
adaptive resource scheduler is non-trivial due to the non-linearities and unknown
or random factors in many real-time computing systems. We solved these
problems with model linearization techniques and novel control structures based
on the particular characteristics of real-time systems. Our work demonstrates that
robust performance control can be achieved despite the intrinsic non-linearities
and uncertainties of real-time systems.

Practical FCS implementation in three applications: Using our design
framework, we develop practical resource scheduling algorithms that can provide
robust (steady state and transient) performance guarantees in unpredictable
environments, while traditional scheduling algorithms fail to provide such
guarantees. We develop FCS algorithms for three application domains including
real-time CPU scheduling, web servers, and storage systems. These applications
are significantly different in terms of semantics of performance guarantees,
scheduled resources, monitor/actuator mechanisms, and system models. Our
evaluation experiments demonstrate that our FCS algorithms based on the FCS
framework successfully achieved robust performance guarantees in all three
applications. The success in these applications demonstrates that FCS is a unified
framework for adaptive computing systems.
Real-Time CPU Scheduling: We develop a set of feedback control real-
time scheduling (FCS) algorithms that guarantee a low deadline miss ratio
and high CPU utilization by dynamically adjusting task QoS levels and
CPU requirements. Simulation experiments demonstrate that our FCS
algorithms provide robust steady and transient state performance
guarantees in terms of deadline miss ratio even when task execution
times deviate considerably from their estimates and when the system's
schedulable utilization bound is unknown.
Connection Scheduling in Web Servers: We develop adaptive connection
scheduling algorithms that provide relative, absolute and hybrid service
delay guarantees for different service classes on web servers under HTTP
1.1. The scheduling algorithms feature feedback control loops that
enforce delay guarantees for classes via dynamic connection scheduling
and server process reallocation. The scheduling algorithms have been
implemented by modifying an Apache web server. Experimental results
demonstrate that our adaptive server provides robust delay guarantees
when web workload varies significantly. Properties of our adaptive web
server also include guaranteed stability, and satisfactory efficiency and
accuracy in achieving desired delay or delay differentiation. Our new real-
time web server will be particularly useful for e-business and e-trading
applications, where a priori QoS guarantees are desirable in the face of bursty
and unpredictable workloads from the Internet.
On-line Data Migration in Storage Systems: We have extended our work
to a non-real-time application, on-line data migration in storage systems.
On-line data migration is necessary in large-scale storage systems (e.g.,
data centers of e-business and large organizations, and multimedia service
centers such as video-on-demand) for performance optimization,
load balancing, and back-up operations. However, data migration can
cause unacceptable performance degradations in concurrent applications
due to excessive resource contentions on the storage system. We develop
an adaptive data migration executor with a feedback control architecture
that guarantees desired I/O throughput for applications by dynamically
regulating the speed of data migration. The migration executor has been
implemented and evaluated at a storage testbed at HP Labs. Our
evaluation experiments demonstrate that our adaptive migration executor
achieved the specified I/O throughput on all devices at the cost of slowing
down data migration. Our work on storage systems demonstrates the
generality of our control-theory-based framework in non-real-time
systems.

Technology Impact: Not only have we produced several research papers
[6][50][51][52][53][70], but parts of this thesis work have also been transferred to
other university research groups. We have sent our real-time CPU scheduling
simulator FECSIM and the feedback control CPU scheduling algorithms to a
group in Sweden for them to study the algorithms. We have transferred the source
code of our adaptive web server and system identification software to Professor
Lui Sha's group at UIUC and provided input on the modeling of web servers. The
project of online data migration in networked storage systems was conducted
when the author was a research intern in the Storage Systems Program at Hewlett
Packard Laboratories (Palo Alto). Hewlett Packard is in the process of applying
the feedback control data migration technique developed in the Aqueduct project
for a patent.

The rest of the thesis is organized as follows. We discuss the state-of-the-art in
Chapter 2. In Chapter 3, we present the general control-theory based design methodology
for adaptive real-time systems. The first case study, feedback control real-time CPU
scheduling, is presented in Chapter 4. The second case study, adaptive connection
scheduling for service delay guarantees in web servers, is presented in Chapter 5. The
third case study, on-line data migration with I/O throughput guarantees on concurrent
applications in storage systems, is presented in Chapter 6. After summarizing several
general issues in Chapter 7, we conclude the thesis in Chapter 8.








Chapter 2
Related Work

Real-time resource scheduling has evolved from static to dynamic and
adaptive approaches as target application environments become increasingly unpredictable.
While classical real-time scheduling is concerned with absolute guarantees in highly
predictable environments, more recent research aims at developing more flexible,
adaptive and cost-effective solutions to handle unpredictable environments. This thesis
work establishes a theoretical foundation and unified framework for achieving a new
category of performance guarantees in unpredictable environments with adaptive real-
time resource scheduling. In this chapter, we summarize the work related to this thesis
research. The classical results on real-time scheduling are described in Section 2.1. A
category of flexible and adaptive real-time scheduling algorithms tailored for digital
control systems is summarized in Section 2.2. In Section 2.3, we then describe existing
QoS adaptation techniques and compare them with our FCS framework. Related work
on web server delay guarantees and storage systems are summarized in Sections 2.4 and
2.5, respectively.







2.1. Classical Real-Time Scheduling
Classical real-time scheduling algorithms depend on a priori characterization of
workload and systems to provide performance guarantees in predictable environments
(e.g., embedded process control and avionics). For example, Rate Monotonic (RM)
[40][48] and Earliest Deadline First (EDF) [48][71] require complete knowledge about
the task set such as resource requirements, precedence constraints, resource contention,
and future arrival times. Dynamic real-time systems [71] pioneered by the Spring project
[79] provide guarantees upon new task arrivals with on-line admission control and
planning. Unlike earlier systems based on RM or EDF, the dynamic real-time systems do
not require future task arrival times to be known a priori. However, the on-line admission
control and planning in the above dynamic systems still depend on a priori task set
characterizations including resource requirements, precedence constraints, and resource
contention. While classical algorithms such as EDF, RM and the Spring scheduling
algorithm can support sophisticated task set characteristics, they cannot provide
performance guarantees in systems operating in unpredictable environments where an
accurate workload model is not available. Such systems include Internet servers (e.g., on-
line stock trading and e-business) and data-driven systems (e.g., smart spaces and agile
manufacturing). A key observation that motivated this thesis work is that a fundamental
reason for the inadequacy of classical real-time scheduling in unpredictable environments
lies in their open loop nature. Because they do not adjust schedules based on continuous
performance feedback, open loop schedulers schedule tasks and system resources based on
worst-case workload estimations. When accurate system workload models are not
available, the open loop approach may result in a highly underutilized system based on
extremely pessimistic estimation of workload. In contrast, feedback control real-time
scheduling provides robust performance guarantees in unpredictable environments with a
closed loop approach.
2.2. Real-Time Scheduling for Embedded Digital Control Systems
There have been several results that have applied feedback control theory to the design of
real-time computing systems. For example, several papers [30][58][65][66] presented co-
design methods for real-time scheduling algorithms and embedded digital control
systems. The co-design methods trade off the quality of control performance against its
computation requirements to produce more cost-effective system designs than separate
design of control and scheduling. These approaches are off-line solutions and their on-
line scheduling algorithms are still classical open-loop algorithms such as EDF and RM.
Several other papers presented on-line scheduling algorithms [4][16][22][23][30][73] to
improve the robustness of digital control systems by dynamically relaxing the timing
constraints within the tolerable range of the digital control system in overload conditions.
However, these techniques require a priori knowledge of the tasks such as execution
times. Furthermore, these techniques are tailored to CPU-bound digital controllers and
are not applicable to other computing systems such as e-business servers and on-line
trading where the performance bottleneck may not be the CPU.
2.3. QoS Adaptation
The concept of using performance feedback to adjust the schedule has been incorporated
in general-purpose operating systems in the form of multi-level feedback queue
scheduling [18]. The system adjusts a task's priority based on whether it consumes a time
slice or is blocked due to I/O. This type of feedback control is based on intuitive solutions
rather than systematic control derivation to achieve performance guarantees.
In recent years, QoS adaptation architectures and algorithms have been developed to
support applications such as communication subsystems [8], multimedia [19][24],
distributed visual tracking [46] and operating systems [55][61][63][69][78]. Some of
these techniques [55][61][63] include optimization algorithms to optimize the value in
QoS adaptation. However, their optimization algorithms assume that the resource
requirement of every QoS level is a priori known. In contrast, our FCS framework
provides performance guarantees even when the resource requirements are unknown or
deviate from the estimations. Several other works [8][21][25][78] developed feedback
based adaptation algorithms that do not depend on completely accurate knowledge about
workloads. However, their feedback loops were based on heuristics and they did not
establish time domain analysis on the efficiency of QoS adaptation in response to run-
time variations. Our FCS framework provides a unified framework to design adaptive
real-time systems with proven transient state performance.
Li and Nahrstedt utilized control theory to develop a feedback control loop to
guarantee a desired network packet rate in a distributed visual tracking system [46]. Hollot,
Misra, Towsley, and Gong [36] applied control theory to analyze a congestion control
algorithm on IP routers. While these works also use control-theoretic analysis on
computing systems, they do not address timing constraints and service delays on end
server systems, which is the focus of this thesis.
Transient and steady state performance of QoS adaptation has received special
attention in recent years (e.g., [19][64][75]). For example, Brandt et al. [19] evaluated a
dynamic QoS manager by measuring the transient performance of applications in
response to QoS adaptations. Rosu et al. [64] proposed a set of performance metrics to
capture the transient responsiveness of adaptations and its impact on applications.
However, they did not provide a methodology to design a system from its performance
specifications in terms of the above metrics; instead, they only used the metrics in system
testing. In contrast, by extending and mapping these metrics to the dynamic response of
control systems, our FCS framework provides a control-theory-based methodology to
design a system to analytically satisfy its performance specifications.
2.4. Service Delay Guarantee in Web Servers
Support for different classes of service on the Web (with special emphasis on server
delay differentiation) has been investigated in recent literature. For example, the authors
of [28] proposed and evaluated an architecture in which restrictions are imposed on the
amount of server resources (such as threads or processes), which are available to basic
clients. In [9][10] admission control and scheduling algorithms are used to provide
premium clients with better service. In [17] a server architecture is proposed that
maintains separate service queues for premium and basic clients, thus facilitating their
differential treatment. While the above differentiation approach usually offers better
service to premium clients, it does not provide any guarantees on the service and hence
can be called the best effort differentiation model.
Notably, a feedback control loop was used in [5][6][9] to control the desired CPU
utilization of a web server with adaptive admission control. Their CPU utilization control
can be extended to guarantee the desired absolute delay in web servers under the HTTP 1.0
protocol and when the CPU is the bottleneck resource. This technique is not applicable to
servers under the HTTP 1.1 protocol, which can be handled by our adaptive server described
in Chapter 5. A least squares estimator was used in [1] for automatic profiling of resource
usage parameters of a web server. However, the work did not establish a dynamic
model for the server.
Several other works such as [13][26] developed kernel-level mechanisms to achieve
overload protection and proportional resource allocations in server systems. Their work
did not utilize feedback control, nor did they provide any relative or absolute delay
guarantees. Supporting proportional differentiated services in network routers has been
investigated in [26][47]. Their work did not address end systems such as web servers.
2.5. Data Migration in Storage Systems
An old approach to performing backups and data relocations is to do them at night, while
the system is idle. As discussed, this does not help with many current applications such
as e-business that require continuous operation and adaptation to quickly changing
system/workload conditions. The approach of bringing the whole (or parts of the) system
offline is also impractical due to the substantial business costs that it incurs. Online
migration and backup are still in their infancy in the current state of the art. Some
existing tools such as the Veritas Volume Manager [75] can guarantee consistent access
to each piece of data while it is being migrated. However, we are not aware of any
existing solution that handles concurrent accesses while bounding the impact of
migration on concurrent applications.








Chapter 3
Feedback Control Real-Time Scheduling
Framework
In this chapter, we describe feedback control real-time scheduling (FCS), a unified
framework of adaptive real-time systems based on feedback control theory. The FCS
framework includes the following elements:

A feedback control scheduling architecture that maps adaptive resource
scheduling in real-time systems [52] to feedback control loops,
A set of performance specifications and metrics [51] to characterize transient and
steady state performance of adaptive real-time systems, and
A control theory based design methodology [50][53] for resource scheduling
algorithms to satisfy their performance specifications.

A key feature of the FCS framework is its use of feedback control theory (rather than
ad hoc solutions) as a scientific underpinning. The FCS framework enables system
designers to systematically design adaptive real-time systems with established analytical
methods to achieve analytically provable performance guarantees in unpredictable
environments. To the best of our knowledge, this is the first unified framework that provides a
fundamental theory and analytical design methodology for adaptive real-time systems to
achieve specified performance guarantees in unpredictable environments. In this chapter,
we describe the elements of the general FCS framework at a high level. The specific
technical challenges and solutions are described with its concrete instantiations in three
different application domains: real-time CPU scheduling (Chapter 4), web servers
(Chapter 5), and networked storage systems (Chapter 6).
3.1. Feedback Control Scheduling Architecture
The major components of our FCS architecture are a set of control related variables and a
feedback control loop that maps a feedback control system structure to real-time resource
scheduling.
[Figure: block diagram of the FCS feedback control loop. The Monitor samples the controlled variable of the Real-Time System; the Controller compares the performance reference with the sample to compute the error and applies its control function to produce the control input; the Actuator translates the control input into the manipulated variable via the Scheduler.]
Figure 3.1. The FCS Architecture
3.1.1. Control Related Variables
A first step in designing the FCS architecture is to decide the following key variables of a
real-time system in terms of control theory.








Controlled variable C(k): the performance metric that characterizes the system
performance, defined over a sampling period ((k-1)W, kW), where W is an
application-specific constant called the sampling window. The scheduler controls
the controlled variable in order to achieve the desired performance. The choice of
controlled variables depends on the performance guarantees that need to be
provided to the specific application of a system. For example, if an absolute delay
guarantee is required in an Internet server (e.g., critical stock trading operations in
an on-line trading system), the (absolute) service delays of HTTP requests should
be defined as the controlled variable. On the other hand, if proportional
differentiated service is required in an Internet server (e.g., e-commerce stores
where customers are classified into different service classes depending on their
monthly fees), the relative delays of service classes become the appropriate
controlled variables. For another example, the deadline miss ratio and the CPU
utilization are typical controlled variables for soft real-time systems (e.g.,
multimedia streaming, process control, and robotics) where explicit timing
constraints need to be respected.

Performance reference C_S: the desired system performance in terms of a controlled
variable C(k). The performance reference defines a contract established between
the adaptive resource scheduler and the users such that the performance reference
should be enforced. The difference between the performance reference and the
value of the corresponding controlled variable is called the error E_C(k) = C_S - C(k).
For example, if a system sets its performance reference to a deadline miss ratio of
C_S = 2% and the current miss ratio is 10%, the system has an error E_C(k) = -8%.

Manipulated variable U(k): a system attribute that is dynamically changed by the
scheduler. The manipulated variable should be effective for performance control,
i.e., changing its value should affect the system's controlled variable(s). The
choice of manipulated variable should reflect the resource bottleneck of a system.
For example, although the total requested utilization can be used as the
manipulated variable when the CPU is the bottleneck resource of a web server, it
should not be used as the manipulated variable when the CPU is not the bottleneck
resource (e.g., in the case of HTTP 1.1 as described in Section 5.2).
3.1.2. Feedback Control Loop
The FCS architecture has a feedback control loop that is invoked at every sampling
instant k. Each feedback control loop is composed of a Monitor, a Controller, and an
Actuator.

1) The Monitor measures the controlled variables and feeds the samples back to the
Controller.

2) The Controller compares the performance references with corresponding controlled
variables to get the current errors, and calls control algorithms to compute a control
input (the new value of the manipulated variable) based on the errors. The control
algorithm is a critical component with significant impacts on the system performance
and hence is the centerpiece of the design of an FCS algorithm. Note that control
theory enables us to derive the control algorithm and analytically prove that the
algorithm can provide the desired performance guarantees.

3) The Actuator changes the manipulated variable based on the newly computed control
input. The Actuator implements a mechanism that dynamically reallocates
(reschedules) the resource corresponding to the manipulated variable. For example,
corresponding to a manipulated variable of the total requested CPU utilization, we
design a QoS Actuator to dynamically adjust task QoS levels (different QoS levels
have different execution times and/or invocation periods).
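To make the loop structure concrete, the following minimal Python sketch illustrates one sampling instant of a generic FCS loop. It is an illustrative sketch only: the class name, the monitor and actuator callables, and the use of a proportional-integral (PI) control function are our assumptions, standing in for the application-specific components and analytically tuned control algorithms derived in Chapters 4 through 6.

    class FeedbackControlLoop:
        """Generic FCS loop: Monitor -> Controller -> Actuator (illustrative sketch)."""

        def __init__(self, reference, kp, ki, monitor, actuator):
            self.c_s = reference        # performance reference C_S (e.g., 2% miss ratio)
            self.kp, self.ki = kp, ki   # controller gains, tuned analytically (Ch. 4-6)
            self.integral = 0.0         # accumulated error for the integral term
            self.monitor = monitor      # callable returning the controlled variable C(k)
            self.actuator = actuator    # callable applying the new control input

        def sample(self):
            """Invoked at every sampling instant k."""
            c_k = self.monitor()        # 1) Monitor: measure C(k)
            error = self.c_s - c_k      # 2) Controller: error E_C(k) = C_S - C(k)
            self.integral += error
            u_k = self.kp * error + self.ki * self.integral  # PI control function
            self.actuator(u_k)          # 3) Actuator: change the manipulated variable

In the CPU-scheduling instantiation of Chapter 4, for example, the monitor would sample the deadline miss ratio over the last sampling window, and the actuator would adjust task QoS levels so that the total requested utilization tracks the computed control input.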
3.2. Performance Specifications and Metrics
We now describe the second element of the FCS framework, the performance
specifications and metrics for adaptive real-time systems. While early research on real-
time computing was concerned with guaranteeing complete avoidance of undesirable
effects such as overload and deadline misses, adaptive real-time systems are designed to
handle such effects dynamically. Using a control theory framework, we characterize the
dynamic performance of an adaptive real-time system in both transient and steady state
upon load or resource changes. Transient behavior of an adaptive system represents the
responsiveness and efficiency of adaptation in reacting to changes in run-time conditions,
and steady-state behavior describes a system's long-term performance after its transient
response settles. In contrast, traditional metrics such as the average miss-ratio often fail
to capture the transient behavior of the system in response to load variations. Another
important advantage of our metrics is that they formulate the performance of real-time
systems as dynamic responses in control theory, and therefore enable the use of control
design methods to satisfy the specifications. Our performance specifications and metrics
consist of a set of performance profiles¹ in terms of the controlled variables. We also
present a set of representative load profiles adapted from control theory [32].
Corresponding to signals widely used in control theory, our load profiles can be used to
provide guidance for control design and to generate canonical system responses to variations
in run-time conditions.
3.2.1. Performance Profile
The performance profile characterizes important transient and steady state properties of a
system in terms of its controlled variables. Note that when the sampling window W is
small, a controlled variable C(k) approximates the instantaneous system performance at
the sampling instant k. In contrast, traditional metrics for real-time systems such as
average miss-ratio and average utilization are defined based on a much larger time
window than the sampling period W. The average metrics are often inadequate for
characterizing the dynamics of the system performance [50]. From the control theory
point of view, a real-time system transits from the steady state to the transient state when
a controlled variable deviates significantly from its steady state value in response to
variation in its run-time condition. After a time interval in the transient state, the system
may settle down to a new steady state after the feedback control loop converges the
controlled variable to the vicinity of a new value. The steady state is defined as a state
when the controlled variable C(k) stays within ±ε% of its performance reference C_S. The
performance profile includes the following elements.

¹ The performance profile has been called the miss-ratio profile in [50] when deadline miss ratio is used as
the controlled variable.








Stability: A system is Bounded-Input-Bounded-Output (BIBO) stable if its controlled
variables are always bounded for bounded performance references and
disturbances. Note that the performance of an unstable system can severely and
persistently diverge from the desired performance so as to cause system
malfunctioning and even complete system failure. Stability is a necessary
condition for achieving the desired performance reference. Stability is an especially
important requirement for FCS algorithms because a poorly designed
Controller can overreact to performance errors and push a real-time system to
unstable conditions.

Transient-state response represents the responsiveness and efficiency of adaptive
resource scheduling in reacting to changes in run-time conditions.
Settling time T_s: The time it takes the system to settle down to a steady
state from the start of a transient state. The settling time represents how
fast the system can regain desired performance after a change in its run-
time condition.
Overshoot C_o: The maximum amount that a controlled variable overshoots
its reference divided by its reference, i.e., C_o = (C_M - C_S) / C_S, where C_M is
the maximum value of the controlled variable during its transient state.
Overshoot characterizes the worst-case transient performance degradation
of a system. A system may require a low overshoot because severe
transient performance degradation may lead to system failure. For







39
example, in media players, a high transient deadline miss-ratio can cause
buffer overflows [19].

Steady-state error E_SC: The difference between the average value of a controlled
variable in steady state and its reference. The steady-state error characterizes how
precisely the system can enforce the desired performance in steady state.
Sensitivity S_P: The relative change of a controlled variable in steady state with respect
to the relative change of a system parameter P. For example, assuming the controlled
variable is the deadline miss ratio, the system's sensitivity with respect to the task
execution time, S_AE, represents how significantly a change in the task execution
time affects the system miss ratio. Sensitivity describes the robustness of the system
with regard to workload or system variations.

The performance profile establishes a set of metrics for adaptive real-time systems based
on the specification of dynamic response in control theory. These metrics enable system
designers to apply established control theory techniques to achieve stability and to meet
transient and steady state specifications.
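For concreteness, the following Python sketch (our illustration, not part of the dissertation; it assumes a positive reference C_S and a 2% steady-state tolerance) computes the settling time, overshoot, and steady-state error of a sampled trace of a controlled variable:

def performance_profile(trace, reference, band=0.02):
    """trace: samples C(k); reference: C_S; band: steady-state tolerance."""
    # Settling time (in sampling periods): first instant after which every
    # remaining sample stays within the tolerance band around C_S.
    settled = len(trace)
    for k in range(len(trace)):
        if all(abs(c - reference) <= band * reference for c in trace[k:]):
            settled = k
            break
    # Overshoot C_o = (C_M - C_S) / C_S, with C_M the transient maximum.
    c_max = max(trace[:settled], default=reference)
    overshoot = max(0.0, (c_max - reference) / reference)
    # Steady-state error: average deviation from C_S after settling.
    steady = trace[settled:] or [reference]
    ss_error = sum(steady) / len(steady) - reference
    return settled, overshoot, ss_error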
3.2.2. Load Profile
According to control theory, the performance profile of an adaptive system may be
specified assuming representative load profiles including step load and ramp load. The
step load represents the worst case of load variation that overloads the system
instantaneously, while the ramp load represents a nominal form of load variation. The
load profiles are defined as follows.

Step load SL(L_n, L_m): a load profile that instantaneously jumps from a nominal
load L_n to a higher load L_m > L_n and stays constant after the jump. An
instantaneous load change such as the step load is more difficult to handle than a
gradual load change.

Ramp load RL(L_n, L_m, T_R): a load profile that increases linearly from the nominal
load L_n to a higher load L_m > L_n during a time interval of T_R sec. Compared with
the step load, the ramp signal represents a less severe load variation scenario (see the
sketch following this list).
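The two load profiles can be instantiated directly as utilization signals sampled once per sampling period W; the sketch below is our illustration (function and parameter names are not from the dissertation):

def step_load(L_n, L_m, num_samples, jump_at):
    """SL(L_n, L_m): load jumps instantaneously from L_n to L_m."""
    return [L_n if k < jump_at else L_m for k in range(num_samples)]

def ramp_load(L_n, L_m, T_R, W, num_samples, ramp_at=0):
    """RL(L_n, L_m, T_R): load rises linearly from L_n to L_m over T_R sec."""
    ramp_samples = max(1, round(T_R / W))
    profile = []
    for k in range(num_samples):
        if k < ramp_at:
            profile.append(L_n)
        elif k < ramp_at + ramp_samples:
            profile.append(L_n + (L_m - L_n) * (k - ramp_at) / ramp_samples)
        else:
            profile.append(L_m)
    return profile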

One key advantage of using the above load profiles for performance specification is
that they are amenable to well-established design and analysis methods in control theory
and, therefore, fit well with our control theoretic framework. This means that a system
designer can use control theory methods to analytically design the system to satisfy a
performance profile in response to a load profile as defined above. Specifically, a load
profile can be modeled as a disturbance signal in the form of a step or ramp signal (see
Section 4.4). Based on control theory, a linear system's dynamic properties can be
determined by its dynamic response to a step or ramp signal regardless of the signal's
parameters, including the magnitude of the load variation (L_m - L_n) and the ramp
duration T_R. If a real-time system can be approximated with a linear model in its
operating conditions, its performance profile can be determined by stressing the system
with a step load; i.e., the system can achieve satisfactory performance under any combination of step and
ramp load if its performance profile in response to a step load or ramp load satisfies its
specifications.
Unfortunately, if a real-time system is nonlinear in its operating conditions, its dynamic
response to arbitrary load variations cannot be determined from its response to a single
step load or a single ramp load, because the system performance depends on the specific
parameters of the load profiles. In this case, the performance profiles in response to
specific load profiles are only indications of the system performance in general, and the
load profiles must be chosen in an application-specific way based on a set of expected
load characteristics and system requirements.
We should also note that load profile is an abstraction of the workload, and there can
be many possible instantiations of the same load profile. The instantiation of a load
profile should incorporate the knowledge of the workload, and, therefore, the load profile
should be viewed as an enhancement to existing benchmarks (e.g., [37][40][41][42]
[75][77]). For example, the system load can be interpreted as the total requested CPU
utilization in the system where CPU is the bottleneck resource. For another example, the
load of an Internet server may be interpreted as the number of concurrent users.
Figure 3.2. Control Theory based Design Methodology for FCS Algorithms (requirement
analysis yields the performance specifications, modeling yields the system model, and
Controller design yields FCS algorithms that satisfy the specifications)
3.3. Control Theory Based Design Methodology
The third element of our FCS framework is the control theory based design methodology
(see Figure 3.2). Based on the scheduling architecture and the performance specifications,
we now establish a design methodology based on feedback control theory. Using this
design methodology, a system designer can systematically design an adaptive resource
scheduler to satisfy the system's performance specifications with established analytical
methods. This methodology is in contrast with existing ad hoc approaches that depend on
laborious design/tuning/testing iterations. Our design methodology works as follows.

1) The system designer specifies the desired dynamic behavior with transient and
steady state performance metrics. This step maps the performance requirements of
an adaptive real-time system to the dynamic response specification of a control
system.

2) The system designer establishes a dynamic model of the real-time system for the
purposes of performance control. A dynamic model describes the mathematical
relationship between the control input and the controlled variables of a system
with differential/difference equations or state matrices. Modeling is important
because it provides a basis for the analytical design of the Controller. However,
modeling has been a major challenge for applying control theory to real-time
systems due to the lack of established differential/difference equations to describe
real-time systems. Two different approaches can be used to establish the dynamic
model of a real-time system. The analytical approach directly describes a system
with mathematical equations based on knowledge of the system dynamics.
When such knowledge is not available, the system identification approach [11]
can be used to estimate the system model based on profiling experiments. In this
thesis work, we apply the analytical approach to model a generic CPU-bound
real-time system and a storage system, and develop a system identification tool
to model a web server whose dynamics are less clear. Our work represents a first
step in modeling real-time systems using rigorous mathematical equations. Our
modeling methodology and established analytical models provide a foundation for
the application of control theory to adaptive real-time systems in this thesis work
and future work in this area.

3) Based on the performance specifications and the system model from steps 1) and 2),
the system designer applies established mathematical techniques (e.g., the Root Locus
method, frequency-domain design, or state-based design) of feedback control theory [32]
to design FCS algorithms that analytically guarantee the specified transient and
steady-state behavior at run-time. Compared with existing ad hoc approaches, our
analytical design approach significantly reduces the design time and effort required
for adaptive systems because it requires far fewer design/testing iterations.
Furthermore, the resultant system's parameters can be easily tuned in practice with
existing control theory methods and tools, and the resultant system can be proved to
satisfy its performance specifications. In contrast, tuning adaptive systems designed
with ad hoc methods often depends on repeated testing, guessing, or rules of thumb,
without performance guarantees at run-time.
In summary, we describe a unified FCS framework for adaptive real-time systems
that provides performance guarantees in unpredictable environments. Our FCS
framework includes 1) a software architecture for feedback performance control, 2) a
set of performance specifications and metrics that describes the efficiency, accuracy,
and robustness of performance guarantees, and 3) a control theory methodology for
designing FCS algorithms to satisfy the performance specifications. In the next three
chapters, we describe the details of three instantiations of the FCS framework in three
application domains.

Chapter 4
Real-Time CPU Scheduling
In this Chapter, we develop a set of novel real-time CPU scheduling algorithms called
FC-RTS [51][52][53][70] that guarantee low deadline miss ratio and high CPU utilization
when workloads deviate from estimations at run-time. Our FC-RTS algorithms provide a
scheduling solution for a new category of soft real-time systems working in unpredictable
environments, whose performance cannot be guaranteed by many existing real-time
scheduling algorithms including RM [43], EDF [70], the Spring algorithm [79], and QoS
adaptation algorithms [4][61]. Such systems include open systems on the Internet such as
on-line trading servers, e-business servers, and on-line media streaming, and data driven
systems such as database applications. For example, in an on-line trading server, the
processing time for a service request often depends on the user input that is unknown to
the scheduler. For another example, in a surveillance system, the processing time of
object tracking based on camera images can vary dramatically with the movement scope
of the object being tracked [23]. In addition, our FC-RTS algorithms can also provide
performance guarantees for off-the-shelf software applications, components, and device
drivers when accurate information on their execution time and invocation rates is
unavailable.
A motivation for applying the FCS framework to real-time CPU scheduling is the
observation that many existing feedback based scheduling algorithms [8][21][25] are
based on heuristics rather than a theoretical foundation. These algorithms often depend
on laborious design/tuning/testing iterations, and may still fail to handle unexpected or
untested conditions at run-time. While the design methodology for automatic feedback
control systems is well developed in feedback control theory, the modeling, analysis,
and implementation of real-time scheduling pose significant challenges to real-time
systems research. In this thesis, we design our FC-RTS algorithms based on feedback
control theory by instantiating the FCS framework in real-time CPU scheduling.
Specifically, our major contributions include the following:
A novel and general feedback control real-time CPU scheduling architecture that
allows plug-ins of different real-time scheduling policies and QoS optimization
algorithms and a set of tuning rules based on the scheduling policies,
An analytical model of a CPU-bound real-time system, which to the best of our
knowledge is the first dynamic model for generic real-time CPU scheduling,
A set of analysis results and tuning methods for FC-RTS algorithms to achieve
performance specifications including stability, settling time, overshoot, steady
state performance, and sensitivity with regard to workload variations,
Practical FC-RTS algorithms applicable to different types of real-time
applications,
Performance evaluation results demonstrating that our analytically designed FC-
RTS algorithms can provide robust performance guarantees in terms of deadline
miss ratio and CPU utilization, and achieve satisfactory performance profiles in
response to overloads caused by new task arrivals and task execution time
variations.
The feedback control real-time scheduling architecture is described in Section 4.1.
We describe the performance specifications and metrics in Section 4.2. We establish an
analytical model for a real-time system in Section 4.3. Based on the model, we present
the design and control analysis of a set of FC-RTS algorithms in Section 4.4. We present
the performance evaluation results of these scheduling algorithms in Section 4.5. We then
qualitatively compare FC-RTS algorithms with several existing scheduling paradigms in
Section 4.6. Finally, we summarize this chapter in Section 4.7.
Figure 4.1. Feedback Control Real-Time Scheduling Architecture (a Monitor feeds the
controlled variables back to the Controller, which compares them with the performance
references and computes the control input; the QoS Actuator adjusts the QoS levels of
the current tasks, which the Basic Scheduler then schedules on the CPU; task arrivals
enter the system and completed/aborted tasks leave it)
4.1. Feedback Control Real-Time Scheduling Architecture
Our feedback control real-time CPU scheduling (FC-RTS) architecture (illustrated in
Figure 4.1) is composed of four parts: a task model, a set of control related variables, a
feedback control loop that maps a feedback control system structure to real-time CPU
scheduling, and a Basic Scheduler.
4.1.1. Task Model
In our task model, each task T_i has N QoS levels (N >= 2). Each QoS level j
(0 <= j <= N-1) of T_i is characterized by the following attributes:
D_i[j]: the relative deadline
EE_i[j]: the estimated execution time
AE_i[j]: the (actual) execution time, which can vary considerably from instance to
    instance and is unknown to the scheduler
V_i[j]: the value that task T_i contributes if it is completed at QoS level j before its
    deadline D_i[j]. The lowest QoS level 0 represents the rejection of the task
    and V_i[0] = 0. Every QoS level contributes a miss penalty MP_i < 0 if it
    misses its deadline.
Periodic tasks:
    P_i[j]: the invocation period
    B_i[j]: the estimated CPU utilization, B_i[j] = EE_i[j] / P_i[j]
    A_i[j]: the (actual) CPU utilization, A_i[j] = AE_i[j] / P_i[j]
Aperiodic tasks:
    EI_i[j]: the estimated inter-arrival time between subsequent invocations
    AI_i[j]: the average inter-arrival time, which is unknown to the scheduler
    B_i[j]: the estimated CPU utilization, B_i[j] = EE_i[j] / EI_i[j]
    A_i[j]: the (actual) CPU utilization, A_i[j] = AE_i[j] / AI_i[j]
In this model, a higher QoS level of a task has a higher (both estimated and actual)
CPU utilization and contributes a higher value if it meets its deadline, i.e.,
B_i[j+1] > B_i[j], A_i[j+1] > A_i[j], and V_i[j+1] > V_i[j]. In the simplest form, each
task has only two QoS levels (corresponding to the admission and the rejection of the
task, respectively). In many applications, including web services [5], multimedia [19],
embedded digital control systems [23], and systems that support imprecise computation
[48] or flexible security [68], each task has more than two QoS levels, and the scheduler
can trade off the CPU utilization of a task against the value it contributes to the system
at a finer granularity. The QoS levels may differ in terms of execution time and/or
period/inter-arrival time. For example, a web server may dynamically change the
execution time of an HTTP session by changing the complexity of the requested web
page [5]. For another example, several papers have shown that the deadlines and periods
of tasks in embedded digital control systems and multimedia players can be adjusted
on-line [19][23][66] within certain ranges. A key feature of our task model is that it
characterizes systems in unpredictable environments where tasks' actual CPU
utilizations are time varying and unknown to the scheduler. Such systems are amenable
to the use of feedback control loops that dynamically correct scheduling errors to adapt
to load variations at run-time.
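As an illustration (not code from the dissertation), the task model maps naturally onto a small data structure; the actual execution time AE_i[j] and inter-arrival time AI_i[j] are deliberately absent because they are unknown to the scheduler:

from dataclasses import dataclass
from typing import List

@dataclass
class QoSLevel:
    D: float   # relative deadline D_i[j]
    EE: float  # estimated execution time EE_i[j]
    V: float   # value V_i[j] contributed if completed before the deadline
    P: float   # period P_i[j] (periodic) or estimated inter-arrival time EI_i[j]

    @property
    def B(self) -> float:
        """Estimated CPU utilization B_i[j] = EE_i[j] / P_i[j] (or EE/EI)."""
        return self.EE / self.P

@dataclass
class Task:
    levels: List[QoSLevel]  # levels[0] models rejection: V = 0, EE = 0
    MP: float = -1.0        # miss penalty MP_i < 0
    level: int = 0          # current QoS level l_i(k), set by the QoS Actuator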
4.1.2. Control Related Variables
An important step in designing the FC-RTS architecture is to decide the following
variables of a real-time system in terms of control theory.

Controlled variables are the performance metrics controlled by the scheduler in
order to achieve the desired system performance. Controlled variables of a real-time
system may include the deadline miss ratio M(k) and the CPU utilization U(k)
(also called miss ratio and utilization, respectively), both defined over a time
window ((k-1)W, kW), where W is the sampling period and k is called the
sampling instant.
    The miss ratio M(k) at the k-th sampling instant is defined as the number
    of deadline misses divided by the total number of completed and
    aborted task instances in the sampling window ((k-1)W, kW). Miss ratio
    is usually the most important performance metric in a real-time system.
    The utilization U(k) at the k-th sampling instant is the percentage of
    CPU busy time in the sampling window ((k-1)W, kW). CPU utilization is
    regarded as a controlled variable for real-time systems due to cost and
    throughput considerations. CPU utilization is also important because of
    its direct link with the deadline miss ratio (see Section 4.3).
    Another controlled variable might be the total value V(k) delivered by
    the system in the k-th sampling period. In the remainder of this chapter,
    we do not directly use the total value as a controlled variable, but
    rather address the value imparted by tasks via the QoS Actuator (see
    Section 4.5.1).

Performance references represent the desired system performance in terms of the
controlled variables, i.e., the desired miss ratio M_S and/or the desired CPU
utilization U_S. For example, a particular system may require a deadline miss ratio
M_S = 0 and a CPU utilization U_S = 90%. The difference between a performance
reference and the current value of the corresponding controlled variable is called
an error, i.e., the miss ratio error E_M = M_S - M(k) and the utilization error
E_U = U_S - U(k).

Manipulated variables are system attributes that can be dynamically changed by
the scheduler to affect the values of the controlled variables. In our architecture,
the manipulated variable is the total estimated utilization B(k) = Σ_i B_i[l_i(k)] of all
tasks in the system, where T_i is a task at QoS level l_i(k) in the k-th sampling
window. The rationale for choosing the total estimated utilization as the manipulated
variable is that most real-time scheduling policies (such as EDF and
Rate/Deadline Monotonic) can guarantee no deadline misses when the system is
not overloaded, and in normal situations the miss ratio increases as the system
load increases. The other controlled variable, the utilization U(k), also usually
increases as the total estimated utilization increases. However, the utilization often
differs from the total estimated utilization B(k), due to the estimation error of
execution times when the workload is unpredictable and time varying. Another
difference between U(k) and B(k) is that U(k) can never exceed 100%, while B(k)
has no such bound.
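A Monitor for the two controlled variables reduces to a few counters per sampling window; the sketch below is our illustration of the definitions above, not the dissertation's implementation:

def sample_controlled_variables(missed, completed, aborted, busy_time, W):
    """Return (M(k), U(k)) for the window ((k-1)W, kW) that just ended.
    missed: deadline misses; completed/aborted: finished task instances;
    busy_time: CPU busy time in the window; W: sampling period."""
    finished = completed + aborted
    M_k = missed / finished if finished > 0 else 0.0
    U_k = min(busy_time / W, 1.0)   # utilization saturates at 100%
    return M_k, U_k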
4.1.3. Feedback Control Loop
The FC-RTS architecture features a feedback control loop that is invoked at every
sampling instant k. It is composed of a Monitor, a Controller, and a QoS Actuator (Figure
4.1).

1) The Monitor measures the controlled variables (M(k) and/or U(k)) and feeds the
samples back to the Controller.


2) The Controller compares the performance references with the corresponding controlled
variables to get the current errors, and computes a change D_B(k) (called the control
input) to the total estimated requested utilization, i.e., B(k+1) = B(k) + D_B(k), based
on the errors. The Controller uses a control function to compute the correct control
input to compensate for the load variations and keep the controlled variables close to
the references. The detailed design of the Controller is presented in Section 4.4.

3) The QoS Actuator calls a QoS optimization algorithm (see Section 4.5.1) to maximize
the system value by dynamically adjusting tasks' QoS levels under the utilization
constraint computed by the Controller, B(k+1) = B(k) + D_B(k). In the simplest form,
each task has only two QoS levels and the QoS Actuator is essentially an
admission controller.
In addition to the above feedback control loop, our FC-RTS architecture also includes
arriving-time QoS control, i.e., in addition to being called periodically by the Controller,
the QoS Actuator is also invoked upon the arrival of each task. The arriving-time QoS
control isolates disturbances caused by new task arrivals (see Section 4.3). Feedback
control scheduling in systems without arriving-time QoS control was previously studied
in [50].
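Putting the three components together, one invocation of the loop can be sketched as follows (our illustration; the monitor, controller, and actuator interfaces are hypothetical):

def control_loop_step(monitor, controller, actuator, B_k):
    """Invoked once per sampling instant k; returns B(k+1)."""
    O_k = monitor()            # measured controlled variable M(k) or U(k)
    D_B = controller(O_k)      # control input computed from the error
    B_next = B_k + D_B         # B(k+1) = B(k) + D_B(k)
    actuator(B_next)           # QoS optimization under the new utilization constraint
    return B_next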
4.1.4. Basic Scheduler
The FC-RTS architecture has a Basic Scheduler that schedules admitted tasks with a
scheduling policy (e.g., EDF or Rate/Deadline Monotonic). The properties of the
scheduling policy can have significant impact on the design of the feedback control loop.
Our FC-RTS architecture permits plugging in different real-time scheduling policies for
this Basic Scheduler and then designing the entire feedback control scheduling system
around this choice (see Section 4.4.4).
A key difference between our work and many previous works is that while previous
work often assumes the CPU utilization of each task is known a priori, we focus on
systems in unpredictable environments where tasks actual CPU utilizations are unknown
and time varying. This more challenging problem necessitates the feedback control loop
to dynamically correct the scheduling errors at run-time. Our FC-RTS architecture
establishes a mapping from real-time scheduling to a typical structure of feedback control
systems. This step enables us to treat a real-time system as a feedback control system and
utilize feedback control theory to design the system rather than developing ad hoc
algorithms.
4.2. Performance Specifications and Metrics
We now specialize the second element of the FCS framework, the performance
specifications, to real-time CPU scheduling. The performance specifications consist of a
set of performance profiles in terms of utilization U(k) and miss ratio M(k), and a set of
load profiles in terms of the total requested CPU utilization of a system.
4.2.1. Performance Profile
The performance profile characterizes important transient and steady state performance
of a real-time system. M(k) and U(k) characterize the system performance in the sampling
window ((k-1)W, kW). In contrast, traditional metrics for real-time systems such as
average miss ratio and average utilization are defined over a much larger time
window than the sampling period W. These average metrics are often inadequate for
characterizing the dynamics of the system performance in response to overload
conditions [50]. The performance profile of a real-time system includes the following.
Stability: A real-time system is stable if its miss ratio M(k) and utilization U(k) are
always bounded for bounded references. Although both miss ratio M(k) and
utilization U(k) are naturally bounded in the range [0, 1], stability is a necessary
condition to prevent the controlled variables from severe deviations from the
reference values.
Transient-state response represents the real-time system's responsiveness and the
efficiency of QoS adaptation in reacting to changes in run-time conditions.
    Overshoot M_o and U_o: For a real-time system, we define overshoot as the
    maximum amount that the system overshoots its miss ratio or utilization
    reference divided by that reference, i.e., M_o = (M_max - M_S) / M_S and
    U_o = (U_max - U_S) / U_S, respectively. The maximum miss ratio M_max and
    utilization U_max in the transient state are called the absolute overshoot.
    Overshoot is important to a real-time system because a high transient
    miss ratio or utilization can cause system failure in many systems such as
    robots and media streaming [19].
    Settling time T_s: The time it takes the system to enter a steady state in
    response to a load profile. The settling time represents how fast the system
    can settle down to steady state with the desired miss ratio and/or utilization.
Steady-state errors E_SM and E_SU: The difference between the average value of the
miss ratio M(k) or utilization U(k) in steady state and its corresponding reference.
The steady-state error characterizes how precisely the system can enforce the
desired miss ratio and/or utilization in steady state.
Sensitivity S_p: The relative change of a controlled variable in steady state with respect
to the relative change of a system parameter p. For example, the sensitivity of miss
ratio with respect to the task execution time, S_AE, represents how significantly a
change in the task execution time affects the system miss ratio. Sensitivity
describes the robustness of the system with regard to workload or system
variations.
4.2.2. Load Profile
For a CPU-bound real-time system, the load profile L(k) of a system at sampling instant k
is defined in terms of the total CPU utilization of all the tasks arriving at the system.
Specifically, two forms of overload can occur in a real-time system.
Arrival Overload: For this type of overload, the load variation ΔL = (L_m - L_n) in a
step load SL(L_n, L_m) or a ramp load RL(L_n, L_m, T_R) is caused by the new arrival of
a set of tasks {j} with a total CPU utilization of ΔL at a system with an initially
admitted task set {p} with a total CPU utilization of A(0) = L_n. The load
variation is defined as the total CPU utilization of all new tasks assuming every
new task is at the highest QoS level N-1, i.e., ΔL = (L_m - L_n) = Σ_{j} A_j[N-1], while
the initial load is defined as the total actual CPU utilization at time 0, i.e.,
A(0) = L_n = Σ_{p} A_p[l_p(0)], where l_p(0) denotes the initial QoS level of task p. For
example, a step load SL(0, 150%) represents the sudden arrival of a task set with a
total utilization of 150% at an initially idle system.

Internal Overload: For this type of overload, the load variation ΔL = (L_m - L_n) in a
step load SL(L_n, L_m) or a ramp load RL(L_n, L_m, T_R) is caused by increases in the
CPU utilizations (i.e., execution times and/or inter-arrival times/periods) of tasks
already admitted in the system, such that the total CPU utilization increases from
L_n to L_m. For example, a step load SL(100%, 150%) represents a scenario where
the execution time of every task in an initially 100% utilized system suddenly
increases by 50% at the same time instant (e.g., due to extra processing overhead
caused by packet retransmissions under TCP congestion control).
4.3. Modeling the Controlled Real-Time System
Given the FC-RTS architecture described in Section 4.1 and the performance
specifications described in Section 4.2, we apply the feedback control theory based
methodology described in Section 3.3 to design the FC-RTS algorithms. The first step of this
methodology is to establish an analytical model to approximate the controlled system in
the FC-RTS architecture (Figure 4.1).
The controlled system includes the QoS Actuator, the scheduled real-time system, the
Basic Scheduler, and the Monitor. The input to the controlled system is the control input,
i.e., the change in the total estimated utilization D_B(k). The output of the controlled
system includes the controlled variables, the miss ratio M(k) and the utilization U(k).
Although it is difficult to precisely model a nonlinear and time-varying system such as a
real-time system, we can approximate such a system with a linear model for the purpose
of control design because of the robustness of feedback control with regard to system
variations. The block diagram of the controlled system model is illustrated in Figure 4.2.
We now derive the model from the control input, D_B(z), through each block in Figure
4.2. The goal is to derive the transfer function from the control input to the output, the
controlled variables U(z) and M(z). While the block diagram is expressed in the
z-domain, which is amenable to control design, we describe the equivalent notation and
formulas in the time domain in the following for clarity of presentation. For example,
D_B(k) in the time domain is equivalent to D_B(z) in the z-domain.
Figure 4.2. The Model of the Controlled System (the control input D_B(z) passes through
the integrator 1/(z-1) to give B(z), through the gain G_A to give A(z), and through
saturating blocks with thresholds A_th and 1 to give the outputs M(z) and U(z))
Starting from the control input D_B(k), the total estimated utilization B(k) is the
integration of the control input D_B(k):

B(k+1) = B(k) + D_B(k)    Equation 4.1

Since the precise execution time of each task is unknown and time varying, the total
(actual) requested utilization A(k) may differ from the total estimated requested
utilization B(k):

A(k) = G_a(k) B(k)    Equation 4.2

where G_a(k), called the utilization ratio, is a time-variant ratio that represents the extent
of workload variation in terms of total requested utilization. For example, G_a(k) = 2
means that the actual total requested utilization is twice the estimated total utilization.
Since G_a(k) is time variant, we use the maximum possible value G_A = max{G_a(k)},
called the worst-case utilization ratio, in control design to guarantee stability in all
cases. Hence Equation 4.2 can be simplified to the following formula for the purpose of
control design:

A(k) = G_A B(k)    Equation 4.3
The relationship between the total requested utilization A(k) and the controlled
variables is nonlinear due to saturation, i.e., the controlled variables remain constant in
their saturation zones regardless of the control input D_B(k). Saturation complicates the
control design because the controlled variables become unresponsive to the control in
their saturation zones. When the CPU is underutilized, the utilization U(k) is outside its
saturation zone and stays close to the total requested utilization A(k):

U(k) = A(k)  (A(k) <= 1)    Equation 4.4

However, since the utilization can never exceed 100%, U(k) saturates when the CPU is
overloaded:

U(k) = 1  (A(k) > 1)    Equation 4.5

In contrast, the miss ratio M(k) saturates at 0 when the CPU is underutilized, i.e., when
the total requested utilization is below a threshold A_th(k), called the schedulable
utilization threshold, or utilization threshold for simplicity:

M(k) = 0  (A(k) <= A_th(k))    Equation 4.6
In existing real-time scheduling theory, schedulable utilization bounds have been
derived for various real-time scheduling policies under different workload assumptions
[7][43][48][72]. A utilization bound A_b is typically defined as a fixed lower bound over
all possible workloads under certain assumptions, while we define the utilization
threshold A_th(k) as the time-varying actual threshold for the system's particular
workload in the k-th sampling period (and hence A_b <= A_th(k)). Since it is always true
that A_th(k) <= 1, the saturation zone of the CPU utilization (A(k) > 1) and that of the
miss ratio (A(k) <= A_th(k)) are guaranteed to be mutually exclusive. This property
means that at any instant of time, at
least one of the controlled variables does not saturate. Note that different scheduling
policies in the Basic Scheduler usually lead to different utilization thresholds A_th(k).
For example, if EDF is plugged into the FC-RTS architecture and the workload is
composed of independent and periodic tasks, the utilization threshold is A_th = 100%. In
comparison, the utilization threshold is usually lower than 100% if RM is plugged into
the architecture. Therefore, the scheduling policy and the workload characteristics affect
the choice of the controlled variable and its reference (see Section 4.4.4).
When A(k) > A_th(k), M(k) usually increases nonlinearly with the total requested
utilization A(k) (as demonstrated in Section 4.5.5). The relationship between M(k) and
A(k) is linearized by taking the derivative at the vicinity of the operating point
(A(k) = A_th(k)):

G_m = dM(k) / dA(k)    Equation 4.7

In practice, the miss ratio factor G_m can be estimated experimentally by plotting an
M(k) curve as a function of A(k) based on experimental data and measuring its slope at
the vicinity of the point where M(k) starts to become nonzero (see Section 4.5.5). At the
vicinity of A(k) = A_th(k), we have the following linearized formula:

M(k) = M(k-1) + G_m (A(k) - A(k-1))  (A(k) > A_th(k))    Equation 4.8
Since G_m is usually different at different load levels, we use the worst-case miss ratio
factor G_M, defined as the maximum value of the measured G_m over the likely load
range, in Controller tuning to guarantee stability. Note that different scheduling policies
in the Basic Scheduler usually have different miss ratio factors, and hence the choice of
the scheduling policy has a direct impact on the Controller parameters (see Section
4.4.4). From Equations 4.1-4.8, we can derive a transfer function for each controlled
From Equations (4.1)-(4.8), we can derive a transfer function for each controlled







60
variable when it is outside its saturation zone:
Utilization control: Under the condition that A(k) < 1, there exists a transfer function
H
U
(z) from the control input D
B
(z) to CPU utilization U(z) = P
U
(z)D
B
(z) and
P
U
(z) = G
A
/ (z-1) (A(k) < 1) Equation 4.9
Miss ratio control: Under the condition that A(k) > A
th
(k), there exists a transfer
function H
M
(z) from the control input D
B
(z) to Miss Ratio M(z) = P
M
(z)D
B
(z) and
P
M
(z) = G
A
G
M
/ (z-1) (A(k) > A
th
(k)) Equation 4.10
Since the model for miss-ratio is the same as the utilization except for the extra miss-
ratio factor G
M
in Equation 4.10, for simplicity of discussion we use a same formula P(z)
to represent the transfer functions of both controlled variables:
P (z) = G / (z-1) Equation 4.11
where G is called the process gain. G = G
A
for utilization control and G = G
A
G
M
for miss
ratio control.
In summary, the controlled system is approximated by a first-order transfer function
(Equation 4.11) with a different saturation zone (Equations 4.5 and 4.6) for each
controlled variable, the utilization U(k) and the miss ratio M(k), respectively. Note that
the saturation properties make the controlled system nonlinear and lead to special
challenges for the Controller design.
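To make the saturation behavior concrete, the following minimal Python sketch (ours, not from the dissertation; the parameter values are illustrative) simulates Equations 4.1-4.8 for a fixed utilization ratio, miss ratio factor, and threshold:

def simulate_plant(DB_inputs, G_a=1.5, G_m=1.0, A_th=1.0):
    """Simulate B(k), A(k), U(k), M(k) for a sequence of control inputs."""
    B, M, A_prev, trace = 0.0, 0.0, 0.0, []
    for D_B in DB_inputs:
        B += D_B                      # Equation 4.1: B(k+1) = B(k) + D_B(k)
        A = G_a * B                   # Equation 4.2: utilization ratio
        U = min(A, 1.0)               # Equations 4.4/4.5: U saturates at 1
        if A > A_th:                  # Equation 4.8, linearized near A_th
            M = max(0.0, M + G_m * (A - A_prev))
        else:
            M = 0.0                   # Equation 4.6: miss ratio saturates at 0
        A_prev = A
        trace.append((B, A, U, M))
    return trace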
4.4. Design of FC-RTS Algorithms
In this section, we apply control design methods and analysis to the Controller, the key
component of feedback control scheduling algorithms. We first present the control
algorithm and the model of the feedback control loop for each controlled variable. Based
on the analytical system models, we apply control theory to tune the Controller and
develop a mathematical analysis on the performance profiles of the resultant Controller.
We then design several FC-RTS algorithms to handle the saturation zones in different
types of real-time systems.
4.4.1. Design of the Controller
At each sampling instant k, the Controller computes a control input D_B(k), the change in
the total estimated requested utilization, based on the miss ratio error E_M(k) = M_S - M(k)
and/or the CPU utilization error E_U(k) = U_S - U(k). In this section, we focus on a
Controller for a single controlled variable. The goals of a Controller include (1)
guaranteed stability, (2) zero steady state error, (3) zero sensitivity to workload
variations, and (4) satisfactory settling time and overshoot. Since the same control
function can be used for both controlled variables, we use the same symbol E(k) to
represent the miss ratio error E_M(k) and the utilization error E_U(k) in the rest of this
section. Similarly, we use S to denote the miss ratio reference M_S or the utilization
reference U_S, and the symbol O to denote the corresponding system output, i.e., the
miss ratio M(k) or the utilization U(k).
For the FC-RTS architecture illustrated in Figure 4.1, we choose a simple P
(Proportional) control function [32] to compute the control input. The P control function
is given by Equation 4.12(a) in the time domain and Equation 4.12(b) in the z-domain,
where K_P is a tunable parameter:

D_B(k) = K_P E(k)  (a)
C(z) = K_P  (b)    Equation 4.12
The rationale for using a P Controller instead of a more sophisticated Controller such
as a PID (Proportional-Integral-Derivative) Controller is that the controlled system
already includes an integrator in the QoS Actuator (see Equation 4.1), so that zero steady
state error can be achieved without an I (Integral) term in the Controller (see the detailed
analysis in Section 4.4.2). The D (Derivative) term is not appropriate for controlling
real-time systems because derivative control may amplify the noise in miss ratio and
utilization caused by random workloads.
The performance of the real-time system depends on the Controller parameter K_P. An
ad hoc approach to designing the Controller is to repeat numerous experiments with
different parameter values. In our work, we instead apply established control theory
methods to tune the parameter analytically to guarantee the performance specifications.
We first tune the Controller for each of the controlled variables in Section 4.4.2 based
on the linear model of the controlled system (Equation 4.11). Due to the saturation
properties, the performance of the closed-loop system may deviate from the linear case;
we address this issue in Section 4.4.4.
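As a concrete illustration (ours, not the dissertation's implementation), the P control law of Equation 4.12 is a one-line computation, shown here together with the integrator of Equation 4.1 realized by the QoS Actuator:

def p_control_step(S, O_k, B_k, K_P):
    """One sampling instant: E(k) = S - O(k), D_B(k) = K_P * E(k)."""
    D_B = K_P * (S - O_k)        # Equation 4.12(a)
    return B_k + D_B             # new utilization constraint B(k+1)

With K_P tuned as derived in Section 4.4.3, this computation is all the Controller does at each sampling instant.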
Figure 4.3. Closed-Loop System Model for Real-Time CPU Scheduling ((a) miss ratio
control: the reference M_S drives the Controller K_P and the plant G_A/(z-1), with the
internal load L(z) entering as a disturbance scaled by G_M before the output M(z);
(b) utilization control: the same loop with reference U_S, disturbance L(z), and output U(z))
4.4.2. Closed-Loop System Model
For the purpose of control design, the system output is the controlled variable O(k) (the
miss ratio M(k) or the utilization U(k)). There are two input signals to a closed-loop
system with a single (miss ratio or utilization) Controller.
Reference Input and Arrival Overload
The first input is the performance reference S (i.e., M_S or U_S), modeled as a step
signal, Sz/(z-1) in the z-domain. Note that with the arriving-time QoS control mechanism
in our FC-RTS architecture, the particular form of the load profile does not affect the
system's response, because the actual tasks admitted into the system are always
determined by the QoS Actuator. Therefore, the system response to the reference input
corresponds to the system performance in response to arrival overload. Given the model
of the controlled system P(z) (Equation 4.11) and the Controller C(z) (Equation 4.12),
we can establish the same closed-loop transfer function for both miss ratio and
utilization control in response to the reference input (see the block diagrams in Figure 4.3):
H_S(z) = C(z)P(z) / (1 + C(z)P(z)) = K_P G / (z - 1 + K_P G)  (a)
O(z) = (Sz/(z-1)) H_S(z)  (b)    Equation 4.13

where G = G_A in utilization control, and G = G_A G_M in miss ratio control.
Disturbance Input: Internal Overload
The second input to the closed-loop system is the internal overload that occurs when the
admitted tasks' CPU utilizations vary. The internal overload can be modeled as a
disturbance that adds to the total requested utilization A(k) (see Figure 4.3(a)). In
particular, a step load SL(L_n, L_m) is modeled as a step signal L(k) that jumps
instantaneously from 0 to (L_m - L_n), or L(z) = (L_m - L_n)z/(z-1) in the z-domain; a
ramp load RL(L_n, L_m, T_R) is modeled as a ramp signal L(k) that increases linearly from
0 to (L_m - L_n) over a duration of T_R sec. Note that in the case of the internal overload
input, the specific load profile decides the input signal and therefore has a direct impact
on the system performance. In this thesis, we focus our analysis on the step load profile
because it represents a more severe load variation than a ramp load of finite duration.
For the disturbance input, the transfer function for utilization control and the system
output in response to the internal overload are as follows:

H_D(z) = 1 / (1 + C(z)P(z)) = (z - 1) / (z - 1 + K_P G_A)  (a)
O(z) = (Sz/(z-1)) H_S(z) + L(z) H_D(z)  (b)    Equation 4.14

The above transfer function is also applicable to miss ratio control, except that the
disturbance input should be transformed to G_M L(k), or G_M L(z) in the z-domain, to
take into account the extra G_M term in Figure 4.3(a).
4.4.3. Control Tuning and Analysis
We now present the tuning and analysis of the utilization Controller and the miss ratio
Controller based on the analytical models described in Equations 4.13 and 4.14.
According to control theory, the performance profile of a system depends on the poles of
its closed-loop transfer function. Based on Equations 4.13(a) and 4.14(a), we can place
the closed-loop pole p = 1 - K_P G at the desired location by choosing the right value
for the control parameter K_P. We now present the details of using control theory to
derive K_P to achieve the desired performance profile.
Stability Condition: The sufficient and necessary condition for the utilization and
the miss ratio control to guarantee stability is:

0 < K_P < 2/G    Equation 4.15

Proof: According to control theory, a system guarantees stability if and only if all the
poles {p_j | 0 <= j <= n} (n is the total number of poles) of its transfer function lie
inside the unit circle of the z-plane [33]:

|p_j| < 1  (0 <= j <= n)

From Equations 4.13(a) and 4.14(a), the only pole of the utilization and the miss ratio
control system in response to the arrival overload and the internal overload is

p_0 = 1 - K_P G    Equation 4.16

Hence, the utilization control and the miss ratio control guarantee stability if and only if
|1 - K_P G| < 1. Therefore, the sufficient and necessary condition for stability is
Equation 4.15.

We derive the steady state performance of the utilization and the miss ratio control
system by applying the Final Value Theorem to the system output in Equations 4.13(b)
and 4.14(b). The following steady state analysis assumes that the stability condition in
Equation 4.15 is satisfied.

Steady state error (arrival overload): In response to an arrival overload, the miss
ratio and the utilization control guarantee a zero steady state error, i.e., E_SC = 0.

Proof: Let O(z) be the output of a stable system. The Final Value Theorem [33] of
digital control theory shows that the system output converges to the final value

O(∞) = lim_{z→1} (z-1) O(z)

From Equation 4.13(b), the output of the utilization and the miss ratio control in
response to an arrival overload is

O(z) = (Sz/(z-1)) * K_P G / (z - 1 + K_P G)

where S represents the performance reference. Applying the Final Value Theorem to the
above equation, the final value of the utilization and miss ratio control is

O(∞) = lim_{z→1} (z-1) O(z) = lim_{z→1} z S K_P G / (z - 1 + K_P G) = S    Equation 4.17

It follows that the steady state error is E_SC = S - O(∞) = 0.
Steady state error (internal overload): In response to an internal overload, the miss
ratio and the utilization control achieve a zero steady state error, i.e., E_SC = 0.

Proof: From Equation 4.14(b), the system output of the utilization and miss ratio
control in response to an internal overload SL(L_n, L_m) is

O(z) = (Sz/(z-1)) * K_P G / (z - 1 + K_P G) + (Lz/(z-1)) * (z - 1) / (z - 1 + K_P G)

where L = L_m - L_n for the utilization control, and L = G_M (L_m - L_n) for the miss
ratio control. Applying the Final Value Theorem to the above equation, the final value
of the utilization control and the miss ratio control is

O(∞) = lim_{z→1} (z-1) O(z) = S + lim_{z→1} z L (z-1) / (z - 1 + K_P G) = S

since the disturbance term vanishes because of the (z-1) factor in H_D(z). It follows
that the steady state error is E_SC = S - O(∞) = 0.

Sensitivity: Assuming stability, the steady-state performance of the utilization control
and the miss ratio control has zero sensitivity with regard to task execution times,
inter-arrival times, and the miss ratio factor.

Proof: According to the definition in Equation 4.11, G = G_a(k) for the utilization
control, and G = G_a(k)G_m(k) for the miss ratio control. The variation in G_a(k)
represents the variation in the task execution times and/or inter-arrival times, and the
variation in G_m(k) represents the variation in the miss ratio factor. From Equation 4.17
and the corresponding result for the internal overload, the final output of the utilization
and miss ratio control system in response to the arrival overload and the internal
overload always equals the performance reference S for any value of G that satisfies the
stability condition (Equation 4.15). It follows that S_G = 0.

In summary of our steady state analysis, we have proven that, under the stability
condition in Equation 4.15, the utilization control and the miss ratio control always
achieve the performance reference in steady state in response to arrival and internal
overload. Furthermore, we have also shown that this guarantee is robust with regard to
task execution times, inter-arrival times, and the miss ratio factor.
Transient State Performance
According to control theory, for the system transfer function in Equation 4.13(a), the
overshoot remains zero in response to an arrival overload if the closed-loop pole
satisfies p_0 >= 0. From Equation 4.16, the only pole is p_0 = 1 - K_P G. Hence the
utilization control and the miss ratio control achieve zero overshoot if and only if

0 < K_P <= 1/G

and the settling time decreases as the Controller parameter increases within this range.
We place the pole at p_0 = 0.63 by setting K_P = 0.37/G, or:

Miss Ratio Control: K_P = 0.37/(G_A G_M)  (a)
Utilization Control: K_P = 0.37/G_A  (b)    Equation 4.18

The above values of the Controller parameter K_P have the following properties based
on control analysis.
1) The parameters in Equation 4.18 satisfy the stability condition in Equation 4.15.
2) Since the control parameter values in Equation 4.18 also satisfy the zero-overshoot
condition, the overshoot in response to the reference input is:

Miss Ratio Control: M_O = 0, M_max = M_S  (a)
Utilization Control: U_O = 0, U_max = U_S  (b)    Equation 4.19
3) However, the Controller cannot affect the overshoot in response to the disturbance
input, which directly changes the output before any control action can take place:

Miss Ratio Control: M_O = G_M (L_m - L_n)/M_S, M_max = M_S + G_M (L_m - L_n)  (a)
Utilization Control: U_O = (L_m - L_n)/U_S, U_max = U_S + (L_m - L_n)  (b)    Equation 4.20

4) Regarding the system as in the steady state if its output O(k) is within 2% of its
final value, the above pole placement corresponds to the same settling time in response
to the reference input and the disturbance input:

Miss Ratio/Utilization Control: T_s = 4.5 sec    Equation 4.21
However, the above settling time is not applicable to the miss ratio control in
response to an arrival overload, because the miss ratio M(k) saturates at 0. Assume an
arrival overload occurs to an idle system at time 0; the miss ratio control observes
M(0) = 0, which results in a control signal of K_P (M_S - M(0)) = K_P M_S. Since M_S is
typically small, the control signal is also small. Due to the saturation problem, the miss
ratio will stay at 0 and cause the control signal to remain small. This property can cause
the utilization and miss ratio to increase more slowly than in the case of the linear model
and result in a longer settling time than Equation 4.21. One solution is to assign a high
initial value to the estimated requested utilization B(k) when the system is idle, which
helps push the system out of the saturation zone faster than a zero initial B(k). This
solution is adopted in our Evaluation Experiment B in Section 4.5.9.
Based on the above analysis, we have the following conclusions on the transient
performance of the closed-loop system.
Arrival Overload
From Equation 4.21, in response to an arrival overload the output settles to within 2%
of the performance reference in 4.5 sec. Furthermore, Equation 4.19(a) ensures that with
miss ratio control, the system miss ratio never exceeds the miss ratio reference in
response to an arrival overload. Similarly, Equation 4.19(b) ensures that with utilization
control, the CPU utilization never exceeds the utilization reference in response to an
arrival overload. We use MATLAB to plot the closed-loop system's step response to a
unit reference input in Figure 4.4.
Figure 4.4. System Response to Reference Input (MATLAB step response of the
closed-loop system; the output rises to the unit reference with zero overshoot and a
settling time of 4.5 sec)
Figure 4.5. System Response to Disturbance Input (MATLAB step response to a unit
disturbance with the reference set to zero; the disturbance is rejected within the 4.5 sec
settling time)
Internal Overload
From Equation 4.21, the system output recovers to within 2% of the performance
reference in 4.5 sec after the beginning of an internal step overload. However,
Equations 4.20(a) and 4.20(b) show that the system suffers from a non-zero overshoot
during the transient state in response to an internal step overload. With miss ratio
control, the system miss ratio M(k) can overshoot the reference M_S by G_M (L_m - L_n).
With utilization control, the CPU utilization can overshoot the reference U_S by
(L_m - L_n). We use MATLAB to plot the closed-loop system's step response to a unit
disturbance input (with the reference input set to zero) in Figure 4.5.
Impact of System/Workload Variations on Performance Profiles
Because a real-time system is usually a time-varying system (as discussed in Section 4.3),
an important issue is how variations in the system and workload affect the above
analysis results, which are based on fixed values of the system parameters. Specifically,
since G_a(k) and G_m(k) may differ from the worst-case utilization ratio G_A and miss
ratio factor G_M, we analyze in the following how changes in the miss ratio factor
G_m(k) and the utilization ratio G_a(k) affect the performance profile of the closed-loop
system.
Stability
Based on the stability condition in Equation 4.15 and the Controller parameters in
Equations 4.18(a) and 4.18(b), we can derive the ranges of G_m(k) and G_a(k) within
which system stability is guaranteed:

Miss Ratio Control: 0 < G_a(k)G_m(k) < 5.4 G_A G_M
Utilization Control: 0 < G_a(k) < 5.4 G_A    Equation 4.22
Note that since we usually compute the Controller parameter K_P based on worst-case
estimations such that G_A >= G_a(k) > 0 and G_M >= G_m(k) > 0, our closed-loop
system guarantees stability. Furthermore, even if the actual system parameters exceed
the design-time estimations (due to estimation error or dramatic system change),
stability is still guaranteed by the closed-loop system as long as G_a(k) and G_m(k) stay
within the above stability range.
Steady State Performance
We have proven that both the miss ratio control and the utilization control achieve their
performance references in steady state as long as the system remains stable. Therefore,
both controls provide robust and accurate performance guarantees in steady state
regardless of the actual values of the miss ratio factor and utilization ratio, as long as
they stay in the stability range (Equation 4.22).
Figure 4.6. Settling Time vs. Process Gain (settling time T_s plotted against the process
gain G, where G = G_a G_m for miss ratio control and G = G_a for utilization control)

Transient Performance
Unlike stability and steady state performance, the closed-loop system's settling time is
sensitive to variations in the miss ratio factor G_m(k) and the utilization ratio G_a(k).
Assume we use an estimate of G_A = 2.0 to compute the utilization control parameter
K_P = 0.37/G_A = 0.185 (as in our experiments in Section 4.5); we then plot the
theoretical settling time corresponding to different process gains G in Figure 4.6. The
settling time decreases from 12.5 sec to 4 sec as the process gain G increases from 0.8
to 2.2. This result shows that with the same Controller parameter K_P, the system reacts
faster to overload when its utilization ratio and miss ratio factor are larger. Therefore, a
P Controller with a fixed parameter K_P cannot guarantee a fixed settling time. Instead,
if the range of the process gain G is known, a range of settling times can be guaranteed.
For example, if we know that the process gain stays in the range 0.8 <= G <= 2.0, the
settling time can be guaranteed to be in the range 4.5 <= T_s <= 12.5 (sec), as shown in
Figure 4.6.
Similarly, the overshoot is also sensitive to variations in the process gain. For our
closed-loop transfer function in response to an arrival overload (Equation 4.13(a)), the
overshoot remains zero if the closed-loop pole satisfies p_0 >= 0. Therefore, the system
achieves zero overshoot in response to an arrival overload if the miss ratio factor and
the utilization ratio satisfy:

Miss Ratio Control: 0 < G_a(k)G_m(k) < 2.7 G_A G_M
Utilization Control: 0 < G_a(k) < 2.7 G_A    Equation 4.23

In summary, given the system parameters, the worst-case utilization ratio G_A and the
miss ratio factor G_M, we can directly derive the control parameter K_P based on
Equations 4.18(a) and 4.18(b) to guarantee a set of performance profiles including
stability, zero steady state error, and a satisfactory range of transient performance. Note
that this analytical tuning of the control parameter is significantly easier and less time
consuming than ad hoc approaches based on repeated simulation experiments. This is an
important advantage of using our control theory based FCS framework instead of
ad hoc solutions.
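The trend of Figure 4.6 follows directly from the pole p_0 = 1 - K_P G; the short Python sketch below (ours; the gain values are illustrative) computes the settling time in sampling periods for the fixed K_P = 0.185 designed for G_A = 2.0:

import math

K_P = 0.37 / 2.0                         # designed for worst-case gain G_A = 2.0
for G in (0.8, 1.2, 1.6, 2.0):
    p0 = 1 - K_P * G                     # closed-loop pole under the actual gain G
    samples = math.ceil(math.log(0.02) / math.log(p0))
    print(f"G = {G:.1f}: pole = {p0:.3f}, settles in {samples} samples")

The settling time shrinks roughly threefold as G grows from 0.8 to 2.0, matching the 12.5-to-4.5 sec range quoted above.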
4.4.4. FC-RTS Algorithms
In this section, we present the design of FC-RTS algorithms based on the utilization
and/or miss ratio control to achieve satisfactory performance profiles in different types of
real-time systems. We also discuss the impact of the basic scheduling policy and the
workload on the design of the FC-RTS algorithms.
FC-U: Feedback Utilization Control
The FC-U scheduling algorithm uses a utilization control loop (see Figure 4.3(b)) to
control the utilization U(k). FC-U can guarantee that the system has a zero miss ratio in
steady state if its reference satisfies U_S <= A_th, where A_th is the schedulable
utilization threshold of
the system.
Because the CPU utilization U(k) saturates at 100%, FC-U cannot detect how severely
the system is overloaded when U(k) remains at 100%. The consequence of this problem
is that FC-U can have a longer settling time than the analysis results based on the linear
model in severely overloaded conditions. The closer the reference is to 100%, the longer
the settling time will be, because the utilization control measures an error of smaller
magnitude and thus generates a smaller control input than in the ideal case described by
the linear model (Equation 4.11). For example, suppose the total requested utilization is
A(k) = 200% and the utilization reference is 99%; the error measured by the Controller
would be E_U = U_S - U(k) = 0.99 - 1 = -0.01, whereas the error would have been
E_U = U_S - U(k) = 0.99 - 2 = -1.01 according to the linear model of Equation 4.11. In
the extreme case, U_S = 100% can cause the system to stay in overload (a settling time
of infinity) because the error E_U = 0 even when the system is severely overloaded.
Therefore, the reference U_S should keep enough distance from 100% (e.g.,
U_S <= 90%) to alleviate the impact of saturation on the control performance.
FC-U is especially appropriate for systems whose utilization bound is known a priori
and not pessimistic. In such systems, FC-U can guarantee a zero miss ratio in steady
state if its reference satisfies U_S <= A_b <= A_th. For example, FC-U can perform well
in a system with EDF scheduling and a periodic, independent task set, because its
utilization bound is 100%. However, FC-U is not applicable to systems whose
utilization bounds are unknown or significantly pessimistic. In such systems, a reference
that is too optimistic (higher than the utilization threshold) can cause a high miss ratio
even in steady state. On the other hand, a reference that is too pessimistic can
unnecessarily underutilize
the system.
FC-M: Feedback Miss Ratio Control
The FC-M scheduling algorithm uses a miss ratio control loop (see Figure 4.3(a)) to
directly control the system miss ratio M(k) (FC-M has been called FC-EDF when EDF
is plugged into the Basic Scheduler [52]). Compared with FC-U, the advantage of FC-M
is that it does not depend on any knowledge of the utilization bound and is therefore
applicable in many real-world systems. In the process of directly controlling the miss
ratio, the miss ratio control loop always drives the total requested utilization A(k) to the
vicinity of the (unknown) utilization threshold A_th(k). An additional advantage of
FC-M is that it can achieve higher CPU utilization than FC-U, because the utilization
threshold is often higher than the utilization bound.
Similar to FC-U, FC-M has restrictions on the miss ratio reference M_S due to
saturation. Because the miss ratio M(k) saturates at 0, FC-M cannot detect how severely
the system is underutilized. Therefore FC-M can have a longer settling time than the
analysis results based on the linear model (Figure 4.3(a)) in severely underutilized
conditions, and the settling time increases as the miss ratio reference decreases. This is
because the miss ratio control measures an error of smaller magnitude and generates a
smaller control input than in the case of the linear model (Equation 4.11). For example,
suppose the total requested utilization is A(k) = 10% and the miss ratio reference is
M_S = 1%; the error measured by the Controller would be E_M = M_S - M(k) =
0.01 - 0 = 0.01, whereas the error would have been much larger according to the linear
model because it would have a negative miss ratio. In the extreme case, M_S = 0 can
cause the CPU to remain underutilized because the error E_M = 0 even when the system
is severely underutilized. Therefore, the miss ratio reference should keep some distance
from the saturation boundary 0 (e.g., M_S >= 1%) to alleviate the impact of saturation
on the control performance. Unfortunately, a positive miss ratio reference also means
that the system cannot achieve a zero miss ratio in steady state.
In summary, the FC-M scheduling algorithm (with a small positive miss ratio
reference) can achieve a low deadline miss ratio (close to M_S) and high CPU utilization
even if the system's utilization bound is unknown or time-varying [52]. Since FC-M
cannot guarantee a zero deadline miss ratio in steady state, it is only applicable to soft
real-time systems that can tolerate sporadic deadline misses in steady state.
[Block diagram: task arrivals enter the CPU system; the Monitor samples the miss ratio
M(k) and the utilization U(k); two PI Controllers compare them with the references M_S
and U_S and compute D_BM and D_BU; a Min block selects D_B = min(D_BM, D_BU),
which drives the QoS Actuator to adjust the QoS levels of the current tasks handled by
the Basic Scheduler; completed or aborted tasks leave the system.]
Figure 4.7. The FC-UM Algorithm
FC-UM: Integrated Utilization/Miss Ratio Control
The FC-UM algorithm (also called FC-EDF^2 when EDF is plugged into the Basic
Scheduler [50]) integrates miss ratio control and utilization control (Figure 4.7) to achieve
the advantages of both FC-U and FC-M. In this integrated control scheme, both the miss
ratio M(k) and the utilization U(k) are monitored. At each sampling instant, M(k) and U(k)
are fed back to two separate Controllers, the miss ratio Controller and the utilization
Controller, respectively. Each Controller then computes its control input independently.
The control input of the utilization Controller, D_BU(k), is compared with the control
input of the miss ratio Controller, D_BM(k), and the smaller one, D_B(k) =
min(D_BU(k), D_BM(k)), is sent to the QoS Actuator.
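
The min operation is easy to state in code. The sketch below is our own simplification
(proportional terms only, hypothetical names; the gains and references are the ones used
in the experiments of Section 4.5):

    # Sketch of one FC-UM sampling step: compute both control inputs
    # independently and forward the smaller (more conservative) one.
    def fc_um_step(m, u, kp_m=0.414, kp_u=0.185, m_ref=0.02, u_ref=0.90):
        d_bm = kp_m * (m_ref - m)   # miss ratio control input D_BM(k)
        d_bu = kp_u * (u_ref - u)   # utilization control input D_BU(k)
        return min(d_bu, d_bm)      # D_B(k), sent to the QoS Actuator

    # At the start of a run (M(0) = 0, U(0) = 0) the saturated miss ratio loop
    # yields the smaller input (0.008 < 0.167) and therefore dominates at first.
    print(fc_um_step(m=0.0, u=0.0))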
Note that the advantage of FC-U is that it can achieve excellent performance (M(k) =
0) in steady state if the utilization reference is correct, while the advantage of FC-M is
that it always achieves a low (but non-zero) miss ratio and is therefore more robust in the
face of utilization threshold variations. The integrated control structure achieves the
advantages of both controls for the following reasons. If used alone, the utilization
control loop would drive the total requested utilization A(k) to its reference U_S in steady
state, and the miss ratio control loop would drive A(k) to the vicinity of the utilization
threshold A_th(k) in steady state. Due to the min operation on the two control inputs, the
integrated control loop drives the total requested utilization to the lower of the two
values, min(A_th(k), U_S). The implication of this feature is that the integrated control
loop always achieves the performance of the relatively more conservative control loop in
steady state. Specifically, in a system scheduled by FC-UM, if U_S ≤ A_th(k), the
utilization control dominates in steady state and guarantees that the total requested
utilization A(k) stays close to the utilization reference U_S, and thus the miss ratio
M(k) = 0 in steady state. On the other hand, if U_S > A_th(k), the miss ratio control
dominates in steady state and guarantees that the total requested utilization stays close to
the utilization threshold A_th(k), and the miss ratio M(k) = M_S in steady state.
Therefore, in a system with the FC-UM scheduler, the system administrator can
simply set the utilization reference U_S to a value that causes no deadline misses in the
nominal case (e.g., based on system profiling or experience), and set the miss ratio
reference M_S according to the application's requirement on miss ratio. FC-UM can
guarantee zero deadline misses in the nominal case while guaranteeing that the miss ratio
stays close to M_S even if the utilization threshold of the system drops below the
utilization reference. Our experimental results demonstrate that FC-UM achieves
satisfactory performance. The rigorous analysis of the integrated Controller is left for
future work.
Impacts of Scheduling Policies and Applications on FC-RTS Algorithm Design
An important factor that affects the design of FC-RTS algorithms is whether an a
priori known and non-pessimistic utilization bound exists for the scheduling policy and
workload of a system. Existing real-time scheduling theory has derived schedulable
utilization bounds for various scheduling policies under different workload assumptions.
For example, assuming a periodic and independent task set, it has been established that
the schedulable utilization bounds of EDF and RM are 100% and 69%, respectively [48].
More recently, it has been proven that the schedulable utilization bound of Deadline
Monotonic scheduling is 58% for general aperiodic and periodic tasks in the ideal case
[7]. Other papers have established schedulable utilization bounds for other types of
workloads (e.g., [40][71]). Since FC-U can guarantee a miss ratio M(k) = 0 in steady state
if its utilization reference U_S ≤ A_b, the utilization reference should be determined based
on the scheduling policy and workload of a system. For example, for an independent and
periodic task set scheduled by EDF, U_S = 90% is sufficient to guarantee that the miss
ratio stays at 0 in steady state. Because FC-U can achieve a zero steady-state miss ratio, it
is the most appropriate FC-RTS algorithm for systems with a known and non-pessimistic
utilization bound. FC-UM can also achieve a zero steady-state miss ratio in this type of
system, but it is more complicated than FC-U.
Unfortunately, the utilization bounds of many unpredictable real-time systems are
still unknown. For example, in a typical on-line trading server, database transactions and
web request processing can be blocked frequently due to concurrency control, disk I/O,
and TCP congestion control. The task arrival patterns may also vary considerably
because the workload is composed of periodic price-updating tasks and unpredictable,
aperiodic stock-trading request processing. Deriving a utilization bound on top of a
commercial OS can be even more difficult due to unpredictable kernel activities such as
interrupt handling. Another issue is that a theoretical utilization bound can be severely
pessimistic for the specific workload currently in a system. For example, although the
utilization bound of Rate Monotonic is 69% for periodic independent tasks, uniformly
distributed task sets often do not suffer deadline misses even when the CPU utilization
reaches 88% [44]. Enforcing the utilization at the utilization bound may therefore not be
cost-effective in soft real-time systems. FC-M and FC-UM are more appropriate than
FC-U for systems without a known and non-pessimistic utilization bound.
We should note that different scheduling policies and workloads usually introduce
different miss ratio factors G_M. Because the gain K_P of the miss ratio Controller should
be inversely proportional to the miss ratio factor (Equation 4.18(b)), the scheduling policy
and workload directly affect the correct parameters of the miss ratio Controller. For
example, our previous experiments showed that while the EDF algorithm with a periodic
task set led to a miss ratio factor G_M = 1.254, the Extended Deadline Monotonic (DM)
algorithm with a mixed periodic and aperiodic task set has a much smaller miss ratio
factor G_M = 0.447 (see Section 4.5.5). This result means that for DM with the mixed
task set, the K_P of the miss ratio Controller should be 2.81 times the K_P of EDF with
the periodic task set in order to achieve a similar performance profile.
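
As a concrete check of this inverse proportionality, the rescaling is a one-line
computation (our own arithmetic; the gains are the ones derived in Section 4.5.6):

    # K_P is inversely proportional to G_M (Equation 4.18(b)), so the DM/PA
    # gain follows from the EDF/P gain and the ratio of the miss ratio factors.
    kp_edf_p = 0.148                       # FC-M gain for EDF/P
    kp_dm_pa = kp_edf_p * (1.254 / 0.447)  # the factor of 2.81 mentioned above
    print(round(kp_dm_pa, 3))              # ~0.415, i.e., the 0.414 of Table 4.2
                                           # up to rounding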

In summary, we have designed three FC-RTS algorithms (FC-U, FC-M, and FC-UM)
using control theory based on an analytical model of a real-time system. Our
control-theoretic analysis proves that the resultant FC-RTS algorithms achieve the
following performance guarantees under the stability condition in Equation 4.22:
(1) guaranteed stability,
(2) a system miss ratio and utilization that remain close to the corresponding
performance references in steady state, and
(3) a satisfactory settling time (Figure 4.6) and zero overshoot in the transient state
under the condition of Equation 4.23.
This design methodology is in contrast with existing ad hoc design methods that
depend on laborious design and testing iterations. We also investigate the impacts of
scheduling policies and workloads on the design of FC-RTS algorithms.
4.5. Experiments
In this section, we describe the simulation experiments that evaluate the performance of
our FC-RTS algorithms and the correctness of our control design. We first describe a
real-time CPU scheduling simulator used for our experiments. We then describe the
configurations of the experiments and workloads. A set of profiling experiments on the
controlled system is presented next. We then present two sets of evaluation experiments
on our FC-RTS algorithms.
4.5.1. FECSIM Real-Time System Simulator
The FC-RTS architecture is implemented on a generic uniprocessor real-time system
simulator called FECSIM [52]. FECSIM (Figure 4.8) has five components: a set of
Sources that each generates a periodic or aperiodic task; an Executor that emulates the
Basic Scheduler and the execution of the tasks; a Monitor that periodically measures the
controlled variables; a Controller that periodically computes the control input based on
the performance errors; and a QoS Actuator that adjusts the QoS levels of the tasks to
optimize the system value (based on estimated utilizations) under the utilization
constraints. Different basic real-time scheduling policies can be plugged into the
Executor. The Controller can be turned on or off to emulate closed-loop or open-loop
scheduling. The QoS Actuator can also be turned off for system profiling experiments
(see Section 4.5.5).
[Block diagram: Sources 1..n feed tasks into the Executor's ready queue; finished or
aborted tasks leave the Executor, which embeds the basic Scheduling Policy; the Monitor
samples the controlled variables M(k) and/or U(k); the Controller compares them with the
performance references and computes the control input D_B(k); the QoS Actuator, which
embeds the QoS Optimization Algorithm, adjusts task QoS levels.]
Figure 4.8. The FECSIM Simulator
4.5.2. Scheduling Policy of the Basic Scheduler
To demonstrate the generality of our FC-RTS architecture and the robustness of our
FC-RTS algorithms, we present experimental results with two combinations of task sets
and scheduling policies in the Basic Scheduler. We denote these two combinations
DM/PA and EDF/P (see Table 4.1). We describe the scheduling policies in this section,
and the workloads in Section 4.5.3.
Configuration    Basic Scheduling Policy        Task Set
EDF/P            EDF                            Periodic
DM/PA            Extended Deadline Monotonic    Mixed Periodic/Aperiodic
Table 4.1. Testing Configurations

Two different scheduling policies, Extended Deadline Monotonic (DM) and EDF, are
used in the Basic Scheduler.
- DM: Each (periodic or aperiodic) task is assigned a fixed priority that equals its
relative deadline. A shorter relative deadline leads to a higher priority. DM has been
proved to be the optimal static scheduling policy in terms of maximizing the schedulable
utilization bound under certain conditions [7].
- EDF: Each (periodic or aperiodic) task is dynamically assigned a priority that equals
its absolute deadline. An earlier absolute deadline leads to a higher priority. EDF is a
major dynamic real-time scheduling policy [48][72]. Both priority rules can be expressed
as simple sort keys, as sketched below.
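
As a minimal illustration of the difference (our own sketch, not FECSIM's Executor; the
task dictionaries are hypothetical), both policies reduce to a sort key over the ready queue:

    # DM: fixed priority = relative deadline (shorter deadline, higher priority).
    # EDF: dynamic priority = absolute deadline (earlier deadline, higher priority).
    def dm_key(task):
        return task["relative_deadline"]  # assigned once, at task creation

    def edf_key(task):
        return task["release_time"] + task["relative_deadline"]  # absolute deadline

    ready_q = [{"release_time": 0.0, "relative_deadline": 5.0},
               {"release_time": 2.0, "relative_deadline": 2.0}]
    print(min(ready_q, key=dm_key))   # DM dispatches the task with deadline 2.0
    print(min(ready_q, key=edf_key))  # EDF picks the same task here (4.0 < 5.0)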
4.5.3. Workload
Two different task sets are used in our evaluation experiments.
- Mixed Periodic/Aperiodic (PA): the workload is composed of 50% aperiodic tasks
and 50% periodic tasks. This type of task set can be found in a typical on-line trading
server whose workload is composed of periodic stock-updating tasks and aperiodic user
requests such as trading and information queries.
- Periodic (P): all the tasks are periodic. This type of task set emulates real-time
applications such as multimedia streaming and process control, where most of the system
operations are periodic.
Each task follows the task model described in Section 4.1.1. Each task is assumed to
have three QoS levels (0, 1, 2) including the lowest level 0 that represents service
rejection. For the rejection level, both the task execution time and value are set to 0. The
distributions of the task parameters are as follows. For the purpose of presentation, we
assume each time unit is 0.1 ms.
- ET_i[j]: The estimated execution time ET_i[2] of task T_i at QoS level 2 follows a
uniform distribution in the range [0.2, 0.8] ms, and ET_i[1] = 0.2 ET_i[2].
- AE_i[j]: The actual execution time AE_i[j] of task T_i at QoS level j follows a
normal distribution N(AAE_i[j], AAE_i[j]^(1/2)), where the average execution time
AAE_i[j] = G_a ET_i[j]. G_a, called the execution time factor, is a tunable workload
parameter that approximates the utilization ratio G_A. The larger G_a is, the more the
actual execution times exceed their estimations. The maximum value of G_a is 2.0 in all
of our experiments, which means that the average execution time can be as much as
twice the estimated execution time. Therefore, the worst-case utilization ratio is

Worst-Case Utilization Ratio: G_A = 2.0    (Equation 4.24)

This value is used to compute the Controller parameters based on Equations 4.18(a)
and 4.18(b).
- D_i[j]: All QoS levels of a task T_i have the same fixed relative deadline
D_i = (10 F_i + 10) ET_i[2], where F_i follows a uniform distribution in the range
[10, 15]. A task instance is immediately aborted once it misses its deadline.
- V_i[j]: The value V_i[j] of task T_i at QoS level j is computed as a weight w_i times
its estimated execution time, i.e., V_i[j] = w_i ET_i[j]. The weight w_i follows a uniform
distribution in the range [1, 5].
Periodic tasks:
- P_i[j]: All QoS levels of a task T_i have the same period, which equals its deadline:
P_i = D_i. The average utilization of each periodic task T_i at QoS level j is
AA_i[j] = AAE_i[j]/P_i.
Aperiodic tasks:
- AI_i[j]: The inter-arrival time of an aperiodic task T_i follows an exponential
distribution with an average inter-arrival time AI_i = D_i. The average utilization of each
aperiodic task T_i at QoS level j is AA_i[j] = AAE_i[j]/AI_i.
- EI_i[j]: The estimated inter-arrival time of an aperiodic task T_i equals the average
inter-arrival time, i.e., EI_i = AI_i = D_i.
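
To make these distributions concrete, the following sketch draws one task's parameters
(our own illustrative code; the helper names, the dictionary layout, and the truncation of
negative Gaussian samples at zero are assumptions):

    import random

    def make_task(g_a=2.0):
        """Draw one task's parameters per Section 4.5.3 (times in ms)."""
        et2 = random.uniform(0.2, 0.8)   # ET_i[2]: estimated time at QoS level 2
        et = [0.0, 0.2 * et2, et2]       # level 0 is rejection (time and value 0)
        aae = [g_a * e for e in et]      # AAE_i[j] = G_a * ET_i[j]
        f = random.uniform(10, 15)       # F_i
        deadline = (10 * f + 10) * et2   # D_i, shared by all QoS levels
        w = random.uniform(1, 5)         # weight w_i
        value = [w * e for e in et]      # V_i[j] = w_i * ET_i[j]
        return {"ET": et, "AAE": aae, "D": deadline, "V": value}

    def actual_exec_time(aae_j):
        """AE_i[j] ~ N(AAE_i[j], AAE_i[j]^(1/2)), truncated at zero."""
        return max(0.0, random.gauss(aae_j, aae_j ** 0.5))

    task = make_task()
    print(task, actual_exec_time(task["AAE"][2]))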
4.5.4. QoS Actuator
A Highest-Value-Density-First (HVDF) QoS assignment algorithm [67] is used in the
QoS Actuator. The value density of QoS level j of a task T_i is defined as
VD_i[j] = V_i[j]/B_i[j]. The HVDF algorithm assigns QoS levels to all the current tasks
in order of decreasing value density until the total estimated requested utilization reaches
a utilization constraint U_C. A fixed threshold of 80% is used by the open-loop
scheduling algorithms. In comparison, our FC-RTS algorithms dynamically change the
threshold U_C = B(k+1) = B(k) + D_B(k) at each sampling instant.
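
A minimal sketch of this greedy assignment is shown below (our own illustration; each
task offers its levels as (B_i[j], V_i[j]) pairs, and for simplicity a task keeps the first,
highest-density level that fits instead of being revisited for upgrades):

    def hvdf_assign(tasks, u_constraint):
        """Greedy HVDF sketch: consider QoS levels in decreasing value density
        VD_i[j] = V_i[j] / B_i[j] until the budget U_C is exhausted.
        tasks maps task id -> list of (B, V) per level; level 0 is rejection."""
        cands = sorted(((v / b, tid, lvl, b)
                        for tid, levels in tasks.items()
                        for lvl, (b, v) in enumerate(levels)
                        if lvl > 0 and b > 0),
                       reverse=True)
        assigned = {tid: 0 for tid in tasks}
        used = 0.0
        for _density, tid, lvl, b in cands:
            if assigned[tid] == 0 and used + b <= u_constraint:
                assigned[tid] = lvl
                used += b
        return assigned, used

    tasks = {1: [(0.0, 0.0), (0.05, 0.2), (0.25, 0.5)],
             2: [(0.0, 0.0), (0.10, 0.6), (0.50, 1.2)]}
    print(hvdf_assign(tasks, u_constraint=0.8))  # both tasks get level 1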
When each task's utilization is small and there are no deadline misses, the HVDF
algorithm can approximate the optimal value under the utilization constraint. However, if
the actual requested utilization is unknown (as is the case in unpredictable
environments), the QoS optimization algorithm cannot always avoid deadline misses and
maximize the system value when used with an open-loop scheduling algorithm.
Note that our FC-RTS architecture can incorporate different real-time scheduling
policies and QoS optimization algorithms (although the scheduling policy does affect the
design of the FC-RTS algorithms and the Controller parameters, as discussed in Section
4.4.4). Our work focuses on the steady and transient state performance of the feedback
control loop rather than on evaluating the basic scheduling policies or QoS optimization
algorithms.
4.5.5. Profiling the Controlled Real-Time Systems
In the first set of experiments, we profile the controlled system to verify the saturation
properties of the controlled variables, the miss ratio M(k) and the CPU utilization U(k),
and to measure the miss ratio factor G_M, which is a key system parameter used for
computing the Controller parameter K_P in miss ratio control (see Equation 4.18(a)).
Since we are interested in the properties of the controlled system, we turn off the
Controller and the QoS Actuator of FECSIM in the profiling experiments. A set of step
loads SL(0, L_m) with different overload levels L_m is used to stress FECSIM for 60 sec.
Each step load is composed of a set of tasks with an average total requested utilization of
L_m. The experiments are repeated for both the EDF/P and DM/PA configurations. We
plot the measured average CPU utilization and average miss ratio corresponding to each
step load level L_m in Figure 4.9(a) (DM/PA) and Figure 4.9(b) (EDF/P). Each point in
the figures represents the average value of 5 runs. The 90% confidence intervals of the
average miss ratio are also shown, while the confidence intervals of the average
utilization are omitted because they are always within 1% of the corresponding average
values.
[Figure: two panels, (a) DM/PA and (b) EDF/P, plotting Average Miss Ratio and Average
Utilization (%) against Average Total Requested Utilization (%), 0-200%.]
Figure 4.9. Controlled Variables vs. Total Requested Utilization
Profiling Results on DM/PA
First we study the profiling results on DM/PA. From Figure 4.9(a), we can see that the
CPU utilization U(k) saturates at 100% after the step load level L_m exceeds 100%. The
miss ratio M(k) saturates at 0 when the average total requested utilization A is below
90%, and deadline misses start to occur when A reaches 90%.
When A is above 90%, the system's average miss ratio increases as the total
requested utilization increases. We measure the maximum slope of the miss ratio curve
near the boundary of the saturation zone to approximate the miss ratio factor G_M. In
Figure 4.9(a), the maximum slope is 0.447, which occurs when the average total
requested utilization increases from 100% to 110%. Therefore the miss ratio factor is

Miss Ratio Factor (DM/PA): G_M = 0.447    (Equation 4.25)
Profiling Results on EDF/P
Second, we study the profiling results on EDF/P. From Figure 4.9(b), we can see that
CPU utilization U(k) saturates at 100% after the step load level L
m
exceeds 100%. Miss
ratio M(k) saturates at 0 when the average total utilization A is below 100%, and deadline
misses starts to occur when A reaches 100% (the deadline misses when A = 100% is due
to random execution times of the workload).
When A is above 100%, the systems average miss ratio increases as the total
requested utilization increases. In Figure 4.9(b), the maximum slope is 0.447 when the
average total requested utilization increases from 100% to 110%.
Miss Ratio Factor (EDF/P): G
M
= 1.254 Equation 4.26
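
The slope measurement itself is simple arithmetic. The sketch below is our own (the
sample profile is made up, shaped like Figure 4.9(b)); it approximates G_M as the
maximum slope between adjacent profiling points, with A and M expressed as fractions:

    def miss_ratio_factor(points):
        """points: (A, M) pairs sorted by requested utilization A."""
        return max((m2 - m1) / (a2 - a1)
                   for (a1, m1), (a2, m2) in zip(points, points[1:]))

    # Hypothetical profile: M rises by 0.1254 as A grows from 1.0 to 1.1,
    # giving the estimate G_M = 1.254 of Equation 4.26.
    profile = [(0.9, 0.0), (1.0, 0.0), (1.1, 0.1254), (1.2, 0.22)]
    print(miss_ratio_factor(profile))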
4.5.6. Controller Parameters
Based on the worst-case utilization ratio G_A (Equation 4.24) of our workload and
the worst-case miss ratio factors G_M (Equation 4.25 and Equation 4.26) obtained from
our profiling experiments, we can compute the Controller parameters using Equation
4.18(a) and Equation 4.18(b). The resultant Controller parameter K_P for each FC-RTS
algorithm is listed in Table 4.2. All FC-RTS algorithms use a sampling window
W = 0.5 sec in all experiments.
                       FC-U     FC-M
K_P (DM/PA)            0.185    0.414
K_P (EDF/P)            0.185    0.148
W (Sampling Window)    0.5 sec for all algorithms
(FC-U's gain depends only on the utilization ratio G_A and is therefore the same in both
configurations; FC-UM uses the FC-U and FC-M gains for its utilization and miss ratio
Controllers, respectively.)
Table 4.2. Controller Parameters of FC-RTS Algorithms
4.5.7. Performance References
The miss ratio reference depends on the application's requirement and tolerance for
deadline misses in steady state. For example, Amazon.com may accept a higher miss
ratio reference than E-Trade.com because merchandise purchases usually have less strict
timing constraints than stock trading transactions. We assume that a miss ratio reference
M_S = 2% (in both FC-M and FC-UM) is appropriate for our simulated applications.
The utilization reference U_S should be lower than the nominal utilization threshold of
the basic scheduling policy and the task set. We also discussed in Section 4.4.4 that U_S
should be lower than 100%, the saturation boundary of the utilization control. Since the
theoretical utilization bound of EDF with a periodic task set is 100% in the ideal case
[48], we set U_S = 90% in both FC-U and FC-UM in the EDF/P case. Although it has
been shown that DM with general (aperiodic and/or periodic) task sets has a theoretical
utilization bound of 58%, this bound is too pessimistic for our mixed aperiodic/periodic
task set. For example, in our profiling experiments (Figure 4.9(a)), the utilization
threshold A_th appears to be in the range (90%, 100%). In the DM/PA case, we choose
U_S = 80% in FC-U and U_S = 90% in FC-UM. FC-UM has a more optimistic
utilization reference than FC-U because the miss ratio control in FC-UM provides a
worst-case bound for the closed-loop performance even if the utilization reference
becomes higher than the actual utilization threshold. The chosen performance references
are summarized in Table 4.3.
        FC-U                        FC-M    FC-UM
U_S     80% (DM/PA), 90% (EDF/P)    N/A     90%
M_S     N/A                         2%      2%
Table 4.3. Performance References of FC-RTS Algorithms
[Figure: four panels, (a) FC-U, (b) FC-M, (c) FC-UM, (d) Open-Loop Baseline, each
plotting U(k), B(k), and M(k) (%) against Time (second); panels (a) and (c) also mark the
reference U_S.]
Figure 4.10. Response of Scheduling Algorithms to Arrival Overload SL(0, 150%)
(DM/PA)
[Figure: four panels, (a) FC-U, (b) FC-M, (c) FC-UM, (d) Open-Loop Baseline, each
plotting U(k), B(k), and M(k) (%) against Time (second); panels (a) and (c) also mark the
reference U_S.]
Figure 4.11. Response of Scheduling Algorithms to Arrival Overload SL(0, 150%)
(EDF/P)
4.5.8. Evaluation Experiment A: Arrival Overload
In this section, we present the performance evaluation results of the three FC-RTS
algorithms, FC-U, FC-M, and FC-UM, in response to an arrival overload SL(0, 150%).
The execution time factor is G_a = 2; therefore the average execution time of each task is
twice its estimation. An open-loop scheduling algorithm using a fixed utilization
constraint B = 80% for QoS optimization is also evaluated as a baseline. The same
scheduling policies (DM and EDF) and QoS optimization algorithm (HVDF) are used for
all FC-RTS algorithms and the baseline. A zero initial value B(0) = 0 for the total
estimated utilization B(k) is used in this section. A larger initial value for B(k) is used in
Experiment B (Section 4.5.9) to reduce the settling time of FC-M and FC-UM.
The sampled miss ratio M(k) and utilization U(k) of a typical run for each scheduling
algorithm are illustrated in Figure 4.10 (DM/PA) and Figure 4.11 (EDF/P). We now
describe the results for each of the scheduling algorithms.
FC-U
The performance of FC-U with DM/PA and EDF/P is illustrated in Figure 4.10(a)
and Figure 4.11(a), respectively. First we look at FC-U with DM/PA. In response to the
arrival overload, FC-U increases the CPU utilization U(k) by increasing the total
estimated utilization B(k) of the tasks in the system. The increase in B(k) is enforced by
the QoS Actuator, which raises task QoS levels using the QoS optimization algorithm
HVDF. By 4.5 sec, the settling time predicted by our control analysis (see Section 4.4.3),
U(k) reaches 77.1%, which is within 3.6% of the reference U_S = 80%. This result is
close to our prediction that U(k) should reach within 2% of the reference by 4.5 sec. The
small difference between the experimental results and the theoretical prediction is due to
the randomness of our workload. (The measured performance profiles in our experiments
are approximations to the theoretical definitions due to the noise introduced by the
random workload.) U(k) never exceeds 80% in the transient state (before 4.5 sec). This
result is also consistent with our theoretical prediction of zero overshoot, U_O = 0 (see
Section 4.4.3).
The CPU utilization U(k) remains stable throughout the run. After 4.5 sec, the
utilization stays close to 80% and the system error stays close to zero. Because U(k)
stays below the utilization threshold, the miss ratio M(k) = 0 throughout the run.
The performance of FC-U with EDF/P (see Figure 4.11(a)) is similar to that of FC-U
with DM/PA. At 4.5 sec, FC-U increases the CPU utilization U(k) to 87.14%, within
3.2% of the reference U_S = 90%. U(k) never exceeds 90% in the transient state (before
4.5 sec). The CPU utilization U(k) remains stable throughout the run and stays close to
90% after 4.5 sec. Because U(k) stays below the utilization threshold, the miss ratio
M(k) = 0 throughout the run.
FC-M
The performance of FC-M with DM/PA and EDF/P is illustrated in Figure 4.10(b)
and Figure 4.11(b), respectively. We first study FC-M with DM/PA. In response to the
arrival overload, FC-M increases the total estimated utilization B(k) by increasing the
QoS levels of arriving or admitted tasks. As discussed in Section 4.4.3, due to the
saturation problem of the miss ratio control, the settling time of FC-M in response to the
arrival overload is longer than the prediction based on the linear model (Equation 4.13).
M(k) stays at 0 for the first 26.5 sec after the beginning of the arrival overload. The
system settles at approximately 30 sec, when M(k) reaches 1.23% (within 0.77% of the
reference M_S = 2%) and U(k) reaches 94.44%. We can shorten the settling time of
FC-M in response to arrival overload by assigning a larger initial value to the total
estimated utilization B(0) (as shown in Section 4.5.9). M(k) never exceeds 2% and
therefore achieves zero miss ratio overshoot in the transient state.
M(k) remains stable throughout the run. In steady state (after 30 sec), M(k) stays close
to 2% and below 5% throughout the run, except for M(k) = 5.97% at 31.5 sec. This result
shows that the steady-state error is close to 0, as predicted by our analysis in Section 4.4.3.
We also observe that with FC-M, the CPU utilization U(k) in steady state is clearly
higher than the CPU utilization (close to 80%) in the run of FC-U. This is because by
directly controlling the miss ratio, FC-M can change the CPU utilization to the vicinity of
the (a priori unknown) utilization threshold, which is higher than the utilization reference
of FC-U that is set to 80% a priori.
The performance of FC-M with EDF/P (Figure 4.11(b)) is similar to the case of
DM/PA. The settling time is approximately 87 sec when the miss ratio reaches 2.88%.
FC-M with EDF/P achieves zero overshoot in transient state. The miss ratio stays close to
2% in steady state and remains stable throughout the run. Similar to the case of DM/PA,
FC-M with EDF/P also has a higher CPU utilization (close to 100%) than FC-U with
EDF/P (close to 90%) in steady state.
In summary, compared with FC-U, FC-M achieves higher CPU utilization and
robustness with regard to utilization threshold variations at the cost of a low but non-zero
miss ratio in steady state.
FC-UM
The performance of FC-UM with DM/PA and EDF/P is illustrated in Figure 4.10(c)
and Figure 4.11(c), respectively. First we study the performance of FC-UM with
DM/PA. After the overload arrives, FC-UM increases the utilization U(k). Similar to
FC-M, the miss ratio M(k) stays at 0 and the CPU utilization U(k) increases more slowly
than under FC-U. In the beginning of the run, the (saturated) miss ratio control computes
a smaller control signal, D_BM(0) = K_P(M_S - M(0)) = 0.414 * (0.02 - 0) = 0.008, than
the utilization control's signal, D_BU(0) = K_P(U_S - U(0)) = 0.185 * (0.9 - 0) = 0.167.
Due to the min operation on the control inputs from the two Controllers, the miss ratio
control dominates the control loop in the starting phase. The miss ratio control signal
remains 0.008 and stays smaller than the utilization control signal, which decreases as
the utilization U(k) increases. At time 27 sec, the utilization U(54) reaches 94.9% and the
miss ratio M(54) = 0.93%. Now the utilization control signal D_BU(54) = -0.009
becomes smaller than the miss ratio control signal D_BM(54) = 0.004 and takes over the
control loop. Because the utilization threshold is higher than the utilization reference
U_S = 90%, the utilization control dominates the control loop, and U(k) stays close to
90% while the miss ratio stays at 0 after 27 sec. Therefore, the settling time is
approximately 27 sec. Since neither U(k) nor M(k) surpasses its corresponding reference
in the transient state (before 27 sec), FC-UM achieves zero overshoot in both U(k) and
M(k).
In steady state, the utilization U(k) stays close to 90%, and hence FC-UM achieves
zero steady-state error in terms of utilization. The miss ratio M(k) remains close to 0%,
lower than the miss ratio reference M_S = 2%, throughout the steady state except for
M(63) = 2.04%. This is because the utilization reference is lower than the utilization
threshold and therefore dominates the control loop in steady state. Note that if the
utilization reference were higher than the utilization threshold, the miss ratio control
would dominate the control loop and FC-UM would achieve zero steady-state error in
terms of miss ratio and a steady-state utilization close to the utilization threshold. The
system remains stable throughout the run.
The performance of FC-UM with EDF/P (Figure 4.11(c)) is similar to the case of
FC-UM with DM/PA. The miss ratio control dominates the control loop in the beginning
of the experiment until 75 sec (the settling time), when the utilization control starts to
take over the control loop. FC-UM with EDF/P achieves zero overshoot in both the
utilization U(k) and the miss ratio M(k). Because the utilization reference U_S is lower
than the utilization threshold, FC-UM with EDF/P achieves zero steady-state error in
terms of utilization, and the miss ratio stays at 0 throughout the steady state.
In summary, FC-UM combines the advantages of both FC-U and FC-M: it achieves a
zero steady-state miss ratio in the nominal case, when the utilization reference is lower
than the utilization threshold. Furthermore, FC-UM can still achieve a low steady-state
miss ratio even if the system's utilization threshold drops below the utilization reference.
Open Loop QoS Optimization Algorithm
In comparison with the FC-RTS algorithms, the system scheduled by the open-loop
QoS optimization algorithm suffers from high miss ratios with both DM/PA and EDF/P
(see Figure 4.10(d) and Figure 4.11(d)). This is because the task execution times are on
average twice their estimations, and the QoS optimization algorithm overloads the CPU
due to its incorrect estimates of task execution times. On the other hand, the system
would suffer from low CPU utilization if the task execution times were lower than the
estimations (see Section 4.5.9). This result demonstrates that the open-loop QoS
optimization algorithm is incapable of maintaining satisfactory performance in the face
of unpredictable workloads.
In summary, we have demonstrated that all three of our FC-RTS algorithms, FC-U,
FC-M, and FC-UM, provide the desired performance guarantees in terms of miss ratio
and CPU utilization in steady state and achieve satisfactory performance profiles in
response to an arrival overload SL(0, 150%) when the average task execution times
differ from the estimations. In contrast, the open-loop QoS optimization fails to provide
such performance guarantees in the face of the same overload.
[Figure: execution time factor G_a vs. time; G_a = 0.8 from 0 to 100 sec, 1.26 from 100
to 200 sec, 2 from 200 to 300 sec, and 1.5 from 300 to 400 sec.]
Figure 4.12. Execution Time Factor G_a in Experiment B
4.5.9. Evaluation Experiment B: Arrival/Internal Overload
In the second set of evaluation experiments, we stress our FC-RTS algorithms and the
baseline with a more unpredictable load profile than the one used in Experiment A. The
new load profile causes an arrival overload of SL(0, 150%) at the beginning of each run.
Furthermore, the average task execution times of all tasks vary every 100 sec to create
internal overloads in the system. The execution time factor G_a throughout the run is
shown in Figure 4.12. G_a instantaneously jumps from 0.8 to 1.26 at time 100 sec. This
change causes a 57.5% increase in the average execution time of every task. Supposing
the total requested utilization of the system is A(200) before the jump, the execution time
change corresponds to an internal overload of SL(A(200), 1.575A(200)). A similar step
load SL(A(400), 1.575A(400)) occurs again at time 200 sec, when G_a jumps from 1.26
to 2. The jump at time 300 sec, on the other hand, creates an internal underload
SL(A(600), 0.75A(600)) (modeled as a negative step signal) when G_a instantaneously
decreases from 2 to 1.5.
In this set of experiments, a larger initial value B(0) = 80% is assigned to the
estimated requested utilization B(k) to shorten the settling time of FC-M and FC-UM in
response to arrival overloads. The configurations of our FC-RTS algorithms are listed in
Table 4.2 and Table 4.3. The open-loop baseline uses a fixed B(k) = 80% and 90% for
QoS optimization with DM/PA and EDF/P, respectively. A typical run of each of the
FC-RTS algorithms and the baseline is shown in Figure 4.13 (DM/PA) and Figure 4.14
(EDF/P).
[Figure: four panels, (a) FC-U, (b) FC-M, (c) FC-UM, (d) Open-Loop Baseline, each
plotting U(k), B(k), and M(k) (%) against Time (second); panels (a) and (c) also mark the
reference U_S.]
Figure 4.13. Response of Scheduling Algorithms to Arrival/Internal Overload (DM/PA)
[Figure: four panels, (a) FC-U, (b) FC-M, (c) FC-UM, (d) Open-Loop Baseline, each
plotting U(k), B(k), and M(k) (%) against Time (second); panels (a) and (c) also mark the
reference U_S.]
Figure 4.14. Response of Scheduling Algorithms to Arrival/Internal Overload (EDF/P)
FC-U
The performance of FC-U with DM/PA and EDF/P is illustrated in Figure 4.13(a)
and Figure 4.14(a), respectively. We first study the performance of FC-U with DM/PA.
Because FC-U starts from a non-zero estimated requested utilization B(0) = 80%, it
settles to steady state within only 0.5 sec. FC-U stays in steady state, with the utilization
U(k) close to the utilization reference U_S = 80% and the miss ratio M(k) at 0, until
100 sec, when the execution time factor increases from 0.8 to 1.26 and causes an internal
overload. Consequently, the utilization U(k) overshoots to 100% at 100.5 sec and the
miss ratio increases to 13.06%. In response to the overload condition, FC-U reduces the
total estimated utilization B(k) in the system by lowering task QoS levels. Within 9 sec
(close to the predicted settling time of 8 sec for G_a = 1.26) after the internal overload
starts, U(k) decreases to 80.34% while M(k) becomes 0, and the system resettles in a
steady state with the utilization close to 80% and the miss ratio M(k) = 0%. FC-U
responds similarly to the internal overload at 200 sec, when the execution time factor
increases from 1.26 to 2. The system settles down to a satisfactory steady state within
5 sec (close to the predicted settling time of 4.5 sec for G_a = 2).
An internal underutilization occurs at 300 sec, when the execution time factor
decreases from 2 to 1.5. Consequently, the utilization U(k) decreases to 63.36%. In
response to the instantaneous underutilization, FC-U increases the total estimated
utilization B(k) allowed in the system by raising task QoS levels. At 305 sec, the system
resettles in a satisfactory steady state with the utilization close to 80% and the miss ratio
M(k) = 0%.
The performance of FC-U with EDF/P is similar to the above case of FC-U with
DM/PA. FC-U with EDF/P reacts to both the arrival overload and the subsequent
internal load variations efficiently and (re)settles to a satisfactory steady state with the
utilization close to 90% and a zero miss ratio. The performance profiles of FC-U with
DM/PA and EDF/P are summarized in Table 4.4. Note that FC-U with DM/PA and
EDF/P provides 80% and 90% utilization, respectively, in all steady states despite the
differences in execution times. This observation verifies that FC-U has zero sensitivity
with regard to execution time variations and provides robust performance guarantees in
the face of unpredictable workloads.
(a) FC-U with DM/PA
Transient State                                       Steady State
Interval (sec)   Settling Time   Overshoot M / U      Interval (sec)   Average M / U
0 ~ 24.5         4.5 sec         0.88% / 84.12%       24.5 ~ 100.5     0.00% / 80.04%
100.5 ~ 114      9 sec           13.06% / 100%        114 ~ 200.5      0.00% / 80.00%
200.5 ~ 212      5 sec           15.61% / 100%        212 ~ 300.5      0.00% / 80.01%
300.5 ~ 311      5 sec           0.00% / 87.22%       311 ~ 400        0.00% / 80.00%

(b) FC-U with EDF/P
Transient State                                       Steady State
Interval (sec)   Settling Time   Overshoot M / U      Interval (sec)   Average M / U
0 ~ 79.5         7.5 sec         0.00% / 90.20%       79.5 ~ 100.5     0.00% / 90.03%
100.5 ~ 111      13 sec          36.28% / 100.00%     111 ~ 200.5      0.00% / 90.01%
200.5 ~ 205.5    7 sec           32.03% / 100%        205.5 ~ 300.5    0.00% / 90.00%
300.5 ~ 327.5    4 sec           0.00% / 88.32%       327.5 ~ 400      0.00% / 89.99%

Table 4.4. The Performance Profiles of FC-U in Experiment B
(M: miss ratio; U: utilization; "overshoot" denotes the absolute overshoot.)
FC-M
The runs of FC-M with DM/PA and EDF/P are illustrated in Figure 4.13(b) and
Figure 4.14(b), respectively. We first study the run of FC-M with DM/PA. Because the
initial estimated requested utilization B(0) = 80% and the initial execution time factor
G_a = 0.8, the system starts from a zero miss ratio with utilization close to 64%. The
system settles to a steady state within 24.5 sec, when the miss ratio M(49) = 1.41% and
the utilization U(49) = 96.1%. FC-M stays in steady state, with the miss ratio M(k) close
to the miss ratio reference M_S = 2% and the utilization U(k) close to 100%. In this
steady state (from 24.5 sec to 100 sec), the average miss ratio is 1.88% (steady-state error
E_SM = 0.11%) while the average utilization is 99.1%.
The system stays in steady state until 100 sec, when the execution time factor
increases from 0.8 to 1.26 and causes an internal overload. Consequently, the miss ratio
and the utilization overshoot to 40.7% and 100%, respectively. In response to the
instantaneous overload, FC-M reduces the total estimated utilization B(k) by lowering
task QoS levels. Within 13.5 sec after the internal overload starts, M(k) changes to
M(228) = 1.44% while U(228) = 98.0%, and the system resettles to a steady state. In the
new steady state (from 114.5 sec to 200 sec), the average miss ratio is 1.95% (steady-state
error E_SM = 0.05%) while the average utilization is 98.45%. Similarly, FC-M
successfully responds to the internal overload at 200 sec and settles down to a
satisfactory steady state within 11.5 sec.
At time 300 sec, the execution time factor decreases from 2 to 1.5, and the utilization
consequently drops to 79.9%. At time 311 sec, M(k) increases to 3.2% while U(k)
increases to 99.74%, and the system resettles in a satisfactory steady state with an
average miss ratio of 2.01% (steady-state error E_SM = -0.01%) and an average
utilization close to 98.27%.
The performance of FC-M with EDF/P is similar to the above case of FC-M with
DM/PA. FC-M with EDF/P successfully reacts to the arrival overload and the
subsequent internal load variations and (re)settles to steady states with the miss ratio
close to 2% and the utilization close to 100%, despite the differences in execution times.
This observation verifies that FC-M has zero sensitivity with regard to execution time
variations and provides robust performance guarantees in the face of unpredictable
workloads. The performance profiles of FC-M with DM/PA and EDF/P are summarized
in Table 4.5.
(a) FC-M with DM/PA
Transient State                                       Steady State
Interval (sec)   Settling Time   Overshoot M / U      Interval (sec)   Average M / U
0 ~ 24.5         24.5 sec        1.41% / 96.1%        24.5 ~ 100.5     1.88% / 99.01%
100.5 ~ 114      14 sec          40.7% / 100%         114 ~ 200.5      1.95% / 98.45%
200.5 ~ 212      12 sec          31.4% / 100%         212 ~ 300.5      2.01% / 97.61%
300.5 ~ 311      11 sec          3.16% / 99.74%       311 ~ 400        2.01% / 98.27%

(b) FC-M with EDF/P
Transient State                                       Steady State
Interval (sec)   Settling Time   Overshoot M / U      Interval (sec)   Average M / U
0 ~ 79.5         79.5 sec        1.52% / 100.00%      79.5 ~ 100.5     2.23% / 100.00%
100.5 ~ 111      11 sec          43.42% / 100.00%     111 ~ 200.5      1.99% / 100.00%
200.5 ~ 205.5    5.5 sec         38.11% / 100.00%     205.5 ~ 300.5    2.03% / 99.99%
300.5 ~ 327.5    27.5 sec        1.45% / 100.00%      327.5 ~ 400      2.04% / 100.00%

Table 4.5. The Performance Profiles of FC-M in Experiment B
FC-UM
The runs of FC-UM with DM/PA and EDF/P are illustrated in Figure 4.13(c) and
Figure 4.14(c), respectively. We first study the run of FC-UM with DM/PA. In response
to the arrival overload at time 0, the miss ratio control dominates the control loop in the
transient state until the utilization approaches the utilization reference U_S = 90%, at
which point the utilization control takes over the control loop. Because the utilization
reference is lower than the utilization threshold, the utilization control dominates, and the
system settles to steady state at 17.5 sec. While in steady state, the miss ratio M(k) stays
at 0% most of the time and the utilization U(k) stays close to 90%.
The system stays in this steady state until 100 sec, when the execution time factor
increases from 0.8 to 1.26. The utilization U(k) overshoots to 100% and the miss ratio
overshoots to 24.53%. Although both the miss ratio control and the utilization control
compute negative control signals in response to the internal overload, the miss ratio
control takes over the control loop because the utilization saturates at 100%, which
leaves the utilization control with a signal of smaller magnitude. The miss ratio control
dominates the control loop until the utilization approaches 90% and the miss ratio
becomes zero. The utilization control then takes over, and the system settles to a new
steady state at 105 sec with an average miss ratio of 0.07% and an average utilization of
89.85% (steady-state error E_SU = 0.15%).
FC-UM responds similarly to the internal overload at 200 sec, when the execution
time factor increases from 1.26 to 2. The system settles down to a satisfactory steady
state within 2.5 sec. In the steady state (from 203 sec to 300 sec), the average miss ratio
is 0.12% and the average utilization is 89.71% (steady-state error E_SU = 0.29%).
At time 300 sec, the execution time factor decreases from 2 to 1.5 and the utilization
U(k) drops to 69.24%. Similar to the beginning of the run, FC-UM increases the total
estimated utilization B(k) by raising task QoS levels. At time 308.5 sec, U(k) increases to
92.02%, and the system resettles in a steady state with an average miss ratio of 0.07%
and an average utilization close to 89.90% (steady-state error E_SU = 0.10%).
The performance of FC-UM with EDF/P is similar to the above case of FC-UM with
DM/PA. FC-UM with EDF/P successfully reacts to both the arrival overload and the
internal overloads and (re)settles to satisfactory steady states, with the miss ratio close to
0% and the utilization close to 90%, despite the differences in execution times. This
observation verifies that FC-UM has zero sensitivity with regard to execution time
variations and provides robust performance guarantees in the face of unpredictable
workloads. The performance profiles of FC-UM are summarized in Table 4.6.

(a) FC-UM with DM/PA
Transient State                                       Steady State
Interval (sec)   Settling Time   Overshoot M / U      Interval (sec)   Average M / U
0 ~ 24.5         17.5 sec        0.00% / 91.06%       24.5 ~ 100.5     0.03% / 89.92%
100.5 ~ 114      4.5 sec         24.53% / 100.00%     114 ~ 200.5      0.07% / 89.85%
200.5 ~ 212      2.5 sec         26.11% / 100.00%     212 ~ 300.5      0.12% / 89.71%
300.5 ~ 311      8.0 sec         0.18% / 92.02%       311 ~ 400        0.07% / 89.90%

(b) FC-UM with EDF/P
Transient State                                       Steady State
Interval (sec)   Settling Time   Overshoot M / U      Interval (sec)   Average M / U
0 ~ 79.5         44.5 sec        0.00% / 88.24%       79.5 ~ 100.5     0.00% / 89.48%
100.5 ~ 111      8 sec           35.08% / 100.00%     111 ~ 200.5      0.00% / 89.85%
200.5 ~ 205.5    5.5 sec         29.52% / 100.00%     205.5 ~ 300.5    0.00% / 89.78%
300.5 ~ 327.5    18.5 sec        0.00% / 88.98%       327.5 ~ 400      0.00% / 89.86%

Table 4.6. The Performance Profiles of FC-UM in Experiment B
Open-Loop Baseline
The performance of the open-loop baseline with DM/PA and EDF/P is illustrated in
Figure 4.13(d) and Figure 4.14(d), respectively. In contrast with our FC-RTS algorithms,
the open-loop baseline fails to provide performance guarantees on miss ratio or
utilization with both EDF/P and DM/PA. When task execution times are lower than the
estimations (from 0 to 100 sec), the baseline algorithm underutilizes the CPU (with
utilization U(k) close to 72%). On the other hand, when the execution times exceed the
estimations (from 100.5 sec to 400 sec), the system suffers from persistent deadline
misses. For example, the baseline with DM/PA has an average miss ratio of 9.23% from
200.5 sec to 300 sec, and its miss ratio reaches as high as 94.1%. The baseline with
EDF/P has an average miss ratio of 51.39% in the same period.
In summary, our evaluation results verify that our FC-RTS algorithms provide the
following performance guarantees under the stability condition in Equation 4.22:
(1) stability in the face of arrival overload and internal overload,
(2) a system miss ratio and utilization that stay close to the corresponding
performance references in steady state regardless of variations in task execution times,
and
(3) a satisfactory settling time and low overshoot in the transient state.
In addition to the performance profiles, the average performance of the FC-RTS
algorithms and the baseline is shown in Figure 4.15(a) (DM/PA) and Figure 4.15(b)
(EDF/P). The considered performance metrics include the average miss ratio M_a, the
average CPU utilization U_a, and the average value completion ratio V_a, defined as the
total completed value divided by the total value of all arriving tasks at the highest QoS
level. V_a characterizes the utility and throughput of the system throughout the run. All
of the above metrics are computed based on the performance throughout the run. Every
data point in Figure 4.15(a) and Figure 4.15(b) is the mean of 5 repeated runs. The 90%
confidence interval of each M_a, U_a, and V_a is within 0.91%, 0.23%, and 1.25%,
respectively, of its mean. We can see that all the FC-RTS algorithms consistently
outperform the open-loop baseline in terms of average miss ratio and value completion
ratio.
[Figure 4.15 shows bar charts of the average performance; the plotted values are:]

(a) DM/PA     FC-U     FC-M     FC-UM    Baseline
M_a (%)       0.13     2.14     0.32     9.11
U_a (%)       80.31    96.89    89.27    90.90
V_a (%)       46.51    51.85    50.55    41.87

(b) EDF/P     FC-U     FC-M     FC-UM    Baseline
M_a (%)       1.00     2.15     0.56     26.38
U_a (%)       90.27    95.89    87.81    93.00
V_a (%)       50.73    52.01    49.58    28.88

Figure 4.15. Average Performance of FC-RTS Algorithms and the Baseline
(M_a: Average Miss Ratio; U_a: Average Utilization; V_a: Average Value Completion Ratio)

In summary, our evaluation results demonstrate that our three FC-RTS algorithms
provide robust and precise performance guarantees in terms of utilization and miss ratio
even when the workload varies significantly from the estimation. Furthermore, they also
achieve satisfactory transient-state performance profiles in response to arrival and
internal overloads. In contrast, an open-loop QoS optimization algorithm fails to provide
such guarantees when the workload deviates from the a priori estimation.
4.6. Comparison of Real-Time Scheduling Algorithms in Overload
We now qualitatively compare several existing real-time scheduling algorithms (see
Table 4.7). Our comparison is based on two criteria: the knowledge of the workload
required by a scheduler, and its performance in overload conditions. Simple algorithms
such as Rate (Deadline) Monotonic based on off-line schedulability analysis depend on
complete knowledge about the workload and the system, including tasks' resource
requirements, future arrivals, and the system's schedulable utilization bound. These
algorithms cannot work in overload conditions because they lack overload handling
mechanisms. The (open-loop) algorithms based on on-line admission control or QoS
optimization add flexibility to real-time systems by not requiring knowledge about
future task arrivals, although the tasks' resource requirements and the utilization bound
still need to be known a priori. FC-RTS algorithms accomplish the next level of
flexibility by providing robust performance guarantees without requiring a priori
knowledge about tasks' resource requirements, or even the utilization bound in the case
of FC-M and FC-UM. Therefore, FC-RTS algorithms provide the most appropriate
solutions for soft real-time systems in unpredictable environments. Such systems include
online trading and e-business servers, and data-driven systems such as smart spaces,
agile manufacturing, and many defense applications such as C4I.
Each algorithm is characterized by its required knowledge of the workload/system
(task resource requirements / future arrival times / utilization bound) and its performance
in overload:
- RM, EDF: Yes / Yes / Yes; miss ratio: N/A; CPU utilization: N/A.
- Open-loop admission control/QoS optimization: Yes / No / Yes; steady-state miss
ratio: 0; CPU utilization: high if the estimation of resource requirements is not
pessimistic, low otherwise.
- FC-U: No / No / Yes; steady-state miss ratio: 0; transient-state miss ratio: bounded
by overshoot; CPU utilization: high.
- FC-M: No / No / No; steady-state miss ratio: small; transient-state miss ratio:
bounded by overshoot; CPU utilization: high.
- FC-UM: No / No / No; steady-state miss ratio: 0 nominally, guaranteed to be small;
transient-state miss ratio: bounded by overshoot; CPU utilization: high.
Table 4.7. Comparison of Real-Time Scheduling Paradigms in Overload Conditions
4.7. Summary
We have successfully applied our FCS framework to systematically design a set of
feedback control real-time CPU scheduling (FC-RTS) algorithms that achieve desired
transient- and steady-state performance specifications in the face of unpredictable task
execution times and arrivals. The key results of our research on CPU scheduling include:
- A novel FC-RTS architecture that integrates performance feedback control with
different real-time scheduling policies and QoS optimization algorithms;
- Specialization of the generic performance profiles and load profiles to metrics based
on CPU utilization and deadline miss ratio, and to arrival/internal overloads in
CPU-bound real-time systems;
- An analytical model of CPU-bound real-time systems that provides a foundation for
the design and analysis of such algorithms with established control theory;
- A set of FC-RTS algorithms, FC-U, FC-M, and FC-UM, that provide the following
performance guarantees in terms of deadline miss ratio and/or CPU utilization for
different types of real-time applications in unpredictable environments:
  - stability in the face of arrival overload and internal overload,
  - accurate enforcement of performance references in steady state, and
  - a satisfactory settling time and low overshoot in the transient state;
- A set of tuning/analysis results for FC-RTS algorithms to achieve desired
performance profiles in response to unpredictable overload conditions;
- Simulation results demonstrating that our FC-RTS algorithms achieve robust
performance guarantees and desired performance profiles in response to new task
arrivals and execution time variations; and
- A qualitative comparison of classical real-time scheduling paradigms and FC-RTS
algorithms, showing that the latter achieve a leap in flexibility and predictability in
unpredictable environments.

Chapter 5
Web Server with Delay Guarantees
5.1. Introduction
The increasing diversity of applications supported by the World Wide Web and the
increasing popularity of time-critical web-based applications (such as online trading)
motivate building QoS-aware web servers. Such servers customize their performance
attributes depending on the class of the served requests so that more important requests
receive better service. From the perspective of the requesting clients, the most visible
service performance attribute is typically the service delay. Different requests may have
different tolerances to service delays. For example, one can argue that stock trading
requests should be served more promptly than information requests. Similarly, interactive
clients should be served more promptly than background software agents such as web
crawlers and prefetching proxies. Some businesses may also want to provide different
service delays to different classes of customers depending on their importance or monthly
fees. In this chapter, we provide a solution to support delay differentiation in web servers.
While existing best-effort differentiation approaches [9][10][13][28] on web servers
usually offer better service to premium clients, they do not provide guarantees on the
extent of the difference between premium and basic performance levels. This difference
depends heavily on load conditions and may be difficult to quantify. In a situation where
clients pay to receive better service, any ambiguity regarding the expected performance
improvement may cause client concern and is therefore perceived as a disadvantage.
Compared with the best-effort differentiation model, the proportional differentiated
service model and the absolute guarantee model both provide stronger guarantees in
service differentiation.
In the absolute guarantee model, a fixed maximum service delay (i.e., a soft deadline)
for each class needs to be enforced. A disadvantage of the absolute guarantee model is
that it is usually difficult to determine appropriate deadlines for web services. For
example, the tolerable delay threshold of a web user may vary significantly depending on
web page design, length of session, browsing purpose, and properties of the web browser
[19]. Since system load can grow arbitrarily high in a web server, it is impossible to
satisfy the absolute delay guarantees of all service classes under overload conditions. The
absolute delay guarantee requires that all classes receive satisfactory delay if the server is
not overloaded; otherwise desired delays are violated in the predefined priority order, i.e.,
low priority classes always suffer guarantee violation earlier than high priority
classes.^3
In the absolute guarantee model, deadlines that are too loose may not provide necessary
service differentiation because the deadlines can be satisfied even when delays of
different classes are the same. On the other hand, deadlines that are too tight can cause

extremely long latency for low priority classes in order to enforce high priority
classes' (potentially unnecessarily) tight deadlines.

^3 Another scheme to implement absolute guarantee is to apply admission control on
incoming requests during overload conditions. However, from the perspective of web
clients, request denial by admission control is no better than service failure due to
overload.
In the proportional differentiated service model introduced in [26], a fixed ratio
between the delays seen by the different service classes can be enforced. This architecture
provides a specification interface and an enforcement mechanism such that a desired
"distance" between the performance levels of different classes can be specified and
maintained. This service model is more precise in its performance differentiation
semantics than the best effort differentiation model. The proportional differentiated
service is also more flexible than the absolute guarantee because it does not require
fixed deadlines to be assigned to each service class.
Depending on the nature of the overload condition, either the proportional
differentiated service or the absolute guarantee may become more desirable. The
proportional differentiated service may be less appropriate in severe overload conditions
because even high priority clients may get extremely long delays. In nominal overload
conditions, however, the proportional differentiated service may be more desirable than
absolute guarantee because the proportional differentiated service can provide adequate
and precise service differentiation without requiring artificial, fixed deadlines to be
assigned to each service class. Therefore, a hybrid guarantee is desirable in some
systems. For example, a hybrid policy can be that the server provides proportional
differentiated service when the delay received by each class is within its tolerable
threshold. When the delay received by a high priority class exceeds its threshold, the
server automatically switches to the absolute guarantee model that enforces desired
delays for high priority classes at the cost of violating desired delays of low priority
classes. This policy achieves the flexibility of the proportional differentiated service in
nominal overload and bounds the delay of high priority classes in severe overload
conditions.
In this chapter, we present a web server architecture to support delay guarantees
including the absolute guarantee, proportional differentiated service, and the hybrid
guarantee described above. A key challenge in guaranteeing service delays in a web
server is that resource allocation that achieves the desired delay or delay differentiation
depends on load conditions that are unknown a priori. A main contribution of this thesis
is the introduction of a feedback control architecture for adapting resource allocation such
that the desired delay differentiation between classes is achieved. We formulate the
adaptive resource allocation problem as one of feedback control and apply feedback
control theory to develop the resource allocation algorithm. We target our architecture
specifically for the HTTP 1.1 protocol [32], the most recent version of HTTP, which has
been adopted by most web servers and browsers. As we show in this thesis,
persistent connections introduced by HTTP 1.1 give rise to peculiar server bottlenecks
that affect our choice of resource allocation mechanisms for delay differentiation. Hence,
our contributions can be summarized as follows:

- An adaptive architecture for achieving relative and absolute service delay
guarantees in web servers under HTTP 1.1.
- Use of feedback control theory and methodology to design an adaptive connection
scheduler with proven performance guarantees. The design methodology
includes:
  - Using system identification to model web servers for purposes of
performance control,
  - Specifying performance requirements of web servers using control-based
metrics, and
  - Designing feedback controllers using the Root Locus method to satisfy the
performance specifications.
- Developing a system identification methodology and software tool as an empirical
and practical modeling solution for computer systems with unknown or
complicated dynamics, which has been a major barrier for applying feedback
control in such systems.
- Implementing the adaptive architecture by modifying an Apache web server on
top of a Linux platform.
- Performance evaluation that demonstrates our adaptive architecture and
connection scheduling algorithms achieve robust service delay guarantees even
when the workload varies considerably.

The rest of this chapter is organized as follows. In Section 5.2, we briefly describe
how web servers (in particular those with the HTTP 1.1 protocol) operate. In Section 5.3,
we formally define the semantics of delay differentiation guarantees on web servers. The
design of the adaptive server architecture to provide delay guarantees is described in
Section 5.4. In Section 5.5, we apply feedback control theory to systematically design
feedback controllers to satisfy the desired performance of the web server. The
implementation of the architecture on an Apache web server and experimental results are
presented in Sections 5.6 and 5.7, respectively. We summarize this chapter in Section 5.8.
5.2. Background
The first step towards designing architectural components for service delay
differentiation is to understand how web servers operate. Web server software usually
adopts a multi-process or a multi-threaded model. Processes or threads can be either
created on demand or maintained in a pre-existing pool that awaits incoming TCP
connection requests to the server. The latter design alternative reduces service overhead
by avoiding dynamic process creation and termination - a very costly operation in an
operating system such as UNIX. Here, we assume a multi-process model with a pool of
processes, which is the model of the Apache server, the most commonly used web server
today [31].
In HTTP 1.0, each TCP connection carries a single HTTP request. This results in an
excessive number of concurrent TCP connections. To remedy this problem the current
version of HTTP, called HTTP 1.1 [32], reduces the number of concurrent TCP
connections with a mechanism called persistent connections, which allows multiple web
requests to reuse the same connection. An HTTP 1.1 client first sends a TCP connection
request to a web server. The request is stored in the listen queue of the server's well-
known port. Eventually, the request is dequeued and a TCP connection is established
between the client and one of the server processes. The client can then send HTTP
requests and receive responses over the established connection. The HTTP 1.1 protocol
requires that an established TCP connection be kept alive after a request is served in
anticipation of potential future requests. If no requests arrive on this connection within
TIMEOUT seconds, the connection is closed and the process responsible for it is returned
to the idle pool. Due to the increasing popularity of HTTP 1.1, we focus on the
implications of persistent connections on server performance, and present a resource
allocation mechanism for delay differentiation that specifically addresses the peculiarities
of this protocol.
Persistent connections generate a new type of resource bottleneck on the server. Since
a server process may be tied up with a persistent connection even after the request is
served, the CPU(s) can be under-utilized. One way to alleviate this problem is to increase
the number of server processes. However, too many processes can cause thrashing in
virtual memory systems [24] thus degrading server performance considerably. In
practice, an operating system specific limit is imposed on the number of concurrent
server processes. This limit is often the bottleneck in servers implementing HTTP 1.1.
Hence, while new connections suffer large queuing delays, requests arriving on existing
connections are served almost immediately by their dedicated processes.^4 We verify this
observation experimentally as described in Section 5.7.1. The observation is important as
it affects our choice of delay metrics and QoS enforcement architecture. In particular,
since CPU may not be the bottleneck resource, CPU scheduling/allocation is not
necessarily an effective mechanism to provide service differentiation in web servers
using HTTP 1.1. Instead, we develop a server process allocation mechanism (Section
5.4.1) to support service differentiation in such systems. Note that it is not the objective
of our current research to interfere with HTTP 1.1 semantics to improve CPU utilization.

^4 Some new web servers (e.g., [58]) have a single-threaded and event driven architecture, which has no
limit on the number of connections that can be accepted besides limits on the maximum number of open
descriptors imposed by the OS. In this architecture, processing delay instead of connection delay may
dominate the server response time.
5.3. Semantics of Service Delay Guarantees
Let the connection delay denote the time interval between the arrival of a TCP
connection (establishment) request and the time the connection is accepted (dequeued) by
a server process. Let the processing delay denote the time interval between the arrival of
an HTTP service request to the process responsible for the corresponding connection and
the time the server completes transferring the response to the client.
Connection delay includes the queuing delay on the server's well known port. As
explained in the previous section, such queuing delay may be significant even when CPU
utilization on the server is not high due to lack of available server processes/threads.
Processing delay of requests on already established connections, on the other hand, is
much smaller because such requests do not get queued in the server's well-known port.
Hence, we focus on applying delay differentiation only to connection delays. Using
connection delays as the delay metric of choice is also desirable for another reason.
Besides being the dominant delay, it is also less of a function of client-side factors than
the processing delay. The processing delay is dominated by the time it takes to send the
response to the client which depends on TCP throughput. If the client is slow, TCP flow
control will reduce the response transfer rate accordingly. Since processing delay depends
on client speed, it is not an appropriate metric of server performance quality that is
attributable to the server.
Suppose every HTTP request belongs to a class k (0 ≤ k < N). The connection delay
C_k(m) of class k at the m-th sampling instant is defined as the average connection delay
of all established connections of class k within the interval ((m-1)S, mS) sec, where S is a
constant sampling period. Connection delay guarantees are defined as follows. For
simplicity of presentation, we use delay to refer to connection delay in the rest of this
chapter.

Relative Delay Guarantee: A desired relative delay W_k is assigned to each class
k. A relative delay guarantee {W_k | 0 ≤ k < N} requires that C_j(m)/C_l(m) = W_j/W_l
for any two classes j and l (j ≠ l). For example, if class 0 has a desired relative delay
of 1.0, and class 1 has a desired relative delay of 2.0, it is required that the
connection delay of class 0 be half of that of class 1.
Absolute Delay Guarantee: A desired (absolute) delay W_k is assigned to each
class k. An absolute delay guarantee {W_k | 0 ≤ k < N} requires that C_j(m) ≤ W_j for
every class j for which there exists a class l > j with C_l(m) ≤ W_l (a lower class
number means a higher priority). Note that since system load can grow arbitrarily
high in a web server, it is impossible to satisfy the desired delay of all service
classes under overload conditions. The absolute delay guarantee requires that all
classes receive satisfactory delay if the server is not overloaded; otherwise desired
delays are violated in the predefined priority order, i.e., low priority classes always
suffer guarantee violation earlier than high priority classes.

Based on the relative and absolute delay guarantees, different hybrid guarantees can
be composed for the specific requirements of the application. For example, the hybrid
guarantee described in Section 5.1 can be formulated as follows.

Hybrid Delay Guarantee: Each class k is assigned a value W_k that represents
both its desired delay and its desired relative delay. The hybrid guarantee
{W_k | 0 ≤ k < N} provides the relative delay guarantees if the desired absolute
delay of every class is satisfied. When the server is severely overloaded and desired
delays cannot be provided to all classes, the hybrid guarantee provides absolute
delay guarantees to high priority classes at the cost of violating the delays of low
priority classes. This hybrid guarantee provides the flexibility of the proportional
differentiated service in nominal overload while bounding the delay of high
priority classes in severe overload conditions.

[Figure 5.1. The Feedback-Control Architecture for Delay Guarantees. TCP connection
requests arrive at the TCP listen queue; the Connection Scheduler dispatches them to a
pool of server processes; a monitor samples the connection delays {C_k | 0 ≤ k < N}, and
the Controllers compare them with the desired delays {W_k | 0 ≤ k < N} to compute the
process budgets {B_k | 0 ≤ k < N}.]
5.4. A Feedback Control Architecture for Web Server QoS
In this section, we present an adaptive web server architecture (as illustrated in Figure
5.1) to provide the above delay guarantees. A key feature of this architecture is the use of
feedback control loops to enforce desired relative/absolute delays via dynamic
reallocation of server processes. The architecture is composed of a Connection Scheduler,
a Monitor, a Controller, and a fixed pool of server processes. We describe the design of
these components in the following subsections.
5.4.1. Connection Scheduler
The Connection Scheduler serves as an actuator to control the delays of different
classes. It listens to the well-known port and accepts every incoming TCP connection
request. The Connection Scheduler uses an adaptive proportional share policy to allocate
server processes to connections from different classes.^5 At every sampling instant m,
every class k (0 ≤ k < N) is assigned a process budget, B_k(m), i.e., class k should be
allocated at most B_k(m) server processes in the m-th sampling period. For a system with
absolute delay guarantees (Section 5.4.4), the total budgets of all classes may exceed the
total number of server processes, which is a condition called control saturation. In this
case, the process budgets are satisfied in the priority order until every process has been
allocated to a class. This policy means that the process budgets of high priority classes
are always satisfied before those of low priority classes, and thus the correct order of
guarantee violations can be achieved. For a server with relative delay guarantees, our
Relative Delay Controllers always guarantee that the total budget equals the total number
of processes (Section 5.4.4). For each class k, the Connection Scheduler maintains a
(FIFO) connection queue Q_k and a process counter R_k. The connection queue Q_k
holds connections of class k before they are allocated server processes. The counter R_k
is the number of processes allocated to class k. After an incoming connection is accepted,
the Connection Scheduler classifies the new connection and inserts the connection
descriptor into the scheduling queue corresponding to its class. Whenever a server process
becomes

^5 Note that the Connection Scheduler uses process allocation instead of CPU allocation to control the
delays of different classes. This is because processes may hold idle (persistent) connections and therefore
the CPU is not necessarily the bottleneck resource under the HTTP 1.1 protocol (as discussed in Section 5.2).

available, a connection at the front of a scheduling queue Q_k is dispatched if class k has
the highest priority among all eligible classes {j | R_j < B_j(m)}.
For the above scheduling algorithm, a key issue is how to decide the process budgets
{B_k | 0 ≤ k < N} to achieve the desired relative or absolute delays {W_k | 0 ≤ k < N}. Note
that static mappings from the desired relative or absolute delays {W_k | 0 ≤ k < N} to the
process budgets {B_k | 0 ≤ k < N} (e.g., based on system profiling) cannot work well when
the workloads are unpredictable and vary at run time (see performance results in Section
5.7.3). This problem motivates the use of feedback controllers to dynamically adjust the
process budgets {B_k | 0 ≤ k < N} to maintain desired delays.
Because the Controller can dynamically change the process budgets, a situation can
occur in which a class k's new process budget B_k(m) (after the adjustment in saturation
conditions described above) exceeds the sum of the free server processes and the
processes already allocated to class k. Such a class k is called an under-budget class. Two
different policies, preemptive vs. non-preemptive scheduling, can be supported in this
case. In the preemptive scheduling model, the Connection Scheduler immediately forces
server processes to close connections of over-budget classes whose new process budgets
are less than the number of processes currently allocated to them. In the non-preemptive
model, the Connection Scheduler waits for server processes to voluntarily release
connections of over-budget classes before it allocates enough processes to under-budget
classes. The advantage of the preemptive model is that it is more responsive to the
Controller's input and load variations, but it can cause jittery delay in preempted classes
because they may have to re-establish connections with the server in the middle of
loading a web page. Only the non-preemptive scheduling is currently implemented in our
web server testbed.
5.4.2. Server Processes
The second component of the architecture (Figure 5.1) is a fixed pool of server
processes. Every server process reads connection descriptors from the connection
scheduler. Once a server process closes a TCP connection it notifies the connection
scheduler and becomes available to process new connections.
5.4.3. Monitor
The Monitor is invoked at each sampling instant m. It computes the average
connection delays {C_k(m) | 0 ≤ k < N} of all classes during the last sampling period. The
sampled connection delays are used by the Controllers to compute new process budgets.
5.4.4. Controllers
The architecture uses one Controller for each relative or absolute delay constraint. At
each sampling instant m, the Controllers compare the sampled connection delays
{C_k(m) | 0 ≤ k < N} with the desired relative or absolute delays {W_k | 0 ≤ k < N}, and
compute new process budgets {B_k(m) | 0 ≤ k < N},^6 which are used by the Connection
Scheduler to (non-preemptively) reallocate server processes during the following
sampling period. We first describe the Absolute Delay Controllers and the Relative Delay
Controllers, and then briefly describe how to compose the Hybrid Delay Controllers
based on the Absolute and Relative Delay Controllers at the end of this section.

^6 It is the exact algorithm for this computation that control theory enables us to derive, as described in the
remainder of this section and in Section 5.5.
The Absolute Delay Controllers
The absolute delay of every class k is controlled by a separate Absolute Delay
Controller CA_k. The key parameters and variables of CA_k are shown in Table 5.1.



Reference VS_k          The reference of an Absolute Delay Controller CA_k is the
                        desired delay of class k, i.e., VS_k = W_k.
Output V_k(m)           From the Absolute Delay Controller CA_k's perspective, the
                        system output V_k(m) at the sampling instant m is the sampled
                        delay of class k, i.e., V_k(m) = C_k(m).
Error E_k(m)            The difference between the reference and the output, i.e.,
                        E_k(m) = VS_k - V_k(m).
Control input U_k(m)    At every sampling instant m, the Absolute Delay Controller CA_k
                        computes the control input U_k(m), i.e., the process budget
                        B_k(m) of class k.

Table 5.1. Variables and Parameters of the Absolute Delay Controller CA_k


The goal of the Absolute Delay Controller CA_k is to reduce the error E_k(m) to 0 and
achieve the desired delay for class k. Intuitively, when E_k(m) = VS_k - V_k(m) < 0, the
Controller should increase the process budget U_k(m) = B_k(m) to allocate more processes
to class k. At every sampling instant m, the Absolute Delay Controller calls PI
(Proportional-Integral) control [28] to compute the control input. A digital form of the PI
control function is

    U_k(m) = K_P E_k(m) + K_I \sum_{j=0}^{m} E_k(j)                       (5.1a)

    U_k(m) = U_k(m-1) + g (E_k(m) - r E_k(m-1)),
             g = K_P + K_I,  r = K_P / (K_P + K_I)                        (5.1b)

where g and r (or K_P and K_I) are design parameters called the controller gain and the
controller zero, respectively. Equations (5.1a) and (5.1b) are equivalent to each other,
and Equation 5.1(b) is used in our implementation. Performance of the web server
depends on the values of the controller parameters. An ad hoc approach to design the
controller is to conduct laborious experiments on different values of the parameters.
Instead, we apply control theory to tune the parameters analytically to guarantee the
desired performance in the web server. The design and tuning methodology is presented
in Section 5.5.
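As an illustration of how little run-time mechanism Equation (5.1b) requires, the
following C sketch (our own rendering, not the dissertation's implementation) performs
one control step per sampling instant; the initial budget used in the usage note is
hypothetical:

    /* One PI controller in incremental form (Equation (5.1b)). */
    typedef struct {
        double g;        /* controller gain */
        double r;        /* controller zero */
        double u_prev;   /* U_k(m-1): previous control input */
        double e_prev;   /* E_k(m-1): previous error */
    } pi_controller;

    /* Given the reference VS_k and the sampled output V_k(m),
       return the new control input U_k(m). */
    double pi_step(pi_controller *c, double reference, double output)
    {
        double e = reference - output;   /* E_k(m) = VS_k - V_k(m) */
        double u = c->u_prev + c->g * (e - c->r * c->e_prev);
        c->e_prev = e;
        c->u_prev = u;
        return u;
    }

For example, an Absolute Delay Controller tuned as in Section 5.5.3 could be initialized
as pi_controller ca = { -4.6, 0.3, 64.0, 0.0 }; and B_k(m) would then be set from
pi_step(&ca, W_k, C_k(m)) at each sampling instant.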
For a system with N service classes, the Absolute Delay Guarantee is enforced by N
Absolute Delay Controllers CA_k (0 ≤ k < N). At each sampling instant m, each Controller
CA_k computes the process budget of class k. Note that in overload conditions, the process
budgets (especially those of low priority classes) computed by the Absolute Delay
Controllers may not be feasible if the sum of the computed process budgets of all classes
exceeds the total number of server processes M, i.e., \sum_k B_k(m) > M. This is a
situation called control saturation. Because low priority classes should suffer guarantee
violation in overload conditions, the system always satisfies the computed process budgets
in the decreasing order of priorities until every server process has been allocated to a
class.^7
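A minimal sketch of this priority-ordered allocation in C (our illustration; the
dissertation does not give code for this step):

    /* Saturation handling for the Absolute Delay Controllers: satisfy the
       computed budgets B[0..N-1] in decreasing priority order until all
       M server processes are allocated. */
    void resolve_saturation(int N, int M, double B[])
    {
        double left = M;
        for (int k = 0; k < N; k++) {  /* class 0 has the highest priority */
            if (B[k] < 0)
                B[k] = 0;
            if (B[k] > left)
                B[k] = left;           /* clip to whatever remains */
            left -= B[k];
        }
    }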
The Relative Delay Controllers
The relative delay of every two adjacent classes k and k-1 is controlled by a separate
Relative Delay Controller CR_k. Each Relative Delay Controller CR_k has the following
key parameters and variables. For simplicity of discussion, we use the same notation for
the corresponding parameters and variables of the Absolute Delay Controllers and the
Relative Delay Controllers.

^7 To avoid complete starvation of low priority classes, the system may reserve a minimum number of
server processes for each service class.

Reference VS_k          The reference of the Relative Delay Controller CR_k is the
                        desired delay ratio between class k and k-1, i.e.,
                        VS_k = W_k / W_{k-1}.
Output V_k(m)           From the perspective of the Relative Delay Controller CR_k, the
                        system output is the sampled delay ratio between class k and
                        k-1, i.e., V_k(m) = C_k(m) / C_{k-1}(m).
Error E_k(m)            The difference between the reference and the output, i.e.,
                        E_k(m) = VS_k - V_k(m).
Control input U_k(m)    At every sampling instant m, CR_k computes the control input
                        U_k(m), defined as the ratio (called the process ratio) between
                        the number of processes to be allocated to class k-1 and class k,
                        i.e., U_k(m) = B_{k-1}(m) / B_k(m).

Table 5.2. Variables and Parameters of the Relative Delay Controller CR_k


Intuitively, when E_k(m) < 0, CR_k should decrease the process ratio U_k(m) to allocate
more processes to class k relative to class k-1. The goal of the controller CR_k is to reduce
the error E_k(m) to 0 and achieve the correct delay ratio between classes k and k-1.
Similar to the Absolute Delay Controller, the Relative Delay Controller also uses PI
(Proportional-Integral) control (Equation (5.1)) to compute the control input (note that the
parameters and variables are interpreted differently in the Absolute Delay Controller and
the Relative Delay Controller).
For a system with N service classes, the Relative Delay Guarantee is enforced by
N-1 Relative Delay Controllers CR_k (1 ≤ k < N). At every sampling instant m, the system
calculates the process budget B_k(m) of each class k as follows.

control_relative_delay ({W_k | 0 ≤ k < N}, {C_k(m) | 0 ≤ k < N})
{
    /* Class N-1 (the lowest priority class) anchors the scale. */
    set class (N-1)'s process proportion P_{N-1}(m) = 1;
    S = P_{N-1}(m);
    for (k = N-2; k >= 0; k--) {
        call CR_{k+1} to get the process ratio U_{k+1}(m) between classes k and k+1;
        /* the process proportion of class k */
        P_k(m) = P_{k+1}(m) * U_{k+1}(m);
        S = S + P_k(m);
    }
    /* normalize the proportions so that the budgets sum to M processes */
    for (k = N-1; k >= 0; k--)
        B_k(m) = M * (P_k(m) / S);
}

The Hybrid Delay Controllers
The hybrid delay guarantee described in Section 5.3 can be implemented via dynamic
switching between the Absolute Delay Controllers and the Relative Delay Controllers.
The server switches from the Relative Delay Controllers to the Absolute Delay
Controllers if the absolute delay guarantee of the highest priority class is violated, i.e.,
C_0(m) > W_0 + H. On the other hand, the server switches from the Absolute Delay
Controllers back to the Relative Delay Controllers if C_0(m) < W_0 - H. The use of a
threshold window H in the mode switching condition avoids thrashing between the two
sets of Controllers. Since the hybrid delay guarantee is a straightforward extension of the
absolute and relative delay guarantees, we focus on the design and evaluation of absolute
and relative delay guarantees in the rest of this chapter (a sketch of the switching logic
follows).
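A minimal sketch of this mode switch in C (our illustration; the mode type and helper
name are not from the original):

    /* Hysteresis-based switching between relative and absolute delay control.
       W0 is the desired delay of the highest priority class; H is the
       threshold window that prevents thrashing between the two modes. */
    typedef enum { MODE_RELATIVE, MODE_ABSOLUTE } ctrl_mode;

    ctrl_mode update_mode(ctrl_mode mode, double c0, double W0, double H)
    {
        if (mode == MODE_RELATIVE && c0 > W0 + H)
            return MODE_ABSOLUTE;  /* highest priority class misses its delay */
        if (mode == MODE_ABSOLUTE && c0 < W0 - H)
            return MODE_RELATIVE;  /* the delay has recovered with margin H */
        return mode;               /* inside the window: keep the current mode */
    }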

In summary, we have presented a feedback control architecture to achieve absolute,
relative, and hybrid delay guarantees on web servers. A key component in this
architecture is the Controllers, which are responsible for dynamically computing correct
process budgets in face of unpredictable workload and system variations. In the rest of
this chapter, we use the closed-loop server to refer to the adaptive web server with the
Controllers, while the open-loop server refers to a non-adaptive web server without
the Controllers. We present the design and tuning of the Controllers in the next section.
5.5. Design of the Controller
In this section, we apply the FCS framework to design the Relative Delay Controller
CR_k and the Absolute Delay Controller CA_k. In Section 5.5.1, we specify the
performance requirements of the Controllers. We then use system identification
techniques to establish dynamic models for the web server in Section 5.5.2. Based on the
dynamic model, we use the Root Locus method to design the Controllers that meet the
performance specifications (Section 5.5.3).
5.5.1. Performance Specifications
In this section, we use the performance specifications of the FCS framework to
characterize the performance requirements of our web server in terms of service delay
guarantees. The performance specifications of the web server include the following:

- Stability: a (BIBO) stable system should have bounded output in response to
bounded input. For the Relative Delay Controller (with a finite delay ratio
reference), stability requires that the delay ratio always be bounded at run-time.
For the Absolute Delay Controller, stability requires that the service delay always
be bounded at run-time. Stability is a necessary condition for achieving desired
relative or absolute delays.
- Settling time T_s is the time it takes the output to converge to within 2% of the
reference and enter steady state. The settling time represents the efficiency of the
Controller, i.e., how fast the server can converge to the desired relative or
absolute delay. As an example, we assume that our web server requires the
settling time T_s < 5 min.
- Steady state error E_s is the difference between the reference input and the average
of the output in steady state. The steady state error represents the accuracy of the
Relative Delay Controller or the Absolute Delay Controller in achieving the
desired relative or absolute delay. As an example, we assume that our web server
requires a steady state error |E_s| < 0.1 V_S. Note that satisfying this steady state
error requirement means that the web server can achieve the desired relative or
absolute delays in steady state.
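For concreteness, the two quantitative specifications can be restated symbolically (our
restatement of the definitions above, where V(m) is the sampled output, V_S is the
reference, S is the sampling period, and m_s denotes the first sampling instant of steady
state; these symbols are our notation, not the dissertation's):

    T_s = \min \{ mS : |V(i) - V_S| \le 0.02\,V_S \text{ for all } i \ge m \}

    E_s = V_S - \lim_{M \to \infty} \frac{1}{M} \sum_{m=m_s}^{m_s+M-1} V(m),
    \qquad \text{required: } |E_s| < 0.1\,V_S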
5.5.2. Modeling the Web Server: A System Identification Approach
A dynamic model describes the mathematical relationship between the input and the
output of a system (usually with differential or difference equations). This dynamic
model provides a basis for the analytical design of the Controller. In Section 4.3, we
approximate the aggregate dynamics of a generic CPU-bound real-time system with an
intuitive first order model. Unfortunately, such an analytical modeling approach cannot
be easily applied to computer systems with more complicated or unknown dynamics
such as web servers. We adopt an empirical approach by applying system identification
[11] to estimate the models of such systems based on system profiling data.
In our system identification approach, the controlled computer system is
approximated as a linear model described as a difference equation with unknown
parameters. A system engineer or administrator first uses a workload generator to
stimulate the controlled system with pseudo-random digital white-noise input [61]. He
then can use a least squares estimator [11] to estimate the model parameters. This system
identification methodology provides a practical solution for modeling computing
systems with unknown dynamics, which has been a major barrier for applying feedback
control in such systems. We have also developed a software tool (as illustrated in Figure
5.2) to facilitate the system identification of web servers.
We now apply system identification to establish a dynamic model for the controlled
web server system (including the Connection Scheduler, the server processes, and the
Monitor) for the purpose of controlling relative/absolute delays. From the perspective of
a Relative Delay Controller CR_k, the control input to the controlled system is the process
ratio U_k(m) = B_{k-1}(m)/B_k(m). The output of the controlled system is the delay ratio
V_k(m) = C_k(m)/C_{k-1}(m). From the perspective of an Absolute Delay Controller CA_k,
the (control) input of the controlled system is the process budget U_k(m) = B_k(m). The
output of the controlled system is the delay V_k(m) = C_k(m). We intentionally use the
same symbols for input and output for the Relative and Absolute Delay Controllers
because the design methodology described below applies to both cases. Assuming the
controlled system models for different classes are similar, we omit the class number k of
U_k(m) and V_k(m) in the rest of this section. Our experimental results (Section 5.7.2)
establish that, for both relative and absolute delay control, the controlled system can be
modeled as a second order difference equation with adequate accuracy for the purpose of
control design. The architecture for system identification is illustrated in Figure 5.2. We
describe the components of the architecture in the following subsections.

[Figure 5.2. Architecture for system identification. A white-noise generator perturbs the
process budgets {B_0, B_1} of the open-loop server (Connection Scheduler, server
processes, and monitor); a least squares estimator processes the sampled connection
delays {C_0, C_1} to produce the model parameters.]
Model Structure
The web server is modeled as a difference equation with unknown parameters, i.e., an
n-th order model can be described as follows:

    V(m) = \sum_{j=1}^{n} a_j V(m-j) + \sum_{j=1}^{n} b_j U(m-j)          (5.2)

In an n-th order model, there are 2n parameters {a_j, b_j | 1 ≤ j ≤ n} that need to be
decided by the least-squares estimator. The difference equation model is motivated by the
decided by the least-squares estimator. The difference equation model is motivated by the
fact that the output of an open-loop server depends on previous inputs and outputs.
Intuitively, the dynamics of a web server is due to the queuing of connections and the
non-preemptive scheduling mechanism. For example, the connection delay may depend
on the number of server processes allocated to its class in several previous sampling
periods. As another example, after class k's process budget is increased, the Connection
Scheduler has to wait for connections of other classes to voluntarily release server
processes before it can reclaim enough processes for class k.
White Noise Input
To stimulate the dynamics of the open-loop server, we use a pseudo-random digital
white noise generator to randomly switch the two classes' process budgets between two
configurations. White noise input has been commonly used for system identification [11].
We use a standard white noise generation algorithm [61]; a minimal sketch follows.
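The sketch below illustrates the idea in C (our own illustration; the actual generator
follows [61] and may differ): a linear feedback shift register produces a pseudo-random
binary sequence that picks one of the two input configurations at each sampling instant.

    #include <stdint.h>

    /* Pseudo-random binary sequence from a 16-bit Galois LFSR. */
    static uint16_t lfsr = 0xACE1u;        /* any nonzero seed */

    static int prbs_bit(void)
    {
        unsigned bit = lfsr & 1u;
        lfsr >>= 1;
        if (bit)
            lfsr ^= 0xB400u;               /* taps 16, 14, 13, 11 */
        return (int)bit;
    }

    /* At each sampling instant, pick one of the two configurations, e.g.,
       a process ratio of 3 or 1 as in the relative delay experiments. */
    int next_input(int config_a, int config_b)
    {
        return prbs_bit() ? config_a : config_b;
    }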
Least Squares Estimator
The least squares estimator is the key component for system identification. In this
section, we review its mathematical formulation and describe its use to estimate the
model parameters. The derivation of the estimator equations is given in [12]. The
estimator is invoked at every sampling instant. At the m-th sampling instant, it takes as
input the current output V(m), the n previous outputs V(m-j) (1 ≤ j ≤ n), and the n
previous inputs U(m-j) (1 ≤ j ≤ n). The measured output V(m) is fit to the model described
in Equation (5.2). Define the vector q(m) = (V(m-1) ... V(m-n), U(m-1) ... U(m-n))^T, and
the vector θ(m) = (a_1(m) ... a_n(m), b_1(m) ... b_n(m))^T, i.e., the estimates of the model
parameters in Equation (5.2). These estimates are initialized to 1 at the start of the
estimation. Let R(m) be a square matrix whose initial value is set to a diagonal matrix
with the diagonal elements set to 10. The estimator's equations at sampling instant m are
[11]:
    γ(m) = (q(m)^T R(m-1) q(m) + 1)^{-1}                                  (5.3)

    θ(m) = θ(m-1) + γ(m) R(m-1) q(m) (V(m) - q(m)^T θ(m-1))               (5.4)

    R(m) = R(m-1) (I - γ(m) q(m) q(m)^T R(m-1))                           (5.5)


At any sampling instant, the estimator can predict a value V_p(m) of the output by
substituting the current estimates θ(m) into Equation (5.2). The difference V(m) - V_p(m)
between the measured output and the prediction is the estimation error. It has been
proved that the least squares estimator iteratively updates the parameter estimates at each
sampling instant such that \sum_{0 \le i \le m} (V(i) - V_p(i))^2 is minimized.
Our system identification results (Section 5.7.2) established that the controlled
system can be modeled as a second order difference equation,

    V(m) = a_1 V(m-1) + a_2 V(m-2) + b_1 U(m-1) + b_2 U(m-2)              (5.6a)

In the case of relative delay control, V(m) denotes the delay ratio between the two
controlled classes, U(m) denotes the process ratio between the two controlled classes,
and the estimated model parameters are (Section 5.7.2):

    (a_1, a_2, b_1, b_2) = (0.74, -0.37, 0.95, -0.12)                     (5.6b)

In the case of absolute delay control, V(m) denotes the delay of one controlled class,
and U(m) denotes the process budget of the controlled class. The estimated model
parameters based on system identification experiments (Section 5.7.2) are

    (a_1, a_2, b_1, b_2) = (-0.08, -0.2, -0.2, -0.05)                     (5.6c)
5.5.3. Root-Locus Design
Given a model described by Equation (5.6a), we can apply control theory to design
the Relative Delay Controller and the Absolute Delay Controller. The controlled system
model in Equation (5.6a) can be converted to a transfer function G(z) in the z-domain
(Equation (5.7)). The transfer function of the PI controller (Equation (5.1)) in the
z-domain is Equation (5.8). Given the controlled system model and the Controller model,
the transfer function of the closed loop system is Equation (5.9).

    G(z) = V(z)/U(z) = (b_1 z + b_2) / (z^2 - a_1 z - a_2)                (5.7)

    D(z) = g (z - r) / (z - 1)                                            (5.8)

    G_c(z) = D(z) G(z) / (1 + D(z) G(z))                                  (5.9)

We use the Root Locus method [32] to tune the controller gain g and the controller
zero r so that the performance specifications can be satisfied. We only summarize the
results of the design in this thesis. The details of the design process can be found in
control textbooks such as [32].
To design the Relative Delay Controller, we use the Root Locus tool to plot the traces
of the closed loop poles (based on the model parameters in Equation (5.6b)) as the
controller gain increases, as illustrated on the z-plane in Figure 5.3. The closed-loop
poles are placed at

    p_0 = 0.70,  p_{1,2} = 0.38 ± 0.62i                                   (5.10a)

by setting the Relative Delay Controller's parameters to

    g = 0.3,  r = 0.05                                                    (5.10b)

Similarly, to design the Absolute Delay Controller (based on the model parameters in
Equation (5.6c)), the closed-loop poles are placed at

    p_0 = 0.607,  p_{1,2} = -0.30 ± 0.59i                                 (5.11a)

by setting the Absolute Delay Controller's parameters to

    g = -4.6,  r = 0.3                                                    (5.11b)
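As a worked check of the relative delay design (our own substitution, not part of the
original analysis), the closed-loop poles in (5.10a) are the roots of the characteristic
equation obtained by setting the denominator of (5.9) to zero, i.e.,
(z - 1)(z^2 - a_1 z - a_2) + g (z - r)(b_1 z + b_2) = 0. Substituting (5.6b) and (5.10b):

    (z - 1)(z^2 - 0.74 z + 0.37) + 0.3 (z - 0.05)(0.95 z - 0.12)
        = z^3 - 1.455 z^2 + 1.060 z - 0.368 ≈ 0

whose roots are approximately 0.70 and 0.38 ± 0.62i, matching (5.10a).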
The above pole placement is chosen to achieve the following properties in the closed
loop system [32]:

- Stability: The closed-loop system with the Relative Delay Controller (with the
parameters in Equation (5.10b)) or the Absolute Delay Controller (with the
parameters in Equation (5.11b)) guarantees stability because all the closed-loop
poles are in the unit circle, i.e., |p_j| < 1 (0 ≤ j ≤ 2) (Equations (5.10a) and (5.11a)).
- Settling time: According to control theory, decreasing the radius (i.e., the
distance to the origin in the z-plane) of the closed-loop poles usually results in
shorter settling time. The Relative Delay Controller (with Equation (5.10b))
achieves a settling time of 270 sec, and the Absolute Delay Controller (with
Equation (5.11b)) achieves a settling time of 210 sec, both lower than the required
settling time (300 sec) defined in Section 5.5.1.
- Steady state error: Both the Relative Delay Controller and the Absolute Delay
Controller achieve zero steady state error, i.e., E_s = 0. This result can be easily
proved using the Final Value Theorem in digital control theory [28]. This result
means that, in steady state, the closed-loop system with the Relative Delay
Controller or the Absolute Delay Controller guarantees the desired relative delays
or the desired absolute delays, respectively.

In summary, using feedback control theory techniques including system identification
and the Root Locus design, we systematically design the Relative Delay Controller and
the Absolute Delay Controller that analytically provide the desired relative or absolute
delay guarantee and meet the transient and steady state performance specifications
described in Section 5.5.1. This result further shows the strength of the control-theory-
based design framework for adaptive computing systems.

[Figure 5.3. The Root Locus of the web server model, showing the root locus traces and
the closed loop poles on the z-plane.]
5.6. Implementation
We now describe the implementation of the web server. We modified the source code
of Apache 1.3.9 [11] and implemented a new library as a Connection Manager (including
the Connection Scheduler, the Monitor and the Controllers). The server was written in C
on a Linux platform. The server is composed of a Connection Manager process and a
fixed pool of server processes (modified from Apache). The Connection Manager process
communicates with each server process with a separate UNIX domain socket.

- The Connection Manager runs a loop that listens to the web server's TCP socket
and accepts incoming TCP connection requests. Each connection request is
classified based on its sender's IP address^8 and scheduled by a Connection
Scheduler function. The Connection Scheduler dispatches a connection by
sending its descriptor to a free server process through the corresponding UNIX
domain socket (a sketch of this descriptor-passing step appears after this list).
The Connection Scheduler time-stamps the acceptance and dispatching of each
connection. The difference between the acceptance and the dispatching time is
recorded as the connection delay of the connection. Strictly speaking, the
connection delay should also include the queuing time in the TCP listen queue in
the kernel. However, the kernel delay is negligible in this case because the
Connection Manager always greedily accepts (dequeues) all incoming connection
requests in a tight loop.
- The Monitor and the Controllers are invoked periodically at every sampling
instant. For each invocation, the Monitor computes the average delay for each
class. This information is then passed to the Controllers, which implement the
control algorithm to compute new process budgets.
- We modified the code of the Apache server processes so that they accept
connection descriptors from UNIX domain sockets (instead of the common TCP
listen socket as in the original Apache server). When a server process closes a
connection, it notifies the Connection Manager of its new status by sending a byte
of data to the Connection Manager through the UNIX domain socket.
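Passing an open connection descriptor to another process over a UNIX domain socket
uses the standard SCM_RIGHTS ancillary-data mechanism; the following sketch shows
what the sending side might look like (our minimal illustration, not the dissertation's
source code):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send the connection descriptor conn_fd to a server process over the
       UNIX domain socket uds_fd using SCM_RIGHTS ancillary data. */
    int send_conn_fd(int uds_fd, int conn_fd)
    {
        char data = 'c';   /* at least one byte of payload is required */
        struct iovec iov = { .iov_base = &data, .iov_len = 1 };
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl;
        msg.msg_controllen = sizeof(ctrl);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;      /* the kernel duplicates the fd */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &conn_fd, sizeof(int));

        return sendmsg(uds_fd, &msg, 0) == 1 ? 0 : -1;
    }

The receiving server process performs the symmetric recvmsg call and retrieves the
duplicated descriptor from the ancillary data.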

The server can be configured as a closed-loop or an open-loop server by turning the
Controllers on or off. An open-loop server can be configured for either system
identification or performance evaluation.

^8 Other classification criteria include cookies, browser plug-ins, request type/path, and virtual servers [17].
5.7. Experimentation
All experiments were conducted on a testbed of five PCs connected with 100 Mbps
Ethernet. Each machine had a 450MHz AMD K6-2 processor and 256 MB RAM. One
machine was used to run the web server with HTTP 1.1, and up to four other machines
were used to run clients that stress the server with a synthetic workload. The
experimental setup was as follows.

- Client: We used SURGE [14] to generate realistic web workloads in our
experiments. SURGE uses a number of user equivalents (also called users for
simplicity) to emulate the behavior of real-world clients. The load on the server can
be adjusted by changing the number of users on the client machines. Up to 500
concurrent users were used in our experiments.
- Server: The total number of server processes was configured to 128. Since service
differentiation is most necessary when the server is overloaded, we set up the
experiment such that the ratio between the number of users and the number of server
processes could drive the server to overload. Note that although large web servers
such as on-line trading servers usually have more server processes, they also tend to
have many more users than the workload we generated. Therefore, our configuration
can be viewed as an emulation of real-world overload scenarios at a smaller scale.
The sampling period S was set to 30 sec in all the experiments. The connection
TIMEOUT of HTTP 1.1 was set to 15 sec.

In Section 5.7.1, we present experimental results that compare connection delays with
response time of a server with HTTP 1.1. The experiments on system identification are
presented in Section 5.7.2. We present the evaluation of the closed-loop server in Section
5.7.3.

[Figure 5.4. Connection delay and response time (sec) as a function of the number of
users.]
5.7.1. Comparing Connection Delays and Response Times
In the first set of experiments, we compare the average connection delay and the
average response time (per HTTP request) of an open-loop server (see Figure 5.4) to
justify the use of connection delay as a metric for service differentiation in web servers
with HTTP 1.1. All connections are treated as being in the same class and all server
processes are allocated to the class. Every point in Figure 5.4 refers to the average
connection delay or average response time in four 10-minute runs with the same number
of users. The 90% confidence intervals are within 0.58 sec for all the presented average
connection delays, and within 0.21 sec for all the presented average response times. The
connection delay is significantly higher and increases at a much faster rate than the
response time as the number of users increases. For example, when the number of users is
400, the connection delay is 4.9 times the response time. Note that the average response
time is computed based on two types of requests, i.e., the response time (including the







140
connection delay and the processing delay) of the first request of each connection and the
response time (including only the processing time) of each subsequent request. The
difference between connection delay and response time is due to the fact that processing
delay is on average significantly shorter than connection delay. We also ran similar
experiments with 256 server processes (the maximum number allowed by the original
Apache on Linux). With 256 server processes, the ratio between the connection delay and
the response time is similar to that presented in Figure 5.4. For example, the connection
delay was 5.3 times the response time when 400 users were used. This result justifies our
decision to use connection delay as a metric for service differentiation in web servers
with HTTP 1.1.
[Figure 5.5. System identification results for relative delay. (a) Estimated model
parameters a1, a2, b1, b2 over time (second order model); (b) modeling error of the
second-order model (actual vs. estimated delay ratio); (c) modeling error of the
first-order model; (d) modeling error of the third-order model.]
5.7.2. System Identification
We now present the results of system identification experiments for both relative
delay and absolute delay to establish a dynamic model for the open-loop system. Four
client machines are divided into two classes 0 and 1, and each class has 200 users. We
begin with the relative delay experiments. The input, the process ratio
U(m) = B_0(m)/B_1(m), is initialized to 1. At each sampling instant, the white noise
randomly sets the process ratio to 3 or 1. The sampled output, the delay ratio
V(m) = C_1(m)/C_0(m), is fed to the least squares estimator to estimate the model
parameters (Equation (5.2)). Figure 5.5(a) shows the estimated parameters of a second
order model (Equation (5.6)) at successive sampling
instants in a 30 min run. The estimator and the white noise generator are turned on 2 min
after SURGE started in order to avoid its start-up phase. We can see that the estimates
of the parameters (a_1, a_2, b_1, b_2) converge to (0.74, -0.37, 0.95, -0.12). Substituting
the estimates into Equation (5.6), we established an estimated second-order model for the
open-loop server. To verify the accuracy of the model, we re-run the experiment with a
different white noise input (i.e., with a different random seed) to the open-loop server and
compare the actual delay ratio with that predicted by the estimated model. The result is
illustrated in Figure 5.5(b). We can see that the prediction of the estimated model is
consistent with the actual relative delay throughout the 30 min run. This result shows that
the estimated second order model is adequate for designing the Relative Delay
Controller. We also re-ran the system identification experiments to estimate a first order
model and a third order model. The results demonstrate that the estimated first order
model had a larger prediction error than the second order model (see Figure 5.5(c)),
while an estimated third order model does not tangibly improve the modeling accuracy
(see Figure 5.5(d)). Hence the second order model is chosen as the best compromise
between accuracy and complexity.
The system identification experiments are repeated with the same workload and
configurations for the absolute delay. The input of the open loop system is the process
budget U(m) = B_0(m) of class 0, which is initialized to 64. At each sampling instant, the
white noise randomly sets the process budget to 96 or 64. The output is the sampled delay
V(m) = C_0(m) of class 0. To linearize the model, we feed the difference between two
consecutive inputs (B_0(m) - B_0(m-1)) and the difference between two consecutive
outputs (C_0(m) - C_0(m-1)) to the least squares estimator to estimate the model
parameters in
Equation (5.2). Figure 5.6(a) shows the estimated parameters of the second order
model (Equation (5.6)) at successive sampling instants in a 30 min run. The estimates
of the parameters (a_1, a_2, b_1, b_2) converge to (-0.08, -0.2, -0.2, -0.05). To verify the
accuracy of the model, we re-run the experiment with a different white noise input to the
open-loop server and compare the actual difference between two consecutive delay
samples with that predicted by the estimated model (Figure 5.6(b)). Similar to the relative
delay case, the prediction of the estimated model is consistent with the actual delay
throughout the 30 min run. This result shows that the estimated second order model is
adequate for designing the Absolute Delay Controller.

[Figure 5.6. System identification results for absolute delay. (a) Estimated model
parameters a1, a2, b1, b2 over time (second-order model); (b) modeling error of the
second-order model (actual vs. estimated difference C_0(m) - C_0(m-1), in seconds).]
5.7.3. Evaluation of the Adaptive Web Server
In this section, we present evaluation results for our adaptive web server. We first
present the evaluation results of the Relative Delay Controller. We then present the
results for guaranteeing the relative delays of a server with three classes. The evaluation
results of the absolute delay guarantee are presented at the end of this section.
Evaluation of Relative Delay Guarantees between Two Classes
To evaluate the relative delay guarantee in a server with two classes, we set up the
experiments as follows.

- Workload: Four client machines are evenly divided into two classes. Each client
machine has 100 users. In the first half of each run, only one client machine from
class 0 and two client machines from class 1 (100 users from class 0 and 200
users from class 1) generate HTTP requests to the server. The second machine
from class 0 starts generating HTTP requests 870 sec later than the other three
machines. Therefore, the user population changes to 200 users from class 0 and
200 from class 1 in the latter half of each run.
- Closed-loop server: The reference input (the desired delay ratio between class 1
and class 0) to the Controller is W_1/W_0 = 3. The process ratio B_0(m)/B_1(m) is
initialized to 1 in the beginning of the experiments. To avoid the starting phase of
SURGE, the Controller is turned on 150 sec after SURGE started. The sampled
absolute connection delays and the delay ratio between the two classes are
illustrated in Figure 5.7(a) and (b), respectively.
- Open-loop server: An open-loop server is also tested as a baseline. The open-loop
server is fine-tuned to have a correct process allocation based on profiling
experiments using the original workload (100 class 0 users and 200 class 1 users).
The results of the open-loop server are illustrated in Figure 5.7(c)(d).

We first look at the first half of the experiment on the closed-loop server (Figure 5.7
(a)(b)). When the Controller is turned on at 150 sec, the delay ratio is
C_1(m)/C_0(m) = (28.5 sec / 6.5 sec) = 4.4 due to incorrect process allocation. The
Controller dynamically reallocates processes and changes the relative delay to the
vicinity of the reference W_1/W_0 = 3. The relative delay stays close (within 10%) to the
reference at most sampling instants after it converges. This demonstrates that the
closed-loop server can guarantee the desired relative delay. Compared with an open-loop
server, a key advantage of a closed-loop server is that it can maintain robust relative delay
guarantees when the workload varies. Robust performance guarantees are especially
important in web servers, which often face unpredictable and bursty workloads [26]. The
robustness of our closed-loop server is demonstrated by its response to the load variation
starting at 870 sec (Figure 5.7(a)(b)). Because the number of users of class 0 suddenly
increases from 100 to 200, the delay ratio drops from 3.2 (at 870 sec) to 1.2 (at 900 sec),
far below the reference W_1/W_0 = 3. The Controller reacts to the load variation by
allocating more processes to class 0 while deallocating processes from class 1. By time
1140 sec, the relative delay successfully re-converges to 2.9.
In contrast, while the open-loop server achieves satisfactory relative delays when the
workload conforms to its expectation (from 150 sec to 900 sec), it violates the relative
delay guarantee after the workload changes (see Figure 5.7(c)(d)). After the workload
changes (from 960 sec to the end of the run), connections from class 0 consistently have
longer delays than connections from class 1.
In terms of the control metrics, the closed-loop server maintains stability because its
relative delay is clearly bounded throughout the run. We observe from Figure 5.7(b)
that the server renders satisfactory efficiency and accuracy in achieving the desired
relative delays. In particular, in response to the workload variation at time 870 sec, the
duration of the distinguishable performance deviation from the reference lasts for 180 sec
(from 900 sec to 1080 sec), well within the theoretical settling time of 270 sec based on
our design (Section 5.5.3). The delay ratio stays close to the reference in steady state,
which demonstrates a small steady state error.^9

^9 Due to the noise of the server caused by the random workload, it is impossible to precisely quantify the
settling time and steady state error based on the ideal definitions (Section 5.5.1).

[Figure 5.7. Evaluation results of relative delay guarantees between two classes.
(a) Closed-loop: connection delays of class 0 and class 1; (b) closed-loop: delay ratio
C1(m)/C0(m) and process ratio P0(m)/P1(m) against the reference; (c) open-loop:
connection delays of class 0 and class 1; (d) open-loop: delay ratio and process ratio.]
Evaluation of a Server with Three Classes
In the next experiment, we evaluate the performance of a closed-loop server with
three classes. Each class has a client machine with 100 users. The Controller is turned on
at 150 sec. The desired relative delays are (W_0, W_1, W_2) = (1, 2, 4). The process
proportions are initialized to (P_0, P_1, P_2) = (1, 1, 1). From Figure 5.8, we can see that
the connection delays begin at (C_0, C_1, C_2) = (14.6, 17.3, 17.5), which has the ratio
(1, 1.2, 1.2), and then change to (C_0, C_1, C_2) = (9.3, 16.2, 33.9), which has the ratio
(1, 1.7, 3.6), i.e., close to the desired relative delay, 240 sec after the Controller is turned
on. The relative connection delay remains bounded and close to the desired relative delay
in steady state. This experiment demonstrates that the Relative Delay Controllers can
guarantee desired relative delays for more than two classes.

[Figure omitted: Connection Delays (second) of Class 0, Class 1, and Class 2 vs. time (second).]

Figure 5.8. Evaluation Results of Relative Delay Guarantees for Three Classes

Evaluation of Absolute Delay Guarantees
In this section, we evaluate the absolute delay guarantee for two classes. The
experiment is set up as follows.

Workload: The same workload as in the experiments for relative guarantees is used to
evaluate the absolute guarantees. In the first half of each run, 100 users from class 0 and
200 users from class 1 generate HTTP requests to the server. Another 100 users from
class 0 start generating HTTP requests 870 sec later than the original users. Thus the
user population changes to 200 from class 0 and 200 from class 1 in the latter half of
each run.

Closed-loop server: The reference input (the desired delays for classes 0 and 1) to the
Controller is (W_0, W_1) = (10, 30) (sec). The process budgets (B_0(m), B_1(m)) are
initialized to 64 for each class at the beginning of the experiments. To avoid the start-up
phase of SURGE, the Controller is turned on 150 sec after SURGE starts. The sampled
absolute connection delays of the two classes are illustrated in Figure 5.9(a).

Open-loop server: An open-loop server is tested as a baseline. The open-loop server is
fine-tuned to a correct process allocation that achieves the desired absolute delays,
based on profiling experiments using the original workload (100 class 0 users and 200
class 1 users). The results of the open-loop server are illustrated in Figure 5.9(b).

In the first half of the experiment on the closed-loop server (Figure 5.9(a)), the
Controllers dynamically allocate processes and the delays of both classes remain close to
their desired delays (10 sec and 30 sec, respectively). At time 870 sec, the number of
users of class 0 suddenly increases from 100 to 200, and the delay of class 0 increases
from 8.4 sec (at time 870 sec) to 20.0 sec (at time 900 sec), violating its absolute delay
guarantee (10 sec). The Controllers react to the load variation by allocating more
processes to class 0 and decreasing the number of processes allocated to class 1. By time
1020 sec, the delay of class 0 successfully re-converges to 9.6 sec, at the cost of violating
the delay guarantee of the low priority class (class 1).^10
In comparison, while the open-loop server achieves satisfactory delays for both
classes when the workload is similar to its expectation (from 150 sec to 900 sec), it fails
to provide the delay guarantee for class 0, the class with the highest priority, after the
workload changes (see Figure 5.9(b)). Instead, connections from class 0 consistently have
longer delays than connections from class 1 after the workload changes, i.e., the
open-loop server fails to achieve the desired delay for the high priority class.

Note that while both the open-loop server and the closed-loop server violate the delay
guarantee of one service class, the closed-loop server provides the correct order of
guarantee violation by discriminating against the low priority class, while the open-loop
server fails to achieve the correct order. In terms of control metrics, the unsaturated (high
priority class) controller maintains stability because its delay is clearly bounded
throughout the run. Note that because the system load can grow arbitrarily, Absolute
Delay Controllers (especially those of low priority classes) can saturate and become
unstable in overload conditions even if they are tuned correctly. We observe from Figure
5.9(a) that the server renders satisfactory efficiency and accuracy in achieving the desired
delay for the high priority class (class 0). In particular, in response to the workload
variation at time 870 sec, the distinguishable performance deviation from the reference
lasts for 60 sec (from 930 sec to 990 sec), well within the theoretical settling time of 210
sec based on the control design (Section 5.5.3). The delay of class 0 stays close to the
reference in steady state, which demonstrates a small steady state error for the high
priority class, i.e., the desired delay of the high priority class is guaranteed in steady state
even when the server is severely overloaded.

^10 The low priority class suffers long delays in the second half of the experiment. This is
because the server devotes most processes to the high priority class to enforce its absolute
delay guarantee and consequently starves low priority classes.

[Figure omitted: (a) Connection Delays (latency, second) of the Closed-Loop Server for Class 0
and Class 1, with the references 10 and 30 sec marked; (b) Connection Delays of the Open-Loop
Server for Class 0 and Class 1; both vs. time (second).]

Figure 5.9. Evaluation of Absolute Delay Guarantees
In summary, our evaluation results demonstrate that the closed-loop server provides
robust relative and absolute delay guarantees even when the workload varies
significantly. Properties of our adaptive web server also include guaranteed stability and
satisfactory efficiency and accuracy in achieving the desired delay or relative delay
differentiation. The experimental results are also consistent with our theoretical analysis,
which verifies the correctness of our design methodology and of our dynamic system
model for real-time systems.
5.8. Summary
We apply the FCS framework to develop an adaptive architecture that provides
relative, absolute, and hybrid service delay guarantees for different service classes on web
servers under HTTP 1.1. The first contribution of this work is the architecture based on
feedback control loops that enforce delay guarantees for different classes via dynamic
connection scheduling and process reallocation. The second contribution is our use of
feedback control theory to design the feedback loops with proven performance guarantees.
In contrast with ad hoc approaches that often rely on laborious tuning and design
iterations, our control theory approach enables us to systematically design an adaptive
web server with established analytical methods. The design methodology includes using
system identification to establish dynamic models for a web server, and using the Root
Locus method to design feedback controllers that satisfy the performance specifications.
The adaptive architecture has been implemented by modifying an Apache web server.
Experimental results demonstrate that our adaptive server provides robust delay
guarantees even when the workload varies significantly. Properties of our adaptive web
server also include guaranteed stability, and satisfactory efficiency and accuracy in
achieving the desired delay or delay differentiation. In the future, we will extend our
architecture to web server farms. We are also interested in achieving service delay
guarantees in web servers supporting dynamic content (e.g., database queries and media
streaming), where feedback control scheduling of multiple resources (CPU, memory, and
storage) may be necessary to handle different run-time conditions.







Chapter 6
Online Data Migration in Storage Systems^11

^11 The work presented in this chapter was done when the author was a research intern at
HP Labs.

6.1. Introduction and Motivations
The storage requirements of enterprise-scale computing systems are currently
increasing at a very fast pace. Taking into account only online data (excluding tapes,
optical disks, and other tertiary storage media), storage system capacity has been
doubling in size every six to twelve months [60]. Current enterprise systems store up to
tens of terabytes in tens to hundreds of disk arrays and enclosures, interconnected by
storage area networks (SAN) such as Fibre Channel [2] or Gigabit Ethernet [1]. In many
cases, large data sets are spread over geographically distributed locations, with some
degree of replication for failure and disaster recovery.
It is extremely difficult to make good data placement decisions at this level of
complexity. Ideally, data should be close in terms of low latency and high effective
bandwidth to the applications using it, while continuing to provide the required level of
reliability. More importantly, even if a data placement is initially adequate at some point
in the life of the system, it may become inadequate on short notice. New devices added
to the system need to be populated with data in order to balance the load; symmetrically,
some devices may be taken offline for repairs or due to obsolescence. Failures may occur
in the system without advance warning; even if the level of redundancy remains high
enough to prevent data loss, the performance in degraded mode may be unacceptable, and
the previous level of redundancy may need to be re-established by creating more replicas
of critical data. Finally, the performance achievable with a given data placement depends
on how the data is accessed, and by whom. Access patterns may change because of
gradual trends (e.g. more customers of a company), or seasonal variations (e.g. load
spikes for e-tailers before the holidays), or periodic application characteristics (e.g. the
west coast branch of a multinational opens shortly after the European offices have closed
for the day). The placement of data onto storage devices may change many times during
the lifetime of a system. We consider backup as a particular case of migration, in which
the original copy is not erased; keeping online backups of critical data to minimize
switchover and recovery times is a widely followed practice in large installations.
In this chapter, we address the problem of migrating data in a storage system on-line,
i.e. the data being migrated is concurrently being accessed by applications running on the
storage system. Some existing solutions work offline by interrupting the systems
operation while migration occurs. This has the benefit that it makes it easy to guarantee
that customer data will remain consistent, as the presence of non-coordinated concurrent
readers and writers could otherwise result in data corruption. However, global enterprises
such as distributed data centers and multinational corporations need to access their data
around the clock because of the planetary scale of their operations. The business costs of
bringing down their systems are unacceptable. Some other existing solutions take the
middle road, selectively blocking accesses to the subset of the data being migrated. The
drawback is that client applications may not be able to cope with the increased delays
(e.g., because of timeouts built into their code), and that the performance degradation can
be substantial, both because of some data being unavailable and because of the contention
for system resources between the migration task and the client applications. Some
systems have quiet periods during which operations can cease, or at least degraded
performance does not have a major impact; but many systems are in use all the time, and
even systems that are periodically quiescent will seldom be able to tolerate an arbitrary
degradation in performance because of a data migration caused by an unforeseen event
during business hours. It is not acceptable for data migration to arbitrarily disrupt the
quality of service provided to applications executing in parallel. If migration were
allowed to go unchecked, customers could see a significant slow-down because the
storage, host, and networking subsystems are busy relocating data.
Realistic applications in practical systems must satisfy quality-of-service (QoS)
requirements. Examples of these requirements include performance (throughput,
bandwidth, latency) and dependability (availability, reliability). In general, on-line data
migration should satisfy two conflicting requirements:

Performance isolation: Applications should not see persistent QoS violations during
data migration, i.e., on-line data migration should be transparent to applications in
terms of performance.

Efficiency: Data migration should be completed quickly under the constraint of
achieving performance isolation.
We assume that migration should use as much as possible of the available system
resources left by applications (or, equivalently, we want to satisfy both requirements
while keeping QoS violations to a minimum). In general, this is only one possible design
point: the right equilibrium between the two requirements depends on the individual
needs of each storage system. Some may need the migration to be completed as fast as
possible even if application performance is impacted, while others may place a greater
emphasis on the QoS guarantees.
We present a novel approach to migrate data in storage systems. The main
contribution of this work is Aqueduct, an adaptive architecture that, based on periodic
measurements of the current performance of the applications, uses a feedback loop to
dynamically adjust the speed of data migration. Aqueduct completes a migration
efficiently while avoiding QoS violations in running applications. So far, research has
concentrated on minimizing the total backup/migration time, whereas our adaptive
approach may proceed slowly, but guarantees that QoS requirements will not be violated.
Our feedback loop has been designed systematically, following well-established concepts
and methodology in control theory, as opposed to hand-tuned heuristics for a particular
system. Another contribution of this work is a performance analysis of the speed of data
migration and its impact on concurrent applications on a networked storage system
testbed. Our performance evaluation shows that Aqueduct guarantees the desired
application iops (number of I/Os per second) for all devices during data migration.







In Section 6.2, we present the design of the Aqueduct architecture. Section 6.3
describes the design and analysis of the control part of the Aqueduct feedback loop. The
implementation details are described in Section 6.4. In Section 6.5, we present the
evaluation experiments and the results. In Section 6.6, we conclude the chapter with a
summary and future work.
6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration
We develop a feedback-control-based migration executor that dynamically converges
to the correct migration speed using on-line trial-and-error based on performance
feedback. A data migration management subsystem is composed of a migration planner
and a migration executor. Given an initial and a goal store assignment, the migration
planner computes a migration plan composed of a sequence of moves, and the migration
executor moves data across devices to complete the migration plan. Excessive migration-
penalties are unacceptable for applications with QoS requirements. Due to the
uncertainties in storage systems and the fact that existing workload models allow
fluctuations on arbitrary time scales [35], it is difficult to predict a priori a "correct"
speed of data migration that does not cause QoS violations and yet is not too pessimistic.
In this thesis, we present Aqueduct, a feedback-control architecture for the migration
executor.
6.2.1. Migration Planner
The migration planner generates a migration plan as an input to the migration
executor and triggers the execution of data migration. A migration plan is composed of a
set of partially ordered moves. Each move is composed of the name of a store object to be
moved, the source device, the destination device, and the dependencies on other moves
representing the precedence constraints among moves. A store object represents a logical
entity of storage such as all the data of an e-mail server, a database, or the /usr directory
of a file system. The partial order may be due to the capacity constraints on the devices.
For example, the following plan fragment describes a move, planmove10, that moves
store planTest_item14 from device c10t0d0 to c10t2d0; three other moves must occur
before this move.
move planmove10 {
{store planTest_item14}
{source /dev/dsk/c10t0d0}
{destination /dev/dsk/c10t2d0}
{dependencies { planmove4 planmove15 planmove5 }}
}

Since the migration plan is only partially ordered, it is possible to conduct several
eligible moves in parallel. The current Aqueduct prototype (Section 6.4) only conducts
moves sequentially.
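
To make the plan representation concrete, the following C++ sketch shows one possible
in-memory form of a move and a check for when a move becomes eligible to execute. The
type and field names (Move, eligible) are ours, for illustration; they are not taken from
the Aqueduct source.

#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of one move in a migration plan.
struct Move {
  std::string name;                       // e.g., "planmove10"
  std::string store;                      // e.g., "planTest_item14"
  std::string source, destination;        // device paths
  std::vector<std::string> dependencies;  // moves that must complete first
};

// A move is eligible once all of its dependencies have completed.
bool eligible(const Move& m, const std::set<std::string>& completed) {
  for (const auto& dep : m.dependencies)
    if (!completed.count(dep)) return false;
  return true;
}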
6.2.2. LV Mover
The LV Mover is a mechanism to move a logical volume from its current device to a
destination device. The LV Mover is implemented on top of the LVM (Logical Volume
Manager) [56] of HP-UX. When the LV Mover is invoked, it first creates a mirror of the
logical volume on the destination device, and then splits the two copies, with the mirror
becoming the logical volume's master copy on the new device. The underlying LVM
guarantees the consistency of the data in the logical volume while it is being moved, so
the LV Mover maintains the logical consistency of the data being moved.
However, the LV Mover does not have control over the speed of each move. Instead,
the Actuator modulates the migration speed by enforcing an idle interval between
subsequent invocations of the LV Mover (see Section 6.2.7). Each store is divided into
small logical volumes called substores with a fixed size of S_sub MB, so that the speed of
data migration can be controlled at a fine granularity. We call the move of a substore a
submove; each move in a migration plan is executed as a sequence of submoves until the
whole store is moved. The sleep time between the end of a submove and the start of the
next submove is called the inter-submove-time.
6.2.3. QoS guarantees
Two orthogonal issues need to be addressed in QoS specifications for applications on
a storage system: What QoS metrics should be guaranteed? At what granularity should
the guarantees be provided?
QoS metric
The QoS metrics for storage systems include latency, iops (number of completed
I/Os per second) and bandwidth [19]. Ideally, guarantees should be provided in all the
required metrics, which may require a multi-input-multi-output control solution [55]. In
this thesis, we design Aqueduct to provide guarantees only on iops as a first step
toward the full set of performance guarantees.
Granularity of QoS guarantees
QoS guarantees may be achieved at three different granularities.
Stream QoS guarantees give the finest granularity of QoS control. However, an
individual stream tends to render noisy behavior. Because stream-level monitoring and
control overheads are proportional to the number of streams, stream-level guarantees
may not scale well in enterprise storage systems with large numbers of I/O streams.

Global QoS guarantee refers to a guarantee on the aggregated performance of all
streams in a storage system. The aggregated behavior of a large number of I/O streams
tends to be less noisy and easier to control than each individual stream. The global QoS
guarantee also avoids the scalability problem of the stream guarantee. However, the
global guarantee does not guarantee satisfactory performance for each individual I/O
stream. Our experimental results (Section 6.5.2) show that some devices can suffer
especially severe performance degradation even when the global aggregated performance
specification is satisfied.

Device QoS guarantee is a trade-off between the stream and global QoS guarantees.
A device may be a single disk in arrays composed of independent disks (JBOD). In a
RAID, a device may be a LUN, or Logical Unit, which is a set of disks bound together
using a layout such as RAID 1/0 or RAID 5 and addressed as a single entity. With the
device QoS guarantee, the aggregated performance on each device achieves its QoS spec.
Since the aggregated performance of the streams on each device is guaranteed, the
probability of stream-level QoS violation is lower than with the global guarantee scheme.
The device QoS guarantee scheme can also scale better than the stream QoS guarantee
because its monitoring and control overhead is proportional to the number of devices,
which is usually much smaller than the number of streams.
Aqueduct provides the device QoS guarantee. The guaranteed device iops
{IS_i | 0 ≤ i < N}, where N is the number of devices in the system, is listed in a contract
file as an input to Aqueduct. In the common case where all the devices (e.g., all the disks
in a JBOD or all the LUNs in a RAID) are similar, the guaranteed device iops may be the
same. In our FCS framework, IS_i is also called the performance reference of device i.
The contract may be directly specified by the system administrators or derived from
application requirements on each device. Note that the device iops only includes the I/Os
performed on behalf of the applications. The I/Os for data migration are not considered
part of the device iops.
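
For illustration only, a contract for a four-device JBOD in which every device is
guaranteed 165 iops might look as follows. The dissertation does not show the contract
syntax, so this format, and the device names other than those in the plan example above,
are hypothetical:

contract {
{device /dev/dsk/c10t0d0 iops 165}
{device /dev/dsk/c10t1d0 iops 165}
{device /dev/dsk/c10t2d0 iops 165}
{device /dev/dsk/c10t3d0 iops 165}
}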

[Figure omitted: block diagram of the Aqueduct feedback control architecture. The Planner
takes the initial and goal assignments and produces a migration plan; the Monitor samples the
device iops {I_i(k)} from the disk arrays (LUNs) serving the applications' I/O streams; the
Controller compares them with the contract references {IS_i} and sends the submove rate
R_m(k) to the Actuator, which drives the LV Mover.]

Figure 6.1. Aqueduct: The Feedback Control Architecture for Data Migration
6.2.4. The Feedback Control Loop
Aqueduct (illustrated in Figure 6.1) features a feedback control loop that is invoked at
every sampling instant kW, where W is a constant sampling window.

1) The Monitor periodically samples the average device iops {I_i(k) | 0 ≤ i < N} in the
last sampling period ((k-1)W, kW).

2) The Controller compares the sampled iops {I_i(k) | 0 ≤ i < N} with the performance
references {IS_i | 0 ≤ i < N}, and computes a control input, the new speed of data
migration, for the next sampling period (kW, (k+1)W). Intuitively, Aqueduct should slow
down data migration when some devices' iops are lower than their corresponding
performance references, and speed up data migration when all the devices perform better
than their performance references. The Controller quantifies the mapping from
{I_i(k) | 0 ≤ i < N} to the control input so that the migration efficiently converges to the
correct speed and avoids excessive oscillations.

3) The Actuator moves data according to a migration plan while enforcing the
migration speed according to the control input.

We now present the details of the major components of the feedback control loop in
the following sections.
6.2.5. The Monitor
At every sampling instant k, the Monitor collects the average iops of each device,
{I_i(k) | 0 ≤ i < N}, in the last sampling period and feeds it to the Controller. The Monitor
may be implemented on top of existing performance monitoring tools such as the HP
PerfView toolset (a part of the HP OpenView software) [37].
6.2.6. The Controller
At each sampling instant k, given the sampled iops {I_i(k) | 0 ≤ i < N} from the
Monitor and the performance references {IS_i | 0 ≤ i < N} from the contract, the
Controller uses a control algorithm to compute a new migration speed for the next
sampling period (kW, (k+1)W). The migration speed is defined as the inter-submove-time
T_im(k) or the submove rate R_m(k), i.e., the number of submoves in the sampling period
(kW, (k+1)W). The control algorithm works as follows:

1) For each device 0 ≤ i < N, compute the error E_i(k) = I_i(k) - IS_i. A device i has a
negative error if its iops I_i(k) is less than its reference IS_i.

2) Find the smallest error E_min(k) = min{E_i(k) | 0 ≤ i < N}. If there are devices with
negative errors, E_min(k) is the negative error with the largest absolute value.

3) Compute the change in submove rate according to a PI (Proportional-Integral)
control function [33]:

dR_m(k) = K_P E_min(k) + K_I \sum_{u=1}^{k} E_min(u)    Equation 6.1

4) Compute the new submove rate:

R_m(k) = R_m(k-1) + dR_m(k)    Equation 6.2

5) Convert R_m(k) to the inter-submove-time:

T_im(k) = W/R_m(k) - T_m

where T_m is the average submove time (measured via system profiling).

6) Notify the Actuator of the new inter-submove-time T_im (the control input).

The rationale for using the inter-submove-time as the manipulated variable is that a
longer inter-submove-time reduces the resources consumed by data migration and
consequently improves the performance of concurrent I/O streams. In Section 6.5.3, we
present a set of profiling results to verify that device iops can be effectively controlled via
regulation of the inter-submove-time.
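
The following C++ sketch illustrates one Controller invocation implementing steps 1-5
above (Equations 6.1 and 6.2 and the conversion to the inter-submove-time). It is a
minimal illustration, not the actual Aqueduct source: the gains and the submove time are
those derived later (Sections 6.3.2 and 6.5.3), and the clamping of the submove rate and
of T_im to valid ranges is our assumption.

#include <algorithm>
#include <cstddef>
#include <vector>

// One Controller invocation (Section 6.2.6). Illustrative sketch only; the
// real prototype reads its samples from Pylon's output file and writes the
// new inter-submove-time to a pipe (Section 6.4).
struct Controller {
  double KP = 0.0364, KI = 0.1008;  // control gains (Equation 6.8)
  double W  = 30.0;                 // sampling window (sec)
  double Tm = 6.384;                // average submove time (sec), profiled
  double Rm = 5.0;                  // current submove rate (submoves per W)
  double integral = 0.0;            // running sum of E_min(u), u = 1..k

  // refs[i] = IS_i, iops[i] = I_i(k); returns the new inter-submove-time.
  double control(const std::vector<double>& refs,
                 const std::vector<double>& iops) {
    // Steps 1-2: smallest error over all devices, with E_i(k) = I_i(k) - IS_i,
    // so a device in violation (I_i(k) < IS_i) contributes a negative error.
    double Emin = iops[0] - refs[0];
    for (std::size_t i = 1; i < refs.size(); ++i)
      Emin = std::min(Emin, iops[i] - refs[i]);
    integral += Emin;
    double dRm = KP * Emin + KI * integral;  // step 3: PI law (Equation 6.1)
    Rm = std::max(0.1, Rm + dRm);            // step 4 (Equation 6.2); the
                                             // lower clamp is our assumption
    double Tim = W / Rm - Tm;                // step 5: convert to sleep time
    return std::max(0.0, Tim);               // clamp is our assumption
  }
};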
The control input T_im is computed based on the minimum error among all devices.
Therefore, the Controller dynamically adjusts the migration speed until the minimum
error converges to zero. This means that if the control function is properly tuned, the iops
of every device is higher than or equal to its reference, I_i(k) ≥ IS_i (0 ≤ i < N), in steady
state. The control algorithm has two parameters called control gains, i.e., K_P and K_I.
The values of K_P and K_I need to be tuned to guarantee stability and efficient
convergence to the specs. The detailed design and tuning of the control function are
presented in Section 6.3.
6.2.7. The Actuator
The Actuator executes the migration plan at the migration speed that is dynamically
adjusted by the Controller. The Actuator divides each move in the migration plan into
S/S_sub submoves, where S is the size of the store to be moved and S_sub is the fixed
size of each substore. For each submove, the Actuator invokes the LV Mover to move a
substore, and sleeps for T_im sec after the LV Mover completes the submove before
invoking the LV Mover to conduct the next submove.

In summary, we designed Aqueduct, an on-line data migration architecture that
guarantees the specified aggregated application iops on each storage device. The key
novelty and contribution of Aqueduct is a feedback control loop that adaptively adjusts
the migration speed based on the performance of concurrent applications.
6.3. Design and Analysis of the Controller
In this section, we present the second major contribution of Aqueduct: the modeling,
design, analysis, and tuning of the Controller, which is critical to the success of
Aqueduct. We first establish a dynamic model of Aqueduct and the storage system in
Section 6.3.1, and then tune the control function with established control theory in
Section 6.3.2.
6.3.1. The Dynamic Model
For the controlled system (including the storage system, the Monitor, and the
Actuator) of Aqueduct, the output is the victim iops I_v(k), defined as the iops of the
device with the smallest error E_min(k) in the time interval ((k-1)W, kW). The input of
the controlled system is the submove rate R_m(k) or the inter-submove-time T_im(k). In
our control design, we use R_m(k) as the input because the submove rate leads to the
following linear model of the controlled system, which is amenable to linear control
theory:

I_min(k+1) = I_min(k) + G (R_m(k) - R_m(k-1))    Equation 6.3(a)
I_min(z) = P(z) R_m(z),   P(z) = G z^{-1}    Equation 6.3(b)

Equation 6.3(a) is in the time domain, and Equation 6.3(b) is the equivalent model in the
z-domain. The term z^{-1} represents a time delay of one sampling window. The other
dynamics of the controlled system are ignored because they occur at a much smaller time
scale than the sampling window. The process gain, G, is the derivative of the output
I_min(k+1) with respect to the submove rate R_m(k). The process gain characterizes how
sensitive the victim device iops is to a change in the submove rate, and is different for
different I/O sizes and read/write percentages. To guarantee stability in all cases, we use
the process gain with the largest magnitude, G_max = -8.00, in our control design. We
verify the above controlled-system model and measure the process gains with profiling
experiments (see Section 6.5.3).
The transfer function from the minimum error E_min(z) to the change in submove rate
dR_m(z) is the standard PI control (Equation 6.1) in the z-domain:

C_1(z) = K_P + K_I \frac{z}{z-1}    Equation 6.4

The transfer function from the change in submove rate dR_m(z) to the submove rate
R_m(z) (Equation 6.2) is modeled as an integrator:

C_2(z) = \frac{z}{z-1}    Equation 6.5

For the closed-loop Aqueduct system, the input is the reference of the victim iops, and
the output is the victim iops I_min(z). Given Equation 6.3, Equation 6.4, and Equation
6.5, we can derive the transfer function of the closed-loop Aqueduct system (including
the Controller and the controlled system):

P_c(z) = \frac{C_1(z) C_2(z) P(z)}{1 + C_1(z) C_2(z) P(z)}    Equation 6.6

Assuming the common case that the references of all devices are the same, IS_i = IS
(0 ≤ i < N), the constant reference is modeled as a step input, IS \frac{z}{z-1}, to the
closed-loop system. Therefore, the output of Aqueduct, the victim iops I_v(z), can be
derived:

I_v(z) = IS \frac{z}{z-1} P_c(z)    Equation 6.7
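
As a small algebra note, combining Equations 6.3-6.5 gives the loop transfer function in
closed form, a step the text leaves implicit; the closed-loop poles quoted in Equation 6.9
below are then the roots of the denominator of Equation 6.6 after this substitution:

C_1(z)\,C_2(z)\,P(z)
  = \left(K_P + K_I\,\frac{z}{z-1}\right) \cdot \frac{z}{z-1} \cdot \frac{G}{z}
  = \frac{G\,[K_P\,(z-1) + K_I\,z]}{(z-1)^2}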
6.3.2. Controller Tuning and Analysis
Given the dynamic model in Equation 6.6, we apply the Root-Locus method [33] in
control theory to tune the control gains K_P and K_I. Since the tuning follows a standard
process and is similar to the web server tuning described in Section 5.5.3, we only give
the results and analysis in this section. Given the process gain G = G_max = -8.00, we set

K_I = 0.1008, K_P = 0.0364    Equation 6.8

which places the closed-loop poles of Equation 6.6 at

p_{0,1} = 0.84 ± 0.0607i    Equation 6.9

Now we apply control theory [33] to analyze the performance profile based on the
tuning results in Equation 6.8.

Stability: Aqueduct guarantees stability because all of its closed-loop poles are located
inside the unit circle of the z-domain, i.e., |p_{0,1}| < 1. Stability is guaranteed despite
possible variations in the process gain G because we assume the process gain with the
largest magnitude, G_max, in our stability analysis.

Steady state performance: Applying the final value theorem in control theory to
Equation 6.7, we derive the final value of the victim iops:

I_v(\infty) = \lim_{z \to 1} (z-1) \cdot IS \frac{z}{z-1} \cdot \frac{C_1(z) C_2(z) P(z)}{1 + C_1(z) C_2(z) P(z)} = IS    Equation 6.10

(Intuitively, the integrators in C_1(z) and C_2(z) contribute poles at z = 1, so the loop
gain grows without bound as z approaches 1 and the closed-loop gain P_c(1) = 1.) This
result means that the victim iops accurately converges to the specified reference IS, and
every device's iops satisfies I_i(k) ≥ IS (0 ≤ i < N) after the feedback control loop
converges. We should also note that since I_v(\infty) = IS, Aqueduct achieves the optimal
speed under the constraint of the specified device iops.

Sensitivity with regard to the process gain G: Equation 6.10 also shows that the final
value of the victim iops I_v(k) does not depend on the process gain G, i.e., the victim
iops converges to the reference IS regardless of the process gain G, as long as Aqueduct
remains stable. This property is important because the process gain G may vary at
run-time under different workloads.

Overshoot: Assuming the process gain G = G_max, the victim iops I_v(k) overshoots the
reference IS by 18%, i.e., max(I_v(k)) = 1.18 IS, during the transient state after Aqueduct
starts the execution of a migration plan.

Settling/Rise Time: Assuming the process gain G = G_max, Aqueduct's settling time is
29W, where W is the sampling window, i.e., the victim iops I_v(k) converges to within
2% of the reference 29W sec after the beginning of the migration. With a sampling
window W = 30 sec, Aqueduct has a settling time of 870 sec. Although the settling time
may be long, the victim iops reaches 0.98 IS 5W sec after the migration starts (called the
rise time in control theory) and stays within 0.98 IS ≤ I_v(k) ≤ 1.18 IS afterwards (as
illustrated in Figure 6.2, which is plotted with Matlab). Since the overshoot is small
(18%), we regard the system as in the steady state after the rise time for the purpose of
performance evaluation.

In summary, we apply a control theory methodology to tune and analyze Aqueduct.
Specifically, we establish a dynamic model for the Aqueduct system, apply the Root
Locus method to tune the controller, and prove that the tuned Aqueduct achieves robust
performance guarantees in terms of device iops and a satisfactory performance profile.
[Figure omitted: step response of Aqueduct, victim iops (normalized to the reference) vs. time
(sec) over 0-1000 sec.]

Figure 6.2. Step Response of Aqueduct
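
To give a feel for this transient behavior, the following self-contained C++ sketch
simulates the closed loop numerically, combining the PI controller above with the static
plant model avg(I_v(k)) = I_c + G R_m measured in Section 6.5.3. The constants are
taken from the 2 KB read-only profile (I_c = 188.87, G = -8.00) and from the evaluation
setup (IS = 165 iops, R_m(0) = 5); this is only an illustration, not the Matlab analysis
behind Figure 6.2.

#include <algorithm>
#include <cstdio>

// Closed-loop simulation: PI controller (Equations 6.1-6.2) driving the
// profiled static plant I_v = I_c + G * R_m (Section 6.5.3).
int main() {
  const double KP = 0.0364, KI = 0.1008;  // control gains (Equation 6.8)
  const double G  = -8.0;                 // process gain, 2KB read-only
  const double Ic = 188.87;               // victim device capacity (profile)
  const double IS = 165.0;                // iops reference (spec)
  double Rm = 5.0;                        // initial submove rate: too fast
  double integral = 0.0;
  for (int k = 0; k < 30; ++k) {          // 30 sampling windows of W = 30 sec
    double Iv = Ic + G * Rm;              // victim iops in this window
    double E = Iv - IS;                   // negative error = QoS violation
    integral += E;
    Rm = std::max(0.0, Rm + KP * E + KI * integral);  // Equations 6.1, 6.2
    std::printf("k=%2d  Iv=%7.2f  Rm=%5.2f\n", k, Iv, Rm);
  }
  // The victim iops oscillates with decaying amplitude and settles near IS,
  // while Rm settles near (Ic - IS) / -G, about 3 submoves per window.
}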







6.4. Implementation
We implement an Aqueduct prototype in C++ on HP-UX 11.0. Upon invocation,
Aqueduct creates two processes, a Monitor/Controller process and an Actuator process.
We now describe the source code of the major components in detail.
Initialization
1) The Monitor/Controller process forks the Actuator process to execute the
migration plan and establishes a pipe (in non-blocking I/O mode) between the
Monitor/Controller process and the Actuator process. The Actuator process reads
the output file generated by the migration planner.

2) The Monitor/Controller process initializes a monitor object and a controller
object. The constructor of class Controller initializes a vector of iops references of
all I/O devices based on a contract file.
The Monitor/Controller Process
The Monitor/Controller process repeats a loop until the migration plan is completed.
In each iteration of the loop, the Monitor/Controller process calls monitor.sample() and
then controller.control().
1) monitor.sample() samples the iops of all devices. In the current Aqueduct
prototype, monitor.sample() loops until it successfully opens an output file that is
periodically generated by the workload generator called Pylon (see Section 6.5.1).
It then reads the iops samples of all devices from the output file and puts them
into a vector, monitor.vec_stream_perf. In the future, Aqueduct should be
modified to interact with a performance-monitoring tool such as PerfView to get
the performance samples.

2) controller.control(monitor.vec_stream_perf) computes the new
inter-submove-time based on the control algorithm described in Section 6.2.6, and
then writes it to the pipe connected with the Actuator process. This step is skipped
if the Controller is turned off through a configuration file.
The Actuator Process
In parallel with the Monitor/Controller process, the Actuator process executes the
plan through a loop. In each iteration of the loop,

1) Get the next submove (substore and destination device) from the migration plan.

2) Fork an LV Mover process to conduct the submove and wait for its completion.

3) Read from the (non-blocking) pipe. If the read succeeds, the inter-submove-time
inter_submv_time is set to the value from the pipe; otherwise it keeps the old
value.

4) Sleep for inter_submv_time sec.
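
The following C++ fragment sketches the Actuator loop just described, using POSIX
fork/waitpid and a non-blocking pipe read. The plan representation and the LV Mover
invocation are stubbed out, and error handling is elided; this is an illustrative
simplification, not the actual prototype source.

#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, read, sleep, _exit
#include <cstdio>
#include <vector>

// Minimal stand-in for one submove of the plan (illustrative only).
struct Submove { const char* substore; const char* destination; };

static void run_lv_mover(const Submove& s) {
  // Stub: the real code invokes the LV Mover, which mirrors and splits the
  // substore onto the destination device (Section 6.2.2).
  std::printf("moving %s -> %s\n", s.substore, s.destination);
}

// Simplified Actuator loop (Section 6.4): one submove per iteration.
// pipe_fd is the read end of the non-blocking pipe from the
// Monitor/Controller process carrying the latest inter-submove-time.
void actuator_loop(int pipe_fd, const std::vector<Submove>& plan) {
  double inter_submv_time = 0.0;                 // T_im(0) = 0 sec
  for (const Submove& s : plan) {
    pid_t pid = fork();                          // fork an LV Mover process
    if (pid == 0) { run_lv_mover(s); _exit(0); } // child conducts the submove
    waitpid(pid, nullptr, 0);                    // wait for its completion
    double t;                                    // non-blocking read of T_im;
    if (read(pipe_fd, &t, sizeof t) == (ssize_t)sizeof t)
      inter_submv_time = t;                      // otherwise keep old value
    sleep((unsigned)inter_submv_time);           // enforce the idle interval
  }
}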
6.5. Experiments
We now present a set of experimental results on a networked storage test-bed at the
Storage System Program of HP Labs. In Section 6.5.1, we describe the configurations of
the experiments. In Section 6.5.2, we present a performance study that quantifies the
performance penalty caused by data migration. In Section 6.5.3, we then describe a set of
profiling experiments to measure the process gain of the controlled storage system.
Finally, we present a set of performance evaluations of the Aqueduct prototype.







6.5.1. Experiment Configurations
The hardware used in the presented experiments includes a JBOD disk array
composed of 5 disks, and a host machine running HP-UX 11.0. The disk array and the
host machine are connected with a Fibre Channel link.
Two logical volume groups with 25 stores are used in our experiments. The host
machine runs a synthetic workload generator called Pylon to generate streams of I/O
requests to the disk array. Now we describe the stores and I/O streams in detail.
Stores
Volume group vg02 includes 4 disks (also called physical volumes in LVM)
in the JBOD. 24 stores (including 20 migrated stores and 4 fixed stores) are
created in vg02. A migrated store is a store that is moved across devices, while
a fixed store is never moved in any experiments.
Volume group vg04 includes a separate disk in the JBOD with a single
standalone store on it. Similar to the fixed stores, the standalone store is not
moved. It is called standalone because it belongs to a volume group that never
participates in data migration.
I/O streams
A Pylon process executes 5 as-fast-as-possible (afap) I/O streams, including one
stream (called a fix-stream) on each fixed store and one stream (called a
standalone-stream) on the standalone store. All I/O streams are completely random with
a run count of 1, i.e., an I/O stream never generates I/O requests on sequential locations
on a disk. Intuitively, afap I/O suffers most from the resource contention of concurrent
data migration. Therefore, the migration-penalty on afap I/O streams represents the upper
bound of the migration-penalty on I/O streams with real workloads. Since there is only
one I/O stream per device in all the experiments presented in this chapter, the iops of a
stream is the same as the total iops of its target device. When multiple I/O streams exist
on the same device, the device iops should be the aggregated iops of all I/O streams on
the device.
6.5.2. A Performance Study on Migration Penalty
In this section, we present a performance study to quantify the performance penalty
caused by data migration on concurrent applications. In all the experiments presented in
this section, the Controller is turned off and a fixed inter-submove time is used.
Metrics
Data migration may affect the performance of application I/Os in two ways: 1) it
may cause resource contention with concurrent application I/O; 2) the load on devices
may change due to changes in the load distribution when stores are moved. We only
execute I/O streams on fixed/standalone stores in our performance studies, and there is
no change of load distribution during data migration. Therefore, only the effect of
resource contention is reflected in the presented experiments. To quantify the impact of
data migration on concurrent application I/O, we define the following terminology.

minM_i: The minimum iops of device i during the execution of a migration plan.
maxM_i: The maximum iops of device i during the execution of a migration plan.
minN_i: The minimum iops of device i after a migration plan is completed.
P_i: The migration-penalty (or penalty for simplicity) of device i during the execution
of a migration plan: P_i = (minN_i - minM_i) / minN_i.

The penalty P_i of device i represents how many iops it lost due to concurrent data
migration, relative to its iops without concurrent migration.
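
As a concrete check of these definitions, take device 0 of Experiment 1 below (Figure
6.4), for which minM_0 = 161.94 and minN_0 = 186.10:

P_0 = (minN_0 - minM_0) / minN_0 = (186.10 - 161.94) / 186.10 \approx 13.0%

which matches the tabulated penalty.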
[Figure omitted: device iops vs. time (min) for two experiments. The top curves (IO size = 2 KB,
read only, T_im = 4 sec) and the bottom curves (IO size = 64 KB, 50% read, T_im = 0 sec) each
show the devices with fixed stores and the device with the standalone store; the point where data
migration completes is marked.]

Figure 6.3. Device iops during data migration
Results

The iops of all devices in two experiments with different workloads are illustrated in
Figure 6.3.

In Experiment 1, all devices (the top 5 curves in Figure 6.3) are read-only, and the
size of the requested data (request size) is 2 KB. The inter-submove-time is T_im = 4 sec.

In Experiment 2 (the bottom 5 curves in Figure 6.3), all devices are 50% read with
request size = 64 KB. The inter-submove-time is T_im = 0 sec, i.e., the Actuator process
does not sleep between two subsequent submoves.

The sampling period is W = 30 sec in all experiments. Each point in Figure 6.3
represents the iops of a device during a sampling period. In both experiments all devices
achieve significantly less iops during migration than after the migration is completed. In
Experiment 1 (see Figure 6.4), the penalties on the fixed streams are in the range [9.7%,
13.0%]. The standalone-stream has a smaller penalty of 4.1%. In Experiment 2 (see
Figure 6.5), the penalties of the fixed streams are in the range [15.4%, 17.9%]. The
standalone-stream has a smaller penalty of 5.3%.
 i  Type        minM_i   maxM_i   minN_i   DP_i   VP_i   AP_i   P_i
 0  Fixed       161.94   173.67   186.10   6.3%   3.5%   3.2%   13.0%
 1  Fixed       169.40   174.86   187.54   2.9%   2.8%   3.9%   9.7%
 2  Fixed       163.41   172.80   187.55   5.0%   3.9%   3.9%   12.9%
 3  Fixed       166.26   175.18   186.63   4.8%   2.7%   3.5%   10.9%
 4  Standalone  180.15   -        187.77   -      -      4.1%   4.1%

Figure 6.4. Migration Penalty in Experiment 1

 i  Type        minM_i   maxM_i   minN_i   DP_i   VP_i   AP_i   P_i
 0  Fixed       83.86    89.59    102.14   5.6%   7.7%   4.6%   17.9%
 1  Fixed       86.28    91.82    102.02   5.4%   5.5%   4.5%   15.4%
 2  Fixed       84.29    90.55    102.41   6.1%   6.7%   4.8%   17.7%
 3  Fixed       86.00    92.25    101.77   6.1%   5.1%   4.2%   15.5%
 4  Standalone  97.45    -        102.95   -      -      5.3%   5.3%

Figure 6.5. Migration Penalty in Experiment 2
A more detailed analysis (see Figure 6.4 and Figure 6.5) reveals that the penalty of a
device can be logically divided into three portions, namely, the device-penalty, the
vg-penalty, and the global-penalty.

Device-Penalty DP_i of device i is defined as the difference between the maximum
iops and the minimum iops of device i during migration, divided by the minimum iops of
device i without migration, i.e., DP_i = (maxM_i - minM_i) / minN_i. This is based on
the observation that, during data migration, devices with fix-streams achieve lower iops
when moves occur on them (i.e., when serving as a source or destination device of a
move) than when no moves occur on them. The device-penalty is caused by resource
contention between data migration and I/O streams on the resources (e.g., disk arm
and/or disk controller) of the shared device.

Vg-Penalty VP_i of device i (with a fix-stream) is defined as the difference between
the minimum iops minM_s of the device with the standalone-stream during migration and
device i's maximum iops maxM_i during migration, i.e., VP_i = (minM_s - maxM_i) /
minN_i. Note that even when no moves occur on a device with a fix-stream, its iops is
still lower than the iops of the device with the standalone-stream during migration. The
vg-penalty may be caused by the volume group management overhead (e.g., metadata
update and/or locking) that occurs on every device in a volume group when mirror-split
operations are performed on any logical volume in the volume group.
Global-Penalty AP_i: Surprisingly, even the device with the standalone-stream
achieves lower iops during migration than after the migration is completed, although the
standalone store belongs to a volume group that never participates in data migration. We
call this portion of the migration penalty, which affects every device in a storage system,
the global-penalty AP_i. AP_i is the portion of the penalty not included in the
device-penalty or vg-penalty, i.e., AP_i = P_i - DP_i - VP_i = (minN_i - minM_s) /
minN_i. The global-penalty may be due to contention on resources shared by the whole
storage system, e.g., array controllers and/or fibre channels. The reasons for the
vg-penalty and the global-penalty remain open questions that require further
investigation into the mirror/split mechanism.

The measured device-penalty, vg-penalty, and global-penalty for all devices in
Experiments 1 and 2 are summarized in Figure 6.4 and Figure 6.5, respectively. Different
devices suffer different degrees of penalties. In particular, when data moves occur on a
device, it receives all three penalties, while a device with a standalone-stream only
suffers from the global-penalty. We call the device with the least iops during a sampling
period ((k-1)W, kW) the victim device of the k-th sampling period. The iops of the victim
device is called the victim iops I_v(k). Note that the victim iops I_v(k) forms a bottom
envelope of the sampled iops of all devices during migration.
6.5.3. System Profiling
In this section, we present a set of profiling experiments to 1) verify the effectiveness
of the migration speed (inter-submove-time or submove rate) as the manipulated variable
for controlling the device iops, and 2) measure the process gain G for the control design.
The standalone stream and its target disk are not used in the profiling experiments
because it is usually not the victim device. Four workloads with different combinations
of I/O request sizes (2 KB or 64 KB) and read ratios (100% or 50%) are used. For each
workload, we run Aqueduct (with the Controller turned off) repeatedly with the same
migration plan. Each run uses a fixed inter-submove-time in the range between 0 sec and
22 sec. The sampling window is W = 30 sec in all runs. Each data point plotted in Figure
6.6(a)(b) represents the average victim iops avg(I_v(k)) over all sampling periods during
the execution of a migration plan. The 90% confidence interval of every data point is
within 1.11% of the average value.

Victim-iops and inter-submove-time

We can see that the average victim-iops increases monotonically with the
inter-submove-time T_im in all workloads. For example, for workload (2 KB, 100%
read), the average victim-iops increases from 150.51 iops to 180.588 iops (an increase of
20.0%) when the inter-submove-time increases from 0 sec to 22 sec. This result verifies
that migration speed is an effective mechanism for controlling device iops. However,
Figure 6.6(a) also shows that the relationship between T_im and I_v is non-linear, i.e.,
the slope of the I_v vs. T_im curve changes dramatically in the tested range of T_im.
Such a non-linear relationship is not amenable to linear control design [33].
[Figure omitted: (a) victim-iops I_v vs. inter-submove-time T_im; (b) victim-iops I_v vs.
submove rate R_m; each panel shows the four workloads (2KB, 50% READ), (2KB, 100%
READ), (64KB, 50% READ), and (64KB, 100% READ). The linear fits shown in (b) are
y = -7.8621x + 182.71, y = -8.0027x + 188.87, y = -3.9064x + 104.64, and
y = -4.0015x + 103.68.]

Figure 6.6. Relationship between migration speed and victim iops
Victim-iops and submove-rate

To find a linear model for the controlled storage system, we re-plot the average
victim iops as a function of the submove rate R_m in Figure 6.6(b), based on the same
experiments. We can see that I_v decreases linearly as a function of the submove rate
R_m with all workloads. Using linear regression (as shown in Figure 6.6(b)), we find that
the relationship between the victim-iops and the submove rate can be formulated as

avg(I_v(k)) = I_c + G R_m

where I_c is viewed as the constant capacity of the victim device. The process gain G of
the controlled storage system can be approximated by the slope of the I_v vs. R_m curve.
Note that G is different for different workloads and is especially sensitive to the request
size. A smaller I/O size leads to a G of larger magnitude, i.e., the iops with smaller I/Os
is more sensitive to changes in the migration speed. For example, G = -8.00 when
request size = 2 KB and read-only, whose magnitude is more than twice that of G = -3.91
when request size = 64 KB and 50% read. Among all the workloads, G_max = -8.00 has
the largest magnitude and is used in our control design (Section 6.3).

The submove rate R_m can be approximately converted to the inter-submove-time by
T_im = W/R_m - T_m, where T_m is the submove time, i.e., the average duration of each
submove. The average submove time over all sampling periods during migration for each
workload is as follows:

 Req. size   100% READ           50% READ
 2 KB        6.384 ± 0.326 sec   6.807 ± 0.388 sec
 64 KB       7.005 ± 0.376 sec   6.873 ± 0.332 sec

Since a smaller submove time leads to a longer sleep time in our control algorithm, we
use the smallest measurement, T_m = 6.384 sec, in our Controller implementation to be
conservative.
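
As a worked example of this conversion, at the steady-state rate of about R_m = 3
submoves per sampling window observed in the evaluation (Section 6.5.4), the Controller
sets

T_im = W/R_m - T_m = 30/3 - 6.384 \approx 3.6 sec

of sleep between consecutive submoves.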
6.5.4. Performance Evaluation
In this section, we present the performance evaluation of the (closed-loop) Aqueduct
prototype. The standalone stream and its target device are not used in the performance
evaluation. Two workloads are used in the experiments. In workload A, each I/O stream
has a request size of 2 KB and all I/O requests are read-only. In workload B, each I/O
stream has a request size of 64 KB and 50% of the I/Os are reads. With workload A, the
iops reference of each device is 165 iops; with workload B, the spec for each device is
95 iops.

We use a baseline called AFAP, which is configured by turning off the Controller in
Aqueduct and setting the fixed inter-submove-time to 0 sec. In all the evaluation
experiments presented in this section, the initial value of T_im(0) = 0 sec for the
(closed-loop) Aqueduct. The sampling period is 30 sec. All the data points presented in
this section (except for those in Figure 6.7) are the average values (with 90% confidence
intervals) of 11 repeated runs.
Performance Metrics
The following performance metrics are used in the performance evaluation.
IA_i: Average iops of device i during migration.

VR_i: QoS violation ratio of device i. A QoS violation of device i occurs in a sampling
period k during which the iops of device i is less than its spec, i.e., I_i(k) < IS_i. The
QoS violation ratio of device i is defined as the number of QoS violations NV_i divided
by the total number of sampling periods NSP, i.e., VR_i = NV_i / NSP.

WV_i: Worst QoS violation of device i, defined as the largest error of device i during
migration relative to its reference, i.e., WV_i = max(IS_i - I_i(k)) / IS_i. Note that while
the QoS violation ratio represents the frequency of QoS violations, the worst QoS
violation represents the extent of QoS violation in the worst sampling period.

T_r: Rise time of the storage system, defined as the duration from the beginning of
migration (time 0) to the first sampling instant kW when I_i(k) > 0.98 IS_i for every
device i. As discussed in Section 6.3.2, since the overshoot is small, we regard the
system as in the steady state after the rise time. A reason for using the rise time instead
of the real settling time as the start time of the steady state is that the rise time is easier
to measure in a noisy system such as the storage system. The rise time describes how fast
Aqueduct reaches the correct migration speed to approach the reference iops of all
devices after migration starts. It is desirable to have a small rise time and therefore a
short transient interval without QoS guarantees.

The following three metrics describe the system performance in the steady state.

IAS_i: Average iops of device i in steady state.

VRS_i: Steady-state QoS violation ratio of device i, defined as the number of QoS
violations after the system enters the steady state divided by the total number of
sampling periods in the steady state.

WVS_i: Worst QoS violation of device i in the steady state, defined as the largest error
of device i relative to its spec in steady state.

T_DM: Execution time of a migration plan, defined as the duration of the execution of
a migration plan. T_DM represents the efficiency of data migration. Note that it is
undesirable to over-throttle the migration speed while unnecessarily allowing devices to
perform better than their specs.
[Figure omitted: two panels vs. time (min). Top: device iops {I_i} with the spec IS and 0.98 IS
marked, showing the rise time T_s = 90 sec, the steady state, and the completion of the
migration plan at T_DM = 720 sec. Bottom: the control input, i.e., the inter-submove-time
T_im (sec) and the submove rate R_m (1/(30 sec)).]

Figure 6.7. Device iops and control input of Aqueduct

A Typical Run

A typical run with Workload A is illustrated in Figure 6.7. The top graph illustrates
the iops of all devices. The bottom graph shows the inter-submove-time T_im(k) and the
submove rate R_m(k) computed by the Controller. The iops of all devices are less than
the iops spec IS = 165 iops when migration starts with T_im(0) = 0 sec and R_m(0) = 5
submoves/W. This is because the data migration is too fast and causes excessive resource
contention with the concurrent I/O streams. Aqueduct reacts to the QoS violations by
gradually increasing T_im(k) to slow down the migration. By time 90 sec (the rise time),
the iops of all devices increases to 0.98 IS = 161.7 iops or more, while the submove rate
is reduced to 3 submoves/W. In the steady state, the iops of all devices stays above or
close to the spec while the submove rate stays at 2-3 submoves/W. The victim iops
I_v(k) stays close to the reference in steady state. This demonstrates that Aqueduct
converges to a correct speed and achieves QoS guarantees in the steady state. The system
remains stable throughout the run. The execution time of the migration plan is T_DM =
720 sec in this run.
Average Device iops
The average device iops of the AFAP baseline and Aqueduct, and the steady-state
average iops of Aqueduct, are illustrated in Figure 6.8. With AFAP, the average iops of
every device is less than the spec with both workloads. In comparison, when Aqueduct
executes the same migration plan, every device achieves an average iops higher than the
reference. In addition, the steady-state average iops of Aqueduct is higher than its overall
average iops, IAS_i > IA_i > IS. This result shows that Aqueduct effectively increases the
iops of every device to more than the reference. The performance improvement is
especially significant after the system settles to the steady state.
[Figure omitted: average device iops per device (0-3), showing IA_i for AFAP, IA_i for
Aqueduct, IAS_i for Aqueduct, and the spec IS. (a) Average Device iops (Workload A: 64KB
50%READ); (b) Average Device iops (Workload B: 2KB READ only).]

Figure 6.8. Average iops of AFAP and Aqueduct, and Aqueduct in steady state
QoS Violation Ratio
The QoS violation ratios of AFAP and Aqueduct, and the steady-state QoS violation
ratio of Aqueduct, are illustrated in Figure 6.9. We can see that the AFAP baseline causes
every device to violate its iops spec in most of the sampling periods, because VR_i > 90%
for every device i in both workloads. In the Aqueduct case, the QoS violation ratio of
every device is significantly lower than with the AFAP baseline, i.e., VR_i < 35%
(Workload A) and VR_i < 30% (Workload B) for every device i. The QoS violation ratio
in the steady state is further reduced to lower than 20% in both workloads, i.e.,
VRS_i < 20% (Workloads A and B) for every device i. The steady-state QoS violation
ratio in the Aqueduct case is thus a small fraction of the QoS violation ratio of the AFAP
baseline. However, Aqueduct cannot eliminate QoS violations even in steady state,
because its feedback control loop oscillates around (rather than above) the specs.
Nevertheless, if a device iops I_i close to the spec, i.e., I_i ≥ 0.98 IS_i, is acceptable to
the applications, we can treat 0.98 IS_i as a relaxed spec for device i.^13 The QoS
violation ratios based on the relaxed specs are plotted in Figure 6.10. We can see that the
QoS violation ratios of all devices remain above 90% (both workloads) in the AFAP
case. In the Aqueduct case, the QoS violation ratios of all devices are lower than 20%.
Most importantly, the steady-state QoS violation ratios are reduced to close to zero for
all devices, i.e., VRS_i < 5% for every device and workload. This means that Aqueduct
successfully achieves QoS guarantees at 98% of the specs after it settles down to a
steady state.

^13 Note that Aqueduct can achieve the strict spec IS_i by using IS_i/0.98 to compute the
control input (see the Controller algorithm) so that the device iops converges to IS_i/0.98
(instead of IS_i).
[Figure omitted: QoS violation ratio per device (0-3) for AFAP (VR_i), Aqueduct (VR_i), and
Aqueduct in steady state (VRS_i). (a) Workload A: 64KB, 50% READ; (b) Workload B: 2KB,
READ only.]

Figure 6.9. QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State

[Figure omitted: QoS violation ratio per device (0-3) based on the relaxed spec, for AFAP
(VR_i), Aqueduct (VR_i), and Aqueduct in steady state (VRS_i). (a) Workload A: 64KB, 50%
READ; (b) Workload B: 2KB, READ only.]

Figure 6.10. QoS violation ratio based on the relaxed spec 0.98 IS
Worst QoS Violation
The worst QoS violations of AFAP, Aqueduct, and Aqueduct in steady state are
illustrated in Figure 6.11. For the AFAP baseline, the worst QoS violation is more than
10% of the spec (both workloads) for every device except device 4. The worst QoS
violation of Aqueduct is lower than that of the AFAP baseline, but the difference is
insignificant. This is because the worst QoS violations occur before the system settles to
the steady state. In comparison, the worst QoS violation of every device in the steady
state is WVS_i < 3% with both workloads, which is significantly lower than the worst
QoS violation of AFAP and of Aqueduct throughout the run. This means that device iops
never become significantly lower than the spec in the steady state. The results on worst
QoS violations and QoS violation ratios together show that iops guarantees are
successfully achieved in the steady state.
[Figure omitted: worst QoS violation per device (0-3) for AFAP (WV_i), Aqueduct (WV_i), and
Aqueduct in steady state (WVS_i). (a) Workload A: 64KB, 50% READ; (b) Workload B: 2KB,
READ only.]

Figure 6.11. Worst QoS Violations of AFAP, Aqueduct, and Aqueduct in steady state
Rise Time
The evaluation has shown that Aqueduct can successfully guarantee the iops of every
device in steady state. The rise time measures an orthogonal metric: how fast can
Aqueduct settle to a steady state? In our experiments, the rise time T_r of Aqueduct is
204.5 (±17.6) sec for Workload A, and 144.5 (±9.8) sec for Workload B. The rise time
can be further reduced if a shorter sampling period is used. Note that Workload B
(request size = 2 KB, READ only) has a shorter rise time. This is because a smaller I/O
size leads to a larger process gain and therefore the system is more responsive to
feedback control.







185
[Figure 6.12. Execution Time of Migration Plan (sec; y-axis from 0 to 1100) under AFAP and Aqueduct. Two panels: (a) Workload A: 64KB, 50% READ; (b) Workload B: 2KB, READ only.]
Execution Time of the Migration Plan
The execution times of the migration plan for the AFAP baseline and Aqueduct are illustrated in Figure 6.12. Aqueduct achieves QoS guarantees at the cost of a longer execution time of the migration plan (T_DM). For Workload A, Aqueduct had T_DM = 765.0 (11.3) sec, 83% longer than the AFAP case. For Workload B, Aqueduct's T_DM = 1012 (6) sec, 121% longer than the AFAP case.
In summary, the evaluation experiments demonstrate that 1) Aqueduct settles to a steady state after a rise time of T_r < 4 min for the tested workloads; 2) in the steady state, every device achieves its iops spec, with a QoS violation ratio (based on 98% of the specs) of less than 5% and a worst QoS violation of less than 3% of the specs; and 3) Aqueduct's execution time of the migration plan is longer than the AFAP baseline's (by 83% for Workload A and 121% for Workload B).
6.6. Conclusion and Future Work
The Aqueduct project demonstrates the applicability of our FCS framework in a non-real-time application, i.e., networked enterprise storage systems. The major contributions of the Aqueduct project are summarized as follows.
- A performance study that demonstrates that uncontrolled data migration may impose a significant performance penalty on concurrent application I/O.
- A feedback control architecture called Aqueduct that dynamically adapts the migration speed to guarantee the specified iops for every device in a storage system (a simplified control iteration is sketched after this list).
- A control theory methodology, including a system-profiling technique for modeling the storage system, the Root Locus method for controller tuning, and control-theoretic analysis of the performance profile of the Aqueduct design.
- Implementation and evaluation of an Aqueduct prototype on a networked storage testbed, demonstrating that Aqueduct provides the desired iops guarantees for all devices during data migration.
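As a rough illustration of this architecture, the sketch below (in C, with hypothetical names; it is not the actual Aqueduct code) shows one sampling instant of a per-device integral control loop: the Controller compares the sampled application iops of each device against its spec and adjusts the migration speed bound accordingly.

/* One sampling instant of a per-device integral (I) control loop.
 * iops[i]:  application iops sampled on device i in the last period
 * spec[i]:  the iops spec IS_i of device i
 * speed[i]: migration speed bound for moves touching device i
 * ki:       integral gain, tuned (e.g.) with the Root Locus method */
void control_iteration(int n, const double iops[], const double spec[],
                       double speed[], double ki)
{
    for (int i = 0; i < n; i++) {
        double error = iops[i] - spec[i];   /* > 0: device has slack */
        speed[i] += ki * error;             /* integral control law  */
        if (speed[i] < 0.0)
            speed[i] = 0.0;                 /* actuator saturation   */
    }
}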
The Aqueduct project has successfully made a case for the strength of the FCS framework in networked storage systems. Future work includes more realistic workloads/experiments on high-end RAID, and a more general implementation that interacts with performance monitoring tools. The division of each store into multiple logical volumes cannot scale well for large stores because of limitations on the number of logical volumes in a volume group and the LVM overhead; an efficient mechanism for dividing moves/stores into submoves/substores needs to be developed. More sophisticated Actuator mechanisms may also be developed; e.g., the Actuator may dynamically change the plan to avoid busy devices in the next move.

Chapter 7
General Issues
In this chapter, we summarize some insights based on the application of the FCS framework to three different types of applications. In Section 7.1, we argue that controlling aggregate performance (instead of each individual task) is a scalable and practical control scheme in many computer systems. In Section 7.2, we discuss the design tradeoffs related to the sampling period of feedback control loops in computer systems. Finally, in Section 7.3 we discuss some open questions regarding the robustness of our linear models and control designs in actual non-linear and time-variant environments.
7.1. Granularity of Performance Control
The granularity of performance control is an important issue in designing feedback control resource scheduling algorithms. Ideally, a performance guarantee should be provided for each individual task, such as each process/thread, each TCP connection, and each I/O stream. Although individual guarantees may be possible for small systems such as PCs or simple digital embedded controllers, they may be impractical on a large server system such as a web server or a storage server for e-business applications. First, controlling each individual task does not scale to server systems with large numbers of tasks. For example, it is not uncommon for a web server to handle millions of users and TCP connections every hour, and performance control of each connection may introduce extremely high overhead at run time. Second, the noisy and random behavior of each individual task cannot be easily described by differential/difference equations or controlled by classical control laws.
To address this problem, we control the aggregate performance in all three applications in this thesis research.
- Our real-time CPU scheduling algorithms (Chapter 4) control the aggregate deadline miss ratio and the total CPU utilization of all the tasks in the system.
- Our web server (Chapter 5) controls the average service delay of each service class, each composed of hundreds of users.
- The Aqueduct data migration executor (Chapter 6) provides performance guarantees on the aggregate throughput of the I/O streams on each storage device.
Compared with performance control at the individual task level, aggregate performance control is more scalable and efficient at run time. We have also shown that all three applications can be sufficiently approximated with first-order or second-order difference equations for the purpose of controlling the aggregate performance. For example, although the queueing and processing of each individual task can be difficult to describe with differential equations, the aggregate CPU utilization can be readily approximated with a discrete integration model. Modeling aggregate system behavior is simpler because the aggregation of a large number of individual tasks tends to smooth out the noise of the individual tasks, and is therefore more amenable to modeling based on classical differential/difference equations. Aggregate performance guarantees are especially appropriate for applications in which the individual tasks of the same aggregate are equally important. For example, to support delay differentiation in a web server, all the HTTP requests in the same service class should be identical in terms of QoS requirements.
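As a concrete instance, the utilization model of Chapter 4 can be restated in simplified form (see Chapter 4 for the precise definitions) as a discrete integrator:

    U(k+1) = U(k) + G_A * dB(k)

where U(k) is the total CPU utilization sampled in the k-th sampling period, dB(k) is the change in the total estimated requested utilization enacted by the scheduler, and the gain G_A models the ratio between actual and estimated utilization. This single-state aggregate model is sufficient for control design even though no individual task admits such a simple description.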
However, the disadvantage of aggregate performance control is that it may not provide guarantees to each individual task. Special consideration may be necessary to provide individual performance guarantees within an aggregate control scheme. For example, under certain assumptions, each task can be guaranteed to meet its deadline by combining aggregate CPU utilization control with the knowledge of the schedulable utilization bound from real-time scheduling theory (see Chapter 4). In the storage data migration executor, we make a design tradeoff between the practicality of control and individual performance by choosing the throughput of each device, instead of each individual I/O stream or the whole storage server, as the controlled variable (see Chapter 6). Individual guarantees may also be handled by the actuator mechanisms. If a system has critical tasks that require hard real-time guarantees, a fixed amount of resources should be reserved for such tasks. In our FC-RTS algorithms presented in Chapter 4, the QoS optimization algorithm in the Actuator assigns higher QoS levels to tasks with higher values or importance.
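To make the utilization-bound argument concrete: for independent periodic tasks, the classical analysis of Liu and Layland [48] gives a schedulable utilization bound of U_b = 1 under EDF scheduling, and U_b = n(2^(1/n) - 1) for n tasks under rate monotonic scheduling. Hence, as long as the utilization control loop keeps the aggregate utilization U(k) <= U_b, every admitted task meets its deadline, even though the Controller never reasons about any individual task.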
7.2. Sampling Period and Overhead
The sampling period of feedback control is an important parameter and may require some design tradeoffs. Intuitively, reducing the sampling period causes the Controller to react faster to variations in run-time conditions and may result in better transient response, such as shorter settling time and lower overshoot. However, a smaller sampling period also means higher control overhead. The sampling period may also have a lower bound imposed by the workload. For example, to control the service delays of a web server, enough TCP connections per service class should be dispatched to server processes so that the Monitor can infer a smooth service delay for each class at every sampling instant. Otherwise, the sampled delay may be dominated by the noise of a small number of connections. Therefore, the arrival rate of TCP connections and the capacity of the server determine a lower bound on the sampling window. This lower bound decreases as the connection arrival rate and the server capacity increase, which means that busier and more powerful web servers can benefit from a smaller sampling period. Another design option is to use a low-pass filter in the Monitor to smooth out the performance samples [54]; however, the low-pass filter also tends to slow down the Controller's response to run-time conditions. The sampling period may also need to be compatible with the periodicity of the workload. For example, for a periodic task set with harmonic arrival periods, the sampling period of the CPU scheduling algorithm should be a multiple of the least common multiple of the tasks' arrival periods. Otherwise, the system may oscillate frequently due to noise in the sampled miss ratio caused by differences in the number of processed task instances in different sampling periods [51]. Therefore, the sampling period of performance control should be tailored to the specific system and workload.
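A common realization of such a low-pass filter is an exponentially weighted moving average (EWMA). The sketch below is illustrative only; the function name and the value of ALPHA are our assumptions rather than parameters taken from a specific implementation in this thesis.

/* EWMA low-pass filter applied by the Monitor to raw performance
 * samples. A smaller ALPHA suppresses more noise but delays the
 * Controller's view of real workload changes, mirroring the tradeoff
 * discussed above. Requires 0 < ALPHA <= 1. */
#define ALPHA 0.3

double smooth(double raw_sample, double *filtered)
{
    *filtered = ALPHA * raw_sample + (1.0 - ALPHA) * (*filtered);
    return *filtered;
}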
The overhead of an FCS algorithm includes the time spent in the Monitor, the Controller, and the Actuator. The Controller overhead is usually negligible if it uses simple linear control algorithms on aggregate controlled variables, as in all the case studies in this thesis. On the other hand, the overhead introduced by the Monitor and the Actuator may need more consideration. The Monitor introduces overhead for collecting the performance information needed to compute the controlled variables. For example, collecting the aggregate CPU utilization (as in FC-U and FC-UM in Section 4.4.4) may be more efficient than keeping track of the CPU utilization of each individual task in the system. The Actuator introduces overhead for changing the manipulated variables. For example, if preemptive scheduling is adopted in the web server (Section 5.4.1), the Actuator may terminate established TCP connections and dispatch new TCP connections to server processes. In contrast, non-preemptive scheduling makes the Actuator much more efficient because it only needs to change the process budget of each service class at each sampling instant. If no efficient Monitor or Actuator mechanism is available, the control designer may be forced to increase the sampling period to reduce the relative overhead, at the cost of slower response to run-time variations.
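To illustrate how lightweight the non-preemptive Actuator can be, consider the minimal sketch below (hypothetical names; not the actual Apache modification). The Actuator merely rewrites one integer per service class; the new budgets take effect as server processes finish their current connections, so no established connection is terminated.

/* Non-preemptive Actuator: apply the per-class process budgets
 * computed by the Controller at a sampling instant. */
void apply_budgets(int num_classes, int budget[], const int new_budget[])
{
    for (int c = 0; c < num_classes; c++)
        budget[c] = new_budget[c];   /* O(1) work per class */
}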
7.3. Robustness of Linear Models and PI Control
In all three applications, we approximate a non-linear and time-varying computer system with a linear and time-invariant model based on analysis or system identification experiments. We then use classical linear control theory to design P (Proportional) or PI (Proportional-Integral) Controllers based on the linear model. Our evaluation experiments have shown that FCS algorithms developed using this linear control approach demonstrate exceptional robustness despite the approximations in the linear models. In all of our evaluation experiments in the three applications, all the FCS algorithms provide performance guarantees (including stability, small steady-state error, and satisfactory transient response) in the face of considerable variations in system workload and non-linearities in the actual system. Two open questions remain regarding our linear models and control approach. What conditions may cause the linear models to deviate significantly from the actual non-linear computer systems? And what conditions may cause classical Controllers such as PI controllers to fail, and therefore necessitate more sophisticated control algorithms such as gain-scheduling and adaptive controllers [12]? These two questions are different because PI Controllers may achieve satisfactory performance even when the models are inaccurate. Answers to these questions may lead to even more robust feedback control resource scheduling solutions in unpredictable environments.
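For reference, the incremental (velocity) form of a discrete-time PI controller, a generic textbook form [33] rather than a listing from our implementations, can be sketched as follows; the gains kp and ki are placeholders to be tuned (e.g., with the Root Locus method) against a specific system model.

/* Incremental discrete-time PI controller:
 * u(k) = u(k-1) + kp*(e(k) - e(k-1)) + ki*e(k)               */
typedef struct {
    double kp, ki;     /* proportional and integral gains      */
    double u_prev;     /* previous control input u(k-1)        */
    double e_prev;     /* previous control error e(k-1)        */
} pi_controller;

double pi_update(pi_controller *c, double setpoint, double sample)
{
    double e = setpoint - sample;                    /* e(k)   */
    double u = c->u_prev + c->kp * (e - c->e_prev)   /* P term */
                         + c->ki * e;                /* I term */
    c->u_prev = u;
    c->e_prev = e;
    return u;
}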








Chapter 8
Conclusions and Future Work
In this thesis we establish Feedback Control real-time Scheduling (FCS) as a unified framework for adaptive real-time systems based on feedback control theory. The FCS framework supports fundamental resource scheduling solutions that provide robust performance guarantees for real-time systems operating in unpredictable environments. Such systems include open systems on the Internet, such as online trading and e-business servers, and data-driven systems such as smart spaces and agile manufacturing. In contrast to ad hoc approaches that rely on laborious design/tuning/testing iterations, our framework enables system designers to systematically design adaptive real-time systems with established analytical methods to achieve robust performance guarantees in unpredictable environments.
We first introduce the major components and methodologies of the FCS framework in general terms. The FCS framework includes a general feedback control scheduling architecture that maps the feedback control structure to adaptive resource scheduling, a set of performance specifications and metrics to characterize the transient and steady-state performance of adaptive real-time systems, and a control theory based design
methodology for resource scheduling algorithms to satisfy their performance
specifications.
We then present our first application of the FCS framework, real-time CPU
scheduling. We develop an adaptive CPU scheduling architecture and a set of scheduling
algorithms that provide performance guarantees in terms of deadline miss ratio and CPU
utilization in CPU-bound real-time systems in the face of unpredictable task arrivals and
execution time variations. These scheduling algorithms are analytically designed and
tuned with feedback control theory based on a novel model of generic CPU-bound real-
time systems. Our simulation experiments demonstrate that our scheduling algorithms
can guarantee stability, desired miss ratio and CPU utilization in steady state, and
satisfactory transient performance in response to severe overloads and considerable
workload variations.
Our second application of the FCS framework is an adaptive architecture that
provides relative and absolute service delay guarantees for different service classes on
web servers under HTTP 1.1. This architecture is based on feedback control loops that
enforce delay guarantees for classes via dynamic connection scheduling and process
reallocation. We develop a system identification tool that enables system designers to
establish mathematical models for computer systems with unknown dynamics based on
experimental data. Based on a web server model established with our system
identification tool, we use a control theory method called the Root Locus method to
design feedback controllers to satisfy performance specifications. The adaptive
architecture has been implemented by modifying an Apache web server. Experimental
results demonstrate that our adaptive server provides robust delay guarantees even when the user populations of different classes vary significantly. Properties of our adaptive web server also include guaranteed stability and satisfactory efficiency and accuracy in achieving the desired delay or delay differentiation.
We also extend our FCS framework to a non-real-time application: on-line data
migration in networked storage servers. We developed a data migration executor called
Aqueduct that dynamically regulates data migration speed while guaranteeing specified
I/O throughput of concurrent applications. We implemented an Aqueduct prototype on a
storage server testbed. Our performance evaluation experiments demonstrate that
Aqueduct completes data migrations while successfully maintaining the throughput
specifications.
The successful application of our approach in three significantly different applications gives us confidence in the applicability of our FCS framework to a wide range of real-time and non-real-time systems.
The presented work on FCS suggests many interesting directions for future research. This thesis is mostly concerned with managing a single resource to achieve a single dimension of performance guarantees. One direction for future research is feedback control scheduling of multiple resources, for systems where bottleneck resources can change at run time. For example, e-business servers supporting dynamic web content such as database transactions and video/audio streaming may need feedback control scheduling of the CPU, network, memory, and storage in order to handle different run-time conditions. Multiple concurrent feedback control loops and adaptive mode-switching mechanisms need to be developed for such applications.
Another direction for future research is to extend the single-node solutions in this thesis to distributed systems such as server farms and smart spaces composed of networked embedded systems. Future research is necessary to develop scalable and decentralized control architectures and models that coordinate networked controllers to achieve aggregate performance guarantees.
From a control theory perspective, it is interesting to investigate the application of robust and adaptive control theory to computer systems with non-linearities and variations that cannot be handled by the classical linear control schemes used in this thesis. Research in these areas may lead to another leap in the robustness and flexibility of real-time systems in extremely unpredictable environments.
The current applications of the FCS framework are implemented as separate application-level software or as simulators. It would be interesting to implement the FCS architecture as part of an OS kernel or middleware that provides a general set of service APIs offering performance guarantees to applications.

References
[1] 3Com Corporation, Gigabit Ethernet Comes of Age, Technology white paper, June
1996.
[2] ANSI, Fibre Channel Arbitrated Loop, Standard X3.272-1996, April 1996.
[3] T. F. Abdelzaher, An Automated Profiling Subsystem for QoS-Aware Services, IEEE
Real-Time Technology and Applications Symposium, Washington D.C., June 2000.
[4] T. F. Abdelzaher, E. M. Atkins, and K. G. Shin, QoS Negotiation in Real-Time Systems
and its Application to Automatic Flight Control, IEEE Real-Time Technology and
Applications Symposium, June 1997.
[5] T. F. Abdelzaher and N. Bhatti, Web Server QoS Management by Adaptive Content
Delivery, International Workshop on Quality of Service, 1999.
[6] T. F. Abdelzaher and C. Lu, Modeling and Performance Control of Internet Servers,
39th IEEE Conference on Decision and Control, Sydney, Australia, December 2000.
[7] T. F. Abdelzaher and C. Lu, Schedulability Analysis and Utilization Bounds for Highly
Scalable Real-Time Services, IEEE Real-Time Technology and Applications
Symposium, Taipei, Taiwan, June 2001.
[8] T. F. Abdelzaher and K. G. Shin, "End-host Architecture for QoS-Adaptive
Communication," IEEE Real-Time Technology and Applications Symposium, Denver,
Colorado, June 1998.
[9] T. F. Abdelzaher and K. G. Shin, QoS Provisioning with qContracts in Web and
Multimedia Servers, IEEE Real-Time Systems Symposium, Phoenix, Arizona, December
1999, pp. 44-53.
[10] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, Providing Differentiated Levels of
Service in Web Content Hosting, First Workshop on Internet Server Performance,
Madison, WI, June 1998.
[11] Apache Software Foundation, http://www.apache.org.
[12] K. J. Astrom and B. Wittenmark, Adaptive control (2nd Ed.), Addison-Wesley, 1995.
[13] C. Aurrecoechea, A. Campbell, and L. Hauw, A Survey of QoS Architectures, 4th IFIP
International Conference on Quality of Service, Paris, France, March 1996.
[14] P. Barford and M. E. Crovella, Generating Representative Web Workloads for Network
and Server Performance Evaluation, ACM SIGMETRICS '98, Madison WI, 1998.
[15] G. Banga, P. Druschel, and J. C. Mogul, Resource Containers: A New Facility for
Resource Management in Server Systems, Operating Systems Design and
Implementation (OSDI '99), 1999.
[16] G. Beccari, et al., Rate Modulation of Soft Real-Time Tasks in Autonomous Robot
Control Systems, EuroMicro Conference on Real-Time Systems, York, UK, June 1999.
[17] N. Bhatti and R. Friedrich, Web Server Support for Tiered Services. IEEE Network,
13(5), Sept.-Oct. 1999.
[18] P. R. Blevins and C. V. Ramamoorthy, Aspects of a Dynamically Adaptive Operating
System, IEEE Transactions on Computers, Vol. 25, No. 7, pp. 713-725, July 1976.
[19] E. Borowsky, R. Golding, A. Merchant, L. Schreier, E. Shriver, M. Spasojevic, and J.
Wilkes, Using Attribute-Managed Storage to Achieve QoS, 5th Intl. Workshop on
Quality of Service, New York, June 1997.
[20] A. Bouch, N. Bhatti, and A. J. Kuchinsky, Quality is in the Eye of the Beholder:
Meeting Users' Requirements for Internet Quality of Service, ACM CHI 2000, The Hague,
Netherlands, April 2000.
[21] S. Brandt and G. Nutt, A Dynamic Quality of Service Middleware Agent for Mediating
Application Resource Usage, IEEE Real-Time Systems Symposium, December 1998.
[22] G. Buttazzo, G. Lipari, and L. Abeni, "Elastic Task Model for Adaptive Rate Control,"
IEEE Real-Time Systems Symposium, Madrid, Spain, pp. 286-295, December 1998.
[23] M. Caccamo, G. Buttazzo, and L. Sha, Capacity Sharing for Overrun Control, IEEE
Real-Time Systems Symposium, Orlando, FL, December 2000.
[24] R. Carr, Virtual Memory Management, Ann Arbor, MI: UMI Research Press, 1984.
[25] S. Cen, "A Software Feedback Toolkit and its Application In Adaptive Multimedia
Systems," Ph.D. Thesis, Oregon Graduate Institute, October 1997.
[26] M. E. Crovella and A. Bestavros, Self-Similarity in World Wide Web Traffic: Evidence
and Possible Causes, IEEE/ACM Transactions on Networking, 5(6):835--846, Dec 1997.
[27] C. Dovrolis, D. Stiliadis, and P. Ramanathan, Proportional Differentiated Services:
Delay Differentiation and Packet Scheduling, SIGCOMM '99, Cambridge,
Massachusetts, August 1999.
[28] P. Druschel and G. Banga, Lazy Receiver Processing (LRP): A Network Subsystem
Architecture for Server Systems, Operating Systems Design and Implementation
(OSDI'96), Seattle, WA, October 1996.
[29] L. Eggert and J. Heidemann, Application-Level Differentiated Services for Web
Servers, World Wide Web Journal, Vol 2, No 3, March 1999, pp. 133-142.
[30] J. Eker, "Flexible Embedded Control Systems: Design and Implementation," Ph.D. thesis,
Lund Institute of Technology, Dec 1999.
[31] E-Soft Inc., Web Server Survey, http://www.securityspace.com.
[32] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee,
Hypertext Transfer Protocol -- HTTP/1.1, IETF RFC 2616, June 1999.
[33] G. F. Franklin, J. D. Powell and M. L. Workman, Digital Control of Dynamic Systems
(3rd Ed.), Addison-Wesley, 1998.
[34] G. F. Franklin, J. D. Powell and A. Emami-Naeini, Feedback Control of Dynamic
Systems (3rd Ed.), Addison-Wesley, 1994.
[35] S. Gribble, G. Manku, E. Roselli and E. Brewer, Self-similarity in File Systems,
SIGMETRICS '98, April 1998.
[36] C. V. Hollot, V. Misra, D. Towsley, and W. Gong, A Control Theoretic Analysis of
RED, IEEE INFOCOM, Anchorage, Alaska, April 2001.
[37] HP OpenView Homepage, http://www.openview.hp.com/.
[38] N. I. Kamenoff and N. H. Weiderman, Hartstone Distributed Benchmark: Requirements
and Definitions, IEEE Real-Time Systems Symposium, 1991.
[39] Mathworks Inc., http://www.mathworks.com/products/matlab.
[40] R.P. Kar and K. Porter, "Rhealstone -- a Real Time Benchmarking Proposal," Dr. Dobbs'
Journal, 14(2), February 1989.
[41] D. L. Kiskis and K. G. Shin, SWSL: A Synthetic Workload Specification Language for
Real-Time Systems, IEEE Transactions on Software Engineering, 20(10), October
1994.
[42] D. L. Kiskis and K. G. Shin, A Synthetic Workload for a Distributed Real-Time
System, Journal of Real-Time Systems, 11(1), July 1996.
[43] M. Klein, T. Ralya, B. Pollak, R. Obenza, M. G. Harbour, A Practitioner's Handbook for
Real-Time Analysis: Guide to Rate Monotonic Analysis for Real-Time Systems, Kluwer
Academic Publishers, August 1993.
[44] C. Lee, J. Lehoczky, D. Siewiorek, R. Rajkumar, and J. Hansen, A Scalable Solution to
the Multi-Resource QoS Problem, IEEE Real-Time Systems Symposium, Phoenix, AZ,
Dec 1999.
[45] J. P. Lehoczky, L. Sha and Y. Ding, The Rate Monotonic Scheduling Algorithm: Exact
Characterization and Average Case Behavior, IEEE Real-Time Systems Symposium,
1989.
[46] B. Li and K. Nahrstedt, A Control-based Middleware Framework for Quality of Service
Adaptations, IEEE Journal of Selected Areas in Communication, Special Issue on
Service Enabling Platforms, 17(9), Sept. 1999.
[47] J. Liebeherr and N. Christin, Buffer Management and Scheduling for Enhanced
Differentiated Services, University of Virginia Tech. Report CS-2000-24, August 2000.
[48] C. L. Liu and J. W. Layland, Scheduling Algorithms for Multiprogramming in a Hard
Real-Time Environment, Journal of ACM, Vol. 20, No. 1, pp. 46-61, 1973.
[49] J. W. S. Liu, et al., Algorithms for Scheduling Imprecise Computations, IEEE
Computer, Vol. 24, No. 5, May 1991.
[50] C. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son, A Feedback Control
Architecture and Design Methodology for Service Delay Guarantees in Web Servers,
University of Virginia, Technical Report CS-2001-05, submitted to IEEE Transactions on
Computers, Special Issue on QoS Issues in Internet Web Services, January 2001.
[51] C. Lu, J. A. Stankovic, T. F. Abdelzaher, G. Tao, S. H. Son and M. Marley,
Performance Specifications and Metrics for Adaptive Real-Time Systems, IEEE Real-
Time Systems Symposium, Orlando, FL, Dec 2000.
[52] C. Lu, J. A. Stankovic, G. Tao and S. H. Son, Design and Evaluation of a Feedback
Control EDF Scheduling Algorithm, IEEE Real-Time Systems Symposium, Phoenix, AZ,
Dec 1999.
[53] C. Lu, J. A. Stankovic, G. Tao, and S. H. Son, Feedback Control Real-Time Scheduling:
Framework, Modeling, and Algorithms, University of Virginia, Technical Report CS-
2001-06, submitted to Real-Time Systems Journal, Special Issue on Control-Theoretical
Approaches to Real-Time Computing, January 2001.
[54] Y. Lu, A. Saxena, and T. F. Abdelzaher, Differentiated Caching Services: A Control-
Theoretical Approach, International Conference on Distributed Computing Systems,
Phoenix, AZ, April 2001.
[55] J. M. Maciejowski, Multivariable Feedback Design, Addison-Wesley, 1989.
[56] T. Madell, Disk and File Management Tasks on HP-UX, Prentice Hall, 1997.
[57] P. Mejia-Alvarez, R. Melhem, and D. Mosse, An Incremental Approach to Scheduling
during Overloads in Real-Time Systems, IEEE Real-Time Systems Symposium, Orlando,
FL, Dec 1999.
[58] V. Pai, P. Druschel and W. Zwaenepoel, Flash: An Efficient and Portable Web Server,
USENIX Annual Technical Conference, Monterey, CA, June 1999.
[59] L. Palopoli, L. Abeni, F. Conticelli, M. D. Natale, and G. Buttazzo, Real-Time Control
System Analysis: An Integrated Approach, IEEE Real-Time Systems Symposium,
Orlando, FL, Dec 2000.
[60] G. Papadopoulos, Moore's Law Ain't Good Enough, keynote speech at Hot Chips,
August 1998.
[61] S. K. Park and K. W. Miller, Random Number Generators: Good Ones Are Hard to
Find, Communications of the ACM, vol. 31, no. 10, Oct. 1988, pp. 1192-1201.
[62] R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek, Practical Solutions for QoS-based
Resource Allocation Problems, IEEE Real-Time Systems Symposium, December 1998.
[63] D. Rosu, K. Schwan, and S. Yalamanchili, FARA: a Framework for Adaptive Resource
Allocation in Complex Real-Time Systems, IEEE Real-Time Technology and
Applications Symposium, June 1998.
[64] D. Rosu, K. Schwan, S. Yalamanchili and R. Jha, "On Adaptive Resource Allocation for
Complex Real-Time Applications," IEEE Real-Time Systems Symposium, Dec 1997.
[65] M. Ryu and S. Hong, Toward Automatic Synthesis of Schedulable Real-Time
Controllers, Integrated Computer-Aided Engineering, 5(3) 261-277, 1998.
[66] D. Seto, J. P. Lehoczky, L. Sha, and K. G. Shin, On Task Schedulability in Real-Time
Control Systems, IEEE Real-Time Systems Symposium, December 1996.
[67] S. S. Skiena, The Algorithm Design Manual, Telos/Springer-Verlag, New
York, November 1997.
[68] S. H. Son, R. Zimmerman, and J. Hansson, " An Adaptable Security Manager for Real-
Time Transactions," Euromicro Conference on Real-Time Systems, Stockholm, Sweden,
June 2000.
[69] D. C. Steere, et al., "A Feedback-driven Proportion Allocator for Real-Rate Scheduling,"
Symposium on Operating Systems Design and Implementation, Feb 1999.
[70] J. A. Stankovic, C. Lu, S. H. Son, and G. Tao, "The Case for Feedback Control Real-
Time Scheduling," EuroMicro Conference on Real-Time Systems, York, UK, June 1999.
[71] J. A. Stankovic and K. Ramamritham (Eds), Hard Real-Time Systems, IEEE Press,
1988.
[72] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo, Deadline Scheduling
for Real-Time Systems: EDF and Related Algorithms, Kluwer Academic Publishers,
1998.
[73] K. G. Shin and C. L. Meissner, Adaptation and Graceful Degradation of Control System
Performance by Task Reallocation and Period Adjustment, EuroMicro Conference on
Real-Time Systems, June 1999.
[74] D. C. Steere, et al., "A Feedback-driven Proportion Allocator for Real-Rate Scheduling,"
Symposium on Operating Systems Design and Implementation, Feb 1999.
[75] Veritas Software Corporation, Veritas Volume Manager,
http://www.veritas.com/us/products/volumemanager/.
[76] N. H. Weiderman and N. I. Kamenoff, Hartstone Uniprocessor Benchmark: Definitions
and Experiments for Real-Time Systems, Journal of Real-Time Systems, 4(4), December
1992.
[77] L. R. Welch and B. A. Shirazi, "A Dynamic Real-time Benchmark for Assessment of
QoS and Resource Management Technology," IEEE Real-time Technology and
Applications Symposium, June 1999.
[78] L. R. Welch, B. Shirazi and B. Ravindran, Adaptive Resource Management for
Scalable, Dependable Real-time Systems: Middleware Services and Applications to
Shipboard Computing Systems, IEEE Real-time Technology and Applications
Symposium, June 1998.
[79] W. Zhao, K. Ramamritham and J. A. Stankovic, Preemptive Scheduling Under Time
and Resource Constraints, IEEE Transactions on Computers 36(8), 1987.