MASARYKOVA UNIVERZITA
FAKULTA INFORMATIKY
DIPLOMA THESIS
Martina Kollarova
Acknowledgement
I hereby give my thanks to my advisor, Marek Grac, for help with the organization of this work, my colleagues Attila Darazs, Peter Belanyi, Tal Kammer and Fabio Di Nitto for the technical advice, and Red Hat for the resources that allowed me to create this.
Abstract
Keywords
Contents
1 Introduction
2 OpenStack and Fault Tolerance
  2.1 Terminology
  2.2 Overview of the OpenStack Ecosystem
  2.3 About Fault Tolerance
    2.3.1 Sources of Failures
  2.4 Highly-available OpenStack
    2.4.1 OpenStack Swift
3 Problem Analysis
  3.1 About Fault-injection Testing
    3.1.1 OpenStack and Fault-injection Methods
  3.2 Related Tools
    3.2.1 Tempest
    3.2.2 Chaos Monkey
    3.2.3 Gigan
    3.2.4 ORCHESTRA
    3.2.5 ComFIRM
    3.2.6 Tools for Simulating Disk Failures
4 Test Design
  4.1 OpenStack Swift Tests
  4.2 High-availability Tests
    4.2.1 VM Creation and Scheduling
  4.3 Other Test Ideas
5 Framework Design and Implementation
  5.1 Support of Multiple Topologies
  5.2 Virtualization
  5.3 State Restoration
    5.3.1 Manual
    5.3.2 Snapshots
  5.4 System Deployment Script
  5.5 The Framework's Capabilities and Drawbacks
    5.5.1 Unimplemented features
  5.6 Implementation of Fault-injection Tests
    5.6.1 Results
6 Conclusion
Appendix A Attachments
Appendix B User tutorial
  B.1 Requirements
  B.2 Running the tested system in VirtualBox
  B.3 Running the tested system inside OpenStack VMs
Bibliography
Index
Chapter 1
Introduction
Chapter 2
OpenStack and Fault Tolerance
1. https://openstack.org/
2.1 Terminology
These are the most common terms used throughout this work. Many of them were defined in the paper by Laprie [10], which sets the terminology for fault-tolerant computing.

downtime - when a user cannot get his work done, the system is considered down [11] (also called an outage)
When a user wants to create a new virtual machine through the Nova API, the request is first sent to Keystone to authenticate him. Afterwards, the Nova scheduler finds a server with a hypervisor3 that has free resources where it could run the VM. A request is sent to Neutron, which connects it to the networks the user selected. Cinder creates block storage (a disk) for it, and Glance finds the image that was chosen by the user, let's say a Fedora Linux installation image (which could be stored directly on the file system, in Swift, or in some other storage system), and the VM is booted from it. Most of the communication between the servers is done through a messaging service that implements the Advanced Message Queuing Protocol (AMQP), and the state of the virtual machines is kept in an SQL database.
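The user-facing side of this workflow can be illustrated with the python-novaclient library; the following is only a rough sketch of the era's client API, and the endpoint, credentials, image and flavor names are placeholder values, not part of any deployment described here.

from novaclient.v1_1 import client

# Hedged sketch: booting a VM through the Nova API. All values are examples.
nova = client.Client("admin", "123456", "demo",
                     "http://controller:5000/v2.0/")
image = nova.images.find(name="Fedora")        # image registered in Glance
flavor = nova.flavors.find(name="m1.small")
server = nova.servers.create(name="test-vm", image=image, flavor=flavor)
print(server.status)  # scheduling, networking and storage happen behind the API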
Other types of failures that can be encountered are disk failures or network connectivity issues (e.g., an overloaded networking switch), and failures caused by the environment (flood, fire). A network partition happens when a failure in the network causes the system to be split into parts that are unable to communicate with each other.
Mean time to failure, or MTTF, is a commonly used measurement [6] of availability, defined as

    MTTF = uptime / number of failures.
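As a small worked example (with made-up numbers, purely to illustrate the formula):

# Illustration of the MTTF formula with invented numbers.
uptime_hours = 6 * 30 * 24        # a node observed for roughly six months
number_of_failures = 2            # failures seen during that period
mttf = uptime_hours / float(number_of_failures)
print(mttf)                       # 2160.0 hours between failures on average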
A study by Google [6] characterized the availability properties of a cloud
storage system and found that the MTTF of a node is 4.3 months and less
than 10% of events (failures) last longer than 15 minutes. They also noted
that a large number of failures are correlated, for example because of power outages or rolling upgrades of the system (a scheduled gradual upgrade of the system). Another study [9] found that the MTTF of a disk was 10-50 years, but the annual failure rate (the probability that a component fails during one year) of storage systems was between 2% and 4%, or even more with certain kinds of disks. This is because disk failures are not always the dominant factor that causes issues: disks contribute to 20-55% of storage subsystem failures, while physical interconnects (cables, networks, power outages) contribute 27-68%.
In the book by Marcus and Stern [11], statistics show that unplanned downtime is caused by system software bugs in 27% of the cases, by hardware in 23%, human error 18%, network failures 17%, and by natural disasters in 8% of the collected cases.
The drawback of an active/passive setup is that it takes time to detect the failure and replace the service with the backup. To make the replacement faster, the service can already be kept running on the backup node, which is called hot standby. In an active/active setup, there is also a backup server for a service, but both of them are used concurrently. The difficulty with active/active is that the state of all the redundant services has to be kept in sync [11, 15]. A load balancer (in OpenStack, HAProxy is commonly used) manages the traffic to these systems, ensuring that the operational systems handle the requests. The replacement of a failed node (also called failover) is provided in OpenStack by Pacemaker [16].
Figure 2.2 shows how Red Hat plans to make OpenStack highly available. Some of the services are shown as single nodes, either because they don't need to be highly available (e.g. Foreman, which is a deployment and administration tool), or because they are managed by the components themselves: Swift has HA capabilities of its own (see Section 2.4.1), Compute (Nova) nodes are taken care of by the Nova scheduler, and the availability of tenant instances is the responsibility of the end-users. MongoDB and MariaDB (a fork of MySQL) represent the databases.
Mirantis describes [12] an alternative HA topology using similar concepts and tools.
4. Source: https://github.com/fabbione/rhos-ha-deploy
5. https://aws.amazon.com/s3/
6. Horizontal scaling means that more nodes are added to the system, in contrast to vertical scaling, which adds more power to the existing nodes.
Chapter 3
Problem Analysis
The goal is to make sure that the OpenStack system can handle failures. Even though it may seem to be highly available, the first real fault can prove that belief wrong. Since the failures that could trigger the fault-tolerance mechanisms are infrequent, we need to simulate them to verify the system.
Imagine a person, let us call him Mallet,1 testing the system's fault tolerance. Mallet installs a part of the OpenStack components to be highly available, since installing all of them in an HA arrangement is an endeavour expensive in both time and resources. He starts randomly restarting some services and doing other damage to the system, always checking whether it still works from the normal user's point of view. He could also try running part of the test suite (see Section 3.2.1) and checking the system messages for errors.
Doing so would take him anywhere between hours and days, and once he finds an unexpected error, he might not be sure what caused it: was it just the last injected fault, or the previous one (which he didn't notice at the time), or an unlucky combination of them? Unsure of what exactly happened, he installs another system with the same topology and tries to repeat the latest actions.
If Mallet is lucky, he will reproduce the error and report a bug about it. He will provide the developers with a reproducer: a list of commands or a script that re-creates the problem, and information on how to set the system up the way he did. But even in this optimistic case, it could take the developer hours of her time to reproduce the problem, and then even more when the bug fix has to be verified.
However, if he is unlucky, he won't be able to easily re-create the problem. He will try repeating some of the other failures he caused before, trying them out in a different order, looking into the system messages for hints as to what happened. Yet if he encountered a race condition that only appears infrequently, he is out of luck: the error could prove forever elusive, or will only occur once a year on some customer's setup, causing a big outage.
1. Mallet is a name used in cryptography for the person who is a malicious attacker, in contrast to Eve, who is usually just a passive eavesdropper.
To make testing easier, we need some tool that can do all of Mallet's actions automatically. This tool has to provide:

repeatability - it needs to be possible to automatically repeat the conditions under which the error occurred
3.1 About Fault-injection Testing

The term fault injection covers a range of testing techniques, all the way from white-box testing (on the level of source code) to black-box testing (without peering into the internal workings); from damaging the pins on a chip to giving the program random input. Most of the existing techniques fall into five main categories [20], depending on what they are based upon:

hardware - accomplished at the physical level, for example by heavy-ion radiation or by modifying the values on the pins of the circuit; also called HWIFI (Hardware-Implemented Fault Injection)
Software-based fault-injection testing is the method best suited for OpenStack. Hardware-based testing is meant for circuits, there is no formal model of OpenStack on which to do simulation-based testing, and the system is probably too complex to efficiently create even an abstract formal model of it. Emulation-based testing is meant for VHDL model testing, which is inapplicable here.

The high availability of OpenStack has to be tested with higher-level faults than usual. Examples of such faults are a server shutdown, a network partition or a disk error. The tests could be named high-availability tests, but this is not a commonly used term. Protocol fault injection could be used to test the communication, but it would be focused on the messaging service. This work combines features of black-box testing (it doesn't know how things are implemented) and white-box testing: it partially looks into internal structures (e.g. it checks the replica count by directly accessing the Swift objects in the tests described in Section 4.1). This type of testing is called gray-box testing.
Compile-time fault injection of OpenStack would be possible, but it is not the focus of this work: we are interested in the fault tolerance and high availability of the whole system, not in the small parts and components on which this testing method focuses. This is because it shouldn't matter to the system that a single service crashed (see Section 2.3). Nonetheless, it can become a problem if a service doesn't crash, but starts sending incorrect information to neighboring components. This kind of fault would best be simulated by protocol fault injection, because the services all communicate through the REST API or AMQP.
3.2 Related Tools

There are existing tools that test OpenStack and there are frameworks for fault-injection testing, but until now they didn't intersect. This section provides an overview of SWIFI tools that somehow relate to this work, and of existing OpenStack tests. We combine a few concepts from these and could potentially use some of them as external libraries.

Most of the existing fault-injection tools are specific to some proprietary technology. For example, ORCHESTRA (see Section 3.2.4) was originally developed for the Mach operating system and later ported to Solaris. Nevertheless, we can use the concepts from these tools, and this section compares them to the design of this work.
3.2.1 Tempest
The main testing framework of OpenStack is called Tempest2 and is an open-source project with more than 2000 tests. Its design principles3 require that the tests only access the public interfaces: no direct queries to the database or remote commands to the servers are allowed, thus it is only meant for black-box testing. It also strives to be topology independent, therefore the tests don't know whether OpenStack is installed on a single node or on hundreds of nodes; neither is it possible to find out how many servers with a specific service there are, or to gain access to them.

These design principles make it impossible to create fault-injection tests with Tempest, because to inject failures you need to have access to the machines (e.g., to simulate a disk failure, restart a service, etc.) and to know where certain services are installed. Without control of the servers, it isn't possible to restore the state of the system after an injected fault. Thus, this framework is unsuitable for the type of testing we need.

However, Tempest could be used as an external tool to verify that a test was successful and the system is still in working order. After each fault injection, a relevant subset of Tempest could be run to see if the tested components respond correctly to API calls. Only a relevant part should be run, because the whole test suite is big and therefore slow (it takes more than 30 minutes to run all the tests).
2. https://github.com/openstack/tempest
3. http://docs.openstack.org/developer/tempest/overview.html#design-principles
4. https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey
3.2.3 Gigan
3.2.4 ORCHESTRA
3.2.5 ComFIRM
The ComFIRM tool [4] is similar to ORCHESTRA, but inserts code directly into the Linux kernel, into the message exchange subsystem. It also supports message omission and timing faults, but is architecture independent and able to run on versions 2.4 and 2.6 of Linux. It would therefore be a good choice as an external tool for our framework in the future, should it become desirable to create protocol fault-injection tests of the messaging service or the REST API.
3.2.6 Tools for Simulating Disk Failures

These tools will be necessary to simulate disk errors for the Swift tests (see Section 4.1).
This will overwrite the beginning of the disk with random bytes, thus damaging the partition table and file system. A similar approach can also simulate a full disk, but we could also use /dev/full, which is a special device that always returns the error "No space left on device".
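A minimal sketch of this kind of disk damage is given below, wrapped in Python since the framework drives the servers with commands anyway; the device name and the number of megabytes are arbitrary example values, and the command is destructive, so it should only ever target the extra disks of a disposable test VM.

# Overwrite the beginning of a disk with random bytes, destroying the
# partition table and file system metadata (destructive - test VMs only).
import subprocess

def damage_disk(device="/dev/vdb", megabytes=10):
    subprocess.check_call(["dd", "if=/dev/urandom", "of=" + device,
                           "bs=1M", "count=%d" % megabytes])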
5. https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
6. http://blog.wpkg.org/2007/11/08/using-fault-injection/
Chapter 4
Test Design
The design of the framework was preceded by the design of the tests, to create an estimate of what tools and functions it should provide. Not all of the tests were implemented, mostly because of the lack of a good OpenStack deployment tool that would be able to install the required system topologies. You can find more information on the implementation and results of the tests in Section 5.6.
4.1 OpenStack Swift Tests

The test design process started with OpenStack Swift, the object storage service, because reliability is its main focus and it has high-availability capabilities of its own (see Section 2.4.1). Therefore, parts of it can be tested without a complicated HA setup, which is useful especially because the deployment tools are in development and often have problems deploying highly-available setups.
As described in Section 2.4.1, replicas are copies of objects that are kept for data redundancy. Usually there are three replicas of each object, and this is what the following tests assume. When a disk fails, the system recognizes it and creates another replica in a different location so that there are three copies of each object again; this process is called replica regeneration here.

In all the tests, there is a set time limit for the replica regeneration and the test should fail if it takes any longer than that. At the beginning of each test, Swift should already be populated with some random objects and have all the replicas correctly distributed and consistent. Since it would be difficult to reverse-engineer a file that would get saved by Swift into the location we are testing and observing, there should be enough files uploaded so that each disk has at least one. The same is true when a test adds additional data to Swift. Ideally, a tool should be created to provide statistics on how the files are distributed, and if some disk doesn't have any data, more would be uploaded until it does.
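Populating Swift with random objects could look roughly like the following sketch, which uses the python-swiftclient library; the endpoint, credentials, container name and object count are example values only.

# Hedged sketch: upload a number of small random objects so that every disk
# is likely to hold at least one replica. All values are examples.
import os
from swiftclient import client as swift_client

conn = swift_client.Connection(authurl="http://192.168.33.11:5000/v2.0/",
                               user="admin", key="123456",
                               tenant_name="admin", auth_version="2")
conn.put_container("destroystack-test")
for i in range(100):
    conn.put_object("destroystack-test", "object-%03d" % i,
                    contents=os.urandom(1024))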
Figure 4.1: Minimum OpenStack topology required for the replication tests (a node with the Swift proxy and Keystone, and the Swift data servers).
1 upload random files to Swift
2 damage disk on data_server[0]
3 start_time = get_time()
4 while (get_swift_replica_count() < 3):
5     current_time = get_time()
6     if current_time - start_time > timeout:
7         fail test
Listing 4.1 demonstrates a basic replication test. The term "damage disk" on line 2 can be implemented by force-unmounting some disk or by using the dd command as described in Section 3.2.6. The test could be repeated in the same form, but with some tool that creates the disk errors at random. In that case, the timeout for the replica regeneration should be increased, since it could take Swift a while to recognize those errors. The same could be done for the rest of the tests in this section.
Whether the test succeeded has to be verified by checking that each object has the appropriate number of replicas and that the contents of the objects are correct. Since Swift doesn't provide any API to check this (and it should not be trusted even if it existed), it is necessary to look into the Swift object ring to find where the objects are, and download each of them directly from the data server. The administration tool swift-get-nodes1 should be used for this, since it takes the hash of the object and the ring file as input and produces direct links to the files as output. In the example test in Listing 4.1, this is done by the get_swift_replica_count function on line 4, and the test fails if it doesn't recover into three replicas for each object.
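A runnable skeleton of the polling loop from Listing 4.1 might look as follows; get_swift_replica_count is the helper assumed by the pseudocode and is left unimplemented here, since it wraps swift-get-nodes and direct downloads from the data servers.

import time

TIMEOUT = 360  # seconds; corresponds to the "timeout" value in the configuration

def wait_for_replica_regeneration(expected_replicas=3):
    """Fail the test if the replicas do not regenerate within the timeout."""
    start_time = time.time()
    while get_swift_replica_count() < expected_replicas:
        if time.time() - start_time > TIMEOUT:
            raise AssertionError("replicas did not regenerate in time")
        time.sleep(5)  # poll periodically instead of busy-waiting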
 1 upload random files to Swift
 2 damage disk on data_server[0]
 3 damage disk on data_server[1]
 4 start_time = get_time()
 5 # wait until at least a part of the replicas is recovered
 6 while (get_swift_replica_count() < 2):
 7     current_time = get_time()
 8     if current_time - start_time > timeout:
 9         fail test
10
11 damage disk on data_server[0]
12 while (get_swift_replica_count() < 3):
13     current_time = get_time()
14     if current_time - start_time > timeout:
15         fail test
The test in Listing 4.2 uses the same concepts as the previous one. First it injects failures into two disks on different data servers, since Swift places replicas as far away from each other as it can and we want to be sure we damage at least two replicas of an object. After this, the replicas should recover from their third copy onto handoff nodes, but we only have to wait until there are two of each file. This should make it feasible to recover even if a third disk gets damaged, which is done on line 11. In this case, it can be done on either of the two data servers, but if there are more of them, the damage should be made on another node. This is again because Swift places copies of the data as far away from each other as possible: if different zones are not an option, it will try using another server. If that is not feasible, it will at least place them on different disks. After three disks are gone, there should exist an object that is only on the handoff nodes, which is the special case we wanted to test here.
The test could continue damaging the disks as described in Listing 4.2
until there is only a single disk left in the system, which would hold all
the data, assuming it has capacity for it. The user request for a file might
not work anymore, because the proxy server tries only a set number of
nodes before it gives up and declares the object missing; and writing new
data would stop working, because that requires the majority of writes to be
successful (in this case with replica count set to three, at least two replicas
would have to be created, which is no longer possible). However, the data
would still be there and could be recovered [13].
Most of the other tests are similar to Listings 4.1 and 4.2 and all of them use a timeout for the replica regeneration, for example:

expected failure
1. damage three disks at the same time (or more, if the replica count is higher)
2. check that the replicas didn't regenerate even after some time period
3. fail if the replicas regenerated (this tests whether the tests themselves are correct)
Swift will regenerate the replicas only if a disk failed, not if the entire node is down [13, p. 133], therefore we don't need to test for replica regeneration if a whole server fails. However, new objects should be written onto the handoff nodes, so we should check whether there is the correct number of replicas of the new objects.
Another kind of damage to the system happens when a disk fills up. This should be simulated manually and not through Swift, because Swift tries to distribute the files evenly. If a disk is full, it is handled similarly to a damaged disk: handoff nodes get used to store the data instead.

A similar test should be done when the node isn't completely full, but has some small amount of space left, and a file larger than that is uploaded. Another edge case would be to have three or more disks already filled up and try to write new data.
Swift behaves similarly with zones as with servers, i.e. if the system has only one zone, it behaves as if each server were a separate zone. When we define zones (groups of servers, usually with independent power supplies), Swift will try to put the replicas into different zones, so that the loss of one group of servers doesn't cause a loss of data availability. Therefore, all the tests should be repeated with actions like "select disk in first data server" replaced by "select disk in any server in first zone".
The tests could also be repeated while a Swift rebalance is in progress: when a new disk or group of disks is added and the data are being evenly redistributed in the data center. However, the expected behaviour in this situation doesn't seem to be specified and it would first have to be studied, but as a minimum it should not affect the users' file uploads and downloads.
To test network partitions, two Swift regions could be created and the connection between them damaged (imagine cutting the cable between a data center in Prague and another one in Brno). Swift uses eventual consistency
and the latest change should be the one with the priority. This should be
tested by uploading new files into both data centers while the connection is
cut. A user in the first region and another user in the second region would
both write into the same files in a shared project. Afterwards the connection
would be restored and the tests would wait until the replication is done,
then check whether the correct (latest) changes are written into the shared
files.
An observational study could be made of what happens when Swift gets
slowly filled up with data. To study this efficiently, the disks should have a
small capacity. This might be made into a test case, but the test result would
be difficult to measure, as we expect it to fail at some point.
4.2 High-availability Tests

To test a stateless service, all that is required is to damage it and check if the system still works. The basic outline of such tests is as follows:
1. inject a fault into the service (e.g., stop or restart it on one of the nodes),
2. check that the system as a whole still works.
The second step could be done with some selected API calls to the service, or by running a relevant part of Tempest. It should be done immediately if the service is set up as active/active. Ideally, some API call would be repeated with high frequency in parallel with the fault injection and monitored for failures. If the component is set up as active/passive or in hot standby, the service should be checked after a timeout. In the case of the Nova services, the command nova-manage should be used to check whether the damage is shown in the output. Depending on how many nodes there are in the system, the test can be repeated multiple times, until there is only one node with the service left.
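Checking the nova-manage output could be sketched roughly as follows; the parsing is only illustrative (the historical nova-manage service list output marked unavailable services with "XXX"), and run_on_controller stands for whatever remote-command helper the framework provides.

# Hedged sketch: verify that the injected damage shows up in the output of
# `nova-manage service list`.
def assert_service_reported_down(service="nova-compute"):
    output = run_on_controller("nova-manage service list")  # assumed helper
    down_lines = [line for line in output.splitlines()
                  if service in line and "XXX" in line]
    assert down_lines, "%s is not reported as down" % service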
The most sensitive service could be the messaging queue, which is stateful, and all the services have to reconnect to it after a failure. A full messaging queue failure should be performed, when all the nodes running it are restarted, after which the whole OpenStack system should be checked.
The tests should be independent of the specific load balancer and cluster resource manager software if possible. Pacemaker itself (or any other clustering software used instead of it) has to be tested, since it manages all the resources and its failure could cause a cascading failure.
4.2.1 VM Creation and Scheduling

Figure 4.2: Basic topology for the VM creation and scheduling tests (a Controller node with the Nova scheduler, API, database, storage, etc., and the Compute nodes).
Figure 4.2 shows how the basic topology for the virtual machine creation and scheduling tests should look, but it can be extended to more nodes. The Compute nodes contain the nova-compute service, the hypervisor and the necessary networking services; the Controller has everything else, for example the Nova scheduler, Keystone, the database, storage, etc. A load balancer or Pacemaker isn't necessary.
The Nova scheduler selects a Compute server (one that contains the hypervisor) that is reporting that it is functioning properly and has enough resources. It creates the requested VM on it and contacts the other services for resources (networking, storage). Afterwards, Nova checks whether the VM was created correctly; if not, it throws it away and tries again on another node (by default, it tries this three times) [14]. If it fails even after that, the VM is set to an error state. By default, the scheduler doesn't remember which nodes failed before, but there is a filter scheduler which assigns the nodes weights based on their availability and other factors.
The basic outline of most of the tests is to damage the Compute node selected by the scheduler during VM creation and to verify that the VM still gets created correctly on another node.
The steps of the tests could be executed again with the next selected node being damaged, as many times as the scheduler is set to retry. A test case should be made that does the damage more times than the set number of retries and expects that the VM will report an error state, after which the VM should be removed to test whether it is responsive and doesn't get stuck.
Instead of damaging the compute service, we can inject a fault into a resource it requires, for example the networking or storage service, but only for the short duration of the first attempt at VM creation, after which the damage should be restored.
The tests could be extended to the filter scheduler, where repeated damage and failed VM creation would be done on one node, and it would be monitored whether the scheduler stops placing new VMs onto this node.
Additionally, a study could be made of the behaviour of the system when the memory on all nodes starts running out, but since there is no defined behaviour for this, it should not be made into a test case.
4.3 Other Test Ideas

Communication
In related works [3, 4], the ORCHESTRA and ComFIRM tools manipulate the communication between the nodes of a distributed system by modifying, dropping or delaying messages (see Section 3.2). Most of these faults don't need to be tested at the OpenStack level and would rather belong to tests of the underlying messaging service.
Configuration
All of the services could be tested by damaging the configuration files and restarting the service, as a kind of fuzz testing. This would simulate human error, which is a main contributor to unscheduled downtime of systems [11]. When the damage is expected to completely fail the service, the restart should fail and the system should handle it the same way as in the high-availability tests described in Section 4.2 (assuming the system is set up as HA). However, a small damage to the configuration that doesn't fail the service restart should not break the other services. The behaviour depends on the service under test and might not always be specified, therefore it requires further study.
Chapter 5
Framework Design and Implementation
The tool is designed to emulate the actions that would be taken by a person testing the system by trying to damage services, as described in Chapter 3. The tests are deterministic (though it would be possible to create random tests too) and are essentially implemented as remote commands to the servers on which the OpenStack system under test is installed. The created framework, called DestroyStack, keeps complete control of all the nodes. To provide state restoration, repeatability and isolation, the framework uses virtualization to create snapshots of the system. The Gigan tool [8] used a similar approach, but DestroyStack doesn't use the virtualization to inject low-level failures into the hardware, thus it isn't directly dependent on it and can run on physical hardware too, if state restoration isn't necessary. It is flexible enough to support multiple topologies and doesn't require that the system be reinstalled after the injected failures damage the system too much, and is thus resource efficient.
The language of choice is Python, because the whole OpenStack ecosystem uses it and there are Python libraries available for each component, whereas another language would force us to work on the level of the REST API and make development slower. To control the servers by remote commands, the Python Paramiko library1 is used to communicate through the Secure Shell (SSH) protocol. It doesn't provide the convenient commands that the deployment tools Fabric2 or Ansible3 have, but sadly those are unsuitable to be used as libraries from Python, since they expect to have full control of the program execution, and it would be difficult to integrate them with the rest of the tools.
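Running a remote command with Paramiko is straightforward; the following sketch uses example credentials and a harmless command, whereas the framework wraps this in its own server-management helpers.

# Minimal example of executing a command on a test server over SSH.
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("192.168.33.22", username="root", password="123456")
stdin, stdout, stderr = ssh.exec_command("df -h")
print(stdout.read())
ssh.close()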
The framework uses JSON (JavaScript Object Notation) for configuration, which is an open standard format that is human-readable and easy to parse.
1. http://www.paramiko.org/
2. Fabric is a simple imperative (i.e. a sequence of commands) deployment tool written in Python, available from: http://fabric.readthedocs.org
3. Ansible is also an imperative deployment tool in Python, but provides more high-level functionality than Fabric, see https://github.com/ansible/ansible
{
    "timeout": 360,
    "servers": [
        {
            "ip": "192.168.33.11",
            "roles": ["swift_proxy", "keystone"]
        },
        {
            "ip": "192.168.33.22",
            "extra_disks": ["vdb", "vdc", "vdd"],
            "roles": ["swift_data"]
        },
        {
            "ip": "192.168.33.33",
            "extra_disks": ["vdb", "vdc", "vdd"],
            "roles": ["swift_data"]
        }
    ],
    "keystone": {
        "user": "admin",
        "password": "123456"
    },
    "management": {
        "type": "manual"
    }
}
The server with the roles swift_proxy and keystone represents the Swift proxy server in the diagram, while the other two are the Swift data servers. The key extra_disks points to the disk devices in the /dev/ directory on the server. They will be used for Swift data, and most of the injected failures in the Swift tests will be performed on them. The keystone section contains the authentication for the OpenStack clients, and the management part is related to state restoration, which is explained more closely in Section 5.3. A JSON schema for the configuration file is provided in the source code, both as documentation and as a validation tool.
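Validation against the provided schema can be done, for example, with the jsonschema library; the file paths below follow the repository layout from Appendix A.

# Validate a configuration file against etc/schema.json.
import json
import jsonschema

with open("etc/schema.json") as schema_file:
    schema = json.load(schema_file)
with open("etc/config.json") as config_file:
    config = json.load(config_file)

jsonschema.validate(config, schema)  # raises ValidationError on a bad config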
A user gets an example file like Listing 5.1 and usually just needs to update the addresses of the servers to match his topology. He can add any number of servers and assign them the roles that match his installation.
if len(manager.get_all(role='swift_data')) < 2 \
        or manager.get(roles=['keystone', 'swift_proxy']) is None:
    raise SkipTest
This allows for flexibility in both the system deployment tools and test grouping: it is possible to create a topology which satisfies only the requirements of a certain subset of the tests that interest us, and theoretically a topology that satisfies all of them. The latter might not always be possible, since some tests could require a maximum number of a certain kind of server role, while another requires a minimum higher than that. In those special cases, it may be possible to work around it by shutting down the extra services, but some requirements may be contradictory and the framework has to take this into consideration.
The roles are also used by the installation script to deploy the system
and decide what to install where based on these data. For more information
on these tools, see Section 5.4.
5.2 Virtualization
Since the nature of the tests doesn't require high performance, using virtual machines for the tested OpenStack system is possible. It allows fast and flexible deployment, since it doesn't require that the person running the tool search for extra hardware, and it can provide virtual networks, thus allowing us to create various system topologies without manual reconfiguration. Virtualization is a commonly used tool for testing with fault injection, for example in the Gigan test framework [8].
A problematic behavior may only manifest when a rare ordering of events occurs. To discover such cases, we need to be able to run the tests often and ideally without human intervention. If we used physical machines, a failure from which the system couldn't recover would stop the test run, and somebody would have to re-install the system. Using VMs allows us to restore the state of the system under test (see Section 5.3) to the state before the failure, and makes it possible to isolate and repeat the failures. However, using virtualization is not enforced and the tests don't use it directly. In Gigan [8], the tests inject failures into CPU registers (causing bit flips) using virtualization that gave them direct access to the hardware, but in the tests we have designed for OpenStack, such tight coupling isn't necessary. This allows us to use different virtualization managers and even physical hardware, if state restoration isn't necessary (though it could be implemented even there, see Section 5.3.2, under LVM).
5.3 State Restoration

OpenStack should be able to survive the individual tests, but not necessarily a combination of them. The injected faults tend to damage the system in a way that is difficult to recover from. Some of them would even require that the whole system be reinstalled, which is currently a slow process. The goal of the framework was to be resource efficient, to make the full run of tests faster and require less work from the user. State restoration makes this feasible, along with providing repeatability and isolation of the tests, and it even makes it possible to create negative test cases: tests where the system is expected to fail, which check whether the tests themselves are correct.
5.3.1 Manual
5.3.2 Snapshots
Meta-OpenStack
Since the future users of DestroyStack are testers and developers of OpenStack, it is likely they have access to a large and stable OpenStack cloud where they can create VMs. Thus they would create a number of virtual machines in a meta-OpenStack, inside of which they would install the system that they want to test.
For the tool to use this kind of state restoration, it has to be given the credentials of the managing OpenStack system, as shown in Listing 5.3, which is a part of a configuration file like the one in Listing 5.1. In this case, DestroyStack is required to run from a separate server and not on the tested system, since it cannot restore the state of the instance on which it is itself running.
"management": {
    "type": "metaopenstack",
    "auth_url": "http://myopenstack.com:5000/v2.0/",
    "user": "myuser",
    "tenant": "mytenant",
    "password": "1234"
}
Listing 5.3: Part of the configuration file when DestroyStack uses meta-OpenStack snapshots to restore state.
Vagrant
For users who don't have a meta-OpenStack available, the tool Vagrant4 has been chosen, since it can easily create and manage VirtualBox5 and libvirt6 virtual machines (although the latter support is still in development7). It would have been possible to support both of them with native commands, but Vagrant allows us to use a simple command to snapshot all the VMs and provides a unified interface.
4. http://www.vagrantup.com/
5. https://www.virtualbox.org/
6. http://libvirt.org/
7. https://github.com/pradels/vagrant-libvirt
LVM
The most general solution for state restoration might be LVM8 (Linux Vol-
ume Manager) snapshots, since they could be used on physical machines.
However, the system images available at the time of writing didnt use LVM
and creating a general method of restoring the full contents of a systems
disk, including the root partition, is not trivial. Due to these problems, snap-
shotting with LVM has been left as an option and possible feature in the
future, if there is demand for it.
8. https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)
9. https://github.com/stackforge/packstack
10. https://github.com/redhat-openstack/khaleesi
5.5 The Framework's Capabilities and Drawbacks

Theoretically, the framework can be used for most kinds of tests. There are no artificial restrictions: you can run any command on any of the system's servers, and the state restoration is a tool you can use, but don't have to if it is not necessary. However, if a test doesn't require direct access to the servers or the state restoration, it is recommended that it be put into the main test suite, Tempest (see Section 3.2.1).
DestroyStack was not designed for performance or scalability tests, since state restoration usually requires that the nodes are virtual machines. If the performance tests don't need state restoration, or if the alternative methods get developed (LVM snapshots, some manual method), it could be possible to use it this way.
The framework is mainly designed to provide tools for OpenStack tests, but it isn't tightly coupled with OpenStack and the tools could be used for tests of other systems. Especially the state restoration functionality is helpful for general fault-injection testing. If necessary, DestroyStack could be split into two projects: the tools and the tests. In that case, the set of tools could be packaged and imported into other projects.
Because of the state restoration, the framework requires that the nodes are virtual machines in one of the supported virtualization managers and that the user can snapshot them. However, not everybody has access to these kinds of resources. It is still possible to use the best-effort manual restoration or to disable the restoration completely, but then it might not be possible to run all the tests successfully (see Section 5.3).
The tests could be used on a system before it is deemed stable and production-ready. In this case, the state restoration would be disabled and the administrator would run a single test that would verify whether his setup is truly highly available and reliable. Preferably, the Tempest test suite would be run before and after this destructive test. However, while Tempest can be used even on a production system, because it shouldn't cause any damage to it, using DestroyStack would be dangerous and might cause downtime. Therefore, the fault-injection tests should not be used as a setup verification mechanism on production systems.
5.5.1 Unimplemented features

This section contains an overview of ideas for new functionality and of elements that have been delegated to other tools.
Physical Hosts
Logging
Collection of the system logs is left to the script that will be running the
tests, which will most likely be Khaleesi in the future. Khaleesi already sup-
ports this. DestroyStack only collects the logs from the framework and tests
themselves. There are already multiple logging monitoring tool that can
match the events from the system logs and DestroyStack logs and display
the evens in a human readable format. The OpenStack Ceilometer service
provides this functionality, but its installation and usage is currently left to
the user and collecting these data will be also left to Khaleesi in the future.
5.6 Implementation of Fault-injection Tests

The tests use the Python nosetests11 framework. It was chosen because at the time the project was created, the main set of OpenStack tests, Tempest, was using it too. It is usually used for unit testing (testing of small units of code), but it has features like module and package-level test setups that make it an acceptable tool for other kinds of tests. It is able to collect the output from the tests and report the results in XML format.
The basic outline of how the tests are implemented is in the template file shown in Listing 5.4, meant to be used as a starting point for creating new tests. The ServerManager object keeps track of all the servers and provides functions to filter them by role, and to save and restore the state. The requirements function on line 5 specifies under what conditions the tests should run, as described in Section 5.1. A snapshot of the system is taken on line 19, in the setupClass method, which is run only once per group of tests. If the snapshots already exist, they won't be created again, so it is possible to create multiple groups of tests and the operation won't be repeated. On the other hand, a group of tests can specify a tag and thereby create its own set of snapshots. After the state of the system is saved, the setUp method is executed and a file is created on one of the servers. The test on line 28 only checks whether the file exists. After this, the state of the system is restored in the tearDown method and the file is gone.
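A condensed, hedged reconstruction of that template is sketched below; the class and method names follow the description above, while the exact ServerManager API (module path, configuration argument, save_state/restore_state and the cmd helper) is assumed for illustration only.

import unittest
from destroystack.tools.server_manager import ServerManager  # assumed module path

class TestExample(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # runs once per group of tests: snapshot the whole tested system
        cls.manager = ServerManager("etc/config.json")
        cls.manager.save_state(tag="example")

    def setUp(self):
        # runs before each test: create a file on one of the servers
        self.server = self.manager.get(role="swift_proxy")
        self.server.cmd("touch /tmp/destroystack_example")

    def test_file_exists(self):
        self.server.cmd("test -f /tmp/destroystack_example")

    def tearDown(self):
        # restore the snapshot; the created file is gone afterwards
        self.manager.restore_state(tag="example")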
11. https://nose.readthedocs.org
5.6.1 Results
Some of the Swift tests from Section 4.1 were implemented as a demonstration of how the framework is to be used. The nosetests tool finds all the tests in the directory and prints the results on the command line, while collecting the log output and standard output from the tests, as shown in Listing 5.5. The results are also written into an XML file that can be used by other tools.

TestSwiftSmallSetup
    test_disk_replacement #1 FAIL
    test_one_disk_down #2 OK
    test_one_disk_down_restore #3 OK
    test_two_disks_down #4 OK
    test_two_disks_down_third_later #5 FAIL

So far, DestroyStack has only found issues related to the OpenStack deployment tool.12 However, the test "two disks down, third later" (also described in Listing 4.2) is failing in approximately 50% of the test runs, because only two copies of the objects are found instead of three. It could be an issue with Swift, but so far the failure hasn't been traced down and reported. The disk replacement test failure is probably just an issue with the test implementation, since it was successful before the tools were recently redesigned.

12. Bug #1072070 - Packstack fails if more than one Swift disk is specified
Bug #1072099 - Cannot specify Swift disk, only loopback device
Bug #1020480 - swift-init exits with 0 even if the service fails to start
It takes approximately two minutes to execute a Swift test successfully, because the replica regeneration takes some time. In the case of an unsuccessful test, the duration depends on the timeout set in the configuration. The time needed for state restoration depends on the speed of the system running the virtual machines. It usually takes between one and five minutes to take the snapshots of all the virtual machines, which is done in parallel, so it shouldn't get worse when bigger topologies are added, though it might cause a big increase in I/O operations on the underlying system. Restoring the VMs back to the state before the failure takes another 1-2 minutes, but is also done in parallel. A full test run, not including the system deployment, therefore currently takes approximately 20 minutes.
Chapter 6
Conclusion
1. https://github.com/mkollaro/destroystack
Appendix A
Attachments
destroystack/test_*
    Fault-injection tests; each file contains a group of tests that have similar requirements about the tested system.
destroystack/test_template.py.sample
    Template for tests; new users can use it as a starting point to understand the tests.
destroystack/tools/
    Source code of the framework, containing server management tools and state restoration mechanisms.
etc/
    Configuration file samples.
etc/schema.json
    JSON schema specifying the configuration format and options.
README.md
    Project description and usage tutorial, similar to the contents of Appendix B.
TEST_PLAN.md
    Simple description of the tests with ASCII drawings of the required topologies, similar to the contents of Chapter 4, but less detailed.
Appendix B
User tutorial
B.1 Requirements
You will either need access to some VMs running in an OpenStack cloud, or VirtualBox locally (a script for setting up the VirtualBox VMs is already provided). Using VMs is necessary because the machines are being snapshotted between the tests to provide test isolation and to recover from faults that damaged the system. Support for Amazon AWS and libvirt VMs might be added in the future. If you need bare metal, you can add support for LVM snapshotting, or you can use the manual best-effort recovery.

The tests don't tend to be computationally intensive. For now, you should be fine if you can spare 2GB of memory for the VMs in total. Certain topologies need extra disks for Swift, but their size isn't important - 1GB per disk is enough.
So far, it has been tested only with RHEL and Fedora Linux, plus the OpenStack versions RDO Havana and RHOS 4.0 (Red Hat OpenStack), installed by Packstack1. The tests themselves don't really care what is deployed or how. The tests use the nosetests framework and the OpenStack clients, both of which will be installed as dependencies if you install this repository with python-pip.
1. https://github.com/stackforge/packstack
B.2 Running the tested system in VirtualBox

You can try the tests out with Vagrant and VirtualBox (libvirt may be added later). While easier to use, it isn't fast: creating the virtual machines will take a few minutes, installing OpenStack on them takes another 15 minutes, and the tests themselves take a while to run.
1. install the latest version of Vagrant2 and VirtualBox3
2. install Vagrant plugin for creating snapshots
$ cd destroystack/
$ vagrant up
$ cp etc/config.json.vagrant.sample etc/config.json
$ python bin/packstack_deploy.py
9. run tests
$ nosetests
2. http://www.vagrantup.com/downloads.html
3. https://www.virtualbox.org/wiki/Downloads
4. https://github.com/redhat-openstack/khaleesi
Bibliography
[3] Dawson, S., Jahanian, F., Mitton, T., and Tung, T.-L. Testing of fault-tolerant and real-time distributed systems via protocol fault injection. In Fault Tolerant Computing, 1996. Proceedings of Annual Symposium on (1996), IEEE, pp. 404-414.
[5] Fifield, T., Fleming, D., Gentle, A., Hochstein, L., Proulx, J., Toews, E., and Topjian, J. OpenStack Operations Guide. O'Reilly Media, May 2014. Available at http://docs.openstack.org/ops/.
[6] Ford, D., Labelle, F., Popovici, F., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., and Quinlan, S. Availability in globally distributed storage systems. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (2010).
[9] Jiang, W., Hu, C., Zhou, Y., and Kanevsky, A. Are disks the dominant contributor for storage failures?: A comprehensive study of storage subsystem failure characteristics. Trans. Storage 4, 3 (Nov. 2008), 7:1-7:25.
[11] Marcus, E., and Stern, H. Blueprints for High Availability: Designing Resilient Distributed Systems. John Wiley & Sons, Inc., 2003.
Index