
A Study on Fault Tolerance Methods in Cloud Computing

Amal Ganesh
Research Scholar, Dept. of Computer Science
B.S Abdur Rahman University, Chennai, India
amalganesh@gmail.com

Dr. M. Sandhya
Dept. of Computer Science
B.S Abdur Rahman University, Chennai, India
sandhya@bsauniv.ac.in

Dr. Sharmila Shankar
Dept. of Computer Science
B.S Abdur Rahman University, Chennai, India
sharmilasankar@bsauniv.ac.in

Abstract: The emergence of Cloud Computing has brought a new dimension to the world of information technology. Even though Cloud Computing provides various benefits such as agility, on-demand provisioning of resources, reduced cost and multi-tenancy, there are risks and flaws associated with it. One key research challenge in Cloud Computing is to ensure continuous reliability and guaranteed availability of the resources it provides, so there is a need for a robust Fault Tolerant (FT) system in Cloud Computing. To better understand FT in Cloud Computing, it is essential to understand the different types of faults. In this paper, we highlight the basic concepts of fault tolerance by examining the two FT policies, the Reactive FT policy and the Proactive FT policy, and the associated FT techniques used on different types of faults. We also study various fault tolerant methods, algorithms and frameworks which have been developed and implemented by researchers in this field. This is an area of active research, and these studies will guide us in building a robust FT technique for the Cloud.

Keywords—Cloud Computing, Fault Tolerance.

978-1-4799-2572-8/14/$31.00 © 2014 IEEE

I. INTRODUCTION

The advent of Cloud Computing is considered to be the single largest change in Information Technology. This change has affected everyone, from individuals to communities and large corporations. Cloud services are now so widely accepted that organizations are moving their traditional information processing systems to the Cloud to store large volumes of data.

A gold standard definition for Cloud Computing is provided by the National Institute of Standards and Technology (NIST), USA. According to NIST, “Cloud Computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

Fig. 1. Overview of Cloud Computing

Figure 1 provides an overview of Cloud Computing and the different services it provides. Based on the services provided, the Cloud is differentiated into Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). In SaaS, Cloud providers develop, host and operate domain-specific applications which end users access on a pay-as-you-go basis. In PaaS, the platform is provided as a service on which application developers can build applications without the burden of buying or managing large servers or the underlying development tools. In IaaS, infrastructure resources such as operating systems, storage, processors and networking components are offered as a service on which companies can deploy and run arbitrary software.

It is notable that Cloud Computing is not a new technology; instead, it brings together existing technologies such as virtualization and utility-based pricing to perform operations in a more efficient and cost-effective manner. The
benefits offered by Cloud Computing are so immense that it has brought a new dimension to the provision of services and resources to its users.

A. Key Characteristics of Cloud Computing

• Improves agility, as users can easily and inexpensively provision technological resources.
• Reduces cost by converting capital expenditure to operational expenditure. End users do not need to purchase and manage hardware, servers etc.
• Can be accessed from any location using a variety of devices with internet connectivity, such as smart phones, laptops and PCs.
• Multi-tenancy enables resources and costs to be shared among a large pool of users, allowing increased peak capacity and efficient usage of under-utilized resources.
• Provides scalability via on-demand provisioning of resources on a real time basis.

Even though Cloud Computing is a general trend in all industries, there are issues in the Cloud which need to be addressed. One such issue is to ensure continuous reliability and guaranteed availability of the services provided by Cloud Computing. Although the services offered by the Cloud go beyond the traditional approach, benefits are always accompanied by some risks and failures. For example, Amazon’s Elastic Compute Cloud (EC2) experienced a failure in Elastic Block Storage (EBS) drives and network configuration, bringing down thousands of hosted applications and websites for 24-72 hours [1].

Fig. 2. Different Layers in a Cloud

As shown in Figure 2, a single Cloud consists of different layers which can be affected by various types of faults, so these layers require different levels of fault tolerant techniques in order to provide seamless service. The failures that occur in Cloud Computing can be classified into two classes, namely [2]:

1) Data Failures: This type involves failures due to corruption of data, missing source data and other flaws in the data.

2) Computation Failures: This type involves all kinds of hardware or infrastructure failures, such as faulty or slow VMs, storage access exceptions, etc.

Most of the applications hosted by the Cloud are real-time High Performance Computing (HPC) systems which require a higher level of fault tolerance. According to studies conducted by Schroeder and Gibson [3], most of the faults in the Cloud occur due to hardware failures, mainly in processors, hard disk drives, integrated circuit sockets and memory. A large number of processors are provisioned in the Cloud. These processors create virtual instances, communication links, integrated circuit sockets etc. One way or another, these processors are prone to failure. It has been predicted that a system with 100,000 processors will experience a processor failure every few minutes [4]. Besides hardware faults, there are software faults, which cause application failure, and network faults due to server overload, network congestion etc., which inhibit communication between the Cloud and the end users. So there is a need for a mature fault tolerance method which manages faults in diverse aspects.

This paper is organized as follows. Section II describes the concepts of fault tolerance, Section III summarizes the related work and draws an analytical comparison among different FT models, and finally Section IV presents the conclusion.

II. FAULT TOLERANCE – AN OVERVIEW

In a conventional system, Fault Tolerance (FT) deals with the quick repair and replacement of faulty devices to keep the system running. In Cloud Computing, by contrast, fault tolerance is the ability of the Cloud to withstand the abrupt changes which occur due to hardware faults, software faults, network congestion etc.

In a Cloud network, fault management depends on two important parameters, namely Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO defines the amount of data that can acceptably be lost during a fault or disaster, whereas RTO defines the acceptable downtime for recovering from faults. A healthy Cloud will have minimum values for both RPO and RTO [5].

There are mainly two standard Fault Tolerant (FT) policies available for real-time applications hosted in the Cloud, namely the Proactive Fault Tolerance Policy and the Reactive Fault Tolerance Policy. Based on these policies, various techniques are used to provide fault tolerance.

A. Proactive Fault Tolerance

The principle of the proactive fault tolerance policy is to avoid failures by proactively taking preventative measures. These measures are chosen by studying the pre-fault indicators.
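
As a rough illustration of what studying a pre-fault indicator might look like in practice, the sketch below fits a straight line to recent readings of a single indicator and estimates how soon it would cross a failure threshold. It is only a minimal sketch and not the method of any work cited here: the indicator (for example a temperature or error-count reading), the threshold, the planning horizon and the names `predict_threshold_crossing` and `should_act` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Prediction:
    slope: float                          # indicator change per sampling interval
    steps_to_threshold: Optional[float]   # None when no crossing is predicted


def predict_threshold_crossing(samples: Sequence[float], threshold: float) -> Prediction:
    """Fit a least-squares line to equally spaced samples of a hypothetical
    pre-fault indicator, then extrapolate from the latest reading."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    latest = samples[-1]
    if latest >= threshold:
        return Prediction(slope, 0.0)      # already at or past the threshold
    if slope <= 0:
        return Prediction(slope, None)     # flat or improving: no crossing predicted
    return Prediction(slope, (threshold - latest) / slope)


def should_act(samples: Sequence[float], threshold: float, horizon: float) -> bool:
    """Return True when a crossing is predicted within `horizon` sampling intervals,
    i.e. when a preventative measure would be worth triggering."""
    prediction = predict_threshold_crossing(samples, threshold)
    return (prediction.steps_to_threshold is not None
            and prediction.steps_to_threshold <= horizon)
```

With readings of 60, 62, 64 and 66 and an assumed threshold of 70, the fitted slope is 2 per interval, so a crossing is predicted two intervals ahead; a real monitor would combine several indicators and a more careful model than a straight line.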

2014 IEEE International Advance Computing Conference (IACC) 845


Predicting the underlying faults is the first step. The second step is to apply proactive remedial measures at development time by changing the code or replacing the components which are prone to failure. Proactive fault tolerance makes sure that the job gets done completely without any reconfiguration. Preemptive Migration and Software Rejuvenation are two fault tolerant techniques based on the proactive fault tolerance policy [6].

1) Preemptive Migration: It makes use of a feedback-loop control system in which applications are constantly monitored and analyzed.

2) Software Rejuvenation: It is a technique in which periodic reboots are scheduled for the system. After each reboot, the system resumes with a clean state.

B. Reactive Fault Tolerance

The reactive fault tolerance policy deals with measures which are applied to reduce the effect of faults that have already occurred in the Cloud. Some of the fault tolerant techniques based on the reactive fault tolerance policy are Checkpointing/Restart, Replication and Task Resubmission.

1) Checkpointing/Restart: When a failure occurs, the application can be restarted from the checkpoint prior to the point of failure rather than from the starting point. It is an efficient fault tolerance technique for computation-intensive applications hosted in the Cloud.

2) Replication: Replication is a popular fault tolerance technique based on the reactive fault tolerance policy. Replication in Cloud Computing is the process of keeping multiple copies of data or objects [7][8]. In a replication technique, the client requests a copy from a set of replicas. Different replicas run on different resources until the task completes or crashes. Tools like HAProxy, Hadoop and Amazon EC2 can be used to provide replication in a Cloud. One challenge of replication is that it adds redundancy to the system. Other challenges are maintaining consistency among replicas, replica management, choosing the degree of replication etc. A replication protocol can be used to provide consistency between the replicas of the same object. A consistency issue arises when only one of the replicas is updated by a user. Also, as the number of replicas increases, the cost of maintaining consistency also increases.

3) Task Resubmission: When a fault is detected, the task is resubmitted either to the same or to a different resource at runtime without interrupting the workflow of the system [6].

Both proactive and reactive fault tolerance policies have advantages and disadvantages. Some experimental results clearly indicate that the migration technique (proactive FT policy) is more efficient than the checkpoint/restart technique (reactive FT policy). Even though proactive techniques are more efficient, they are used less often than reactive techniques. This is because a system using reactive techniques is less affected by incorrect predictions than one using proactive fault tolerance, and reactive methods are relatively simple to implement, as FT techniques are not applied at development time. On the other hand, reactive techniques may not be suitable for systems which require high availability of VMs or clusters, because once a failure occurs, availability decreases dramatically.

III. RELATED WORK

Alain Tchana et al [9] propose a fault tolerance method in which Cloud customers and providers collaboratively share responsibility for providing the required fault tolerance. According to Tchana, application faults can be detected and repaired at the customer level, but Virtual Machine (VM) and hardware faults can be detected and repaired at the Cloud provider level. The recovery or restoration of the applications running on the repaired VMs can be requested and performed at the customer level. A checkpointing technique is used to create restore points for the recovered VMs.

Wenbing Zhao et al [10] put forward a Low Latency Fault Tolerance (LLFT) middleware framework which replicates the processes of applications using the leader/follower replication approach. This framework is equipped with an LLFT Messaging Protocol, which ensures reliable communication between the replicated processes, and an LLFT Membership Protocol, which ensures that the entire replicated process group has a consistent view of its membership.

Ravi Jhawar et al point out a method that exploits the virtualisation layer to offer the required fault tolerance properties to applications as an on-demand service. This is done by adding a service layer which acts as a Fault Tolerance Middleware (FTM), providing the required fault tolerance support to its applications [1][11][12].

Sidiroglou et al [13] discuss ASSURE, an autonomous fault management system for the Cloud environment which introduces the rescue point technique. However, one potential problem is that a major rescue point may be called frequently during normal execution, creating higher overhead for server applications.

SHelp [14] is another autonomous FT system, proposed by Gang Chen et al, which uses checkpointing as the FT technique in a virtual environment.

Liying Wu et al [15] propose a Dynamic Data Fault-Tolerance Mechanism for Cloud Storage (DDFMCS). DDFMCS implements lightweight conversion of data using Hadoop, and the experimental results show that DDFMCS can save storage space and helps to improve the performance of data access. DDFMCS dynamically determines the various data for fault tolerant mechanisms such as the file access frequency
ratio stored in the file access frequency table, the number of file fault tolerance conversions and the time at which files are stored in the system.

K. Ganga et al [6] discuss the different fault tolerance techniques in Cloud Computing and concentrate on how task replication is done in Cloud Computing based on scientific workflows.

In another paper, the behaviour and performance of the hybrid Cloud are analyzed by Min Lu et al [16] using a Queuing Petri Net model. The model can also be used to formulate a fault tolerant strategy which recovers from virtual node failures occurring in the resource provisioning phase.

Naixue Xiong et al [17] describe a Self-tuning Failure Detection system (SFD) which detects faults in Cloud Computing. According to them, unlike other fault detecting systems, SFD can adjust its fault detection control parameters, which ensures better fault detection.

Jameela Al-Jaroodi et al [18] propose a delay-tolerant fault tolerance algorithm which adapts to failures by effectively reducing execution time, thus minimizing the fault discovery and recovery overhead in the Cloud. The algorithm is claimed to work efficiently in environments like the Cloud which handle distributed tasks. According to the authors, the algorithm ensures that data is downloaded reliably from replicated servers and that applications execute efficiently on multiple independent distributed servers in the Cloud.

Yilei Zhang et al [19] propose BFTCloud, a Byzantine Fault Tolerant framework for Cloud Computing. The replication technique is used to provide the basic fault tolerance. In addition, BFTCloud selects voluntary nodes based on QoS characteristics and reliability performance. According to the authors, their extensive experiments on various types of Cloud environments show that BFTCloud guarantees robustness of systems when up to f of a total of 3f + 1 resource providers are faulty, including crash faults, arbitrary behaviour faults, etc. But Giuliana Santos Veronese et al [20] claim that it is possible to reduce the number of replicas to 2f + 1 while preserving the same properties as traditional BFT algorithms. This is achieved by using a simple trusted service; the reduced number of replicas in turn reduces the cost of infrastructure in the Cloud. Peter Garraghan et al [21] also discuss harnessing the potential and feasibility of Byzantine fault tolerant systems by developing a framework called FT-FC and applying it to federated Cloud Computing.

Yuesheng Tan et al [22] suggest a better fault tolerant system compared to the traditional Byzantine fault tolerant algorithm. They developed a virtualization intrusion tolerance system by adopting a hybrid fault model. This FT model involves active and passive replicas, updating the state of the system, state transfer and proactive recovery. Their results show that the system tolerates f faulty replicas among N = 2f + 1 replicas, ensuring that only f + 1 active replicas execute during the intrusion-free stage while the remaining replicas are all put into passive mode. The traditional Byzantine fault-tolerant algorithm requires N = 3f + 1 replicas to tolerate f faulty replicas.

In [23], Altino M. Sampaio et al discuss two algorithms, namely MTTE and RTTE, which dynamically map Virtual Machines (VMs) to Physical Machines (PMs) subject to PM failure. Their objective is to reduce faults by increasing the power efficiency of the Cloud Computing infrastructure with low impact on the performance required by the users. For a particular job execution request, the Minimum Time Task Execution (MTTE) algorithm provides full access to resources, so that a task can access the maximum necessary resources for the minimum time necessary for task completion, whereas the Relative Time Task Execution (RTTE) algorithm uses the Xen credit scheduler, which strictly reserves the amount of resources required for task completion within its deadline.

JiSu Park et al [24] concentrate on mobile Cloud Computing and provide a monitoring technique for fault tolerance. In mobile Cloud Computing, mobile devices are used as resources, which are unstable as their state information changes dynamically. Based on a Markov Chain model, a monitoring technique is created to collect the state information. This state information is necessary to calculate the reliability of fault tolerance in mobile Cloud Computing. With this technique, it is possible to change the monitoring time interval dynamically.

Guisheng Fan et al [25] put forward a model-based Byzantine fault detection technique. In this technique, a Cloud Computing Fault Net (CFN) is created to model the different components of Cloud Computing, such as service resources, detection and the failure process. Petri nets are used to create the different components of Cloud Computing, which are integrated dynamically into the CFN model. Based on the CFN model, the properties of the components are analyzed, and a fault detection strategy is developed at each level which dynamically detects faults in the execution process.

Thanyalak Chalermarrewong et al [8] propose a fault management framework which emphasizes hardware fault tolerance. An ARMA model, combined with a fault tree and fault analysis technique, is employed as a proactive fault tolerance technique to predict system failures. Based on the prediction, the resource manager decides whether the machine requires task migration to prevent possible failures. In order to get accurate prediction results, the framework includes a model adequacy checking function which can be used to adjust the prediction model as required.

In another paper, Magdalena Slawinska et al [26] focus on heterogeneity in Cloud Computing with a system named Unibus. The paper discusses how to employ Unibus to orchestrate the resources
and provide a fault tolerance platform capable of executing messages using the Message Passing Interface (MPI). In order to support fault tolerance in Unibus, Distributed MultiThreaded Checkpointing (DMTCP) is used, which enables checkpointing at the end user level.

Sheheryar Malik et al [27] propose a fault tolerance model for real time Cloud Computing. In this model, faults are managed based on the reliability of the processing nodes or virtual machines. According to the authors, the reliability of a node changes in every computational cycle. The proposed fault tolerance model collects and analyses the performance or reliability metrics of a particular virtual machine. If a VM produces the correct result within the stipulated time, that node or VM is considered to be a worthy node and its reliability increases. There is a minimum reliability value above which a node is considered a worthy or fault tolerable VM. If a node fails to produce the correct result within the specified time, its reliability decreases and the system undergoes backward recovery or another safety measure.

And finally, Pranesh Das et al [28] propose a Virtualization and Fault Tolerance (VFT) technique which increases system availability and reduces service time. This reactive fault tolerant technique consists of a Cloud Manager (CM) module and a Decision Maker (DM), which are used to manage virtualization and load balancing and to handle faults. The first step involves virtualization and load balancing, and in the second step fault tolerance is achieved by redundancy, checkpointing and a fault handler. The virtualization step includes a fault handler. Not all faults are recoverable. The fault handler finds the unrecoverable faulty nodes and excludes these virtual nodes from future requests or usage. It also helps to remove temporary software faults from recoverable nodes, making them available for future requests.

Based on some metrics obtained from these FT models, an analytical comparison is made of some of the commonly used FT models. A number of parameters, or FT properties, are used to analytically evaluate these FT models. Table I illustrates the comparison among FT models based on these parameters. The different parameters are:

1. Type of FT technique: proactive or reactive.
2. Performance: the efficiency of the system.
3. Response Time: the amount of time required to respond to a particular procedure or algorithm. The value should be minimum.
4. Reliability: the ability to give accurate results within a real time environment.

TABLE I. COMPARATIVE ANALYSIS OF FT MODELS

                 FT PROPERTIES
SI NO  FT MODEL  PROACTIVE  REACTIVE  PERFORMANCE  RESPONSE TIME  RELIABILITY
                 (Y/N)      (Y/N)     (H/L/A)      (H/L/A)        (H/L/A)
1      LLFT      N          Y         H            A              H
2      FTM       N          Y         N            A              H
3      ASSURE    Y          N         H            A              A
4      SHelp     N          Y         A            H              L
5      SFD       Y          N         H            A              A
6      BFTCloud  N          Y         H            A              H
7      VFT       Y          Y         H            H              H

(Yes = Y, No = N, High = H, Low = L, Average = A)

IV. CONCLUSION

Over the past years, Cloud Computing has become a popular computational technology across all industries. The Cloud brings vast advantages, such as access to large amounts of data and resources, on-demand service provisioning and reduced cost of managing infrastructure, which set it apart from other technologies. As the famous quote says, with great power comes great responsibility: Cloud Computing, with its immense benefits, has to ensure continuous reliability and guaranteed availability of the services provided. So there is a need for an efficient fault tolerance method which shields the Cloud from faults or failures. In this paper, we concentrate on the standard fault tolerance concepts in Cloud Computing. Since Cloud Computing is a new field of research compared to other technologies, a lot of research is being carried out, especially on developing a standalone fault tolerance method. Numerous FT methods have been proposed by researchers in this field. Our ultimate aim is to analyze these FT methods, understand their limitations and develop an FT method which manages all types of faults in diverse aspects.

V. REFERENCES

[1] R. Jhawar, V. Piuri, and M. D. Santambrogio, "A comprehensive conceptual system-level approach to fault tolerance in cloud computing." In Systems Conference (SysCon), 2012 IEEE International, pp. 1-5. IEEE, Mar 2012.

[2] J. Li, M. Humphrey, Y.W. Cheah, Y. Ryu, D. Agarwal, K. Jackson, and C. van Ingen. "Fault tolerance and scaling in e-Science cloud applications: observations from the continuing development of MODISAzure." In e-Science (e-Science), 2010 IEEE 6th International Conference on, pp. 246-253. IEEE, 2010.

[3] B. Schroeder, and G. A. Gibson. "A large-scale study of failures in high-performance computing systems." Dependable and Secure Computing, IEEE Transactions on, vol. 7, no. 4, pp. 337-351, Oct 2010.

[4] I. P. Egwutuoha, S. Chen, D. Levy, and B. Selic. "A Fault Tolerance Framework for High Performance Computing in Cloud." In Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on, pp. 709-710, May 2012.

[5] J. Panneerselvam, Lu Liu, R. Hill, Yongzhao Zhan, and Weining Liu. "An Investigation of the Effect of Cloud Computing on Network Management." In High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on, pp. 1794-1799. IEEE, 2012.

[6] K. Ganga, and S. Karthik. "A fault tolerent approach in scientific workflow systems based on cloud computing." In Pattern Recognition, Informatics and Medical Engineering (PRIME), 2013 International Conference on, pp. 387-390. IEEE, 2013.

[7] I. Rodero, F. Guim, J. Corbalan. "Evaluation of coordinated Grid scheduling strategies." In High Performance Computing and Communications, 2009. HPCC'09. 11th IEEE International Conference on, pp. 1-10. IEEE, 2009.

[8] T. Chalermarrewong, T. Achalakul, and S. C. W. See. "The design of a fault management framework for cloud." In Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012 9th International Conference on, pp. 1-4. IEEE, 2012.

[9] A. Tchana, L. Broto, and D. Hagimont. "Approaches to cloud computing fault tolerance." In Computer, Information and Telecommunication Systems (CITS), 2012 International Conference on, pp. 1-6. IEEE, 2012.

[10] W. Zhao, P. M. Melliar-Smith, and L. E. Moser. "Fault tolerance middleware for cloud computing." In Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on, pp. 67-74. IEEE, 2010.

[11] R. Jhawar, V. Piuri, and M. D. Santambrogio. "Fault tolerance management in IaaS clouds." In Satellite Telecommunications (ESTEL), 2012 IEEE 1st AESS European Conference on, pp. 1-6. IEEE, 2012.

[12] R. Jhawar, V. Piuri, and M. D. Santambrogio. "Fault Tolerance Management in Cloud Computing: A System-Level Perspective." Systems Journal, IEEE, vol. 7, no. 2, pp. 288-297, June 2013.

[13] S. Sidiroglou, O. Laadan, C. Perez, N. Viennot, J. Nieh, and A. D. Keromytis. "Assure: automatic software self-healing using rescue points." In ACM Sigplan Notices, vol. 44, no. 3, pp. 37-48, 2009.

[14] G. Chen, H. Jin, D. Zou, B. B. Zhou, W. Qiang, and G. Hu. "SHelp: Automatic Self-healing for Multiple Application Instances in a Virtual Machine Environment." In Cluster Computing (CLUSTER), 2010 IEEE International Conference on, pp. 97-106. IEEE, 2010.

[15] L. Wu, B. Liu, and W. Lin. "A Dynamic Data Fault-Tolerance Mechanism for Cloud Storage." In Emerging Intelligent Data and Web Technologies (EIDWT), 2013 4th International Conference on, pp. 95-99. IEEE, 2013.

[16] M. Lu, and H. Yu. "A Fault Tolerant Strategy in Hybrid Cloud Based on QPN Performance Model." In Information Science and Applications (ICISA), 2013 International Conference on, pp. 1-7. IEEE, 2013.

[17] N. Xiong, A. V. Vasilakos, J. Wu, Y. R. Yang, A. Rindos, Y. Zhou, W. Z. Song, and Y. Pan. "A self-tuning failure detection scheme for cloud computing service." In Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pp. 668-679. IEEE, 2012.

[18] J. Al-Jaroodi, N. Mohamed, and K. Al Nuaimi. "An Efficient Fault-Tolerant Algorithm for Distributed Cloud Services." In NCCA, pp. 1-8. IEEE, 2012.

[19] Y. Zhang, Z. Zheng, and M. R. Lyu. "BFTCloud: A byzantine fault tolerance framework for voluntary-resource cloud computing." In Cloud Computing (CLOUD), 2011 IEEE International Conference on, pp. 444-451. IEEE, 2011.

[20] G. S. Veronese, M. Correia, A. N. Bessani, L. C. Lung, and P. Verissimo. "Efficient Byzantine Fault-Tolerance." Computers, IEEE Transactions on, vol. 62, no. 1, pp. 16-30, Jan. 2013.

[21] P. Garraghan, P. Townend, and J. Xu. "Byzantine fault-tolerance in federated cloud computing." In Service Oriented System Engineering (SOSE), 2011 IEEE 6th International Symposium on, pp. 280-285. IEEE, 2011.

[22] Y. Tan, D. Luo, and J. Wang. "Cc-vit: Virtualization intrusion tolerance based on cloud computing." In Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on, pp. 1-6. IEEE, 2010.

[23] A. M. Sampaio, and J. G. Barbosa. "Dynamic Power- and Failure-Aware Cloud Resources Allocation for Sets of Independent Tasks." In Cloud Engineering (IC2E), 2013 IEEE International Conference on, pp. 1-10. IEEE, 25-27 March 2013.

[24] J. Park, H. C. Yu, K. S. Chung, and E. Y. Lee. "Markov chain based monitoring service for fault tolerance in mobile cloud computing." In Advanced Information Networking and Applications (WAINA), 2011 IEEE Workshops of International Conference on, pp. 520-525. IEEE, 2011.

[25] G. Fan, H. Yu, L. Chen, and D. Liu. "Model Based Byzantine Fault Detection Technique for Cloud Computing." In Services Computing Conference (APSCC), 2012 IEEE Asia-Pacific, pp. 249-256. IEEE, 2012.

[26] M. Slawinska, J. Slawinski, and V. Sunderam. "Unibus: Aspects of heterogeneity and fault tolerance in cloud computing." In Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, pp. 1-10. IEEE, 2010.

[27] S. Malik, and F. Huet. "Adaptive Fault Tolerance in Real Time Cloud Computing." In Services (SERVICES), 2011 IEEE World Congress on, pp. 280-287. IEEE, 2011.

[28] P. Das, and P. M. Khilar. "VFT: A virtualization and fault tolerance approach for cloud computing." In Information & Communication Technologies (ICT), 2013 IEEE Conference on, pp. 473-478. IEEE, 2013.
