

Cloud Forensics
Irina Mihai, Catalin Leordeanu, Alecsandru Patrascu
Automatic Control and Computers Faculty
POLITEHNICA University of Bucharest
Email: oana.irina.mihai@gmail.com, catalin.leordeanu@pub.ro, alecsandru.patrascu@gmail.com

Abstract
In recent years the Cloud environment has become a major attraction not only for developers, but also for
ordinary users. Not having to worry about where to keep your data or where to run your applications most definitely
describes an ideal context in today's digital world. If we add scalability, flexibility and low costs to that, we
could just say Eureka and enjoy using the Cloud. One thing that should not be forgotten is that such massive and
popular services have always been prime targets for cybercriminals, who plan attacks not necessarily for money or
fame, but sometimes out of pure curiosity. This is where Cloud Forensics comes in, providing the necessary means
for tracking these attacks and offering security solutions for the still-evolving Cloud.
The scope of this paper is to give a general view of the Cloud - when, where and why it appeared - and to present
some of its existing implementations, together with the challenges in conducting a Cloud Forensics investigation.
Several approaches to Cloud Forensics are presented, as well as an idea for future research in this area.

I. INTRODUCTION
It is a very real phenomenon that people are not aware of what the software they use is doing behind its nice
interface. Sometimes people do not know if and how a piece of software is using the Internet, let alone the Cloud. Another
phenomenon just as real is that other people are very much aware that others have no idea they are using the
Cloud. With the continuous growth of this new type of environment, who could really keep track of everything?
The Cloud impresses not only with its novelty status, but mostly with what it wants and has to offer: flexibility,
redundancy, fast data transfers, a practically unlimited pool of resources, and everything on demand and not at a
high price. A complete definition for this newly emerged paradigm has not yet been found, but so far it has
been widely agreed that the Cloud comes with three types of services - Infrastructure as a Service, Platform
as a Service, Software as a Service - and four deployment models - Private Cloud, Public Cloud, Hybrid Cloud,
Community Cloud - some of which will be presented in the present paper.
There are many companies and institutions that have chosen to implement a Cloud solution. Amazon, Microsoft
and Google offer public Cloud solutions, some of them free, others paid. There are also projects for developing
private Clouds, each with its own structural and logical architecture. Section 3 presents some of these solutions
and even how much they cost. A short comparison of the current prices with those from a few years ago shows
that more and more people, and even more so enterprises, wish to use the Cloud.
This popularity has its advantages and disadvantages. While the advantages have been mentioned earlier, one of
the main issues is that such a popular service draws the attention of attackers. The bigger and more complex a system is,
the higher the probability of finding a breach, exploiting it and getting away with it. This is why Cloud Forensics
is a must: to develop the tools for detecting malicious behavior and to suggest improvements for avoiding
data loss or tampering and confidentiality violations.
All the nice things that the Cloud provides may become a burden in a forensics investigation. We do not just have
to deal with technical aspects, but also with legal and organizational ones. It would not be of much use to have
all the necessary tools for data collection and analysis but not gain access to the physical machine due to legal
constraints. With Cloud sites in so many different locations, with different laws about security and confidentiality,
it should become a must for forensics aspects to be specified between the Cloud provider and the customer prior
to any actual resource rental or usage. There is also a need for standardization in the forensics world, but all in due
time. These aspects are presented in section 4, leaving section 5 for a short description of a future framework for
Cloud Forensics.

II. CLOUD SYSTEMS



NIST (National Institute of Standards and Technology) probably gave the most popular definition of the Cloud [23]:
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly
provisioned and released with minimal management effort or service provider interaction."
Therefore, cloud computing is the model that should provide not only fast access to shared data, but also extremely
low resource management effort; it is the latest distributed and parallel paradigm, one that guarantees to offer reliable
services implemented on modern servers, on top of virtualized resources [13].
An older model is the Grid, whose infrastructure was first conceived in the 1990s, when the ideas of on-demand
computing and data sharing among dynamic resources emerged [14].
Nowadays, when both models have reached a certain level of popularity and maturity, a question that could emerge
is: why is the Grid not enough, and why would the Cloud be? To answer this question, let us make a rather quick
analysis of the two.
A. Grid computing
One of the most popular definitions of the Grid was given by Buyya in 2002, at the Grid Planet Conference [13]:
"A Grid is a type of parallel and distributed system that enables the sharing, selection and aggregation of
geographically distributed autonomous resources dynamically at runtime depending on their availability, capability,
performance, cost and users' quality-of-service requirements."
The main purpose of the Grid is to give a solution to those problems that cannot be solved on a single machine.
Thus, in a Grid we may find not only personal computers, but also data centers, servers, clusters and supercomputers,
all configured to share information across organizations and to communicate in order to solve extremely
computationally expensive tasks. One of the most important and novel aspects of the Grid is that all these resources
may be placed in different geographical areas and still function properly.
There are a number of pros and cons to using the Grid.
Advantages: One of the advantages is that an application may benefit not only from storage resources [20], but also
from services running on the Grid's machines. Considering the fact that a Grid may also contain specialized devices,
an application does not need to be above a certain level of generality in order to be executed. Another advantage is
the redundancy given by the distributed configuration of the Grid. If one site suddenly fails, the user may not even
be aware of this fact, since his application can easily be moved to another site. The Grid has also found a solution
for computationally expensive tasks: its scheduler is designed to efficiently allocate tasks to those machines
that report a low utilization level, ensuring the user's quality-of-service requirements.
Disadvantages: Considering that the resources of a Grid are not physically placed together, there have been
conflicting policies in the process of data sharing between domains. Another disputed aspect of the Grid is the
provided security: the virtualization method covers the data and the resources, giving the impression of one large
set of resources. There have been many challenges in ensuring the security needed for accessing these resources,
such as dynamic services, multiple security mechanisms and the dynamic appearance of multiple trust domains [37],
which often made the Grid an unwanted solution for large-scale applications.

Figure 1: Grid concept [18].

The heterogeneity of the Grid and its other limitations made it clear that there is a need for something more flexible.
This need led to the emergence of the Cloud.
B. Cloud computing
R. Buyya gave one of the first complete definitions of the Cloud [13]:
"A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized
computers that are dynamically provisioned and presented as one or more unified computing resource(s) based on
service-level agreements established through negotiation between the service provider and consumers."
The concept is illustrated in Figure 2.

Figure 2: Cloud concept.

From that definition it can be concluded that the Cloud is able to offer the same services as the Grid and even
more. There are mainly three new hardware-related aspects that the Cloud brought [12]:
the impression of limitless resources, on request, thus removing the user's need to plan ahead for a project's
requirements;
the elimination of an up-front commitment from Cloud users, thus offering the possibility to ask for more resources
only if and when they are needed;
the capability to pay per use, thus giving the possibility to rent resources for as long as the user needs
them, even for short periods, and to free them at any time.
Luis M. Vaquero et al. [36] gave a more complex definition of the Cloud, one that they believe sums up many
other definitions and the main aspects of what the Cloud is today:

"Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development
platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale),
allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use
model in which guarantees are offered by the Infrastructure Provider by means of customized SLAs" (Service Level
Agreements: official written agreements between the Cloud provider and a customer specifying which services the
provider will offer).
They have repeatedly stated that the Cloud still does not have a final and universally accepted definition, given the
fact that it is a computing solution that is not yet finalized, with many ongoing usages and requirements for which
rules and policies are still to be defined.
C. Cloud computing: features and challenges
The above definitions present the idea of Cloud Computing and what it should offer, but when going from
theory to actual implementation, the challenges are considerable.
In [12], Michael Armbrust et al. present a series of such challenges. Some of them are listed below:
Service availability. The Cloud is expected to be permanently available and to never fail. Using Cloud
Computing services from multiple providers is one solution to this issue.
Data Lock-In. Due to the lack of standardization of Cloud architectures, applications and data cannot be
easily moved from one Cloud to another. When a Cloud provider is having problems that require its prices
to go up, the Cloud user is forced to pay the increased prices in order not to lose outside access to his data.
Obviously, the solution to this situation lies in standardizing the APIs for the Cloud platforms.
Security. There is a popular belief that Cloud storage is easily accessible by anyone. However, making the
Cloud secure is not so difficult. Encrypting the data before uploading it to the Cloud and changing the encryption
key periodically would offer all the needed confidentiality (a minimal sketch of this approach is given after this list).
Data Transfer. Nowadays, the amount of data on which applications operate is constantly increasing. The
cost of sending the data via a courier depends only on the weight of the hard drives, while sending the same
data via the Internet costs more for each sent megabyte. There is a point in the amount of transferred data beyond
which it becomes more advantageous to use the courier rather than the Internet. Shipping physical disks, coupled with
reducing the cost of inter-cluster transfers - once the data is in the Cloud - aims at making the Cloud more
affordable.
Scalability. The system has the capability of permanently providing pay-per-use resources and even logs,
without affecting the overall performance and with support for indexing.
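To illustrate the encryption-based confidentiality measure mentioned under Security, the following minimal sketch encrypts a file on the client side before it is uploaded and rotates the key afterwards. It assumes the third-party Python cryptography package; the file names and the separate upload step are illustrative assumptions, not part of any particular provider's API.

    from cryptography.fernet import Fernet, MultiFernet

    # The data-encryption key is generated and kept by the customer, never by the provider.
    key = Fernet.generate_key()
    f = Fernet(key)

    # Encrypt locally before the object ever reaches the Cloud.
    with open("report.pdf", "rb") as src:          # placeholder file name
        ciphertext = f.encrypt(src.read())
    with open("report.pdf.enc", "wb") as dst:      # this is what gets uploaded
        dst.write(ciphertext)

    # Periodic key rotation: re-encrypt under a new key while still being able
    # to read data encrypted under the old one during the transition.
    new_key = Fernet.generate_key()
    rotator = MultiFernet([Fernet(new_key), Fernet(key)])
    ciphertext = rotator.rotate(ciphertext)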
III. CLOUD COMPUTING INFRASTRUCTURE
All the Cloud services presented up until now are classified into three categories [23]:
Software as a Service (SaaS). The Cloud provider offers applications that the user can access and use online.
Platform as a Service (PaaS). The user is given the possibility to upload his own applications to a Cloud
site and use them from there. In this case, one must pay attention to the compatibility of the programming
language used with the platform.
Infrastructure as a Service (IaaS). The consumer now has access to processing power, storage and other
raw resources. This type of service is the only one where the user can control the operating systems, can
manage the storage and even configure elements of the network.
Figure 3 shows what these services consist of, how they interact and who offers them.
The NIST Definition [23] of Cloud Computing also presents the deployment models of Cloud Computing:
Public Cloud. In this model, the location of the site belongs to the Cloud provider. Here, several organizations
can share the services provided by the Cloud, but their data and applications are kept separated. There
are several public Clouds in continuous development: Amazon EC2, Google App Engine, Microsoft Azure.
Private Cloud. This model can also be considered an internal cloud or enterprise cloud [11]. The infrastructure
of the Cloud is specifically designed to provide services for a single organization, which usually owns the
location of the site. Multiple solutions for setting up a private Cloud are used by organizations: VMWare,
VirtualBox, OpenStack.
Hybrid Cloud. As the name suggests, this model implies the collaboration of two or more different Cloud
infrastructures which remain separate, but can share data and even applications based on standardized policies.
There are a number of reasons for choosing this model: the ease of moving applications between different sites,
the ability to rapidly access data from another cloud, and the possibility to combine the resources provided by
several clouds and use them as a whole.

Figure 3: Cloud services [17].


Each solution given for the Cloud models comes with its own approach and its own ways of making the Cloud what
the users expect it to be. In the following sections we briefly present these approaches and the results they obtained.
A. Public infrastructure
In this section we will focus on the Cloud solutions offered by Amazon Elastic Compute Cloud (EC2), Google
App Engine and Microsoft Azure.
1) Amazon Elastic Compute Cloud (EC2): EC2 offers an IaaS type of service. As stated in [25], "Amazon
Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS)
cloud. Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy
applications faster. You can use Amazon EC2 to launch as many or as few virtual servers as you need, configure
security and networking, and manage storage."
What EC2 actually offers is support for running multiple applications, both Linux and Windows based. On the same
set of physical resources, several virtualized machines are mapped and given to users. To accomplish this, EC2
uses the open-source Xen hypervisor.
When a user wants to use the EC2 resources, the first step is to send a request for one of the available machines,
here called instances. There are several types of instances the user can choose from, some of them presented in
Table I and the others found in [8].
The prices for data transfer into and out of Amazon EC2 depend on the source and destination of the transfer
and vary between $0 and $0.085 per GB. There are also instances which offer optimized storage, such as i2.xlarge,
i2.2xlarge, i2.4xlarge and i2.8xlarge, whose costs are between $0.853 and $6.820 per hour.
After choosing the instance, the user has to specify the VM (Virtual Machine) he wants to deploy on EC2's physical
machines. Once the deployment is completed, the instance starts its booting process, and after that it can be used
as a normal computer, usually via SSH.

Instance type | ECU (no. of cores) | RAM (GB) | Architecture (bit) | Disk (GB) | Linux cost ($/hour) | Windows cost ($/hour)
m1.small      | 1 (1)              | 1.7      | 32/64              | 160       | 0.044               | 0.075
m1.large      | 4 (2)              | 7.5      | 64                 | 840       | 0.175               | 0.299
m1.xlarge     | 8 (4)              | 15.0     | 64                 | 1680      | 0.350               | 0.598
c1.medium     | 5 (2)              | 1.7      | 32/64              | 350       | 0.130               | 0.210
c1.xlarge     | 20 (8)             | 7.0      | 64                 | 1680      | 0.520               | 0.840

Table I: Amazon EC2 instance types

There are two statuses that the physical resource has during the process: running, during the boot, and installed,
during the actual usage (after boot) [1]. The costs presented in Table I are calculated for the time a
resource is marked as installed.
One user can run up to 20 instances simultaneously. The term elastic refers to the fact that the user can completely
control his infrastructure by opening and closing instances within the 20-instance limit.
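As a hedged sketch of this launch/use/terminate workflow, the snippet below drives EC2 through the boto3 Python SDK (our choice of client library, not something prescribed here); the AMI identifier is a placeholder and the instance type mirrors Table I.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Request a single small instance; "ami-xxxxxxxx" is a placeholder image ID.
    reservation = ec2.run_instances(
        ImageId="ami-xxxxxxxx",
        InstanceType="m1.small",
        MinCount=1,
        MaxCount=1,
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # Wait until the instance has booted; from this point on it is billed and
    # can be reached over SSH like a normal computer.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Release the resource when it is no longer needed.
    ec2.terminate_instances(InstanceIds=[instance_id])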
2) Google App Engine (GAE): Google developed another technique, different from the server virtualization used by
Amazon EC2: the sandbox [30]. Google states that "Google App Engine is a Platform as a Service
(PaaS) offering that lets you build and run applications on Google's infrastructure. App Engine applications are
easy to build, easy to maintain, and easy to scale as your traffic and data storage needs change. With App Engine,
there are no servers for you to maintain. You simply upload your application and it's ready to go." [6]
The initial aim of GAE was to enlarge the Web and to make it even more appealing. Google's infrastructure
offers support for applications written in a variety of programming languages [6]: Java, Python, PHP, Go.
Given the fact that App Engine is mostly intended for Web applications, there is a rather large number of requests
and replies, which are quantified. These requests and replies are CPU intensive, and that is why they are rationed.
For example, if an application is extremely popular and receives thousands of requests per day, some of them will be
processed freely, within the free quota, and the others will be charged. This is part of Google's pricing philosophy:
use for free within a certain amount of resources and pay for more.
The term sandbox means that an application is allowed to do only as many things as the Google Cloud provider allows
it to do. The Python development environment permanently checks all the actions and stops those that are potentially
unsafe. If an application is caught performing this kind of unsafe operation, it is automatically shut down to ensure that
no other applications are harmed. The user cannot install whatever APIs he wants (only the supported ones),
because everything is carefully monitored. In addition, the user is not aware of the requests that his application
receives and, even more, of where and how his application is being run. Everything is taken care of by GAE. To
sum up, GAE has a copy of your application somewhere in one of its data centers. If your application is not at all
popular it may not even be running, but if it is run, then it may run on several sites. When the application receives a
request, GAE identifies it as being yours and knows how to send it to the destination. The same thing happens for
the reply. The only concern of the developer is to efficiently use the resources for handling the received requests.
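To make the request/reply flow described above concrete, here is a minimal sketch of the kind of Python handler App Engine runs on the developer's behalf, assuming the classic webapp2 framework of the Python runtime; GAE itself decides where the application runs and routes each request to it.

    import webapp2

    class MainPage(webapp2.RequestHandler):
        def get(self):
            # GAE delivers the request here; the developer never learns which
            # physical machine or data center actually served it.
            self.response.headers["Content-Type"] = "text/plain"
            self.response.write("Hello from App Engine")

    # The WSGI application object that App Engine looks up via app.yaml.
    app = webapp2.WSGIApplication([("/", MainPage)], debug=False)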
Google App Engine was integrated into a broader project: Google Cloud. Google Cloud offers products for computing,
storage, networking, Big Data, API translation and prediction, and deployment management. These services are
presented in Figure 4.

Figure 4: Google Cloud Platform [3].


Besides the PaaS solution, Google App Engine, Google also developed an IaaS solution, Compute Engine,
which comes with a large number of possible configurations for the virtual machines. The set of available
standard operating system images is quite diverse: CentOS 6, CentOS 7, Debian 7 Wheezy, Debian 7 Wheezy
Backports, Red Hat Enterprise Linux, SUSE, Ubuntu, Windows Server.
The first step in using Compute Engine is to activate this service from the Google Developers Console (a GUI
that provides support for managing the services Google offers). After that, the user can easily create a virtual
machine with the desired characteristics: machine type, image type, disk type. It is also advisable to set up a
firewall to enable a first level of filtering from the Internet. The created VM can then be accessed via SSH and
used for development.
Google Compute Engine offers two ways for managing the projects: a command-line tool and a Compute Engine
Console which comes as a graphical user interface [7].
3) Microsoft Azure: In [12] it is stated that Azure is "intermediate between complete application frameworks
like AppEngine on the one hand, and hardware virtual machines like EC2 on the other".
Beyond applications written using the .NET libraries, Azure now offers support not only for Windows Server
but also for Linux-based virtual machines. Users can write their programs in .NET, PHP, Node.js, Java, Ruby and
Python. The Microsoft Azure platform is formed of three components [15]:
Windows Azure provides the platform for running Windows based applications and for managing the needed
data.
SQL Azure is the main service that manages the data operations.
Windows Azure platform AppFabric ensures that the applications and the data in the Cloud are able to
communicate and share information.
From the infrastructure point of view, Azure offers not only the possibility to create, deploy and work on your own
virtual machines, but also to use pre-configured environments and get directly to business. The separation of
instances is achieved through the Windows Azure Hypervisor (WAH).
Windows Azure classifies its services into four categories [35]:
Compute offers computing power and includes:
Virtual Machines
Web Sites
Cloud Service
Mobile Services
Network offers ways of delivering Windows Azure applications to users and contains:
Virtual Network
Traffic Manager
Data offers the means for data management such as collection, analysis and storage, and includes:
Data Management
Business Analytics
HDInsight
Cache
Backup
Recovery Manager
App offers support for elements such as security and performance improvement, and includes:
Media Services
Messaging
Notification Hubs
Active Directory
Multifactor Authentication
Table II is a summary of the Microsoft Azure instances and their prices found in [2]. The machines are paid
per minute and there are two tiers of service: the Base Tier, for less demanding applications, and the Standard
Tier, for those applications that require more memory, more CPU and faster networking operations.

Instance (Standard tier) | Cores | RAM (GB) | Disk size (GB) | Price ($/hour)
A0                       | 1     | 0.75     | 20             | 0.02
A1                       | 1     | 1.75     | 70             | 0.09
A2                       | 2     | 3.5      | 135            | 0.18
A3                       | 4     | 7        | 285            | 0.36
A4                       | 8     | 14       | 605            | 0.72
A5                       | 2     | 14       | 135            | 0.33
A6                       | 4     | 28       | 285            | 0.66
A7                       | 8     | 56       | 605            | 1.32

Table II: Windows Azure Standard tier instances


B. Private infrastructure
In this section we will present three open-source software solutions for private Clouds: OpenStack, Eucalyptus and
OpenNebula.
1) OpenStack: The OpenStack project, initially developed by NASA [29], offers an IaaS model which not long
ago had only seven components [5] but has since added three more. The OpenStack architecture contains [4]:
Dashboard (Horizon) comes with a Web GUI for the other OpenStack services.
Compute (Nova) offers the necessary software for working with virtual machines. It is similar to Amazon
EC2, assuring scalability and redundancy.
Network (Neutron) ensures connectivity between machines running OpenStack software.
Object Store (Swift) offers the software for managing the storage of data on the order of petabytes for long
periods of time.
Block Storage (Cinder) offers, as the name suggests, block storage volumes for guest virtual machines.
Image Service (Glance) provides services for virtual disk images.
Identity (Keystone) manages the security for the OpenStack services.
Telemetry (Ceilometer) is responsible for billing, metering, rating and autoscaling.
Orchestration (Heat) conducts the interactions between different Cloud applications.
Database Service (Trove) offers support for relational and non-relational databases.
The interactions between the components presented above are illustrated in Figure 5.

Figure 5: OpenStack architecture [5].


If a user wants to use OpenStack, he will do that either through the Dashboard service or through the APIs that
every service provides. The Identity service ensures authentication; after that, all the other services become
accessible.
The project was developed in Python and is released under an Apache 2 license. The provided GUI gives the user the
liberty to create not only virtual machines with default CPU and RAM values, but also to set those numbers himself [32].
Given the fact that the OpenStack project aims to be able to support a considerably large infrastructure, it supports
several hypervisors for creating and running the virtual machines: Xen, KVM, Hyper-V, QEMU [19].
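As a hedged sketch of programmatic access (only the Dashboard and the per-service APIs are mentioned above), the snippet below uses the openstacksdk Python library, one possible client among several, to talk to Glance and Nova; the cloud profile, image and flavor names are placeholders.

    import openstack

    # Credentials are read from a clouds.yaml profile; "mycloud" is a placeholder name.
    conn = openstack.connect(cloud="mycloud")

    # Image Service (Glance): list the virtual disk images that are available.
    for image in conn.image.images():
        print(image.name)

    # Compute (Nova): boot a server from a chosen image and flavor.
    image = conn.compute.find_image("cirros")        # placeholder image name
    flavor = conn.compute.find_flavor("m1.small")    # placeholder flavor name
    server = conn.compute.create_server(
        name="forensics-test-vm", image_id=image.id, flavor_id=flavor.id
    )
    server = conn.compute.wait_for_server(server)
    print(server.status)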
2) Eucalyptus: Unlike OpenStack, which supports only private infrastructures, Eucalyptus also provides solutions
for hybrid infrastructures. It is written in several programming languages: Java, C and Python.
Eucalyptus was designed to work under a hierarchical architecture, using an emulation of Amazon EC2's SOAP
interface. This way, the user can create, access, manage and terminate virtual instances just like on Amazon EC2,
presented in Section 3.1.1. In the beginning, the virtual machines were running only on top of the Xen hypervisor,
but now it also offers support for KVM/QEMU and VMware.
Figure 6 shows the hierarchical architecture of Eucalyptus and its four main components. The Node Controller is in
charge of the virtual machines on the physical machine on which it is installed. The Cluster Controller gathers
information from a number of Node Controllers and manages VM scheduling and instances. The Cloud Controller is
the core of the architecture; it takes care of the high-level aspects of the system, giving commands to the Cluster
Controllers about scheduling and resource allocation. The Storage Controller includes Walrus, which is a storage
service [19]; it provides storage for the VMs and can be used as an HTTP put/get solution.

Figure 6: Eucalyptus architecture [9].


3) OpenNebula: OpenNebula is a Linux-based solution for mostly private, but also public, Cloud infrastructures. The
virtualization methods used are Xen, VMware and KVM. As stated in [34], the solution is not environment
dependent, offering flexibility and modularity through its three components: the OpenNebula core, the Capacity
Manager and the Virtualizer Access Drivers.
The architecture upon which OpenNebula was developed is more classical, offering a front-end part and a series of
clusters for running the virtual machines [39]. The programming languages used for development are Ruby, C++ and Java.
Lately, a decrease in the popularity of OpenNebula has been seen, while OpenStack and Eucalyptus are being
preferred [39].


IV. FORENSICS
A. Introduction to Cloud Forensics
The notion of Cloud Forensics was first addressed at a large scale in 2009 [38]. While companies continue to
offload their IT application infrastructure to the Cloud, criminals who target these systems are more and more
attracted to them. No matter how secure a system is, there is always a breach to be found, and Cloud systems are
no exception. As the rewards that can be acquired by a criminal grow with the size of the system, attacks on these
systems cannot be avoided.
Cloud Forensics does not aim to secure the systems, but to detect the infiltrations and to offer the authorities a way
to track the source of the attacks.
The tools used in Cloud Forensics are different from those used for studying standard computer systems, as the
latter are insufficient when applied at a large scale.
In [31], Ruan et al. presented Cloud Forensics as a cross-discipline between Cloud computing and digital forensics,
for which NIST gives the following definitions: Cloud computing is "a model for enabling ubiquitous, convenient,
on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and released with minimal management effort or service
provider interaction", while digital forensics is considered "the application of science to the identification, collection,
examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain
of custody for the data".
Also, they have divided the Cloud Forensics problem into three main categories: organizational, legal and technical.
The legal aspect covers the multi-jurisdiction and multi-tenancy issues, which demand that Cloud Forensics
operations do not come into contradiction with any laws and do not tamper with the confidentiality of other tenants.
There are also the Service Level Agreements (SLAs), which contain a set of rules and demands that govern the
provider-customer interaction at the service level. The SLAs must specify the conditions that Cloud Forensics
investigators must respect during their work.
The organizational aspect establishes the personnel, the permitted collaboration and the necessary external
involvement for the forensics action to take place efficiently and successfully. The following roles have been found
necessary in conducting these activities: investigators, IT professionals, incident handlers, legal advisors and external
assistance.
The technical aspect is the one where the action really takes place. For this kind of Cloud investigation, dedicated
tools must be used, with respect to the legal aspect presented above. There are several characteristics that are a
must for the forensics tools: the data collection must be done carefully, so as not to violate the space of other
tenants and to ensure data integrity. Given the dynamic architecture of the Cloud, the forensics tools must be
of several types: elastic, static and live [31]. Another definite need for these tools is the capability to segregate the
collected data. Evidence that does not belong to the person whose Cloud space has been hacked will clearly put the
investigation on a wrong track. Since the Cloud is a mostly virtualized system, the tools for examining intrusions
should be able to analyze such an environment.
B. Forensics Challenges
As can be seen in the previous section, the rules and regulations for conducting a Cloud Forensics investigation
are not few. Adding the architectural complexity of Cloud systems and their extremely large usage, there are
several challenges that Cloud Forensics has met, mapped to the technical capabilities that the investigation tools
must provide [31].


Data collection is the first step in the Cloud Forensics process. For collecting the data, one must get access to it.
It is clear that, depending on the Cloud deployment model, there are several degrees of difficulty in doing that. If
we are dealing with an IaaS model, access to the data is relatively easy to obtain. This is not the case for the
SaaS model, where the customer has no idea where his application is actually run or where his data is kept. The
completeness of SLAs is here put in doubt, since the forensics aspects are not sufficiently elaborated. For example,
customers do not have access to log files such as IP logs, or to recent data from the Cloud or from former virtual machines.
The biggest challenge for elastic, static and live forensics appears to be synchronization [31]. Considering the
extremely high degree of availability that the Cloud offers and the number of devices - fixed and mobile - that can
access it, synchronizing logs from different physical machines placed in geographically different locations becomes
an issue. Another issue is the unification and conversion of logs, mostly because of the very large number of formats
they come in.
Throughout the present paper it has come up many times how the instances that run on the same
physical machine are segregated. Usually, this is done by the hypervisor, which also plays an important role in
the forensics world. Maintaining the instances separated, keeping logs of what happens underneath the hypervisor
(shared resources) and not going into a Cloud neighbor's space during an investigation are crucial for offering a valid
result in the end. Another issue appears here if the data kept on the Cloud has been encrypted prior to upload and
that data becomes subject to forensics. The keys can be obtained only after agreements between the customer,
the Cloud service provider and the law representatives.
When talking about virtualized environments, the hypervisor represents the main element. The hypervisor takes care
of how a virtual machine runs, so it is no surprise that attackers would want to hack it. If the hypervisor is
tampered with, then all the virtual machines on top of it are compromised and, thus, all the data kept on that physical
machine. Unfortunately, until now there has not been a set of policies to be respected by hypervisor developers
in order to make hypervisor forensics easier.
The challenges are not only technical, but also legislative and human. The SLAs are usually incomplete,
lacking conditions for forensics situations, the countries where Cloud sites reside have different laws regarding
access to data and confidentiality, and even the personnel may not always have the required experience to deal with
a complex investigation.
Although these difficulties are permanent, there are several directions for diminishing the Cloud Forensics
challenges.
C. Forensics directions
Besides the increased difficulties that the Cloud design brings to forensics actions, there are also characteristics
that help their progress. Given the large usage of Cloud services, any implementation made here is
cheaper, forensics included. Another important aspect is that data deletion is probably never complete, thanks to
the provided redundancy, so data recovery might not be so hard to do. Also, the fact that virtual machines can be
cloned on demand gives the opportunity to conduct a parallel analysis on those machines. The pay-per-use character
of the Cloud has its benefits here, since logs can also be generated and stored on demand, giving a broader view
over the evolution of a certain instance.
Several aspects that the Cloud should consider in the process of becoming more secure are presented in [26]:
Information security. It is recommended for all data to be encrypted, no matter where it is stored. Also,
tracking who has access to information, knowing which machine uses which data and monitoring the operations
on data would be considerable steps forward in making the Cloud a safer environment.
Trust management in remote servers. Third parties responsible for data and security audits should be involved
in the Cloud usage process if more companies are to use this service.
Information privacy. If data is to be encrypted when uploaded to the Cloud, then new search and indexing
mechanisms are needed, which do not require decryption in order to return a valid result. Homomorphic encryption
and PIR (private information retrieval) further support the transition towards fully encrypted data within the Cloud.
Raffael Marty gives a more complex view of what logging means in Cloud Computing [22]. Aside from the
fact that logs are kept on multiple different machines, they are only available for relatively short periods of time.
Moreover, every tier of the Cloud system generates logs: the operating system, the network services and almost all
the running applications. In addition to all this, not all users have access to the logs; certain users have access
to certain logs and only for a while. Once the needed logs are gathered, processing and analyzing them is made
heavier by the excessive number of formats they come in. There is at least one rule that all logs should abide by:
they must all answer the questions when?, why?, who? and what?. Marty also presents the main steps for setting up
a log management system: every infrastructure and application must first enable logging, then ensure the transport
of the logs and, in the end, provide the mechanisms for processing the logs.
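As a hedged illustration of the when/who/what/why rule, the sketch below emits one structured, timestamped record per event using only the Python standard library; the field names and the example event are our own assumptions, not a format prescribed by Marty.

    import json
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("cloud-audit")

    def audit_event(who, what, why, target):
        # Emit one log record that answers when, who, what and why.
        record = {
            "when": datetime.now(timezone.utc).isoformat(),  # when?
            "who": who,                                       # who?
            "what": what,                                     # what?
            "why": why,                                       # why?
            "target": target,
        }
        logger.info(json.dumps(record))

    # Example: a tenant deleting an object from Cloud storage.
    audit_event(who="tenant-42", what="DELETE object", why="user request",
                target="bucket/report.pdf")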
A framework that follows these steps is presented in [28]. The implemented architecture contains five layers: a
management layer, a virtualization layer, a storage layer, a layer for data analysis and one for result centralization. All
these layers are represented by jobs in a distributed environment. Figure 7 presents these five layers and the way
they interact. To ensure minimum exposure to tampering, all the files involved in the logging process are hashed.

Figure 7: Cloud Forensics Logging Framework [28].
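As a hedged sketch of that integrity measure (the framework in [28] does not tie us to a specific algorithm; SHA-256 and the directory layout are assumptions), each collected log file can be hashed at acquisition time and the digests kept in a separate manifest for later verification.

    import hashlib
    from pathlib import Path

    def sha256_of(path):
        # Hash the file in chunks so that large log files fit in memory.
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Record a digest for every file gathered by the collection layer;
    # the evidence directory name is a placeholder.
    manifest = {str(p): sha256_of(p) for p in Path("evidence/logs").glob("*.log")}

    # Later, re-hashing and comparing against the manifest reveals tampering.
    for path, expected in manifest.items():
        assert sha256_of(path) == expected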


A forensics framework for Cloud Computing is presented in [27]. The proposed architecture is shown in Figure 8.
The constituent modules of the application are the following:
the GUI offers a web interface where the user can choose the number of instances, the software to be used
and can establish for how long he wants to use those instances.
the Frontend module filters the requests, granting access to the User Manager only to those which are
valid.
the User Manager module has the role of authenticator and lease validator (a lease is the renting contract
between the user that requests certain resources and the system that offers them [27]).
the Lease Manager module handles the leases and creates the needed jobs after analyzing them.
the Scheduler decides on which physical machine to run a certain lease by calculating an average number of
instances for each node.
the Hypervisor Manager has no other role than to manage the hypervisors in the system.
the Monitor module keeps an eye over the entire system and decides the number of needed virtual machines
for every lease.
the Database Layer offers support for keeping the information needed by the Lease Manager, the Scheduler
and the Hypervisor Manager.

Figure 8: System Architecture [27].


The implementation of this framework showed that enabling forensics modules does not necessarily bring a
considerable overhead, in this case reaching a maximum of 8% of the total load.
V. CONCLUSION AND FUTURE WORK
The aim of the present paper was not only to analyze the Cloud and the trends in Cloud Forensics, but also to
open a way towards a new Cloud Forensics solution.
As a virtualization solution, besides the hypervisors, there is a newer concept that can offer the same environment
as a virtual machine, but with less overhead: containers. LXC - Linux Containers - represents a set of tools which
allow control over Linux kernel components. It is free software written in C, with API bindings for languages such
as Python 3, Go, Ruby and Haskell. The first long-term support release, LXC 1.0, appeared in February 2014 and
will be supported until 2019.
We propose a Cloud Forensics framework composed of two main elements: container-based audit and intrusion
detection triggered by policy violations.
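As a hedged sketch of the container building block such a framework would rely on, the snippet below uses the python3-lxc bindings to create, start, inspect and stop a container; the container name and the download-template options are placeholders, and the audit and policy-checking logic itself is left out.

    import lxc

    # Create a container from the generic "download" template;
    # distribution, release and architecture values are placeholders.
    container = lxc.Container("forensic-target")
    if not container.defined:
        container.create("download", 0,
                         {"dist": "ubuntu", "release": "trusty", "arch": "amd64"})

    # Start it and record basic facts an audit module might log.
    container.start()
    container.wait("RUNNING", timeout=30)
    print(container.name, container.state, container.get_ips())

    # Stop it once the inspection is done.
    container.stop()
    container.wait("STOPPED", timeout=30)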
An important conclusion to be drawn from this paper is that there is never enough: never enough memory, never
enough space, never enough security. What we tried to emphasize was the completely dynamic and fast evolution of
Cloud Computing and, thus, the need for security and audit tools to evolve just as fast.
Studying the past, the present and the state of the art of both Cloud Computing and Cloud Forensics brought about
the idea of a new forensics framework based on a current and stringent need and on brand new development tools.
REFERENCES
[1] Amazon Elastic Compute Cloud User Guide for Linux. API Version 2014-10-01.
[2] http://azure.microsoft.com/en-us/pricing/details/virtual-machines/#windows. Technical report.
[3] http://cloudacademy.com/blog/google-cloud-platform-new-announcements-and-features/. Technical report.
[4] http://docs.openstack.org/admin-guide-cloud/content/ch getting-started-with-openstack.html. Technical report.
[5] http://ken.pepple.info/. Technical report.
[6] https://cloud.google.com/appengine/docs/whatisgoogleappengine. Technical report.
[7] https://cloud.google.com/compute/docs/. Technical report.
[8] http://www.ec2instances.info/. Technical report.
[9] http://www.institut-numerique.org/summary-51c0279d01413. Technical report.
[10] 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 22-24 June 2003, Seattle, WA, USA. IEEE Computer Society, 2003.


[11] Public or Private Cloud: The Choice is Yours. Technical report, Aerohive Networks, 2013.
[12] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50-58, 2010.
[13] Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities. In High Performance Computing and Communications, 2008. HPCC'08. 10th IEEE International Conference on, pages 5-13. IEEE, 2008.
[14] Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. International Journal of High Performance Computing Applications, 15(3):200-222, 2001.
[15] Jianya Gong, Peng Yue, and Hongxiu Zhou. Geoprocessing in the Microsoft cloud computing platform - Azure. In Proceedings of the Joint Symposium of ISPRS Technical Commission IV & AutoCarto, page 6. Citeseer, 2010.
[16] Eugene Gorelik. Cloud computing models. PhD thesis, Massachusetts Institute of Technology, 2013.
[17] C. N. Hofer and G. Karagiannis. Cloud computing services: taxonomy and comparison. Journal of Internet Services and Applications, 2(2):81-94, 2011.
[18] Bart Jacob, Michael Brown, Kentaro Fukui, Nihar Trivedi, et al. Introduction to grid computing.
[19] Srivatsan Jagannathan. Comparison and evaluation of open-source cloud management software. 2012.
[20] Kiranjot Kaur and Anjandeep Kaur Rai. A comparative analysis: Grid, cluster and cloud computing.
[21] Jerome Lauret, Matthew Walker, Sebastien Goasguen, and Levente Hajdu. From grid to cloud, the star experience, 2010.
[22] Raffael Marty. Cloud application logging for forensics. In Proceedings of the 2011 ACM Symposium on Applied Computing, pages 178-184. ACM, 2011.
[23] Peter Mell and Tim Grance. The NIST definition of cloud computing. 2011.
[24] Daniel Nurmi, Richard Wolski, Chris Grzegorczyk, Graziano Obertelli, Sunil Soman, Lamia Youseff, and Dmitrii Zagorodnov. The Eucalyptus open-source cloud-computing system. In Cluster Computing and the Grid, 2009. CCGRID'09. 9th IEEE/ACM International Symposium on, pages 124-131. IEEE, 2009.
[25] Simon Ostermann, Alexandria Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Cloud Computing, pages 115-131. Springer, 2010.
[26] Alecsandru Patrascu, Diana Maimut, and Emil Simion. New directions in cloud computing. A security perspective. In Communications (COMM), 2012 9th International Conference on, pages 289-292. IEEE, 2012.
[27] Alecsandru Patrascu and Victor-Valeriu Patriciu. Implementation of a cloud computing framework for cloud forensics.
[28] Alecsandru Patrascu and Victor-Valeriu Patriciu. Logging framework for cloud computing forensic environments. In Communications (COMM), 2014 10th International Conference on, pages 1-4. IEEE, 2014.
[29] Ken Pepple. Deploying OpenStack. O'Reilly Media, Inc., 2011.
[30] Ling Qian, Zhiguo Luo, Yujian Du, and Leitao Guo. Cloud computing: An overview. In Cloud Computing, pages 626-631. Springer, 2009.
[31] Keyun Ruan, Joe Carthy, Tahar Kechadi, and Mark Crosbie. Cloud forensics: An overview.
[32] Omar Sefraoui, Mohammed Aissaoui, and Mohsine Eleuldj. OpenStack: toward an open-source solution for cloud computing. International Journal of Computer Applications, 55(3):38-42, 2012.
[33] Charles Severance. Using Google App Engine. O'Reilly Media, Inc., 2009.
[34] Borja Sotomayor, Ruben Santiago Montero, Ignacio Martín Llorente, and Ian Foster. Capacity leasing in cloud systems using the OpenNebula engine. In Workshop on Cloud Computing and its Applications, volume 3, 2008.
[35] Mitch Tulloch. Introducing Windows Azure for IT Professionals. Microsoft Press, 2013.
[36] Luis M. Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39(1):50-55, 2008.
[37] Von Welch, Frank Siebenlist, Ian T. Foster, John Bresnahan, Karl Czajkowski, Jarek Gawor, Carl Kesselman, Sam Meder, Laura Pearlman, and Steven Tuecke. Security for grid services. In 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 22-24 June 2003, Seattle, WA, USA [10], pages 48-57.
[38] Stephen D. Wolthusen. Overcast: Forensic discovery in cloud environments. In IT Security Incident Management and IT Forensics, 2009. IMF'09. Fifth International Conference on, pages 3-9. IEEE, 2009.
[39] Sonali Yadav. Comparative study on open source software for cloud computing platform: Eucalyptus, OpenStack and OpenNebula. International Journal of Engineering and Science, 3(10):51-54, 2013.
