Abstract-In today's corporate society, the productivity, stability, and management of an organization rely upon the power of databases. Most organizations outsource their databases in the form of big data and then transfer them into the cloud. Although cloud computing technology brings many benefits for an organization, its security risk remains a big barrier to widespread adoption. This problem therefore poses a critical question: is information secure in the cloud? Given this uncertainty, the primary aim of this study is to identify and describe the most vulnerable aspects of security threats in the cloud environment through content analysis, and to highlight and evaluate gaps in the literature to draw scholarly attention. This paper analyzed content to source data that helps to identify gaps in the literature. These gaps were then evaluated to answer questions with possible solutions. This research will inform both vendors and users about security issues that have been heightened by recent advancements and demands and that have been pointed out for improvement. This study has reviewed literature in the field over a span of six years, and endeavors to answer the question and propose solutions through thorough evaluation and analysis: the security-related issues in cloud computing associated with big data must be taken into account by security practitioners when assessing the needs of service providers. This study has found that the cloud environment is an innovation, and that the blend of parallel computing and cloud computing can offer various advantages. It is ideal for different kinds of applications that suit different needs, provided the expense of application modifications is consolidated with the cost of setup and maintenance of cloud computing. This study has also found that a single management solution for securing big data after integrating it with the cloud has yet to be designed.

Keywords-Big Data, MapReduce, Cloud computing, RDBM

I. INTRODUCTION

Big data plays an essential role in the business world, as the secure storage of significant amounts of complex data is one of the most crucial aspects of corporate operations. Big data is a term that describes huge amounts of structured, semi-structured and unstructured data. The demand for innovative, cost-effective methods of processing data, in order to provide process automation and enable decision making, is real and sparks scholarly interest. Big data has 3Vs: high volume of data, high velocity of data, and high variety of types of information that needs to be mined. Although no specific quantity defines big data, it can run to petabytes or exabytes. According to Peter Skomoroch, before big data technologies like MapReduce and Hadoop, it was very difficult, error-prone and time-consuming to process large datasets; with these technologies, dataset processing is much easier than with traditional processing tools.

Many organizations collect this complex data through worldwide surveys to improve their decision-making process, which is extremely important to sustain a healthy future for their business [1]. Storing big data in the cloud thus comes with security challenges, because content owners who do not know where their data is kept do not trust the cloud. First, what is cloud computing? Cloud computing is a set of IT services provided to end-users over the internet so that they can scale their service requirements up or down. Cloud computing is the fastest growing sector in the IT industry, as capacity can be increased dynamically without investing in new infrastructure and software licensing. Cloud computing has many advantages, such as cost efficiency, storage capacity, backup and recovery, quick deployment, and easy access to information from anywhere. Despite its advantages, there are also some disadvantages to be aware of before using it, for instance technical issues (network or connectivity problems), security, and proneness to attack (hacking). However, its pros outweigh its cons, and the one thing we need to be aware of is that cloud computing technology needs to be secured so that it does not leak stored sensitive information [2]. Moreover, cloud computing provides virtual resources to consumers via the internet so that they can use cloud infrastructure, services, and software. This is cheaper than other computing systems, with zero maintenance cost. The service provider is responsible only for the availability of services, which are also known as "IT on demand" or utility computing.

Google introduced MapReduce as a processing framework, and its Hadoop implementation uses the Hadoop Distributed File System (HDFS) [3]. In MapReduce, a vast amount of data is converted into tuples (key-value pairs); these tuples are then taken as input and reduced by combining them into smaller sets of tuples. In this way big data can be managed; at the same time, it can still create security problems due to business growth and monitoring. MapReduce alone is not sufficient either, as it lacks protection for sensitive data, so confidentiality becomes a major problem. We will additionally discuss how the Hadoop framework can be used to address this issue with different techniques. Although there are many security and privacy issues, they
security policies. The weakness is that it is unique to Amazon, and it could contribute more if it were general [14].

Figure 1. The percentage of compromised attributes in cloud environment associated with big data

A new architecture, the Transparent Cloud Protection System (TCPS), has been discussed by Lombardi et al. [15] to improve the security of cloud resources in response to integrity protection problems in the cloud environment. Having identified these integrity protection problems, they proposed the TCPS framework to increase the security of cloud assets. According to the authors, TCPS can be used to monitor the integrity of guests while still preserving transparency and virtualization. To manage images in TCPS, they used an image filter and scanners to detect malicious images and so prevent security vulnerabilities and attacks. The strength of this work is that it proposes a mechanism that provides enhanced security, transparency, and intrusion detection. The limitation is that the authors have not validated their work, nor have they deployed it in a professional cloud computing setting [15].

In MapReduce, a massive amount of data is converted into tuples; these tuples are then provided as input to the reduce step, which reduces them into a smaller set of tuples [16], [17]. In this way big data can be managed, although it can still create security problems, because data monitoring and business continuity can cause glitches. MapReduce alone is therefore not sufficient, as it lacks security for sensitive data. The method proposed to reduce security issues in the cloud is Airavat. Airavat is a MapReduce-based system that provides strong security and privacy for sensitive data (healthcare, shopping transactions, etc.). It is a novel integration of access control mechanisms. Airavat runs MapReduce on clusters for parallel processing. Taking healthcare data as an example, if this sensitive information leaks to a third party, for instance an insurance company, the insurer can learn about customers' medical conditions, which could result in an increase in their premiums. Therefore, to protect data from breaches of confidentiality, it is crucial to provide strong security, which makes Airavat a strong candidate technology for protecting confidentiality. With this technique, the untrusted MapReduce program is submitted to Airavat, which protects the data it processes, as seen in Figure 1. Without such protection, the results of MapReduce computations could leak information. Airavat uses Security-Enhanced Linux (SELinux) to add mandatory access control when implemented on Hadoop. This technique provides strong security and privacy by preventing leakage of sensitive data, and it is the first system that combines mandatory access control with differential privacy without auditing untrusted code. The weakness of this system is that it supports only a small set of reducers and must generate enough noise to assure the differential privacy of values [17].

Cloud computing is a valuable tool. However, organizations still need to understand and manage its risks in depth before executing any agreements. Fortunately, there are mitigation strategies that cloud customers can follow to reduce the level of risk [19]. Gatewood [20] suggests determining a vendor's internal audit process, how frequently the vendor is evaluated by external organizations, the standards the vendor is held to, and whether it is willing to be audited regularly. Demonstrating ongoing compliance with security policies and regulatory requirements can be difficult. Gatewood warns that as vendors rush to create and introduce cloud-based offerings, they may fall short on including the essential records management controls. Moreover, investigating various security features [21-23] could be an interesting path to explore in the future to protect Big Data [24].

Researchers have discovered many issues in the cloud environment and have started working on them in order to minimize these problems. Using utility computing carries threats, and some of the significant findings in the literature correspond to the results summarized in Table I. The table summarizes the different techniques that each study uses to address the security and privacy of big data in cloud computing. The remaining papers identify further security and privacy issues that fall outside the argument of this study.
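The MapReduce flow described in this section (data converted into tuples, then reduced into a smaller set of tuples) can be illustrated with a minimal single-process sketch. It is not Hadoop's API; the record layout (customer id, purchase amount) and the names `map_phase`, `shuffle_phase`, `reduce_phase`, `spend_mapper`, and `run_job` are assumptions chosen for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout for illustration: (customer_id, purchase_amount).
Record = Tuple[str, float]
KeyValue = Tuple[str, float]


def map_phase(records: Iterable[Record],
              mapper: Callable[[Record], Iterator[KeyValue]]) -> List[KeyValue]:
    """Apply the mapper to every record and collect the emitted tuples."""
    emitted: List[KeyValue] = []
    for record in records:
        emitted.extend(mapper(record))
    return emitted


def shuffle_phase(pairs: Iterable[KeyValue]) -> Dict[str, List[float]]:
    """Group emitted values by key so each reducer call sees one key."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[float]],
                 reducer: Callable[[List[float]], float]) -> Dict[str, float]:
    """Collapse each key's list of values into a single value."""
    return {key: reducer(values) for key, values in grouped.items()}


def spend_mapper(record: Record) -> Iterator[KeyValue]:
    """Emit one (customer_id, amount) tuple per purchase record."""
    customer_id, amount = record
    yield customer_id, amount


def run_job(records: Iterable[Record]) -> Dict[str, float]:
    """Total spend per customer: map, then shuffle, then reduce with sum."""
    return reduce_phase(shuffle_phase(map_phase(records, spend_mapper)), sum)


if __name__ == "__main__":
    purchases = [("a", 1.0), ("b", 2.0), ("a", 3.5)]
    print(run_job(purchases))
```

In a real cluster the map and reduce calls run in parallel on different nodes, which is where the monitoring and confidentiality concerns discussed above arise: each node sees raw tuples unless an additional protection layer is added.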
TABLE I. THE COMPARISON OF DIFFERENT CLOUD SECURITY TECHNIQUES
2. | Bertino et al. (2014) [12] | XML document: cryptography and digital signature technique | Queries are processed according to the policy provided by the cloud provider, instead of processing all queries. | The information size increased in XML format, and it created some integrity issues in the government, health, and finance areas because of the mode of delivery of content. | An XML data document is used to provide a secure environment for third-party access control, which introduces another trusted layer of security to the model.
3. | Kevin Hamlen (2013) [2] | Cryptographic | Sensitive data can be stored in the database in encrypted form rather than as plain text. | Managing private and public keys | Even if an intruder obtains the database, they cannot read the actual data because it is encrypted.
4. | Zhou et al. (2016, 2012) [10, 9] | Declarative Secure Distributed Systems (DS2) | This technique is used to explore the security premises of secure data sharing between the apps hosted on the clouds. | Data management issues: system analysis and forensics; distributed query processing; query correctness assurance | Data-centric security provides secure query processing, efficient end-to-end verification of data, and system analysis and forensics.
5. | Rongxing et al. (2010) [13] | Bilinear pairing technique | This system uses five steps to control unauthorized user access and resolve disputes over big data. The five steps are: Setup, key generation, AnonyAuth, AuthAccess, and Provenance tracking. | Data forensics and post examination | Difficult to implement because it is based on a complex mathematical model. However, this system pushes the use of cloud computing toward full public acceptance.
6. | Bleikertz et al. (2010) [14] | Amazon EC2 security analysis | A specialized query and policy language is applied to the security analysis model and evaluated in a practical domain. The security analysis was implemented in Python and evaluated on Amazon EC2. | Reachability audit of Amazon security graphs and groups | Provides a robust analysis of security attacks and vulnerabilities to enhance the security policies. However, it is unique to Amazon, and it could contribute more if it were general.
7. | Lombardi et al. (2010) [15] | Transparent Cloud Protection System (TCPS) | TCPS can be used to monitor the integrity of guests while preserving transparency and virtualization. | Cloud security vulnerabilities and attacks | TCPS provides enhanced security, transparency, and intrusion detection; however, the authors have not validated their work, nor have they deployed it in a professional cloud computing setting.
8. | Roy et al. (2010) [16], [17] | Airavat MapReduce-based system | This technique provides strong security and privacy by preventing leakage of sensitive data using an access control mechanism. | Supports only a small set of reducers; Airavat must generate enough noise to assure the differential privacy of values. | Airavat is the first system that combines mandatory access control with differential privacy without auditing untrusted code.
9. | Mladen et al. (2008) [18] | Virtual Computing Laboratory (VCL) technique | VCL is an open source implementation that provides NYU students with virtual access to software applications that are academically relevant. | End-to-end service isolation via VPN, SSH tunnels, and VLANs | A theoretical concept, so little is concretely proposed. It could have contributed more if practical aspects had been discussed in this work.
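Row 8 of Table I notes that Airavat must add enough noise to assure the differential privacy of values. The idea can be sketched, under stated assumptions, as a sum reducer that clamps each input to a declared range and adds Laplace noise scaled to that range. This is not Airavat's implementation: the function names `laplace_noise` and `noisy_sum`, the add-or-remove-one-record notion of neighbouring datasets, and the example bounds are all assumptions made for illustration.

```python
import random
from typing import Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_sum(values: Iterable[float], lower: float, upper: float,
              epsilon: float, rng: random.Random) -> float:
    """Sum reducer with Laplace noise (illustrative, hypothetical helper).

    Each value is clamped to [lower, upper], so adding or removing one
    record changes the sum by at most max(|lower|, |upper|). Noise with
    scale sensitivity / epsilon then hides any single record's effect.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    clamped_total = sum(max(lower, min(upper, v)) for v in values)
    sensitivity = max(abs(lower), abs(upper))
    return clamped_total + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical per-patient bills; the outlier is clamped to 500.0.
    bills = [120.0, 80.0, 4000.0]
    print(noisy_sum(bills, 0.0, 500.0, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives stronger privacy but a noisier total, which is one way to read the limitation listed in the table: the declared range and the noise both cost accuracy, and only reducers whose sensitivity can be bounded this way are easy to support.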
leading providers like Amazon, Google, etc. is facing security issues. Therefore, the decision to adopt cloud computing is still in progress and could be based on the ratio of benefits to the threats and risks to be eliminated.

REFERENCES

[1] B. Matturdi, X. Zhou, S. Li and F. Lin, "Big Data security and privacy: A review", China Communications, vol. 11, no. 14, pp. 135-145, 2014.
[2] T. Erl, R. Puttini and Z. Mahmood, Cloud computing. 2013.
[3] W. Wei and X. Gu, "SecureMR: A Service Integrity Assurance Framework for MapReduce", Proceedings of IEEE CCIS2012, pp. 240-244, 2012.
[4] A. Gholami and E. Laure, "Big Data Security and Privacy Issues in the CLOUD", International Journal of Network Security & Its Applications, vol. 8, no. 1, pp. 59-79, 2016.
[5] F. Shaikh and S. Haider, "Security threats in Cloud Computing", Int. Conf. for Internet Technology and Secured Transactions (ICITST), Abu Dhabi, pp. 214-219, 2011.
[6] P. Hoving and J. Essn, "Minutes from the first meeting of TC 11, security and protection in information processing systems", Computers & Security, vol. 4, no. 2, pp. 149-152, 1985.
[7] "IEEE Cloud Computing Special Issue on Cloud Security", IEEE Cloud Comput., vol. 2, no. 5, pp. c2-c2, 2015.
[8] C. Pfleeger, Security in Computing. Upper Saddle River, NJ: Prentice Hall PTR, 1997.
[9] Y. Zhang and Y. Zhou, "TransOS: a transparent computing-based operating system for the cloud", International Journal of Cloud Computing, vol. 1, no. 4, pp. 287, 2012.
[10] W. Zhou, "Towards a Data-centric View of Cloud Security", 2016.
[11] A. Narwal and S. Tomar, "Kerberos Protocol: A Review", IJERT, vol. 4, no. 04, 2015.
[12] V. Inukollu, S. Arsi and S. Rao Ravuri, "Security Issues Associated with Big Data in Cloud Computing", International Journal of Network Security & Its Applications, vol. 6, no. 3, pp. 45-56, 2014.
[13] Rongxing et al., "Secure Provenance: The Essential Bread and Butter of Data Forensics in Cloud Computing", ASIACCS '10, Beijing, China.
[14] S. Bleikertz et al., "Security Audits of Multi-tier Virtual Infrastructures in Public Infrastructure Clouds", 2010.
[15] F. Lombardi and R. Pietro, "Transparent Security for Cloud", SAC '10 Proceedings of the 2010 ACM Symposium on Applied Computing, no. 978-1-60558-639-7, pp. 414-415, 2010.
[16] I. Roy et al., "Airavat: Security and Privacy for MapReduce", NSDI, vol. 10, pp. 297-312, 2010.
[17] I. Roy, "Airavat: Security and Privacy for MapReduce", google.com, 2016.
[18] M. Vouk, "Cloud Computing- Issues, Research and Implementations", Journal of Computing and Information Technology, vol. 16, no. 4, p. 235, 2008.
[19] T. Betcher, "Cloud Computing: Key IT-Related Risks and Mitigation Strategies for Consideration by IT Security Practitioners", 2010.
[20] B. Gatewood, "Clouds On the Information Horizon: How To Avoid The Storm", CRM, vol. 43, no. 4, pp. 32-36, 2009.
[21] D. V. Pham, A. Syed, A. Mohammad and M. N. Halgamuge, "Threat Analysis of Portable Hack Tools from USB Storage Devices and Protection Solutions", International Conference on Information and Emerging Technologies, pp. 1-5, Karachi, Pakistan, June 2010.
[22] D. V. Pham, A. Syed and M. N. Halgamuge, "Universal serial bus based software attacks and protection solutions", Digital Investigation, vol. 7, no. 3, pp. 172-184, 2011.
[23] D. V. Pham, M. N. Halgamuge, A. Syed and P. Mendis, "Optimizing windows security features to block malware and hack tools on USB storage devices", Progress in Electromagnetics Research Symposium, pp. 350-355, 2010.
[24] V. Vargas, A. Syed, A. Mohammad and M. N. Halgamuge, "Pentaho and Jaspersoft: A Comparative Study of Business Intelligence Open Source Tools Processing Big Data to Evaluate Performances", Int. Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 10, pp. 20-29, November 2016.