
SECURITY ISSUES AND CHALLENGES OF BIG DATA ANALYTICS AND VISUALIZATION
Bipin Bihari Jayasingh (1), M. R. Patra (2), D. Bhanu Mahesh (3)
(1) IT Dept., CVR College of Engineering, Ibrahimpatan (M), RR Dist-501510, Telangana State.
Email: bbjayasingh9@rediffmail.com
(2) Dept. of Computer Science, Berhampur University, Berhampur-7, Orissa, India.
Email: mrpatra12@gmail.com
(3) IT Dept., CVR College of Engineering, Ibrahimpatan (M), RR Dist-501510, Telangana State.
Email: dbhanumahesh@gmail.com

Abstract- The big data environment helps to resolve cyber security issues, in particular finding the attacker. There are security challenges of big data, as well as security issues, that the analyst must understand. This paper studies the challenges faced by an analyst, which include fraud detection, network forensics, data privacy issues and data provenance problems. It focuses on the tools and techniques of data mining used in big data analytics for security, and on the security techniques, such as encryption capabilities, used to protect big data. The study leads to the use of available data analytics tools and techniques in the systematic implementation of a security system as well as a forensic system. The paper also proposes the Bayesian classification algorithm of classification mining as suitable for predicting the attack type in an internetworked environment. The forensic aspects of big data issues and challenges are emphasized in order to help the forensic investigator in case of a network attack.

Keywords: Network Forensics, Big Data Analytics, Fraud Detection, Privacy Issues, Data Provenance Problems, Visual Analytics.

I. INTRODUCTION

There are various big data techniques for data analysis, future data prediction and incident monitoring [10]. An attractive system for many organizations is security information and event management (SIEM), which is used for fraud detection and plays a major role in the security environment. Organizations deploy various intrusion detection sensors, network monitoring sensors, tools for login and other tools to protect the networked system, which in turn generate diverse data types that are very difficult to manage. Security vendors have to develop a system that can aggregate the diverse data sets, correlate them, and present them to the data scientist using statistics in a dashboard. A major requirement for a data scientist is a tool that can correlate and consolidate the diverse data sets and serve as a source over a longer period [2].

Big data visualization and analytic techniques are useful for analyzing the huge volume of network traffic data available in the storage hub in order to address cyber security [2, 8]. Analyzing the network traffic data after an attack, through big data analytics and visualization, is also important for extracting evidence of the crime [3, 4]. Analysis performed after the attack, using big data analytics on the storage hub of an enterprise, is called post hoc forensic analysis.

Related work on the HACE theorem characterizes big data by its large volume, heterogeneous and autonomous data sources, and distributed and decentralized control, which require exploring the relationships among the data. It also covers the processing model and the handling of big data from the perspective of data mining [15]. Data mining has also given rise to privacy preserving data mining (PPDM), in which data is prepared to suit the data mining algorithms without compromising security [16].

The remainder of the paper is organized as follows: Section II focuses on the security challenges of big data; Section III discusses the security issues of big data analytics and the use of security mechanisms to protect big data; Section IV proposes the Bayesian classification algorithm to predict the class label; Section V provides concluding remarks.
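
The aggregation and correlation of diverse security data sets described in the introduction can be illustrated with a minimal sketch. It is not the system proposed in this paper: the three sources, their field names and the sample records are all hypothetical, and correlation is reduced to grouping events by IP address before computing simple dashboard statistics.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; field names differ per source on purpose.
ids_alerts = [
    {"src": "10.0.0.5", "signature": "port-scan"},
    {"src": "10.0.0.9", "signature": "sql-injection"},
]
netflow_records = [
    {"source_ip": "10.0.0.5", "bytes": 1200},
    {"source_ip": "10.0.0.5", "bytes": 800},
    {"source_ip": "10.0.0.7", "bytes": 300},
]
login_events = [
    {"ip": "10.0.0.5", "user": "alice", "success": False},
    {"ip": "10.0.0.9", "user": "bob", "success": True},
]


def normalize(records, source, ip_field):
    """Map one source's records onto a common (source, ip, detail) schema."""
    return [{"source": source, "ip": r[ip_field], "detail": r} for r in records]


def correlate_by_ip(events):
    """Group normalized events by IP address so related activity lines up."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["ip"]].append(event)
    return grouped


def dashboard_stats(grouped):
    """Per-IP summary: event count per source and number of distinct sources."""
    stats = {}
    for ip, ip_events in grouped.items():
        per_source = Counter(e["source"] for e in ip_events)
        stats[ip] = {"events": dict(per_source), "distinct_sources": len(per_source)}
    return stats


events = (
    normalize(ids_alerts, "ids", "src")
    + normalize(netflow_records, "netflow", "source_ip")
    + normalize(login_events, "login", "ip")
)
stats = dashboard_stats(correlate_by_ip(events))

# IPs reported by the most distinct sources come first.
for ip, summary in sorted(stats.items(), key=lambda kv: -kv[1]["distinct_sources"]):
    print(ip, summary)
```

Ranking by the number of distinct reporting sources is only one possible dashboard statistic; it is used here because an address flagged by several independent sensors is a natural first candidate for the analyst's attention.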

978-1-5090-5256-1/16/$31.00 © 2016 IEEE
II. SECURITY CHALLENGES

Big data analytics has several security challenges that must be addressed to realize its real potential. The challenges we consider in this section are fraud detection, network forensics, data privacy issues and data provenance problems.

A. Fraud Detection

Quintillions of bytes of data are created [1], and 10 to 100 billion events are generated by a large enterprise [2]. As the events are generated from multiple heterogeneous sources, the enterprise must deploy more devices, hire more employees and run more software for post hoc forensic analysis. As a result, the efficacy of big data is achieved through efficient analysis of the data and prediction of the data class. Existing analytical techniques are not sufficient for the large scale analysis and processing of big data events, so big data analytics has attracted the interest of the security community [6].

A currently visible challenge in society is fraud detection, which must use big data analytics. The companies most in need of fraud detection are commercial banks, credit card companies and phone companies. However, it is not economically feasible to mine big data with a custom-built infrastructure [7]. Hence big data technologies are useful, as they give a wide variety of institutions an affordable infrastructure for security monitoring. In particular, there are a variety of big data technologies on the market, such as the Hadoop MapReduce ecosystem, which includes Pig, Hive, Mahout and RHadoop. There are also analysis tools that work fast at an unprecedented scale when applied to large scale heterogeneous data sets, such as stream mining, complex event processing and NoSQL databases. These technologies facilitate the storage, maintenance and analysis of security information. However, the examples of efficient data processing for security depend on MapReduce, which is used by WINE and BotCloud [2].

B. Network Forensics

Network forensics is a system that continuously monitors network traffic in order to uncover unusual behavior in normal traffic, or to discover abnormal patterns that may indicate some kind of network attack [7]. It is also an investigation process that handles a large amount of network traffic and analyzes the huge volume of data in the internetwork environment. There are also severe challenges, including the problems associated with accessing the network devices. The challenges faced by the forensic investigator can be resolved only through big data analytics. This paper discusses a data mining classification technique, Bayesian classification, which may be very useful for predicting the class label of an attack.

C. Data Privacy Issues

The privacy of data depends entirely on how the organization uses the data. The data has to be used for the purpose for which it was collected and for nothing else. The data should not be shared with others to gain business profit or market reputation. However, there are many tools and techniques in the big data environment that extract private data and make violations easier. Security policies have to be enforced along with privacy policies for all the entities involved in the process. The requirement now is to develop applications with an understanding of privacy policies and practices, so as to avoid privacy violations [2].

Architects and designers play a major role in safeguarding the data stores as well as the privacy of data. Privacy cannot be preserved when data is made available to many parties for marketing and advertising. Sometimes private data can be used by law enforcement agencies for national security. There is also the possibility of intruders stealing private data by pretending to be authorized personnel [2]. The privacy issues related to data mining have been viewed from a wider perspective, and various approaches that can help to protect sensitive information have been investigated [16].

D. Data Provenance Problem

The term data provenance refers to the origin of data, i.e. when and where the data was first generated and the location of that data [16]. Data provenance is one of the problems in big data because data comes from so many sources, some of which may be trustworthy and some not. Data provenance therefore raises problems related to the trustworthiness of data. For example, old articles or news about a company may be surfaced by big data analytical tools and cause the shares of that company to fall. So we must use authenticated and trustworthy data when analyzing data with big data tools. Integrity and authenticity are the two parameters that we have to consider when we want to analyze the data. We must implement statistical and machine learning

2016 2nd International Conference on Contemporary Computing and Informatics (ic3i)
algorithms to examine the malicious data [2]. Researchers have to develop innovative techniques to overcome the data provenance problem [5]. Although data provenance poses problems, it also has advantages: for example, we can keep track of data from its origin through to its analysis. More exploration is needed in data provenance, and visualization techniques will help to identify data provenance problems easily [17].

III. SECURITY ISSUES OF BIG DATA

Every day the world generates zettabytes of data. The data generated in the last two years accounts for about 90% of all the data generated so far. Big data can be viewed in terms of four dimensions: Volume, Velocity, Variety and Veracity. Analytics is the science of processing large volumes of data with the help of mining algorithms and high-performance computing devices, producing results either almost immediately or after a considerable amount of time. Security is one of the major concerns when dealing with big data. Some of the security issues in big data include analyzing the data in real time, storing transaction logs and data in a secure manner, access control at a granular level, and data provenance.

There is a need to process vast and varied data at lightning speed, as the data arrives with huge velocity. Analysts must use various methods to find the patterns and store the data securely. The security mechanisms include encryption and decryption of data, and must be part of the analysis, as unethical big data experts may analyze or gather the data without the users' notice. Big data is changing the landscape of security technologies for network monitoring and forensics. New engineers must be educated to know the value of privacy issues and to develop tools according to the commonly agreed privacy guidelines [2].

A. Use of Big Data Analytics for Security

Encryption and decryption must be part of the analysis, as unethical big data experts may analyze or gather the data without the notice of the user or the organization. Encryption algorithms such as Blowfish, Rijndael and 3DES can be used. In big data analytics we can explore the data and gain new insight using advanced data mining techniques. Hadoop is one of the tools used to process the data, and we need tools to improve the security of systems. Traditional security mechanisms are not sufficient to examine the huge amount of data [2].

Big data refers to huge and heterogeneous data that a traditional database management system cannot process. In big data, the data is gathered in real time from different sources such as IoT devices, Facebook and Twitter for analysis. Big data tools are used to analyze a huge amount of data in a short time in order to extract interesting patterns. Big data is implemented by organizations to gain insights into the patterns present in the data in order to increase revenue, stay competitive and provide better customer satisfaction [3]. Big data mining helps decision makers to gain insights into the data, which helps in viewing future business opportunities [4].

B. Use of Security to Protect Big Data

Protecting the data is one of the important tasks to be performed by any organization. For big data there must be sophisticated mechanisms to protect the data alongside its analysis. Security breaches may be severe for big data, as the data comes from different sources. Solutions for securing big data include security at the application level using APIs, at the level of database columns and at the level of the file system, as well as account monitoring and log analysis. Log analysis can be performed across multiple sources with the help of intrusion detection and intrusion prevention systems. Encryption methods include transparent encryption and application encryption. Transparent encryption controls access at the file level. Application encryption encrypts specific columns in an application before it writes the fields to a database.

Some vendors offer big data encryption capabilities at the level of nodes but not at the internal levels of the data. Big data poses serious threats without the right encryption and the right security. Encryption techniques may not even protect the log files, because a skilled attacker can work beyond the limits of the security in place. Further, IT experts have to contend with key fragments, policy and administration in order to apply standards consistently and improve performance in the big data environment [5].

C. Visual Analytics

Visualization is one of the most powerful representations of data [2]. It helps humans to view the data in the form of graphs, images, pie charts, etc. Even though considerable work has been done, visualizing big data is one of the challenges

researchers have to focus on. Visual analytics is not simply about representing the data, but about providing an environment to visualize it and tools to analyze it, which help to gain insights into the data. Dynamic data visualization is one area that is becoming popular today. Many software packages, such as Tableau, D3.js and Timeline, provide visualization for big data processing in order to produce overviews and summaries and to drill down to a level where patterns and correlations can be extracted from the data sets [6]. Visual analytics is the combination of big data analytics and interactive visualization techniques. The main challenge of visual analytics is to embed or support big data when representing the data. Visual analytics must help in visualizing the data and support the different views of big data analysis. Applications of visual analytics include early fraud detection in credit card usage, weather monitoring, network analysis and forensic analysis.

The data analyst may pose complex analytical queries, which invoke the underlying analytical algorithms; these must be supported by machine learning and statistics to get accurate results from larger data sets. The results can be visualized through graphical presentations and interactive user interfaces for further manipulation [9].

IV. BAYESIAN CLASSIFICATION MINING

Forensic profiling is a technique used to develop information, and it helps in deriving evidence with the help of data mining. Forensic profiling patterns are derived or generated from different types of data sources. Applying data mining techniques to portable storage devices has not received much attention from researchers [12]. The extraction of historical data from SCADA (Supervisory Control And Data Acquisition) systems is one area where data mining and network forensics are combined. We propose the Naïve Bayesian Classifier to predict the class label of network flows under the category of network attacks [13, 16]. Decision trees and regression methods are the algorithms used for SCADA systems, but similarity-based machine learning techniques are more robust and can be implemented using sparse matrices [18]. Recent architectures also incorporate mechanisms for monitoring process behavior, analyzing trends, and optimizing plant performance [14].

Naïve Bayesian classification is a supervised learning algorithm that helps in predicting class labels. Its basic assumption is that, given the class, each attribute is independent of the other attributes. This assumption is called class conditional independence, and it helps to obtain results effectively and quickly. The Naïve Bayesian classifier is used when we want to predict class labels probabilistically. The classifier is described below.

Let D be a data set consisting of tuples, and let T be a tuple in D; in Naïve Bayesian classification T is treated as the “evidence”. Let S be the hypothesis that T belongs to a class C. According to Bayes' theorem,

Probability(S|T) = Probability(T|S) * Probability(S) / Probability(T),

where Probability(T) is the prior probability of T, Probability(S) is the prior probability of S, and Probability(T|S) is the posterior probability of T conditioned on S. We have to find the posterior probability of S conditioned on T, i.e. Probability(S|T). The Naïve Bayesian classifier predicts that T belongs to the class with the highest posterior probability: T is assigned to class Ci if and only if Probability(Ci|T) > Probability(Cj|T) for every other class Cj.

V. CONCLUSION

We propose to develop a system using big data analytics techniques that can provide analyzed information to forensic experts and reduce the time and cost of forensic analysis, because not all the information captured or recorded is useful for analysis or as evidence. The study places more emphasis on advanced data mining tools and techniques related to big data analytics. In future, the analysis of particular network traffic will be carried out using big data analytics in order to maintain sufficient information as evidence against an unusual event.

REFERENCES

[1] “IBM What Is Big Data: Bring Big Data to the Enterprise,” http://www-01.ibm.com/software/data/bigdata/, IBM, 2012.
[2] Alvaro A. Cárdenas, Pratyusa K. Manadhata, Sreeranga P. Rajan, “Big Data Analytics for Security,” IEEE Security & Privacy, 2013, pp. 74-76.

[3] A. Katal, M. Wazid, R. Goudar, “Big data: Issues, challenges, tools and good practices,” in Sixth International Conference on Contemporary Computing, Aug. 2013, pp. 404-409.
[4] D. F. Nettleton, Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects, 1st ed. Boston, United States: Morgan Kaufmann Publishers-Elsevier, 2014.
[5] Domenico Talia, “Clouds for Scalable Big Data Analytics,” IEEE Computer Society, 2013, pp. 98-101.
[6] R. Nambiar, R. Bharadwaj, A. Sethi and R. Vargheese, “A look at challenges and opportunities of big data analytics in healthcare,” in IEEE International Conference on Big Data, Oct. 2013, pp. 17-22.
[7] Khoa Nguyen, Dat Tran, Wanli Ma, Dharmendra Sharma, “An approach to detect network attacks applied for network forensics,” IEEE 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), DOI: 10.1109/FSKD.2014.6980912, 19-21 Aug. 2014, pp. 655-660.
[8] E. J. Palomo, J. North, D. Elizondo, R. M. Luque, “Visualisation of network forensics traffic data with a self-organising map for qualitative features,” IEEE International Joint Conference on Neural Networks (IJCNN), DOI: 10.1109/IJCNN.2011.6033434, July 31-Aug. 5 2011, pp. 1740-1747.
[9] D. Bruschi, M. Monga, E. Rosti, “Trusted Internet forensics: design of a network forensics appliance,” IEEE Workshop of the 1st International Conference on Security and Privacy for Emerging Areas in Communication Networks, DOI: 10.1109/SECCMW.2005.1588292, 5-9 Sept. 2005, pp. 33-35.
[10] Yangseo Choi, Joo-Young Lee, Sunoh Choi, Jong-Hyun Kim, “Introduction to a network forensics system for cyber incidents analysis,” 18th IEEE International Conference on Advanced Communication Technology (ICACT), DOI: 10.1109/ICACT.2016.7423270, Jan. 31-Feb. 3 2016, pp. 50-55.
[11] Sherif Saad, Issa Traore, “Method ontology for intelligent network forensics analysis,” Eighth IEEE Annual International Conference on Privacy Security and Trust (PST), DOI: 10.1109/PST.2010.5593235, 17-19 Aug. 2010, pp. 7-14.
[12] V. H. Bhat, “A Novel Data Generation Approach for Digital Forensic Application in Data Mining,” Proc. 2nd Int’l Conf. on Machine Learning and Computing (ICMLC 10), IEEE, 2010, pp. 86-90.
[13] F. Camastra, A. Ciaramella, and A. Staiano, “Machine Learning and Soft Computing for ICT Security: An Overview of Current Trends,” J. Ambient Intelligence and Humanized Computing, Oct. 2011; doi:10.1007/s12652-011-0073-z.
[14] T. Kilpatrick et al., “An Architecture for SCADA Network Forensics,” Proc. IFIP Int’l Conf. Digital Forensics (IFIP 06), Nat’l Center for Forensic Science, 2006, pp. 273-285.
[15] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014, pp. 97-107.
[16] Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, Yong Ren, “Information Security in Big Data: Privacy and Data Mining,” IEEE Access, DOI: 10.1109/Access.2014.2362522, October 20, 2014, pp. 1149-1176.
[17] Y. L. Simmhan, B. Plale, and D. Gannon, “A survey of data provenance in e-science,” ACM SIGMOD Rec., vol. 34, no. 3, 2005, pp. 31-36.
[18] Dorit S. Hochbaum, Philipp Baumann, “Sparse Computation for Large-Scale Data Mining,” IEEE Transactions on Big Data, vol. 2, no. 2, April-June 2016, pp. 151-174.
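
The Naïve Bayesian decision rule of Section IV, which assigns a tuple T to the class Ci with the highest Probability(Ci|T), can be sketched in a few lines. This is an illustration under stated assumptions, not the system proposed in the paper: the network flow attributes (protocol, flag, volume), the class labels and the training tuples are hypothetical, and add-one smoothing is introduced here only to avoid zero probabilities for unseen attribute values. Under class conditional independence, Probability(T|Ci) is computed as the product of the per-attribute probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical labelled network flows: (protocol, flag, volume) -> class label.
training = [
    (("tcp", "SYN", "high"), "dos"),
    (("tcp", "SYN", "high"), "dos"),
    (("udp", "NONE", "high"), "dos"),
    (("tcp", "ACK", "low"), "normal"),
    (("tcp", "ACK", "low"), "normal"),
    (("udp", "NONE", "low"), "normal"),
    (("tcp", "SYN", "low"), "probe"),
    (("tcp", "FIN", "low"), "probe"),
]


def train(data):
    """Count class frequencies and per-attribute value frequencies per class."""
    class_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for attributes, label in data:
        for k, value in enumerate(attributes):
            value_counts[(label, k)][value] += 1
            domains[k].add(value)
    return class_counts, value_counts, domains


def posteriors(t, model):
    """Return Probability(Ci|T) for every class Ci, normalized to sum to 1."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # Probability(Ci)
        for k, value in enumerate(t):
            # Probability(t_k|Ci) with add-one smoothing (an assumption of this sketch).
            score *= (value_counts[(c, k)][value] + 1) / (n_c + len(domains[k]))
        scores[c] = score
    evidence = sum(scores.values())  # Probability(T), summed over the classes
    return {c: s / evidence for c, s in scores.items()}


def predict(t, model):
    """Pick the class Ci with the highest posterior probability."""
    post = posteriors(t, model)
    return max(post, key=post.get)


model = train(training)
flow = ("tcp", "SYN", "high")
print(predict(flow, model), posteriors(flow, model))
```

Because Probability(T) is the same for every class, dividing by it does not change which class wins; the normalization is kept only so that the returned values can be read as posterior probabilities.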
